[
  {
    "path": ".dockerignore",
    "content": "src/.deps/\nsrc/.libs/\nsrc/Makefile\nsrc/Makefile.in\nsrc/aclocal.m4\nsrc/autom4te.cache/\nsrc/compile\nsrc/config.guess\nsrc/config.log\nsrc/config.status\nsrc/config.sub\nsrc/configure\nsrc/depcomp\nsrc/gitversion.h\nsrc/install-sh\nsrc/libtool\nsrc/ltmain.sh\nsrc/m4/libtool.m4\nsrc/m4/ltoptions.m4\nsrc/m4/ltsugar.m4\nsrc/m4/ltversion.m4\nsrc/m4/lt~obsolete.m4\nsrc/missing\nsrc/ppcg\nsrc/test-driver\nsrc/build\nsrc/opencl_test.sh\nsrc/polybench_test.sh\nsrc/.nfs*\nsrc/*.o\nsrc/.vscode\nsrc/autosa\nsrc/tags\n\nautosa\nautosa.tmp\n.nfs*\n"
  },
  {
    "path": ".gitignore",
    "content": "src/.deps/\nsrc/.libs/\nsrc/Makefile\nsrc/Makefile.in\nsrc/aclocal.m4\nsrc/autom4te.cache/\nsrc/compile\nsrc/config.guess\nsrc/config.log\nsrc/config.status\nsrc/config.sub\nsrc/configure\nsrc/depcomp\nsrc/gitversion.h\nsrc/install-sh\nsrc/libtool\nsrc/ltmain.sh\nsrc/m4/libtool.m4\nsrc/m4/ltoptions.m4\nsrc/m4/ltsugar.m4\nsrc/m4/ltversion.m4\nsrc/m4/lt~obsolete.m4\nsrc/missing\nsrc/ppcg\nsrc/test-driver\nsrc/build\nsrc/opencl_test.sh\nsrc/polybench_test.sh\nsrc/.nfs*\nsrc/*.o\nsrc/.vscode\nsrc/autosa\nsrc/tags\n\nautosa\nautosa.tmp\n.nfs*\n.vscode\n.libs\nautosa_scripts/__pycache__\ndocs/_build\nautosa_scripts/tuner/__pycache__\nautosa_scripts/tuner/outdir\n\nautosa_scripts/odyssey/db/*\nautosa_scripts/odyssey/outdir/*\nautosa_scripts/odyssey/__pycache__\nautosa_scripts/odyssey/tmp/*\nautosa_scripts/odyssey/solver/*\nautosa_scripts/odyssey/designs/register\n"
  },
  {
    "path": ".gitmodules",
    "content": "[submodule \"src/isl\"]\n\tpath = src/isl\n\turl = git://repo.or.cz/isl.git\n[submodule \"src/pet\"]\n\tpath = src/pet\n\turl = git://repo.or.cz/pet.git\n[submodule \"src/cJSON\"]\n\tpath = src/cJSON\n\turl = https://github.com/DaveGamble/cJSON.git\n[submodule \"src/barvinok\"]\n\tpath = src/barvinok\n\turl = https://repo.or.cz/barvinok.git\n"
  },
  {
    "path": "ChangeLog",
    "content": "version: 0.01\n2020-5-10 Jie Wang <jiewang@cs.ucla.edu>\nchanges:\n  - initial release of AutoSA\n"
  },
  {
    "path": "Dockerfile",
    "content": "# Get the base Ubuntu image from Docker Hub\nFROM ubuntu:latest\nLABEL maintainer=\"jiewang@cs.ucla.edu\"\nENV DEBIAN_FRONTEND=noninteractive\n\n# Refresh the package index on the base image\nRUN apt-get -y update\n\n# Install the prerequisites\nRUN apt-get -y install apt-utils automake autoconf libtool libtool-bin pkg-config libgmp3-dev libyaml-dev python3.6 python3-pip git wget cmake vim gdb\nRUN apt-get -y install libllvm-9-ocaml-dev libllvm9 llvm-9 llvm-9-dev llvm-9-doc llvm-9-examples llvm-9-runtime clang-9 clang-tools-9 clang-9-doc libclang-common-9-dev libclang-9-dev libclang1-9 clang-format-9 python-clang-9 clangd-9\nRUN ln -s /usr/bin/llvm-config-9 /usr/bin/llvm-config\n\n# Install NTL for barvinok\nRUN mkdir /ntl\nWORKDIR /ntl\nRUN wget https://www.shoup.net/ntl/ntl-11.4.3.tar.gz\nRUN gunzip ntl-11.4.3.tar.gz\nRUN tar xf ntl-11.4.3.tar\nWORKDIR /ntl/ntl-11.4.3/src\nRUN ./configure NTL_GMP_LIP=on\nRUN make -j4\nRUN make install\n\n# Copy the current folder to the Docker image\nCOPY . /usr/src/docker_autosa\n\n# Specify the working directory\nWORKDIR /usr/src/docker_autosa\n\n# Install AutoSA\nRUN ./install.sh\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License (MIT)\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of\nthis software and associated documentation files (the \"Software\"), to deal in\nthe Software without restriction, including without limitation the rights to\nuse, copy, modify, merge, publish, distribute, sublicense, and/or sell copies\nof the Software, and to permit persons to whom the Software is furnished to do\nso, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "<div align=\"center\">\n  <img src=\".github/autosa_logo.png\" width=\"200\">\n</div>\n\n# AutoSA: Polyhedral-Based Systolic Array Auto-Compilation\n\n[Documentation](https://autosa.readthedocs.io/en/latest/) |\n[Installation](https://autosa.readthedocs.io/en/latest/installation.html) |\n[Tutorials](https://autosa.readthedocs.io/en/latest/tutorials/index.html) |\n[Examples](https://autosa.readthedocs.io/en/latest/examples/index.html)\n\nThis repository contains the code for AutoSA, an end-to-end systolic array compiler based on the polyhedral model. AutoSA takes algorithms written in a high-level programming language (C) as input and applies polyhedral transformations and other architectural optimizations to map them onto systolic array architectures. The generated designs are in HLS C.\n\n## Quick Start\nWe offer a Docker image for a quick start.\n```bash\ndocker pull whbldhwj/autosa:latest\n```\n\nLet's try a small example. The input code can be found at `${AUTOSA_ROOT}/autosa_tests/mm/kernel.c`. The code region to be transformed into a systolic array is annotated with a pair of pragmas, `scop` and `endscop`.\n\n1. Generating HLS C Code\n\nRun the following command to generate a systolic array.\n```bash\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize\n```\nThe generated code can be found in `${AUTOSA_ROOT}/autosa.tmp/output/src/`.\nFor a detailed explanation of each AutoSA compilation option, please run\n```bash\n./autosa --help\n```\nor refer to [AutoSA Compilation Options](https://autosa.readthedocs.io/en/latest/tutorials/getting_started.html#autosa-compilation-options).\n\n2. 
Generating FPGA Bitstream\n\nTo generate the final bitstream, first set up your local Vitis development kit.\nThen run the Makefile to build the design.\n```bash\ncp ${AUTOSA_ROOT}/autosa_tests/mm/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\ncp ${AUTOSA_ROOT}/autosa_tests/mm/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\ncd ${AUTOSA_ROOT}/autosa.tmp/output\nmake all\n```\n**Makefile Options**\n\n* `MODE := hw_emu`: Sets the build mode to hardware emulation; the other modes are `sw_emu` and `hw`\n* `PLATFORM := xilinx_u250_xdma_201830_2`: Selects the target platform\n* `KERNEL_SRC := src/kernel_kernel.cpp`: Lists the kernel source files\n* `HOST_SRC := src/kernel_host.cpp`: Lists the host source files\n\nThe file `connectivity.cfg` describes the DRAM port mapping. For details on how to change the DRAM port mapping, please refer to the Xilinx tutorials.\n\n3. Verifying Designs Using Xilinx HLS\n\nAutoSA also supports generating HLS projects. To do so, add the flag `--hls` to the compilation command:\n\n```bash\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls\n```\n\nAutoSA then generates an HLS host file, `${AUTOSA_ROOT}/autosa.tmp/output/src/kernel_host.cpp`, in place of the OpenCL host file generated in the previous step. 
To build the HLS project, run the following commands.\n```bash\ncp ${AUTOSA_ROOT}/autosa_scripts/hls_scripts/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\ncd ${AUTOSA_ROOT}/autosa.tmp/output\nvivado_hls -f hls_script.tcl\n```\n\nFor more detailed instructions on using AutoSA, please refer to the [AutoSA Documentation](https://autosa.readthedocs.io/en/latest/).\n\n## Send Us Failure Cases and Feedback!\nAutoSA is open source for research purposes, and we would like to continuously improve it! Please let us know if...\n\n1. you find a bug in the AutoSA code.\n2. you find an application that fails the AutoSA compilation flow.\n3. you know how to help improve any part of the compiler.\n4. you have any other feedback.\n\n## Authors and Contributors\nAutoSA is currently maintained by [Jie Wang](http://cadlab.cs.ucla.edu/~jaywang/).\nWe also gratefully acknowledge the authors of [PPCG](https://github.com/Meinersbur/ppcg) for developing and actively maintaining PPCG as an open-source project.\n\n## Papers\nFurther implementation details of AutoSA are described in [our paper](http://cadlab.cs.ucla.edu/~jaywang/papers/fpga21-autosa.pdf). If you find this project useful in your research, please consider citing:\n\n    @inproceedings{wang2021autosa,\n      title={AutoSA: A Polyhedral Compiler for High-Performance Systolic Arrays on FPGA},\n      author={Wang, Jie and Guo, Licheng and Cong, Jason},\n      booktitle={Proceedings of the 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays},\n      year={2021}\n    }\n"
  },
  {
    "path": "autosa_config/autosa_config.json",
    "content": "{\n    \"space_time\": {\n        \"mode\": \"manual\"\n    },\n    \"array_part\": {\n        \"enable\": 1,\n        \"mode\": \"manual\"\n    },\n    \"array_part_L2\": {\n        \"enable\": 1,\n        \"mode\": \"manual\"\n    },\n    \"latency\": {\n        \"enable\": 1,\n        \"mode\": \"manual\"\n    },\n    \"simd\": {\n        \"enable\": 1,\n        \"mode\": \"manual\"\n    },\n    \"hbm\": {\n        \"mode\": \"manual\"\n    }\n}\n"
  },
  {
    "path": "autosa_config/hw_info.json",
    "content": "{\n  \"BRAM18K\": 5376,\n  \"DSP\": 12288,\n  \"FF\": 3456000,\n  \"LUT\": 1728000,\n  \"URAM\": 1280\n}\n"
  },
  {
    "path": "autosa_config/hw_info_libs/hw_info.json.ku3",
    "content": "{\n  \"BRAM\": 2160,\n  \"DSP\": 2760,\n  \"FF\": 663360,\n  \"LUT\": 331680,\n  \"URAM\": 0\n}\n"
  },
  {
    "path": "autosa_config/hw_info_libs/hw_info.json.u200",
    "content": "{\n  \"BRAM\": 4320,\n  \"DSP\": 6840,\n  \"FF\": 2364480,\n  \"LUT\": 1182240,\n  \"URAM\": 960\n}\n"
  },
  {
    "path": "autosa_config/hw_info_libs/hw_info.json.u250",
    "content": "{\n  \"BRAM18K\": 5376,\n  \"DSP\": 12288,\n  \"FF\": 3456000,\n  \"LUT\": 1728000,\n  \"URAM\": 1280\n}\n"
  },
  {
    "path": "autosa_config/module_group.json",
    "content": "{\n  \"x\": 8,\n  \"y\": 1\n}\n"
  },
  {
    "path": "autosa_config/optimizer_settings.json",
    "content": "{\n    \"training\": {\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 4\n            }\n        },\n        \"pruning\": {\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    32,\n                    256\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ],\n                \"PE_ratio\": 2\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 1\n        }\n    },\n    \"synth\": {\n        \"multiprocess\": {\n            \"n_job\": 16\n        },\n        \"sample\": {\n            \"n\": 16\n        }\n    },\n    \"search\": {\n        \"metric\": \"latency\",\n        \"cycle_period\": 5,\n        \"mode\": \"customized\",\n        \"n_random\": 5,\n        \"log\": {\n            \"n_record\": 10\n        },\n        \"resource_target\": [\"BRAM18K\", \"DSP\"],\n        \"time_out\": -1,\n        \"update_time_interval\": 2,        \n        \"pruning\": {\n            \"random_start\": {\n                \"enable\": 1,\n       
         \"n_trial\": 3,\n                \"n_random\": 3\n            },\n            \"resource\": {                \n                \"range\": {\n                    \"FF\": [\n                        0.25,\n                        0.7\n                    ],\n                    \"LUT\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"DSP\": [\n                        0.6,\n                        0.75\n                    ],\n                    \"BRAM18K\": [\n                        0.3,\n                        0.7\n                    ],\n                    \"URAM\": [\n                        0,\n                        0.6\n                    ]\n                }\n            },\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    190,\n                    210\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    64,\n                    1280\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    190,\n                    210\n                ],\n                \"PE_ratio\": 3\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 32\n        },\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": 
{\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 8\n            }\n        }\n    }\n}\n"
  },
  {
    "path": "autosa_config/optimizer_settings_libs/gemm3_fp32.json",
    "content": "{\n    \"training\": {\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 8\n            }\n        },\n        \"pruning\": {\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    16,\n                    256\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ],\n                \"PE_ratio\": 2\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 1\n        }\n    },\n    \"synth\": {\n        \"multiprocess\": {\n            \"n_job\": 16\n        },\n        \"sample\": {\n            \"n\": 16\n        }\n    },\n    \"search\": {\n        \"metric\": \"latency\",\n        \"cycle_period\": 5,\n        \"mode\": \"customized\",\n        \"n_random\": 5,\n        \"log\": {\n            \"n_record\": 10\n        },\n        \"resource_target\": [\"BRAM18K\", \"DSP\"],\n        \"time_out\": -1,\n        \"update_time_interval\": 2,        \n        \"pruning\": {\n            \"random_start\": {\n                \"enable\": 1,\n       
         \"n_trial\": 3,\n                \"n_random\": 3\n            },\n            \"resource\": {                \n                \"range\": {\n                    \"FF\": [\n                        0.25,\n                        0.7\n                    ],\n                    \"LUT\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"DSP\": [\n                        0.6,\n                        0.75\n                    ],\n                    \"BRAM18K\": [\n                        0.3,\n                        0.7\n                    ],\n                    \"URAM\": [\n                        0,\n                        0.6\n                    ]\n                }\n            },\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    190,\n                    210\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    64,\n                    1280\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    190,\n                    210\n                ],\n                \"PE_ratio\": 3\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 32\n        },\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": 
{\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 8\n            }\n        }\n    }\n}"
  },
  {
    "path": "autosa_config/optimizer_settings_libs/gemm3_int16.json",
    "content": "{\n    \"training\": {\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 16\n            }\n        },\n        \"pruning\": {\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    64,\n                    256\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ],\n                \"PE_ratio\": 2\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 1\n        }\n    },\n    \"synth\": {\n        \"multiprocess\": {\n            \"n_job\": 16\n        },\n        \"sample\": {\n            \"n\": 16\n        }\n    },\n    \"search\": {\n        \"metric\": \"latency\",\n        \"cycle_period\": 5,\n        \"mode\": \"customized\",\n        \"n_random\": 5,\n        \"log\": {\n            \"n_record\": 10\n        },\n        \"resource_target\": [\"BRAM18K\", \"DSP\"],\n        \"time_out\": -1,\n        \"update_time_interval\": 2,        \n        \"pruning\": {\n            \"random_start\": {\n                \"enable\": 1,\n      
          \"n_trial\": 3,\n                \"n_random\": 3\n            },\n            \"resource\": {                \n                \"range\": {\n                    \"FF\": [\n                        0.3,\n                        0.7\n                    ],\n                    \"LUT\": [\n                        0.3,\n                        0.7\n                    ],\n                    \"DSP\": [\n                        0.6,\n                        0.75\n                    ],\n                    \"BRAM18K\": [\n                        0.2,\n                        0.7\n                    ],\n                    \"URAM\": [\n                        0,\n                        0.6\n                    ]\n                }\n            },\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    480,\n                    640\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    64,\n                    1280\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    480,\n                    640\n                ],\n                \"PE_ratio\": 3\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 32\n        },\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": 
{\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 16\n            }\n        }\n    }\n}"
  },
  {
    "path": "autosa_config/optimizer_settings_libs/gemm3_int16_32.json",
    "content": "{\n    \"training\": {\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 8\n            }\n        },\n        \"pruning\": {\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    16,\n                    256\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ],\n                \"PE_ratio\": 2\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 1\n        }\n    },\n    \"synth\": {\n        \"multiprocess\": {\n            \"n_job\": 16\n        },\n        \"sample\": {\n            \"n\": 16\n        }\n    },\n    \"search\": {\n        \"metric\": \"latency\",\n        \"cycle_period\": 5,\n        \"mode\": \"customized\",\n        \"n_random\": 5,\n        \"log\": {\n            \"n_record\": 10\n        },\n        \"resource_target\": [\"DSP\"],\n        \"time_out\": -1,\n        \"update_time_interval\": 2,        \n        \"pruning\": {\n            \"random_start\": {\n                \"enable\": 1,\n                
\"n_trial\": 3,\n                \"n_random\": 3\n            },\n            \"resource\": {                \n                \"range\": {\n                    \"FF\": [\n                        0.25,\n                        0.7\n                    ],\n                    \"LUT\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"DSP\": [\n                        0.5,\n                        0.7\n                    ],\n                    \"BRAM18K\": [\n                        0.3,\n                        0.7\n                    ],\n                    \"URAM\": [\n                        0,\n                        0.6\n                    ]\n                }\n            },\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    200,\n                    300\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    64,\n                    1024\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    200,\n                    300\n                ],\n                \"PE_ratio\": 3\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 32\n        },\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n       
         \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 32\n            }\n        }\n    }\n}"
  },
  {
    "path": "autosa_config/optimizer_settings_libs/gemm3_int8.json",
    "content": "{\n    \"training\": {\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 8\n            }\n        },\n        \"pruning\": {\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    16,\n                    256\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ],\n                \"PE_ratio\": 2\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 1\n        }\n    },\n    \"synth\": {\n        \"multiprocess\": {\n            \"n_job\": 16\n        },\n        \"sample\": {\n            \"n\": 16\n        }\n    },\n    \"search\": {\n        \"metric\": \"latency\",\n        \"cycle_period\": 5,\n        \"mode\": \"customized\",\n        \"n_random\": 5,\n        \"log\": {\n            \"n_record\": 10\n        },\n        \"resource_target\": [\"BRAM18K\", \"DSP\"],\n        \"time_out\": -1,\n        \"update_time_interval\": 2,        \n        \"pruning\": {\n            \"random_start\": {\n                \"enable\": 1,\n       
         \"n_trial\": 3,\n                \"n_random\": 3\n            },\n            \"resource\": {                \n                \"range\": {\n                    \"FF\": [\n                        0.25,\n                        0.7\n                    ],\n                    \"LUT\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"DSP\": [\n                        0.5,\n                        0.7\n                    ],\n                    \"BRAM18K\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"URAM\": [\n                        0,\n                        0.6\n                    ]\n                }\n            },\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    350,\n                    450\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    400,\n                    1500\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    350,\n                    450\n                ],\n                \"PE_ratio\": 3\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 32\n        },\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            },\n            
\"SIMD_vectorization\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 32\n            }\n        }\n    }\n}"
  },
  {
    "path": "autosa_config/optimizer_settings_libs/gemm3_int8_64.json",
    "content": "{\n    \"training\": {\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 8\n            }\n        },\n        \"pruning\": {\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    16,\n                    256\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ],\n                \"PE_ratio\": 2\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 1\n        }\n    },\n    \"synth\": {\n        \"multiprocess\": {\n            \"n_job\": 16\n        },\n        \"sample\": {\n            \"n\": 16\n        }\n    },\n    \"search\": {\n        \"metric\": \"latency\",\n        \"cycle_period\": 5,\n        \"mode\": \"customized\",\n        \"n_random\": 5,\n        \"log\": {\n            \"n_record\": 10\n        },\n        \"resource_target\": [\"DSP\"],\n        \"time_out\": -1,\n        \"update_time_interval\": 2,        \n        \"pruning\": {\n            \"random_start\": {\n                \"enable\": 1,\n                
\"n_trial\": 3,\n                \"n_random\": 3\n            },\n            \"resource\": {                \n                \"range\": {\n                    \"FF\": [\n                        0.25,\n                        0.7\n                    ],\n                    \"LUT\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"DSP\": [\n                        0.5,\n                        0.7\n                    ],\n                    \"BRAM18K\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"URAM\": [\n                        0,\n                        0.6\n                    ]\n                }\n            },\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    150,\n                    200\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    256,\n                    1000\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    150,\n                    200\n                ],\n                \"PE_ratio\": 3\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 32\n        },\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n     
           \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            }\n        }\n    }\n}"
  },
  {
    "path": "autosa_config/optimizer_settings_libs/gemm4_fp32.json",
    "content": "{\n    \"training\": {\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 8\n            }\n        },\n        \"pruning\": {\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    64,\n                    256\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ],\n                \"PE_ratio\": 2\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 1\n        }\n    },\n    \"synth\": {\n        \"multiprocess\": {\n            \"n_job\": 16\n        },\n        \"sample\": {\n            \"n\": 16\n        }\n    },\n    \"search\": {\n        \"metric\": \"latency\",\n        \"cycle_period\": 5,\n        \"mode\": \"customized\",\n        \"n_random\": 5,\n        \"log\": {\n            \"n_record\": 10\n        },\n        \"resource_target\": [\"BRAM18K\", \"DSP\"],\n        \"time_out\": -1,\n        \"update_time_interval\": 2,        \n        \"pruning\": {\n            \"random_start\": {\n                \"enable\": 1,\n       
         \"n_trial\": 3,\n                \"n_random\": 3\n            },\n            \"resource\": {                \n                \"range\": {\n                    \"FF\": [\n                        0.25,\n                        0.7\n                    ],\n                    \"LUT\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"DSP\": [\n                        0.5,\n                        0.7\n                    ],\n                    \"BRAM18K\": [\n                        0.3,\n                        0.7\n                    ],\n                    \"URAM\": [\n                        0,\n                        0.6\n                    ]\n                }\n            },\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    200,\n                    210\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    64,\n                    512\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    200,\n                    210\n                ],\n                \"PE_ratio\": 3\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 32\n        },\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 32\n            },\n            \"SIMD_vectorization\": 
{\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 8\n            }\n        }\n    }\n}"
  },
  {
    "path": "autosa_config/optimizer_settings_libs/mm_small.json",
    "content": "{\n    \"training\": {\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 4\n            }\n        },\n        \"pruning\": {\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    32,\n                    256\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ],\n                \"PE_ratio\": 2\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 1\n        }\n    },\n    \"synth\": {\n        \"multiprocess\": {\n            \"n_job\": 16\n        },\n        \"sample\": {\n            \"n\": 16\n        }\n    },\n    \"search\": {\n        \"metric\": \"latency\",\n        \"cycle_period\": 5,\n        \"mode\": \"customized\",\n        \"n_random\": 5,\n        \"log\": {\n            \"n_record\": 10\n        },\n        \"resource_target\": [\"BRAM18K\", \"DSP\"],\n        \"time_out\": -1,\n        \"update_time_interval\": 2,        \n        \"pruning\": {\n            \"random_start\": {\n                \"enable\": 1,\n       
         \"n_trial\": 3,\n                \"n_random\": 3\n            },\n            \"resource\": {                \n                \"range\": {\n                    \"FF\": [\n                        0.25,\n                        0.7\n                    ],\n                    \"LUT\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"DSP\": [\n                        0.0,\n                        0.5\n                    ],\n                    \"BRAM18K\": [\n                        0.0,\n                        0.5\n                    ],\n                    \"URAM\": [\n                        0,\n                        0.6\n                    ]\n                }\n            },\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    32,\n                    128\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    32,\n                    512\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    32,\n                    128\n                ],\n                \"PE_ratio\": 3\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 32\n        },\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n 
               \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 8\n            }\n        }\n    }\n}\n"
  },
  {
    "path": "autosa_config/optimizer_settings_libs/mttkrp_fp32.json",
    "content": "{\n    \"training\": {\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 8\n            }\n        },\n        \"pruning\": {\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    80,\n                    256\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ],\n                \"PE_ratio\": 2\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 1\n        }\n    },\n    \"synth\": {\n        \"multiprocess\": {\n            \"n_job\": 16\n        },\n        \"sample\": {\n            \"n\": 16\n        }\n    },\n    \"search\": {\n        \"metric\": \"latency\",\n        \"cycle_period\": 5,\n        \"mode\": \"customized\",\n        \"n_random\": 5,\n        \"log\": {\n            \"n_record\": 10\n        },\n        \"resource_target\": [\"BRAM18K\", \"DSP\"],\n        \"time_out\": -1,\n        \"update_time_interval\": 2,        \n        \"pruning\": {\n            \"random_start\": {\n                \"enable\": 1,\n       
         \"n_trial\": 3,\n                \"n_random\": 3\n            },\n            \"resource\": {                \n                \"range\": {\n                    \"FF\": [\n                        0.25,\n                        0.7\n                    ],\n                    \"LUT\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"DSP\": [\n                        0.6,\n                        0.7\n                    ],\n                    \"BRAM18K\": [\n                        0.2,\n                        0.5\n                    ],\n                    \"URAM\": [\n                        0,\n                        0.6\n                    ]\n                }\n            },\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    120,\n                    130\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    70,\n                    512\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    120,\n                    130\n                ],\n                \"PE_ratio\": 3\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 32\n        },\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": 
{\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 8\n            }\n        }\n    }\n}"
  },
  {
    "path": "autosa_config/optimizer_settings_libs/ttm_fp32.json",
    "content": "{\n    \"training\": {\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 8\n            }\n        },\n        \"pruning\": {\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    64,\n                    256\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ],\n                \"PE_ratio\": 2\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 1\n        }\n    },\n    \"synth\": {\n        \"multiprocess\": {\n            \"n_job\": 16\n        },\n        \"sample\": {\n            \"n\": 16\n        }\n    },\n    \"search\": {\n        \"metric\": \"latency\",\n        \"cycle_period\": 5,\n        \"mode\": \"customized\",\n        \"n_random\": 5,\n        \"log\": {\n            \"n_record\": 10\n        },\n        \"resource_target\": [\"BRAM18K\", \"DSP\"],\n        \"time_out\": 3,\n        \"update_time_interval\": 2,        \n        \"pruning\": {\n            \"random_start\": {\n                \"enable\": 1,\n        
        \"n_trial\": 3,\n                \"n_random\": 3\n            },\n            \"resource\": {                \n                \"range\": {\n                    \"FF\": [\n                        0.25,\n                        0.7\n                    ],\n                    \"LUT\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"DSP\": [\n                        0.6,\n                        0.7\n                    ],\n                    \"BRAM18K\": [\n                        0.1,\n                        0.5\n                    ],\n                    \"URAM\": [\n                        0,\n                        0.6\n                    ]\n                }\n            },\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    190,\n                    200\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    64,\n                    640\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    190,\n                    200\n                ],\n                \"PE_ratio\": 3\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 32\n        },\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": 
{\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 8\n            }\n        }\n    }\n}"
  },
  {
    "path": "autosa_config/optimizer_settings_libs/ttmc_fp32.json",
    "content": "{\n    \"training\": {\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": 2,\n                \"loop_limit\": 8\n            }\n        },\n        \"pruning\": {\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    80,\n                    256\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    8,\n                    32\n                ],\n                \"PE_ratio\": 2\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 1\n        }\n    },\n    \"synth\": {\n        \"multiprocess\": {\n            \"n_job\": 16\n        },\n        \"sample\": {\n            \"n\": 16\n        }\n    },\n    \"search\": {\n        \"metric\": \"latency\",\n        \"cycle_period\": 5,\n        \"mode\": \"customized\",\n        \"n_random\": 5,\n        \"log\": {\n            \"n_record\": 10\n        },\n        \"resource_target\": [\"BRAM18K\", \"DSP\"],\n        \"time_out\": 5,\n        \"update_time_interval\": 2,        \n        \"pruning\": {\n            \"random_start\": {\n                \"enable\": 1,\n        
        \"n_trial\": 3,\n                \"n_random\": 3\n            },\n            \"resource\": {                \n                \"range\": {\n                    \"FF\": [\n                        0.25,\n                        0.7\n                    ],\n                    \"LUT\": [\n                        0.3,\n                        0.75\n                    ],\n                    \"DSP\": [\n                        0.6,\n                        0.7\n                    ],\n                    \"BRAM18K\": [\n                        0.1,\n                        0.5\n                    ],\n                    \"URAM\": [\n                        0,\n                        0.6\n                    ]\n                }\n            },\n            \"array_part\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    120,\n                    140\n                ]\n            },\n            \"array_part_L2\": {\n                \"enable\": 1\n            },\n            \"latency_hiding\": {\n                \"enable\": 1,\n                \"reg_size\": [\n                    64,\n                    640\n                ]\n            },\n            \"SIMD_vectorization\": {\n                \"enable\": 1,\n                \"PE_num\": [\n                    120,\n                    140\n                ],\n                \"PE_ratio\": 3\n            }\n        },\n        \"multiprocess\": {\n            \"n_job\": 32\n        },\n        \"sample\": {\n            \"space_time\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1\n            },\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": 
{\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 8\n            }\n        }\n    }\n}"
  },
  {
    "path": "autosa_scripts/autosa.py",
    "content": "#!/usr/bin/env python3\nimport sys\nimport subprocess\nimport os\nimport time\n\ndef exec_sys_cmd(cmd):\n    p = subprocess.Popen(cmd, shell=True)\n    ret = p.wait()\n    return ret\n\nif __name__ == \"__main__\":\n    # Some default values\n    output_dir = './autosa.tmp/output'\n    target = 'autosa_hls_c'\n    src_file_prefix = 'kernel'\n    xilinx_host = 'opencl'\n    tuning = False\n    isl_flag = '--isl-schedule-whole-component' # This flag forces ISL to perform loop fusion as much as possible\n    hcl = False\n\n    # Parse and update the arguments\n    n_arg = len(sys.argv)\n    argv = sys.argv\n    tuning_idx = -1\n    insert_isl_flag = True\n    assign_loop_permute = False\n    explore_loop_permute = False\n    for i in range(n_arg):\n        arg = argv[i]            \n        if 'output-dir' in arg:\n            output_dir = arg.split('=')[-1]\n        if 'target' in arg:\n            target = arg.split('=')[-1]\n        if 'tuning-method' in arg:            \n            tuning = True\n            tuning_idx = i\n        if 'isl-schedule-whole-component' in arg:\n            insert_isl_flag = False\n        if 'loop-permute-order' in arg:\n            assign_loop_permute = True\n        if 'explore-loop-permute' in arg:\n            explore_loop_permute = True\n    if n_arg > 1:\n        src_file = argv[1]\n        src_file_prefix = os.path.basename(src_file).split('.')[0]\n    if n_arg > 1 and target == 'autosa_hls_c':\n        # Check whether to generate HLS or OpenCL host for Xilinx FPGAs\n        for arg in argv:\n            if '--hls' in arg:\n                xilinx_host = 'hls'\n            if '--hcl' in arg:\n                hcl = True    \n    if n_arg > 1 and target == 'autosa_opencl':\n        for arg in argv:\n            if '--hcl' in arg:\n                hcl = True    \n   \n    # Cache the AutoSA command\n    autosa_cmd = ' '.join(argv)\n    exec_sys_cmd(f'echo \"{autosa_cmd}\" > {output_dir}/src/cmd')\n\n    argv[0] = 
'./src/autosa'\n    if insert_isl_flag:\n        argv.append(isl_flag)\n\n    # Check if the output directory exists\n    if not os.path.isdir(output_dir):\n        raise RuntimeError(f'Output directory {output_dir} does not exist.')\n\n    # Execute AutoSA\n    #start_time = time.perf_counter()\n    complete = False\n    permute_idx = 0\n    while not complete:\n        if permute_idx > 0:\n            argv.append(f'--autosa-loop-permute-order={permute_idx}')\n        process = subprocess.run(argv)\n        if process.returncode != 0:\n            print(\"[AutoSA] Error: Exit abnormally!\")\n            sys.exit(process.returncode)\n        else:        \n            if not os.path.exists(output_dir + '/src/completed'):\n                sys.exit(process.returncode)    \n        exec_sys_cmd(f'rm {output_dir}/src/completed')                   \n        #runtime = time.perf_counter() - start_time\n        #print(f'runtime: {runtime}')\n\n        # Generate the top module\n        print(\"[AutoSA] Post-processing the generated code...\")\n        #start_time = time.perf_counter()\n        if not os.path.exists(f'{output_dir}/src/{src_file_prefix}_top_gen.cpp'):\n            raise RuntimeError(f'{output_dir}/src/{src_file_prefix}_top_gen.cpp does not exist.')\n        cmd = 'g++ -o ' + output_dir + '/src/top_gen ' + output_dir + \\\n              '/src/' + src_file_prefix + '_top_gen.cpp ' + \\\n              '-I./src/isl/include -L./src/isl/.libs -lisl'\n        exec_sys_cmd(cmd)\n        my_env = os.environ.copy()\n        cwd = os.getcwd()\n        # Add the ISL library path; do not prepend a separator when the variable is unset,\n        # since an empty entry would add the current directory to the search path\n        if 'LD_LIBRARY_PATH' in my_env:\n            my_env['LD_LIBRARY_PATH'] += os.pathsep + cwd + '/src/isl/.libs'\n        else:\n            my_env['LD_LIBRARY_PATH'] = cwd + '/src/isl/.libs'\n        cmd = output_dir + '/src/top_gen'\n        process = subprocess.run(cmd.split(), env=my_env)\n        #runtime = time.perf_counter() - start_time\n        #print(f'runtime: {runtime}')\n\n        complete = True     
\n        if tuning and explore_loop_permute:   \n            for filename in os.listdir(f'{output_dir}'):\n                if filename.startswith(\"permute\"):\n                    if filename.endswith(\"done\"):\n                        complete = True                    \n                    else:\n                        permute_idx = int(filename.split(\"_\")[-1])                        \n                        if assign_loop_permute:\n                            complete = True\n                        else:\n                            complete = False                        \n\n                    os.remove(f'{output_dir}/{filename}')\n                    break            \n\n    if not tuning:\n        # Generate the final code    \n        if target == 'autosa_hls_c' or target == 'autosa_tapa':\n            cmd = './autosa_scripts/codegen.py -c ' + output_dir + \\\n                  '/src/top.cpp -d ' + output_dir + '/src/' + src_file_prefix + \\\n                  '_kernel_modules.cpp -t ' + target + ' -o ' + output_dir + '/src/' + \\\n                  src_file_prefix + '_kernel.cpp'\n            if hcl:\n                cmd += ' --hcl'\n        elif target == 'autosa_opencl':\n            cmd = './autosa_scripts/codegen.py -c ' + output_dir + \\\n                  '/src/top.cpp -d ' + output_dir + '/src/' + src_file_prefix + \\\n                  '_kernel_modules.cl -t ' + target + ' -o ' + output_dir + '/src/' + \\\n                  src_file_prefix + '_kernel.cl'\n            if hcl:\n                cmd += ' --hcl'\n        elif target == 'autosa_catapult_c':\n            cmd = './autosa_scripts/codegen.py -c ' + output_dir + \\\n                  '/src/top.cpp -d ' + output_dir + '/src/' + src_file_prefix + \\\n                  '_kernel_modules.cpp -t ' + target + ' -o ' + output_dir + '/src/' + \\\n                  src_file_prefix + '_kernel_hw.h' + ' --tb ' + output_dir + '/src/' + \\\n                  src_file_prefix + '_host.cpp'\n        
if target == 'autosa_hls_c':\n            cmd += ' --host '\n            cmd += xilinx_host\n                    \n        exec_sys_cmd(cmd)            \n\n        # Copy the input code to the output directory           \n        exec_sys_cmd(f'cp {argv[1]} {output_dir}/src/')\n        headers = src_file.split('.')\n        headers[-1] = 'h'\n        headers = \".\".join(headers)\n        if os.path.exists(headers):\n            exec_sys_cmd(f'cp {headers} {output_dir}/src/')        \n\n        # Clean up the temp files        \n        if target == 'autosa_hls_c' and xilinx_host == 'opencl':\n            exec_sys_cmd(f'rm {output_dir}/src/{src_file_prefix}_kernel.h')            \n        exec_sys_cmd(f'rm {output_dir}/src/top_gen')\n        exec_sys_cmd(f'rm {output_dir}/src/top.cpp')\n        exec_sys_cmd(f'rm {output_dir}/src/{src_file_prefix}_top_gen.cpp')    \n        exec_sys_cmd(f'rm {output_dir}/src/{src_file_prefix}_top_gen.h')    \n        if target == 'autosa_hls_c' or target == 'autosa_catapult_c':\n            exec_sys_cmd(f'rm {output_dir}/src/{src_file_prefix}_kernel_modules.cpp')\n        elif target == 'autosa_opencl':\n            exec_sys_cmd(f'rm {output_dir}/src/{src_file_prefix}_kernel_modules.cl')        \n"
  },
  {
    "path": "autosa_scripts/codegen.py",
    "content": "#!/usr/bin/env python3\n\nimport sympy\nimport sys\nimport argparse\nimport re\nimport numpy as np\nimport os\n\ndef delete_arg_from_arg_list(line, arg, content):\n    \"\"\" Delete the argument from the argument list\n\n    Parameters\n    ----------\n    line: str\n        code line containing the argument list\n    arg: str\n        argument to be deleted\n    content: list\n        the printed content before the current line\n    \"\"\"\n    line = line.strip()\n    # print(line)\n    if line[-1] != ',':\n        # print('test\\n')\n        # print(line)\n        # print(content[-1])\n        comma_pos = content[-1].find(',')\n        content[-1] = content[-1][:comma_pos] + '\\n'\n\n    \"\"\"\n    line = re.sub(r'( )(' + re.escape(arg) + r')(,)',\n                  '', line)\n    line = re.sub(r'( )(' + re.escape(arg) + r')(\\))',\n                  r'\\g<3>', line)\n    line = re.sub(r'(\\()(' + re.escape(arg) + r')(, )',\n                  r'\\g<1>', line)\n    line = re.sub(r'(\\()(' + re.escape(arg) + r')(\\))',\n                  r'\\g<1>\\g<3>', line)\n    \"\"\"\n\ndef print_module_def(\n        f,\n        arg_map,\n        module_def,\n        inline_module_defs,\n        def_args,\n        call_args_type):\n    \"\"\" Print out module definitions for Intel OpenCL\n\n    This function prints out the module definition with all arguments in the code\n    replaced by the calling arguments.\n    We will first extract the module ids and fifos from the module definition\n    argument lists.
These arguments are deleted from the argument lists as we will\n    plug in the exact module ids and fifos from a call of this modules.\n    As an example, the original module\n      void A_IO_L3_in(int idx, fifo_type fifo)\n    will be modified to\n      void A_IO_L3_in_[arg_map[idx]]()\n\n    Parameters\n    ----------\n    f:\n        file handle\n    arg_map:\n        maps from module definition args to module call args\n    module_def:\n        a list storing the module definition texts\n    inline_module_defs:\n        a dict containing all the inline module definitions\n    def_args:\n        a list storing the module definition arguments\n    call_args_type:\n        a list storing the type of each module call arg\n    \"\"\"\n    # Print inline module definitions\n    if inline_module_defs:\n        # Each inline module should be only printed once.\n        # We assume the module ids and fifos are unchanged in multiple inline module\n        # calls. Therefore, only the first encounter will be handled.\n        inline_module_handled = []\n        for inline_module in inline_module_defs:\n            # Search for the inline modules\n            for line_id in range(len(module_def)):\n                line = module_def[line_id]\n                if line.find(inline_module + '(') != -1:\n                    # The current line contains the inline module call\n                    if inline_module in inline_module_handled:\n                        # Replace the module call\n                        line_indent = line.find(inline_module)\n                        line = ' ' * line_indent + inline_module\n                        for i in range(len(def_args)):\n                            def_arg = def_args[i]\n                            arg_type = call_args_type[i]\n                            if arg_type == 'module id':\n                                line += '_'\n                                line += arg_map[def_arg]\n                        line += '(\\n'\n     
                   module_def[line_id] = line\n                        continue\n                    else:\n                        inline_module_handled.append(inline_module)\n                    # Print the inline module definition\n                    inline_module_call_args = []\n                    inline_module_call_args_type = []\n                    inline_module_def_args = []\n                    inline_module_arg_map = {}\n                    inline_module_name = inline_module\n                    inline_module_def = inline_module_defs[inline_module_name]\n                    # Extract the arg list in module definition\n                    for inline_module_line in inline_module_def:\n                        if inline_module_line.find('void') != -1:\n                            m = re.search(r'\\((.+?)\\)', inline_module_line)\n                            if m:\n                                def_args_old = m.group(1)\n                    def_args_old = def_args_old.split(', ')\n                    for arg in def_args_old:\n                        arg = arg.split()[-1]\n                        inline_module_def_args.append(arg)\n                    # Extract the arg list in module call\n                    next_line_id = line_id + 1\n                    next_line = module_def[next_line_id]\n                    while next_line.find(');') == -1:\n                        m = re.search(r'/\\*(.+?)\\*/', next_line)\n                        if m:\n                            arg_type = m.group(1).strip()\n                            inline_module_call_args_type.append(arg_type)\n                            m = re.search(r'\\*/ (.+)', next_line)\n                            if m:\n                                call_arg = m.group(1).split(',')[0]\n                                inline_module_call_args.append(call_arg)\n                        next_line_id += 1\n                        next_line = module_def[next_line_id]\n                    # Build a mapping 
between the def_arg to call_arg\n                    #print(inline_module_def_args)\n                    #print(inline_module_call_args)\n                    for i in range(len(inline_module_def_args)):\n                        def_arg = inline_module_def_args[i]\n                        call_arg = inline_module_call_args[i]\n                        inline_module_arg_map[def_arg] = call_arg\n                    # Replace the module ids and fifos from the upper module\n                    for def_arg in inline_module_arg_map:\n                        call_arg = inline_module_arg_map[def_arg]\n                        if call_arg in arg_map:\n                            inline_module_arg_map[def_arg] = arg_map[call_arg]\n                    print_module_def(\n                        f,\n                        inline_module_arg_map,\n                        inline_module_def.copy(),\n                        None,\n                        inline_module_def_args,\n                        inline_module_call_args_type)\n                    # Replace the inline module call with the new inline module\n                    # name\n                    line_indent = line.find(inline_module)\n                    line = ' ' * line_indent + inline_module\n                    for i in range(len(def_args)):\n                        def_arg = def_args[i]\n                        arg_type = call_args_type[i]\n                        if arg_type == 'module id':\n                            line += '_'\n                            line += arg_map[def_arg]\n                    line += '(\\n'\n                    module_def[line_id] = line\n\n    # Extract module ids and fifos from def_args\n    module_id_args = []\n    fifo_args = []\n    # print(def_args)\n    # print(call_args_type)\n    for i in range(len(def_args)):\n        def_arg = def_args[i]\n        arg_type = call_args_type[i]\n        if arg_type == 'module id':\n            module_id_args.append(def_arg)\n        if arg_type 
== 'fifo':\n            fifo_args.append(def_arg)\n\n    # Start printing\n    print_content = []\n    print_content.append('/* Module Definition */\\n')\n    line_id = 0\n    for line in module_def:\n        if line.find('void') != -1:\n            # This line is kernel argument.\n            # All module id and fifo arguments are deleted\n            m = re.search(r'(.+?)\\(', line)\n            if m:\n                prefix = m.group(1)\n            arg_start_pos = line.find('(')\n            arg_end_pos = line.rfind(')')\n            def_args = line[arg_start_pos + 1 : arg_end_pos]\n            #m = re.search(r'\\((.+?)\\)', line)\n            #if m:\n            #    def_args = m.group(1)\n            def_args = def_args.split(', ')\n            new_def_args = []\n            for i in range(len(def_args)):\n                if call_args_type[i] != 'module id' and call_args_type[i] != 'fifo':\n                    new_def_args.append(def_args[i])\n            # f.write(prefix + '(')\n            # Print the module_name\n            print_content.append(prefix)\n            for module_id in module_id_args:\n                print_content.append('_' + arg_map[module_id])\n            print_content.append('(')\n            first = True\n            for arg in new_def_args:\n                if not first:\n                    print_content.append(', ')\n                print_content.append(arg)\n                first = False\n            #print_content.append(')\\n')\n            print_content.append(line[arg_end_pos:])\n        else:\n            # module ids\n            for module_id in module_id_args:\n                if line.find(module_id) != -1:\n                    # Test if it is inside an argument list\n                    m = re.search(\n                        r'/\\* module id \\*/ ' +\n                        re.escape(module_id),\n                        line)\n                    if m:\n                        # Delete if from the argument list\n         
               delete_arg_from_arg_list(\n                            line, module_id, print_content)\n                        line = None\n                        break\n                    else:\n                        # Plug in module ids\n                        line = re.sub(\n                            r'([^a-zA-Z_])(' +\n                            re.escape(module_id) +\n                            r')([^a-zA-Z0-9_])',\n                            r'\\g<1>' +\n                            re.escape(\n                                arg_map[module_id]) +\n                            r'\\g<3>',\n                            line)\n            # fifos\n            if line:\n                for fifo in fifo_args:\n                    if line.find(fifo) != -1:\n                        # Test if it is inside a read/write API call\n                        if line.find('read_channel_intel') != - \\\n                                1 or line.find('write_channel_intel') != -1:\n                            # Plug in fifos\n                            line = re.sub(\n                                r'([^a-zA-Z_])(' +\n                                re.escape(fifo) +\n                                r')([^a-zA-Z0-9_])',\n                                r'\\g<1>' +\n                                re.escape(\n                                    arg_map[fifo]) +\n                                r'\\g<3>',\n                                line)\n                        else:\n                            # Test if it is inside an argument list\n                            m = re.search(\n                                r'/\\* fifo \\*/ ' + re.escape(fifo), line)\n                            if m:\n                                # Delete it from the argument list\n                                delete_arg_from_arg_list(\n                                    line, fifo, print_content)\n                                line = None\n                                break\n      
      if line is not None:\n                print_content.append(line)\n        line_id += 1\n    print_content.append('/* Module Definition */\\n\\n')\n\n    f.writelines(print_content)\n\n\ndef generate_intel_kernel(\n        kernel,\n        headers,\n        module_defs,\n        module_calls,\n        fifo_decls):\n    \"\"\" Generate the final Intel code\n\n    This function plugs in the module definitions into each module call and replace\n    index ids and fifo arguments.\n\n    Parameters\n    ----------\n    kernel:\n        the output file\n    headers:\n        list containing the headers to be printed\n    module_defs:\n        dict containing the module definitions\n    module_calls:\n        list containing the module calls\n    fifo_decls:\n        list containing the fifo declarations\n    \"\"\"\n    inline_module_defs = {}\n    with open(kernel, 'w') as f:\n        # Print out headers\n        for header in headers:\n            f.write(header)\n        f.write('\\n')\n\n        f.write('#pragma OPENCL EXTENSION cl_intel_channels : enable\\n\\n')\n\n        # Print out channels\n        f.write('/* Channel Declaration */\\n')\n        for fifo_decl in fifo_decls:\n            f.write(fifo_decl + '\\n')\n        f.write('/* Channel Declaration */\\n\\n')\n\n        # Extract the inline modules\n        # These modules are those that exist in the module_defs but not in the\n        # module_calls.\n        for module_name in module_defs:\n            inline_module = 1\n            for module_call in module_calls:\n                line = module_call[0]\n                m = re.search(r'(.+?)\\(', line)\n                if m:\n                    cur_module_name = m.group(1)\n                if module_name == cur_module_name:\n                    inline_module = 0\n                    break\n            if inline_module:\n                inline_module_defs[module_name] = module_defs[module_name]\n\n        # print out module definitions\n        for 
module_call in module_calls:\n            # f.write('/* Module Definition */\\n')\n            def_args = []\n            call_args = []\n            call_args_type = []\n            arg_map = {}\n            # Extract the module name\n            line = module_call[0]\n            m = re.search(r'(.+?)\\(', line)\n            if m:\n                module_name = m.group(1)\n            module_def = module_defs[module_name]\n            # extract the arg list in module definition\n            for line in module_def:\n                if line.find('void') != -1:\n                    arg_start_pos = line.find('(')\n                    arg_end_pos = line.rfind(')')\n                    def_args_old = line[arg_start_pos + 1 : arg_end_pos]\n                    #m = re.search(r'\\((.+?)\\)', line)\n                    #if m:\n                    #    def_args_old = m.group(1)\n            def_args_old = def_args_old.split(', ')\n            for arg in def_args_old:\n                arg = arg.split()[-1]\n                def_args.append(arg)\n\n            # extract the arg list in module call\n            for line in module_call:\n                m = re.search(r'/\\*(.+?)\\*/', line)\n                if m:\n                    arg_type = m.group(1).strip()\n                    call_args_type.append(arg_type)\n                    n = re.search(r'\\*/ (.+)', line)\n                    if n:\n                        call_arg = n.group(1).strip(',')\n                        call_args.append(call_arg)\n\n            # build a mapping between the def_arg to call_arg\n            for i in range(len(def_args)):\n                call_arg_type = call_args_type[i]\n                if call_arg_type == 'module id' or call_arg_type == 'fifo':\n                    def_arg = def_args[i]\n                    call_arg = call_args[i]\n                    arg_map[def_arg] = call_arg\n\n            # print out the module definition with call args plugged in\n            print_module_def(\n    
            f,\n                arg_map,\n                module_def.copy(),\n                inline_module_defs,\n                def_args,\n                call_args_type)\n            # f.write('/* Module Definition */\\n\\n')\n\ndef contains_pipeline_for(pos, lines):\n    \"\"\" Examine if there is any for loop with hls_pipeline annotation inside the current for loop\n\n    \"\"\"\n    n_l_bracket = 0\n    n_r_bracket = 0\n    code_len = len(lines)\n    init_state = 1\n    while pos < code_len and n_r_bracket <= n_l_bracket:\n        if lines[pos].find('{') != -1:\n            n_l_bracket += 1\n        if lines[pos].find('}') != -1:\n            n_r_bracket += 1\n        if lines[pos].find('for') != -1:\n            if init_state:\n                init_state = 0\n            else:\n                if lines[pos + 1].find('hls_pipeline') != -1:\n                    return 1\n        if n_l_bracket == n_r_bracket and not init_state:\n            break\n        pos += 1\n    return 0\n\n\ndef insert_xlnx_pragmas(lines):\n    \"\"\" Insert HLS pragmas for Xilinx program\n\n    Replace the comments of \"// hls_pipeline\" and \"// hls_unroll\" with\n    HLS pragmas\n    For \"// hls pipeline\", find the previous for loop before hitting any \"}\".\n    Insert \"#pragma HLS PIPELINE II=1\" below the for loop.\n    For \"// hls unroll\", find the previous for loop before hitting the \"simd\" mark.\n    Insert \"#pragma HLS UNROLL\" below the for loop.\n    For \"// hls_dependence.x\", the position is the same with hls_pipeline.\n    Insert \"#pragma HLS DEPENDENCE variable=x inter false\".\n\n    Parameters\n    ----------\n    lines:\n        contains the codelines of the program\n    \"\"\"\n    # Handle hls_dependence\n    handle_dep_pragma = 1\n\n    code_len = len(lines)\n    pos = 0\n    while pos < code_len:\n        line = lines[pos]\n        if line.find(\"// hls_pipeline\") != - \\\n                1 or line.find(\"// hls_dependence\") != -1:\n            
is_pipeline = 0\n            is_dep = 0\n            if line.find('// hls_pipeline') != -1:\n                is_pipeline = 1\n            else:\n                is_dep = 1\n            # Find if there is any other hls_pipeline/hls_dependence annotation\n            # below\n            n_l_bracket = 0\n            n_r_bracket = 0\n            next_pos = pos + 1\n            find_pipeline = 0\n            init_state = 1\n            while next_pos < code_len and n_r_bracket <= n_l_bracket:\n                if is_pipeline and lines[next_pos].find('hls_pipeline') != -1:\n                    find_pipeline = 1\n                    break\n                if is_dep and lines[next_pos].find(\n                        'hls_dependence') != -1 and handle_dep_pragma:\n                    find_pipeline = 1\n                    break\n                if lines[next_pos].find('{') != -1:\n                    n_l_bracket += 1\n                    init_state = 0\n                if lines[next_pos].find('}') != -1:\n                    n_r_bracket += 1\n                if n_l_bracket == n_r_bracket and not init_state:\n                    break\n                next_pos += 1\n            if find_pipeline:\n                pos += 1\n                continue\n\n            # Find the for loop above before hitting any \"}\"\n            prev_pos = pos - 1\n            find_for = 0\n            n_l_bracket = 0\n            n_r_bracket = 0\n            while prev_pos >= 0:\n                if lines[prev_pos].find('while') != -1:\n                    break\n                if lines[prev_pos].find('{') != -1:\n                    n_l_bracket += 1\n                if lines[prev_pos].find('}') != -1:\n                    n_r_bracket += 1\n                if lines[prev_pos].find('for') != -1:\n                    if n_l_bracket > n_r_bracket:\n                        # check if the pragma is already inserted\n                        if is_pipeline and lines[prev_pos +\n                          
                       1].find('#pragma HLS PIPELINE II=1\\n') == -1:\n                            find_for = 1\n                        if is_dep and lines[prev_pos + 2].find(\n                                '#pragma HLS DEPENDENCE') == -1 and handle_dep_pragma:\n                            find_for = 1\n                        # check if there is any other for loop with\n                        # hls_pipeline annotation inside\n                        if contains_pipeline_for(prev_pos, lines):\n                            find_for = 0\n                        break\n                prev_pos -= 1\n            if find_for == 1:\n                # insert the pragma right after the for loop\n                indent = lines[prev_pos].find('for')\n                if line.find(\"hls_pipeline\") != -1:\n                    new_line = ' ' * indent + \"#pragma HLS PIPELINE II=1\\n\"\n                else:\n                    line_cp = line\n                    var_name = line_cp.strip().split('.')[-1]\n                    new_line = ' ' * indent + \"#pragma HLS DEPENDENCE variable=\" + \\\n                        var_name + \" inter false\\n\"\n                lines.insert(prev_pos + 1, new_line)\n                del lines[pos + 1]\n        elif line.find(\"// hls_unroll\") != -1:\n            # Find the for loop above before hitting any \"simd\"\n            prev_pos = pos - 1\n            find_for = 0\n            while prev_pos >= 0 and lines[prev_pos].find('simd') == -1:\n                if lines[prev_pos].find('for') != -1:\n                    find_for = 1\n                    break\n                prev_pos -= 1\n            if find_for == 1:\n                # insert the pragma right after the for loop\n                indent = lines[prev_pos].find('for')\n                new_line = ' ' * indent + \"#pragma HLS UNROLL\\n\"\n                lines.insert(prev_pos + 1, new_line)\n                del lines[pos + 1]\n        pos = pos + 1\n\n    return lines\n\ndef 
insert_catapult_pragmas(lines):\n    \"\"\" Insert Catapult HLS pragmas for Catapult program\n\n    Replace the comments of \"// hls_unroll\" with HLS pragmas    \n    For \"// hls unroll\", find the next for loop right below the mark.\n    Insert \"#pragma unroll yes\" before the for loop.    \n\n    Parameters\n    ----------\n    lines:\n        contains the codelines of the program\n    \"\"\"\n    # Handle hls_dependence\n    handle_dep_pragma = 1\n\n    code_len = len(lines)\n    pos = 0\n    while pos < code_len:\n        line = lines[pos]    \n        if line.find(\"// hls_unroll\") != -1:\n            # Find the for loop below\n            next_pos = pos + 1\n            find_for = 0\n            if lines[next_pos].find('for') != -1:                            \n                # insert the pragma right before the for loop\n                indent = lines[next_pos].find('for')\n                new_line = ' ' * indent + \"#pragma unroll yes\\n\"\n                lines.insert(next_pos, new_line)\n                del lines[pos]\n        pos = pos + 1\n\n    return lines\n\ndef float_to_int(matchobj):\n    str_expr = matchobj.group(0)\n    if float(str_expr) == int(float(str_expr)):\n        return str(int(float(str_expr)))\n    else:\n        return str_expr\n\n\ndef index_simplify(matchobj):\n    str_expr = matchobj.group(0)\n    if str_expr == '[arb]' or str_expr == '[!arb]' or str_expr == '[index[n]':\n        return str_expr\n    if '++' in str_expr:\n        return str_expr\n    expr = sympy.sympify(str_expr[1: len(str_expr) - 1])\n    \"\"\"\n    This will sometimes cause bugs due to the different semantics in C\n    E.g., x = 9, (x+3)/4 != x/4+3/4.\n    We could use cxxcode, but it will generate floating expressions which are\n    expensive on FPGA.\n    At present, we check if there is floor or ceil in the expression.\n    If so, we abort and use the original expression. 
Otherwise, we replace it\n    with the simplified one.\n    \"\"\"\n    expr = sympy.simplify(expr)\n    new_str_expr = sympy.printing.ccode(expr)\n#  # We will try to replace floats with integers if values won't change\n#  new_str_expr = re.sub('\\d+\\.\\d+', float_to_int, new_str_expr)\n\n    if 'floor' in new_str_expr or 'ceil' in new_str_expr or '.0' in new_str_expr:\n        return str_expr\n    else:\n        return '[' + new_str_expr + ']'\n\n\ndef mod_simplify(matchobj):\n    str_expr = matchobj.group(0)\n    str_expr = str_expr[1: len(str_expr) - 3]\n    expr = sympy.sympify(str_expr)\n    expr = sympy.simplify(expr)\n    str_expr = str(expr)\n\n    return '(' + str_expr + ') %'\n\n\ndef simplify_expressions(lines):\n    \"\"\" Simplify the index expressions in the program\n\n    Use Sympy to simplify all the array index expressions in the program.\n\n    Parameters\n    ----------\n    lines:\n        contains the codelines of the program\n    \"\"\"\n    code_len = len(lines)\n    # Simplify array index expressions\n    for pos in range(code_len):\n        line = lines[pos]\n        line = re.sub(r'\\[(.+?)\\]', index_simplify, line)\n        lines[pos] = line\n\n    # Simplify mod expressions\n    for pos in range(code_len):\n        line = lines[pos]\n        line = re.sub(r'\\((.+?)\\) %', mod_simplify, line)\n        lines[pos] = line\n\n    return lines\n\ndef shrink_bit_width(lines, target):\n    \"\"\" Calculate the bitwidth of the iterator and shrink it to the proper size\n\n    We will examine the for loops. Examine the upper bound of the loop. If the\n    upper bound is a number, we will compute the bitwidth of the iterator.\n    For Intel target, we will also look for iterator definitions marked with\n    \"/* UB: [...] */\". 
The shallow bitwidth is calculated and replaces the previous\n    data type.\n\n    Parameters\n    ----------\n    lines:\n        contains the codelines of the program\n    target:\n        xilinx|intel|catapult\n    \"\"\"\n    code_len = len(lines)\n    for pos in range(code_len):\n        line = lines[pos]\n        if line.find('for') != -1:\n            # Parse the loop upper bound\n            m = re.search('<=(.+?);', line)\n            if m:\n                ub = m.group(1).strip()\n                if ub.isnumeric():\n                    # Replace it with shallow bit width\n                    bitwidth = int(np.ceil(np.log2(float(ub) + 1))) + 1\n                    if target == 'xilinx':\n                        new_iter_t = 'ap_uint<' + str(bitwidth) + '>'\n                    elif target == 'intel':\n                        new_iter_t = 'uint' + str(bitwidth) + '_t'\n                    elif target == 'catapult':\n                        new_iter_t = 'ac_int<' + str(bitwidth) + ', false>'\n                    line = re.sub('int', new_iter_t, line)\n                    lines[pos] = line\n            m = re.search('<(.+?);', line)\n            if m:\n                ub = m.group(1).strip()\n                if ub.isnumeric():\n                    #print(pos)\n                    # Replace it with shallow bit width\n                    bitwidth = int(np.ceil(np.log2(float(ub)))) + 1\n                    if target == 'xilinx':\n                        new_iter_t = 'ap_uint<' + str(bitwidth) + '>'\n                    elif target == 'intel':\n                        new_iter_t = 'uint' + str(bitwidth) + '_t'\n                    elif target == 'catapult':\n                        new_iter_t = 'ac_int<' + str(bitwidth) + ', false>'\n                    line = re.sub('int', new_iter_t, line)\n                    lines[pos] = line\n\n    for pos in range(code_len):\n        line = lines[pos]\n        m = re.search(r'/\\* UB: (.+?) 
\\*/', line)\n        if m:\n            ub = m.group(1).strip()\n            if ub.isnumeric():\n                # Replace it with shallow bit width\n                bitwidth = int(np.ceil(np.log2(float(ub) + 1))) + 1\n                if target == 'xilinx':\n                    new_iter_t = 'ap_uint<' + str(bitwidth) + '>'\n                elif target == 'intel':\n                    new_iter_t = 'uint' + str(bitwidth) + '_t'\n                elif target == 'catapult':\n                    new_iter_t = 'ac_int<' + str(bitwidth) + ', false>'\n                #line = re.sub('int', new_iter_t, line)\n                line = re.sub(\n                    r'(int)' +\n                    r'\\s' +\n                    r'([a-zA-Z])',\n                    new_iter_t +\n                    r' \\g<2>',\n                    line)\n                lines[pos] = line\n\n    return lines\n\n\ndef lift_split_buffers(lines):\n    \"\"\" Lift the split buffers in the program\n\n    For each module, if we find any split buffers with the name \"data_split\",\n    we will lift them out of the for loops and put them in the variable declaration\n    section at the beginning of the module.\n\n    Parameters\n    ----------\n    lines:\n        contains the codelines of the program\n    \"\"\"\n    code_len = len(lines)\n    for pos in range(code_len):\n        line = lines[pos]\n        if line.find('variable=data_split') != -1:\n            # Search for the variable declaration section\n            decl_pos = -1\n            prev_pos = pos - 1\n            while prev_pos >= 0:\n                prev_line = lines[prev_pos]\n                if prev_line.find('Variable Declaration') != -1:\n                    decl_pos = prev_pos\n                    break\n                prev_pos -= 1\n            # Move the two code lines at [pos - 1] and [pos] to [decl_pos] and\n            # [decl_pos + 1]\n            indent = lines[decl_pos].find('/*')\n            line1 = ' ' * indent + lines[pos - 
1].lstrip()\n            line2 = ' ' * indent + lines[pos].lstrip()\n            del lines[pos - 1]\n            del lines[pos - 1]\n            lines.insert(decl_pos, line1)\n            lines.insert(decl_pos + 1, line2)\n\n    return lines\n\ndef build_dummy_module_def(group_name, fifo_type, module_in, PE_ids):\n    \"\"\" Build the definition of the dummy module\n\n    Parameters\n    ----------\n    group_name: str\n    fifo_type: str\n    module_in: int\n    PE_ids: list\n    \"\"\"\n    dir_str = 'out' if module_in == 0 else 'in'\n    index_str = ['idx', 'idy', 'idz']\n    fifo_name = f'fifo_{group_name}_{dir_str}'\n\n    lines = []\n    lines.append('/* Module Definition */\\n')\n    lines.append(f'void {group_name}_PE_dummy_{dir_str}(')\n    for pos in range(len(PE_ids)):\n        lines.append(f'int {index_str[pos]}, ')\n    lines.append(f'hls::stream<{fifo_type}> &{fifo_name}){{\\n')\n    if module_in == 0:\n        lines.append(f'  if (!{fifo_name}.full())\\n')\n        lines.append(f'    {fifo_name}.write(0);\\n')\n    else:\n        lines.append(f'  {fifo_type} fifo_data = {fifo_name}.read();\\n')\n    lines.append(f'}}\\n')\n    lines.append(f'/* Module Definition */\\n')\n\n    return lines\n\ndef build_dummy_module_call(group_name, fifo_name, module_in, PE_ids):\n    \"\"\" Build the call of the dummy module\n\n    Parameters\n    ----------\n    group_name: str\n    fifo_name: str\n    module_in: int\n    PE_ids: list\n    \"\"\"\n    dir_str = 'out' if module_in == 0 else 'in'\n\n    lines = []\n    lines.append('\\n')\n    lines.append('  /* Module Call */\\n')\n    lines.append(f'  {group_name}_PE_dummy_{dir_str}(\\n')\n    for id in PE_ids:\n        lines.append(f'    /* module id */ {id},\\n')\n    lines.append(f'    /* fifo */ {fifo_name}\\n')\n    lines.append(f'  );\\n')\n    lines.append(f'  /* Module Call */\\n')\n\n    return lines\n\ndef insert_dummy_modules(def_lines, call_lines):\n    \"\"\" Insert the missing dummy modules\n\n    
Collect the FIFO information of PEs (fifo_name, fifo_type).\n    Delete those FIFOs that are connected to other modules.\n    Insert dummy modules for the rest of FIFOs.\n\n    Parameters\n    ----------\n    def_lines: list\n        Contains the codelines of the module definitions\n    call_lines: list\n        Contains the codelines of the module calls\n    \"\"\"\n    PE_fifos = []\n    for line in def_lines:\n        if line.find('void PE_wrapper') != -1:\n            # Parse the argument list\n            m = re.search(r'\\((.+?)\\)', line)\n            args = m.group(1).strip().split(',')\n            for arg in args:\n                if arg.find('fifo') != -1:\n                    m = re.search(r'stream<(.+?)>', arg)\n                    fifo_type = m.group(1)\n                    fifo_name = arg.split('&')[-1]\n                    PE_fifos.append({'type': fifo_type, 'name': fifo_name})\n    #print(PE_fifos)\n    # Collect all used fifos\n    used_fifos = {}\n    kernel_start = 0\n    for line in call_lines:\n        if line.find('void kernel0') != -1:\n            kernel_start = 1\n        if kernel_start:\n            if line.find('* fifo *') != -1:\n                fifo = line.strip().split('*')[2][2:]\n                if fifo[-1] == ',':\n                    fifo = fifo[:-1]\n                # Only process PE level fifos\n                if fifo.find('PE') == -1:\n                    continue\n                if fifo not in used_fifos:\n                    used_fifos[fifo] = -1\n                else:\n                    del used_fifos[fifo]\n    #print(used_fifos)\n    # Locate the fifo position\n    inside_module = False\n    inside_PE = False\n    fifo_pos = 0\n    PE_call_start = -1\n    PE_call_end = -1\n    line_id = 0\n    for line in call_lines:\n        if line.find('Module Call') != -1:\n            inside_module = not inside_module\n            if inside_PE:\n                PE_call_end = line_id\n            inside_PE = False\n        if 
inside_module:\n            if line.find('PE_wrapper') != -1:\n                inside_PE = True\n                fifo_pos = 0\n                if PE_call_start == -1:\n                    PE_call_start = line_id - 1\n            if inside_PE:\n                if line.find('fifo') != -1:\n                    for used_fifo in used_fifos:\n                        if line.find(used_fifo) != -1:\n                            used_fifos[used_fifo] = fifo_pos\n                    fifo_pos += 1\n        line_id += 1\n    #print(used_fifos)\n    # Insert the dummy module definitions\n    offset_line = 0\n    for used_fifo in used_fifos:\n        fifo_info = PE_fifos[used_fifos[used_fifo]]\n        # Extract the module direction\n        if fifo_info['name'].endswith('in'):\n            module_in = 0\n        else:\n            module_in = 1\n        # Extract the group name\n        if fifo_info['name'].endswith('in'):\n            group_name = fifo_info['name'][5:-3]\n        else:\n            group_name = fifo_info['name'][5:-4]\n        # Extract the PE ids\n        PE_ids = used_fifo[len(f'fifo_{group_name}_PE_'):].split('_')\n        #print(used_fifo, module_in, group_name, PE_ids)\n\n        # Build the dummy module definition\n        module_def = build_dummy_module_def(group_name, fifo_info['type'], module_in, PE_ids)\n        #print(module_def)\n        def_lines += module_def\n        def_lines.append('\\n')\n\n        # Build the dummy module call\n        module_call = build_dummy_module_call(group_name, used_fifo, module_in, PE_ids) # TODO\n        if module_in == 0:\n            for i in range(len(module_call)):\n                call_lines.insert(PE_call_start - 1 + i, module_call[i])\n            offset_line += len(module_call)\n        else:\n            for i in range(len(module_call)):\n                call_lines.insert(PE_call_end + 1 + offset_line + i, module_call[i])\n\n    #print(PE_call_start, PE_call_end)\n\n    return def_lines, call_lines\n\ndef 
modify_tb(lines):\n    \"\"\" Modify the test bench for Catapult HLS\n\n    Replace the int main with CCS_MAIN.\n\n    Parameters\n    ----------\n    lines: list\n        contains the codelines of the test bench\n    \"\"\"\n    for pos in range(len(lines)):\n        line = lines[pos]\n        if line.find('int main') != -1:\n            line = line.replace('int main', 'CCS_MAIN')\n        lines[pos] = line\n    return lines\n\ndef reorder_module_calls(lines, target):\n    \"\"\" Reorder the module calls in the program\n\n    For I/O module calls, we will reverse the sequence of calls for output modules.\n    Starting from the first module, collect the module calls until the boundary module\n    is met.\n    Reverse the list and write it back in place of the original calls.\n\n    Parameters\n    ----------\n    lines: list\n        contains the codelines of the program\n    target: string\n        xilinx|intel|catapult\n    \"\"\"\n\n    code_len = len(lines)\n    module_calls = []\n    module_start = 0\n    module_call = []\n    output_io = 0\n    boundary = 0\n    new_module = 0\n    prev_module_name = \"\"\n    first_line = -1\n    last_line = -1\n    reset = 0\n\n    for pos in range(code_len):\n        line = lines[pos]\n        if line.find(\"/* Module Call */\") != -1:\n            if module_start == 0:\n                module_start = 1\n            else:\n                module_start = 0\n\n            if module_start:\n                # Check whether the module is an output I/O module\n                nxt_line = lines[pos + 1]\n                if nxt_line.find(\"IO\") != -1 and nxt_line.find(\"out\") != -1:\n                    output_io = 1\n                    # Check whether the module is a boundary module\n                    if nxt_line.find(\"boundary\") != -1:\n                        boundary = 1\n                # Extract the module name\n                nxt_line = nxt_line.strip()\n                if nxt_line.find('<') != -1:\n                    module_name = 
nxt_line.split('<')[0]\n                else:\n                    module_name = nxt_line.split('(')[0]\n                if target == 'catapult':\n                    module_name = module_name[:module_name.find('_inst')]\n\n                # Strip the \"_wrapper\" suffix, if present\n                if module_name.find('wrapper') != -1:\n                    module_name = module_name[:-8]\n                # Strip the \"_boundary\" suffix\n                if boundary:\n                    module_name = module_name[:-9]\n                if prev_module_name == \"\":\n                    prev_module_name = module_name\n                    first_line = pos\n                else:\n                    if prev_module_name != module_name:\n                        new_module = 1\n                        prev_module_name = module_name\n                        first_line = pos\n                        reset = 0\n                    else:\n                        if reset:\n                            first_line = pos\n                            reset = 0\n                        new_module = 0\n\n            if not module_start:\n                if output_io:\n                    last_line = pos\n                    module_call.append(line)\n                    module_calls.append(module_call.copy())\n                    module_call.clear()\n                    if boundary:\n                        # Pop out the previous module calls except the last one\n                        if new_module:\n                            module_calls = module_calls[-1:]\n                        # Reverse the list\n                        module_calls.reverse()\n                        # Insert it back\n                        left_lines = lines[last_line + 1:]\n                        lines = lines[:first_line]\n                        first = 1\n                        for c in module_calls:\n                            if not first:\n                                lines.append(\"\\n\")\n                            lines = lines + c\n                            first = 0\n               
         lines = lines + left_lines\n                        # Clean up\n                        module_calls.clear()\n                        boundary = 0\n                        output_io = 0\n                        reset = 1\n                    if new_module:\n                        # Pop out the previous module calls except the last one\n                        module_calls = module_calls[-1:]\n\n\n        if module_start and output_io:\n            module_call.append(line)\n\n    return lines\n\ndef xilinx_run(\n        kernel_call,\n        kernel_def,\n        kernel='autosa.tmp/output/src/kernel_kernel.cpp',\n        host='opencl',\n        hcl=False):\n    \"\"\" Generate the kernel file for Xilinx platform\n\n    We will copy the content of kernel definitions before the kernel calls.\n\n    Parameters\n    ----------\n    kernel_call:\n        file containing kernel calls\n    kernel_def:\n        file containing kernel definitions\n    kernel:\n        output kernel file\n    hcl:\n        integrated with HeteroCL\n    \"\"\"\n\n    # Load kernel definition file\n    lines = []\n    with open(kernel_def, 'r') as f:\n        lines = f.readlines()\n    call_lines = []\n    with open(kernel_call, 'r') as f:\n        call_lines = f.readlines()\n\n    # Simplify the expressions\n    lines = simplify_expressions(lines)\n\n    # Change the loop iterator type\n    lines = shrink_bit_width(lines, 'xilinx')\n\n    # Insert the HLS pragmas\n    lines = insert_xlnx_pragmas(lines)\n\n    # Lift the split_buffers\n    lines = lift_split_buffers(lines)\n\n    ## Insert missing dummy modules\n    #lines, call_lines = insert_dummy_modules(lines, call_lines)\n\n    kernel = str(kernel)\n    print(\"Please find the generated file: \" + kernel)\n\n    with open(kernel, 'w') as f:\n        if host == 'opencl' or hcl == True:\n            # Merge kernel header file\n            kernel_header = kernel.split('.')\n            kernel_header[-1] = 'h'\n            
kernel_header = \".\".join(kernel_header)\n            with open(kernel_header, 'r') as f2:\n                header_lines = f2.readlines()\n                f.writelines(header_lines)\n            f.write('\\n')\n\n        f.writelines(lines)\n\n        # Reorder module calls\n        call_lines = reorder_module_calls(call_lines, 'xilinx')\n        f.writelines(call_lines)\n\n        ## Load kernel call file\n        #with open(kernel_call, 'r') as f2:\n        #    lines = f2.readlines()\n        #    # Reorder module calls\n        #    lines = reorder_module_calls(lines)\n        #    f.writelines(lines)\n\n\ndef catapult_run(\n        kernel_call,\n        kernel_def,\n        tb,\n        kernel='autosa.tmp/output/src/kernel_kernel_hw.h',\n        host='opencl'):\n    \"\"\" Generate the kernel file for Catapult HLS platform\n\n    We will copy the content of kernel definitions before the kernel calls.\n\n    Parameters\n    ----------\n    kernel_call:\n        file containing kernel calls\n    kernel_def:\n        file containing kernel definitions\n    tb: \n        file containing test bench\n    kernel:\n        output kernel file\n    \"\"\"\n\n    # Load kernel definition file\n    lines = []\n    with open(kernel_def, 'r') as f:\n        lines = f.readlines()\n    call_lines = []\n    with open(kernel_call, 'r') as f:\n        call_lines = f.readlines()\n\n    # Simplify the expressions\n    lines = simplify_expressions(lines)\n\n    # Change the loop iterator type\n    lines = shrink_bit_width(lines, 'catapult')\n\n    # Insert the HLS pragmas\n    lines = insert_catapult_pragmas(lines)\n\n    # Lift the split_buffers\n    lines = lift_split_buffers(lines)    \n\n    kernel = str(kernel)\n    print(\"Please find the generated file: \" + kernel)\n\n    with open(kernel, 'w') as f:\n        #if host == 'opencl':\n        #    # Merge kernel header file\n        #    kernel_header = kernel.split('.')\n        #    kernel_header[-1] = 'h'\n        #    
kernel_header = \".\".join(kernel_header)\n        #    with open(kernel_header, 'r') as f2:\n        #        header_lines = f2.readlines()\n        #        f.writelines(header_lines)\n        #    f.write('\\n')\n\n        f.writelines(lines)\n\n        # Reorder module calls\n        call_lines = reorder_module_calls(call_lines, 'catapult')\n\n        f.writelines(call_lines)\n\n    # Modify the test bench\n    with open(tb, 'r') as f:\n        tb_lines = f.readlines()\n    tb_lines = modify_tb(tb_lines)\n    with open(tb, 'w') as f:\n        f.writelines(tb_lines)\n\ndef insert_intel_pragmas(lines):\n    \"\"\" Insert Intel OpenCL pragmas into the Intel program\n\n    Replace the comments of \"// hls_unroll\" with OpenCL pragmas.\n    For \"// hls_unroll\", find the previous for loop before hitting the \"simd\" mark.\n    Insert \"#pragma unroll\" above the for loop.\n    Replace the comments of \"// hls_coalesce\" with OpenCL pragma \"#pragma loop_coalesce\"\n    (currently disabled).\n\n    Parameters\n    ----------\n    lines:\n        contains the codelines of the program\n    \"\"\"\n    code_len = len(lines)\n    pos = 0\n    while pos < code_len:\n        line = lines[pos]\n        if line.find('// hls_unroll') != -1:\n            # Find the for loop above before hitting any \"simd\"\n            prev_pos = pos - 1\n            find_for = 0\n            while prev_pos >= 0 and lines[prev_pos].find('simd') == -1:\n                if lines[prev_pos].find('for') != -1:\n                    find_for = 1\n                    break\n                prev_pos -= 1\n            if find_for == 1:\n                # Insert the pragma right before the for loop\n                indent = lines[prev_pos].find('for')\n                new_line = ' ' * indent + \"#pragma unroll\\n\"\n                lines.insert(prev_pos, new_line)\n                del lines[pos + 1]\n#    if line.find('// hls_coalesce') != -1:\n#      indent = line.find('// hls_coalesce')\n#      new_line = ' ' * indent + 
\"#pragma loop_coalesce\\n\"\n#      del lines[pos]\n#      lines.insert(pos, new_line)\n        pos = pos + 1\n\n    return lines\n\n\ndef intel_run(\n        kernel_call,\n        kernel_def,\n        kernel='autosa.tmp/output/src/kernel_kernel.cpp',\n        hcl=False):\n    \"\"\" Generate the kernel file for Intel platform\n\n    We will extract all the fifo declarations and module calls.\n    Then plug in the module definitions into each module call.\n\n    Parameters\n    ----------\n    kernel_call:\n        file containing kernel calls\n    kernel_def:\n        file containing kernel definitions\n    kernel:\n        output kernel file\n    hcl:\n        integrated with HeteroCL\n    \"\"\"\n    # Load kernel call file\n    module_calls = []\n    fifo_decls = []\n    with open(kernel_call, 'r') as f:\n        add = False\n        while True:\n            line = f.readline()\n            if not line:\n                break\n            # Extract the fifo declaration and add to the list\n            if add:\n                line = line.strip()\n                fifo_decls.append(line)\n            if line.find('/* FIFO Declaration */') != -1:\n                if add:\n                    fifo_decls.pop(len(fifo_decls) - 1)\n                add = not add\n\n    with open(kernel_call, 'r') as f:\n        add = False\n        module_call = []\n        while True:\n            line = f.readline()\n            if not line:\n                break\n            # Extract the module call and add to the list\n            if add:\n                line = line.strip()\n                module_call.append(line)\n            if line.find('/* Module Call */') != -1:\n                if add:\n                    module_call.pop(len(module_call) - 1)\n                    module_calls.append(module_call.copy())\n                    module_call.clear()\n                add = not add\n\n    module_defs = {}\n    headers = []\n    #print(hcl)\n    with open(kernel_def, 'r') as f:\n 
       while True:\n            line = f.readline()\n            if not line:\n                break\n            if line.find('#include') != -1:\n                #line = line.strip()\n                if hcl == True and line.find('_kernel.h') != -1:\n                    # Replace the header include with header contents\n                    #print(line)\n                    file_name = re.search(r'include \\\"(.+?)\\\"', line).group(1)\n                    file_path = os.path.dirname(kernel) + '/' + file_name                    \n                    with open(file_path, 'r') as f2:\n                        header_lines = f2.readlines()\n                        headers += header_lines\n                else:\n                    headers.append(line)\n\n    with open(kernel_def, 'r') as f:\n        add = False\n        module_def = []\n        while True:\n            line = f.readline()\n            if not line:\n                break\n            # Extract the module definition and add to the dict\n            if add:\n                module_def.append(line)\n                # Extract the module name\n                if (line.find('void')) != -1:\n                    m = re.search(r'void (.+?)\\(', line)\n                    if m:\n                        module_name = m.group(1)\n                        #print(module_name)\n            if line.find('/* Module Definition */') != -1:\n                if add:\n                    module_def.pop(len(module_def) - 1)\n                    module_defs[module_name] = module_def.copy()\n                    module_def.clear()\n                    # Post-process the module definition\n                    # Simplify the expressions\n                    module_defs[module_name] = simplify_expressions(\n                        module_defs[module_name])\n                    # Insert the OpenCL pragmas\n                    module_defs[module_name] = insert_intel_pragmas(\n                        module_defs[module_name])\n          
          # Change the loop iterator type\n                    module_defs[module_name] = shrink_bit_width(\n                        module_defs[module_name], 'intel')\n                add = not add\n\n    # compose the kernel file\n    kernel = str(kernel)\n    generate_intel_kernel(\n        kernel,\n        headers,\n        module_defs,\n        module_calls,\n        fifo_decls)\n\n\ndef tapa_run(\n        kernel_call,\n        kernel_def,\n        kernel='autosa.tmp/output/src/kernel_kernel.cpp'):\n    \"\"\" Generate the kernel file for TAPA platform\n\n    We will copy the content of kernel definitions before the kernel calls.\n\n    Parameters\n    ----------\n    kernel_call:\n        file containing kernel calls\n    kernel_def:\n        file containing kernel definitions\n    \"\"\"\n\n    # Load kernel definition file\n    lines = []\n    with open(kernel_def, 'r') as f:\n        lines = f.readlines()\n    call_lines = []\n    with open(kernel_call, 'r') as f:\n        call_lines = f.readlines()\n\n    # Simplify the expressions\n    lines = simplify_expressions(lines)\n\n    # Change the loop iterator type\n    lines = shrink_bit_width(lines, 'xilinx')\n\n    # Insert the HLS pragmas\n    lines = insert_xlnx_pragmas(lines)\n\n    # Lift the split_buffers\n    lines = lift_split_buffers(lines)\n\n    kernel = str(kernel)\n    print(\"Please find the generated file: \" + kernel)\n\n    with open(kernel, 'w') as f:\n        f.writelines(lines)\n        f.writelines(call_lines)\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser(description='==== AutoSA CodeGen ====')\n    parser.add_argument(\n        '-c',\n        '--kernel-call',\n        metavar='KERNEL_CALL',\n        required=True,\n        help='kernel function call')\n    parser.add_argument(\n        '-d',\n        '--kernel-def',\n        metavar='KERNEL_DEF',\n        required=True,\n        help='kernel function definition')\n    parser.add_argument(\n        '--tb',\n       
 metavar='TB',\n        required=False,\n        help='test bench (used by autosa_catapult_c)')\n    parser.add_argument(\n        '-t',\n        '--target',\n        metavar='TARGET',\n        required=True,\n        help='hardware target: autosa_hls_c|autosa_opencl|autosa_tapa|autosa_catapult_c')\n    parser.add_argument(\n        '-o',\n        '--output',\n        metavar='OUTPUT',\n        required=False,\n        help='output kernel file')\n    parser.add_argument(\n        '--host',\n        metavar='HOST',\n        required=False,\n        help='Xilinx host target: hls|opencl',\n        default='opencl')\n    parser.add_argument(\n        '--hcl',\n        action='store_true',\n        default=False,\n        help='HeteroCL integration')\n\n    args = parser.parse_args()\n\n    if args.target == 'autosa_opencl':\n        intel_run(args.kernel_call, args.kernel_def, args.output, args.hcl)\n    elif args.target == 'autosa_hls_c':\n        xilinx_run(args.kernel_call, args.kernel_def, args.output, args.host, args.hcl)\n    elif args.target == 'autosa_tapa':\n        tapa_run(args.kernel_call, args.kernel_def, args.output)\n    elif args.target == 'autosa_catapult_c':\n        catapult_run(args.kernel_call, args.kernel_def, args.tb, args.output, args.host)\n"
  },
  {
    "path": "autosa_scripts/hls_scripts/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design \n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_scripts/hls_scripts/hls_script_synth.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\n#csim_design\ncsynth_design\n#cosim_design \n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_scripts/intel_opencl_scripts/Makefile",
    "content": "APP ?= kernel\nAOCL_BOARD ?= s10mx_hbm_es\nSW_EMU_AOCX ?= $(APP)_sw_emu.aocx\nHW_EMU_AOCX ?= $(APP)_hw_emu.aocx\nHW_AOCX ?= $(APP)_hw.aocx\nAOCO ?= $(APP).aoco\nAOCR ?= $(APP).aocr\n\n# Compiler\nAOC ?= aoc\nCXX ?= g++\nAOC_FLAGS ?= -board=$(AOCL_BOARD) -fp-relaxed -report -hyper-optimized-handshaking=off -I $(INTELFPGAOCLSDKROOT)/include/kernel_headers\n\nTARGET ?= host\nSW_EMU_TARGET ?= host_sw_emu\nTARGET_DIR ?= bin\nAOCL_UTILS ?= $(INTELFPGAOCLSDKROOT)/examples_aoc/common\n\n# Directories\nINC_DIRS := src $(AOCL_UTILS)/inc\nLIB_DIRS := \n\n# Files\nINCS := $(wildcard src/*.h)\nHOST_SRCS := $(wildcard src/$(APP)_host.cpp $(AOCL_UTILS)/src/AOCLUtils/*.cpp)\nKERNEL_SRCS := src/$(APP)_kernel.cl\n\nifeq ($(VERBOSE),1)\nECHO := \nelse\nECHO := @\nendif\n\n# Where is the Intel(R) FPGA SDK for OpenCL(TM) software?\nifeq ($(wildcard $(INTELFPGAOCLSDKROOT)),)\n$(error Set INTELFPGAOCLSDKROOT to the root directory of the Intel(R) FPGA SDK for OpenCL(TM) software installation)\nendif\nifeq ($(wildcard $(INTELFPGAOCLSDKROOT)/host/include/CL/opencl.h),)\n$(error Set INTELFPGAOCLSDKROOT to the root directory of the Intel(R) FPGA SDK for OpenCL(TM) software installation.)\nendif\n\n# OpenCL compile and link flags.\nAOCL_COMPILE_CONFIG := $(shell aocl compile-config )\nAOCL_LINK_LIBS := $(shell aocl ldlibs )\nAOCL_LINK_FLAGS := $(shell aocl ldflags )\n# Linking with defences enabled\nAOCL_LINK_FLAGS += -z noexecstack\nAOCL_LINK_FLAGS += -Wl,-z,relro,-z,now\nAOCL_LINK_FLAGS += -Wl,-Bsymbolic\nAOCL_LINK_FLAGS += -pie\nAOCL_LINK_CONFIG := $(AOCL_LINK_FLAGS) $(AOCL_LINK_LIBS)\n\n# Compilation flags\nifeq ($(DEBUG),1)\nCXXFLAGS += -g\nelse\nCXXFLAGS += -O2\nendif\nCXXFLAGS += -std=gnu++0x\n\n# Compiling with defences enabled\nCXXFLAGS += -fstack-protector\nCXXFLAGS += -D_FORTIFY_SOURCE=2\nCXXFLAGS += -Wformat -Wformat-security\nCXXFLAGS += -fPIE\n\n# We must force GCC to never assume that it can shove in its own\n# sse2/sse3 versions of strlen and strcmp because they 
will CRASH.\n# Very hard to debug!\nCXXFLAGS += -fPIC\n\nLIBS := rt pthread\n\n## Make it all!\n#all : $(TARGET_DIR)/$(TARGET)\n\nsw_emu : $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(SW_EMU_AOCX)\n\nhls: $(TARGET_DIR)/$(AOCR)\n\nhw : $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_AOCX)\n\nhw_emu: $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_EMU_AOCX)\n\nhw_emu_check: $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_EMU_AOCX)\n\tCL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 $(TARGET_DIR)/$(TARGET) $(HW_EMU_AOCX)\n\nsw_emu_check : $(TARGET_DIR)/$(SW_EMU_TARGET) $(TARGET_DIR)/$(SW_EMU_AOCX)\n\tCL_CONTEXT_EMULATOR_DEVICE_INTELFPGA=1 $(TARGET_DIR)/$(SW_EMU_TARGET) $(SW_EMU_AOCX)\n\nhw_check : $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_AOCX)\n\t$(TARGET_DIR)/$(TARGET) $(HW_AOCX)\n\n# Host executable target.\n$(TARGET_DIR)/$(TARGET) : Makefile $(HOST_SRCS) $(INCS) $(TARGET_DIR)\n\t$(ECHO)$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(EXTRACXXFLAGS) -fPIC $(foreach D,$(INC_DIRS),-I$D) \\\n\t\t\t$(AOCL_COMPILE_CONFIG) $(HOST_SRCS) $(AOCL_LINK_CONFIG) \\\n\t\t\t$(foreach D,$(LIB_DIRS),-L$D) \\\n\t\t\t$(foreach L,$(LIBS),-l$L) \\\n\t\t\t-o $(TARGET_DIR)/$(TARGET)\n\n$(TARGET_DIR)/$(SW_EMU_TARGET) : Makefile $(HOST_SRCS) $(INCS) $(TARGET_DIR)\n\t$(ECHO)$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(EXTRACXXFLAGS) -fPIC $(foreach D,$(INC_DIRS),-I$D) \\\n\t\t\t$(AOCL_COMPILE_CONFIG) $(HOST_SRCS) $(AOCL_LINK_CONFIG) \\\n\t\t\t$(foreach D,$(LIB_DIRS),-L$D) \\\n\t\t\t$(foreach L,$(LIBS),-l$L) \\\n\t\t\t-o $(TARGET_DIR)/$(SW_EMU_TARGET) -DEMULATE\n\n$(TARGET_DIR) :\n\t$(ECHO)mkdir $(TARGET_DIR)\n\n$(TARGET_DIR)/$(SW_EMU_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -march=emulator -legacy-emulator -o $@ $^\n\n$(TARGET_DIR)/$(HW_EMU_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -march=simulator -ghdl -o $@ $^\n\n$(TARGET_DIR)/$(HW_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -o $@ $^\n\n$(TARGET_DIR)/$(AOCO) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -c -o $@ $^\n\n$(TARGET_DIR)/$(AOCR) : $(TARGET_DIR)/$(AOCO)\n\t$(AOC) $(AOC_FLAGS) -rtl -o $@ $^\n\n# 
Standard make targets\nclean :\n\t$(ECHO)rm -rf $(TARGET_DIR)/*\n\n.PHONY : all clean\n"
  },
  {
    "path": "autosa_scripts/intel_opencl_scripts/common/inc/AOCLUtils/aocl_utils.h",
    "content": "// Copyright (C) 2013-2020 Altera Corporation, San Jose, California, USA. All rights reserved.\n// Permission is hereby granted, free of charge, to any person obtaining a copy of this\n// software and associated documentation files (the \"Software\"), to deal in the Software\n// without restriction, including without limitation the rights to use, copy, modify, merge,\n// publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to\n// whom the Software is furnished to do so, subject to the following conditions:\n// The above copyright notice and this permission notice shall be included in all copies or\n// substantial portions of the Software.\n// \n// THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES\n// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\n// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\n// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\n// OTHER DEALINGS IN THE SOFTWARE.\n// \n// This agreement shall be governed in all respects by the laws of the State of California and\n// by the laws of the United States of America.\n\n// Main include file for AOCLUtils. Includes all other utility header files.\n\n#ifndef AOCL_UTILS_H\n#define AOCL_UTILS_H\n\n#include \"AOCLUtils/opencl.h\"\n#include \"AOCLUtils/scoped_ptrs.h\"\n#include \"AOCLUtils/options.h\"\n\n#endif\n\n"
  },
  {
    "path": "autosa_scripts/intel_opencl_scripts/common/inc/AOCLUtils/opencl.h",
    "content": "// Copyright (C) 2013-2020 Altera Corporation, San Jose, California, USA. All rights reserved.\n// Permission is hereby granted, free of charge, to any person obtaining a copy of this\n// software and associated documentation files (the \"Software\"), to deal in the Software\n// without restriction, including without limitation the rights to use, copy, modify, merge,\n// publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to\n// whom the Software is furnished to do so, subject to the following conditions:\n// The above copyright notice and this permission notice shall be included in all copies or\n// substantial portions of the Software.\n// \n// THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES\n// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\n// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\n// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\n// OTHER DEALINGS IN THE SOFTWARE.\n// \n// This agreement shall be governed in all respects by the laws of the State of California and\n// by the laws of the United States of America.\n\n// OpenCL utility functions.\n\n#ifndef AOCL_UTILS_OPENCL_H\n#define AOCL_UTILS_OPENCL_H\n\n#include <string.h>\n#include <stdio.h>\n#include <stdlib.h>\n#include <string>\n\n#include \"CL/opencl.h\"\n\n// This is assumed to be externally provided by the application.\nextern void cleanup();\n\nnamespace aocl_utils {\n\n// Host allocation functions\nvoid *alignedMalloc(size_t size);\nvoid alignedFree(void *ptr);\n\n// Error functions\nvoid printError(cl_int error);\nvoid _checkError(int line,\n                 const char *file,\n                 cl_int error,\n                 const char *msg,\n                 ...); // does not 
return\n#define checkError(status, ...) _checkError(__LINE__, __FILE__, status, __VA_ARGS__)\n\n// Sets the current working directory to the same directory that contains\n// this executable. Returns true on success.\nbool setCwdToExeDir();\n\n// Find a platform that contains the search string in its name (case-insensitive match).\n// Returns NULL if no match is found.\ncl_platform_id findPlatform(const char *platform_name_search);\n\n// Returns the name of the platform.\nstd::string getPlatformName(cl_platform_id pid);\n\n// Returns the name of the device.\nstd::string getDeviceName(cl_device_id did);\n\n// Returns an array of device ids for the given platform and the\n// device type.\n// Return value must be freed with delete[].\ncl_device_id *getDevices(cl_platform_id pid, cl_device_type dev_type, cl_uint *num_devices);\n\n// Create an OpenCL program from a binary file.\n// The program is created for all given devices associated with the context. The same\n// binary is used for all devices.\ncl_program createProgramFromBinary(cl_context context, const char *binary_file_name, const cl_device_id *devices, unsigned num_devices);\n\n// Load binary file.\n// Return value must be freed with delete[].\nunsigned char *loadBinaryFile(const char *file_name, size_t *size);\n\n// Checks if a file exists.\nbool fileExists(const char *file_name);\n\n// Returns the path to the AOCX file to use for the given device.\n// This is special handling for examples for the Intel(R) FPGA SDK for OpenCL(TM).\n// It uses the device name to get the board name and then looks for a\n// corresponding AOCX file. Specifically, it gets the device name and\n// extracts the board name assuming the device name has the following format:\n//  <board> : ...\n//\n// Then the AOCX file is <prefix>_<version>_<board>.aocx. 
If this\n// file does not exist, then the file name defaults to <prefix>.aocx.\nstd::string getBoardBinaryFile(const char *prefix, cl_device_id device);\n\n// Returns the time from a high-resolution timer in seconds. This value\n// can be used with a value returned previously to measure a high-resolution\n// time difference.\ndouble getCurrentTimestamp();\n\n// Returns the difference between the CL_PROFILING_COMMAND_END and\n// CL_PROFILING_COMMAND_START values of a cl_event object.\n// This requires that the command queue associated with the event be created\n// with the CL_QUEUE_PROFILING_ENABLE property.\n//\n// The return value is in nanoseconds.\ncl_ulong getStartEndTime(cl_event event);\n\n// Returns the maximum time span for the given set of events.\n// The time span starts at the earliest event start time.\n// The time span ends at the latest event end time.\ncl_ulong getStartEndTime(cl_event *events, unsigned num_events);\n\n// Wait for the specified number of milliseconds.\nvoid waitMilliseconds(unsigned ms);\n\n// OpenCL context callback function that simply prints the error information\n// to stdout (via printf).\nvoid oclContextCallback(const char *errinfo, const void *, size_t, void *);\n\n} // ns aocl_utils\n\n#endif\n\n"
  },
  {
    "path": "autosa_scripts/intel_opencl_scripts/common/inc/AOCLUtils/options.h",
    "content": "// Copyright (C) 2013-2020 Altera Corporation, San Jose, California, USA. All rights reserved.\n// Permission is hereby granted, free of charge, to any person obtaining a copy of this\n// software and associated documentation files (the \"Software\"), to deal in the Software\n// without restriction, including without limitation the rights to use, copy, modify, merge,\n// publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to\n// whom the Software is furnished to do so, subject to the following conditions:\n// The above copyright notice and this permission notice shall be included in all copies or\n// substantial portions of the Software.\n// \n// THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES\n// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\n// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\n// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\n// OTHER DEALINGS IN THE SOFTWARE.\n// \n// This agreement shall be governed in all respects by the laws of the State of California and\n// by the laws of the United States of America.\n\n// Declares a utility class used to parse command-line options.\n\n#ifndef AOCL_UTILS_OPTIONS_H\n#define AOCL_UTILS_OPTIONS_H\n\n#include <map>\n#include <sstream>\n#include <string>\n#include <vector>\n\nnamespace aocl_utils {\n\nclass Options {\npublic:\n  typedef std::vector<std::string> StringVec;\n\n  Options();\n  Options(int num, char *argv[]);\n\n  bool has(const std::string &name) const;\n  std::string &get(const std::string &name); // will create an empty option if it does not exist\n  const std::string &get(const std::string &name) const; // error if option does not exist\n\n  void set(const std::string &name, const 
std::string &value) { get(name) = value; }\n\n  // Command line options must be of the following form:\n  //  [-]-name (indicates option exists)\n  //  [-]-name=value\n  //\n  // This function assumes that the values are from main(int, char *[]).\n  // This means that argv[0] is skipped.\n  void addFromCommandLine(int num, char *argv[]);\n\n  // This templated function converts the option value to the given type.\n  // If the conversion fails, an error is reported and the program exits.\n  template<typename T>\n  T get(const std::string &name) const;\n\n  template<typename T>\n  void set(const std::string &name, const T &value);\n\n  // Non-options are arguments processed in addFromCommandLine\n  // that were not recognized as options.\n  const StringVec &getNonOptions() const { return m_nonoptions; }\n  size_t getNonOptionCount() const { return m_nonoptions.size(); }\n  const std::string &getNonOption(size_t i) const { return m_nonoptions[i]; }\n\nprivate:\n  typedef std::map<std::string, std::string> OptionMap;\n\n  // Displays an error message indicating that a nameless option\n  // was provided.\n  void errorNameless() const;\n\n  // Displays an error message indicating that the given option\n  // has the wrong type and then exits with an error code.\n  void errorWrongType(const std::string &name) const;\n\n  // Displays an error message indicating that the given option\n  // does not exist and then exits with an error code.\n  void errorNonExistent(const std::string &name) const;\n\n  OptionMap m_options;\n  StringVec m_nonoptions;\n\n  Options(const Options &); // not implemented\n  void operator =(const Options &); // not implemented\n};\n\ntemplate<typename T>\nT Options::get(const std::string &name) const {\n  std::stringstream ss;\n  ss << get(name);\n\n  T v;\n  ss >> v;\n  if(ss.fail() || !ss.eof()) {\n    // Failed to parse or did not consume the whole string value.\n    errorWrongType(name);\n  }\n  return v;\n}\n\n// Specialization for bool. 
\ntemplate<>\ninline bool Options::get<bool>(const std::string &name) const {\n  if(has(name)) {\n    const std::string &v = get(name);\n    if(v == \"1\") {\n      return true;\n    }\n  }\n  return false;\n}\n\n// Specialization for std::string. Simply returns the option string.\n// Requires specialization because using stringstream to read the string\n// will stop at the first whitespace character (which is wrong).\ntemplate<>\ninline std::string Options::get<std::string>(const std::string &name) const {\n  return get(name);\n}\n\n// This assumes the type T can be serialized to a string and back (when get\n// is called).\ntemplate<typename T>\nvoid Options::set(const std::string &name, const T &value) {\n  std::stringstream ss;\n  ss << value;\n  set(name, ss.str());\n}\n\n} // ns aocl_utils\n\n#endif\n\n"
  },
  {
    "path": "autosa_scripts/intel_opencl_scripts/common/inc/AOCLUtils/scoped_ptrs.h",
    "content": "// Copyright (C) 2013-2020 Altera Corporation, San Jose, California, USA. All rights reserved.\n// Permission is hereby granted, free of charge, to any person obtaining a copy of this\n// software and associated documentation files (the \"Software\"), to deal in the Software\n// without restriction, including without limitation the rights to use, copy, modify, merge,\n// publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to\n// whom the Software is furnished to do so, subject to the following conditions:\n// The above copyright notice and this permission notice shall be included in all copies or\n// substantial portions of the Software.\n// \n// THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES\n// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\n// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\n// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\n// OTHER DEALINGS IN THE SOFTWARE.\n// \n// This agreement shall be governed in all respects by the laws of the State of California and\n// by the laws of the United States of America.\n\n// Scoped pointer definitions.\n\n#ifndef AOCL_UTILS_SCOPED_PTRS_H\n#define AOCL_UTILS_SCOPED_PTRS_H\n\nnamespace aocl_utils {\n\n// Interface is essentially the combination of std::auto_ptr and boost's smart pointers,\n// along with some small extensions (auto conversion to T*).\n\n// scoped_ptr: assumes pointer was allocated with operator new; destroys with operator delete\ntemplate<typename T>\nclass scoped_ptr {\npublic:\n  typedef scoped_ptr<T> this_type;\n\n  scoped_ptr() : m_ptr(NULL) {}\n  scoped_ptr(T *ptr) : m_ptr(ptr) {}\n  ~scoped_ptr() { reset(); }\n\n  T *get() const { return m_ptr; }\n  operator T *() const { return m_ptr; 
}\n  T *operator ->() const { return m_ptr; }\n  T &operator *() const { return *m_ptr; }\n\n  this_type &operator =(T *ptr) { reset(ptr); return *this; }\n\n  void reset(T *ptr = NULL) { delete m_ptr; m_ptr = ptr; }\n  T *release() { T *ptr = m_ptr; m_ptr = NULL; return ptr; }\n\nprivate:\n  T *m_ptr;\n\n  // noncopyable\n  scoped_ptr(const this_type &);\n  this_type &operator =(const this_type &);\n};\n\n// scoped_array: assumes pointer was allocated with operator new[]; destroys with operator delete[]\n// Also supports allocation/reset with a number, which is the number of\n// elements of type T.\ntemplate<typename T>\nclass scoped_array {\npublic:\n  typedef scoped_array<T> this_type;\n\n  scoped_array() : m_ptr(NULL) {}\n  scoped_array(T *ptr) : m_ptr(NULL) { reset(ptr); }\n  explicit scoped_array(size_t n) : m_ptr(NULL) { reset(n); }\n  ~scoped_array() { reset(); }\n\n  T *get() const { return m_ptr; }\n  operator T *() const { return m_ptr; }\n  T *operator ->() const { return m_ptr; }\n  T &operator *() const { return *m_ptr; }\n  T &operator [](int index) const { return m_ptr[index]; }\n\n  this_type &operator =(T *ptr) { reset(ptr); return *this; }\n\n  void reset(T *ptr = NULL) { delete[] m_ptr; m_ptr = ptr; }\n  void reset(size_t n) { reset(new T[n]); }\n  T *release() { T *ptr = m_ptr; m_ptr = NULL; return ptr; }\n\nprivate:\n  T *m_ptr;\n\n  // noncopyable\n  scoped_array(const this_type &);\n  this_type &operator =(const this_type &);\n};\n\n// scoped_aligned_ptr: assumes pointer was allocated with alignedMalloc; destroys with alignedFree\n// Also supports allocation/reset with a number, which is the number of\n// elements of type T\ntemplate<typename T>\nclass scoped_aligned_ptr {\npublic:\n  typedef scoped_aligned_ptr<T> this_type;\n\n  scoped_aligned_ptr() : m_ptr(NULL) {}\n  scoped_aligned_ptr(T *ptr) : m_ptr(NULL) { reset(ptr); }\n  explicit scoped_aligned_ptr(size_t n) : m_ptr(NULL) { reset(n); }\n  ~scoped_aligned_ptr() { reset(); }\n\n  T 
*get() const { return m_ptr; }\n  operator T *() const { return m_ptr; }\n  T *operator ->() const { return m_ptr; }\n  T &operator *() const { return *m_ptr; }\n  T &operator [](int index) const { return m_ptr[index]; }\n\n  this_type &operator =(T *ptr) { reset(ptr); return *this; }\n\n  void reset(T *ptr = NULL) { if(m_ptr) alignedFree(m_ptr); m_ptr = ptr; }\n  void reset(size_t n) { reset((T*) alignedMalloc(sizeof(T) * n)); }\n  T *release() { T *ptr = m_ptr; m_ptr = NULL; return ptr; }\n\nprivate:\n  T *m_ptr;\n\n  // noncopyable\n  scoped_aligned_ptr(const this_type &);\n  this_type &operator =(const this_type &);\n};\n\n#if USE_SVM_API == 1\n// scoped_SVM_aligned_ptr: assumes pointer was allocated with clSVMAlloc; destroys with clSVMFree\n// Also supports allocation/reset with a number, which is the number of\n// elements of type T\ntemplate<typename T>\nclass scoped_SVM_aligned_ptr {\npublic:\n\ttypedef scoped_SVM_aligned_ptr<T> this_type;\n\n\tscoped_SVM_aligned_ptr() : m_ptr(NULL) {}\n\tscoped_SVM_aligned_ptr(T *ptr) : m_ptr(NULL) { reset(ptr); }\n\texplicit scoped_SVM_aligned_ptr(cl_context ctx, size_t n) : m_ptr(NULL) { reset(ctx, n); }\n\t~scoped_SVM_aligned_ptr() { reset(); }\n\n\tT *get() const { return m_ptr; }\n\toperator T *() const { return m_ptr; }\n\tT *operator ->() const { return m_ptr; }\n\tT &operator *() const { return *m_ptr; }\n\tT &operator [](int index) const { return m_ptr[index]; }\n\n\tthis_type &operator =(T *ptr) { reset(ptr); return *this; }\n\n\tvoid reset(T *ptr = NULL) { if (m_ptr) clSVMFree(m_ctx, m_ptr); m_ptr = ptr; }\n\tvoid reset(cl_context ctx, size_t n) { reset((T*)clSVMAlloc(ctx, 0, sizeof(T) * n, 0)); m_ctx = ctx; }\n\tT *release() { T *ptr = m_ptr; m_ptr = NULL; return ptr; }\n\nprivate:\n\tT *m_ptr;\n\tcl_context m_ctx;\n\n\t// noncopyable\n\tscoped_SVM_aligned_ptr(const this_type &);\n\tthis_type &operator =(const this_type &);\n};\n#endif /* USE_SVM_API == 1 */\n\n} // ns aocl_utils\n\n#endif\n\n"
  },
  {
    "path": "autosa_scripts/intel_opencl_scripts/common/readme.css",
    "content": "/*\nCopyright (C) 2013-2020 Altera Corporation, San Jose, California, USA. All rights reserved.\nPermission is hereby granted, free of charge, to any person obtaining a copy of this\nsoftware and associated documentation files (the \"Software\"), to deal in the Software\nwithout restriction, including without limitation the rights to use, copy, modify, merge,\npublish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to\nwhom the Software is furnished to do so, subject to the following conditions:\nThe above copyright notice and this permission notice shall be included in all copies or\nsubstantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\nEXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES\nOF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\nNONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\nHOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\nWHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\nFROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\nOTHER DEALINGS IN THE SOFTWARE.\n\nThis agreement shall be governed in all respects by the laws of the State of California and\nby the laws of the United States of America.\n*/\n\nbody {\n  margin: 0 1em 1em 1em;\n  font-family: sans-serif;\n}\nul {\n  list-style-type: square;\n}\npre, code, kbd, samp, tt {\n  font-family: monospace, sans-serif;\n  font-size: 1em;\n}\n\nh1 {\n  font-size: 200%;\n  color: #fff;\n  background-color: #0067a6;\n  margin: 0 -0.5em;\n  padding: 0.25em 0.5em;\n}\nh1 .preheading {\n  font-size: 40%;\n  font-weight: normal;\n}\nh2 {\n  font-size: 125%;\n  background-color: #bae5ff;\n  margin: 1.5em -0.8em 0 -0.8em;\n  padding: 0.2em 0.8em;\n}\nh3 {\n  margin-top: 1.5em;\n  font-size: 100%;\n  border-bottom: 1px dotted #000;\n}\n\ntable {\n  border: 2px solid #0067a6;\n  border-collapse: collapse;\n}\nth {\n  border-bottom: 1px solid 
#0067a6;\n  border-left: 1px dotted #0067a6;\n  border-right: 1px dotted #0067a6;\n  background-color: #bae5ff;\n  padding: 0.3em;\n  font-size: 90%;\n}\ntd {\n  padding: 0.3em;\n  border: 1px dotted #0067a6;\n}\n\ntable.reqs {\n  margin: 0 auto;\n}\ntable.reqs td {\n  white-space: nowrap;\n  text-align: center;\n}\ntable.reqs td:first-child,\ntable.reqs tr:first-child th:first-child {\n  text-align: left;\n}\ntable.reqs td.req {\n  background-color: #b3ef71;\n  font-size: 150%;\n  padding: 0 0.3em;\n}\ntable.reqs td.req .either {\n  font-size: 50%;\n}\ntable.reqs td.unsupported {\n  white-space: normal;\n  background-color: #ccc;\n  max-width: 20em;\n}\ntable.reqs a.note {\n  text-decoration: none;\n}\nol.req-notes > li {\n  margin-bottom: 0.75em;\n}\n\ntable.history {\n  margin: 0 auto;\n}\ntable.history td {\n  text-align: center;\n  vertical-align: top;\n}\ntable.history .changes {\n  text-align: left;\n}\ntable.history tbody tr:first-child td {\n  background-color: #b3ef71;\n}\ntable.history ul {\n  margin: 0;\n  padding-left: 1em;\n}\n\ntable.pkg-contents {\n  margin: 0 auto;\n}\ntable.pkg-contents th,\ntable.pkg-contents td {\n  text-align: left;\n  vertical-align: top;\n}\ntable.pkg-contents td.path {\n  font-family: monospace, sans-serif;\n  font-size: 1em;\n}\ntable.pkg-contents tr.highlight td {\n  background-color: #ffc;\n  font-weight: bold;\n  color: #000;\n}\ntable.pkg-contents td p:first-child {\n  margin-top: 0;\n}\ntable.pkg-contents td p:last-child {\n  margin-bottom: 0;\n}\n\ntable.parameters {\n  margin-left: 3em;\n  margin-right: 3em;\n  font-family: monospace, sans-serif;\n  font-size: 1em;\n}\ntable.parameters th,\ntable.parameters td {\n  font-family: sans-serif;\n  text-align: center;\n  vertical-align: top;\n}\ntable.parameters .name,\ntable.parameters .desc {\n  text-align: left;\n}\ntable.parameters .name {\n  white-space: nowrap;\n}\ntable.parameters td.name,\ntable.parameters td.default {\n  font-family: monospace, sans-serif;\n  
font-size: 1em;\n}\ntable.parameters ul {\n  margin-top: 0;\n}\ntable.parameters td ul:last-child {\n  margin-bottom: 0;\n}\n\ntable.indent {\n  margin-left: 3em;\n}\n\n.doc .title {\n  background-color: #eee;\n  padding: 0.35em;\n  margin-bottom: 0.5em;\n}\n.doc .title a {\n  font-weight: bold;\n}\n.doc .desc {\n  margin-left: 2em;\n  margin-right: 2em;\n}\n\n.left {\n  text-align: left;\n}\n.center {\n  text-align: center;\n}\n.right {\n  text-align: right;\n}\n\n.mono {\n  font-family: monospace, sans-serif;\n  font-size: 1em;\n}\n.highlight {\n  font-weight: bold;\n  color: #0067a6;\n}\n.nowrap {\n  white-space: nowrap;\n}\n\n.command {\n  font-family: monospace, sans-serif;\n  font-size: 1em;\n  margin: 0 3em;\n  background-color: #ffc;\n  border: 1px solid #aaa;\n  padding: 0.5em 1em;\n}\n.console-output,\n.code-block {\n  display: block;\n  font-family: monospace, sans-serif;\n  font-size: 1em;\n  margin: 0 3em;\n  background-color: #fff;\n  border: 1px solid #aaa !important;\n  padding: 1.8em 1em 0.5em 1em !important;\n  position: relative;\n}\n.console-output .heading,\n.code-block .heading {\n  position: absolute;\n  left: 0;\n  top: 0;\n  width: 100%;\n  font-size: 80%;\n  text-transform: uppercase;\n  background-color: #e8e8e8;\n  padding: 0.3125em 0;\n  border-bottom: 1px dotted #888;\n}\n.console-output .heading span,\n.code-block .heading span {\n  padding: 0 1.25em;\n}\n.not-released {\n  font-weight: bold;\n  color: red;\n}\n.license,\n.trademark {\n  font-size: 80%;\n}\n"
  },
  {
    "path": "autosa_scripts/intel_opencl_scripts/common/src/AOCLUtils/opencl.cpp",
    "content": "// Copyright (C) 2013-2020 Altera Corporation, San Jose, California, USA. All rights reserved.\n// Permission is hereby granted, free of charge, to any person obtaining a copy of this\n// software and associated documentation files (the \"Software\"), to deal in the Software\n// without restriction, including without limitation the rights to use, copy, modify, merge,\n// publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to\n// whom the Software is furnished to do so, subject to the following conditions:\n// The above copyright notice and this permission notice shall be included in all copies or\n// substantial portions of the Software.\n// \n// THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES\n// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\n// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\n// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\n// OTHER DEALINGS IN THE SOFTWARE.\n// \n// This agreement shall be governed in all respects by the laws of the State of California and\n// by the laws of the United States of America.\n\n#include \"AOCLUtils/aocl_utils.h\"\n#include <algorithm>\n#include <stdarg.h>\n\n#ifdef _WIN32 // Windows\n#include <windows.h>\n#else         // Linux\n#include <stdio.h> \n#include <unistd.h> // readlink, chdir\n#endif\n\nnamespace aocl_utils {\n\nstatic const char *const VERSION_STR = \"191\";\n\n//////////////////////////////////////////\n// Host allocation functions for alignment\n//////////////////////////////////////////\n\n// This is the minimum alignment requirement to ensure DMA can be used.\nconst unsigned AOCL_ALIGNMENT = 64;\n\n#ifdef _WIN32 // Windows\nvoid *alignedMalloc(size_t size) {\n  return _aligned_malloc (size, 
AOCL_ALIGNMENT);\n}\n\nvoid alignedFree(void * ptr) {\n  _aligned_free(ptr);\n}\n#else          // Linux\nvoid *alignedMalloc(size_t size) {\n  void *result = NULL;\n  // posix_memalign returns 0 on success; report any failure as NULL.\n  int rc = posix_memalign (&result, AOCL_ALIGNMENT, size);\n  if(rc != 0) {\n    return NULL;\n  }\n  return result;\n}\n\nvoid alignedFree(void * ptr) {\n  free (ptr);\n}\n#endif\n\n///////////////////////////////\n// Error functions\n///////////////////////////////\n\n// Print the error associated with an error code.\nvoid printError(cl_int error) {\n  // Print error message\n  switch(error)\n  {\n    case -1:\n      printf(\"CL_DEVICE_NOT_FOUND \");\n      break;\n    case -2:\n      printf(\"CL_DEVICE_NOT_AVAILABLE \");\n      break;\n    case -3:\n      printf(\"CL_COMPILER_NOT_AVAILABLE \");\n      break;\n    case -4:\n      printf(\"CL_MEM_OBJECT_ALLOCATION_FAILURE \");\n      break;\n    case -5:\n      printf(\"CL_OUT_OF_RESOURCES \");\n      break;\n    case -6:\n      printf(\"CL_OUT_OF_HOST_MEMORY \");\n      break;\n    case -7:\n      printf(\"CL_PROFILING_INFO_NOT_AVAILABLE \");\n      break;\n    case -8:\n      printf(\"CL_MEM_COPY_OVERLAP \");\n      break;\n    case -9:\n      printf(\"CL_IMAGE_FORMAT_MISMATCH \");\n      break;\n    case -10:\n      printf(\"CL_IMAGE_FORMAT_NOT_SUPPORTED \");\n      break;\n    case -11:\n      printf(\"CL_BUILD_PROGRAM_FAILURE \");\n      break;\n    case -12:\n      printf(\"CL_MAP_FAILURE \");\n      break;\n    case -13:\n      printf(\"CL_MISALIGNED_SUB_BUFFER_OFFSET \");\n      break;\n    case -14:\n      printf(\"CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST \");\n      break;\n    case -15:\n      printf(\"CL_COMPILE_PROGRAM_FAILURE \");\n      break;\n    case -16:\n      printf(\"CL_LINKER_NOT_AVAILABLE \");\n      break;\n    case -17:\n      printf(\"CL_LINK_PROGRAM_FAILURE \");\n      break;\n    case -18:\n      printf(\"CL_DEVICE_PARTITION_FAILED \");\n      break;\n    case -19:\n      printf(\"CL_KERNEL_ARG_INFO_NOT_AVAILABLE \");\n      break;\n\n    case -30:\n 
     printf(\"CL_INVALID_VALUE \");\n      break;\n    case -31:\n      printf(\"CL_INVALID_DEVICE_TYPE \");\n      break;\n    case -32:\n      printf(\"CL_INVALID_PLATFORM \");\n      break;\n    case -33:\n      printf(\"CL_INVALID_DEVICE \");\n      break;\n    case -34:\n      printf(\"CL_INVALID_CONTEXT \");\n      break;\n    case -35:\n      printf(\"CL_INVALID_QUEUE_PROPERTIES \");\n      break;\n    case -36:\n      printf(\"CL_INVALID_COMMAND_QUEUE \");\n      break;\n    case -37:\n      printf(\"CL_INVALID_HOST_PTR \");\n      break;\n    case -38:\n      printf(\"CL_INVALID_MEM_OBJECT \");\n      break;\n    case -39:\n      printf(\"CL_INVALID_IMAGE_FORMAT_DESCRIPTOR \");\n      break;\n    case -40:\n      printf(\"CL_INVALID_IMAGE_SIZE \");\n      break;\n    case -41:\n      printf(\"CL_INVALID_SAMPLER \");\n      break;\n    case -42:\n      printf(\"CL_INVALID_BINARY \");\n      break;\n    case -43:\n      printf(\"CL_INVALID_BUILD_OPTIONS \");\n      break;\n    case -44:\n      printf(\"CL_INVALID_PROGRAM \");\n      break;\n    case -45:\n      printf(\"CL_INVALID_PROGRAM_EXECUTABLE \");\n      break;\n    case -46:\n      printf(\"CL_INVALID_KERNEL_NAME \");\n      break;\n    case -47:\n      printf(\"CL_INVALID_KERNEL_DEFINITION \");\n      break;\n    case -48:\n      printf(\"CL_INVALID_KERNEL \");\n      break;\n    case -49:\n      printf(\"CL_INVALID_ARG_INDEX \");\n      break;\n    case -50:\n      printf(\"CL_INVALID_ARG_VALUE \");\n      break;\n    case -51:\n      printf(\"CL_INVALID_ARG_SIZE \");\n      break;\n    case -52:\n      printf(\"CL_INVALID_KERNEL_ARGS \");\n      break;\n    case -53:\n      printf(\"CL_INVALID_WORK_DIMENSION \");\n      break;\n    case -54:\n      printf(\"CL_INVALID_WORK_GROUP_SIZE \");\n      break;\n    case -55:\n      printf(\"CL_INVALID_WORK_ITEM_SIZE \");\n      break;\n    case -56:\n      printf(\"CL_INVALID_GLOBAL_OFFSET \");\n      break;\n    case -57:\n      
printf(\"CL_INVALID_EVENT_WAIT_LIST \");\n      break;\n    case -58:\n      printf(\"CL_INVALID_EVENT \");\n      break;\n    case -59:\n      printf(\"CL_INVALID_OPERATION \");\n      break;\n    case -60:\n      printf(\"CL_INVALID_GL_OBJECT \");\n      break;\n    case -61:\n      printf(\"CL_INVALID_BUFFER_SIZE \");\n      break;\n    case -62:\n      printf(\"CL_INVALID_MIP_LEVEL \");\n      break;\n    case -63:\n      printf(\"CL_INVALID_GLOBAL_WORK_SIZE \");\n      break;\n    case -64:\n      printf(\"CL_INVALID_PROPERTY \");\n      break;\n    case -65:\n      printf(\"CL_INVALID_IMAGE_DESCRIPTOR \");\n      break;\n    case -66:\n      printf(\"CL_INVALID_COMPILER_OPTIONS \");\n      break;\n    case -67:\n      printf(\"CL_INVALID_LINKER_OPTIONS \");\n      break;\n    case -68:\n      printf(\"CL_INVALID_DEVICE_PARTITION_COUNT \");\n      break;\n    case -69:\n      printf(\"CL_INVALID_PIPE_SIZE \");\n      break;\n    case -70:\n      printf(\"CL_INVALID_DEVICE_QUEUE \");\n      break;\n\n    case -1001:\n      printf(\"CL_PLATFORM_NOT_FOUND_KHR \");\n      break;\n\n    case -1094:\n      printf(\"CL_INVALID_ACCELERATOR_INTEL \");\n      break;\n    case -1095:\n      printf(\"CL_INVALID_ACCELERATOR_TYPE_INTEL \");\n      break;\n    case -1096:\n      printf(\"CL_INVALID_ACCELERATOR_DESCRIPTOR_INTEL \");\n      break;\n    case -1097:\n      printf(\"CL_ACCELERATOR_TYPE_NOT_SUPPORTED_INTEL \");\n      break;\n    default:\n      printf(\"UNRECOGNIZED ERROR CODE (%d)\", error);\n  }\n}\n\n// Print line, file name, and error code if there is an error. Exits the\n// application upon error.\nvoid _checkError(int line,\n                 const char *file,\n                 cl_int error,\n                 const char *msg,\n                 ...) 
{\n  // If not successful\n  if(error != CL_SUCCESS) {\n    // Print line and file\n    printf(\"ERROR: \");\n    printError(error);\n    printf(\"\\nLocation: %s:%d\\n\", file, line);\n\n    // Print custom message.\n    va_list vl;\n    va_start(vl, msg);\n    vprintf(msg, vl);\n    printf(\"\\n\");\n    va_end(vl);\n\n    // Cleanup and bail.\n    cleanup();\n    exit(error);\n  }\n}\n\n// Sets the current working directory to be the same as the directory\n// containing the running executable.\nbool setCwdToExeDir() {\n#ifdef _WIN32 // Windows\n  HMODULE hMod = GetModuleHandle(NULL);\n  char path[MAX_PATH];\n  GetModuleFileNameA(hMod, path, MAX_PATH);\n\n#else         // Linux\n  // Get path of executable.\n  char path[300];\n  ssize_t n = readlink(\"/proc/self/exe\", path, sizeof(path)/sizeof(path[0]) - 1);\n  if(n == -1) {\n    return false;\n  }\n  path[n] = 0;\n#endif\n\n  // Find the last '\\' or '/' and terminate the path there; it is now\n  // the directory containing the executable.\n  size_t i;\n  for(i = strnlen(path, sizeof(path)) - 1; i > 0 && path[i] != '/' && path[i] != '\\\\'; --i);\n  path[i] = '\\0';\n\n  // Change the current directory.\n#ifdef _WIN32 // Windows\n  SetCurrentDirectoryA(path);\n#else         // Linux\n  int rc;\n  rc = chdir(path);\n#endif\n\n  return true;\n}\n\n// Searches all platforms for the first platform whose name\n// contains the search string (case-insensitive).\ncl_platform_id findPlatform(const char *platform_name_search) {\n  cl_int status;\n\n  std::string search = platform_name_search;\n  std::transform(search.begin(), search.end(), search.begin(), tolower);\n\n  // Get number of platforms.\n  cl_uint num_platforms;\n  status = clGetPlatformIDs(0, NULL, &num_platforms);\n  checkError(status, \"Query for number of platforms failed\");\n\n  // Get a list of all platform ids.\n  scoped_array<cl_platform_id> pids(num_platforms);\n  status = clGetPlatformIDs(num_platforms, pids, NULL);\n  checkError(status, \"Query for 
all platform ids failed\");\n\n  // For each platform, get name and compare against the search string.\n  for(unsigned i = 0; i < num_platforms; ++i) {\n    std::string name = getPlatformName(pids[i]);\n\n    // Convert to lower case.\n    std::transform(name.begin(), name.end(), name.begin(), tolower);\n\n    if(name.find(search) != std::string::npos) {\n      // Found!\n      return pids[i];\n    }\n  }\n\n  // No platform found.\n  return NULL;\n}\n\n// Returns the platform name.\nstd::string getPlatformName(cl_platform_id pid) {\n  cl_int status;\n\n  size_t sz;\n  status = clGetPlatformInfo(pid, CL_PLATFORM_NAME, 0, NULL, &sz);\n  checkError(status, \"Query for platform name size failed\");\n\n  scoped_array<char> name(sz);\n  status = clGetPlatformInfo(pid, CL_PLATFORM_NAME, sz, name, NULL);\n  checkError(status, \"Query for platform name failed\");\n\n  return name.get();\n}\n\n// Returns the device name.\nstd::string getDeviceName(cl_device_id did) {\n  cl_int status;\n\n  size_t sz;\n  status = clGetDeviceInfo(did, CL_DEVICE_NAME, 0, NULL, &sz);\n  checkError(status, \"Failed to get device name size\");\n\n  scoped_array<char> name(sz);\n  status = clGetDeviceInfo(did, CL_DEVICE_NAME, sz, name, NULL);\n  checkError(status, \"Failed to get device name\");\n\n  return name.get();\n}\n\n// Returns the list of all devices.\ncl_device_id *getDevices(cl_platform_id pid, cl_device_type dev_type, cl_uint *num_devices) {\n  cl_int status;\n\n  status = clGetDeviceIDs(pid, dev_type, 0, NULL, num_devices);\n  checkError(status, \"Query for number of devices failed\");\n\n  cl_device_id *dids = new cl_device_id[*num_devices];\n  status = clGetDeviceIDs(pid, dev_type, *num_devices, dids, NULL);\n  checkError(status, \"Query for device ids\");\n\n  // For Windows, clGetDeviceIDs() always gives num_devices = 128, so we have to find the actual number of available devices\n  // See Release Notes here: 
https://www.intel.com/content/www/us/en/programmable/documentation/ewa1412772636144.html#ewa1412773000284\n#ifdef _WIN32\n  unsigned num_available = 0;\n  cl_bool is_available;\n  for (unsigned i = 0; i < *num_devices; i++) {\n    status = clGetDeviceInfo(dids[i], CL_DEVICE_AVAILABLE, sizeof(is_available), &is_available, NULL);\n    checkError(status, \"Failed to get device availability\");\n    if (is_available != CL_TRUE)\n      break;\n    num_available++;\n  }\n  *num_devices = num_available;\n#endif\n\n  return dids;\n}\n\n// Create a program for all devices associated with the context.\ncl_program createProgramFromBinary(cl_context context, const char *binary_file_name, const cl_device_id *devices, unsigned num_devices) {\n  // Early exit for potentially the most common way to fail: AOCX does not exist.\n  if(!fileExists(binary_file_name)) {\n    printf(\"AOCX file '%s' does not exist.\\n\", binary_file_name);\n    checkError(CL_INVALID_PROGRAM, \"Failed to load binary file\");\n  }\n\n  // Load the binary.\n  size_t binary_size;\n  scoped_array<unsigned char> binary(loadBinaryFile(binary_file_name, &binary_size));\n  if(binary == NULL) {\n    checkError(CL_INVALID_PROGRAM, \"Failed to load binary file\");\n  }\n\n  scoped_array<size_t> binary_lengths(num_devices);\n  scoped_array<unsigned char *> binaries(num_devices);\n  for(unsigned i = 0; i < num_devices; ++i) {\n    binary_lengths[i] = binary_size;\n    binaries[i] = binary;\n  }\n\n  cl_int status;\n  scoped_array<cl_int> binary_status(num_devices);\n\n  cl_program program = clCreateProgramWithBinary(context, num_devices, devices, binary_lengths,\n      (const unsigned char **) binaries.get(), binary_status, &status);\n  checkError(status, \"Failed to create program with binary\");\n  for(unsigned i = 0; i < num_devices; ++i) {\n    checkError(binary_status[i], \"Failed to load binary for device\");\n  }\n\n  return program;\n}\n\n// Loads a file in binary form.\nunsigned char *loadBinaryFile(const char 
*file_name, size_t *size) {\n  // Open the File\n  FILE* fp;\n  long ftell_size;\n  size_t elements_read;\n#ifdef _WIN32\n  if(fopen_s(&fp, file_name, \"rb\") != 0) {\n    return NULL;\n  }\n#else\n  fp = fopen(file_name, \"rb\");\n  if(fp == 0) {\n    return NULL;\n  }\n#endif\n\n  // Get the size of the file\n  fseek(fp, 0, SEEK_END);\n  ftell_size = ftell(fp);\n  if (ftell_size < 0) {\n    fclose(fp);\n    return NULL;\n  }\n  *size = (unsigned)ftell_size;\n\n  // Allocate space for the binary\n  unsigned char *binary = new unsigned char[*size];\n\n  // Go back to the file start\n  rewind(fp);\n\n  // Read the file into the binary\n  elements_read = fread((void*)binary, *size, 1, fp);\n  if(elements_read == 0) {\n    delete[] binary;\n    fclose(fp);\n    return NULL;\n  }\n\n  fclose(fp);\n  return binary;\n}\n\nbool fileExists(const char *file_name) {\n#ifdef _WIN32 // Windows\n  DWORD attrib = GetFileAttributesA(file_name);\n  return (attrib != INVALID_FILE_ATTRIBUTES && !(attrib & FILE_ATTRIBUTE_DIRECTORY));\n#else         // Linux\n  return access(file_name, R_OK) != -1;\n#endif\n}\n\nstd::string getBoardBinaryFile(const char *prefix, cl_device_id device) {\n  // First check if <prefix>.aocx exists. Use it if it does.\n  std::string file_name = std::string(prefix) + \".aocx\";\n  if(fileExists(file_name.c_str())) {\n    return file_name;\n  }\n\n  // Now get the name of the board. 
For Intel(R) FPGA SDK for OpenCL(TM) boards,\n  // the name of the device is presented as:\n  //  <board name> : ...\n  std::string device_name = getDeviceName(device);\n\n  // Now search for the \" :\" in the device name.\n  size_t end = device_name.find(\" :\");\n  if(end != std::string::npos) {\n    std::string board_name(device_name, 0, end);\n\n    // Look for an AOCX with the name <prefix>_<board_name>_<version>.aocx.\n    file_name = std::string(prefix) + \"_\" + board_name + \"_\" + VERSION_STR + \".aocx\";\n    if(fileExists(file_name.c_str())) {\n      return file_name;\n    }\n  }\n\n  // At this point, fall back to <prefix>.aocx. This file does not exist,\n  // so loading it will trigger an error later.\n  return std::string(prefix) + \".aocx\";\n}\n\n// High-resolution timer.\ndouble getCurrentTimestamp() {\n#ifdef _WIN32 // Windows\n  // Use the high-resolution performance counter.\n\n  static LARGE_INTEGER ticks_per_second = {};\n  if(ticks_per_second.QuadPart == 0) {\n    // First call - get the frequency.\n    QueryPerformanceFrequency(&ticks_per_second);\n  }\n\n  LARGE_INTEGER counter;\n  QueryPerformanceCounter(&counter);\n\n  double seconds = double(counter.QuadPart) / double(ticks_per_second.QuadPart);\n  return seconds;\n#else         // Linux\n  timespec a;\n  clock_gettime(CLOCK_MONOTONIC, &a);\n  return (double(a.tv_nsec) * 1.0e-9) + double(a.tv_sec);\n#endif\n}\n\ncl_ulong getStartEndTime(cl_event event) {\n  cl_int status;\n\n  cl_ulong start, end;\n  status = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(start), &start, NULL);\n  checkError(status, \"Failed to query event start time\");\n  status = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(end), &end, NULL);\n  checkError(status, \"Failed to query event end time\");\n\n  return end - start;\n}\n\ncl_ulong getStartEndTime(cl_event *events, unsigned num_events) {\n  cl_int status;\n\n  cl_ulong min_start = 0;\n  cl_ulong max_end = 0;\n  for(unsigned i = 0; i 
< num_events; ++i) {\n    cl_ulong start, end;\n    status = clGetEventProfilingInfo(events[i], CL_PROFILING_COMMAND_START, sizeof(start), &start, NULL);\n    checkError(status, \"Failed to query event start time\");\n    status = clGetEventProfilingInfo(events[i], CL_PROFILING_COMMAND_END, sizeof(end), &end, NULL);\n    checkError(status, \"Failed to query event end time\");\n\n    if(i == 0) {\n      min_start = start;\n      max_end = end;\n    }\n    else {\n      if(start < min_start) {\n        min_start = start;\n      }\n      if(end > max_end) {\n        max_end = end;\n      }\n    }\n  }\n\n  return max_end - min_start;\n}\n\nvoid waitMilliseconds(unsigned ms) {\n#ifdef _WIN32 // Windows\n  Sleep(ms);\n#else         // Linux\n  timespec sleeptime = {0, 0};\n  sleeptime.tv_sec = ms / 1000;\n  sleeptime.tv_nsec = long(ms % 1000) * 1000000L;  // convert to nanoseconds\n  nanosleep(&sleeptime, NULL);\n#endif\n}\n\nvoid oclContextCallback(const char *errinfo, const void *, size_t, void *) {\n  printf(\"Context callback: %s\\n\", errinfo);\n}\n\n} // ns aocl_utils\n\n"
  },
  {
    "path": "autosa_scripts/intel_opencl_scripts/common/src/AOCLUtils/options.cpp",
    "content": "// Copyright (C) 2013-2020 Altera Corporation, San Jose, California, USA. All rights reserved.\n// Permission is hereby granted, free of charge, to any person obtaining a copy of this\n// software and associated documentation files (the \"Software\"), to deal in the Software\n// without restriction, including without limitation the rights to use, copy, modify, merge,\n// publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to\n// whom the Software is furnished to do so, subject to the following conditions:\n// The above copyright notice and this permission notice shall be included in all copies or\n// substantial portions of the Software.\n// \n// THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES\n// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\n// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\n// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\n// OTHER DEALINGS IN THE SOFTWARE.\n// \n// This agreement shall be governed in all respects by the laws of the State of California and\n// by the laws of the United States of America.\n\n#include \"AOCLUtils/aocl_utils.h\"\n#include <algorithm>\n#include <iostream>\n#include <stdlib.h>\n#include <vector>\n\nnamespace aocl_utils {\n\nOptions::Options() {\n}\n\nOptions::Options(int num, char *argv[]) {\n  addFromCommandLine(num, argv);\n}\n\nbool Options::has(const std::string &name) const {\n  return m_options.find(name) != m_options.end();\n}\n\nstd::string &Options::get(const std::string &name) {\n  return m_options[name];\n}\n\nconst std::string &Options::get(const std::string &name) const {\n  OptionMap::const_iterator it = m_options.find(name);\n  if(it == m_options.end()) {\n    errorNonExistent(name);\n  }\n  
return it->second;\n}\n\nvoid Options::addFromCommandLine(int num, char *argv[]) {\n  for(int i = 1; i < num; ++i) {\n    const std::string arg = argv[i];\n\n    // Look for the first '-'.\n    if(arg.size() > 1 && arg[0] == '-') {\n      size_t eq = arg.find('=');\n      size_t name_start = 1;\n\n      // Check if there's a second '-'.\n      if(arg.size() > 2 && arg[1] == '-') {\n        name_start = 2;\n      }\n\n      if(eq == std::string::npos) {\n        // No '='; treat as a boolean option.\n        set(arg.substr(name_start), true);\n      }\n      else if(eq == name_start) {\n        // No name?!\n        errorNameless();\n      }\n      else {\n        set(arg.substr(name_start, eq - name_start), arg.substr(eq + 1));\n      }\n    }\n    else {\n      // Not an option.\n      m_nonoptions.push_back(arg);\n    }\n  }\n}\n\nvoid Options::errorNameless() const {\n  std::cerr << \"No name provided for option.\\n\";\n  exit(1);\n}\n\nvoid Options::errorNonExistent(const std::string &name) const {\n  std::cerr << \"Option '\" << name << \"' does not exist.\\n\";\n  exit(1);\n}\n\nvoid Options::errorWrongType(const std::string &name) const {\n  std::cerr << \"Value for option '\" << name << \"' is not of the right type (value = '\"\n            << get(name) << \"').\\n\";\n  exit(1);\n}\n\n} // ns aocl_utils\n\n"
  },
  {
    "path": "autosa_scripts/intel_opencl_scripts/compile_design.sh",
    "content": "#!/bin/bash\n\n# - A script to compile and run the host program and bitstream on the Intel OpenCL platform\n\nif [ $# != 1 ];\nthen\n  echo \"Usage: compile_design.sh [hw|emu|sim]\"\n  exit 1\nfi  \nmode=$1\necho $mode\n\necho \"Compiling the bitstream...\"\nif [ \"$mode\" == \"hw\" ]\nthen \n  # Compile the bitstream\n  # Change the board to your target board if necessary\n  aoc src/kernel_kernel.cl -o bin/kernel_kernel.aocx -fp-relaxed -board=s10mx_hbm_es\nelif [ \"$mode\" == \"emu\" ]\nthen\n  # Compiling for emulator\n  aoc -march=emulator src/kernel_kernel.cl -o bin/kernel_kernel.aocx -fp-relaxed -DEMULATE -legacy-emulator\nelif [ \"$mode\" == \"sim\" ]\nthen\n  # Compiling for simulator\n  aoc -march=simulator src/kernel_kernel.cl -o bin/kernel_kernel.aocx -fp-relaxed\nelse\n  echo \"Error: Unsupported mode\"\n  exit 1\nfi\n\n#echo \"Compiling the host program...\"\n## Compile the host program\n#make\n\n#echo \"Running the program...\"\n#case \"$mode\" in\n#    \"hw\")\n#      # Run the host program\n#      bin/host\n#      ;;\n#    \"emu\")\n#      # Run the host program with the emulator\n#      bin/host -emulator\n#      ;;\n#    \"sim\")\n#      # Run the host program with the simulator\n#      CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 bin/host\n#      ;;\n#esac\n"
  },
  {
    "path": "autosa_scripts/latency_model.py",
    "content": "import os\nimport json\nimport re\nimport xml.etree.ElementTree as ET\nimport numpy as np\nimport pandas as pd\nimport joblib\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn import metrics\nfrom sklearn.model_selection import train_test_split\nfrom scipy.stats.mstats import gmean\nfrom statistics import mean\nimport shutil\nimport math\nimport argparse\n\ndef extract_latency_info(design_dir):\n    \"\"\" Extract loop information of the design.\n\n    Returns a dictionary containing the following information:\n    - loop_infos: dict\n    - module_list: list\n    - array_info: dict\n    - module_grouped: dict\n\n    Parameters\n    ----------\n    design_dir: str\n        The design directory\n    \"\"\"\n    loop_path = f'{design_dir}/latency_est'\n    loop_info_files = os.listdir(loop_path)\n    loop_info_all = {}\n    module_names = []\n\n    for f_name in loop_info_files:\n        if f_name == 'array_info.json':\n            with open(loop_path + '/' + f_name) as f:\n                array_info = json.load(f)\n        else:\n            with open(loop_path + '/' + f_name) as f:\n                loop_info_module = json.load(f)\n                module_name = loop_info_module['module_name']\n                loop_info_all[module_name] = loop_info_module\n                module_names.append(module_name)\n\n    module_grouped = {}\n    # Place the inter_trans and intra_trans modules under the outer module\n    for module_name in module_names:\n        # intra_trans\n        if module_name.find('intra_trans') != -1:\n            module_name_prefix = module_name[:-12]\n            if module_name_prefix not in module_grouped:\n                module_grouped[module_name_prefix] = {}\n            module_grouped[module_name_prefix]['intra_trans'] = module_name\n\n            module_name_prefix = module_name_prefix + '_boundary'\n            if module_name_prefix not in module_grouped:\n                module_grouped[module_name_prefix] = {}\n        
    module_grouped[module_name_prefix]['intra_trans'] = module_name\n\n        # inter_trans\n        elif module_name.find('inter_trans') != -1:\n            if module_name.find('boundary') != -1:\n                module_name_prefix = module_name[:-21] + '_boundary'\n            else:\n                module_name_prefix = module_name[:-12]\n\n            if module_name_prefix not in module_grouped:\n                module_grouped[module_name_prefix] = {}\n            module_grouped[module_name_prefix]['inter_trans'] = module_name\n        else:\n            if module_name not in module_grouped:\n                module_grouped[module_name] = {}\n\n    ret = {'loop_infos': loop_info_all, 'module_list': module_names, \\\n           'module_grouped': module_grouped, 'array_info': array_info}\n\n    return ret\n\ndef convert_latency_infos_to_df(latency_infos):\n    \"\"\" Convert the latency infos into a dataframe.\n\n    \"\"\"\n    return\n\ndef is_loop_struct_leaf_empty(loop_struct):\n    \"\"\" Examine if the leaf node of the loop struct is empty.\n\n    Parameters\n    ----------\n    loop_struct: dict\n        loop structure in JSON format\n    \"\"\"\n    if \"loop\" in loop_struct:\n        child = loop_struct['loop']['child']\n        if child == None:\n            return 1\n        else:\n            return is_loop_struct_leaf_empty(child)\n    elif \"mark\" in loop_struct:\n        child = loop_struct['mark']['child']\n        if child == None:\n            return 1\n        else:\n            return is_loop_struct_leaf_empty(child)\n    elif \"user\" in loop_struct:\n        child = loop_struct['user']['user_expr']\n        if child == None:\n            return 1\n        else:\n            return 0\n    elif \"block\" in loop_struct:\n        children = loop_struct['block']['child']\n        if children == None:\n            return 1\n        else:\n            for child in children:\n                is_empty = is_loop_struct_leaf_empty(child)\n            
    if is_empty == 0:\n                    return 0\n            return 1\n    elif \"if\" in loop_struct:\n        if_struct = loop_struct['if']\n        then_block = if_struct['then']\n        is_empty = is_loop_struct_leaf_empty(then_block)\n        if is_empty == 0:\n            return 0\n        if 'else' in if_struct:\n            else_block = if_struct['else']\n            is_empty = is_loop_struct_leaf_empty(else_block)\n            if is_empty == 0:\n                return 0\n            return 1\n    return 1\n\ndef loop_struct_has_non_simd_loop(loop_struct, config):\n    \"\"\" Examine if the leaf node of the loop struct has any non-SIMD loop.\n\n    \"\"\"\n    if \"loop\" in loop_struct:\n        if config['under_simd'] == 1:\n            return 0\n        else:\n            return 1\n    elif \"mark\" in loop_struct:\n        mark = loop_struct['mark']\n        mark_name = mark['mark_name']\n        if mark_name == 'simd':\n            config['under_simd'] = 1\n        child = mark['child']\n        if child == None:\n            return 0\n        else:\n            return loop_struct_has_non_simd_loop(child, config)\n    elif \"user\" in loop_struct:\n        return 0\n    elif \"block\" in loop_struct:\n        children = loop_struct['block']['child']\n        if children == None:\n            return 0\n        else:\n            for child in children:\n                has_non_simd_loop = loop_struct_has_non_simd_loop(child, config)\n                if has_non_simd_loop == 1:\n                    return 1\n            return 0\n    elif \"if\" in loop_struct:\n        if_struct = loop_struct['if']\n        then_block = if_struct['then']\n        has_non_simd_loop = loop_struct_has_non_simd_loop(then_block, config)\n        if has_non_simd_loop == 1:\n            return 1\n        if 'else' in if_struct:\n            else_block = if_struct['else']\n            has_non_simd_loop = loop_struct_has_non_simd_loop(else_block, config)\n            if 
has_non_simd_loop == 1:\n                return 1\n        return 0\n\n    return 0\n\ndef loop_struct_has_for_loop(loop_struct):\n    \"\"\" Examine if the leaf node of the loop struct has any for loop.\n\n    \"\"\"\n    if \"loop\" in loop_struct:\n        return 1\n    elif \"mark\" in loop_struct:\n        child = loop_struct['mark']['child']\n        if child == None:\n            return 0\n        else:\n            return loop_struct_has_for_loop(child)\n    elif \"user\" in loop_struct:\n        child = loop_struct['user']['user_expr']\n        return 0\n    elif \"block\" in loop_struct:\n        children = loop_struct['block']['child']\n        if children == None:\n            return 0\n        else:\n            for child in children:\n                has_for_loop = loop_struct_has_for_loop(child)\n                if has_for_loop == 1:\n                    return 1\n            return 0\n    elif \"if\" in loop_struct:\n        if_struct = loop_struct['if']\n        then_block = if_struct['then']\n        has_for_loop = loop_struct_has_for_loop(then_block)\n        if has_for_loop == 1:\n            return 1\n        if 'else' in if_struct:\n            else_block = if_struct['else']\n            has_for_loop = loop_struct_has_for_loop(else_block)\n            if has_for_loop == 1:\n                return 1\n        return 0\n\n    return 0\n\ndef predict_module_latency_xilinx(loop_struct, config):\n    \"\"\" Predict the module latency for Xilinx FPGAs.\n\n    \"\"\"\n    latency = config['latency']\n    if \"loop\" in loop_struct:\n        config['under_loop'] = 1\n        # Extract the loop information\n        loop = loop_struct['loop']\n        loop_info = loop['loop_info']\n        lb = loop_info['lb']\n        ub = loop_info['ub']\n        iterator = loop_info['iter']\n        # Check if lb/ub are integer literals\n        if lb.isnumeric():\n            lb_n = int(lb)\n        else:\n            lb_n = 0\n            #raise 
NotImplementedError(f'Non-number loop lower bound ({lb}) is not supported.')\n        if ub.isnumeric():\n            ub_n = int(ub)\n        else:\n            raise NotImplementedError(f'Non-number loop upper bound ({ub}) is not supported.')\n        config['context'][iterator] = {}\n        config['context'][iterator]['lb'] = lb_n\n        config['context'][iterator]['ub'] = ub_n\n        if config['under_unroll'] == 0:\n            latency = latency * (ub_n - lb_n + 1)\n            config['latency'] = latency\n        child = loop['child']\n        # If this is an outer module, update loop_prefix at each loop level.\n        if config['module_type'] == 1:\n            if config['loop_prefix'] == 'Loop':\n                config['loop_prefix'] = config['loop_prefix'] + str(config['loop_offset'])\n            else:\n                config['loop_prefix'] = config['loop_prefix'] + '.' + str(config['loop_offset'])\n        # Store the current for loop\n        config['last_for']['iter'] = iterator\n        config['last_for']['lb'] = lb_n\n        config['last_for']['ub'] = ub_n\n        if config['under_coalesce'] == 1:\n            config['last_for']['under_coalesce'] = 1\n        else:\n            config['last_for']['under_coalesce'] = 0\n        predict_module_latency_xilinx(child, config)\n    elif \"mark\" in loop_struct:\n        mark = loop_struct['mark']\n        mark_name = mark['mark_name']\n        # Once we meet the 'simd' mark, the loops below no longer count toward the loop iterations.\n        if mark_name == 'simd':\n            config['under_unroll'] = 1\n        if mark_name == 'access_coalesce':\n            config['under_coalesce'] = 1\n        if mark_name == 'access_serialize':\n            config['under_serialize'] = 1\n        child = mark['child']\n        predict_module_latency_xilinx(child, config)\n    elif \"user\" in loop_struct:\n        user = loop_struct['user']\n        user_expr = user['user_expr']\n        
config['under_unroll'] = 0\n        config['under_coalesce'] = 0\n        if config['module_type'] == 1:\n            # For an outer module, return directly.\n            config['under_serialize'] = 0\n            if config['latency'] == 1:\n                config['latency'] = 0\n            return\n\n        #if config['module_name'] == 'A_IO_L2_in':\n        #    print(latency)\n        # Set II and depth to 1 by default.\n        II = 1\n        depth = 1\n        #print(latency, user_expr)\n        if user_expr.find('dram') != -1:\n            # This is a DRAM statement; plug in the estimated DRAM latency model.\n            # Extract the array name\n            #module_name = config['module_name']\n            #array_name = module_name.split('_')[0]\n            #array_info = config['array_info'][array_name]\n\n            if config['last_for']['under_coalesce'] == 1 and \\\n               config['under_serialize'] == 0:\n                # This statement accesses the DRAM under a coalesced loop.\n                burst_len = (config['last_for']['ub'] - config['last_for']['lb'])\n                # The DRAM latency is estimated as 200 ns\n                dram_latency = 200 / config['cycle'] + burst_len + depth\n                latency = latency / burst_len * dram_latency\n            elif config['under_serialize'] == 1:\n                # This statement accesses the DRAM with serialized data.\n                latency = (latency - 1) * II + depth\n            else:\n                latency = latency * (200 / config['cycle'] + depth)\n        else:\n            latency = (latency - 1) * II + depth\n        config['under_serialize'] = 0\n        config['latency'] = latency\n    elif \"block\" in loop_struct:\n        block = loop_struct['block']\n        block_child = block['child']\n\n        # Check if only one child is valid and the rest only contain the empty leaf node.\n        # If so, continue from the non-empty leaf node w/o further action.\n        n_child = 0\n        
for child in block_child:\n            is_empty = is_loop_struct_leaf_empty(child)\n            if is_empty == 0:\n                n_child += 1\n                single_child = child\n\n        if n_child == 1:\n            predict_module_latency_xilinx(single_child, config)\n            return\n\n        # Check if the current block contains \"simd\" mark.\n        # If so, continue from \"simd\" branch w/o any further action.\n        simd_child = 0\n        for child in block_child:\n            if \"mark\" in child:\n                mark_name = child['mark']['mark_name']\n                if mark_name == 'simd':\n                    config['under_unroll'] = 1\n                    child = child['mark']['child']\n                    simd_child = 1\n                    break\n        if simd_child == 1:\n            predict_module_latency_xilinx(child, config)\n            return\n\n        # Proceed as normal.\n        # Check if the child contains any non-simd loop. If yes, we will\n        # update the loop prefix.\n        for child in block_child:\n            local_config = {}\n            local_config['under_simd'] = 0\n            has_non_simd_loop = loop_struct_has_non_simd_loop(child, local_config)\n            if has_non_simd_loop:\n                if config['module_type'] != 1 and config['under_loop'] == 1:\n                    if config['loop_prefix'] == 'Loop':\n                        config['loop_prefix'] = config['loop_prefix'] + str(config['loop_offset'])\n                    else:\n                        config['loop_prefix'] = config['loop_prefix'] + '.' 
+ str(config['loop_offset'])\n                break\n        loop_prefix = config['loop_prefix']\n        loop_offset = 1\n        under_loop = config['under_loop']\n\n        # If the block is under a loop and all children are user nodes,\n        # we proceed and dive into the user nodes.\n        all_user_child = 1\n        for child in block_child:\n            has_for_loop = loop_struct_has_for_loop(child)\n            if has_for_loop:\n                all_user_child = 0\n                break\n        latency = config['latency']\n        block_latency = 0\n        for child in block_child:\n            config['loop_offset'] = loop_offset\n            config['loop_prefix'] = loop_prefix\n            if under_loop == 1:\n                config['under_loop'] = 0\n            has_for_loop = loop_struct_has_for_loop(child)\n            if all_user_child:\n                # Select the statement with the longest latency.\n                config['latency'] = latency\n                predict_module_latency_xilinx(child, config)\n                block_latency = max(block_latency, config['latency'])\n            else:\n                # Accumulate the latency.\n                if has_for_loop:\n                    config['latency'] = 1\n                    predict_module_latency_xilinx(child, config)\n                    loop_offset += 1\n                    block_latency += config['latency']\n        if all_user_child:\n            latency = block_latency\n        else:\n            latency = latency * max(block_latency, 1)\n        config['latency'] = latency\n    elif \"if\" in loop_struct:\n        # For an if-then-else clause, we treat it similarly to a block,\n        # but take the maximum latency over the then/else branches.\n        latency = config['latency']\n        block_latency = 0\n        if_struct = loop_struct['if']\n        then_block = if_struct['then']\n        if config['module_type'] != 1 and config['under_loop'] == 1:\n            if config['loop_prefix'] == 
'Loop':\n                config['loop_prefix'] = config['loop_prefix'] + str(config['loop_offset'])\n            else:\n                config['loop_prefix'] = config['loop_prefix'] + '.' + str(config['loop_offset'])\n        loop_prefix = config['loop_prefix']\n        loop_offset = config['loop_offset']\n        has_for_loop = loop_struct_has_for_loop(then_block)\n        if has_for_loop:\n            config['latency'] = 1\n            predict_module_latency_xilinx(then_block, config)\n            block_latency = max(block_latency, config['latency'])\n        if 'else' in if_struct:\n            loop_offset += 1\n            config['loop_offset'] = loop_offset\n            else_block = if_struct['else']\n            has_for_loop = loop_struct_has_for_loop(else_block)\n            if has_for_loop:\n                config['latency'] = 1\n                predict_module_latency_xilinx(else_block, config)\n                block_latency = max(block_latency, config['latency'])\n        #print('1: ', latency)\n        #print('2: ', block_latency)\n        latency = latency * max(block_latency, 1)\n        config['latency'] = latency\n\ndef predict_design_latency(latency_info, cycle=5, early_stop=-1):\n    \"\"\" Predict the latency for a single design.\n\n    We assume that the II and depth of each statement are one.\n\n    Parameters\n    ----------\n    latency_info: dict\n        A dict containing the latency info of the design.\n    cycle: int\n        The cycle time (in ns).\n    early_stop: int\n        The baseline latency. 
If set to -1, early stopping is disabled.\n    \"\"\"\n    latency_all = {}\n    config = {}\n    config['cycle'] = cycle\n    module_grouped = latency_info['module_grouped']\n    array_info = latency_info['array_info']\n    loop_infos = latency_info['loop_infos']\n\n    drain_latency = 0\n    drain_outer = 1\n\n    for module_name in module_grouped:\n        if 'dummy' in module_name:\n            # Simply skip the dummy module\n            continue\n        if module_name not in loop_infos:\n            continue\n\n        ## debug\n        #if module_name != 'A_IO_L2_in':\n        #    continue\n        #print(module_name)\n        ## debug\n\n        module = module_grouped[module_name]\n        #print(module)\n\n        config['context'] = {}\n        config['latency'] = 1\n        config['loop_prefix'] = 'Loop'\n        config['loop_offset'] = 1\n        config['under_unroll'] = 0\n        config['under_coalesce'] = 0\n        config['under_serialize'] = 0\n        config['under_loop'] = 0\n        config['last_for'] = {}\n        config['array_info'] = array_info\n        config['module_name'] = module_name\n        # Module type: 0 = default, 1 = outer, 2 = inter_trans, 3 = intra_trans\n        config['module_type'] = 0\n\n        if 'inter_trans' in module or 'intra_trans' in module:\n            # This is a filter module. 
We take it as double buffered by default.\n            config['module_type'] = 1\n            module_loop_info = loop_infos[module_name]\n            predict_module_latency_xilinx(module_loop_info, config)\n            outer_latency = config['latency']\n\n            # inter module\n            config['module_type'] = 2\n            config['latency'] = 1\n            config['loop_prefix'] = 'Loop'\n            config['loop_offset'] = 1\n            sub_module_name = module['inter_trans']\n            config['module_name'] = sub_module_name\n            module_loop_info = loop_infos[sub_module_name]\n            predict_module_latency_xilinx(module_loop_info, config)\n            inter_trans_latency = config['latency']\n\n            # intra module\n            config['module_type'] = 3\n            config['latency'] = 1\n            config['loop_prefix'] = 'Loop'\n            config['loop_offset'] = 1\n            sub_module_name = module['intra_trans']\n            config['module_name'] = sub_module_name\n            module_loop_info = loop_infos[sub_module_name]\n            predict_module_latency_xilinx(module_loop_info, config)\n            intra_trans_latency = config['latency']\n\n            ## debug\n            #print(outer_latency)\n            #print(inter_trans_latency)\n            #print(intra_trans_latency)\n            ## debug\n\n            if module_loop_info['module_prop']['double_buffer'] == 1:\n                module_latency = outer_latency * max(inter_trans_latency, intra_trans_latency)\n                if module_loop_info['module_prop']['in'] == 1:\n                    module_latency += intra_trans_latency\n                else:\n                    module_latency += inter_trans_latency\n            else:\n                module_latency = outer_latency * (inter_trans_latency + intra_trans_latency)\n            # Hack: For GEMM4\n            #if 'C' in module_name:\n            if 'drain' in module_name:\n                drain_outer = max(1, 
outer_latency)\n\n            latency_all[module_name] = module_latency\n        else:\n            module_loop_info = loop_infos[module_name]\n            #print(config['module_name'])\n            predict_module_latency_xilinx(module_loop_info, config)\n            latency_all[module_name] = config['latency']\n            # Hack: For GEMM4\n            #if 'C' in module_name:\n            if 'drain' in module_name:\n                drain_latency = max(drain_latency, config['latency'])\n\n        # If early stopping is enabled, compare the module latency against the\n        # baseline latency and return immediately once the baseline is\n        # exceeded.\n        if early_stop != -1:\n            if config['latency'] > early_stop:\n                return config['latency']\n\n    #print(latency_all)\n    drain_last_tile_latency = drain_latency / drain_outer\n    latency = 0\n    for lat in latency_all:\n        if latency_all[lat] > latency:\n            latency = latency_all[lat]\n    #print(latency)\n    #print(drain_last_tile_latency)\n    latency += drain_last_tile_latency\n\n    return int(latency)\n\ndef unit_test_predict_design_latency(design_dir):\n    \"\"\" Unit test for design latency prediction\n\n    Parameters\n    ----------\n    design_dir: str\n        Design directory\n    \"\"\"\n    latency_info = extract_latency_info(design_dir)\n    latency = predict_design_latency(latency_info, 5)\n    print(\"latency: \", latency)\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser(description=\"==== AutoSA Latency Model ====\")\n    parser.add_argument('-d', required=True, help='design directory')\n\n    args = parser.parse_args()\n    unit_test_predict_design_latency(args.d)\n"
  },
  {
    "path": "autosa_scripts/module_group.py",
    "content": "#!/usr/bin/env python3\n\nimport sympy\nimport sys\nimport argparse\nimport re\nimport json\nimport numpy as np\n\n\ndef compose_final_file(output_f, prefix_content, module_defs, top_kernel):\n    with open(output_f, 'w') as f:\n        f.writelines(prefix_content)\n        for module_name in module_defs:\n            module_def = module_defs[module_name]\n            f.write('/* Module Definition */\\n')\n            f.writelines(module_def)\n            f.write('/* Module Definition */\\n\\n')\n\n        f.writelines(top_kernel['prefix_content'])\n        f.write(' ' * 4 + '/* FIFO Declaration */\\n')\n        for fifo_name in top_kernel['fifo_decls']:\n            fifo_decl = top_kernel['fifo_decls'][fifo_name]\n            f.writelines(fifo_decl)\n        f.write(' ' * 4 + '/* FIFO Declaration */\\n\\n')\n\n        for module_call in top_kernel['module_calls']:\n            f.write(' ' * 4 + '/* Module Call */\\n')\n            f.writelines(module_call['content'])\n            f.write(' ' * 4 + '/* Module Call */\\n\\n')\n        f.write('}\\n')\n        # Note: this closing brace is for the extern \"C\" block in the OpenCL kernel\n        f.write('}\\n')\n\n\ndef extract_fifos_from_module_call(module_call):\n    \"\"\" Extract the fifos passed as arguments in a module call.\n\n    Returns a list containing all the fifos in the module call.\n    \"\"\"\n    fifos = []\n    for line in module_call:\n        if line.find('/* fifo */') != -1:\n            m = re.search(r'\\*/ (.+),', line)\n            if m:\n                fifo = m.group(1)\n                fifos.append(fifo)\n            else:\n                m = re.search(r'\\*/ (.+)', line.strip())\n                if m:\n                    fifo = m.group(1)\n                    fifos.append(fifo)\n    return fifos\n\n\ndef compose_group_wrapper(\n        x_start,\n        y_start,\n        group_modules,\n        module_fifo_decls,\n        module_ext_fifos):\n    \"\"\" Compose the definition and the call of the group wrapper module\n\n    Returns a list [module_name, 
module_def, module_call]\n    \"\"\"\n    module_name = 'PE_module_group_wrapper_' + \\\n        str(x_start) + '_' + str(y_start)\n    # Build the module definition\n    module_def = []\n    # Head\n    module_def.append('void ' + module_name + '(\\n')\n    first = 1\n    for fifo in module_ext_fifos:\n        fifo_name = fifo['fifo_name']\n        fifo_type = fifo['fifo_type']\n        if not first:\n            module_def.append(',\\n')\n        module_def.append(' ' * 4 + fifo_type + ' &' + fifo_name)\n        first = 0\n    module_def.append(')\\n')\n    module_def.append('{\\n')\n    module_def.append('#pragma HLS INLINE OFF\\n')\n    module_def.append('#pragma HLS DATAFLOW\\n')\n\n    # fifo declarations\n    module_def.append(' ' * 4 + '/* FIFO Declaration */\\n')\n    for fifo_name in module_fifo_decls:\n        fifo_decl = module_fifo_decls[fifo_name]\n        module_def += fifo_decl\n    module_def.append(' ' * 4 + '/* FIFO Declaration */\\n\\n')\n\n    # module calls\n    for module_call in group_modules:\n        content = module_call['content']\n        module_def.append(' ' * 4 + '/* Module Call */\\n')\n        module_def += content\n        module_def.append(' ' * 4 + '/* Module Call */\\n\\n')\n\n    module_def.append('}\\n')\n\n    # Build the module call\n    module_call = []\n    module_call.append(' ' * 4 + module_name + '(\\n')\n    # Insert the external fifos\n    first = 1\n    for fifo in module_ext_fifos:\n        fifo_name = fifo['fifo_name']\n        if not first:\n            module_call.append(',\\n')\n        module_call.append(' ' * 8 + '/* fifo */ ' + fifo_name)\n        first = 0\n    module_call.append('\\n')\n    module_call.append(' ' * 4 + ');\\n')\n    return [module_name, module_def, module_call]\n\n\ndef create_group_wrapper(\n        x_start,\n        y_start,\n        group_modules,\n        module_defs,\n        top_kernel):\n    \"\"\" Create a wrapper module for all the modules in the current group\n\n    First figure 
out the internal fifos in this group.\n    Internal fifos are those that connect two modules inside the\n    group, i.e., both of their endpoints are in the group.\n    These internal fifos will be removed from the top_kernel['fifo_decls']\n    and moved inside the current wrapper module.\n    Next, the external fifos are placed in the argument list of the\n    wrapper module.\n    Append the definition of this wrapper module to module_defs.\n    Append a new module call of this wrapper module to the\n    top_kernel['module_calls']\n    and remove the module calls of the submodules in this group from top_kernel['module_calls'].\n\n    Args:\n      x_start: the start x index of PE module ids\n      y_start: the start y index of PE module ids\n      group_modules: list containing all module calls in the current group\n      module_defs: dict containing the module definitions\n      top_kernel: dict containing the top kernel content\n    \"\"\"\n    # print(x_start, y_start)\n    internal_fifos = []\n    external_fifos = []\n    for module in group_modules:\n        # print(module['module_name'])\n        fifos = extract_fifos_from_module_call(module['content'])\n        # print(fifos)\n        for fifo in fifos:\n            if fifo in external_fifos:\n                internal_fifos.append(fifo)\n                external_fifos.remove(fifo)\n            else:\n                external_fifos.append(fifo)\n\n    # Remove the internal fifos from the top kernel and place them inside the current\n    # wrapper.\n    module_fifo_decls = {}\n    for fifo in internal_fifos:\n        fifo_decl = top_kernel['fifo_decls'][fifo]\n        del top_kernel['fifo_decls'][fifo]\n        module_fifo_decls[fifo] = fifo_decl\n    module_ext_fifos = []\n    for fifo in external_fifos:\n        ext_fifo_item = {}\n        ext_fifo_item['fifo_name'] = fifo\n        # Extract the fifo type\n        fifo_decl = top_kernel['fifo_decls'][fifo]\n        first_line = fifo_decl[0]\n        m = re.search(r'\\*/ (.+?) 
fifo', first_line)\n        if m:\n            fifo_type = m.group(1)\n            ext_fifo_item['fifo_type'] = fifo_type\n        module_ext_fifos.append(ext_fifo_item)\n\n    # Compose the definition and call of the wrapper module\n    [module_name, module_def, module_call] = compose_group_wrapper(\n        x_start, y_start, group_modules, module_fifo_decls, module_ext_fifos)\n    # Insert the new definition into the module_defs\n    module_defs[module_name] = module_def\n\n    # Remove the module calls of this group from top_kernel['module_calls']\n    module_offset = len(top_kernel['module_calls'])\n    for module in group_modules:\n        module_offset = min(module_offset,\n                            top_kernel['module_calls'].index(module))\n        top_kernel['module_calls'].remove(module)\n    # Insert a new module call at the position 'module_offset'\n    module_call_item = {'module_name': module_name, 'content': module_call}\n    top_kernel['module_calls'].insert(module_offset, module_call_item)\n\n\ndef module_grouping(\n        output_f,\n        prefix_content,\n        module_defs,\n        top_kernel,\n        group_config):\n    \"\"\" Group the PE-level modules into wrapper modules according to group_config.\n\n    Args:\n      output_f: output kernel file\n      prefix_content: list containing the file content before the first module\n      definition\n      module_defs: dict containing the module definitions\n      top_kernel: dict containing the top kernel content\n      {\n        'prefix_content': list containing the file content before the first fifo declaration\n        'fifo_decls': dict containing the fifo declarations\n        'module_calls': list containing the module calls\n                        a module call is a dict containing fields:\n                        module_name, module_ids, content\n      }\n      group_config: dict containing the group size along each PE dimension ('x', 'y')\n    \"\"\"\n    # Check whether this file can be grouped\n    # Currently, PE-level module calls may only take module ids and fifos as arguments\n    group_legal = True\n    module_calls = 
top_kernel['module_calls']\n    for module_call in module_calls:\n        module_name = module_call['module_name']\n        if 'PE' in module_name or 'IO_L1' in module_name:\n            # This is a PE-level module\n            module_call_content = module_call['content']\n            for i in range(1, len(module_call_content)):\n                line = module_call_content[i]\n                m = re.search(r'/\\* (.+?) \\*/', line)\n                if m:\n                    arg_type = m.group(1)\n                    if arg_type != 'module id' and arg_type != 'fifo':\n                        group_legal = False\n                        break\n                    if arg_type == 'module id':\n                        # Extract the module id\n                        m = re.search(r'\\*/ (.+?),', line)\n                        if m:\n                            module_id = m.group(1)\n                            module_call['module_ids'].append(int(module_id))\n\n    if not group_legal:\n        print(\n            '[AutoSA] Error: Unable to group modules. 
PE-level modules contain non-fifo' +\n            ' or non-module-id arguments.\\n')\n        compose_final_file(output_f, prefix_content, module_defs, top_kernel)\n        return\n\n    # Extract the PE grid size\n    grid_x = 0\n    grid_y = 0\n    pe_dim = 0\n    for module_call in module_calls:\n        module_name = module_call['module_name']\n        if 'PE' in module_name and 'dummy' not in module_name:\n            pe_dim = len(module_call['module_ids'])\n            grid_x = max(module_call['module_ids'][0], grid_x)\n            grid_y = max(module_call['module_ids'][1], grid_y)\n    # TODO: At present, this script only works for 2D arrays\n    grid_x += 1\n    grid_y += 1\n    group_modules_list = []\n    for x_start in range(0, grid_x, group_config['x']):\n        for y_start in range(0, grid_y, group_config['y']):\n            # Collect all the PE-level modules in the current group\n            group_modules = []\n            for module_call in module_calls:\n                module_name = module_call['module_name']\n                if 'PE' in module_name and 'dummy' not in module_name:\n                    if module_call['module_ids'][0] in range(\n                            x_start,\n                            x_start +\n                            group_config['x']) and module_call['module_ids'][1] in range(\n                            y_start,\n                            y_start +\n                            group_config['y']):\n                        group_modules.append(module_call)\n                if 'IO_L1' in module_name:\n                    # Extract the PE module ids from the last fifo\n                    module_call_content = module_call['content']\n                    last_fifo_line = module_call_content[-2]\n                    m = re.search(r'\\*/ (.+)', last_fifo_line.strip())\n                    if m:\n                        last_fifo = m.group(1)\n                        last_fifo = last_fifo.split('_')\n                        
pe_x = int(last_fifo[-2])\n                        pe_y = int(last_fifo[-1])\n                        if pe_x in range(\n                                x_start,\n                                x_start +\n                                group_config['x']) and pe_y in range(\n                                y_start,\n                                y_start +\n                                group_config['y']):\n                            group_modules.append(module_call)\n\n            group_modules_list.append({'x_start': x_start, 'y_start': y_start,\n                                       'group_modules': group_modules.copy()})\n\n    for group in group_modules_list:\n        # Create group wrapper modules\n        create_group_wrapper(group['x_start'], group['y_start'],\n                             group['group_modules'], module_defs, top_kernel)\n\n    # Compose the final file\n    compose_final_file(output_f, prefix_content, module_defs, top_kernel)\n\n\ndef run(input_f, output_f, config, host='opencl'):\n    \"\"\" Module group\n\n    This function will group the PE-level modules (PE and IO_L1)\n    according to the grouping configuration file.\n    Specifically, given the grouping constraint {x, y}, we will group all PE-level\n    modules into blocks with dimensions x and y.\n    We will insert new wrapper functions to wrap the original modules in the\n    group.\n    FIFOs connecting these modules internally will be placed inside the wrapper.\n\n    Note: This script only supports:\n          - 2D array\n          - Xilinx OpenCL kernel\n\n    Args:\n      input_f: input kernel file\n      output_f: output kernel file\n      config: grouping configuration file\n      host: Xilinx host target\n    \"\"\"\n    # Load the group configuration file\n    group_config = {}\n    with open(config, 'r') as f:\n        group_config = json.load(f)\n\n    # Extract:\n    # - file content before the first module definition\n    # - module definitions\n    # - top 
kernel\n    #   - file content before the first fifo declaration\n    #   - fifo declarations\n    #   - module calls\n    #   - fifo content after the last module call\n    lines = []\n    with open(input_f, 'r') as f:\n        lines = f.readlines()\n\n    prefix_content = []\n    module_defs = {}\n    top_kernel = {'prefix_content': [], 'fifo_decls': {}, 'module_calls': []}\n    prefix_content_flag = 1\n    module_defs_flag = 0\n    top_kernel_flag = 0\n    module_def_add = False\n    module_def = []\n\n    top_kernel_prefix_content_flag = 1\n    top_kernel_fifo_decls_flag = 0\n    top_kernel_module_calls_flag = 0\n    top_kernel_fifo_decls_add = False\n    top_kernel_module_calls_add = False\n    module_call = []\n\n    for line in lines:\n        if prefix_content_flag:\n            if line.find('Module Definition') != -1:\n                prefix_content_flag = 0\n                module_defs_flag = 1\n            else:\n                prefix_content.append(line)\n        if module_defs_flag:\n            if line.find('extern \\\"C\\\"') != -1:\n                # TODO: only opencl is supported\n                module_defs_flag = 0\n                top_kernel_flag = 1\n            else:\n                if module_def_add:\n                    module_def.append(line)\n                    if (line.find('void')) != -1:\n                        m = re.search(r'void (.+?)\\(', line.strip())\n                        if m:\n                            module_name = m.group(1)\n                if line.find('/* Module Definition */') != -1:\n                    if module_def_add:\n                        module_def.pop(len(module_def) - 1)\n                        module_defs[module_name] = module_def.copy()\n                        module_def.clear()\n                    module_def_add = not module_def_add\n        if top_kernel_flag:\n            if top_kernel_prefix_content_flag:\n                if line.find('/* FIFO Declaration */') != -1:\n                    
top_kernel_prefix_content_flag = 0\n                    top_kernel_fifo_decls_flag = 1\n                else:\n                    top_kernel['prefix_content'].append(line)\n            if top_kernel_fifo_decls_flag:\n                if line.find('/* FIFO Declaration */') != -1:\n                    if not top_kernel_fifo_decls_add:\n                        top_kernel_fifo_decls_add = not top_kernel_fifo_decls_add\n                    else:\n                        top_kernel_fifo_decls_flag = 0\n                        top_kernel_module_calls_flag = 1\n                else:\n                    if line.find('hls::stream') != -1:\n                        m = re.search(r'> (.+?);', line)\n                        if m:\n                            fifo_name = m.group(1)\n                            top_kernel['fifo_decls'][fifo_name] = [line]\n                    if line.find('HLS STREAM') != -1:\n                        m = re.search(r'variable=(.+?) ', line)\n                        if m:\n                            fifo_name = m.group(1)\n                            top_kernel['fifo_decls'][fifo_name].append(line)\n            if top_kernel_module_calls_flag:\n                if line.find('/* Module Call */') != -1:\n                    if top_kernel_module_calls_add:\n                        module_call_object = {'module_name': module_name,\n                                              'module_ids': [],\n                                              'content': module_call.copy()}\n                        top_kernel['module_calls'].append(module_call_object)\n                        module_call.clear()\n                    top_kernel_module_calls_add = not top_kernel_module_calls_add\n                else:\n                    if top_kernel_module_calls_add:\n                        module_call.append(line)\n                        m = re.search(r'(.+?)\\(', line.strip())\n                        if m:\n                            module_name = m.group(1)\n\n    
# Group modules and print out to 'output_f'\n    module_grouping(\n        output_f,\n        prefix_content,\n        module_defs,\n        top_kernel,\n        group_config)\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser(\n        description='==== AutoSA Utils: Module Grouping ====')\n    parser.add_argument('-i', '--input', required=True, help='kernel file')\n    parser.add_argument(\n        '-o',\n        '--output',\n        required=True,\n        help='modified kernel file')\n    parser.add_argument(\n        '-c',\n        '--config',\n        required=True,\n        help='grouping configuration')\n    parser.add_argument(\n        '--host',\n        required=False,\n        help='Xilinx host target: hls|opencl',\n        default='opencl')\n\n    args = parser.parse_args()\n    run(args.input, args.output, args.config, args.host)\n"
  },
  {
    "path": "autosa_scripts/odyssey/RL_utils.py",
    "content": "import torch.nn as nn\nimport numpy as np\nimport random\nimport bisect\nimport copy\n\nimport torch\nimport torch.optim as optim\nimport torch.nn.functional as F\nfrom torch.distributions import Categorical\n\nimport utils\n\ndevice = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\nLR_ACTOR = 1e-3 # learning rate of the actor\nGAMMA = 0.9  # discount factor\nEPSILON = 2**(-12)\nCLIPPING_MODEL = 100\n\nclass RLEnv():\n    def __init__(self, search_task, cst, param_idx_map, idx_param_map, search_obj, dim_size, n_action_steps, action_size):\n        \"\"\"    \n        search_task: search task object\n        dim_size: dimension of the problem space, 3 for GEMM\n        n_action_steps: dimension of the action vector, 6 for GEMM\n        \"\"\"\n        self.search_task = search_task\n        self.cst = cst\n        self.param_idx_map = param_idx_map\n        self.idx_param_map = idx_param_map\n        self.search_obj = search_obj\n        self.dim_size = dim_size\n        self.n_action_steps = n_action_steps\n        self.action_size = action_size\n        action_bound, action_bottom = self.build_action_space()\n        self.action_bound = action_bound\n        self.action_bottom = action_bottom\n        \n        self.state = np.array([0.5]*n_action_steps) # (action vector)        \n        # Sum of adjusted rewards\n        self.adjusted_epoch_rewards = 0        \n        # Sum of raw rewards\n        self.epoch_rewards = 0\n        self.prev_epoch_rewards = 0        \n        self.sig = 1\n        # The minimal reward during the whole training process\n        self.min_reward = float(\"inf\")\n        self.epoch = 0\n        self.best_epoch_rewards = float(\"-inf\")        \n        # Keep track of best rewards during the training process\n        self.rewards_log = []\n        self.best_rewards_log = []\n\n    def reset(self):\n        \"\"\" Reset the state of the environment.\n\n        \"\"\"\n        # (i_t1, j_t1, k_t1, 
i_t2, j_t2, k_t2)\n        self.state = np.array([0]*6, dtype=float)\n        self.adjusted_epoch_rewards = 0\n        self.epoch_rewards = 0\n        self.sig = 1\n        self.sol = []\n        infos = {}\n\n        return self.state, infos\n\n    def get_state(self):\n        return self.state\n\n    def set_constraint(self, cst):\n        \"\"\" Set up the hardware constraint.\n        \"\"\"\n        self.cst = cst\n\n    def build_action_space(self):\n        action_bound = [self.search_task.workload[\"params\"][\"i\"],\n                        self.search_task.workload[\"params\"][\"j\"],\n                        self.search_task.workload[\"params\"][\"k\"],\n                        self.search_task.workload[\"params\"][\"i\"],\n                        self.search_task.workload[\"params\"][\"j\"],\n                        min(256 // self.search_task.dw, 64, self.search_task.workload[\"params\"][\"k\"])]\n        action_bottom = [1 for a in range(self.n_action_steps)]\n        return action_bound, action_bottom\n\n    def overuse_constraint(self, used_cst):\n        score = 0\n        if not used_cst:\n            # If constraint doesn't exist, return True to exclude this design\n            return True, score\n\n        overuse = False\n\n        if used_cst['BRAM18K'] > self.cst.hw_cst['BRAM18K']:\n            score += 0.5 * (used_cst['BRAM18K'] - self.cst.hw_cst['BRAM18K']) / self.cst.hw_cst['BRAM18K']\n            overuse = True\n        if used_cst['URAM'] > self.cst.hw_cst['URAM']:\n            score += 0.5 * (used_cst['URAM'] - self.cst.hw_cst['URAM']) / self.cst.hw_cst['URAM']\n            overuse = True\n        if used_cst['DSP'] > self.cst.hw_cst['DSP']:\n            score += 0.5 * (used_cst['DSP'] - self.cst.hw_cst['DSP']) / self.cst.hw_cst['DSP']\n            overuse = True\n\n        return overuse, score\n\n    def update_total_reward_constraint(self, constraint, reward):\n        \"\"\" Accumulate the resource and 
rewards in one epoch.\n        Currently we only consider rewards.\n        \"\"\"\n        self.epoch_rewards += reward\n\n    def get_reward(self, task_params):\n        \"\"\" Call the cost models to get the reward for the current solution.\n\n        Returns\n        -------\n        reward:\n            The adjusted reward for the current solution.\n        constraint:\n            The used constraint of the current solution.\n        reward_raw:\n            The unadjusted reward for the current solution.\n        reward_meta:\n            Additional information returned by search_task.evaluate for the current solution.\n        \"\"\"\n        reward, used_constraint, reward_meta = self.search_task.evaluate(task_params, self.search_obj)\n        #reward, constraint = self.search_task.evaluate(sol)\n        if reward is None or reward == 0:\n            return -1, None, -1, None\n        reward_raw = reward\n        self.min_reward = min(self.min_reward, reward_raw)\n        # Adjust the reward by subtracting the minimal reward found so far\n        # to stabilize the training.\n        reward -= self.min_reward\n        self.adjusted_epoch_rewards += reward\n\n        return reward, used_constraint, reward_raw, reward_meta\n\n    def norm_state(self, T):\n        \"\"\" Normalize the state to the range of [-1, 1] to stabilize the training.\n        The input state is in the range of [0, 1].\n        \"\"\"\n        T[:-1] = (T[:-1] - 0.5) * 2\n        return T\n\n    def update_mode_and_step(self):\n        pass\n\n    def update_reward_epoch(self):\n        if self.epoch_rewards > self.best_epoch_rewards:\n            self.best_epoch_rewards = self.epoch_rewards\n\n    def update_best_reward_list(self, succeed):\n        \"\"\" Advance the epoch counter and log the current and best epoch rewards.\n        \"\"\"\n        self.epoch += 1\n        # If the current epoch fails, we roll back to the reward in the last successful epoch.\n        self.epoch_rewards = self.epoch_rewards if succeed else self.prev_epoch_rewards\n        self.prev_epoch_rewards = self.epoch_rewards\n  
      self.rewards_log.append(self.epoch_rewards)\n        self.best_rewards_log.append(self.best_epoch_rewards)\n\n    def update_reward_impt(self, done):\n        impt = None\n        return impt\n\n    def convert_action_to_vals(self, action):\n        \"\"\" Convert the actions to the real tiling factors.\n        \"\"\"\n        action_norm = np.array([float(a) / self.action_size for a in action]).clip(0, 1)\n        # i_t1, j_t1, k_t1\n        for i in range(3):\n            action[i] = int(action[i] / self.action_size * self.action_bound[i])\n        # i_t2, j_t2, k_t2\n        for i in range(3, 6):\n            action[i] = int(action[i] / self.action_size * self.action_bound[i])\n\n        task_params = {}\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            task_params[param[\"name\"]] = action[self.param_idx_map[param[\"name\"]]]\n        for p, param in self.search_task.design.params_config[\"external\"].items():\n            task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n        task_params = self.search_task.adjust_params(task_params)\n        task_params = self.search_task.design.infer_params(task_params)\n\n        action = []\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            action.append(task_params[param[\"name\"]])\n\n        return action, action_norm\n\n    def step(self, action):\n        infos = {}\n        infos['succeed'] = 0\n        done = 0\n        action = action.cpu().numpy().flatten()\n        # Scale the action back to the real tiling factors\n        # Actions are in the levels of 1 to self.action_size.\n        # We will need to scale them back to the corresponding tiling factors.\n        action_val = [int(a) + 1 for a in action]\n        action_val, action_norm = self.convert_action_to_vals(action_val)\n        # Compose the solution\n        task_params = {}\n        
for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            task_params[param[\"name\"]] = action_val[self.param_idx_map[param[\"name\"]]]\n        for p, param in self.search_task.design.params_config[\"external\"].items():\n            task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n        task_params = self.search_task.adjust_params(task_params)\n        task_params = self.search_task.design.infer_params(task_params)\n        infos['sol'] = task_params\n        reward, used_constraint, reward_raw, reward_meta = self.get_reward(task_params)\n        infos['cst'] = used_constraint\n        infos['reward_meta'] = reward_meta\n        self.update_total_reward_constraint(used_constraint, reward_raw)\n        self.sol.append(copy.deepcopy(action_val))\n\n        # Penalize the solution that overuses the resource\n        overuse, overuse_score = self.overuse_constraint(used_constraint)\n        if overuse:\n            reward = (-self.adjusted_epoch_rewards + reward) * overuse_score\n            reward_raw = 0\n            done = 1\n        if reward == -1:\n            done = 1\n        infos['reward_raw'] = reward_raw\n\n        # Normalize the state to [-1,1]\n        self.state = self.norm_state(action_norm)\n        if not done:\n            infos[\"succeed\"] = 1\n            done = 1\n            self.update_reward_epoch()\n        infos[\"epoch_rewards\"] = self.epoch_rewards\n        if done:\n            self.update_best_reward_list(infos[\"succeed\"])\n        impt = self.update_reward_impt(done)\n        return self.state, reward, done, infos, self.sig, impt\n\nclass RLAgent():\n    def __init__(self, dim_size, n_action_steps, action_size, seed, batch, decay=0.95):\n        \"\"\"\n        Parameters\n        ----------\n        dim_size:\n            problem space dimensions, 3 for GEMM\n        n_action_steps:\n            dimension of one action, 6 for 
GEMM\n        action_size:\n            levels of each action step in one action\n        seed:\n            random seed\n        batch:\n            number of steps collected before the policy network is trained\n        decay:\n            decay factor\n        \"\"\"\n        # Attributes\n        self.dim_size = dim_size\n        self.action_size = action_size\n        self.n_action_steps = n_action_steps\n        # [n_action_steps]\n        self.state_div = [n_action_steps]\n        self.state_div = np.cumsum(self.state_div)\n        self.seed = random.seed(seed)\n        self.batch = batch\n\n        # Actor\n        self.actor = Actor(dim_size, n_action_steps, action_size, seed).to(device)\n        self.actor_optimizer = optim.Adam(self.actor.parameters(), lr=LR_ACTOR)\n        self.scheduler = optim.lr_scheduler.ReduceLROnPlateau(self.actor_optimizer, factor=0.9, min_lr=1e-6)\n        self.decay = decay\n        self.epoch = 0\n\n        # Book-keeping\n        self.saved_log_probs = []\n        self.rewards = []\n        self.baseline = None\n        self.lowest_reward = 0\n        self.best_epoch_reward = float(\"-Inf\")\n        self.has_succeeed_history = False\n        self.bad_counts = 0\n\n    def reset(self):\n        \"\"\" Reset the saved log-probabilities and rewards.\n        \"\"\"\n        self.saved_log_probs = []\n        self.rewards = []\n\n    def adjust_lr(self, ratio, min_lr=1e-8):\n        \"\"\" Scale the learning rate by ratio, bounded below by min_lr.\n        \"\"\"\n        for param_group in self.actor_optimizer.param_groups:\n            param_group['lr'] = max(min_lr, param_group['lr'] * ratio)\n\n    def act(self, state, infos, eps=0.0, temperature=1):\n        \"\"\" Perform one action given the current state.\n        \"\"\"\n        actions = state[0:self.state_div[0]]\n\n        # Convert the state to a PyTorch tensor\n        actions = torch.from_numpy(actions).type(torch.FloatTensor).to(device)\n\n        # Run the policy network\n        (p) = self.actor(actions, temperature=temperature)\n        #print(p)\n        m = Categorical(p)\n        #print(m)\n        action 
= m.sample()\n        \n        if random.random() < eps:\n            action2 = action.data + 1 if random.random() < 0.5 else action.data - 1\n            action2 = torch.from_numpy(np.array([action2]))\n            action2 = torch.clamp(action2, 0, p.size(1)-1)\n            return action2.data, m.log_prob(action2)\n        else:\n            return action.data, m.log_prob(action)\n\n    def step(self, state, actions, log_prob, reward, next_state, done, sig, impt, infos):\n        \"\"\" Update and train the policy network\n        \"\"\"\n        self.rewards.append(reward)\n        #print('returned', reward)\n        self.saved_log_probs.append(log_prob)        \n        self.epoch += 1        \n        if self.epoch == self.batch:\n            self.learn(GAMMA, impt, infos)\n\n    def impt_adj_reward(self, reward, impt):\n        \"\"\" Adjust the rewards\n        \"\"\"\n        if impt is not None:\n            reward[:len(impt)] = reward[:len(impt)] * impt\n        return reward\n\n    def learn(self, gamma, impt, infos):\n        \"\"\" Train the policy network\n        \"\"\"\n        rewards = np.array(self.rewards)\n        # Normalize the rewards\n        rewards = (rewards - rewards.mean()) / (rewards.std() + EPSILON)\n        # Adjust the rewards\n        rewards = self.impt_adj_reward(rewards, impt)\n        dis_rewards = []\n        R = 0\n        for r in rewards[::-1]:\n            R = r + gamma * R\n            dis_rewards.insert(0, R)\n        dis_rewards = np.array(dis_rewards)\n        dis_rewards = (dis_rewards - dis_rewards.mean()) / (dis_rewards.std() + EPSILON)\n\n        policy_loss = []\n        for log_prob, r in zip(self.saved_log_probs, dis_rewards):\n            policy_loss.append(-log_prob * r)\n        policy_loss = torch.cat(policy_loss).sum()\n\n        self.actor_optimizer.zero_grad()\n        policy_loss.backward()\n        torch.nn.utils.clip_grad_norm_(self.actor.parameters(), CLIPPING_MODEL)\n        
self.actor_optimizer.step()\n        self.reset()\n\nclass Actor(nn.Module):\n    def __init__(self, dim_size, n_action_steps, action_size, seed, h_size=128, hidden_dim=10):\n        \"\"\"\n        We implement a simple fully connected (FC) network as the policy network.\n\n        Parameters\n        ----------\n        dim_size:\n            The problem space dimension\n        n_action_steps:\n            Number of the action steps, 6 for GEMM (i_t1, j_t1, k_t1, i_t2, j_t2, k_t2)\n        action_size:\n            Number of levels of each action step, max(i, j, k) for GEMM\n        seed:\n            Random seed\n        h_size:\n            FC layer dimension\n        hidden_dim:\n            Output dimension of the encoder layer\n        \"\"\"\n        super().__init__()\n        self.seed = torch.manual_seed(seed)\n\n        # Encoder\n        self.encoder_action = nn.Linear(n_action_steps, hidden_dim)\n\n        # hidden_dim -> h_size\n        self.fc11 = nn.Linear(hidden_dim, h_size)\n        # h_size -> h_size\n        self.fc12 = nn.Linear(h_size, h_size)\n        # h_size -> h_size\n        self.fc13 = nn.Linear(h_size, h_size)\n\n        self.fc21 = nn.Linear(h_size, action_size)\n        self.fc22 = nn.Linear(h_size, action_size)\n        self.fc23 = nn.Linear(h_size, action_size)\n        self.fc24 = nn.Linear(h_size, action_size)\n        self.fc25 = nn.Linear(h_size, action_size)\n        self.fc26 = nn.Linear(h_size, action_size)\n\n        self.output1 = nn.Linear(action_size, action_size)\n        self.output2 = nn.Linear(action_size, action_size)\n        self.output3 = nn.Linear(action_size, action_size)\n        self.output4 = nn.Linear(action_size, action_size)\n        self.output5 = nn.Linear(action_size, action_size)\n        self.output6 = nn.Linear(action_size, action_size)\n\n        self.decoder = [self.fc21, self.fc22, self.fc23, self.fc24, self.fc25, self.fc26]\n        self.n_action_steps = n_action_steps\n\n    def forward(self, action_steps, temperature=1):\n        \"\"\"\n        Network forward 
inference.\n\n        Parameters\n        ----------\n        action_steps:\n            The normalized action vector of length n_action_steps\n        temperature:\n            The softmax temperature\n\n        Returns\n        -------\n        A (6, action_size) tensor holding the action probabilities of each step\n        \"\"\"\n        x1 = self.encoder_action(action_steps)\n        x1 = x1.unsqueeze(0)\n\n        x = x1\n        x = F.relu(self.fc11(x))\n        x = F.relu(self.fc12(x))\n        x = F.relu(self.fc13(x))\n\n        # i1\n        decoder = self.decoder[0]\n        x1 = F.relu(decoder(x))\n        x1 = self.output1(x1)\n        x1 = F.softmax(x1/temperature, dim=1)\n\n        # j1\n        decoder = self.decoder[1]\n        x2 = F.relu(decoder(x))\n        x2 = self.output2(x2)\n        x2 = F.softmax(x2/temperature, dim=1)\n\n        # k1\n        decoder = self.decoder[2]\n        x3 = F.relu(decoder(x))\n        x3 = self.output3(x3)\n        x3 = F.softmax(x3/temperature, dim=1)\n\n        # i2\n        decoder = self.decoder[3]\n        x4 = F.relu(decoder(x))\n        x4 = self.output4(x4)\n        x4 = F.softmax(x4/temperature, dim=1)\n\n        # j2\n        decoder = self.decoder[4]\n        x5 = F.relu(decoder(x))\n        x5 = self.output5(x5)\n        x5 = F.softmax(x5/temperature, dim=1)\n\n        # k2\n        decoder = self.decoder[5]\n        x6 = F.relu(decoder(x))\n        x6 = self.output6(x6)\n        x6 = F.softmax(x6/temperature, dim=1)\n\n        # Return the concatenated (x1, x2, x3, x4, x5, x6)\n        x = torch.cat((x1, x2, x3, x4, x5, x6), dim=0)\n\n        return (x)"
  },
  {
    "path": "autosa_scripts/odyssey/analyze.py",
    "content": "import seaborn as sns\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport csv\nimport pandas as pd\nimport os\nimport re\nimport scipy\n\nfolder = \"resnet50_array24\"\n\ndesign_info = {}\nwith open(f\"{folder}/history.log\") as f:\n    lines = f.readlines()\n    design_idx = 0\n    design_lines = []\n    start_end = []\n    for line_idx in range(len(lines)):\n        line = lines[line_idx]\n        if line.find(f\"<record{design_idx}><begin>\") != -1:\n            start_end.append(line_idx)\n        if line.find(\"arch sol\") != -1:\n            start_end.append(line_idx)\n        if line.find(f\"<record{design_idx}><end>\") != -1:\n            start_end.append(line_idx)\n            design_lines.append(start_end)\n            start_end = []\n            design_idx += 1    \n    layer_infos = []\n    layer_info = {}\n    for design_idx in range(len(design_lines)):\n        for line_idx in range(design_lines[design_idx][0], design_lines[design_idx][1]):\n            line = lines[line_idx]            \n            if line.find(\"latency\") != -1 and 'latency' not in layer_info:\n                layer_info[\"latency\"] = float(line.split(\":\")[-1].strip().strip(','))                \n            if line.find(\"DSP efficiency\") != -1:\n                layer_info[\"DSP_eff\"] = float(line.split(\":\")[-1].strip().strip(','))\n            if line.find(\"CTC(FLOP/byte)\") != -1:\n                layer_info[\"CTC\"] = float(line.split(\":\")[-1].strip().strip(','))\n            if line.find(\"design\") != -1:\n                layer_info[\"design\"] = line.split(\":\")[-1].strip().strip(',')\n                dataflow_idx = layer_info[\"design\"][6:]                \n                layer_infos.append(layer_info)\n                layer_info = {}    \n    design_info[\"array_infos\"] = layer_infos\n\n    # Extract the last array\n    layer_infos = []\n    layer_info = {}\n    #print(design_lines[-1][1], design_lines[-1][2])\n    for line_idx in 
range(design_lines[-1][1], design_lines[-1][2]):\n        line = lines[line_idx]\n        if line.find(\"\\'sol\\':\") != -1:\n            layer_infos.append(layer_info)\n            layer_info = {}\n        if line.find(\"\\'latency\\':\") != -1 and 'latency' not in layer_info:\n            layer_info[\"latency\"] = float(line.split(\":\")[-1].strip().strip(','))            \n        if line.find(\"CTC\") != -1:\n            layer_info[\"CTC\"] = float(line.split(\":\")[-1].strip().strip(','))\n        if line.find(\"DSP_eff\") != -1:\n            layer_info[\"DSP_eff\"] = float(line.split(\":\")[-1].strip().strip(','))            \n    design_info[\"last_array_info\"] = layer_infos\n\n# Plot\ndict_data = {\"Latency\": [], \"DSP Eff\": [], \"CTC\": [], \"Layer\": []}\nlayer_idx = 0\nfor idx in range(len(design_info[\"array_infos\"]) - 1):\n    layer_info = design_info[\"array_infos\"][idx]\n    dict_data[\"Latency\"].append(layer_info[\"latency\"])\n    dict_data[\"DSP Eff\"].append(layer_info[\"DSP_eff\"])\n    dict_data[\"CTC\"].append(layer_info[\"CTC\"])\n    dict_data[\"Layer\"].append(layer_idx + 1)\n    layer_idx += 1\nfor idx in range(len(design_info[\"last_array_info\"])):\n    layer_info = design_info[\"last_array_info\"][idx]\n    #print(layer_info)\n    dict_data[\"Latency\"].append(layer_info[\"latency\"])\n    dict_data[\"DSP Eff\"].append(layer_info[\"DSP_eff\"])\n    dict_data[\"CTC\"].append(layer_info[\"CTC\"])\n    dict_data[\"Layer\"].append(layer_idx + 1)\n    layer_idx += 1\nprint(\"max CTC: \", max(dict_data[\"CTC\"]))\nprint(\"max latency: \", max(dict_data[\"Latency\"]))\n\ndf = pd.DataFrame.from_dict(dict_data)\nsns.set_theme()\nsns.set(rc={'figure.figsize':(20,5)})\n\n'''\ng = sns.lineplot(\n    data=df,\n    x=\"Layer\", y=\"Latency\", markers=True\n)\ng.set(xticks=df.Layer.values)\nplt.xlabel(\"Layer\")\nplt.ylabel(\"Latency\")\ng.figure.savefig(\"network_latency_cmp\")\n\ng = sns.lineplot(\n    data=df,\n    x=\"Layer\", y=\"DSP 
Eff\", markers=True\n)\ng.set(xticks=df.Layer.values)\nplt.xlabel(\"Layer\")\nplt.ylabel(\"DSP Eff\")\nplt.ylim(0, 1.1)\ng.figure.savefig(\"network_dsp_eff_cmp\")\n'''\n\ng = sns.lineplot(\n    data=df,\n    x=\"Layer\", y=\"CTC\", markers=True\n)\ng.set(xticks=df.Layer.values)\nplt.xlabel(\"Layer\")\nplt.ylabel(\"CTC\")\nplt.ylim(0, 250)\ng.figure.savefig(\"network_ctc_cmp\")\n"
  },
  {
    "path": "autosa_scripts/odyssey/clean_up.sh",
    "content": "#!/bin/bash\n\nrm -rf db/*\nrm -rf opentuner.db\nrm -rf outdir/*\nrm -rf __pycache__\nrm -rf tmp/*\n"
  },
  {
    "path": "autosa_scripts/odyssey/cst/hw_cst.json",
    "content": "{\n  \"BRAM18K\": {\n    \"total\": 5376,\n    \"ratio\": 0.7\n  },\n  \"DSP\": {\n    \"total\": 12288,\n    \"ratio\": 0.7\n  },\n  \"FF\": {\n    \"total\": 3456000,\n    \"ratio\": 0.7\n  },\n  \"LUT\": {\n    \"total\": 1728000,\n    \"ratio\": 0.7\n  },\n  \"URAM\": {\n    \"total\": 1280,\n    \"ratio\": 0.7\n  }\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/cst/single_test.json",
    "content": "{\n  \"BRAM18K\": {\n    \"total\": 300,\n    \"ratio\": 1.0\n  },\n  \"DSP\": {\n    \"total\": 800,\n    \"ratio\": 1.0\n  },\n  \"FF\": {\n    \"total\": 3456000,\n    \"ratio\": 0.7\n  },\n  \"LUT\": {\n    \"total\": 1728000,\n    \"ratio\": 0.7\n  },\n  \"URAM\": {\n    \"total\": 1280,\n    \"ratio\": 0.7\n  }\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/cst/u250.json",
    "content": "{\n  \"BRAM18K\": {\n    \"total\": 5376,\n    \"ratio\": 0.7\n  },\n  \"DSP\": {\n    \"total\": 12288,\n    \"ratio\": 0.7\n  },\n  \"FF\": {\n    \"total\": 3456000,\n    \"ratio\": 0.7\n  },\n  \"LUT\": {\n    \"total\": 1728000,\n    \"ratio\": 0.7\n  },\n  \"URAM\": {\n    \"total\": 1280,\n    \"ratio\": 0.7\n  }\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/cst/vu9p.json",
    "content": "{\n  \"BRAM18K\": {\n    \"total\": 4318,\n    \"ratio\": 0.7\n  },\n  \"DSP\": {\n    \"total\": 6840,\n    \"ratio\": 0.7\n  },\n  \"URAM\": {\n    \"total\": 960,\n    \"ratio\": 0.7\n  }\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/design.py",
    "content": "import numpy as np\nimport json\nimport sys\nimport os\nfrom numpy import ceil, floor\n\nclass Design(object):\n    def __init__(self, name):\n        self.name = name # design name        \n        self.est_resource_func = None\n        self.est_latency_func = None\n        self.est_activity_func = None\n        self.infer_params_func = None\n        self.random_sampling_func = None\n        self.bound_check_func = None\n        self.params_config = None      \n        self.desp = None  \n\n    def print_resource_est_func(self, f, desp):\n        f.write(\"def est_resource(params):\\n\")\n        # Load parameters\n        f.write(\"\\t\")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(p[\"name\"])\n            is_first = False\n        f.write(\" = \")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(f'params[\\\"{p[\"name\"]}\\\"]')\n            is_first = False\n        f.write(\"\\n\\n\")\n\n        f.write(\"\\t# DSP\\n\")\n        f.write(f\"\\tDSP = {desp['compute']['PE']['num']} * \")\n        f.write(f\"{desp['compute']['PE']['unroll_factor']} * \")\n        if desp[\"compute\"][\"PE\"][\"ele_type\"] == \"float\":\n            f.write(f\"5\\n\")\n        else:\n            raise RuntimeError(f\"Unsupported data type {desp['compute']['PE']['ele_type']} in resource estimation\")        \n        f.write(\"\\n\")\n\n        # Print function est_BRAM18K\n        f.write(\"\\t# BRAM18K\\n\")\n        f.write(\"\\tdef est_BRAM18K(ele_size, ele_num, pack):\\n\")\n        #f.write(f\"\\t\\treturn ceil(ele_size*8*pack / 18) * ceil(ele_num/pack/1024)\\n\\n\")\n        f.write(f\"\\t\\treturn ceil(ele_size*8*pack / 36) * ceil(ele_num/pack/512)\\n\\n\")\n\n        f.write(f\"\\tres_meta = {{}}\\n\")\n        # Check if drain module can be merged.\n        # 
Note: It should be supported in the codegen of AutoSA. However, currently, \n        # we move it here in the tuner.\n        mem_meta_info = {}\n        out_module = {}\n        out_drain_module = {}\n        for module in desp[\"memory\"]:\n            module_mem = desp[\"memory\"][module]\n            if module.endswith('_out'):\n                item = {'buf_size': module_mem['buf_size'], \n                        'num': module_mem['num']}\n                if module.find('drain') != -1:\n                    item['merged'] = 0\n                    out_drain_module[module_mem['array']] = item\n                else:                    \n                    if module_mem['array'] not in out_module:\n                        out_module[module_mem['array']] = [item]\n                    else:\n                        out_module[module_mem['array']].append(item)\n        for array in out_drain_module:\n            if array in out_module:\n                for m in out_module[array]:                \n                    if m['buf_size'] == out_drain_module[array]['buf_size'] and \\\n                       m['num'] == out_drain_module[array]['num']:\n                       out_drain_module[array]['merged'] = 1\n\n        for module in desp[\"memory\"]:\n            module_mem = desp[\"memory\"][module]\n            if module.find('drain') != -1 and out_drain_module[module_mem['array']]['merged'] == 1:\n                continue\n            f.write(f\"\\t{module}_unit_memory = est_BRAM18K({module_mem['ele_size']}, \")\n            f.write(f\"{module_mem['buf_size']}, \")\n            if \"data_pack_factor_inter\" in module_mem:\n                f.write(f\"{module_mem['data_pack_factor_inter']})\\n\")\n            else:\n                f.write(f\"1)\\n\")\n            f.write(f\"\\tres_meta[\\\"{module}\\\"] = {{\\\"ele_size\\\": {module_mem['ele_size']}, \\\"buf_size\\\": {module_mem['buf_size']}, \\\"data_pack_factor\\\": 1, \\\"num\\\": {module_mem['num']}}}\\n\")\n      
      if module_mem['double_buffer']:\n                f.write(f\"\\tres_meta[\\\"{module}\\\"][\\\"num\\\"] *= 2\\n\")\n            if \"data_pack_factor\" in module_mem:\n                f.write(f\"\\tres_meta[\\\"{module}\\\"][\\\"data_pack_factor\\\"] = {module_mem['data_pack_factor_inter']}\\n\")\n        #f.write(\"\\tprint(A_IO_L1_in_unit_memory)\\n\")\n        #f.write(\"\\tprint(A_IO_L2_in_unit_memory)\\n\")\n        #f.write(\"\\tprint(B_IO_L2_in_unit_memory)\\n\")        \n        #f.write(\"\\tprint(PE_unit_memory)\\n\")\n        #f.write(\"\\tprint(C_1_IO_L2_out_unit_memory)\\n\")        \n        #f.write(\"\\tprint(C_drain_IO_L1_out_unit_memory)\\n\")\n\n        f.write(\"\\tBRAM18K = \")\n        is_first = True\n        for module in desp[\"memory\"]:\n            module_mem = desp[\"memory\"][module]\n            if module.find('drain') != -1 and out_drain_module[module_mem['array']]['merged'] == 1:\n                continue\n            if not is_first:\n                f.write(\" + \")            \n            f.write(f\"{module}_unit_memory\")\n            if module_mem[\"double_buffer\"]:\n                f.write(f\" * 2\")\n            else:\n                f.write(f\" * 1\")\n            f.write(f\" * {module_mem['num']}\")            \n            is_first = False            \n        f.write(\"\\n\\n\")\n\n        f.write(\"\\t# URAM\\n\")\n        f.write(\"\\tURAM = 0\\n\\n\")\n\n        f.write(\"\\tres = {\\\"DSP\\\": DSP, \\\"BRAM18K\\\": BRAM18K, \\\"URAM\\\": URAM}\\n\")\n        for module in desp[\"memory\"]:\n            module_mem = desp[\"memory\"][module]\n            if module.find('drain') != -1 and out_drain_module[module_mem['array']]['merged'] == 1:\n                continue\n            f.write(f\"\\tres['{module}_unit_memory'] = {module}_unit_memory\\n\")\n\n        f.write(\"\\n\\treturn res, res_meta\\n\")\n        f.write(\"\\n\")\n\n    def print_latency_est_func(self, f, desp):\n        f.write(\"def 
est_latency(params):\\n\")\n        # Load parameters\n        f.write(\"\\t\")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(p[\"name\"])\n            is_first = False\n        f.write(\" = \")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(f'params[\\\"{p[\"name\"]}\\\"]')\n            is_first = False\n        f.write(\"\\n\\n\")\n\n        def extract_latency_expr(lat, info):\n            ret = \"\"\n            if lat[\"type\"] == \"block\":\n                info[\"has_for_child\"] = 0\n                no_for_child = True\n                is_first = True\n                ret += \"(\"\n                for child in lat[\"child\"]:\n                    if not is_first:\n                        ret += \" + \"                    \n                    ret += extract_latency_expr(child, info)                    \n                    if info[\"has_for_child\"] == 1:\n                        no_for_child = False\n                    is_first = False\n                ret += \")\"\n                if no_for_child:\n                    ret = \"1\"\n            elif lat[\"type\"] == \"for\":                \n                child = lat[\"child\"]\n                expr = extract_latency_expr(child, info)                \n                if info[\"valid\"]:\n                    ret = lat[\"bounds\"][1] + \" * \" + expr\n                else:\n                    ret = expr\n                info[\"has_for_child\"] = 1\n            elif lat[\"type\"] == \"mark\":      \n                if info[\"under_mark\"] and lat[\"content\"] == info[\"under_mark\"]:\n                    info[\"valid\"] = True\n                if lat[\"content\"] == \"simd\":\n                    if info[\"valid\"]:\n                        ret = \"1\"\n                    else:\n                        ret = 
\"0\"\n                else:\n                    child = lat[\"child\"]\n                    ret = extract_latency_expr(child, info)\n                if info[\"under_mark\"] and lat[\"content\"] == info[\"under_mark\"]:\n                    info[\"valid\"] = False\n            elif lat[\"type\"] == \"user\":\n                user_expr = lat[\"child\"][\"user_expr\"]\n                if 'inter_intra' in user_expr or 'intra_inter' in user_expr:                    \n                    if user_expr[:-2].split(\".\")[-1] == \"1\":\n                        double_buffer = 1\n                    else:\n                        double_buffer = 0                    \n                    # Plug in submodule latency\n                    if f\"{info['name']}_inter\" in info[\"modules\"]:\n                        inter_expr = info[\"modules\"][f\"{info['name']}_inter\"]\n                    else:\n                        inter_expr = None\n                    if f\"{info['name']}_intra\" in info[\"modules\"]:\n                        intra_expr = info[\"modules\"][f\"{info['name']}_intra\"]\n                    else:\n                        intra_expr = None\n\n                    if inter_expr and intra_expr:\n                        if info[\"in\"] == 1 or info[\"in\"] == 0:\n                            ret = inter_expr\n                        else:\n                            if double_buffer:\n                                ret = f\"max({inter_expr}, {intra_expr})\"\n                            else:\n                                ret = f\"({inter_expr} + {intra_expr})\"\n                        info[\"has_for_child\"] = 1\n                    else:                        \n                        ret = \"1\"                        \n                    if not info[\"valid\"]:\n                        ret = \"0\"\n                elif \"inter_trans\" in user_expr:\n                    # Plug in submodule latency\n                    if f\"{info['name']}_inter\" in 
info[\"modules\"]:\n                        ret = info[\"modules\"][f\"{info['name']}_inter\"]\n                    else:\n                        ret = \"1\"\n                    if not info[\"valid\"]:\n                        ret = \"0\"\n                elif \"intra_trans\" in user_expr:\n                    # Plug in submodule latency                    \n                    if f\"{info['name']}_intra\" in info[\"modules\"]:\n                        ret = info[\"modules\"][f\"{info['name']}_intra\"]\n                    else:\n                        ret = \"1\"\n                    if not info[\"valid\"]:\n                        ret = \"0\"\n                else:\n                    ret = \"1\"\n            elif lat[\"type\"] == \"if\":\n                # Only examine the first child\n                child = lat[\"child\"][0]\n                ret = extract_latency_expr(child, info)\n            elif lat[\"type\"] == \"array_tile\":      \n                if info[\"module_attr\"][\"to_dram\"] == 1:\n                    if info[\"module_attr\"][\"serialize\"] == 0:\n                        # Consider the DRAM latency here.\n                        ret = \"(\" + f\"{lat['size']}/{lat['last_dim']}*(20+{lat['last_dim']}/(512/8/{lat['ele_size']}))\" + \")\"\n                    else:\n                        ret = \"(\" + lat[\"size\"] + \"/\" + f\"min({lat['data_pack_factor']}, 512/8/{lat['ele_size']})\" + \")\"\n                else:\n                    ret = \"(\" + lat[\"size\"] + \"/\" + lat[\"data_pack_factor\"] + \")\"                    \n            else:\n                raise RuntimeError(f\"Unsupported latency node type {lat['type']}\")\n\n            return ret\n\n        # Check if drain module can be omitted\n        # Note: It should be supported in the codegen of AutoSA. However, currently,\n        # we move it here in the tuner.        
\n        out_module = {}\n        out_drain_module = {}\n        for module in desp[\"memory\"]:\n            module_mem = desp[\"memory\"][module]\n            if module.endswith('_out'):\n                item = {'buf_size': module_mem['buf_size'], \n                        'num': module_mem['num']}\n                if module.find('drain') != -1:\n                    item['merged'] = 0\n                    out_drain_module[module_mem['array']] = item\n                else:                    \n                    if module_mem['array'] not in out_module:\n                        out_module[module_mem['array']] = [item]\n                    else:\n                        out_module[module_mem['array']].append(item)\n        for array in out_drain_module:\n            if array in out_module:\n                for m in out_module[array]:                \n                    if m['buf_size'] == out_drain_module[array]['buf_size'] and \\\n                       m['num'] == out_drain_module[array]['num']:\n                       out_drain_module[array]['merged'] = 1\n\n        # Latency prologue\n        latency_prologue_items = []\n        info = {\"has_for_child\": 0, \"name\": None, \"modules\": {}}\n        for i in range(2):\n            for module in desp[\"latency\"]:\n                if desp[\"attr\"][module][\"in\"] != 1:\n                    continue\n                if \"inter\" in module or \"intra\" in module:                    \n                    # Keep all the latency AST under the mark.\n                    info[\"valid\"] = True\n                    info[\"under_mark\"] = None\n                    info[\"in\"] = 1\n                else:\n                    # Only keep the latency AST under the mark.\n                    info[\"valid\"] = False\n                    info[\"under_mark\"] = \"array\"\n                    info[\"in\"] = 1\n                module_lat = desp[\"latency\"][module]  \n                info[\"name\"] = module     \n             
   info[\"module_attr\"] = desp[\"attr\"][module]\n                info[\"modules\"][module] = extract_latency_expr(module_lat, info)\n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue\n            if module.find('drain') != -1 and out_drain_module[module_mem['array']]['merged'] == 1:\n                continue\n            f.write(f\"\\t{module}_single_latency = \")                        \n            f.write(info[\"modules\"][module])\n            f.write(f\"\\n\")      \n            latency_prologue_items.append(f\"{module}_single_latency\")\n        f.write(\"\\tlatency_prologue = max(\")\n        is_first = True\n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue \n            if module.find('drain') != -1 and out_drain_module[module_mem['array']]['merged'] == 1:\n                continue           \n            if not is_first:\n                f.write(\", \")\n            f.write(f\"{module}_single_latency\")\n            is_first = False\n        f.write(\")\\n\\n\")\n\n        # Latency epilogue\n        latency_epilogue_items = []\n        info = {\"has_for_child\": 0, \"name\": None, \"modules\": {}}\n        for i in range(2):\n            for module in desp[\"latency\"]:\n                if desp[\"attr\"][module][\"in\"] != 0:\n                    continue\n                if \"inter\" in module or \"intra\" in module:\n                    info[\"valid\"] = True\n                    info[\"under_mark\"] = None\n                    info[\"in\"] = 0\n                else:\n                    info[\"valid\"] = False\n                    info[\"under_mark\"] = \"array\"\n                    info[\"in\"] = 0\n                module_lat = desp[\"latency\"][module]  \n                info[\"name\"] = module                \n                info[\"module_attr\"] = desp[\"attr\"][module]\n                
info[\"modules\"][module] = extract_latency_expr(module_lat, info)\n        for module in info[\"modules\"]:            \n            if \"inter\" in module or \"intra\" in module:\n                continue\n            if module.find('drain') != -1:\n                array_name = module[:module.find(\"_drain_IO\")]                \n                if out_drain_module[array_name]['merged'] == 1:\n                    continue\n            f.write(f\"\\t{module}_single_latency = \")                        \n            f.write(info[\"modules\"][module])\n            latency_epilogue_items.append(f\"{module}_single_latency\")\n            f.write(f\"\\n\")        \n        cnt = 0\n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue    \n            if module.find('drain') != -1:\n                array_name = module[:module.find(\"_drain_IO\")]                \n                if out_drain_module[array_name]['merged'] == 1:\n                    continue                 \n            cnt += 1\n        if cnt == 1:\n            f.write(\"\\tlatency_epilogue = \")\n        else:\n            f.write(\"\\tlatency_epilogue = max(\")\n        is_first = True\n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue    \n            if module.find('drain') != -1:\n                array_name = module[:module.find(\"_drain_IO\")]                \n                if out_drain_module[array_name]['merged'] == 1:\n                    continue                    \n            if not is_first:\n                f.write(\", \")\n            f.write(f\"{module}_single_latency\")\n            is_first = False\n        if cnt == 1:            \n            f.write(\"\\n\\n\")\n        else:\n            f.write(\")\\n\\n\")\n\n        # Latency main\n        latency_main_items = []\n        info = {\"has_for_child\": 0, \"name\": None, 
\"modules\": {}}\n        for i in range(2):\n            # Run second time to fill in the incomplete expression            \n            for module in desp[\"latency\"]:\n                module_lat = desp[\"latency\"][module]  \n                info[\"name\"] = module\n                info[\"valid\"] = True\n                info[\"under_mark\"] = None\n                info[\"in\"] = -1\n                info[\"module_attr\"] = desp[\"attr\"][module]\n                info[\"modules\"][module] = extract_latency_expr(module_lat, info)            \n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue\n            if module.find('drain') != -1:\n                array_name = module[:module.find(\"_drain_IO\")]                \n                if out_drain_module[array_name]['merged'] == 1:\n                    continue                  \n            f.write(f\"\\t{module}_latency = \")                        \n            f.write(info[\"modules\"][module])\n            f.write(f\"\\n\")        \n            latency_main_items.append(f\"{module}_latency\")\n        f.write(\"\\tlatency_main = max(\")\n        is_first = True\n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue   \n            if module.find('drain') != -1:\n                array_name = module[:module.find(\"_drain_IO\")]                \n                if out_drain_module[array_name]['merged'] == 1:\n                    continue                      \n            if not is_first:\n                f.write(\", \")\n            f.write(f\"{module}_latency\")\n            is_first = False\n        f.write(\")\\n\\n\")\n\n        #f.write(\"\\tprint(latency_prologue, latency_main, latency_epilogue)\\n\\n\")\n\n        f.write(\"\\tlatency = latency_prologue + latency_main + latency_epilogue\\n\\n\")\n        \n        f.write(\"\\t# Meta information, used for 
conv fusion only\\n\")\n        f.write(\"\\tlatency_meta = {\\\"latency_prologue\\\": {}, \\\"latency_main\\\": {}, \\\"latency_epilogue\\\": {}}\\n\")\n        # Prologue        \n        for item in latency_prologue_items:            \n            f.write(f\"\\tlatency_meta[\\\"latency_prologue\\\"][\\\"{item}\\\"] = {item}\\n\")\n        # Epilogue\n        for item in latency_epilogue_items:            \n            f.write(f\"\\tlatency_meta[\\\"latency_epilogue\\\"][\\\"{item}\\\"] = {item}\\n\")\n        # Main\n        for item in latency_main_items:            \n            f.write(f\"\\tlatency_meta[\\\"latency_main\\\"][\\\"{item}\\\"] = {item}\\n\")\n\n        f.write(\"\\treturn latency, latency_meta\\n\")\n        f.write(\"\\n\")\n\n    def print_activity_est_func(self, f, desp):\n        f.write(\"def est_activity(params):\\n\")\n        # Load parameters\n        f.write(\"\\t\")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(p[\"name\"])\n            is_first = False\n        f.write(\" = \")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(f'params[\\\"{p[\"name\"]}\\\"]')\n            is_first = False\n        f.write(\"\\n\\n\")\n\n        def extract_stmt_call_num_expr(lat, info):\n            ret = \"\"\n            if lat[\"type\"] == \"block\":\n                info[\"has_for_child\"] = 0\n                no_for_child = True\n                is_first = True\n                ret += \"(\"\n                for child in lat[\"child\"]:\n                    if not is_first:\n                        ret += \" + \"                    \n                    ret += extract_stmt_call_num_expr(child, info)                    \n                    if info[\"has_for_child\"] == 1:\n                        no_for_child = False\n                    is_first = 
False\n                ret += \")\"\n                if no_for_child:\n                    ret = \"1\"\n            elif lat[\"type\"] == \"for\":                \n                child = lat[\"child\"]\n                expr = extract_stmt_call_num_expr(child, info)                \n                #if not info[\"ignore_inter\"]:\n                #    if info[\"valid\"]:\n                #        ret = lat[\"bounds\"][1] + \" * \" + expr\n                #    else:\n                #        ret = expr\n                #else:\n                #ret = expr\n                ret = lat[\"bounds\"][1] + \" * \" + expr\n                info[\"has_for_child\"] = 1\n            elif lat[\"type\"] == \"mark\":      \n                if info[\"under_mark\"] and lat[\"content\"] == info[\"under_mark\"]:\n                    info[\"valid\"] = True\n                if lat[\"content\"] == \"simd\":\n                    if info[\"valid\"]:\n                        ret = \"1\"\n                    else:\n                        ret = \"0\"\n                else:\n                    child = lat[\"child\"]\n                    ret = extract_stmt_call_num_expr(child, info)\n                if info[\"under_mark\"] and lat[\"content\"] == info[\"under_mark\"]:\n                    info[\"valid\"] = False\n            elif lat[\"type\"] == \"user\":\n                user_expr = lat[\"child\"][\"user_expr\"]\n                if 'inter_intra' in user_expr or 'intra_inter' in user_expr:                    \n                    if user_expr[:-2].split(\".\")[-1] == \"1\":\n                        double_buffer = 1\n                    else:\n                        double_buffer = 0                    \n                    # Plug in submodule latency\n                    if f\"{info['name']}_inter\" in info[\"modules\"]:\n                        inter_expr = info[\"modules\"][f\"{info['name']}_inter\"]\n                    else:\n                        inter_expr = None\n                    
if f\"{info['name']}_intra\" in info[\"modules\"]:\n                        intra_expr = info[\"modules\"][f\"{info['name']}_intra\"]\n                    else:\n                        intra_expr = None\n\n                    if inter_expr and intra_expr:\n                        if info[\"in\"] == 1 or info[\"in\"] == 0:\n                            ret = inter_expr\n                        else:\n                            if info['target'] in [\"on_chip_transfer_io\"]:\n                                ret = f\"({inter_expr})\"\n                            elif info['target'] in [\"on_chip_transfer_pe\"]:\n                                ret = f\"({intra_expr})\"\n                            else:\n                                ret = f\"({inter_expr} + {intra_expr})\"                            \n                        info[\"has_for_child\"] = 1\n                    else:                        \n                        ret = \"1\"                        \n                    if not info[\"valid\"]:\n                        ret = \"0\"\n                #elif \"inter_trans\" in user_expr:\n                #    # Plug in submodule latency\n                #    if f\"{info['name']}_inter\" in info[\"modules\"]:\n                #        ret = info[\"modules\"][f\"{info['name']}_inter\"]\n                #    else:\n                #        ret = \"1\"\n                #    if not info[\"valid\"]:\n                #        ret = \"0\"\n                #elif \"intra_trans\" in user_expr:\n                #    # Plug in submodule latency                    \n                #    if f\"{info['name']}_intra\" in info[\"modules\"]:\n                #        ret = info[\"modules\"][f\"{info['name']}_intra\"]\n                #    else:\n                #        ret = \"1\"\n                #    if not info[\"valid\"]:\n                #        ret = \"0\"                \n                else: \n                    if info[\"target\"] in [\"on_chip_transfer_pe\", 
\"on_chip_transfer_io\", \"pe_compute_op\", \"on_chip_acc\"]:\n                        ret = \"0\"\n                    else:\n                        ret = \"1\"\n            elif lat[\"type\"] == \"if\":\n                # Only examine the first child\n                child = lat[\"child\"][0]\n                ret = extract_stmt_call_num_expr(child, info)\n            elif lat[\"type\"] == \"array_tile\":           \n                if info[\"target\"] in [\"on_chip_acc\"]:\n                    ret = \"(\" + lat[\"size\"] + \"/\" + lat[\"data_pack_factor\"] + \")\"\n                else:\n                    ret = \"(\" + lat[\"size\"] + \")\"\n            else:\n                raise RuntimeError(f\"Unsupported latency node type {lat['type']}\")\n\n            return ret\n        \n        # Merge drain modules if necessary\n        out_module = {}\n        out_drain_module = {}\n        for module in desp[\"memory\"]:\n            module_mem = desp[\"memory\"][module]\n            if module.endswith('_out'):\n                item = {'buf_size': module_mem['buf_size'], \n                        'num': module_mem['num']}\n                if module.find('drain') != -1:\n                    item['merged'] = 0\n                    out_drain_module[module_mem['array']] = item\n                else:                    \n                    if module_mem['array'] not in out_module:\n                        out_module[module_mem['array']] = [item]\n                    else:\n                        out_module[module_mem['array']].append(item)\n        for array in out_drain_module:\n            if array in out_module:\n                for m in out_module[array]:                \n                    if m['buf_size'] == out_drain_module[array]['buf_size'] and \\\n                       m['num'] == out_drain_module[array]['num']:\n                       out_drain_module[array]['merged'] = 1\n\n        # Extract the off-chip access expression\n        info = 
{\"has_for_child\": 0, \"name\": None, \"modules\": {}}\n        for i in range(2):\n            # Run second time to fill in the incomplete expression            \n            for module in desp[\"latency\"]:                \n                module_lat = desp[\"latency\"][module]  \n                info[\"name\"] = module\n                info[\"valid\"] = True\n                info[\"under_mark\"] = None\n                info[\"target\"] = \"off_chip_acc\"\n                info[\"in\"] = -1\n                info[\"module_attr\"] = desp[\"attr\"][module]\n                info[\"modules\"][module] = extract_stmt_call_num_expr(module_lat, info)\n\n        f.write(\"\\tactivity = {}\\n\")\n        f.write(\"\\tactivity[\\\"off_chip_acc_num_meta\\\"] = {}\\n\")\n        # Off-chip access\n        # outermost I/O module latency * data_pack_factor\n        f.write(\"\\toff_chip_acc_num = 0\\n\")\n        for module in info[\"modules\"]:\n            if desp[\"attr\"][module][\"to_dram\"] != 1:\n                continue\n            if \"inter\" in module or \"intra\" in module:\n                continue\n            if module.find('drain') != -1:\n                array_name = module[:module.find(\"_drain_IO\")]                \n                if out_drain_module[array_name]['merged'] == 1:\n                    continue                      \n            f.write(f\"\\t{module}_off_chip_acc_num = \")\n            f.write(info[\"modules\"][module])\n            f.write(\"\\n\")\n            f.write(f\"\\tactivity[\\\"off_chip_acc_num_meta\\\"][\\\"{module}\\\"] = {module}_off_chip_acc_num\\n\")\n            f.write(f\"\\toff_chip_acc_num += {module}_off_chip_acc_num\\n\")\n        \n        f.write(\"\\tactivity[\\\"off_chip_acc_num\\\"] = off_chip_acc_num\\n\\n\")\n\n        # NOC access        \n        # For each I/O group,\n        # sum_{io_level}(#io_modules(level)*inter_latency*data_pack_factor_inter) + #pe_modules*intra_latency*data_pack_factor_intra\n        # 
Extract the on-chip data transfer expression\n        info = {\"has_for_child\": 0, \"name\": None, \"modules\": {}}\n        for i in range(2):\n            # Run second time to fill in the incomplete expression            \n            for module in desp[\"latency\"]:                \n                module_lat = desp[\"latency\"][module]  \n                info[\"name\"] = module\n                info[\"valid\"] = True\n                info[\"under_mark\"] = None\n                info[\"target\"] = \"on_chip_transfer_io\"\n                info[\"in\"] = -1\n                info[\"module_attr\"] = desp[\"attr\"][module]\n                info[\"modules\"][module] = extract_stmt_call_num_expr(module_lat, info)\n\n        f.write(\"\\tnoc_hop_num = 0\\n\")        \n        for module in desp[\"io\"]:                 \n            if module.find('drain') != -1:\n                array_name = module[:module.find(\"_drain_IO\")]                \n                if out_drain_module[array_name]['merged'] == 1:\n                    continue                 \n            f.write(f\"\\t{module}_io_noc_hop_num = (1 + {desp['io'][module]['dims'][-1]}) / 2\\n\")            \n            f.write(f\"\\t{module}_io_noc_hop_num *= {info['modules'][module]}\\n\")\n            if len(desp['io'][module]['dims']) > 1:\n                for idx in range(len(desp['io'][module]['dims']) - 1):\n                    f.write(f\"\\t{module}_io_noc_hop_num *= {desp['io'][module]['dims'][idx]}\\n\")\n            f.write(f\"\\tnoc_hop_num += {module}_io_noc_hop_num\\n\")\n            \n        info = {\"has_for_child\": 0, \"name\": None, \"modules\": {}}\n        for i in range(2):\n            # Run second time to fill in the incomplete expression            \n            for module in desp[\"latency\"]:                \n                module_lat = desp[\"latency\"][module]  \n                info[\"name\"] = module\n                info[\"valid\"] = True\n                info[\"under_mark\"] = 
None\n                info[\"target\"] = \"on_chip_transfer_pe\"\n                info[\"in\"] = -1\n                info[\"module_attr\"] = desp[\"attr\"][module]\n                info[\"modules\"][module] = extract_stmt_call_num_expr(module_lat, info)\n        for module in desp[\"io\"]:\n            if module.find('drain') != -1:\n                array_name = module[:module.find(\"_drain_IO\")]                \n                if out_drain_module[array_name]['merged'] == 1:\n                    continue                    \n            if desp[\"attr\"][module][\"to_pe\"]:                \n                f.write(f\"\\t{module}_pe_noc_hop_num = {desp['compute']['PE']['num']}\\n\")\n                f.write(f\"\\t{module}_pe_noc_hop_num *= {info['modules'][module]}\\n\")\n                f.write(f\"\\t{module}_pe_noc_hop_num *= {desp['memory'][module]['data_pack_factor_intra']}\\n\")\n                f.write(f\"\\tnoc_hop_num += {module}_pe_noc_hop_num\\n\")\n\n        f.write(\"\\tactivity[\\\"noc_hop_num\\\"] = noc_hop_num\\n\\n\")\n        \n        # Computations\n        info = {\"has_for_child\": 0, \"name\": None, \"modules\": {}}\n        for i in range(2):\n            # Run second time to fill in the incomplete expression            \n            for module in desp[\"latency\"]:                \n                module_lat = desp[\"latency\"][module]  \n                info[\"name\"] = module\n                info[\"valid\"] = True\n                info[\"under_mark\"] = None\n                info[\"target\"] = \"pe_compute_op\"                \n                info[\"in\"] = -1\n                info[\"module_attr\"] = desp[\"attr\"][module]\n                info[\"modules\"][module] = extract_stmt_call_num_expr(module_lat, info)\n        \n        # Compute operation\n        # PE latency * simd\n        f.write(\"\\tcompute_stmt_call_num = 0\\n\")\n        f.write(f\"\\tcompute_stmt_call_num = {desp['compute']['PE']['unroll_factor']}\\n\")        \n     
   f.write(f\"\\tcompute_stmt_call_num *= {info['modules']['PE']}\\n\")\n        f.write(f\"\\tcompute_stmt_call_num *= {desp['compute']['PE']['num']}\\n\")\n        f.write(\"\\tactivity[\\\"compute_stmt_call_num\\\"] = compute_stmt_call_num\\n\\n\")\n\n        # IO module access        \n        # sum(inter latency * data_pack_factor_inter + intra latency * data_pack_factor_inter)\n        info = {\"has_for_child\": 0, \"name\": None, \"modules\": {}}\n        for i in range(2):\n            # Run second time to fill in the incomplete expression            \n            for module in desp[\"latency\"]:                \n                module_lat = desp[\"latency\"][module]  \n                info[\"name\"] = module\n                info[\"valid\"] = True\n                info[\"under_mark\"] = None\n                info[\"target\"] = \"on_chip_acc\"\n                info[\"in\"] = -1\n                info[\"module_attr\"] = desp[\"attr\"][module]\n                info[\"modules\"][module] = extract_stmt_call_num_expr(module_lat, info)\n\n        f.write(\"\\tio_module_mem_acc_num = 0\\n\")\n        for module in desp[\"memory\"]:\n            module_mem = desp[\"memory\"][module]\n            if module.find('drain') != -1 and out_drain_module[module_mem['array']]['merged'] == 1:\n                continue\n            if \"PE\" in module:\n                continue            \n            f.write(f\"\\t{module}_mem_acc_num = {info['modules'][module]}\\n\")\n            f.write(f\"\\t{module}_mem_acc_num *= {desp['memory'][module]['data_pack_factor_inter']}\\n\")\n            f.write(f\"\\tio_module_mem_acc_num += {module}_mem_acc_num\\n\")\n        \n        f.write(\"\\tactivity[\\\"io_module_mem_acc_num\\\"] = io_module_mem_acc_num\\n\\n\")\n        \n        # PE module access\n        # PE latency * simd * 4 (op1, op2, res(R), res(W))        \n        f.write(\"\\tpe_module_reg_acc_num = 0\\n\")\n        f.write(\"\\tpe_module_mem_acc_num = 0\\n\")\n        if 
\"PE\" in desp[\"memory\"]:\n            f.write(\"\\tpe_module_reg_acc_num = 2\\n\") # op1, op2\n            f.write(\"\\tpe_module_mem_acc_num = 2\\n\") # res(R), res(W)\n        else:\n            f.write(\"\\tpe_module_reg_acc_num = 4\\n\") # op1, op2, res(R), res(W)\n            f.write(\"\\tpe_module_mem_acc_num = 0\\n\") #         \n        f.write(f\"\\tpe_module_reg_acc_num *= {desp['compute']['PE']['unroll_factor']}\\n\")\n        f.write(f\"\\tpe_module_reg_acc_num *= {info['modules']['PE']}\\n\")\n        f.write(f\"\\tpe_module_reg_acc_num *= {desp['compute']['PE']['num']}\\n\")\n        f.write(f\"\\tpe_module_mem_acc_num *= {desp['compute']['PE']['unroll_factor']}\\n\")\n        f.write(f\"\\tpe_module_mem_acc_num *= {info['modules']['PE']}\\n\")\n        f.write(f\"\\tpe_module_mem_acc_num *= {desp['compute']['PE']['num']}\\n\")\n        f.write(\"\\tactivity[\\\"pe_module_reg_acc_num\\\"] = pe_module_reg_acc_num\\n\")\n        f.write(\"\\tactivity[\\\"pe_module_mem_acc_num\\\"] = pe_module_mem_acc_num\\n\\n\")\n\n        f.write(\"\\treturn activity\\n\")\n        f.write(\"\\n\")\n        \n    def print_infer_params_func(self, f, desp):\n        f.write(\"def infer_params(params):\\n\")\n        # Load parameters\n        f.write(\"\\t\")\n        is_first = True\n        for p in desp[\"params\"]:\n            if \"tags\" in p and \"auto_infer\" in p[\"tags\"]:\n                continue\n            if not is_first:\n                f.write(\", \")            \n            f.write(p[\"name\"])\n            is_first = False\n        f.write(\" = \")\n        is_first = True\n        for p in desp[\"params\"]:\n            if \"tags\" in p and \"auto_infer\" in p[\"tags\"]:\n                continue\n            if not is_first:\n                f.write(\", \")            \n            f.write(f'params[\\\"{p[\"name\"]}\\\"]')\n            is_first = False\n        f.write(\"\\n\\n\")\n\n        for p in desp[\"params\"]:\n            if \"tags\" 
in p and \"auto_infer\" in p[\"tags\"]:\n                f.write(f\"\\t{p['name']}_choices = [n*{p['bounds'][0]} for n in range(1, {p['bounds'][1]}//{p['bounds'][0]}+1) if {p['bounds'][1]}%(n*{p['bounds'][0]})==0]\\n\")\n                f.write(f\"\\tif len({p['name']}_choices) == 0:\\n\")\n                f.write(f\"\\t\\treturn None\\n\")\n                f.write(f\"\\tparams[\\\"{p['name']}\\\"] = max({p['name']}_choices)\\n\")\n        f.write(\"\\n\")                \n        f.write(\"\\treturn params\\n\\n\")\n\n    def print_random_sampling_func(self, f, desp):\n        f.write(\"def random_sampling(params):\\n\")\n        f.write(f\"\\tdef filter_non_power_of_two(x):\\n\")\n        f.write(f\"\\t\\tif np.log2(x) != int(np.log2(x)):\\n\")\n        f.write(f\"\\t\\t\\treturn True\\n\")\n        f.write(f\"\\t\\treturn False\\n\\n\")\n        # Print the task params\n        for p in self.params_config[\"external\"]:\n            f.write(f\"\\t{p} = params[\\\"{p}\\\"]\\n\")\n        f.write(\"\\twhile True:\\n\")\n        params_to_process = []\n        for param in self.params_config[\"tunable\"]:\n            params_to_process.append(self.params_config[\"tunable\"][param])\n        #while len(params_to_process) > 0:            \n        while True:\n            update = False\n            for param in params_to_process:\n                if \"divisors\" not in param: \n                    #print(\"first \", param[\"name\"])                   \n                    f.write(f\"\\t\\tsample = random.randint(int({param['bounds'][0]}), int({param['bounds'][1]}))\\n\")\n                    f.write(f\"\\t\\t{param['name']} = sample\\n\")\n                    f.write(f\"\\t\\tparams[\\\"{param['name']}\\\"] = sample\\n\")\n                    params_to_process.remove(param)\n                    update = True\n            if not update:\n                break\n        while len(params_to_process) > 0:            \n            for param in params_to_process:           
     \n                if \"divisors\" in param and param[\"divisors\"] not in params_to_process:                    \n                    #print(\"second \", param[\"name\"])\n                    if \"tags\" in param and \"power_of_two\" in param[\"tags\"]:\n                        f.write(f\"\\t\\tsample = random.sample(utils.get_divisors(int({param['bounds'][1]}), filter_non_power_of_two), 1)[-1]\\n\")\n                    else:\n                        f.write(f\"\\t\\tsample = random.sample(utils.get_divisors(int({param['bounds'][1]}), None), 1)[-1]\\n\")\n                    f.write(f\"\\t\\t{param['name']} = sample\\n\")\n                    f.write(f\"\\t\\tparams[\\\"{param['name']}\\\"] = sample\\n\")\n                    params_to_process.remove(param)\n        # Latency hiding\n        if \"PE\" not in desp[\"memory\"]:        \n            f.write(f\"\\t\\tbreak\\n\")\n        else:\n            f.write(f\"\\t\\tlatency_factors = 1\\n\")\n            for p, param in self.params_config[\"tunable\"].items():\n                if param[\"attr\"] == \"latency_tiling_factor\":\n                    f.write(f\"\\t\\tlatency_factors *= {param['name']}\\n\")\n                if param[\"attr\"] == \"SIMD_tiling_factor\":\n                    f.write(f\"\\t\\tsimd_factor = {param['name']}\\n\")\n            data_type = desp[\"memory\"][\"PE\"][\"ele_type\"]\n            if data_type == \"float\":\n                f.write(f\"\\t\\tif latency_factors >= 8 * simd_factor:\\n\")\n                f.write(f\"\\t\\t\\tbreak\\n\")\n            else:\n                raise RuntimeError(f\"Unsupported data type in random sample generation: {data_type}\")\n        f.write(\"\\n\")                \n        f.write(\"\\treturn params\\n\\n\")        \n\n    def print_bound_check_func(self, f, desp):\n        f.write(\"def bound_check(params):\\n\")\n        f.write(f\"\\tdef filter_non_power_of_two(x):\\n\")\n        f.write(f\"\\t\\tif np.log2(x) != int(np.log2(x)):\\n\")\n    
    f.write(f\"\\t\\t\\treturn True\\n\")\n        f.write(f\"\\t\\treturn False\\n\\n\")\n        # Load parameters\n        f.write(\"\\t\")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(p[\"name\"])\n            is_first = False\n        f.write(\" = \")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(f'params[\\\"{p[\"name\"]}\\\"]')\n            is_first = False\n        f.write(\"\\n\\n\")\n        for p in desp[\"params\"]:\n            if \"bounds\" in p:\n                f.write(f\"\\tif {p['name']} < {p['bounds'][0]}:\\n\")\n                f.write(f\"\\t\\treturn False\\n\")\n                # If the parameter is the first-level tiling factors, \n                # ignore the upper bounds.\n                if not p['name'].endswith('t1'):\n                    f.write(f\"\\tif {p['name']} > {p['bounds'][1]}:\\n\")\n                    f.write(f\"\\t\\treturn False\\n\")\n            if \"tags\" in p and \"power_of_two\" in p[\"tags\"]:\n                f.write(f\"\\tif filter_non_power_of_two({p['name']}):\\n\")\n                f.write(f\"\\t\\treturn False\\n\")\n        # Latency hiding\n        if \"PE\" in desp[\"memory\"]:\n            f.write(f\"\\tlatency_factors = 1\\n\")\n            for p, param in self.params_config[\"tunable\"].items():\n                if param[\"attr\"] == \"latency_tiling_factor\":\n                    f.write(f\"\\tlatency_factors *= {param['name']}\\n\")\n                if param[\"attr\"] == \"SIMD_tiling_factor\":\n                    f.write(f\"\\tsimd_factor = {param['name']}\\n\")\n            data_type = desp[\"memory\"][\"PE\"][\"ele_type\"]\n            if data_type == \"float\":\n                f.write(f\"\\tif latency_factors < 8 * simd_factor:\\n\")\n                f.write(f\"\\t\\treturn False\\n\")\n            
else:\n                raise RuntimeError(f\"Unsupported data type in random sample generation: {data_type}\")\n        \n        f.write(\"\\treturn True\\n\\n\")        \n\n    def print_compute_arch_cst_func(self, f, desp):\n        f.write(\"def compute_arch_cst(params):\\n\")\n        # Load parameters\n        f.write(\"\\t\")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(p[\"name\"])\n            is_first = False\n        f.write(\" = \")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(f'params[\\\"{p[\"name\"]}\\\"]')\n            is_first = False\n        f.write(\"\\n\\n\")\n\n        f.write(\"\\tarch_features = {}\\n\")\n        \n        # Compute basic architecture information                \n        f.write(f\"\\tarch_features['dims'] = []\\n\")\n        for dim in desp['compute']['PE']['dims']:\n            f.write(f\"\\tarch_features[\\\"dims\\\"].append({dim})\\n\")\n            f.write(f\"\\tif arch_features[\\\"dims\\\"][-1] == 0:\\n\")        \n            f.write(f\"\\t\\treturn None\\n\")\n        f.write(f\"\\tarch_features[\\\"SIMD\\\"] = {desp['compute']['PE']['unroll_factor']}\\n\")\n\n        # data packing factors\n        f.write(\"\\tarch_features[\\\"data_pack\\\"] = {}\\n\")\n        for module in desp[\"memory\"]:\n            module_mem = desp[\"memory\"][module]\n            if 'data_pack_factor' in module_mem:\n                f.write(f\"\\tarch_features[\\\"data_pack\\\"][\\\"{module_mem['array']}\\\"] = [{module_mem['data_pack_factor']}]\\n\")\n\n        f.write(\"\\n\\treturn arch_features\\n\\n\")\n\n    def register(self, desp, py_f):\n        \"\"\" Register the design in the descriptor file\n        Generate all the necessary functions for evaluating the performance of the \n        target design.         
\n        \"\"\"        \n        # Tuning parameters            \n        self.params_config = {\"external\": {}, \"tunable\": {}, \"infer\": {}}\n        for param in desp[\"params\"]:\n            if param[\"tunable\"]:\n                self.params_config[\"tunable\"][param[\"name\"]] = param\n            else:\n                if \"external\" in param[\"tags\"]:\n                    self.params_config[\"external\"][param[\"name\"]] = param\n                elif \"auto_infer\" in param[\"tags\"]:\n                    self.params_config[\"infer\"][param[\"name\"]] = param\n        \n        # Print design function            \n        with open(py_f, 'w') as f:\n            f.write(\"from math import ceil\\n\")\n            f.write(\"import numpy as np\\n\")\n            f.write(\"import random\\n\")\n            f.write(\"import utils\\n\\n\")\n\n            # Generate resource est func        \n            self.print_resource_est_func(f, desp)\n\n            # Generate latency est func\n            self.print_latency_est_func(f, desp)            \n        \n            # Generate activity est func\n            self.print_activity_est_func(f, desp)\n\n            # Generate infer parameter func\n            self.print_infer_params_func(f, desp)\n\n            # Generate the random sampling func\n            self.print_random_sampling_func(f, desp)\n\n            # Generate the bound check func\n            self.print_bound_check_func(f, desp)\n\n            # Generate the compute arch cst func\n            self.print_compute_arch_cst_func(f, desp)                \n\n        sys.path.append(os.path.dirname(py_f))\n        basename = os.path.basename(py_f).split(\".\")[0]        \n        module = __import__(basename)\n        self.est_resource_func = module.est_resource\n        self.est_latency_func = module.est_latency\n        self.est_activity_func = module.est_activity\n        self.infer_params_func = module.infer_params\n        self.random_sampling_func = 
module.random_sampling\n        self.bound_check_func = module.bound_check\n        self.compute_arch_cst_func = module.compute_arch_cst\n        self.desp = desp\n\n    def est_latency(self, params):\n        if not self.est_latency_func:\n            raise RuntimeError(f\"Latency estimation function for design {self.name} undefined\")\n        else:\n            return self.est_latency_func(params)\n    \n    def est_resource(self, params):\n        if not self.est_resource_func:\n            raise RuntimeError(f\"Resource estimation function for design {self.name} undefined\")\n        else:\n            return self.est_resource_func(params)\n\n    def est_activity(self, params):\n        if not self.est_activity_func:\n            raise RuntimeError(f\"Activity estimation function for design {self.name} undefined\")\n        else:\n            return self.est_activity_func(params)\n\n    def infer_params(self, params):\n        if not self.infer_params_func:\n            raise RuntimeError(f\"Internal parameter inference function for design {self.name} undefined\")\n        else:\n            return self.infer_params_func(params)\n\n    def random_sampling(self, params):\n        if not self.random_sampling_func:\n            raise RuntimeError(f\"Random sampling function for design {self.name} undefined\")\n        else:\n            return self.random_sampling_func(params)\n\n    def bound_check(self, params):\n        if not self.bound_check_func:\n            raise RuntimeError(f\"Bound check function for design {self.name} undefined\")\n        else:\n            return self.bound_check_func(params)            \n\n    def compute_arch_cst(self, params):\n        if not self.compute_arch_cst_func:\n            raise RuntimeError(f\"Compute architecture constraints function for design {self.name} undefined\")\n        else:\n            params = self.infer_params(params)\n            if params:\n                arch_cst = self.compute_arch_cst_func(params)\n                res = self.est_resource(params)\n                arch_cst['res_usage'] = res\n                return arch_cst\n            else:\n                return None"
  },
  {
    "path": "autosa_scripts/odyssey/designs/kernel3.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            
\"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(j_t1/j_t2))\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"A_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n    
            \"(j_t1/j_t2)\"\n            ]\n        },\n        \"B_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\",\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n      
                      \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        
\"A_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(k_t1/k_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.16.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    
\"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p9\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"k_t1\",\n                                                \"size\": 
\"i_t2*k_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": 
{\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                      
      \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c4\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(k_t1/k_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.16.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 2 * p0 + 32 * c1 + c6, 32 * c2 + 2 * c5)\"\n                 
                                   },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n               
         \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p10\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"k_t1\",\n                                                \"size\": \"j_t2*k_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n     
       \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n          
              {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"j_t2\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.2.1(c0, c1, 1, p0, p1, 15, c6, c7, 1, 2 * p1 + 32 * c0 + c7, 2 * p0 + 32 * c1 + c6)\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"hls_pipeline\",\n                                        \"type\": \"mark\"\n                
                    },\n                                    \"content\": \"simd\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"latency\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                    
                                    \"data_pack_factor\": \"p12\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p12\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    
\"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n     
                                   \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p12\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"j_t2\",\n                                                    \"size\": \"i_t2*j_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c3\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n      
          \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"j_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"i_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": [\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, p1, c5, 
c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p1 + 32 * c1 + c6, 32 * c2 + 2 * c5)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"k_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"S_0(2 * p0 + 32 * c0 + c7, 2 * p1 + 32 * c1 + c6, 32 * c2 + 2 * c5 + c8)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                           
             \"content\": \"hls_unroll\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c8\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_C_drain.1.1(c0, c1, 1, p0, p1, 15, c6, c7, 2 * p0 + 32 * c0 + c7, 2 * p1 + 32 * c1 + c6)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p1 + 32 * c1 + c6, 32 * c2 + 2 * c5)\"\n                                                 
               },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            }\n                                                        ],\n                                                        \"type\": \"block\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c7\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c5\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            
\"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L2_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"B_IO_L2_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((j_t1/j_t2)*(i_t1/i_t2))\"\n        },\n        \"PE\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(j_t1/j_t2))\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                
\"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n       
     \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p12\",\n        
    \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel0_0.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n   
         \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L1_in\": {\n            \"double_buffer\": 1,\n           
 \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L1_in\": {\n            \"dims\": [\n             
   \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(1, c1, c2, c3, p0, 4 * c1 + c5, 4 * c2 + c6, 2 * p0 + c7 + 8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                               
         {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(c_t1/c_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(i_t1/i_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"p\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"q\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"c_t2\"\n                                                       
             ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"r_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"bounds\": [\n                                                                                        \"0\",\n                                                                                        \"o_t2\"\n                                                                                    ],\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c5 + c8 + c11, 4 * c2 + 2 * c6 + c9 + c10, 8 * c0 + 2 * c7)\"\n                                                                                                    },\n          
                                                                                          \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c3 + c12, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"bounds\": [\n                                                                                                            \"0\",\n                                                                                                            \"i_t2\"\n                                                                                                        ],\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                                  
              \"child\": {\n                                                                                                                    \"user_expr\": \"S_0(2 * p0 + 8 * c3 + c12, 4 * c1 + 2 * c5 + c11, 4 * c2 + 2 * c6 + c10, 8 * c0 + 2 * c7 + c13, c8, c9)\"\n                                                                                                                },\n                                                                                                                \"type\": \"user\"\n                                                                                                            },\n                                                                                                            \"content\": \"hls_unroll\",\n                                                                                                            \"type\": \"mark\"\n                                                                                                        },\n                                                                                                        \"iterator\": \"c13\",\n                                                                                                        \"type\": \"for\"\n                                                                                                    },\n                                                                                                    \"content\": \"simd\",\n                                                                                                    \"type\": \"mark\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": [\n                                                                                      
                  {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"out.fifo_cout_drain.1.1(1, c1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 4 * c1 + 2 * c5 + c11, 4 * c2 + 2 * c6 + c10, 2 * p0 + 8 * c3 + c12)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                                                                                        }\n                                                                                                    ],\n                                                                                                    \"type\": \"if\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c5 + c8 + c11, 4 * c2 + 2 * c6 + c9 + c10, 8 * c0 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                                            ],\n           
                                                                                 \"type\": \"block\"\n                                                                                        },\n                                                                                        \"content\": \"hls_pipeline\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    \"iterator\": \"c10\",\n                                                                                    \"type\": \"for\"\n                                                                                },\n                                                                                \"content\": \"latency\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c11\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c12\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                    
                            \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c0\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"iterator\": \"c1\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"iterator\": \"c9\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c8\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(0, c1, c2, c3, p0, 4 * c1 + c5, 4 * c2 + c6, 2 * p0 + c7)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n       
                                     \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    
{\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p14\",\n                    \"ele_size\": 4,\n                    \"last_dim\": \"i_t1\",\n                    \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                         
   ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"p\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"q\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n                                                                \"child\": {\n                  
                                                  \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.8.2(c0, c1, c2, c3, 0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c5 + c8 + c11, 4 * c2 + 2 * c6 + c9 + c10, 8 * c0 + 2 * c7)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                 
   \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c0\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c1\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": 
{\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            
},\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n 
                                               \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.2.1(1, c1, c2, c3, p0, c5, c6, 0, 0, 0, c10, c11, c12, 0, 4 * c1 + 2 * c5 + c11, 4 * c2 + 2 * c6 + c10, 2 * p0 + 8 * c3 + c12)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                           
         \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                    
            ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n           
     \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.2.1(0, c1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c5 + c11, 4 * c2 + 2 * c6 + c10, 2 * p0 + 8 * c3 + c12)\"\n          
                                                  },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            
\"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t2\",\n                                            \"size\": \"r_t1*c_t1*o_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n    
                \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t2\",\n                                            \"size\": \"r_t1*c_t1*o_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                           
 \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n            
    \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n  
                              ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.2.1(1, c1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c5 + c11, 4 * c2 + 2 * c6 + c10, 2 * p0 + 8 * c3 + c12)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                     
                       },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                             
       \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t2\",\n                                            \"size\": \"r_t1*c_t1*o_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            
\"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n 
                           \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                
\"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.4.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 2 * 
p0 + 8 * c3 + c12, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": 
\"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p17\",\n                                                
\"ele_size\": 4,\n                                                \"last_dim\": \"i_t1\",\n                                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n         
   \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"w_IO_L1_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t2*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n         
       \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            
\"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n     
       ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel0_1.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        
\"w_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L1_in\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    
\"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n                                              
          \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"r_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": [\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c5 + c8 + c11, 4 * c2 + 2 * 
c6 + c9 + c10, 8 * c3 + 2 * c7)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"bounds\": [\n                                                                                                    \"0\",\n                                                                                                    \"i_t2\"\n                                                                                                ],\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                     
                   \"child\": {\n                                                                                                            \"user_expr\": \"S_0(2 * p0 + 8 * c0 + c12, 4 * c1 + 2 * c5 + c11, 4 * c2 + 2 * c6 + c10, 8 * c3 + 2 * c7 + c13, c8, c9)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    },\n                                                                                                    \"content\": \"hls_unroll\",\n                                                                                                    \"type\": \"mark\"\n                                                                                                },\n                                                                                                \"iterator\": \"c13\",\n                                                                                                \"type\": \"for\"\n                                                                                            },\n                                                                                            \"content\": \"simd\",\n                                                                                            \"type\": \"mark\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    
\"child\": {\n                                                                                                        \"user_expr\": \"out.fifo_cout_drain.1.1(c0, c1, c2, 1, p0, c5, c6, 3, 2, 2, c10, c11, c12, 4 * c1 + 2 * c5 + c11, 4 * c2 + 2 * c6 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                                            ],\n                                                                                            \"type\": \"if\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c5 + c8 + c11, 4 * c2 + 2 * c6 + c9 + c10, 8 * c3 + 2 * c7)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        }\n                                                                                    ],\n                                                                                    \"type\": \"block\"\n                                                                                },\n                                                               
                 \"content\": \"hls_pipeline\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                            
    \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    
{\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"data_pack_factor\": \"p14\",\n                        \"ele_size\": 4,\n                        \"last_dim\": \"i_t1\",\n                        \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                        \"type\": \"array_tile\"\n                    },\n                    \"content\": \"access_serialize\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n         
   \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"p\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"q\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                      
                  \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.8.2(c0, c1, c2, c3, 0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c5 + c8 + c11, 4 * c2 + 2 * c6 + c9 + c10, 8 * c3 + 2 * c7)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n                                                                \"type\": \"for\"\n                                                
            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c0\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c1\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        
\"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": 
\"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                       
         \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.2.1(c0, c1, c2, 1, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c5 + c11, 4 * c2 + 2 * c6 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": 
\"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p16\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"o_t2\",\n                                                \"size\": \"r_t1*c_t1*o_t2\",\n                                               
 \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n    
                                \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n  
                              \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                 
   \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.4.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n  
                                                                  },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n            
                },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p17\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"i_t1\",\n                                                    \"size\": 
\"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": 
\"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"w_IO_L1_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t2*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            
\"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n             
   \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n        
        \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel0_2.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n   
         \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L1_in\": {\n            \"double_buffer\": 1,\n           
 \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L1_in\": {\n            \"dims\": [\n             
   \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(c0, 1, c2, c3, p0, c5 + 4, 4 * c2 + c6, 2 * p0 + 8 * c0 + c7)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                               
         {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(c_t1/c_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(i_t1/i_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"p\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"q\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"c_t2\"\n                                                       
             ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"r_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"bounds\": [\n                                                                                        \"0\",\n                                                                                        \"o_t2\"\n                                                                                    ],\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c2 + 2 * c5 + c8 + c11, 4 * c3 + 2 * c6 + c9 + c10, 8 * c1 + 2 * c7)\"\n                                                                                                    },\n          
                                                                                          \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"bounds\": [\n                                                                                                            \"0\",\n                                                                                                            \"i_t2\"\n                                                                                                        ],\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                                  
              \"child\": {\n                                                                                                                    \"user_expr\": \"S_0(2 * p0 + 8 * c0 + c12, 4 * c2 + 2 * c5 + c11, 4 * c3 + 2 * c6 + c10, 8 * c1 + 2 * c7 + c13, c8, c9)\"\n                                                                                                                },\n                                                                                                                \"type\": \"user\"\n                                                                                                            },\n                                                                                                            \"content\": \"hls_unroll\",\n                                                                                                            \"type\": \"mark\"\n                                                                                                        },\n                                                                                                        \"iterator\": \"c13\",\n                                                                                                        \"type\": \"for\"\n                                                                                                    },\n                                                                                                    \"content\": \"simd\",\n                                                                                                    \"type\": \"mark\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": [\n                                                                                      
                  {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"out.fifo_cout_drain.1.1(c0, 1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 4 * c2 + 2 * c5 + c11, 4 * c3 + 2 * c6 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                                                                                        }\n                                                                                                    ],\n                                                                                                    \"type\": \"if\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c2 + 2 * c5 + c8 + c11, 4 * c3 + 2 * c6 + c9 + c10, 8 * c1 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                                            ],\n           
                                                                                 \"type\": \"block\"\n                                                                                        },\n                                                                                        \"content\": \"hls_pipeline\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    \"iterator\": \"c10\",\n                                                                                    \"type\": \"for\"\n                                                                                },\n                                                                                \"content\": \"latency\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c11\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c12\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                    
                            \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c0\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"iterator\": \"c1\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"iterator\": \"c9\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c8\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(c0, 0, c2, c3, p0, c5, 4 * c2 + c6, 2 * p0 + 8 * c0 + c7)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n       
                                     \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    
{\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p14\",\n                    \"ele_size\": 4,\n                    \"last_dim\": \"i_t1\",\n                    \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                         
   ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"p\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"q\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n                                                                \"child\": {\n                  
                                                  \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.8.2(c0, c1, c2, c3, 0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c2 + 2 * c5 + c8 + c11, 4 * c3 + 2 * c6 + c9 + c10, 8 * c1 + 2 * c7)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                 
   \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c0\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c1\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": 
{\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            
},\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n 
                                               \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.2.1(c0, 1, c2, c3, p0, c5, c6, 0, 0, 0, c10, c11, c12, 0, 4 * c2 + 2 * c5 + c11, 4 * c3 + 2 * c6 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                           
         \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                    
            ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n           
     \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.2.1(c0, 0, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c5 + c11, 4 * c3 + 2 * c6 + c10, 2 * p0 + 8 * c0 + c12)\"\n          
                                                  },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            
\"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t2\",\n                                            \"size\": \"r_t1*c_t1*o_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n    
                \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t2\",\n                                            \"size\": \"r_t1*c_t1*o_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                           
 \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n            
    \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n  
                              ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.2.1(c0, 1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c5 + c11, 4 * c3 + 2 * c6 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                     
                       },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                             
       \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t2\",\n                                            \"size\": \"r_t1*c_t1*o_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        
\"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n          
                      \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    
\"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"r_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                      
                                              \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.4.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    \"content\": \"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"content\": \"simd\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": 
\"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                
    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(o_t1/o_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p17\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"i_t1\",\n                                        \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c6\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n 
               \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n     
   \"w_IO_L1_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t2*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            
\"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                
\"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel1_0.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n           
 \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n         
   \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": 
[\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(1, c1, c2, c3, p0, 2 * p0 + 4 * c1 + c5, 4 * c2 + c6, c7 + 8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                         
               {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(c_t1/c_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(i_t1/i_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"p\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"q\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"c_t2\"\n                                                 
                   ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"bounds\": [\n                                                                                        \"0\",\n                                                                                        \"r_t2\"\n                                                                                    ],\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 4 * c1 + c8 + c12, 4 * c2 + 2 * c6 + c9 + c10, 8 * c0 + 2 * c7)\"\n                                                                                                    },\n    
                                                                                                \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c3 + 2 * c5 + c11, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"bounds\": [\n                                                                                                            \"0\",\n                                                                                                            \"i_t2\"\n                                                                                                        ],\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                            
                    \"child\": {\n                                                                                                                    \"user_expr\": \"S_0(8 * c3 + 2 * c5 + c11, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c6 + c10, 8 * c0 + 2 * c7 + c13, c8, c9)\"\n                                                                                                                },\n                                                                                                                \"type\": \"user\"\n                                                                                                            },\n                                                                                                            \"content\": \"hls_unroll\",\n                                                                                                            \"type\": \"mark\"\n                                                                                                        },\n                                                                                                        \"iterator\": \"c13\",\n                                                                                                        \"type\": \"for\"\n                                                                                                    },\n                                                                                                    \"content\": \"simd\",\n                                                                                                    \"type\": \"mark\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                
                        \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c3 + 2 * c5 + c11, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": [\n                                                                                                        {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"out.fifo_cout_drain.1.1(1, c1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c6 + c10, 8 * c3 + 2 * c5 + c11)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                                                                                        }\n                                                                                                    ],\n                                                                                                    \"type\": \"if\"\n                                                                                                }\n                                                                                            ],\n                                
                                                            \"type\": \"block\"\n                                                                                        },\n                                                                                        \"content\": \"hls_pipeline\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    \"iterator\": \"c10\",\n                                                                                    \"type\": \"for\"\n                                                                                },\n                                                                                \"content\": \"latency\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c11\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c12\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                         
       \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c0\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"iterator\": \"c1\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"iterator\": \"c9\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c8\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(0, c1, c2, c3, p0, 2 * p0 + 4 * c1 + c5, 4 * c2 + c6, c7)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                            
                \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                          
              },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n              
              },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                   
             \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"r_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.4.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 4 * c1 + c8 + c12, 4 * c2 + 2 * c6 + c9 + c10, 8 * c0 + 2 * c7)\"\n                                                     
                               },\n                                                                                    \"type\": \"user\"\n                                                                                },\n                                                                                \"content\": \"hls_pipeline\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"content\": \"simd\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                              
                      \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L1\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L2\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                      
  \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p14\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"i_t1\",\n                                            \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                
\"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            
\"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                  
  \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.4.1(1, c1, c2, c3, p0, c5, c6, 0, 0, 0, c10, c11, c12, 0, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c6 + c10, 8 * c3 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                           
                 \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n           
                         {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": 
\"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        
\"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.4.1(0, c1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c6 + c10, 8 * c3 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": 
\"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t2*c_t1*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                            
    \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t2*c_t1*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n          
                          },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n 
               \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                   
     \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(1, c1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c6 + c10, 8 * c3 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n            
                                        \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    
\"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t2*c_t1*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n           
         \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p17\",\n                  
  \"ele_size\": 4,\n                    \"last_dim\": \"i_t1\",\n                    \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"p\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"q\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                             
                       \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, 0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c3 + 2 * c5 + c11, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                            
        \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c0\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c1\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": 
\"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"(((((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 
0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": 
\"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                
\"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n      
  }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel1_1.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        
\"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n          
          \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n                                    
                    \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"o_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"r_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": [\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 4 * c1 + c8 + c12, 4 * 
c2 + 2 * c6 + c9 + c10, 8 * c3 + 2 * c7)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c5 + c11, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"bounds\": [\n                                                                                                    \"0\",\n                                                                                                    \"i_t2\"\n                                                                                                ],\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                            
                            \"child\": {\n                                                                                                            \"user_expr\": \"S_0(8 * c0 + 2 * c5 + c11, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c6 + c10, 8 * c3 + 2 * c7 + c13, c8, c9)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    },\n                                                                                                    \"content\": \"hls_unroll\",\n                                                                                                    \"type\": \"mark\"\n                                                                                                },\n                                                                                                \"iterator\": \"c13\",\n                                                                                                \"type\": \"for\"\n                                                                                            },\n                                                                                            \"content\": \"simd\",\n                                                                                            \"type\": \"mark\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 
2 * c5 + c11, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"out.fifo_cout_drain.1.1(c0, c1, c2, 1, p0, c5, c6, 3, 2, 2, c10, c11, c12, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c6 + c10, 8 * c0 + 2 * c5 + c11)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                                            ],\n                                                                                            \"type\": \"if\"\n                                                                                        }\n                                                                                    ],\n                                                                                    \"type\": \"block\"\n                                                                                },\n                                                                                
\"content\": \"hls_pipeline\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                                \"type\": 
\"for\"\n                                            },\n                                            \"iterator\": \"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n   
                                     {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": 
\"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                
\"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"r_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.4.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 4 * c1 + c8 + c12, 4 * c2 + 2 * c6 + c9 + c10, 8 * c3 + 2 * c7)\"\n                                                                        },\n                  
                                                      \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                
\"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p14\",\n                                                    \"ele_size\": 4,\n                                                    
\"last_dim\": \"i_t1\",\n                                                    \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": 
[\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n            
                    \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                        
                                \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(c0, c1, c2, 1, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c6 + c10, 8 * c0 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": 
\"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p16\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"o_t1\",\n                                                \"size\": \"r_t2*c_t1*o_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                
                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                             
       {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"data_pack_factor\": \"p17\",\n                        \"ele_size\": 4,\n                        \"last_dim\": \"i_t1\",\n                        \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                        \"type\": \"array_tile\"\n                    },\n                    \"content\": \"access_serialize\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n    
                            \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"p\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"q\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                      
          ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, 0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c5 + c11, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                        
                            },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c0\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c1\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": 
\"(((((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n      
      \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n 
           \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n    
            \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel1_2.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n           
 \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n         
   \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": 
[\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(c0, 1, c2, c3, p0, 2 * p0 + c5 + 4, 4 * c2 + c6, 8 * c0 + c7)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                         
               {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(c_t1/c_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(i_t1/i_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"p\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"q\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"c_t2\"\n                                                 
                   ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"bounds\": [\n                                                                                        \"0\",\n                                                                                        \"r_t2\"\n                                                                                    ],\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 4 * c2 + c8 + c12, 4 * c3 + 2 * c6 + c9 + c10, 8 * c1 + 2 * c7)\"\n                                                                                                    },\n    
                                                                                                \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c5 + c11, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"bounds\": [\n                                                                                                            \"0\",\n                                                                                                            \"i_t2\"\n                                                                                                        ],\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                            
                    \"child\": {\n                                                                                                                    \"user_expr\": \"S_0(8 * c0 + 2 * c5 + c11, 2 * p0 + 4 * c2 + c12, 4 * c3 + 2 * c6 + c10, 8 * c1 + 2 * c7 + c13, c8, c9)\"\n                                                                                                                },\n                                                                                                                \"type\": \"user\"\n                                                                                                            },\n                                                                                                            \"content\": \"hls_unroll\",\n                                                                                                            \"type\": \"mark\"\n                                                                                                        },\n                                                                                                        \"iterator\": \"c13\",\n                                                                                                        \"type\": \"for\"\n                                                                                                    },\n                                                                                                    \"content\": \"simd\",\n                                                                                                    \"type\": \"mark\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                
                        \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c5 + c11, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": [\n                                                                                                        {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"out.fifo_cout_drain.1.1(c0, 1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 2 * p0 + 4 * c2 + c12, 4 * c3 + 2 * c6 + c10, 8 * c0 + 2 * c5 + c11)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                                                                                        }\n                                                                                                    ],\n                                                                                                    \"type\": \"if\"\n                                                                                                }\n                                                                                            ],\n                                
                                                            \"type\": \"block\"\n                                                                                        },\n                                                                                        \"content\": \"hls_pipeline\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    \"iterator\": \"c10\",\n                                                                                    \"type\": \"for\"\n                                                                                },\n                                                                                \"content\": \"latency\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c11\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c12\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                         
       \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c0\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"iterator\": \"c1\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"iterator\": \"c9\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c8\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(c0, 0, c2, c3, p0, 2 * p0 + c5, 4 * c2 + c6, 8 * c0 + c7)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                            
                \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                           
             },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n      
                      \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                     
       \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"r_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.4.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 4 * c2 + c8 + c12, 4 * c3 + 2 * c6 + c9 + c10, 8 * c1 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n       
                                                         },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n     
               \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p14\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t1\",\n                                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n 
                                       },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                 
                       \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                
\"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                            
                \"child\": {\n                                                                \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.4.1(c0, 1, c2, c3, p0, c5, c6, 0, 0, 0, c10, c11, c12, 0, 2 * p0 + 4 * c2 + c12, 4 * c3 + 2 * c6 + c10, 8 * c0 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    
\"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                
\"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n       
                     \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.4.1(c0, 0, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c2 + c12, 4 * c3 + 2 * c6 + c10, 8 * c0 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                   
 \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        
\"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t2*c_t1*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                 
   \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t2*c_t1*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            
\"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n             
               \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n           
                                     \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(c0, 1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c2 + c12, 4 * c3 + 2 * c6 + c10, 8 * c0 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                          
          \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t2*c_t1*o_t1\",\n                                            \"type\": \"array_tile\"\n                               
         },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                        },\n                         
               \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p17\",\n                    \"ele_size\": 4,\n                    \"last_dim\": \"i_t1\",\n                    \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                
\"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"p\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"q\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                                ],\n                                                      
          \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, 0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c5 + c11, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                                                    },\n                                    
                \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c0\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c1\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"(((((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": 
\"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            
\"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": 
\"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                
\"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel2_0.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n           
 \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n         
   \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": 
[\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(1, c1, c2, c3, p0, 4 * c1 + c5, 2 * p0 + 4 * c2 + c6, c7 + 8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                         
               {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(i_t1/i_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"p\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"q\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"r_t2\"\n                                                 
                   ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"bounds\": [\n                                                                                        \"0\",\n                                                                                        \"c_t2\"\n                                                                                    ],\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c6 + c8 + c10, 2 * p0 + 4 * c2 + c9 + c12, 8 * c0 + 2 * c7)\"\n                                                                                                    },\n    
                                                                                                \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c3 + 2 * c5 + c11, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"bounds\": [\n                                                                                                            \"0\",\n                                                                                                            \"i_t2\"\n                                                                                                        ],\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                            
                    \"child\": {\n                                                                                                                    \"user_expr\": \"S_0(8 * c3 + 2 * c5 + c11, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c12, 8 * c0 + 2 * c7 + c13, c8, c9)\"\n                                                                                                                },\n                                                                                                                \"type\": \"user\"\n                                                                                                            },\n                                                                                                            \"content\": \"hls_unroll\",\n                                                                                                            \"type\": \"mark\"\n                                                                                                        },\n                                                                                                        \"iterator\": \"c13\",\n                                                                                                        \"type\": \"for\"\n                                                                                                    },\n                                                                                                    \"content\": \"simd\",\n                                                                                                    \"type\": \"mark\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                
                        \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c3 + 2 * c5 + c11, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": [\n                                                                                                        {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"out.fifo_cout_drain.1.1(1, c1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c12, 8 * c3 + 2 * c5 + c11)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                                                                                        }\n                                                                                                    ],\n                                                                                                    \"type\": \"if\"\n                                                                                                }\n                                                                                            ],\n                                
                                                            \"type\": \"block\"\n                                                                                        },\n                                                                                        \"content\": \"hls_pipeline\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    \"iterator\": \"c10\",\n                                                                                    \"type\": \"for\"\n                                                                                },\n                                                                                \"content\": \"latency\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c11\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c12\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                         
       \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c0\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"iterator\": \"c1\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"iterator\": \"c9\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c8\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(0, c1, c2, c3, p0, 4 * c1 + c5, 2 * p0 + 4 * c2 + c6, c7)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                            
                \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                          
              },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n              
              },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                   
             \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"c_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.4.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c8 + c10, 2 * p0 + 4 * c2 + c9 + c12, 8 * c0 + 2 * c7)\"\n                                                     
                               },\n                                                                                    \"type\": \"user\"\n                                                                                },\n                                                                                \"content\": \"hls_pipeline\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"content\": \"simd\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                              
                      \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L1\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L2\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                      
  \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p14\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"i_t1\",\n                                            \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                
\"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            
\"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"r_t2\"\n                                ],\n                                \"child\": {\n                                  
  \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.4.1(1, c1, c2, c3, p0, c5, c6, 0, 0, 0, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c12, 8 * c3 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                           
                 \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n           
                         {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": 
\"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"r_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        
\"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.4.1(0, c1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c12, 8 * c3 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": 
\"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t1*c_t2*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                            
    \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t1*c_t2*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n          
                          },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n 
               \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                   
     \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"r_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(1, c1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c12, 8 * c3 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n            
                                        \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    
\"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t1*c_t2*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n           
         \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p17\",\n                  
  \"ele_size\": 4,\n                    \"last_dim\": \"i_t1\",\n                    \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"p\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"q\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                             
                       \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"c_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, 0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c3 + 2 * c5 + c11, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                            
        \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c0\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c1\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": 
\"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 
0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": 
\"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                
\"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n      
  }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel2_1.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        
\"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n          
          \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n                                    
                    \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"r_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"o_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"c_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": [\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c6 + c8 + c10, 2 * 
p0 + 4 * c2 + c9 + c12, 8 * c3 + 2 * c7)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c5 + c11, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"bounds\": [\n                                                                                                    \"0\",\n                                                                                                    \"i_t2\"\n                                                                                                ],\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                            
                            \"child\": {\n                                                                                                            \"user_expr\": \"S_0(8 * c0 + 2 * c5 + c11, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c12, 8 * c3 + 2 * c7 + c13, c8, c9)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    },\n                                                                                                    \"content\": \"hls_unroll\",\n                                                                                                    \"type\": \"mark\"\n                                                                                                },\n                                                                                                \"iterator\": \"c13\",\n                                                                                                \"type\": \"for\"\n                                                                                            },\n                                                                                            \"content\": \"simd\",\n                                                                                            \"type\": \"mark\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 
2 * c5 + c11, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"out.fifo_cout_drain.1.1(c0, c1, c2, 1, p0, c5, c6, 3, 2, 2, c10, c11, c12, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c12, 8 * c0 + 2 * c5 + c11)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                                            ],\n                                                                                            \"type\": \"if\"\n                                                                                        }\n                                                                                    ],\n                                                                                    \"type\": \"block\"\n                                                                                },\n                                                                                
\"content\": \"hls_pipeline\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                                \"type\": 
\"for\"\n                                            },\n                                            \"iterator\": \"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n   
                                     {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": 
\"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                
\"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.4.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c8 + c10, 2 * p0 + 4 * c2 + c9 + c12, 8 * c3 + 2 * c7)\"\n                                                                        },\n                  
                                                      \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                
\"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p14\",\n                                                    \"ele_size\": 4,\n                                                    
\"last_dim\": \"i_t1\",\n                                                    \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": 
[\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n            
                    \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"r_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                        
                                \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(c0, c1, c2, 1, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c12, 8 * c0 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": 
\"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p16\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"o_t1\",\n                                                \"size\": \"r_t1*c_t2*o_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                
                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                             
       {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"data_pack_factor\": \"p17\",\n                        \"ele_size\": 4,\n                        \"last_dim\": \"i_t1\",\n                        \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                        \"type\": \"array_tile\"\n                    },\n                    \"content\": \"access_serialize\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n    
                            \"0\",\n                                \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"p\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"q\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"c_t2\"\n                                                      
          ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, 0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c5 + c11, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                        
                            },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c0\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c1\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": 
\"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n      
      \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n 
           \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n    
            \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel2_2.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n           
 \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n         
   \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": 
[\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(c0, 1, c2, c3, p0, c5 + 4, 2 * p0 + 4 * c2 + c6, 8 * c0 + c7)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                         
               {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(i_t1/i_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"p\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"q\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"r_t2\"\n                                                 
                   ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"bounds\": [\n                                                                                        \"0\",\n                                                                                        \"c_t2\"\n                                                                                    ],\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c2 + 2 * c6 + c8 + c10, 2 * p0 + 4 * c3 + c9 + c12, 8 * c1 + 2 * c7)\"\n                                                                                                    },\n    
                                                                                                \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c5 + c11, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"bounds\": [\n                                                                                                            \"0\",\n                                                                                                            \"i_t2\"\n                                                                                                        ],\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                            
                    \"child\": {\n                                                                                                                    \"user_expr\": \"S_0(8 * c0 + 2 * c5 + c11, 4 * c2 + 2 * c6 + c10, 2 * p0 + 4 * c3 + c12, 8 * c1 + 2 * c7 + c13, c8, c9)\"\n                                                                                                                },\n                                                                                                                \"type\": \"user\"\n                                                                                                            },\n                                                                                                            \"content\": \"hls_unroll\",\n                                                                                                            \"type\": \"mark\"\n                                                                                                        },\n                                                                                                        \"iterator\": \"c13\",\n                                                                                                        \"type\": \"for\"\n                                                                                                    },\n                                                                                                    \"content\": \"simd\",\n                                                                                                    \"type\": \"mark\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                
                        \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c5 + c11, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                {\n                                                                                                    \"child\": [\n                                                                                                        {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"out.fifo_cout_drain.1.1(c0, 1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 4 * c2 + 2 * c6 + c10, 2 * p0 + 4 * c3 + c12, 8 * c0 + 2 * c5 + c11)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                                                                                        }\n                                                                                                    ],\n                                                                                                    \"type\": \"if\"\n                                                                                                }\n                                                                                            ],\n                                
                                                            \"type\": \"block\"\n                                                                                        },\n                                                                                        \"content\": \"hls_pipeline\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    \"iterator\": \"c10\",\n                                                                                    \"type\": \"for\"\n                                                                                },\n                                                                                \"content\": \"latency\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c11\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c12\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                         
       \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c0\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"iterator\": \"c1\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"iterator\": \"c9\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c8\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(c0, 0, c2, c3, p0, c5, 2 * p0 + 4 * c2 + c6, 8 * c0 + c7)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                            
                \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                           
             },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n      
                      \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                     
       \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.4.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c2 + 2 * c6 + c8 + c10, 2 * p0 + 4 * c3 + c9 + c12, 8 * c1 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n       
                                                         },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n     
               \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p14\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t1\",\n                                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n 
                                       },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                 
                       \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                
\"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"r_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                            
                \"child\": {\n                                                                \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.4.1(c0, 1, c2, c3, p0, c5, c6, 0, 0, 0, c10, c11, c12, 0, 4 * c2 + 2 * c6 + c10, 2 * p0 + 4 * c3 + c12, 8 * c0 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    
\"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                
\"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n       
                     \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"r_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.4.1(c0, 0, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c6 + c10, 2 * p0 + 4 * c3 + c12, 8 * c0 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                   
 \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        
\"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t1*c_t2*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                 
   \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p15\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t1*c_t2*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            
\"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n             
               \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"r_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n           
                                     \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(c0, 1, c2, c3, p0, c5, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c6 + c10, 2 * p0 + 4 * c3 + c12, 8 * c0 + 2 * c5 + c11)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                          
          \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t1*c_t2*o_t1\",\n                                            \"type\": \"array_tile\"\n                               
         },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                        },\n                         
               \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p17\",\n                    \"ele_size\": 4,\n                    \"last_dim\": \"i_t1\",\n                    \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                
\"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"p\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"q\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"c_t2\"\n                                                                ],\n                                                      
          \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, 0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c5 + c11, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                                                    },\n                                    
                \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c0\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c1\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": 
\"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            
\"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": 
\"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                
\"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel3_0.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out\": {\n   
         \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_inter\": {\n            \"double_buffer\": 
1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                
\"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n           
                                             \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"r_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": [\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c1 
+ 2 * c6 + c8 + c11, 4 * c2 + 2 * c7 + c9 + c10, 2 * p0 + 8 * c0)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, c5, c6, c7, 0, 0, c10, c11, c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c5 + c12)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                                            ],\n                                                                                            \"type\": \"if\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_w.2.1(c0, c1, 
c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c3 + 2 * c5 + c12, c8, c9, 2 * p0 + 8 * c0)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"bounds\": [\n                                                                                                    \"0\",\n                                                                                                    \"i_t2\"\n                                                                                                ],\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"child\": {\n                                                                                                            \"user_expr\": \"S_0(8 * c3 + 2 * c5 + c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p0 + 8 * c0 + c13, c8, c9)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    },\n                                                                  
                                  \"content\": \"hls_unroll\",\n                                                                                                    \"type\": \"mark\"\n                                                                                                },\n                                                                                                \"iterator\": \"c13\",\n                                                                                                \"type\": \"for\"\n                                                                                            },\n                                                                                            \"content\": \"simd\",\n                                                                                            \"type\": \"mark\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"out.fifo_cout_drain.1.1(1, c1, c2, c3, 3, c5, c6, c7, 2, 2, c10, c11, c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c5 + c12)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                
                            ],\n                                                                                            \"type\": \"if\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, c5, c6, c7, 2, 2, c10, c11, c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c5 + c12)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                                            ],\n                                                                                            \"type\": \"if\"\n                                                                                        }\n                                                                                    ],\n                                                                                    \"type\": \"block\"\n                                                                                },\n                                                                                \"content\": \"hls_pipeline\",\n                                                                                
\"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": 
\"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                              
      },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n      
                          \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n 
                                               \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"o_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                
                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c8 + c11, 4 * c2 + 2 * c7 + c9 + c10, 2 * p0 + 8 * c0)\"\n                                                                                    },\n                                                                                    \"type\": \"user\"\n                                                                                },\n                                                                                \"content\": \"hls_pipeline\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"content\": \"simd\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                 
                       },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L1\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L2\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n             
       \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p14\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"i_t2\",\n                                            \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n             
   \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p16\",\n                    \"ele_size\": 4,\n                    \"last_dim\": 
\"o_t1\",\n                    \"size\": \"r_t1*c_t1*o_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"c_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n 
                                                       ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_cout_1.fifo_cout_1_local.1.8.1(1, c1, c2, c3, 0, c5, c6, c7, 0, 0, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c5 + c12)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n        
                                    \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.1.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                            
        \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p16\",\n                    \"ele_size\": 4,\n                    \"last_dim\": \"o_t1\",\n                    \"size\": \"r_t1*c_t1*o_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                
\"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"c_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"in_trans.fifo_cout_1_local.fifo_cout_1.1.8.1(0, c1, c2, c3, 3, c5, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c5 + c12)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n             
                                               },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            
\"type\": \"mark\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"child\": {\n                \"child\": [\n                    {\n                        \"child\": {\n                            \"data_pack_factor\": \"p17\",\n                            \"ele_size\": 4,\n                            \"last_dim\": \"o_t1\",\n                            \"size\": \"r_t1*c_t1*o_t1\",\n                            \"type\": \"array_tile\"\n                        },\n                        \"content\": \"access_coalesce\",\n                        \"type\": \"mark\"\n                    },\n                    {\n                        
\"child\": {\n                            \"data_pack_factor\": \"p17\",\n                            \"ele_size\": 4,\n                            \"last_dim\": \"o_t1\",\n                            \"size\": \"r_t1*c_t1*o_t1\",\n                            \"type\": \"array_tile\"\n                        },\n                        \"content\": \"access_coalesce\",\n                        \"type\": \"mark\"\n                    }\n                ],\n                \"type\": \"if\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n     
                                                   \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(1, c1, c2, c3, 3, c5, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c5 + c12)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        
\"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p17\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"o_t1\",\n                                        \"size\": \"r_t1*c_t1*o_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                  
  \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                 
                           \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                     
           \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                             
               ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c3 + 2 * c5 + c12, c8, c9, 2 * p0 + 8 * c0)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                             
                   \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n       
     \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p18\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t2\",\n                                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                              
      \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t1)\",\n            
\"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"w_IO_L1_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p18\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                
\"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            
\"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        
{\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p18\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel3_1.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in\": {\n      
      \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    
\"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n    
                                                        \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"r_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": [\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c6 + c8 + c11, 4 * c2 + 2 * c7 + c9 + c10, 2 * p0 + 8 * 
c3)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, c5, c6, c7, 0, 0, c10, c11, c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 8 * c0 + 2 * c5 + c12)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                                            ],\n                                                                                            \"type\": \"if\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c5 + 
c12, c8, c9, 2 * p0 + 8 * c3)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"bounds\": [\n                                                                                                    \"0\",\n                                                                                                    \"i_t2\"\n                                                                                                ],\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"child\": {\n                                                                                                            \"user_expr\": \"S_0(8 * c0 + 2 * c5 + c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p0 + 8 * c3 + c13, c8, c9)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    },\n                                                                                                    \"content\": \"hls_unroll\",\n 
                                                                                                   \"type\": \"mark\"\n                                                                                                },\n                                                                                                \"iterator\": \"c13\",\n                                                                                                \"type\": \"for\"\n                                                                                            },\n                                                                                            \"content\": \"simd\",\n                                                                                            \"type\": \"mark\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, c5, c6, c7, 2, 2, c10, c11, c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 8 * c0 + 2 * c5 + c12)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                                            ],\n                                   
                                                         \"type\": \"if\"\n                                                                                        }\n                                                                                    ],\n                                                                                    \"type\": \"block\"\n                                                                                },\n                                                                                \"content\": \"hls_pipeline\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                      
                                  \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n         
       \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n      
  },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        
\"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                     
                           \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c8 + c11, 4 * c2 + 2 * c7 + c9 + c10, 2 * p0 + 8 * c3)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n            
                                },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                 
                       \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p14\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"i_t2\",\n                                                    \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": 
\"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": [\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.1.1()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.state_handle()\"\n                                },\n                                \"type\": \"user\"\n                            }\n                        ],\n                        \"type\": \"block\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"data_pack_factor\": \"p16\",\n                            \"ele_size\": 4,\n                            \"last_dim\": 
\"o_t1\",\n                            \"size\": \"r_t1*c_t1*o_t1\",\n                            \"type\": \"array_tile\"\n                        },\n                        \"content\": \"access_serialize\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"access_coalesce\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"array\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                        
                            \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in_trans_reduce_+.fifo_cout_1_local.fifo_cout_1.1.8.1(c0, c1, c2, c3, 3, c5, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 8 * c0 + 2 * c5 + c12)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                  
                                                  \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": 
\"io_L1\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L2\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n        
                        \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n              
      \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                      
              \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c5 + c12, c8, c9, 2 * p0 + 8 * c3)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                     
                                   \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                
        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p17\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"i_t2\",\n                                                    \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                        
        \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"w_IO_L1_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            
\"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": 
\"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                
\"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel3_2.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out\": {\n   
         \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_inter\": {\n            \"double_buffer\": 
1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                
\"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n           
                                             \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"r_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": [\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 4 * c2 
+ 2 * c6 + c8 + c11, 4 * c3 + 2 * c7 + c9 + c10, 2 * p0 + 8 * c1)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, c5, c6, c7, 0, 0, c10, c11, c12, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c5 + c12)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                                            ],\n                                                                                            \"type\": \"if\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"user_expr\": \"in.fifo_w.2.1(c0, c1, 
c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c5 + c12, c8, c9, 2 * p0 + 8 * c1)\"\n                                                                                            },\n                                                                                            \"type\": \"user\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": {\n                                                                                                \"bounds\": [\n                                                                                                    \"0\",\n                                                                                                    \"i_t2\"\n                                                                                                ],\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"child\": {\n                                                                                                            \"user_expr\": \"S_0(8 * c0 + 2 * c5 + c12, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 2 * p0 + 8 * c1 + c13, c8, c9)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    },\n                                                                  
                                  \"content\": \"hls_unroll\",\n                                                                                                    \"type\": \"mark\"\n                                                                                                },\n                                                                                                \"iterator\": \"c13\",\n                                                                                                \"type\": \"for\"\n                                                                                            },\n                                                                                            \"content\": \"simd\",\n                                                                                            \"type\": \"mark\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"out.fifo_cout_drain.1.1(c0, 1, c2, c3, 3, c5, c6, c7, 2, 2, c10, c11, c12, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c5 + c12)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                
                            ],\n                                                                                            \"type\": \"if\"\n                                                                                        },\n                                                                                        {\n                                                                                            \"child\": [\n                                                                                                {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, c5, c6, c7, 2, 2, c10, c11, c12, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c5 + c12)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                }\n                                                                                            ],\n                                                                                            \"type\": \"if\"\n                                                                                        }\n                                                                                    ],\n                                                                                    \"type\": \"block\"\n                                                                                },\n                                                                                \"content\": \"hls_pipeline\",\n                                                                                
\"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": 
\"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                 
                               \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                   
             \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n        
                                \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c2 + 2 * c6 + c8 + c11, 4 * c3 + 2 * c7 + c9 + c10, 2 * p0 + 8 * c1)\"\n                                                                        },\n                                                                        \"type\": 
\"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": 
\"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p14\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t2\",\n                                                \"size\": 
\"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                         
           \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p16\",\n                    \"ele_size\": 4,\n                    \"last_dim\": \"o_t1\",\n                    \"size\": \"r_t1*c_t1*o_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                               
 \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"c_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_cout_1.fifo_cout_1_local.1.8.1(c0, 1, c2, c3, 0, c5, c6, c7, 0, 0, c10, c11, c12, 0, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c5 + c12)\"\n                                                                    },\n                                            
                        \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n           
             \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.1.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n      
  \"cout_1_IO_L2_out_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p16\",\n                    \"ele_size\": 4,\n                    \"last_dim\": \"o_t1\",\n                    \"size\": \"r_t1*c_t1*o_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"c_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    
\"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"in_trans.fifo_cout_1_local.fifo_cout_1.1.8.1(c0, 0, c2, c3, 3, c5, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c5 + c12)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                           
     \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c9\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c8\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n           
                     },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"child\": {\n                \"child\": [\n                    {\n                        \"child\": {\n                            \"data_pack_factor\": \"p17\",\n                            \"ele_size\": 4,\n                            \"last_dim\": \"o_t1\",\n                            \"size\": \"r_t1*c_t1*o_t1\",\n                            \"type\": \"array_tile\"\n                        },\n                        \"content\": \"access_coalesce\",\n                        \"type\": \"mark\"\n                    },\n                    {\n                        \"child\": {\n                            \"data_pack_factor\": \"p17\",\n                            \"ele_size\": 4,\n                            \"last_dim\": \"o_t1\",\n                            \"size\": \"r_t1*c_t1*o_t1\",\n                            \"type\": \"array_tile\"\n                        },\n                        \"content\": \"access_coalesce\",\n                        \"type\": \"mark\"\n                    }\n                ],\n                \"type\": \"if\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                
        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(c0, 1, c2, c3, 3, c5, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c5 + c12)\"\n        
                                                        },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c7\",\n                    \"type\": 
\"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p17\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"o_t1\",\n                                        \"size\": \"r_t1*c_t1*o_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": 
\"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                  
              \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                               
             \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"r_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                    
    \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, c5, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c5 + c12, c8, c9, 2 * p0 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    \"content\": \"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"content\": \"simd\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                   
                                                         \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        
\"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p18\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"i_t2\",\n                                        \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": 
\"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c6\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t1)\",\n            
\"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"w_IO_L1_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p18\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                
\"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            
\"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        
{\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p18\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel4_0.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n           
 \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 
1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\",\n                \"(r_t1/r_t2)\"\n            
],\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(r_t1/r_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    
\"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(1, c1, c2, c3, p0, p1, 2 * p1 + 4 * c1 + c6, 4 * c2 + c7, 2 * p0 + c8 + 8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                             
                       \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"p\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"q\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"c_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"r_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                
\"bounds\": [\n                                                                                    \"0\",\n                                                                                    \"o_t2\"\n                                                                                ],\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p1 + 4 * c1 + c8 + c11, 4 * c2 + 2 * c6 + c9 + c10, 8 * c0 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c3 + c12, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                                                },\n                                                                                                
\"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"bounds\": [\n                                                                                                        \"0\",\n                                                                                                        \"i_t2\"\n                                                                                                    ],\n                                                                                                    \"child\": {\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"S_0(2 * p0 + 8 * c3 + c12, 2 * p1 + 4 * c1 + c11, 4 * c2 + 2 * c6 + c10, 8 * c0 + 2 * c7 + c13, c8, c9)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                                                                                        },\n                                                                                                        \"content\": \"hls_unroll\",\n                                                                                                        \"type\": \"mark\"\n                                      
                                                              },\n                                                                                                    \"iterator\": \"c13\",\n                                                                                                    \"type\": \"for\"\n                                                                                                },\n                                                                                                \"content\": \"simd\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c3 + c12, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": [\n                                                                                                    {\n                                                                                                        \"child\": {\n                                                                              
                              \"user_expr\": \"out.fifo_cout_drain.1.1(1, c1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 2 * p1 + 4 * c1 + c11, 4 * c2 + 2 * c6 + c10, 2 * p0 + 8 * c3 + c12)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    }\n                                                                                                ],\n                                                                                                \"type\": \"if\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p1 + 4 * c1 + c8 + c11, 4 * c2 + 2 * c6 + c9 + c10, 8 * c0 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"block\"\n                                                                                    },\n                                                                                    
\"content\": \"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"iterator\": \"c10\",\n                                                                                \"type\": \"for\"\n                                                                            },\n                                                                            \"content\": \"latency\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c11\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c12\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c0\",\n                                                        \"type\": \"for\"\n                                                    },\n                                        
            \"iterator\": \"c1\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c9\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c8\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(0, c1, c2, c3, p0, p1, 2 * p1 + 4 * c1 + c6, 4 * c2 + c7, 2 * p0 + c8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        
\"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    
\"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            
],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                              
      \"r_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"o_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 4 * c1 + c8 + c11, 4 * c2 + 2 * c6 + c9 + c10, 8 * c0 + 2 * c7)\"\n                                                                                    },\n                                                                                    \"type\": \"user\"\n                                                                                },\n                                                                                \"content\": \"hls_pipeline\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"content\": \"simd\",\n                                                                            \"type\": \"mark\"\n             
                                                           },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                             
       \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p14\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"i_t1\",\n                                            \"size\": 
\"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": 
\"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": 
\"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(c_t1/c_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"c_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"r_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n     
                                           \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.2.1(1, c1, c2, c3, p0, p1, c6, 0, 0, 0, c10, c11, c12, 0, 2 * p0 + 4 * c1 + c11, 4 * c2 + 2 * c6 + c10, 2 * p1 + 8 * c3 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n          
      },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n           
             \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n  
                      \"0\",\n                        \"(c_t1/c_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"c_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"r_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.2.1(0, c1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c1 + c11, 4 * c2 + 2 * c6 + c10, 2 * p1 + 8 * c3 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                  
          },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": 
{\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": 
{\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": 
\"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": 
\"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                 
                   \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                
\"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t2*c_t1*o_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    
\"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t2*c_t1*o_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n   
                                             \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"user_expr\": 
\"io_module.intra_inter.0.0()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                    
        \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(c_t1/c_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"c_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"r_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.2.1(1, c1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c1 + c11, 4 * c2 + 2 * c6 + c10, 2 * p1 + 8 * c3 + c12)\"\n                                                       
 },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": 
{\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n             
                                   \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                  
  \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    
\"data_pack_factor\": \"p16\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t2*c_t1*o_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n        
        ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            
\"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n        
                        \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                            
                                \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 8 * c3 + c12, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n     
                                   \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            
\"child\": {\n                                                \"data_pack_factor\": \"p17\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t1\",\n                                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(r_t1/r_t2))\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": 
\"(((((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(o_t1/o_t2))\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(o_t1/o_t2))\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(o_t1/o_t2))\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t2*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n        
        \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        
{\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n     
       \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel4_1.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        
\"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(r_t1/r_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        
\"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n       
                                         \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"o_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n       
                                                                             {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p1 + 4 * c1 + c8 + c11, 4 * c2 + 2 * c6 + c9 + c10, 8 * c3 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                                                              
                              ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(2 * p0 + 8 * c0 + c12, 2 * p1 + 4 * c1 + c11, 4 * c2 + 2 * c6 + c10, 8 * c3 + 2 * c7 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                                                                        \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    {\n                                
                                                        \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_drain.1.1(c0, c1, c2, 1, p0, p1, c6, 3, 2, 2, c10, c11, c12, 2 * p1 + 4 * c1 + c11, 4 * c2 + 2 * c6 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                      
                  \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p1 + 4 * c1 + c8 + c11, 4 * c2 + 2 * c6 + c9 + c10, 8 * c3 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                               
             \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n            
    \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n       
         \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                       
     \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                 
                                   \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 4 * c1 + c8 + c11, 4 * c2 + 2 * c6 + c9 + c10, 8 * c3 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                       
     \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        
\"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p14\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"i_t1\",\n                                                    \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": 
\"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                       
         \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(c_t1/c_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"c_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"r_t2\"\n                                    ],\n                                    
\"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.2.1(c0, c1, c2, 1, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c1 + c11, 4 * c2 + 2 * c6 + c10, 2 * p1 + 8 * c0 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n       
                     },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            
\"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                     
                   },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        
\"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p16\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t2\",\n                                                        \"size\": \"r_t2*c_t1*o_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_serialize\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": 
\"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                       
     \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        
{\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                               
             ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                
                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n       
     \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p17\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"i_t1\",\n                                                    \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                 
                           \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(r_t1/r_t2))\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"(((((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": 
\"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(o_t1/o_t2))\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t2*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n    
            \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n      
      ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                
\"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel4_2.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n           
 \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 
1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\",\n                \"(r_t1/r_t2)\"\n            
],\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(r_t1/r_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    
\"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(c0, 1, c2, c3, p0, p1, 2 * p1 + c6 + 4, 4 * c2 + c7, 2 * p0 + 8 * c0 + c8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                             
                       \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"p\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"q\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"c_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"r_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                
\"bounds\": [\n                                                                                    \"0\",\n                                                                                    \"o_t2\"\n                                                                                ],\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p1 + 4 * c2 + c8 + c11, 4 * c3 + 2 * c6 + c9 + c10, 8 * c1 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                                },\n                                                                                                
\"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"bounds\": [\n                                                                                                        \"0\",\n                                                                                                        \"i_t2\"\n                                                                                                    ],\n                                                                                                    \"child\": {\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"S_0(2 * p0 + 8 * c0 + c12, 2 * p1 + 4 * c2 + c11, 4 * c3 + 2 * c6 + c10, 8 * c1 + 2 * c7 + c13, c8, c9)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                                                                                        },\n                                                                                                        \"content\": \"hls_unroll\",\n                                                                                                        \"type\": \"mark\"\n                                      
                                                              },\n                                                                                                    \"iterator\": \"c13\",\n                                                                                                    \"type\": \"for\"\n                                                                                                },\n                                                                                                \"content\": \"simd\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": [\n                                                                                                    {\n                                                                                                        \"child\": {\n                                                                              
                              \"user_expr\": \"out.fifo_cout_drain.1.1(c0, 1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 2 * p1 + 4 * c2 + c11, 4 * c3 + 2 * c6 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    }\n                                                                                                ],\n                                                                                                \"type\": \"if\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p1 + 4 * c2 + c8 + c11, 4 * c3 + 2 * c6 + c9 + c10, 8 * c1 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"block\"\n                                                                                    },\n                                                                                    
\"content\": \"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"iterator\": \"c10\",\n                                                                                \"type\": \"for\"\n                                                                            },\n                                                                            \"content\": \"latency\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c11\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c12\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c0\",\n                                                        \"type\": \"for\"\n                                                    },\n                                        
            \"iterator\": \"c1\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c9\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c8\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(c0, 0, c2, c3, p0, p1, 2 * p1 + c6, 4 * c2 + c7, 2 * p0 + 8 * c0 + c8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        
\"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": 
\"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                
        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                     
   \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 4 * c2 + c8 + c11, 4 * c3 + 2 * c6 + c9 + c10, 8 * c1 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": 
\"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        
\"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p14\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t1\",\n                                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": 
\"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                         
   \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n          
  \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(c_t1/c_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"c_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"r_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.2.1(c0, 1, c2, c3, p0, p1, c6, 0, 0, 0, c10, c11, c12, 0, 2 * p0 + 4 * c2 + c11, 4 * c3 + 2 * c6 + c10, 2 * p1 + 8 * c0 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n           
                                     \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n              
                          {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": 
\"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(c_t1/c_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"c_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"r_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                
                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.2.1(c0, 0, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c2 + c11, 4 * c3 + 2 * c6 + c10, 2 * p1 + 8 * c0 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n        
                \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                    
                        \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                         
           },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n 
                                   \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n 
                                                   \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                   
     \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t2*c_t1*o_t2\",\n                                                    \"type\": \"array_tile\"\n                     
                           },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                   
                 \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t2*c_t1*o_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n  
                      \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n        
        \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t2*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(c_t1/c_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"c_t2\"\n                            ],\n                            \"child\": {\n                           
     \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"r_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.2.1(c0, 1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c2 + c11, 4 * c3 + 2 * c6 + c10, 2 * p1 + 8 * c0 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                         
           \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                           
                     \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            
\"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t2*c_t1*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n              
  \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p16\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t2*c_t1*o_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n            
                        },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        
\"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"bounds\": [\n     
           \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                         
                                   \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"r_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    \"content\": 
\"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"content\": \"simd\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                   
 \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"pe\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(o_t1/o_t2)\"\n            
                ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p17\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"i_t1\",\n                                        \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c6\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(r_t1/r_t2))\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"(((((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n  
          \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(o_t1/o_t2))\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(o_t1/o_t2))\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(o_t1/o_t2))\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t2*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            
\"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": 
true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n          
      \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel5_0.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n           
 \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 
1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\",\n                \"(c_t1/c_t2)\"\n            
],\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(c_t1/c_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    
\"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(1, c1, c2, c3, p0, p1, 4 * c1 + c6, 2 * p1 + 4 * c2 + c7, 2 * p0 + c8 + 8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                             
                       \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"p\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"q\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"c_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                
\"bounds\": [\n                                                                                    \"0\",\n                                                                                    \"o_t2\"\n                                                                                ],\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c6 + c8 + c10, 2 * p1 + 4 * c2 + c9 + c11, 8 * c0 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c3 + c12, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                                                },\n                                                                                                
\"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"bounds\": [\n                                                                                                        \"0\",\n                                                                                                        \"i_t2\"\n                                                                                                    ],\n                                                                                                    \"child\": {\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"S_0(2 * p0 + 8 * c3 + c12, 4 * c1 + 2 * c6 + c10, 2 * p1 + 4 * c2 + c11, 8 * c0 + 2 * c7 + c13, c8, c9)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                                                                                        },\n                                                                                                        \"content\": \"hls_unroll\",\n                                                                                                        \"type\": \"mark\"\n                                      
                                                              },\n                                                                                                    \"iterator\": \"c13\",\n                                                                                                    \"type\": \"for\"\n                                                                                                },\n                                                                                                \"content\": \"simd\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c3 + c12, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": [\n                                                                                                    {\n                                                                                                        \"child\": {\n                                                                              
                              \"user_expr\": \"out.fifo_cout_drain.1.1(1, c1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 4 * c1 + 2 * c6 + c10, 2 * p1 + 4 * c2 + c11, 2 * p0 + 8 * c3 + c12)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    }\n                                                                                                ],\n                                                                                                \"type\": \"if\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c6 + c8 + c10, 2 * p1 + 4 * c2 + c9 + c11, 8 * c0 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"block\"\n                                                                                    },\n                                                                                    
\"content\": \"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"iterator\": \"c10\",\n                                                                                \"type\": \"for\"\n                                                                            },\n                                                                            \"content\": \"latency\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c11\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c12\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c0\",\n                                                        \"type\": \"for\"\n                                                    },\n                                        
            \"iterator\": \"c1\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c9\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c8\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(0, c1, c2, c3, p0, p1, 4 * c1 + c6, 2 * p1 + 4 * c2 + c7, 2 * p0 + c8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        
\"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    
\"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            
],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                              
      \"c_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"o_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c8 + c10, 2 * p0 + 4 * c2 + c9 + c11, 8 * c0 + 2 * c7)\"\n                                                                                    },\n                                                                                    \"type\": \"user\"\n                                                                                },\n                                                                                \"content\": \"hls_pipeline\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"content\": \"simd\",\n                                                                            \"type\": \"mark\"\n             
                                                           },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                             
       \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p14\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"i_t1\",\n                                            \"size\": 
\"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": 
\"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": 
\"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"r_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n     
                                           \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.2.1(1, c1, c2, c3, p0, p1, c6, 0, 0, 0, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c11, 2 * p1 + 8 * c3 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n          
      },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n           
             \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n  
                      \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"r_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.2.1(0, c1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c11, 2 * p1 + 8 * c3 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                  
          },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": 
{\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": 
{\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": 
\"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": 
\"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                 
                   \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                
\"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t1*c_t2*o_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    
\"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t1*c_t2*o_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n   
                                             \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"user_expr\": 
\"io_module.intra_inter.0.0()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                    
        \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"r_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.2.1(1, c1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c11, 2 * p1 + 8 * c3 + c12)\"\n                                                       
 },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": 
{\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n             
                                   \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                  
  \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    
\"data_pack_factor\": \"p16\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t1*c_t2*o_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n        
        ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            
\"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n        
                        \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"c_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                            
                                \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 8 * c3 + c12, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n     
                                   \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            
\"child\": {\n                                                \"data_pack_factor\": \"p17\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t1\",\n                                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(c_t1/c_t2))\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": 
\"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(o_t1/o_t2))\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(o_t1/o_t2))\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(o_t1/o_t2))\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t2*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n        
        \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        
{\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n     
       \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel5_1.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        
\"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(c_t1/c_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        
\"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n       
                                         \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"c_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"o_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n       
                                                                             {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c6 + c8 + c10, 2 * p1 + 4 * c2 + c9 + c11, 8 * c3 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                                                              
                              ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(2 * p0 + 8 * c0 + c12, 4 * c1 + 2 * c6 + c10, 2 * p1 + 4 * c2 + c11, 8 * c3 + 2 * c7 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                                                                        \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    {\n                                
                                                        \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_drain.1.1(c0, c1, c2, 1, p0, p1, c6, 3, 2, 2, c10, c11, c12, 4 * c1 + 2 * c6 + c10, 2 * p1 + 4 * c2 + c11, 2 * p0 + 8 * c0 + c12)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                      
                  \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c6 + c8 + c10, 2 * p1 + 4 * c2 + c9 + c11, 8 * c3 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                               
             \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n            
    \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n       
         \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                       
     \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"c_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                 
                                   \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c8 + c10, 2 * p0 + 4 * c2 + c9 + c11, 8 * c3 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                       
     \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        
\"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p14\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"i_t1\",\n                                                    \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": 
\"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                       
         \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"r_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    
\"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.2.1(c0, c1, c2, 1, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c11, 2 * p1 + 8 * c0 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n       
                     },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            
\"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                     
                   },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        
\"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p16\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t2\",\n                                                        \"size\": \"r_t1*c_t2*o_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_serialize\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": 
\"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                       
     \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        
{\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                               
             ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"c_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                
                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n       
     \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p17\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"i_t1\",\n                                                    \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                 
                           \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(c_t1/c_t2))\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": 
\"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(o_t1/o_t2))\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t2*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n    
            \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n      
      ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                
\"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel5_2.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n           
 \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 
1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\",\n                \"(c_t1/c_t2)\"\n            
],\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(c_t1/c_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    
\"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(c0, 1, c2, c3, p0, p1, c6 + 4, 2 * p1 + 4 * c2 + c7, 2 * p0 + 8 * c0 + c8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                             
                       \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"p\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"q\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"c_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                
\"bounds\": [\n                                                                                    \"0\",\n                                                                                    \"o_t2\"\n                                                                                ],\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c2 + 2 * c6 + c8 + c10, 2 * p1 + 4 * c3 + c9 + c11, 8 * c1 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                                },\n                                                                                                
\"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"bounds\": [\n                                                                                                        \"0\",\n                                                                                                        \"i_t2\"\n                                                                                                    ],\n                                                                                                    \"child\": {\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"S_0(2 * p0 + 8 * c0 + c12, 4 * c2 + 2 * c6 + c10, 2 * p1 + 4 * c3 + c11, 8 * c1 + 2 * c7 + c13, c8, c9)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                                                                                        },\n                                                                                                        \"content\": \"hls_unroll\",\n                                                                                                        \"type\": \"mark\"\n                                      
                                                              },\n                                                                                                    \"iterator\": \"c13\",\n                                                                                                    \"type\": \"for\"\n                                                                                                },\n                                                                                                \"content\": \"simd\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": [\n                                                                                                    {\n                                                                                                        \"child\": {\n                                                                              
                              \"user_expr\": \"out.fifo_cout_drain.1.1(c0, 1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 4 * c2 + 2 * c6 + c10, 2 * p1 + 4 * c3 + c11, 2 * p0 + 8 * c0 + c12)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    }\n                                                                                                ],\n                                                                                                \"type\": \"if\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c2 + 2 * c6 + c8 + c10, 2 * p1 + 4 * c3 + c9 + c11, 8 * c1 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"block\"\n                                                                                    },\n                                                                                    
\"content\": \"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"iterator\": \"c10\",\n                                                                                \"type\": \"for\"\n                                                                            },\n                                                                            \"content\": \"latency\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c11\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c12\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c0\",\n                                                        \"type\": \"for\"\n                                                    },\n                                        
            \"iterator\": \"c1\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c9\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c8\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(c0, 0, c2, c3, p0, p1, c6, 2 * p1 + 4 * c2 + c7, 2 * p0 + 8 * c0 + c8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        
\"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": 
\"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                
        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"c_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                     
   \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c2 + 2 * c6 + c8 + c10, 2 * p0 + 4 * c3 + c9 + c11, 8 * c1 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": 
\"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        
\"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p14\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t1\",\n                                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": 
\"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                         
   \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n          
  \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"r_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.2.1(c0, 1, c2, c3, p0, p1, c6, 0, 0, 0, c10, c11, c12, 0, 4 * c2 + 2 * c6 + c10, 2 * p0 + 4 * c3 + c11, 2 * p1 + 8 * c0 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n           
                                     \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n              
                          {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": 
\"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"r_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                
                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.2.1(c0, 0, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c6 + c10, 2 * p0 + 4 * c3 + c11, 2 * p1 + 8 * c0 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n        
                \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                    
                        \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                         
           },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n 
                                   \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n 
                                                   \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                   
     \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t1*c_t2*o_t2\",\n                                                    \"type\": \"array_tile\"\n                     
                           },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                   
                 \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t1*c_t2*o_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n  
                      \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n        
        \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t2*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"r_t2\"\n                            ],\n                            \"child\": {\n                           
     \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.2.1(c0, 1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c6 + c10, 2 * p0 + 4 * c3 + c11, 2 * p1 + 8 * c0 + c12)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                         
           \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                           
                     \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(o_t1/o_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            
\"last_dim\": \"o_t2\",\n                                                            \"size\": \"r_t1*c_t2*o_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n              
  \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p16\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t2\",\n                                                    \"size\": \"r_t1*c_t2*o_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n            
                        },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        
\"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"bounds\": [\n     
           \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"r_t2\"\n                                                            ],\n                         
                                   \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"c_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"o_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p0 + 8 * c0 + c12, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    \"content\": 
\"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"content\": \"simd\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                   
 \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"pe\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(o_t1/o_t2)\"\n            
                ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p17\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"i_t1\",\n                                        \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c6\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(c_t1/c_t2))\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n  
          \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(o_t1/o_t2))\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t2)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(o_t1/o_t2))\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(o_t1/o_t2))\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t2*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            
\"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": 
true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n          
      \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel6_0.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out\": {\n   
         \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            
\"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(i_t1/i_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            
\"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_in\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_1_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n       
                 ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                         
       ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"o_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c6 + c8 + c11, 4 * c2 + 2 * c7 + c9 + c10, 2 * p1 + 8 * c0)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                   
 \"user_expr\": \"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 0, 0, c10, c11, c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p0 + 8 * c3 + c12)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c3 + c12, c8, c9, 2 * p1 + 8 * c0)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                             
                                                               ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(2 * p0 + 8 * c3 + c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p1 + 8 * c0 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                                                                        \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    
{\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_drain.1.1(1, c1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p0 + 8 * c3 + c12)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 2, 2, c10, c11, c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p0 + 8 * c3 + c12)\"\n                                                                                                },\n                                                                         
                       \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c6 + c8 + c11, 4 * c2 + 2 * c7 + c9 + c10, 2 * p1 + 8 * c0)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                       
                             \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n 
                       },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": 
\"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n   
     },\n        \"cin_IO_L2_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                
\"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"o_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c8 + c11, 4 * c2 + 2 * c7 + c9 + c10, 2 * p0 + 8 * c0)\"\n                                                                                    },\n                                                                                    \"type\": \"user\"\n                                                                                },\n                                                                                \"content\": \"hls_pipeline\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                        
                    \"content\": \"simd\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                      
  \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p14\",\n                                            \"ele_size\": 4,\n                                          
  \"last_dim\": \"i_t2\",\n                                            \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                      
                      \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            
\"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n    
                                                    \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out_trans.fifo_cout_1.fifo_cout_1_local.1.2.1(1, c1, c2, c3, p0, 0, c6, c7, 0, 0, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p0 + 8 * c3 + c12)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        
\"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                   
         \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n            
                \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                               
                             \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in_trans.fifo_cout_1_local.fifo_cout_1.1.2.1(0, c1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p0 + 8 * c3 + c12)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                     
           \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t2\",\n                                            \"size\": \"r_t1*c_t1*o_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n 
                                   },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t2\",\n                                            \"size\": 
\"r_t1*c_t1*o_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"child\": [\n                {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"ceil((o/o_t1))\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                          
                      \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L3\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"array\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c2\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                }\n            ],\n            \"type\": \"if\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                  
              \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n                                                ],\n                                                
\"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.2.1(1, c1, c2, c3, 3, p1, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p1 + 8 * c3 + c12)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n              
              \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t2\",\n                                                        \"size\": \"r_t1*c_t1*o_t2\",\n                                                        \"type\": \"array_tile\"\n                      
                              },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t2\",\n                                                        \"size\": \"r_t1*c_t1*o_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": 
\"c6\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p17\",\n                                                
\"ele_size\": 4,\n                                                \"last_dim\": \"o_t2\",\n                                                \"size\": \"r_t1*c_t1*o_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            
\"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                                        ],\n                                        \"type\": \"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": 
\"for\"\n        },\n        \"w_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n       
                 \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"c_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p1 + 8 * c3 + c12, c8, c9, 2 * p0 + 8 * c0)\"\n                  
                                                  },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            
\"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(o_t1/o_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n          
                                                  \"child\": {\n                                                                \"data_pack_factor\": \"p18\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(o_t1/o_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p18\",\n                                                                \"ele_size\": 4,\n                                                            
    \"last_dim\": \"i_t2\",\n                                                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            
\"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p18\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"i_t2\",\n                                                        \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                        \"type\": \"array_tile\"\n                                          
          },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(i_t1/i_t2))\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": 
\"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"w_IO_L1_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t2*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p18\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(o_t1/o_t2))\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n       
         \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        
{\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n     
       \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p18\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel6_1.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L3_out\": 
{\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(i_t1/i_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": 
[\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_1_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                 
                           ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"o_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n                                                                        
            {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c6 + c8 + c11, 4 * c2 + 2 * c7 + c9 + c10, 2 * p1 + 8 * c3)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 0, 0, c10, c11, c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    
{\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 2 * p1 + 8 * c3)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                                                                                            ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(2 * p0 + 8 * c0 + c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p1 + 8 * c3 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": 
\"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                                                                        \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 2, 2, c10, c11, c12, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                     
                                       }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c6 + c8 + c11, 4 * c2 + 2 * c7 + c9 + c10, 2 * p1 + 8 * c3)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                        
                            \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                      
  \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": 
\"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n              
              \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n   
                                                     \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c6 + c8 + c11, 4 * c2 + 2 * c7 + c9 + c10, 2 * p0 + 8 * c3)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                        
                                \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n              
          \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p14\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"i_t2\",\n                                                    \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": 
\"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": [\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.state_handle()\"\n                                },\n                                \"type\": \"user\"\n                            }\n                        ],\n                        \"type\": \"block\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                
\"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p16\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"o_t2\",\n                                        \"size\": \"r_t1*c_t1*o_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p16\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"o_t2\",\n                                        \"size\": \"r_t1*c_t1*o_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                }\n                            ],\n                            \"type\": \"if\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c6\",\n  
                  \"type\": \"for\"\n                },\n                \"content\": \"io_L3\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"array\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                               
 \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in_trans_reduce_+.fifo_cout_1_local.fifo_cout_1.1.2.1(c0, c1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                          
  \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n    
            \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p16\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"o_t2\",\n                                                \"size\": \"r_t1*c_t1*o_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n               
         \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                       
                 ],\n                                        \"type\": \"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": 
\"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"c_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                          
                      \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p1 + 8 * c0 + c12, c8, c9, 2 * p0 + 8 * c3)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                 
               },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                     
   \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(o_t1/o_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p17\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                     
               \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(o_t1/o_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p17\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                
        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                      
                  \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p17\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"i_t2\",\n                                                            \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_serialize\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n              
                      \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(i_t1/i_t2))\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"w_IO_L1_in\": {\n            \"array\": \"w\",\n            \"buf_size\": 
\"(((o_t2*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(o_t1/o_t2))\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n      
          \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n  
          \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel6_2.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out\": {\n   
         \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            
\"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(i_t1/i_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L2_in\": {\n            
\"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_in\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": [\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_1_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(o_t1/o_t2)\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n       
                 ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                         
       ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"o_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c2 + 2 * c6 + c8 + c11, 4 * c3 + 2 * c7 + c9 + c10, 2 * p1 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                   
 \"user_expr\": \"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 0, 0, c10, c11, c12, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 8 * c0 + c12, c8, c9, 2 * p1 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                             
                                                               ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(2 * p0 + 8 * c0 + c12, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 2 * p1 + 8 * c1 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                                                                        \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    
{\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_drain.1.1(c0, 1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 2, 2, c10, c11, c12, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                                                                },\n                                                                         
                       \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c2 + 2 * c6 + c8 + c11, 4 * c3 + 2 * c7 + c9 + c10, 2 * p1 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                       
                             \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n 
                       },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    
\"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n       
                     },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                       
                             ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c2 + 2 * c6 + c8 + c11, 4 * c3 + 2 * c7 + c9 + c10, 2 * p0 + 8 * c1)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n     
                                                   \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    
\"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p14\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t2\",\n                                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n             
               },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            
\"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        
\"cout_1_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out_trans.fifo_cout_1.fifo_cout_1_local.1.2.1(c0, 1, c2, c3, p0, 0, c6, c7, 0, 0, c10, c11, c12, 0, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 2 * p0 + 8 * c0 + c12)\"\n          
                                                      },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n  
              },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n     
       },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n        
                        \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in_trans.fifo_cout_1_local.fifo_cout_1.1.2.1(c0, 0, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 2 * p0 + 8 * c0 + c12)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            
\"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n     
           \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t2\",\n                                            \"size\": \"r_t1*c_t1*o_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": 
\"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(o_t1/o_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t2\",\n                                            \"size\": \"r_t1*c_t1*o_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                
        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"child\": [\n                {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((r/r_t1))\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"ceil((c/c_t1))\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L3\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"array\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n          
                  \"type\": \"for\"\n                        },\n                        \"iterator\": \"c3\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                }\n            ],\n            \"type\": \"if\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t2\",\n                                \"size\": \"r_t1*c_t1*o_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n              
      \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.2.1(c0, 1, c2, c3, 3, p1, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c6 + c11, 4 * c3 + 2 * c7 + c10, 2 * p1 + 8 * c0 + c12)\"\n                                                            },\n                                                            \"type\": \"user\"\n                                                        },\n                                                        
\"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n          
      \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t2\",\n                                                        \"size\": \"r_t1*c_t1*o_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                  
              \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t2\",\n                                                        \"size\": \"r_t1*c_t1*o_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                
\"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p17\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"o_t2\",\n                                                \"size\": \"r_t1*c_t1*o_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": 
\"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                   
                             \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                                        ],\n                                        \"type\": \"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(o_t1/o_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": 
\"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"c_t2\"\n                                        ],\n                                        \"child\": {\n 
                                           \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p1 + 8 * c0 + c12, c8, c9, 2 * p0 + 8 * c1)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": 
\"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                
    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(o_t1/o_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p18\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            
\"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(o_t1/o_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p18\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n         
                                           \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                          
  \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p18\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"i_t2\",\n                                                        \"size\": \"o_t2*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n  
                                  \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((o_t1/o_t2)*(i_t1/i_t2))\"\n        },\n        \"cin_IO_L2_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            
\"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t1)*o_t2)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(o_t1/o_t2)\"\n        },\n        \"w_IO_L1_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t2*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p18\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(o_t1/o_t2))\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": 
\"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            
\"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n             
   \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p18\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel7_0.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            
\"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            
\"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            
\"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(c_t1/c_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            
]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(1, c1, c2, c3, p0, p1, 2 * p0 + 4 * c1 + c6, 2 * p1 + 4 * c2 + c7, c8 + 8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        {\n       
                                     \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"p\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"q\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                         
                   \"c_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"bounds\": [\n                                                                                    \"0\",\n                                                                                    \"r_t2\"\n                                                                                ],\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 4 * c1 + c8 + c12, 2 * p1 + 4 * c2 + c9 + c11, 8 * c0 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                          
                          \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c3 + 2 * c6 + c10, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"bounds\": [\n                                                                                                        \"0\",\n                                                                                                        \"i_t2\"\n                                                                                                    ],\n                                                                                                    \"child\": {\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"S_0(8 * c3 + 2 * c6 + c10, 2 * p0 + 4 * c1 + c12, 2 * p1 + 4 * c2 + c11, 8 * c0 + 2 * c7 + c13, c8, c9)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                             
                                                           },\n                                                                                                        \"content\": \"hls_unroll\",\n                                                                                                        \"type\": \"mark\"\n                                                                                                    },\n                                                                                                    \"iterator\": \"c13\",\n                                                                                                    \"type\": \"for\"\n                                                                                                },\n                                                                                                \"content\": \"simd\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c3 + 2 * c6 + c10, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                  
                                              \"child\": [\n                                                                                                    {\n                                                                                                        \"child\": {\n                                                                                                            \"user_expr\": \"out.fifo_cout_drain.1.1(1, c1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 2 * p0 + 4 * c1 + c12, 2 * p1 + 4 * c2 + c11, 8 * c3 + 2 * c6 + c10)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    }\n                                                                                                ],\n                                                                                                \"type\": \"if\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"block\"\n                                                                                    },\n                                                                                    \"content\": \"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"iterator\": \"c10\",\n                                                                                \"type\": \"for\"\n       
                                                                     },\n                                                                            \"content\": \"latency\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c11\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c12\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c0\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"iterator\": \"c1\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c9\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c8\",\n                   
                         \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(0, c1, c2, c3, p0, p1, 2 * p0 + 4 * c1 + c6, 2 * p1 + 4 * c2 + c7, c8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n   
         ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                                        ],\n                                        \"type\": \"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                    
        \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": 
\"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                      
                                      \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.4.2(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p1 + 4 * c1 + c8 + c12, 2 * p0 + 4 * c2 + c9 + c11, 8 * c0 + 2 * c7)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        
\"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                  
                          \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(r_t1/r_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t1\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        
\"(r_t1/r_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t1\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": 
\"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                
\"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p14\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"i_t1\",\n                                                        \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n      
          \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n  
                  \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        
\"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"o_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.4.1(1, c1, c2, c3, p0, p1, c6, 0, 0, 0, c10, c11, c12, 0, 2 * p1 + 4 * c1 + c12, 2 * p0 + 4 * c2 + c11, 8 * c3 + 2 * c6 + c10)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                       
     \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                 
           },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n       
                 },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"o_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                   
                                 \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.4.1(0, c1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p1 + 4 * c1 + c12, 2 * p0 + 4 * c2 + c11, 8 * c3 + 2 * c6 + c10)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": 
\"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                         
                               \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n          
                                      \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        
\"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                          
          \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                
\"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t1\",\n                                                    \"size\": \"r_t2*c_t2*o_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                  
          },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n      
                                  \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t1\",\n                                                    \"size\": \"r_t2*c_t2*o_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                
},\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            
\"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"o_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n  
                                  \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(1, c1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p1 + 4 * c1 + c12, 2 * p0 + 4 * c2 + c11, 8 * c3 + 2 * c6 + c10)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                         
       \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                       
                     \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                
                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                
        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p16\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t1\",\n                                                    \"size\": \"r_t2*c_t2*o_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                
\"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                              
              },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": 
\"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                      
  \"0\",\n                                                        \"c_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"r_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c3 + 2 * c6 + c10, c8, c9, 8 * c0 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": 
\"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                
\"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p17\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t1\",\n                                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                 
           },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t2)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(c_t1/c_t2))\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"(((((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(r_t1/r_t2))\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(r_t1/r_t2))\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(r_t1/r_t2))\"\n        },\n 
       \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(r_t1/r_t2))\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": 
[\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            
\"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                
\"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel7_1.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        
\"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(c_t1/c_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n   
 \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n  
                                      \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"c_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                     
                       \"r_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 4 * c1 + c8 + c12, 2 * p1 + 4 * c2 + c9 + c11, 8 * c3 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c10, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        
\"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                                                                                            ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(8 * c0 + 2 * c6 + c10, 2 * p0 + 4 * c1 + c12, 2 * p1 + 4 * c2 + c11, 8 * c3 + 2 * c7 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                  
                                                      \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c10, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_drain.1.1(c0, c1, c2, 1, p0, p1, c6, 3, 2, 2, c10, c11, c12, 2 * p0 + 4 * c1 + c12, 2 * p1 + 4 * c2 + c11, 8 * c0 + 2 * c6 + c10)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                  
                                      ],\n                                                                                        \"type\": \"if\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                     
               \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        
\"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                                        ],\n                                        \"type\": \"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            
\"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            
\"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.4.2(c0, c1, c2, c3, p0, p1, c6, c7, 
c8, c9, c10, c11, c12, 0, 2 * p1 + 4 * c1 + c8 + c12, 2 * p0 + 4 * c2 + c9 + c11, 8 * c3 + 2 * c7)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": 
\"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(r_t1/r_t2)\"\n                                                    ],\n                        
                            \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t1\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(r_t1/r_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n   
                                                             \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t1\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n          
          \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p14\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": 
\"i_t1\",\n                                                            \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_serialize\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": 
\"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                
\"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"o_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n        
                                    \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(c0, c1, c2, 1, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p1 + 4 * c1 + c12, 2 * p0 + 4 * c2 + c11, 8 * c0 + 2 * c6 + c10)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": 
\"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                
            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        
\"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n              
          \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p16\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t1\",\n                                                        \"size\": \"r_t2*c_t2*o_t1\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_serialize\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                            
        },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                           
             {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                
\"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n  
                                              \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"c_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"r_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c6 + c10, c8, c9, 8 * c3 + 2 * c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                         
                   },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n   
     \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p17\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"i_t1\",\n                                                    \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                              
          },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t2)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(c_t1/c_t2))\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"(((((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(r_t1/r_t2))\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            
\"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(r_t1/r_t2))\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            
\"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            
],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel7_2.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_in_intra\": {\n            
\"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            
\"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            
\"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(c_t1/c_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L1_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            
]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in.fifo_cout.1.1(c0, 1, c2, c3, p0, p1, 2 * p0 + c6 + 4, 2 * p1 + 4 * c2 + c7, 8 * c0 + c8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        {\n       
                                     \"bounds\": [\n                                                \"0\",\n                                                \"(o_t1/o_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"p\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"q\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                         
                   \"c_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"bounds\": [\n                                                                                    \"0\",\n                                                                                    \"r_t2\"\n                                                                                ],\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 4 * c2 + c8 + c12, 2 * p1 + 4 * c3 + c9 + c11, 8 * c1 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                          
                          \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c10, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"bounds\": [\n                                                                                                        \"0\",\n                                                                                                        \"i_t2\"\n                                                                                                    ],\n                                                                                                    \"child\": {\n                                                                                                        \"child\": {\n                                                                                                            \"child\": {\n                                                                                                                \"user_expr\": \"S_0(8 * c0 + 2 * c6 + c10, 2 * p0 + 4 * c2 + c12, 2 * p1 + 4 * c3 + c11, 8 * c1 + 2 * c7 + c13, c8, c9)\"\n                                                                                                            },\n                                                                                                            \"type\": \"user\"\n                                             
                                                           },\n                                                                                                        \"content\": \"hls_unroll\",\n                                                                                                        \"type\": \"mark\"\n                                                                                                    },\n                                                                                                    \"iterator\": \"c13\",\n                                                                                                    \"type\": \"for\"\n                                                                                                },\n                                                                                                \"content\": \"simd\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c10, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            },\n                                                                                            {\n                                                  
                                              \"child\": [\n                                                                                                    {\n                                                                                                        \"child\": {\n                                                                                                            \"user_expr\": \"out.fifo_cout_drain.1.1(c0, 1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 2 * p0 + 4 * c2 + c12, 2 * p1 + 4 * c3 + c11, 8 * c0 + 2 * c6 + c10)\"\n                                                                                                        },\n                                                                                                        \"type\": \"user\"\n                                                                                                    }\n                                                                                                ],\n                                                                                                \"type\": \"if\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"block\"\n                                                                                    },\n                                                                                    \"content\": \"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"iterator\": \"c10\",\n                                                                                \"type\": \"for\"\n       
                                                                     },\n                                                                            \"content\": \"latency\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c11\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c12\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c0\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"iterator\": \"c1\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c9\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c8\",\n                   
                         \"type\": \"for\"\n                                        },\n                                        {\n                                            \"child\": [\n                                                {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out.fifo_cout.1.1(c0, 0, c2, c3, p0, p1, 2 * p0 + c6, 2 * p1 + 4 * c2 + c7, 8 * c0 + c8)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n   
         ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                                        ],\n                                        \"type\": \"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                    
        \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": 
\"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                      
                                      \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.4.2(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p1 + 4 * c2 + c8 + c12, 2 * p0 + 4 * c3 + c9 + c11, 8 * c1 + 2 * c7)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        
\"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                  
                          \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(r_t1/r_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t1\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        
\"(r_t1/r_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t1\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": 
\"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                
\"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p14\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"i_t1\",\n                                                        \"size\": \"(((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t1\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n      
          \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n  
                  \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        
\"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"o_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_cout.fifo_cout_local.1.4.1(c0, 1, c2, c3, p0, p1, c6, 0, 0, 0, c10, c11, c12, 0, 2 * p1 + 4 * c2 + c12, 2 * p0 + 4 * c3 + c11, 8 * c0 + 2 * c6 + c10)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                       
     \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                 
           },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n       
                 },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p15\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"o_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                   
                                 \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_local.fifo_cout.1.4.1(c0, 0, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p1 + 4 * c2 + c12, 2 * p0 + 4 * c3 + c11, 8 * c0 + 2 * c6 + c10)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": 
\"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                         
                               \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n          
                                      \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        
\"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                          
          \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p15\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                
\"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t1\",\n                                                    \"size\": \"r_t2*c_t2*o_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                  
          },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n      
                                  \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p15\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t1\",\n                                                    \"size\": \"r_t2*c_t2*o_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                
},\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            
\"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"o_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n  
                                  \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(c0, 1, c2, c3, p0, p1, c6, 3, 2, 2, c10, c11, c12, 1, 2 * p1 + 4 * c2 + c12, 2 * p0 + 4 * c3 + c11, 8 * c0 + 2 * c6 + c10)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c10\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c11\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                         
       \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c12\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                       
                     \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(r_t1/r_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p16\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"o_t1\",\n                                                            \"size\": \"r_t2*c_t2*o_t1\",\n                                                            \"type\": \"array_tile\"\n                
                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                
        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p16\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"o_t1\",\n                                                    \"size\": \"r_t2*c_t2*o_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                
\"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                
\"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t1\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                
],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"o_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n  
                                                                      \"0\",\n                                                                        \"c_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"r_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.8.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c6 + c10, c8, c9, 8 * c1 + 2 * c7)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    \"content\": \"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                      
                                                          \"content\": \"simd\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n           
                                     \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"pe\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": 
\"p17\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"i_t1\",\n                                        \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t2)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(c_t1/c_t2))\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"(((((r_t2-1)+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(r_t1/r_t2))\"\n        },\n        \"cout_IO_L1_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t2)*o_t1)\",\n            
\"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(r_t1/r_t2))\"\n        },\n        \"cout_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p15\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(r_t1/r_t2))\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(r_t1/r_t2))\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n    
        \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                
\"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n  
      {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t1,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel8_0.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_intra\": {\n       
     \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n 
           \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(i_t1/i_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n     
           \"(i_t1/i_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_1_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                       
 ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n     
                                                           \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"r_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 4 * c1 + c8 + c12, 4 * c2 + 2 * c7 + c9 + c10, 2 * p1 + 8 * c0)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": 
\"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 0, 0, c10, c11, c12, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c3 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c0)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                                             
                                               ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(8 * c3 + 2 * c6 + c11, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c7 + c10, 2 * p1 + 8 * c0 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                                                                        \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    {\n               
                                                                         \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c3 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c0)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_drain.1.1(1, c1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                     
                                   \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 2, 2, c10, c11, c12, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": 
\"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n               
         \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                        
                ],\n                                        \"type\": \"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                  
              \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"c_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n          
                                      ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p1 + 4 * c1 + c8 + c12, 4 * c2 + 2 * c7 + c9 + c10, 2 * p0 + 8 * c0)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n        
                                            \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            
\"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(r_t1/r_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                      
                                  \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(r_t1/r_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n    
                                            }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                          
              \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p14\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"i_t2\",\n                                                        \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    
\"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                
],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                
\"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": 
\"out_trans.fifo_cout_1.fifo_cout_1_local.1.8.1(1, c1, c2, c3, p0, 0, c6, c7, 0, 0, c10, c11, c12, 0, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c6 + c11)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n              
          \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    
},\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n             
               \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in_trans.fifo_cout_1_local.fifo_cout_1.1.8.1(0, c1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c6 + c11)\"\n                                                                },\n                                                                \"type\": \"user\"\n                               
                             },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": 
\"mark\"\n        },\n        \"cout_1_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t2*c_t1*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n     
               },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t2*c_t1*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": 
\"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"child\": [\n                {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"ceil((o/o_t1))\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L3\",\n                                    \"type\": \"mark\"\n                                },\n                
                \"content\": \"array\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c2\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                }\n            ],\n            \"type\": \"if\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n     
       \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(1, c1, c2, c3, 3, p1, c6, c7, 2, 2, c10, c11, c12, 1, 2 * p1 + 4 * c1 + c12, 4 * c2 + 2 * c7 + c10, 8 * c3 + 2 * c6 + c11)\"\n                                                            },\n               
                                             \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                
\"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t1\",\n                                                        \"size\": \"r_t2*c_t1*o_t1\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            
\"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t1\",\n                                                        \"size\": \"r_t2*c_t1*o_t1\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": 
\"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p17\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"o_t1\",\n                                                \"size\": \"r_t2*c_t1*o_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n          
                              \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": 
\"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n             
               \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            
],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"r_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c3 + 2 * c6 + c11, c8, c9, 2 * p0 + 8 * c0)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                             
   \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            
\"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p18\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t2\",\n                                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                   
 \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(i_t1/i_t2))\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"(((((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(r_t1/r_t2))\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n          
  \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p18\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": 
[\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        
{\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n      
  },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p18\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel8_1.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n    
        \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(i_t1/i_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        
\"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_1_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n            
                                ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"r_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n                                                                   
                 {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 4 * c1 + c8 + c12, 4 * c2 + 2 * c7 + c9 + c10, 2 * p1 + 8 * c3)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 0, 0, c10, c11, c12, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c7 + c10, 8 * c0 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                  
  {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c3)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                                                                                            ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(8 * c0 + 2 * c6 + c11, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c7 + c10, 2 * p1 + 8 * c3 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": 
\"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                                                                        \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c3)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                             
                                                               {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 2, 2, c10, c11, c12, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c7 + c10, 8 * c0 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                   
 \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n      
              },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                                        ],\n                                        \"type\": 
\"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": 
\"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"c_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n                                  
              ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p1 + 4 * c1 + c8 + c12, 4 * c2 + 2 * c7 + c9 + c10, 2 * p0 + 8 * c3)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                
                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                      
      \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(r_t1/r_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                    
    \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(r_t1/r_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                  
              }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n        
                                \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p14\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"i_t2\",\n                                                            \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_serialize\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                    
                    },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": [\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.state_handle()\"\n                                },\n              
                  \"type\": \"user\"\n                            }\n                        ],\n                        \"type\": \"block\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(r_t1/r_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p16\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"o_t1\",\n                                        \"size\": \"r_t2*c_t1*o_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p16\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"o_t1\",\n                                        \"size\": \"r_t2*c_t1*o_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                   
             }\n                            ],\n                            \"type\": \"if\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c6\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"io_L3\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"array\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                     
       \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"r_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in_trans_reduce_+.fifo_cout_1_local.fifo_cout_1.1.8.1(c0, c1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c1 + c12, 4 * c2 + 2 * c7 + c10, 8 * c0 + 2 * c6 + c11)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n        
                                                        \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n            
    \"type\": \"mark\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p16\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"o_t1\",\n                                                \"size\": \"r_t2*c_t1*o_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                
},\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": 
\"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                  
              \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"c_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                              
                          \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"r_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p0 + 8 * c3)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                  
          \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n    
            \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p17\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"i_t2\",\n                                                    \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                  
                  },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(i_t1/i_t2))\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"(((((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(r_t1/r_t2))\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            
\"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n      
          \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n  
          \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel8_2.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_intra\": {\n       
     \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n 
           \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(i_t1/i_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n     
           \"(i_t1/i_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_in\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": [\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_1_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(r_t1/r_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                       
 ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(c_t1/c_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n     
                                                           \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"r_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 2 * p0 + 4 * c2 + c8 + c12, 4 * c3 + 2 * c7 + c9 + c10, 2 * p1 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": 
\"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 0, 0, c10, c11, c12, 2 * p0 + 4 * c2 + c12, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                                             
                                               ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(8 * c0 + 2 * c6 + c11, 2 * p0 + 4 * c2 + c12, 4 * c3 + 2 * c7 + c10, 2 * p1 + 8 * c1 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                                                                        \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    {\n               
                                                                         \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_drain.1.1(c0, 1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 2 * p0 + 4 * c2 + c12, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                     
                                   \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 2, 2, c10, c11, c12, 2 * p0 + 4 * c2 + c12, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": 
\"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n               
         \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                        
                ],\n                                        \"type\": \"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                  
              \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"c_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n          
                                      ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 0, 2 * p1 + 4 * c2 + c8 + c12, 4 * c3 + 2 * c7 + c9 + c10, 2 * p0 + 8 * c1)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n        
                                            \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            
\"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(r_t1/r_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                      
                                  \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(r_t1/r_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n    
                                            }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                          
              \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p14\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"i_t2\",\n                                                        \"size\": \"(((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1)*i_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    
\"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                
],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                
\"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": 
\"out_trans.fifo_cout_1.fifo_cout_1_local.1.8.1(c0, 1, c2, c3, p0, 0, c6, c7, 0, 0, c10, c11, c12, 0, 2 * p0 + 4 * c2 + c12, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c6 + c11)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n              
          \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    
},\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n             
               \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(c_t1/c_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"c_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"r_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in_trans.fifo_cout_1_local.fifo_cout_1.1.8.1(c0, 0, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 1, 2 * p0 + 4 * c2 + c12, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c6 + c11)\"\n                                                                },\n                                                                \"type\": \"user\"\n                               
                             },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": 
\"mark\"\n        },\n        \"cout_1_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t2*c_t1*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n     
               },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(r_t1/r_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t2*c_t1*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": 
\"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"child\": [\n                {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((r/r_t1))\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"ceil((c/c_t1))\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L3\",\n                                    \"type\": \"mark\"\n                                },\n                
                \"content\": \"array\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c3\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                }\n            ],\n            \"type\": \"if\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(r_t1/r_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t2*c_t1*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n     
       \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(c_t1/c_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"c_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(c0, 1, c2, c3, 3, p1, c6, c7, 2, 2, c10, c11, c12, 1, 2 * p1 + 4 * c2 + c12, 4 * c3 + 2 * c7 + c10, 8 * c0 + 2 * c6 + c11)\"\n                                                            },\n               
                                             \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                
\"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t1\",\n                                                        \"size\": \"r_t2*c_t1*o_t1\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            
\"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t1\",\n                                                        \"size\": \"r_t2*c_t1*o_t1\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": 
\"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(r_t1/r_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p17\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"o_t1\",\n                                                \"size\": \"r_t2*c_t1*o_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n          
                              \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": 
\"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n  
      \"w_IO_L2_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                          
                                  ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"o_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"r_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p0 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                   
                                 \"content\": \"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"content\": \"simd\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n     
                                               \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"pe\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n               
                 \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p18\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"i_t2\",\n                                        \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((r_t1/r_t2)*(i_t1/i_t2))\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"(((((r_t2-1)+(p-1))+1)*((((((c_t1/c_t2)-1)*c_t2)+(c_t2-1))+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            
\"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(r_t1/r_t2))\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t2*c_t1)*o_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(r_t1/r_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p18\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n     
       ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n    
        \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            
\"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p18\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel9_0.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_intra\": {\n       
     \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n 
           \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(i_t1/i_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n     
           \"(i_t1/i_t2)\",\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_1_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                       
 ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n     
                                                           \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"c_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c7 + c8 + c10, 2 * p0 + 4 * c2 + c9 + c12, 2 * p1 + 8 * c0)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": 
\"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 0, 0, c10, c11, c12, 4 * c1 + 2 * c7 + c10, 2 * p0 + 4 * c2 + c12, 8 * c3 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c3 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c0)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                                             
                                               ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(8 * c3 + 2 * c6 + c11, 4 * c1 + 2 * c7 + c10, 2 * p0 + 4 * c2 + c12, 2 * p1 + 8 * c0 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                                                                        \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    {\n               
                                                                         \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c3 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c0)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_drain.1.1(1, c1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 4 * c1 + 2 * c7 + c10, 2 * p0 + 4 * c2 + c12, 8 * c3 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                     
                                   \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 2, 2, c10, c11, c12, 4 * c1 + 2 * c7 + c10, 2 * p0 + 4 * c2 + c12, 8 * c3 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": 
\"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n               
         \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                        
                ],\n                                        \"type\": \"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                  
              \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n          
                                      ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c7 + c8 + c10, 2 * p1 + 4 * c2 + c9 + c12, 2 * p0 + 8 * c0)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n        
                                            \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            
\"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(c_t1/c_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                      
                                  \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(c_t1/c_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n    
                                            }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                          
              \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p14\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"i_t2\",\n                                                        \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    
\"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                
],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                
\"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"r_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"c_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": 
\"out_trans.fifo_cout_1.fifo_cout_1_local.1.8.1(1, c1, c2, c3, p0, 0, c6, c7, 0, 0, c10, c11, c12, 0, 4 * c1 + 2 * c7 + c10, 2 * p0 + 4 * c2 + c12, 8 * c3 + 2 * c6 + c11)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n              
          \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    
},\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n             
               \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"r_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"c_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in_trans.fifo_cout_1_local.fifo_cout_1.1.8.1(0, c1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c7 + c10, 2 * p0 + 4 * c2 + c12, 8 * c3 + 2 * c6 + c11)\"\n                                                                },\n                                                                \"type\": \"user\"\n                               
                             },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": 
\"mark\"\n        },\n        \"cout_1_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t1*c_t2*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n     
               },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t1*c_t2*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": 
\"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"child\": [\n                {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"ceil((o/o_t1))\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L3\",\n                                    \"type\": \"mark\"\n                                },\n                
                \"content\": \"array\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c2\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                }\n            ],\n            \"type\": \"if\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n     
       \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"r_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(1, c1, c2, c3, 3, p1, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c7 + c10, 2 * p1 + 4 * c2 + c12, 8 * c3 + 2 * c6 + c11)\"\n                                                            },\n               
                                             \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                
\"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t1\",\n                                                        \"size\": \"r_t1*c_t2*o_t1\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            
\"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t1\",\n                                                        \"size\": \"r_t1*c_t2*o_t1\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": 
\"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p17\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"o_t1\",\n                                                \"size\": \"r_t1*c_t2*o_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n          
                              \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": 
\"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n             
               \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            
],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c3 + 2 * c6 + c11, c8, c9, 2 * p0 + 8 * c0)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                             
   \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            
\"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((o/o_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p18\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"i_t2\",\n                                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                   
 \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c2\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(i_t1/i_t2))\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(c_t1/c_t2))\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n          
  \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p18\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": 
[\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        
{\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n      
  },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p18\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel9_1.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n    
        \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(i_t1/i_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        
\"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_1_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n            
                                ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"c_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n                                                                   
                 {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c1 + 2 * c7 + c8 + c10, 2 * p0 + 4 * c2 + c9 + c12, 2 * p1 + 8 * c3)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 0, 0, c10, c11, c12, 4 * c1 + 2 * c7 + c10, 2 * p0 + 4 * c2 + c12, 8 * c0 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                  
  {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c3)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                                                                                            ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(8 * c0 + 2 * c6 + c11, 4 * c1 + 2 * c7 + c10, 2 * p0 + 4 * c2 + c12, 2 * p1 + 8 * c3 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": 
\"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                                                                        \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c3)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                             
                                                               {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 2, 2, c10, c11, c12, 4 * c1 + 2 * c7 + c10, 2 * p0 + 4 * c2 + c12, 8 * c0 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"latency\",\n                                                                   
 \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n      
              },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                                        ],\n                                        \"type\": 
\"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": 
\"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n                                  
              ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c1 + 2 * c7 + c8 + c10, 2 * p1 + 4 * c2 + c9 + c12, 2 * p0 + 8 * c3)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                
                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                      
      \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(c_t1/c_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                    
    \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(c_t1/c_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                  
              }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n        
                                \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p14\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"i_t2\",\n                                                            \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_serialize\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                    
                    },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": [\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.state_handle()\"\n                                },\n              
                  \"type\": \"user\"\n                            }\n                        ],\n                        \"type\": \"block\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(c_t1/c_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p16\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"o_t1\",\n                                        \"size\": \"r_t1*c_t2*o_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p16\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"o_t1\",\n                                        \"size\": \"r_t1*c_t2*o_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                   
             }\n                            ],\n                            \"type\": \"if\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c6\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"io_L3\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"array\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"r_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                     
       \"o_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"c_t2\"\n                                                                ],\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in_trans_reduce_+.fifo_cout_1_local.fifo_cout_1.1.8.1(c0, c1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c1 + 2 * c7 + c10, 2 * p0 + 4 * c2 + c12, 8 * c0 + 2 * c6 + c11)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_pipeline\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c10\",\n        
                                                        \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c11\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c12\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n            
    \"type\": \"mark\"\n            },\n            \"iterator\": \"c5\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p16\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"o_t1\",\n                                                \"size\": \"r_t1*c_t2*o_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                
},\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": 
\"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                  
              \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"p\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"q\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"r_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                              
                          \"0\",\n                                                        \"o_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"c_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p0 + 8 * c3)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    \"content\": \"hls_pipeline\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c10\",\n                                                  
          \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c11\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c12\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c0\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c1\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n    
            \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p17\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"i_t2\",\n                                                    \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                  
                  },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(i_t1/i_t2))\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(c_t1/c_t2))\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            
\"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n      
          \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n  
          \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/cnn/kernel9_2.json",
    "content": "{\n    \"attr\": {\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cin_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cin_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_in_intra\": {\n       
     \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_1_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_1_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n 
           \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"w_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"w_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(i_t1/i_t2))\",\n            \"unroll_factor\": \"i_t2\"\n        }\n    },\n    \"io\": {\n        \"cin_IO_L1_in\": {\n            \"dims\": [\n     
           \"(i_t1/i_t2)\",\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cin_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cin_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L2_in\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_1_IO_L2_out\": {\n            \"dims\": [\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_1_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_1_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(c_t1/c_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"w_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"w_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                       
 ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(o_t1/o_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(r_t1/r_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"p\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"q\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"r_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"bounds\": [\n                                                                    \"0\",\n                                                                    \"o_t2\"\n                                                                ],\n     
                                                           \"child\": {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"c_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": [\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_cin.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 4 * c2 + 2 * c7 + c8 + c10, 2 * p0 + 4 * c3 + c9 + c12, 2 * p1 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": 
\"in.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 0, 0, c10, c11, c12, 4 * c2 + 2 * c7 + c10, 2 * p0 + 4 * c3 + c12, 8 * c0 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"in.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": {\n                                                                                            \"bounds\": [\n                                                                                                \"0\",\n                                                                                                \"i_t2\"\n                                             
                                               ],\n                                                                                            \"child\": {\n                                                                                                \"child\": {\n                                                                                                    \"child\": {\n                                                                                                        \"user_expr\": \"S_0(8 * c0 + 2 * c6 + c11, 4 * c2 + 2 * c7 + c10, 2 * p0 + 4 * c3 + c12, 2 * p1 + 8 * c1 + c13, c8, c9)\"\n                                                                                                    },\n                                                                                                    \"type\": \"user\"\n                                                                                                },\n                                                                                                \"content\": \"hls_unroll\",\n                                                                                                \"type\": \"mark\"\n                                                                                            },\n                                                                                            \"iterator\": \"c13\",\n                                                                                            \"type\": \"for\"\n                                                                                        },\n                                                                                        \"content\": \"simd\",\n                                                                                        \"type\": \"mark\"\n                                                                                    },\n                                                                                    {\n               
                                                                         \"child\": {\n                                                                                            \"user_expr\": \"out.fifo_w.2.1(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p1 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    {\n                                                                                        \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_drain.1.1(c0, 1, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 4 * c2 + 2 * c7 + c10, 2 * p0 + 4 * c3 + c12, 8 * c0 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    },\n                                                                                    {\n                                                     
                                   \"child\": [\n                                                                                            {\n                                                                                                \"child\": {\n                                                                                                    \"user_expr\": \"out.fifo_cout_1.1.1(c0, c1, c2, c3, p0, p1, c6, c7, 2, 2, c10, c11, c12, 4 * c2 + 2 * c7 + c10, 2 * p0 + 4 * c3 + c12, 8 * c0 + 2 * c6 + c11)\"\n                                                                                                },\n                                                                                                \"type\": \"user\"\n                                                                                            }\n                                                                                        ],\n                                                                                        \"type\": \"if\"\n                                                                                    }\n                                                                                ],\n                                                                                \"type\": \"block\"\n                                                                            },\n                                                                            \"content\": \"hls_pipeline\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c10\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": 
\"latency\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                \"iterator\": \"c11\",\n                                                                \"type\": \"for\"\n                                                            },\n                                                            \"content\": \"latency\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c12\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c0\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c1\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c9\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"iterator\": \"c8\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n               
         \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            {\n                                                \"child\": {\n                                                    \"user_expr\": \"io_module.state_handle()\"\n                                                },\n                                                \"type\": \"user\"\n                                            }\n                        
                ],\n                                        \"type\": \"block\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p14\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                  
              \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"p\"\n                            ],\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"q\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"r_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"o_t2\"\n          
                                      ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                            \"c_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out_trans.fifo_cin.fifo_cin_local.1.2.2(c0, c1, c2, c3, p0, p1, c6, c7, c8, c9, c10, c11, c12, 0, 4 * c2 + 2 * c7 + c8 + c10, 2 * p1 + 4 * c3 + c9 + c12, 2 * p0 + 8 * c1)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"content\": \"simd\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c10\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n        
                                            \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c11\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c12\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c0\",\n                                \"type\": \"for\"\n                            },\n                            \"iterator\": \"c1\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cin_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            
\"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": [\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(c_t1/c_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                      
                                  \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"(c_t1/c_t2)\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"data_pack_factor\": \"p14\",\n                                                                \"ele_size\": 4,\n                                                                \"last_dim\": \"i_t2\",\n                                                                \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                                                \"type\": \"array_tile\"\n                                                            },\n                                                            \"content\": \"access_coalesce\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"io_L1\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n    
                                            }\n                                            ],\n                                            \"type\": \"if\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cin_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((r/r_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((c/c_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                          
              \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p14\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"i_t2\",\n                                                        \"size\": \"((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1)*i_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    
\"type\": \"for\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                
],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                
\"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"r_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"c_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": 
\"out_trans.fifo_cout_1.fifo_cout_1_local.1.8.1(c0, 1, c2, c3, p0, 0, c6, c7, 0, 0, c10, c11, c12, 0, 4 * c2 + 2 * c7 + c10, 2 * p0 + 4 * c3 + c12, 8 * c0 + 2 * c6 + c11)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n              
          \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    
},\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p16\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n             
               \"0\",\n                            \"(o_t1/o_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(r_t1/r_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"r_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"o_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"c_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in_trans.fifo_cout_1_local.fifo_cout_1.1.8.1(c0, 0, c2, c3, p0, 3, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c7 + c10, 2 * p0 + 4 * c3 + c12, 8 * c0 + 2 * c6 + c11)\"\n                                                                },\n                                                                \"type\": \"user\"\n                               
                             },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c10\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c11\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c12\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c9\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c8\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": 
\"mark\"\n        },\n        \"cout_1_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t1*c_t2*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n     
               },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_1_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(c_t1/c_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p16\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"o_t1\",\n                                            \"size\": \"r_t1*c_t2*o_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": 
\"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"child\": [\n                {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((o/o_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((r/r_t1))\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"ceil((c/c_t1))\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"io_L2\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L3\",\n                                    \"type\": \"mark\"\n                                },\n                
                \"content\": \"array\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c3\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                }\n            ],\n            \"type\": \"if\"\n        },\n        \"cout_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(c_t1/c_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p17\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"o_t1\",\n                                \"size\": \"r_t1*c_t2*o_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n     
       \"iterator\": \"c6\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(o_t1/o_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(r_t1/r_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"r_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"o_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"c_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"user_expr\": \"in_trans.fifo_cout_drain_local.fifo_cout_drain.1.4.1(c0, 1, c2, c3, 3, p1, c6, c7, 2, 2, c10, c11, c12, 1, 4 * c2 + 2 * c7 + c10, 2 * p1 + 4 * c3 + c12, 8 * c0 + 2 * c6 + c11)\"\n                                                            },\n               
                                             \"type\": \"user\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"simd\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c10\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c11\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c12\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c9\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c8\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"cout_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                
\"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t1\",\n                                                        \"size\": \"r_t1*c_t2*o_t1\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            
\"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(c_t1/c_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p17\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"o_t1\",\n                                                        \"size\": \"r_t1*c_t2*o_t1\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": 
\"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"cout_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((r/r_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((c/c_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(c_t1/c_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p17\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"o_t1\",\n                                                \"size\": \"r_t1*c_t2*o_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n          
                              \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c3\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": 
\"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p18\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"i_t2\",\n                                \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c7\",\n            \"type\": \"for\"\n        },\n  
      \"w_IO_L2_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((r/r_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((c/c_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(o_t1/o_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(r_t1/r_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"p\"\n                                                ],\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"q\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"r_t2\"\n                          
                                  ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"o_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"c_t2\"\n                                                                            ],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"out_trans.fifo_w.fifo_w_local.1.2.2(c0, c1, c2, c3, p0, 0, c6, c7, c8, c9, c10, c11, c12, 0, 8 * c0 + 2 * c6 + c11, c8, c9, 2 * p0 + 8 * c1)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                   
                                 \"content\": \"hls_pipeline\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"content\": \"simd\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c10\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"latency\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c11\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"latency\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c12\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n     
                                               \"iterator\": \"c0\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"iterator\": \"c1\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"iterator\": \"c9\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c8\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"pe\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c4\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"w_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((o/o_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n               
                 \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p18\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"i_t2\",\n                                        \"size\": \"o_t1*((p-1)+1)*((q-1)+1)*i_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c5\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"PE\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((c_t1/c_t2)*(i_t1/i_t2))\"\n        },\n        \"cin_IO_L1_in\": {\n            \"array\": \"cin\",\n            \"buf_size\": \"((((((((r_t1/r_t2)-1)*r_t2)+(r_t2-1))+(p-1))+1)*(((c_t2-1)+(q-1))+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p14\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            
\"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(c_t1/c_t2))\"\n        },\n        \"cout_1_IO_L2_in\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_1_IO_L2_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p16\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"cout_drain_IO_L1_out\": {\n            \"array\": \"cout\",\n            \"buf_size\": \"((r_t1*c_t2)*o_t1)\",\n            \"data_pack_factor_inter\": \"p17\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(c_t1/c_t2)\"\n        },\n        \"w_IO_L2_in\": {\n            \"array\": \"w\",\n            \"buf_size\": \"(((o_t1*((p-1)+1))*((q-1)+1))*i_t2)\",\n            \"data_pack_factor_inter\": \"p18\",\n            \"data_pack_factor_intra\": \"i_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"q\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"p\",\n            \"tags\": [\n                \"external\"\n     
       ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"o\",\n            \"split_by\": \"o_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"r\",\n            \"split_by\": \"r_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"c\",\n            \"split_by\": \"c_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c\"\n            ],\n            \"name\": \"c_t1\",\n            \"split_by\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o\"\n            ],\n            \"name\": \"o_t1\",\n            \"split_by\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r\"\n            ],\n            \"name\": \"r_t1\",\n            \"split_by\": \"r_t2\",\n    
        \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"c_t1\"\n            ],\n            \"divisors\": [\n                \"c_t1\"\n            ],\n            \"name\": \"c_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"o_t1\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"o_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"r_t1\"\n            ],\n            \"divisors\": [\n                \"r_t1\"\n            ],\n            \"name\": \"r_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(i_t1,8)\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,4),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p14\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            
\"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p15\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p16\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(o_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"o_t1\"\n            ],\n            \"name\": \"p17\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"i_t2\",\n                \"max(min(i_t2,16),i_t2)\"\n            ],\n            \"divisors\": [\n                \"i_t2\"\n            ],\n            \"multiples\": [\n                \"i_t2\"\n            ],\n            \"name\": \"p18\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel0_0.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_inter\": {\n            
\"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            
\"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        
\"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": 
\"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(k_t1/k_t2)\"\n                                    ],\n                                    \"child\": {\n                                        
\"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.4.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c5)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                              
  \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L1\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L2\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                            
            \"data_pack_factor\": \"p9\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"k_t1\",\n                                        \"size\": \"i_t2*k_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                    },\n                                    \"type\": \"user\"\n                   
             },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p10\",\n                    \"ele_size\": 4,\n                    \"last_dim\": \"k_t1\",\n                    \"size\": \"j_t1*k_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(k_t1/k_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n      
                                  \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"i_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.16.2(c0, c1, c2, 0, c4, c5, c6, c7, 0, 32 * c2 + 2 * c4 + c6, 32 * c1 + 2 * c5)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n     
                           \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c5\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n          
      \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                
                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.4.1(c0, 1, c2, p0, c4, 0, c6, c7, 0, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c4 + c6)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                
\"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n 
                               \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        
\"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_local.fifo_C.1.4.1(c0, 0, c2, p0, c4, 15, c6, c7, 1, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c4 + c6)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n             
           \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t1\",\n                                        \"size\": \"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                         
       \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t1\",\n                                        \"size\": \"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                            },\n                            \"type\": \"user\"\n                        },\n     
                   \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n             
   \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.4.1(c0, 1, c2, p0, c4, 15, c6, c7, 1, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c4 + c6)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n               
         },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p12\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t1\",\n                                        \"size\": \"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                   
 },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"in.fifo_C.1.1(c0, 1, c2, p0, 2 * p0 + 32 * c0 + c4, c5 + 32)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(j_t1/j_t2)\"\n 
                                       ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(k_t1/k_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"j_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"i_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": [\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c5)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                           
                             {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c2 + 2 * c4 + c6, 32 * c1 + 2 * c5)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"bounds\": [\n                                                                                    \"0\",\n                                                                                    \"k_t2\"\n                                                                                ],\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"S_0(2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c4 + c6, 32 * c1 + 2 * c5 + c8)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    \"content\": \"hls_unroll\",\n                            
                                                        \"type\": \"mark\"\n                                                                                },\n                                                                                \"iterator\": \"c8\",\n                                                                                \"type\": \"for\"\n                                                                            },\n                                                                            \"content\": \"simd\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": [\n                                                                                {\n                                                                                    \"child\": {\n                                                                                        \"user_expr\": \"out.fifo_C_drain.1.1(c0, 1, c2, p0, c4, 15, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c4 + c6)\"\n                                                                                    },\n                                                                                    \"type\": \"user\"\n                                                                                }\n                                                                            ],\n                                                                            \"type\": \"if\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                            
                                    \"user_expr\": \"out.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c2 + 2 * c4 + c6, 32 * c1 + 2 * c5)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        }\n                                                                    ],\n                                                                    \"type\": \"block\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c6\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c7\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c5\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c4\",\n                                        \"type\": \"for\"\n          
                          },\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out.fifo_C.1.1(c0, 0, c2, p0, 2 * p0 + 32 * c0 + c4, c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L1_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"B_IO_L2_in\": {\n           
 \"array\": \"B\",\n            \"buf_size\": \"(j_t1*k_t1)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"C_IO_L1_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"C_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"PE\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n         
   \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": 
\"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel0_1.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_inter\": {\n            
\"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            
\"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        
\"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": 
\"for\"\n        },\n        \"A_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(k_t1/k_t2)\"\n                        ],\n                        \"child\": {\n           
                 \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.4.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 2 * p0 + 32 * c2 + c7, 32 * c0 + 2 * c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n     
                       \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p9\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"k_t1\",\n                                            \"size\": \"i_t2*k_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                
    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                 
       \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p10\",\n                    \"ele_size\": 4,\n                    \"last_dim\": \"k_t1\",\n                    \"size\": \"j_t1*k_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(k_t1/k_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"i_t2\"\n                                            ],\n                                   
         \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.16.2(c0, c1, c2, 0, c4, c5, c6, c7, 0, 32 * c1 + 2 * c4 + c6, 32 * c0 + 2 * c5)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c5\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                
\"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                          
      \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n            
                            \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.4.1(1, c1, c2, p0, c4, 0, c6, c7, 0, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c4 + c6)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                  
  \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                     
       \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_local.fifo_C.1.4.1(0, c1, c2, p0, c4, 15, c6, c7, 1, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c4 + c6)\"\n                                                },\n                                                \"type\": \"user\"\n          
                                  },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        
\"ele_size\": 4,\n                                        \"last_dim\": \"j_t1\",\n                                        \"size\": \"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t1\",\n                                        \"size\": 
\"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                            },\n                            \"type\": \"user\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            
\"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                 
               \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.4.1(1, c1, c2, p0, c4, 15, c6, c7, 1, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c4 + c6)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": 
{\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p12\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t1\",\n                                        \"size\": \"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n          
      \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"in.fifo_C.1.1(1, c1, c2, p0, 2 * p0 + c4 + 32, 32 * c1 + c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(j_t1/j_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(k_t1/k_t2)\"\n                                            ],\n                                            \"child\": {\n                                                
\"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"j_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"i_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": [\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 2 * p0 + 32 * c2 + c7, 32 * c0 + 2 * c5)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c1 + 2 * c4 + c6, 32 * c0 + 2 * c5)\"\n                                                                            },\n                                          
                                  \"type\": \"user\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"bounds\": [\n                                                                                    \"0\",\n                                                                                    \"k_t2\"\n                                                                                ],\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"S_0(2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c4 + c6, 32 * c0 + 2 * c5 + c8)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    \"content\": \"hls_unroll\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"iterator\": \"c8\",\n                                                                                \"type\": \"for\"\n                                                                     
       },\n                                                                            \"content\": \"simd\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": [\n                                                                                {\n                                                                                    \"child\": {\n                                                                                        \"user_expr\": \"out.fifo_C_drain.1.1(1, c1, c2, p0, c4, 15, c6, c7, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c4 + c6)\"\n                                                                                    },\n                                                                                    \"type\": \"user\"\n                                                                                }\n                                                                            ],\n                                                                            \"type\": \"if\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c1 + 2 * c4 + c6, 32 * c0 + 2 * c5)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        }\n                                    
                                ],\n                                                                    \"type\": \"block\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c6\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c7\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c5\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c4\",\n                                        \"type\": \"for\"\n                                    },\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out.fifo_C.1.1(0, c1, c2, p0, 2 * p0 + c4, 32 * c1 + 
c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L1_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"B_IO_L2_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t1*k_t1)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"C_IO_L1_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            
\"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"C_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"PE\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            
\"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                
\"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel0_2.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n     
       \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L1_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                 
   \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                
\"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(k_t1/k_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        
\"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.4.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n               
 \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p9\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"k_t1\",\n                                                \"size\": \"i_t2*k_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n          
                      \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": 
\"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"data_pack_factor\": \"p10\",\n                        \"ele_size\": 4,\n                        \"last_dim\": \"k_t1\",\n                        \"size\": \"j_t1*k_t1\",\n                        \"type\": \"array_tile\"\n                    },\n                    \"content\": \"access_serialize\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(k_t1/k_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                 
               \"i_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.16.2(c0, c1, c2, 0, c4, c5, c6, c7, 0, 32 * c1 + 2 * c4 + c6, 32 * c2 + 2 * c5)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c5\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                 
   \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                            },\n                            \"type\": \"user\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n        
                },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.4.1(c0, c1, 1, p0, c4, 15, c6, c7, 1, 2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c4 + c6)\"\n                                            
    },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                               
         \"child\": {\n                                            \"data_pack_factor\": \"p12\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"j_t1\",\n                                            \"size\": \"i_t2*j_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_serialize\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": 
[\n                                    \"0\",\n                                    \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(k_t1/k_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": [\n                                                                {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n         
                                                               \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c1 + 2 * c4 + c6, 32 * c2 + 2 * c5)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"k_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"user_expr\": \"S_0(2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c4 + c6, 32 * c2 + 2 * c5 + c8)\"\n                                                                                },\n                                                                                \"type\": \"user\"\n                                                                            },\n                                                                            \"content\": \"hls_unroll\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c8\",\n                                    
                                    \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                {\n                                                                    \"child\": [\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out.fifo_C_drain.1.1(c0, c1, 1, p0, c4, 15, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c4 + c6)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        }\n                                                                    ],\n                                                                    \"type\": \"if\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c1 + 2 * c4 + c6, 32 * c2 + 2 * c5)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                }\n                                                            ],\n                
                                            \"type\": \"block\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L1_in\": {\n            \"array\": \"A\",\n            \"buf_size\": 
\"(i_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"B_IO_L2_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t1*k_t1)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"PE\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": 
\"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n          
  \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel1_0.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_inter\": {\n            
\"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            
\"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L1_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L1_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L1_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        
\"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p9\",\n                    \"ele_size\": 4,\n                    \"last_dim\": 
\"k_t1\",\n                    \"size\": \"i_t1*k_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(k_t1/k_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.16.2(c0, c1, c2, 0, c4, c5, c6, c7, 0, 32 * c0 + 2 * c4 + c6, 32 * c1 + 2 * c5)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                  
                  },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c5\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                
    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                    
            \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(k_t1/k_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"j_t2\"\n                                        ],\n                                        \"child\": {\n 
                                           \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.4.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                
\"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p10\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"k_t1\",\n                                            \"size\": \"j_t2*k_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n     
           \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                     
           \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n       
                                 \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.2.1(c0, 1, c2, p0, c4, 0, c6, c7, 0, 32 * c0 + 2 * c4 + c6, 2 * p0 + 32 * c2 + c7)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n             
       \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                
            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_local.fifo_C.1.2.1(c0, 0, c2, p0, c4, 15, c6, c7, 1, 32 * c0 + 2 * c4 + c6, 2 * p0 + 32 * c2 + c7)\"\n                                                },\n                                                \"type\": \"user\"\n     
                                       },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        
\"ele_size\": 4,\n                                        \"last_dim\": \"j_t2\",\n                                        \"size\": \"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t2\",\n                                        \"size\": 
\"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                            },\n                            \"type\": \"user\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            
\"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                 
               \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.2.1(c0, 1, c2, p0, c4, 15, c6, c7, 1, 32 * c0 + 2 * c4 + c6, 2 * p0 + 32 * c2 + c7)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": 
{\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p12\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t2\",\n                                        \"size\": \"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n          
      \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"in.fifo_C.1.1(c0, 1, c2, p0, 32 * c0 + c4, 2 * p0 + c5 + 32)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(k_t1/k_t2)\"\n                                            ],\n                                            \"child\": {\n                                                
\"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"j_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": [\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c0 + 2 * c4 + c6, 32 * c1 + 2 * c5)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c5)\"\n                                                                            },\n                                          
                                  \"type\": \"user\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"bounds\": [\n                                                                                    \"0\",\n                                                                                    \"k_t2\"\n                                                                                ],\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"S_0(32 * c0 + 2 * c4 + c6, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c5 + c8)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    \"content\": \"hls_unroll\",\n                                                                                    \"type\": \"mark\"\n                                                                                },\n                                                                                \"iterator\": \"c8\",\n                                                                                \"type\": \"for\"\n                                                                     
       },\n                                                                            \"content\": \"simd\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": [\n                                                                                {\n                                                                                    \"child\": {\n                                                                                        \"user_expr\": \"out.fifo_C_drain.1.1(c0, 1, c2, p0, c4, 15, c6, c7, 32 * c0 + 2 * c4 + c6, 2 * p0 + 32 * c2 + c7)\"\n                                                                                    },\n                                                                                    \"type\": \"user\"\n                                                                                }\n                                                                            ],\n                                                                            \"type\": \"if\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c0 + 2 * c4 + c6, 32 * c1 + 2 * c5)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        }\n                                    
                                ],\n                                                                    \"type\": \"block\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c6\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c7\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c5\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c4\",\n                                        \"type\": \"for\"\n                                    },\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out.fifo_C.1.1(c0, 0, c2, p0, 32 * c0 + c4, 2 * p0 + 
c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L2_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t1*k_t1)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"B_IO_L1_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_IO_L1_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            
\"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"PE\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            
\"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                
\"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel1_1.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_inter\": {\n            
\"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            
\"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L1_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L1_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L1_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        
\"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p9\",\n                    \"ele_size\": 4,\n                    \"last_dim\": 
\"k_t1\",\n                    \"size\": \"i_t1*k_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(k_t1/k_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.16.2(c0, c1, c2, 0, c4, c5, c6, c7, 0, 32 * c2 + 2 * c4 + c6, 32 * c0 + 2 * c5)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                  
                  },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c5\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n             
                   {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                         
       \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(k_t1/k_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"i_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"j_t2\"\n         
                                           ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.4.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 2 * p0 + 32 * c1 + c7, 32 * c0 + 2 * c5)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                
                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L1\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L2\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p10\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"k_t1\",\n                                        \"size\": \"j_t2*k_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n  
                              \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n       
         \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n             
                   \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.2.1(1, c1, c2, p0, c4, 0, c6, c7, 0, 32 * c2 + 2 * c4 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                
\"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n 
                               \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        
\"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_local.fifo_C.1.2.1(0, c1, c2, p0, c4, 15, c6, c7, 1, 32 * c2 + 2 * c4 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n             
           \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t2\",\n                                        \"size\": \"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                         
       \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t2\",\n                                        \"size\": \"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                            },\n                            \"type\": \"user\"\n                        },\n     
                   \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n             
   \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.2.1(1, c1, c2, p0, c4, 15, c6, c7, 1, 32 * c2 + 2 * c4 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n               
         },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p12\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t2\",\n                                        \"size\": \"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                   
 },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"in.fifo_C.1.1(1, c1, c2, p0, c4 + 32, 2 * p0 + 32 * c1 + c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n 
                                       ],\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(k_t1/k_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"bounds\": [\n                                                                \"0\",\n                                                                \"j_t2\"\n                                                            ],\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"child\": [\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c2 + 2 * c4 + c6, 32 * c0 + 2 * c5)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                           
                             {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 2 * p0 + 32 * c1 + c7, 32 * c0 + 2 * c5)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"bounds\": [\n                                                                                    \"0\",\n                                                                                    \"k_t2\"\n                                                                                ],\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"child\": {\n                                                                                            \"user_expr\": \"S_0(32 * c2 + 2 * c4 + c6, 2 * p0 + 32 * c1 + c7, 32 * c0 + 2 * c5 + c8)\"\n                                                                                        },\n                                                                                        \"type\": \"user\"\n                                                                                    },\n                                                                                    \"content\": \"hls_unroll\",\n                            
                                                        \"type\": \"mark\"\n                                                                                },\n                                                                                \"iterator\": \"c8\",\n                                                                                \"type\": \"for\"\n                                                                            },\n                                                                            \"content\": \"simd\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": [\n                                                                                {\n                                                                                    \"child\": {\n                                                                                        \"user_expr\": \"out.fifo_C_drain.1.1(1, c1, c2, p0, c4, 15, c6, c7, 32 * c2 + 2 * c4 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                                                    },\n                                                                                    \"type\": \"user\"\n                                                                                }\n                                                                            ],\n                                                                            \"type\": \"if\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                            
                                    \"user_expr\": \"out.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c2 + 2 * c4 + c6, 32 * c0 + 2 * c5)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        }\n                                                                    ],\n                                                                    \"type\": \"block\"\n                                                                },\n                                                                \"content\": \"hls_pipeline\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            \"iterator\": \"c6\",\n                                                            \"type\": \"for\"\n                                                        },\n                                                        \"content\": \"latency\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c7\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c5\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"iterator\": \"c4\",\n                                        \"type\": \"for\"\n          
                          },\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out.fifo_C.1.1(0, c1, c2, p0, c4, 2 * p0 + 32 * c1 + c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L2_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t1*k_t1)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"B_IO_L1_in\": {\n            
\"array\": \"B\",\n            \"buf_size\": \"(j_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_IO_L1_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"PE\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": 
\"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            
\"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel1_2.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n     
       \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L1_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                 
   \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"data_pack_factor\": \"p9\",\n                        \"ele_size\": 4,\n                        \"last_dim\": \"k_t1\",\n                        \"size\": \"i_t1*k_t1\",\n                        \"type\": \"array_tile\"\n                    },\n                    \"content\": \"access_serialize\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"access_coalesce\",\n                
\"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(k_t1/k_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.16.2(c0, c1, c2, 0, c4, c5, c6, c7, 0, 32 * c0 + 2 * c4 + c6, 32 * c2 + 2 * c5)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n            
                                    },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c5\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                      
              {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n         
               },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(k_t1/k_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"j_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n     
                                                   \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.4.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 2 * p0 + 32 * c1 + c7, 32 * c2 + 2 * c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                
\"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p10\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"k_t1\",\n                                                \"size\": \"j_t2*k_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": 
\"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                            },\n                            \"type\": \"user\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n   
                     {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.2.1(c0, c1, 1, p0, c4, 15, c6, c7, 1, 32 * c0 + 2 * c4 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                },\n               
                                 \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": 
{\n                                            \"data_pack_factor\": \"p12\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"j_t2\",\n                                            \"size\": \"i_t1*j_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_serialize\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                 
                   \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(k_t1/k_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"i_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"j_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": [\n                                                                {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c0 + 2 * c4 + c6, 32 * c2 + 2 * c5)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n                             
                                           \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 2 * p0 + 32 * c1 + c7, 32 * c2 + 2 * c5)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"k_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"user_expr\": \"S_0(32 * c0 + 2 * c4 + c6, 2 * p0 + 32 * c1 + c7, 32 * c2 + 2 * c5 + c8)\"\n                                                                                },\n                                                                                \"type\": \"user\"\n                                                                            },\n                                                                            \"content\": \"hls_unroll\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c8\",\n                                                        
                \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                {\n                                                                    \"child\": [\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out.fifo_C_drain.1.1(c0, c1, 1, p0, c4, 15, c6, c7, 32 * c0 + 2 * c4 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        }\n                                                                    ],\n                                                                    \"type\": \"if\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c0 + 2 * c4 + c6, 32 * c2 + 2 * c5)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                }\n                                                            ],\n                                    
                        \"type\": \"block\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L2_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t1*k_t1)\",\n            
\"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"B_IO_L1_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"PE\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n  
              \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            
],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel2_0.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_in\": {\n            
\"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n 
           \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L1_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L1_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n        
            \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                          
  \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(j_t1/j_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": 
{\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.2.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 32 * c0 + 2 * c4 + c7, 2 * p0 + 32 * c1)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                 
       \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L1\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L2\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(k_t1/k_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p9\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"k_t2\",\n                                        \"size\": \"i_t1*k_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n              
                      \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                       
 \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n   
                 \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.2.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 32 * c2 + 2 * c5 + c6, 2 * p0 + 32 * c1)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n             
                               \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                  
      \"child\": {\n                                            \"data_pack_factor\": \"p10\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"k_t2\",\n                                            \"size\": \"j_t1*k_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": [\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                },\n                                
\"type\": \"user\"\n                            },\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.state_handle()\"\n                                },\n                                \"type\": \"user\"\n                            }\n                        ],\n                        \"type\": \"block\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p11\",\n                    \"ele_size\": 4,\n                    \"last_dim\": \"j_t1\",\n                    \"size\": \"i_t1*j_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                               
     \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"i_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.16.1(c0, 1, c2, 0, c4, c5, c6, c7, 0, 32 * c0 + 2 * c4 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            
\"iterator\": \"c5\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": [\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.1.1()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.state_handle()\"\n                                },\n                                \"type\": \"user\"\n                            }\n                        ],\n                        \"type\": \"block\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p11\",\n                    \"ele_size\": 4,\n                    
\"last_dim\": \"j_t1\",\n                    \"size\": \"i_t1*j_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"i_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_C_local.fifo_C.1.16.1(c0, 0, c2, 15, c4, c5, c6, c7, 1, 32 * c0 + 2 * c4 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                        },\n                                                        \"type\": \"user\"\n               
                                     },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c5\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            
\"child\": {\n                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                            },\n                            \"type\": \"user\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"child\": {\n                \"child\": [\n                    {\n                        \"child\": {\n                            \"data_pack_factor\": \"p12\",\n                            \"ele_size\": 4,\n                            \"last_dim\": \"j_t1\",\n                            \"size\": \"i_t1*j_t1\",\n                            \"type\": \"array_tile\"\n                        },\n                        \"content\": \"access_coalesce\",\n                        \"type\": \"mark\"\n                    },\n                    {\n                        \"child\": {\n                            \"data_pack_factor\": \"p12\",\n                            \"ele_size\": 4,\n                            \"last_dim\": \"j_t1\",\n                            \"size\": \"i_t1*j_t1\",\n                            \"type\": \"array_tile\"\n                        },\n                        \"content\": \"access_coalesce\",\n                        \"type\": \"mark\"\n                    }\n                ],\n                \"type\": \"if\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        
\"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.4.1(c0, 1, c2, 15, c4, c5, c6, c7, 1, 32 * c0 + 2 * c4 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n   
                                 \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"data_pack_factor\": \"p12\",\n                                    \"ele_size\": 4,\n                                    \"last_dim\": \"j_t1\",\n                                    \"size\": \"i_t1*j_t1\",\n                                    \"type\": \"array_tile\"\n                                },\n                                \"content\": \"access_coalesce\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": 
\"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(j_t1/j_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                            
                            \"child\": {\n                                                            \"child\": [\n                                                                {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c0 + 2 * c4 + c7, 2 * p0 + 32 * c1)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c2 + 2 * c5 + c6, 2 * p0 + 32 * c1)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                {\n                                                                    \"child\": [\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in.fifo_C.1.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c0 + 2 * c4 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        }\n               
                                                     ],\n                                                                    \"type\": \"if\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"k_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"user_expr\": \"S_0(32 * c0 + 2 * c4 + c7, 32 * c2 + 2 * c5 + c6, 2 * p0 + 32 * c1 + c8)\"\n                                                                                },\n                                                                                \"type\": \"user\"\n                                                                            },\n                                                                            \"content\": \"hls_unroll\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c8\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                            
        \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                {\n                                                                    \"child\": [\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out.fifo_C_drain.1.1(c0, 1, c2, 15, c4, c5, c6, c7, 32 * c0 + 2 * c4 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out.fifo_C.1.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c0 + 2 * c4 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        }\n                                                                    ],\n                                                                    \"type\": \"if\"\n                                                                }\n                                                            ],\n                                                            \"type\": \"block\"\n                                                        },\n         
                                               \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L1_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            
\"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n        \"B_IO_L1_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n        \"C_IO_L2_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"C_IO_L2_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t1)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": 
false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            
\"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel2_1.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_in\": {\n            
\"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n 
           \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L1_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L1_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n        
            \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                
\"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        
\"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.2.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 32 * c2 + 2 * c4 + c7, 2 * p0 + 32 * c0)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n               
 \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p9\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"k_t2\",\n                                            \"size\": \"i_t1*k_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n             
           \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                
\"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    
\"bounds\": [\n                                        \"0\",\n                                        \"(j_t1/j_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.2.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 32 * c1 + 2 * c5 + c6, 2 * p0 + 32 * c0)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                  
  },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L1\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L2\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                
\"0\",\n                                \"(k_t1/k_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p10\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"k_t2\",\n                                        \"size\": \"j_t1*k_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L1\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": [\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.inter_intra.1.1()\"\n                                },\n            
                    \"type\": \"user\"\n                            },\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.state_handle()\"\n                                },\n                                \"type\": \"user\"\n                            }\n                        ],\n                        \"type\": \"block\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p11\",\n                    \"ele_size\": 4,\n                    \"last_dim\": \"j_t1\",\n                    \"size\": \"i_t1*j_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n           
                         \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"i_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.16.1(1, c1, c2, 0, c4, c5, c6, c7, 0, 32 * c2 + 2 * c4 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                        },\n                                                        \"type\": \"user\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n            
                \"iterator\": \"c5\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": [\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.1.1()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.state_handle()\"\n                                },\n                                \"type\": \"user\"\n                            }\n                        ],\n                        \"type\": \"block\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"data_pack_factor\": \"p11\",\n                    \"ele_size\": 4,\n          
          \"last_dim\": \"j_t1\",\n                    \"size\": \"i_t1*j_t1\",\n                    \"type\": \"array_tile\"\n                },\n                \"content\": \"access_coalesce\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"i_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"user_expr\": \"in_trans.fifo_C_local.fifo_C.1.16.1(0, c1, c2, 15, c4, c5, c6, c7, 1, 32 * c2 + 2 * c4 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                        },\n                                                        \"type\": \"user\"\n     
                                               },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"simd\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c6\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c7\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c5\",\n                            \"type\": \"for\"\n                        },\n                        \"iterator\": \"c4\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                   
         \"child\": {\n                                \"user_expr\": \"io_module.intra_inter.0.0()\"\n                            },\n                            \"type\": \"user\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"child\": {\n                \"child\": [\n                    {\n                        \"child\": {\n                            \"data_pack_factor\": \"p12\",\n                            \"ele_size\": 4,\n                            \"last_dim\": \"j_t1\",\n                            \"size\": \"i_t1*j_t1\",\n                            \"type\": \"array_tile\"\n                        },\n                        \"content\": \"access_coalesce\",\n                        \"type\": \"mark\"\n                    },\n                    {\n                        \"child\": {\n                            \"data_pack_factor\": \"p12\",\n                            \"ele_size\": 4,\n                            \"last_dim\": \"j_t1\",\n                            \"size\": \"i_t1*j_t1\",\n                            \"type\": \"array_tile\"\n                        },\n                        \"content\": \"access_coalesce\",\n                        \"type\": \"mark\"\n                    }\n                ],\n                \"type\": \"if\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        
\"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.4.1(1, c1, c2, 15, c4, c5, c6, c7, 1, 32 * c2 + 2 * c4 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n   
                                 \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"data_pack_factor\": \"p12\",\n                                    \"ele_size\": 4,\n                                    \"last_dim\": \"j_t1\",\n                                    \"size\": \"i_t1*j_t1\",\n                                    \"type\": \"array_tile\"\n                                },\n                                \"content\": \"access_coalesce\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": 
\"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(j_t1/j_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                            
                            \"child\": {\n                                                            \"child\": [\n                                                                {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c2 + 2 * c4 + c7, 2 * p0 + 32 * c0)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c1 + 2 * c5 + c6, 2 * p0 + 32 * c0)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                {\n                                                                    \"child\": [\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in.fifo_C.1.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c2 + 2 * c4 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        }\n               
                                                     ],\n                                                                    \"type\": \"if\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"k_t2\"\n                                                                        ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"user_expr\": \"S_0(32 * c2 + 2 * c4 + c7, 32 * c1 + 2 * c5 + c6, 2 * p0 + 32 * c0 + c8)\"\n                                                                                },\n                                                                                \"type\": \"user\"\n                                                                            },\n                                                                            \"content\": \"hls_unroll\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c8\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                            
        \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                {\n                                                                    \"child\": [\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out.fifo_C_drain.1.1(1, c1, c2, 15, c4, c5, c6, c7, 32 * c2 + 2 * c4 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"out.fifo_C.1.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c2 + 2 * c4 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        }\n                                                                    ],\n                                                                    \"type\": \"if\"\n                                                                }\n                                                            ],\n                                                            \"type\": \"block\"\n                                                        },\n         
                                               \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L1_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            
\"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n        \"B_IO_L1_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n        \"C_IO_L2_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"C_IO_L2_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t1)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": 
false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            
\"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel2_2.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_out\": {\n            
\"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 1\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L1_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L1_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n   
             ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n            
                    \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                 
           \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.2.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 32 * c0 + 2 * c4 + c7, 2 * p0 + 32 * c2)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n           
 \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p9\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"k_t2\",\n                                                \"size\": \"i_t1*k_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n   
                             \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            
\"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": 
\"for\"\n        },\n        \"B_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.2.2(c0, c1, c2, p0, c4, c5, c6, c7, 0, 32 * c1 + 2 * c5 + c6, 2 * p0 + 32 * c2)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        
\"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c4\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p10\",\n                                             
   \"ele_size\": 4,\n                                                \"last_dim\": \"k_t2\",\n                                                \"size\": \"j_t1*k_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"user_expr\": \"io_module.intra_inter.1.1()\"\n                            },\n                   
         \"type\": \"user\"\n                        },\n                        {\n                            \"child\": {\n                                \"user_expr\": \"io_module.state_handle()\"\n                            },\n                            \"type\": \"user\"\n                        }\n                    ],\n                    \"type\": \"block\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"data_pack_factor\": \"p11\",\n                            \"ele_size\": 4,\n                            \"last_dim\": \"j_t1\",\n                            \"size\": \"i_t1*j_t1\",\n                            \"type\": \"array_tile\"\n                        },\n                        \"content\": \"access_serialize\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"access_coalesce\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"array\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_out_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                              
      \"bounds\": [\n                                        \"0\",\n                                        \"(j_t1/j_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in_trans_reduce_+.fifo_C_local.fifo_C.1.16.1(c0, c1, c2, 15, c4, c5, c6, c7, 1, 32 * c0 + 2 * c4 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                              
                      },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L1\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L2\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n     
               \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(j_t1/j_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": [\n                                                                {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c0 + 2 * c4 + c7, 2 * p0 + 32 * c2)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n     
                                                           {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c1 + 2 * c5 + c6, 2 * p0 + 32 * c2)\"\n                                                                    },\n                                                                    \"type\": \"user\"\n                                                                },\n                                                                {\n                                                                    \"child\": [\n                                                                        {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"in.fifo_C.1.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c0 + 2 * c4 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        }\n                                                                    ],\n                                                                    \"type\": \"if\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n                                                                        \"bounds\": [\n                                                                            \"0\",\n                                                                            \"k_t2\"\n                                                            
            ],\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"user_expr\": \"S_0(32 * c0 + 2 * c4 + c7, 32 * c1 + 2 * c5 + c6, 2 * p0 + 32 * c2 + c8)\"\n                                                                                },\n                                                                                \"type\": \"user\"\n                                                                            },\n                                                                            \"content\": \"hls_unroll\",\n                                                                            \"type\": \"mark\"\n                                                                        },\n                                                                        \"iterator\": \"c8\",\n                                                                        \"type\": \"for\"\n                                                                    },\n                                                                    \"content\": \"simd\",\n                                                                    \"type\": \"mark\"\n                                                                },\n                                                                {\n                                                                    \"child\": {\n                                                                        \"user_expr\": \"out.fifo_C.1.1(c0, c1, c2, p0, c4, c5, c6, c7, 32 * c0 + 2 * c4 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                    },\n                                                                    \"type\": 
\"user\"\n                                                                }\n                                                            ],\n                                                            \"type\": \"block\"\n                                                        },\n                                                        \"content\": \"hls_pipeline\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n          
  \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L1_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n        \"B_IO_L1_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n        \"C_IO_L2_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"1\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            
\"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n               
 \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel3_0.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L1_in\": {\n            
\"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n         
   \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"ele_type\": \"float\",\n  
          \"num\": \"((i_t1/i_t2)*(j_t1/j_t2))\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"A_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"B_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L1_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\",\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L1_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\",\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\",\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n          
          \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n               
             \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(k_t1/k_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                  
                                  \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.16.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c5)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                          
      },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p9\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"k_t1\",\n                                        \"size\": \"i_t2*k_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n             
               },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            
\"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c4\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_intra\": {\n   
         \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(k_t1/k_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.16.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 2 * p0 + 32 * c2 + c6, 32 * c1 + 2 * c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n      
                              \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p10\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"k_t1\",\n                                            \"size\": \"j_t2*k_t1\",\n                                            \"type\": \"array_tile\"\n                        
                },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                      
                  },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": 
\"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"j_t2\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.2.1(c0, 1, c2, p0, p1, 0, c6, c7, 0, 2 * p1 + 32 * c0 + c7, 2 * p0 + 32 * c2 + c6)\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"hls_pipeline\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"simd\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c7\",\n      
                  \"type\": \"for\"\n                    },\n                    \"content\": \"latency\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": 
\"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"j_t2\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                      
          \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"in_trans.fifo_C_local.fifo_C.1.2.1(c0, 0, c2, p0, p1, 15, c6, c7, 1, 2 * p1 + 32 * c0 + c7, 2 * p0 + 32 * c2 + c6)\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"hls_pipeline\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"simd\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"latency\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n            
        \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p11\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                          
      \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p11\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": 
\"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p11\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                
\"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p11\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n     
                           \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p11\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"j_t2\",\n                                                \"size\": \"i_t2*j_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n   
                                         \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c3\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                  
                  \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p11\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"j_t2\",\n                                                \"size\": \"i_t2*j_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c3\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                
],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                            
    \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"j_t2\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.2.1(c0, 1, c2, p0, p1, 15, c6, c7, 1, 2 * p1 + 32 * c0 + c7, 2 * p0 + 32 * c2 + c6)\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"hls_pipeline\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"simd\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n  
                          \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"latency\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p12\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n           
                                             \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p12\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                           
                 },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                         
                       \"data_pack_factor\": \"p12\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"j_t2\",\n                                                \"size\": \"i_t2*j_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c3\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                   
 \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"in.fifo_C.1.1(c0, 1, c2, p0, p1, 2 * p0 + 32 * c0 + c5, 2 * p1 + c6 + 32)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(k_t1/k_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"j_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                                                         
   \"i_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c5)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p1 + 32 * c2 + c6, 32 * c1 + 2 * c5)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"k_t2\"\n                                                                            ],\n                                       
                                     \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"user_expr\": \"S_0(2 * p0 + 32 * c0 + c7, 2 * p1 + 32 * c2 + c6, 32 * c1 + 2 * c5 + c8)\"\n                                                                                    },\n                                                                                    \"type\": \"user\"\n                                                                                },\n                                                                                \"content\": \"hls_unroll\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c8\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"simd\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": [\n                                                                            {\n                                                                                \"child\": {\n                                                                                    \"user_expr\": \"out.fifo_C_drain.1.1(c0, 1, c2, p0, p1, 15, c6, 
c7, 2 * p0 + 32 * c0 + c7, 2 * p1 + 32 * c2 + c6)\"\n                                                                                },\n                                                                                \"type\": \"user\"\n                                                                            }\n                                                                        ],\n                                                                        \"type\": \"if\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p1 + 32 * c2 + c6, 32 * c1 + 2 * c5)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c5)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"block\"\n                                                            },\n                                                            
\"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c6\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c7\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c5\",\n                                        \"type\": \"for\"\n                                    },\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out.fifo_C.1.1(c0, 0, c2, p0, p1, 2 * p0 + 32 * c0 + c5, 2 * p1 + c6)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n                                        \"type\": 
\"if\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L2_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"B_IO_L2_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_IO_L1_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((j_t1/j_t2)*(i_t1/i_t2))\"\n        },\n        \"C_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            
\"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((j_t1/j_t2)*(i_t1/i_t2))\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((j_t1/j_t2)*(i_t1/i_t2))\"\n        },\n        \"PE\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(j_t1/j_t2))\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n      
      \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                
\"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel3_1.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L1_in\": {\n            
\"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L1_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n         
   \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"ele_type\": \"float\",\n  
          \"num\": \"((i_t1/i_t2)*(j_t1/j_t2))\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"A_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"B_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L1_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\",\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L1_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\",\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\",\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n          
          \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                   
             \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(k_t1/k_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                  
                          \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.16.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 2 * p0 + 32 * c2 + c7, 32 * c0 + 2 * c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n  
          ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p9\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"k_t1\",\n                                            \"size\": \"i_t2*k_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                
\"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                
\"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c4\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(k_t1/k_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                            
                    \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.16.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 2 * p0 + 32 * c1 + c6, 32 * c0 + 2 * c5)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n            
                                \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p10\",\n                                        \"ele_size\": 4,\n                                        
\"last_dim\": \"k_t1\",\n                                        \"size\": \"j_t2*k_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n       
                                 },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                       
     \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"j_t2\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.2.1(1, c1, c2, p0, p1, 0, c6, c7, 0, 2 * p1 + 32 * c2 + c7, 2 * p0 + 32 * c1 + c6)\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"hls_pipeline\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"simd\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": 
\"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"latency\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n               
 \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"j_t2\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n           
                     \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"in_trans.fifo_C_local.fifo_C.1.2.1(0, c1, c2, p0, p1, 15, c6, c7, 1, 2 * p1 + 32 * c2 + c7, 2 * p0 + 32 * c1 + c6)\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"hls_pipeline\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"simd\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"latency\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n 
                   \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p11\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                               
                 \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p11\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                
\"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p11\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                     
           \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p11\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": 
\"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p11\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"j_t2\",\n                                                \"size\": \"i_t2*j_t2\",\n                                                \"type\": \"array_tile\"\n                                  
          },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c3\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n 
                                   \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p11\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"j_t2\",\n                                                \"size\": \"i_t2*j_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c3\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n   
             ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n               
                 \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"j_t2\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.2.1(1, c1, c2, p0, p1, 15, c6, c7, 1, 2 * p1 + 32 * c2 + c7, 2 * p0 + 32 * c1 + c6)\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"hls_pipeline\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"simd\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                     
       },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"latency\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p12\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": 
\"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p12\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": 
\"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            
\"child\": {\n                                                \"data_pack_factor\": \"p12\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"j_t2\",\n                                                \"size\": \"i_t2*j_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c3\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n    
                ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"in.fifo_C.1.1(1, c1, c2, p0, p1, 2 * p0 + c5 + 32, 2 * p1 + 32 * c1 + c6)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(k_t1/k_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"j_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"bounds\": [\n                                                            \"0\",\n                  
                                          \"i_t2\"\n                                                        ],\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c2 + c7, 32 * c0 + 2 * c5)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p1 + 32 * c1 + c6, 32 * c0 + 2 * c5)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"bounds\": [\n                                                                                \"0\",\n                                                                                \"k_t2\"\n                                                                            
],\n                                                                            \"child\": {\n                                                                                \"child\": {\n                                                                                    \"child\": {\n                                                                                        \"user_expr\": \"S_0(2 * p0 + 32 * c2 + c7, 2 * p1 + 32 * c1 + c6, 32 * c0 + 2 * c5 + c8)\"\n                                                                                    },\n                                                                                    \"type\": \"user\"\n                                                                                },\n                                                                                \"content\": \"hls_unroll\",\n                                                                                \"type\": \"mark\"\n                                                                            },\n                                                                            \"iterator\": \"c8\",\n                                                                            \"type\": \"for\"\n                                                                        },\n                                                                        \"content\": \"simd\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": [\n                                                                            {\n                                                                                \"child\": {\n                                                                                    \"user_expr\": 
\"out.fifo_C_drain.1.1(1, c1, c2, p0, p1, 15, c6, c7, 2 * p0 + 32 * c2 + c7, 2 * p1 + 32 * c1 + c6)\"\n                                                                                },\n                                                                                \"type\": \"user\"\n                                                                            }\n                                                                        ],\n                                                                        \"type\": \"if\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p1 + 32 * c1 + c6, 32 * c0 + 2 * c5)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c2 + c7, 32 * c0 + 2 * c5)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"block\"\n                                                            },\n                  
                                          \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"iterator\": \"c6\",\n                                                        \"type\": \"for\"\n                                                    },\n                                                    \"content\": \"latency\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c7\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c5\",\n                                        \"type\": \"for\"\n                                    },\n                                    {\n                                        \"child\": [\n                                            {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out.fifo_C.1.1(0, c1, c2, p0, p1, 2 * p0 + c5, 2 * p1 + 32 * c1 + c6)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            }\n                                        ],\n             
                           \"type\": \"if\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L2_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"B_IO_L2_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_IO_L1_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((j_t1/j_t2)*(i_t1/i_t2))\"\n        },\n        \"C_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            
\"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((j_t1/j_t2)*(i_t1/i_t2))\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((j_t1/j_t2)*(i_t1/i_t2))\"\n        },\n        \"PE\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(j_t1/j_t2))\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": 
\"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n          
  ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel3_2.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            
\"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(j_t1/j_t2))\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"A_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n    
            \"(j_t1/j_t2)\"\n            ]\n        },\n        \"B_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\",\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n      
                      \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"i_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        
\"A_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(k_t1/k_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.16.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    
\"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p9\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"k_t1\",\n                                                \"size\": 
\"i_t2*k_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": 
{\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                      
      \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t1\",\n                                \"size\": \"j_t2*k_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c4\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(k_t1/k_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.16.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 2 * p0 + 32 * c1 + c6, 32 * c2 + 2 * c5)\"\n                 
                                   },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n               
         \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p10\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"k_t1\",\n                                                \"size\": \"j_t2*k_t1\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n     
       \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                },\n                                \"type\": \"user\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n          
              {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t2*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"j_t2\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.2.1(c0, c1, 1, p0, p1, 15, c6, c7, 1, 2 * p1 + 32 * c0 + c7, 2 * p0 + 32 * c1 + c6)\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        \"content\": \"hls_pipeline\",\n                                        \"type\": \"mark\"\n                
                    },\n                                    \"content\": \"simd\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c6\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c7\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"latency\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                    
                                    \"data_pack_factor\": \"p12\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        },\n                                        {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"(i_t1/i_t2)\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p12\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"j_t2\",\n                                                        \"size\": \"i_t2*j_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    
\"content\": \"access_coalesce\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"io_L1\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c3\",\n                                            \"type\": \"for\"\n                                        }\n                                    ],\n                                    \"type\": \"if\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n     
                                   \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p12\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"j_t2\",\n                                                    \"size\": \"i_t2*j_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_serialize\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"access_coalesce\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"io_L1\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c3\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n      
          \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"j_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"i_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": [\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, p1, c5, 
c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p1 + 32 * c1 + c6, 32 * c2 + 2 * c5)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"k_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"S_0(2 * p0 + 32 * c0 + c7, 2 * p1 + 32 * c1 + c6, 32 * c2 + 2 * c5 + c8)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                           
             \"content\": \"hls_unroll\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c8\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_C_drain.1.1(c0, c1, 1, p0, p1, 15, c6, c7, 2 * p0 + 32 * c0 + c7, 2 * p1 + 32 * c1 + c6)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p1 + 32 * c1 + c6, 32 * c2 + 2 * c5)\"\n                                                 
               },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            }\n                                                        ],\n                                                        \"type\": \"block\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c7\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c5\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            
\"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L2_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"B_IO_L2_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t2*k_t1)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((j_t1/j_t2)*(i_t1/i_t2))\"\n        },\n        \"PE\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t2)\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(j_t1/j_t2))\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                
\"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n       
     \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t1,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p12\",\n        
    \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel4_0.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"A_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L3_in\": {\n            
\"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n         
   \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(k_t1/k_t2))\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L1_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\",\n                
\"(i_t1/i_t2)\"\n            ]\n        },\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"A_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"B_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\",\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"C_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        
{\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t2*k_t2\",\n                            
    \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t2*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": 
\"out_trans.fifo_A.fifo_A_local.1.2.2(c0, c1, c2, p0, p1, c5, c6, c7, 0, 2 * p1 + 32 * c0 + c7, 2 * p0 + 32 * c1)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c5\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n  
                              \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p9\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                            \"size\": \"i_t2*k_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                        
        \"bounds\": [\n                                                    \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p9\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                            \"size\": \"i_t2*k_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                 
           \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p9\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"k_t2\",\n                                                    \"size\": \"i_t2*k_t2\",\n                              
                      \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n 
                               \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": 
\"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c4\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        
\"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.2.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 32 * c2 + 2 * c5 + c6, 2 * p0 + 32 * c1)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n 
                       \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p10\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"k_t2\",\n                                            \"size\": \"j_t1*k_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n      
          \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                
\"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                         
                           \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.16.1(c0, 1, c2, p0, 0, c5, c6, c7, 0, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    
\"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            
\"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": 
\"in_trans.fifo_C_local.fifo_C.1.16.1(c0, 0, c2, p0, 15, c5, c6, c7, 1, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n            
            \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t1\",\n                                        \"size\": \"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                        
        \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t1\",\n                                        \"size\": \"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"child\": [\n                {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((j/j_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        
\"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c1\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                }\n            ],\n            \"type\": \"if\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": 
\"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.4.1(c0, 1, c2, 15, p1, c5, c6, c7, 1, 2 * p1 + 32 * c0 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                      
  },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c5\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p12\",\n                                      
              \"ele_size\": 4,\n                                                    \"last_dim\": \"j_t1\",\n                                                    \"size\": \"i_t2*j_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    },\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p12\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"j_t1\",\n                                                    \"size\": \"i_t2*j_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n             
                               \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    }\n                                ],\n                                \"type\": \"if\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p12\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"j_t1\",\n                                            \"size\": \"i_t2*j_t1\",\n                                            \"type\": \"array_tile\"\n    
                                    },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            
\"j_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"i_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": [\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 2 * p1 + 32 * c1)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c2 + 2 * c5 + c6, 2 * p1 + 32 * c1)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                     
                                       \"user_expr\": \"in.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"k_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"S_0(2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5 + c6, 2 * p1 + 32 * c1 + c8)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_unroll\",\n                                                                        \"type\": \"mark\"\n                                                             
       },\n                                                                    \"iterator\": \"c8\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_C_drain.1.1(c0, 1, c2, p0, 15, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c2 + 2 * c5 + c6)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                        
                    },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c2 + 2 * c5 + c6, 2 * p1 + 32 * c1)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            }\n                                                        ],\n                                                        \"type\": \"block\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c7\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c5\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n    
                },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L1_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t2*k_t2)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((k_t1/k_t2)*(i_t1/i_t2))\"\n        },\n        \"B_IO_L2_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n        \"C_IO_L2_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"C_IO_L2_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n          
  \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            
\"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": 
false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel4_1.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"A_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L3_in\": {\n            
\"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n         
   \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(k_t1/k_t2))\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L1_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\",\n                
\"(i_t1/i_t2)\"\n            ]\n        },\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"A_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"B_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\",\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"C_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        
{\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t2*k_t2\",\n                            
    \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t2*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": 
\"out_trans.fifo_A.fifo_A_local.1.2.2(c0, c1, c2, p0, p1, c5, c6, c7, 0, 2 * p1 + 32 * c2 + c7, 2 * p0 + 32 * c0)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c5\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n  
                              \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p9\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                            \"size\": \"i_t2*k_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                        
        \"bounds\": [\n                                                    \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p9\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                            \"size\": \"i_t2*k_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                 
           \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p9\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"k_t2\",\n                                                    \"size\": \"i_t2*k_t2\",\n                              
                      \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        
\"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                              
  \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c4\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(j_t1/j_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                   
                             \"child\": {\n                                                                    \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.2.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 32 * c1 + 2 * c5 + c6, 2 * p0 + 32 * c0)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                           
 \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(k_t1/k_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p10\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"k_t2\",\n                                        \"size\": \"j_t1*k_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        
\"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": 
{\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n       
                                 \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.16.1(1, c1, c2, p0, 0, c5, c6, c7, 0, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                
\"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                        
        \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n     
                                   ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"in_trans.fifo_C_local.fifo_C.1.16.1(0, c1, c2, p0, 15, c5, c6, c7, 1, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n 
       \"C_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t1\",\n                                        \"size\": \"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n    
        \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t1\",\n                                        \"size\": \"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"child\": [\n                {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                    
    \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c0\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                }\n            ],\n            \"type\": \"if\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                          
  \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t1\",\n                                \"size\": \"i_t2*j_t1\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.4.1(1, c1, c2, 15, p1, c5, c6, c7, 1, 2 * p1 + 32 * c2 + c7, 32 * c1 + 2 * c5 + c6)\"\n              
                                  },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c5\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n        
                                ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p12\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"j_t1\",\n                                                    \"size\": \"i_t2*j_t1\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    },\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p12\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"j_t1\",\n                                                    \"size\": \"i_t2*j_t1\",\n                                                    \"type\": \"array_tile\"\n 
                                               },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    }\n                                ],\n                                \"type\": \"if\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                   
                         \"data_pack_factor\": \"p12\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"j_t1\",\n                                            \"size\": \"i_t2*j_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n      
                              \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"j_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"i_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": [\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c2 + c7, 2 * p1 + 32 * c0)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c1 + 2 * c5 + c6, 2 * p1 + 32 * c0)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n     
                                                       {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"in.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"k_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"S_0(2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c5 + c6, 2 * p1 + 32 * c0 + c8)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n 
                                                                       },\n                                                                        \"content\": \"hls_unroll\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c8\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_C_drain.1.1(1, c1, c2, p0, 15, c5, c6, c7, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c2 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                        },\n                       
                                                 \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c1 + 2 * c5 + c6, 2 * p1 + 32 * c0)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            }\n                                                        ],\n                                                        \"type\": \"block\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c7\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                               
 \"iterator\": \"c5\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L1_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t2*k_t2)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((k_t1/k_t2)*(i_t1/i_t2))\"\n        },\n        \"B_IO_L2_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n        \"C_IO_L2_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"C_IO_L2_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n          
  \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": 
\"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n   
         \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,16),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel4_2.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"A_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L3_in\": {\n            
\"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((i_t1/i_t2)*(k_t1/k_t2))\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L1_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\",\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"A_IO_L3_in\": {\n         
   \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"B_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"(i_t1/i_t2)\"\n            ]\n        },\n        \"C_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                            
    },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(i_t1/i_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t2*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t2*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": 
\"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"j_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"i_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.2.2(c0, c1, c2, p0, p1, c5, c6, c7, 0, 2 * p1 + 32 * c0 + c7, 2 * p0 + 32 * c2)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                
},\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c5\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                
                        \"child\": {\n                                                            \"data_pack_factor\": \"p9\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                            \"size\": \"i_t2*k_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(i_t1/i_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p9\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                            \"size\": \"i_t2*k_t2\",\n                                     
                       \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    
\"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(i_t1/i_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p9\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"k_t2\",\n                                                        \"size\": \"i_t2*k_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_serialize\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                     
   },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n  
                                          \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                           
 },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c4\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(j_t1/j_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"j_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.2.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 32 * c1 + 2 * c5 + c6, 2 * p0 + 32 * c2)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                        
                    },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    
\"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p10\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"k_t2\",\n                                                \"size\": \"j_t1*k_t2\",\n                                                \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n             
   \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"user_expr\": \"io_module.intra_inter.0.1()\"\n                            },\n                            \"type\": \"user\"\n                        },\n                        {\n                            \"child\": {\n                                \"user_expr\": \"io_module.state_handle()\"\n                            },\n                            \"type\": \"user\"\n                        }\n                    ],\n                    \"type\": \"block\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t1\",\n                                        \"size\": \"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                    
                    \"last_dim\": \"j_t1\",\n                                        \"size\": \"i_t2*j_t1\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                }\n                            ],\n                            \"type\": \"if\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"io_L3\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"array\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_out_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(j_t1/j_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"j_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        
\"0\",\n                                                        \"i_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in_trans_reduce_+.fifo_C_local.fifo_C.1.16.1(c0, c1, c2, p0, 15, c5, c6, c7, 1, 2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                  
                  },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(i_t1/i_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p11\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"j_t1\",\n                                            \"size\": \"i_t2*j_t1\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        
\"content\": \"access_serialize\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"j_t2\"\n                                        ],\n                 
                       \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"i_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": [\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 2 * p1 + 32 * c2)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c1 + 2 * c5 + c6, 2 * p1 + 32 * c2)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": 
\"in.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"k_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"S_0(2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c5 + c6, 2 * p1 + 32 * c2 + c8)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_unroll\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                           
                         \"iterator\": \"c8\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c0 + c7, 32 * c1 + 2 * c5 + c6)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c1 + 2 * c5 + c6, 2 * p1 + 32 * c2)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            }\n                                                        ],\n                                                        \"type\": \"block\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                   
                             \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c7\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c5\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L1_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t2*k_t2)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((k_t1/k_t2)*(i_t1/i_t2))\"\n        },\n        \"B_IO_L2_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n      
  \"C_IO_L2_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t2*j_t1)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(i_t1/i_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                
\"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t1,16),1)\"\n      
      ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel5_0.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"B_IO_L3_in\": {\n            
\"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n         
   \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((j_t1/j_t2)*(k_t1/k_t2))\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        
},\n        \"A_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L1_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\",\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"B_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\",\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"C_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n        
                        },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            
\"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c4\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"i_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        \"0\",\n                                                        \"j_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": 
\"out_trans.fifo_A.fifo_A_local.1.2.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c1)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        
\"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(k_t1/k_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p9\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"k_t2\",\n                                        \"size\": \"i_t1*k_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c4\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n   
             },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n           
             \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t2*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t2*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                    
    \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.2.2(c0, c1, c2, p0, p1, c5, c6, c7, 0, 2 * p1 + 32 * c2 + c7, 2 * p0 + 32 * c1)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                       
 \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c5\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(j_t1/j_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p10\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                      
      \"size\": \"j_t2*k_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(j_t1/j_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p10\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                            \"size\": \"j_t2*k_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n             
                                       \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                 
   \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(j_t1/j_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p10\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"k_t2\",\n                                                    \"size\": \"j_t2*k_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                
        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            
\"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    
\"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"j_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.2.1(c0, 1, c2, p0, 0, c5, c6, c7, 0, 32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c2 + c7)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                
},\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n    
                            \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                       
     \"j_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"in_trans.fifo_C_local.fifo_C.1.2.1(c0, 0, c2, p0, 15, c5, c6, c7, 1, 32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c2 + c7)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": 
\"mark\"\n        },\n        \"C_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t2\",\n                                        \"size\": \"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                
\"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t2\",\n                                        \"size\": \"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"child\": [\n                {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n              
      \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((j/j_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c1\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                }\n            ],\n            \"type\": \"if\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": 
\"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.2.1(c0, 1, c2, 15, p1, c5, c6, c7, 1, 32 * c0 + 2 * 
c5 + c6, 2 * p1 + 32 * c2 + c7)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c5\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                    
                        \"(j_t1/j_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p12\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"j_t2\",\n                                                    \"size\": \"i_t1*j_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    },\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(j_t1/j_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p12\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"j_t2\",\n                                                    \"size\": \"i_t1*j_t2\",\n                              
                      \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    }\n                                ],\n                                \"type\": \"if\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                        
                \"child\": {\n                                            \"data_pack_factor\": \"p12\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"j_t2\",\n                                            \"size\": \"i_t1*j_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((k/k_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n 
                                   \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"j_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": [\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c0 + 2 * c5 + c6, 2 * p1 + 32 * c1)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c2 + c7, 2 * p1 + 32 * c1)\"\n                                                                },\n                                                                \"type\": \"user\"\n                    
                                        },\n                                                            {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"in.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c2 + c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"k_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"S_0(32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c2 + c7, 2 * p1 + 32 * c1 + c8)\"\n                                                                            },\n                                                
                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_unroll\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                                                    \"iterator\": \"c8\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_C_drain.1.1(c0, 1, c2, p0, 15, c5, c6, c7, 32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c2 + c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c2 + c7)\"\n                                                  
                      },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c0 + 2 * c5 + c6, 2 * p1 + 32 * c1)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            }\n                                                        ],\n                                                        \"type\": \"block\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c7\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                  
              },\n                                \"iterator\": \"c5\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c2\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L2_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n        \"B_IO_L1_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t2*k_t2)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((k_t1/k_t2)*(j_t1/j_t2))\"\n        },\n        \"C_IO_L2_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_IO_L2_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            
\"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": 
true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        
{\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,16),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel5_1.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"B_IO_L3_in\": {\n            
\"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n         
   \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L1_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_drain_IO_L2_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"C_drain_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((j_t1/j_t2)*(k_t1/k_t2))\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        
},\n        \"A_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L1_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\",\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"B_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_in\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_drain_IO_L1_out\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\",\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_drain_IO_L2_out\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"C_drain_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n   
                                         \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            
\"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c4\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"j_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.2.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 32 * c2 + 2 * c5 + c6, 2 * p0 + 32 * c0)\"\n                         
                           },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                       
 \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p9\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"k_t2\",\n                                            \"size\": \"i_t1*k_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    
\"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n         
       \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t2*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t2*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                          
          \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.2.2(c0, c1, c2, p0, p1, c5, c6, c7, 0, 2 * p1 + 32 * c1 + c7, 2 * p0 + 32 * c0)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c5\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                
\"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(j_t1/j_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p10\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                            \"size\": \"j_t2*k_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n           
                                         \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(j_t1/j_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p10\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                            \"size\": \"j_t2*k_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            }\n                                       
 ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(j_t1/j_t2)\"\n                                        ],\n                                        
\"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p10\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"k_t2\",\n                                                    \"size\": \"j_t2*k_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n    
            \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                
\"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"j_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                         
                           \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_C.fifo_C_local.1.2.1(1, c1, c2, p0, 0, c5, c6, c7, 0, 32 * c2 + 2 * c5 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    
\"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.intra_inter.0.1()\"\n                                    },\n                                    \"type\": \"user\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"user_expr\": \"io_module.state_handle()\"\n                                    },\n                                    \"type\": \"user\"\n                                }\n                            ],\n                            \"type\": \"block\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            
\"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p11\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"j_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": 
\"in_trans.fifo_C_local.fifo_C.1.2.1(0, c1, c2, p0, 15, c5, c6, c7, 1, 32 * c2 + 2 * c5 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n             
           \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t2\",\n                                        \"size\": \"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                         
       \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t2\",\n                                        \"size\": \"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"child\": [\n                {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((j/j_t1))\"\n                    ],\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"ceil((i/i_t1))\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        
\"child\": {\n                                            \"user_expr\": \"io_module.intra_inter.0.0()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L3\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"array\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c0\",\n                        \"type\": \"for\"\n                    },\n                    \"iterator\": \"c1\",\n                    \"type\": \"for\"\n                }\n            ],\n            \"type\": \"if\"\n        },\n        \"C_drain_IO_L1_out_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": \"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p12\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"j_t2\",\n                                \"size\": 
\"i_t1*j_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L1_out_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"in_trans.fifo_C_drain_local.fifo_C_drain.1.2.1(1, c1, c2, 15, p1, c5, c6, c7, 1, 32 * c2 + 2 * c5 + c6, 2 * p1 + 32 * c1 + c7)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                      
  },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c5\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"C_drain_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(j_t1/j_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p12\",\n                                      
              \"ele_size\": 4,\n                                                    \"last_dim\": \"j_t2\",\n                                                    \"size\": \"i_t1*j_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    },\n                                    {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(j_t1/j_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"data_pack_factor\": \"p12\",\n                                                    \"ele_size\": 4,\n                                                    \"last_dim\": \"j_t2\",\n                                                    \"size\": \"i_t1*j_t2\",\n                                                    \"type\": \"array_tile\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n             
                               \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    }\n                                ],\n                                \"type\": \"if\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"C_drain_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((j/j_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((i/i_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(j_t1/j_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p12\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"j_t2\",\n                                            \"size\": \"i_t1*j_t2\",\n                                            \"type\": \"array_tile\"\n    
                                    },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L1\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c3\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L2\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c0\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c1\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((i/i_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            
\"i_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"j_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": [\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c2 + 2 * c5 + c6, 2 * p1 + 32 * c0)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c1 + c7, 2 * p1 + 32 * c0)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                     
                                       \"user_expr\": \"in.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c2 + 2 * c5 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"k_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"S_0(32 * c2 + 2 * c5 + c6, 2 * p0 + 32 * c1 + c7, 2 * p1 + 32 * c0 + c8)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_unroll\",\n                                                                        \"type\": \"mark\"\n                                                             
       },\n                                                                    \"iterator\": \"c8\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_C_drain.1.1(1, c1, c2, p0, 15, c5, c6, c7, 32 * c2 + 2 * c5 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    },\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": \"out.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c2 + 2 * c5 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                        
                    },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c2 + 2 * c5 + c6, 2 * p1 + 32 * c0)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            }\n                                                        ],\n                                                        \"type\": \"block\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                                                \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c7\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c5\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n    
                },\n                    \"iterator\": \"c0\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L2_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n        \"B_IO_L1_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t2*k_t2)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((k_t1/k_t2)*(j_t1/j_t2))\"\n        },\n        \"C_IO_L2_in\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_IO_L2_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        },\n        \"C_drain_IO_L1_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p12\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 0,\n          
  \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            
\"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,16),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": 
false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,4),1)\"\n            ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p12\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/designs_lib/gemm/kernel5_2.json",
    "content": "{\n    \"attr\": {\n        \"A_IO_L2_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L2_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"A_IO_L3_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"B_IO_L1_in\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L1_in_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"B_IO_L2_in\": {\n            \"double_buffer\": 0,\n            \"filter\": 1,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        },\n        \"B_IO_L3_in\": {\n            
\"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 1,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"C_IO_L2_out\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_inter\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L2_out_intra\": {\n            \"double_buffer\": 1,\n            \"filter\": 1,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 1\n        },\n        \"C_IO_L3_out\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": 0,\n            \"io\": 1,\n            \"serialize\": 1,\n            \"to_dram\": 1,\n            \"to_pe\": 0\n        },\n        \"PE\": {\n            \"double_buffer\": 0,\n            \"filter\": 0,\n            \"in\": -1,\n            \"io\": 0,\n            \"serialize\": 0,\n            \"to_dram\": 0,\n            \"to_pe\": 0\n        }\n    },\n    \"compute\": {\n        \"PE\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"ele_type\": \"float\",\n            \"num\": \"((j_t1/j_t2)*(k_t1/k_t2))\",\n            \"unroll_factor\": \"k_t2\"\n        }\n    },\n    \"io\": {\n        \"A_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"A_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"B_IO_L1_in\": {\n            \"dims\": [\n                
\"(k_t1/k_t2)\",\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"B_IO_L2_in\": {\n            \"dims\": [\n                \"(k_t1/k_t2)\"\n            ]\n        },\n        \"B_IO_L3_in\": {\n            \"dims\": [\n                \"1\"\n            ]\n        },\n        \"C_IO_L2_out\": {\n            \"dims\": [\n                \"(j_t1/j_t2)\"\n            ]\n        },\n        \"C_IO_L3_out\": {\n            \"dims\": [\n                \"1\"\n            ]\n        }\n    },\n    \"latency\": {\n        \"A_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": [\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                        },\n                                        \"type\": \"user\"\n                                    },\n                                    {\n                                        \"child\": {\n                                            \"user_expr\": \"io_module.state_handle()\"\n                                        },\n                                        \"type\": \"user\"\n                                    }\n                                ],\n                                \"type\": \"block\"\n                            },\n                            \"content\": \"io_L3\",\n                            
\"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(k_t1/k_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p9\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"i_t1*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L2\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c4\",\n            \"type\": \"for\"\n        },\n        \"A_IO_L2_in_intra\": {\n     
       \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"bounds\": [\n                            \"0\",\n                            \"(i_t1/i_t2)\"\n                        ],\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"i_t2\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"j_t2\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"user_expr\": \"out_trans.fifo_A.fifo_A_local.1.2.2(c0, c1, c2, p0, 0, c5, c6, c7, 0, 32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c2)\"\n                                                    },\n                                                    \"type\": \"user\"\n                                                },\n                                                \"content\": \"hls_pipeline\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"simd\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c6\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n         
                           \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c7\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"latency\",\n                            \"type\": \"mark\"\n                        },\n                        \"iterator\": \"c5\",\n                        \"type\": \"for\"\n                    },\n                    \"content\": \"pe\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L2\",\n            \"type\": \"mark\"\n        },\n        \"A_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"data_pack_factor\": \"p9\",\n                                                \"ele_size\": 4,\n                                                \"last_dim\": \"k_t2\",\n                                                \"size\": \"i_t1*k_t2\",\n                        
                        \"type\": \"array_tile\"\n                                            },\n                                            \"content\": \"access_serialize\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"access_coalesce\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": [\n                                        {\n                                       
     \"child\": {\n                                                \"user_expr\": \"io_module.inter_intra.0.1()\"\n                                            },\n                                            \"type\": \"user\"\n                                        },\n                                        {\n                                            \"child\": {\n                                                \"user_expr\": \"io_module.state_handle()\"\n                                            },\n                                            \"type\": \"user\"\n                                        }\n                                    ],\n                                    \"type\": \"block\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_inter\": {\n            \"bounds\": [\n                \"0\",\n                \"(j_t1/j_t2)\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t2*k_t2\",\n                                \"type\": \"array_tile\"\n           
                 },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        },\n                        {\n                            \"child\": {\n                                \"data_pack_factor\": \"p10\",\n                                \"ele_size\": 4,\n                                \"last_dim\": \"k_t2\",\n                                \"size\": \"j_t2*k_t2\",\n                                \"type\": \"array_tile\"\n                            },\n                            \"content\": \"access_coalesce\",\n                            \"type\": \"mark\"\n                        }\n                    ],\n                    \"type\": \"if\"\n                },\n                \"content\": \"io_L1\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c3\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L1_in_intra\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(i_t1/i_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"i_t2\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"j_t2\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"user_expr\": \"out_trans.fifo_B.fifo_B_local.1.2.2(c0, 
c1, c2, p0, p1, c5, c6, c7, 0, 2 * p1 + 32 * c1 + c7, 2 * p0 + 32 * c2)\"\n                                                },\n                                                \"type\": \"user\"\n                                            },\n                                            \"content\": \"hls_pipeline\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"content\": \"simd\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"iterator\": \"c6\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"latency\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c7\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"latency\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c5\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"pe\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"io_L1\",\n            \"type\": \"mark\"\n        },\n        \"B_IO_L2_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": 
[\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": [\n                                            {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"(j_t1/j_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p10\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                            \"size\": \"j_t2*k_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            },\n                                            {\n                                                \"bounds\": [\n                  
                                  \"0\",\n                                                    \"(j_t1/j_t2)\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"data_pack_factor\": \"p10\",\n                                                            \"ele_size\": 4,\n                                                            \"last_dim\": \"k_t2\",\n                                                            \"size\": \"j_t2*k_t2\",\n                                                            \"type\": \"array_tile\"\n                                                        },\n                                                        \"content\": \"access_coalesce\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"content\": \"io_L1\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c3\",\n                                                \"type\": \"for\"\n                                            }\n                                        ],\n                                        \"type\": \"if\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n         
               },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"B_IO_L3_in\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(k_t1/k_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"(j_t1/j_t2)\"\n                                        ],\n                                        \"child\": {\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"data_pack_factor\": \"p10\",\n                                                        \"ele_size\": 4,\n                                                        \"last_dim\": \"k_t2\",\n                                                        \"size\": 
\"j_t2*k_t2\",\n                                                        \"type\": \"array_tile\"\n                                                    },\n                                                    \"content\": \"access_serialize\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"content\": \"access_coalesce\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"content\": \"io_L1\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c3\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"io_L2\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c4\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"io_L3\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n             
   \"child\": {\n                    \"child\": [\n                        {\n                            \"child\": {\n                                \"user_expr\": \"io_module.intra_inter.0.1()\"\n                            },\n                            \"type\": \"user\"\n                        },\n                        {\n                            \"child\": {\n                                \"user_expr\": \"io_module.state_handle()\"\n                            },\n                            \"type\": \"user\"\n                        }\n                    ],\n                    \"type\": \"block\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L2_out_inter\": {\n            \"child\": {\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"(j_t1/j_t2)\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": [\n                                {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                                        \"last_dim\": \"j_t2\",\n                                        \"size\": \"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                {\n                                    \"child\": {\n                                        \"data_pack_factor\": \"p11\",\n                                        \"ele_size\": 4,\n                    
                    \"last_dim\": \"j_t2\",\n                                        \"size\": \"i_t1*j_t2\",\n                                        \"type\": \"array_tile\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                }\n                            ],\n                            \"type\": \"if\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c3\",\n                    \"type\": \"for\"\n                },\n                \"content\": \"io_L3\",\n                \"type\": \"mark\"\n            },\n            \"content\": \"array\",\n            \"type\": \"mark\"\n        },\n        \"C_IO_L2_out_intra\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((k/k_t1))\"\n            ],\n            \"child\": {\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"child\": {\n                                    \"bounds\": [\n                                        \"0\",\n                                        \"(i_t1/i_t2)\"\n                                    ],\n                                    \"child\": {\n                                        \"child\": {\n                                            \"bounds\": [\n                                                \"0\",\n                                                \"i_t2\"\n                                            ],\n                                            \"child\": {\n                                                \"child\": {\n                                                    \"bounds\": [\n                                                        
\"0\",\n                                                        \"j_t2\"\n                                                    ],\n                                                    \"child\": {\n                                                        \"child\": {\n                                                            \"child\": {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in_trans_reduce_+.fifo_C_local.fifo_C.1.2.1(c0, c1, c2, p0, 15, c5, c6, c7, 1, 32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            \"content\": \"hls_pipeline\",\n                                                            \"type\": \"mark\"\n                                                        },\n                                                        \"content\": \"simd\",\n                                                        \"type\": \"mark\"\n                                                    },\n                                                    \"iterator\": \"c6\",\n                                                    \"type\": \"for\"\n                                                },\n                                                \"content\": \"latency\",\n                                                \"type\": \"mark\"\n                                            },\n                                            \"iterator\": \"c7\",\n                                            \"type\": \"for\"\n                                        },\n                                        \"content\": \"latency\",\n                                        \"type\": \"mark\"\n                   
                 },\n                                    \"iterator\": \"c5\",\n                                    \"type\": \"for\"\n                                },\n                                \"content\": \"pe\",\n                                \"type\": \"mark\"\n                            },\n                            \"content\": \"io_L1\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"io_L2\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"io_L3\",\n                    \"type\": \"mark\"\n                },\n                \"content\": \"array\",\n                \"type\": \"mark\"\n            },\n            \"iterator\": \"c2\",\n            \"type\": \"for\"\n        },\n        \"C_IO_L3_out\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"child\": {\n                        \"child\": {\n                            \"bounds\": [\n                                \"0\",\n                                \"(j_t1/j_t2)\"\n                            ],\n                            \"child\": {\n                                \"child\": {\n                                    \"child\": {\n                                        \"child\": {\n                                            \"data_pack_factor\": \"p11\",\n                                            \"ele_size\": 4,\n                                            \"last_dim\": \"j_t2\",\n                                            \"size\": \"i_t1*j_t2\",\n                                            \"type\": \"array_tile\"\n                                        },\n                                        
\"content\": \"access_serialize\",\n                                        \"type\": \"mark\"\n                                    },\n                                    \"content\": \"access_coalesce\",\n                                    \"type\": \"mark\"\n                                },\n                                \"content\": \"io_L2\",\n                                \"type\": \"mark\"\n                            },\n                            \"iterator\": \"c3\",\n                            \"type\": \"for\"\n                        },\n                        \"content\": \"io_L3\",\n                        \"type\": \"mark\"\n                    },\n                    \"content\": \"array\",\n                    \"type\": \"mark\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        },\n        \"PE\": {\n            \"bounds\": [\n                \"0\",\n                \"ceil((i/i_t1))\"\n            ],\n            \"child\": {\n                \"bounds\": [\n                    \"0\",\n                    \"ceil((j/j_t1))\"\n                ],\n                \"child\": {\n                    \"bounds\": [\n                        \"0\",\n                        \"ceil((k/k_t1))\"\n                    ],\n                    \"child\": {\n                        \"child\": {\n                            \"child\": {\n                                \"bounds\": [\n                                    \"0\",\n                                    \"(i_t1/i_t2)\"\n                                ],\n                                \"child\": {\n                                    \"child\": {\n                                        \"bounds\": [\n                                            \"0\",\n                                            \"i_t2\"\n                                        ],\n                 
                       \"child\": {\n                                            \"child\": {\n                                                \"bounds\": [\n                                                    \"0\",\n                                                    \"j_t2\"\n                                                ],\n                                                \"child\": {\n                                                    \"child\": {\n                                                        \"child\": [\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c0 + 2 * c5 + c6, 2 * p1 + 32 * c2)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"in.fifo_B.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 2 * p0 + 32 * c1 + c7, 2 * p1 + 32 * c2)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": [\n                                                                    {\n                                                                        \"child\": {\n                                                                            \"user_expr\": 
\"in.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                                        },\n                                                                        \"type\": \"user\"\n                                                                    }\n                                                                ],\n                                                                \"type\": \"if\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"bounds\": [\n                                                                        \"0\",\n                                                                        \"k_t2\"\n                                                                    ],\n                                                                    \"child\": {\n                                                                        \"child\": {\n                                                                            \"child\": {\n                                                                                \"user_expr\": \"S_0(32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c1 + c7, 2 * p1 + 32 * c2 + c8)\"\n                                                                            },\n                                                                            \"type\": \"user\"\n                                                                        },\n                                                                        \"content\": \"hls_unroll\",\n                                                                        \"type\": \"mark\"\n                                                                    },\n                                           
                         \"iterator\": \"c8\",\n                                                                    \"type\": \"for\"\n                                                                },\n                                                                \"content\": \"simd\",\n                                                                \"type\": \"mark\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_C.1.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c0 + 2 * c5 + c6, 2 * p0 + 32 * c1 + c7)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            },\n                                                            {\n                                                                \"child\": {\n                                                                    \"user_expr\": \"out.fifo_A.2.1(c0, c1, c2, p0, p1, c5, c6, c7, 32 * c0 + 2 * c5 + c6, 2 * p1 + 32 * c2)\"\n                                                                },\n                                                                \"type\": \"user\"\n                                                            }\n                                                        ],\n                                                        \"type\": \"block\"\n                                                    },\n                                                    \"content\": \"hls_pipeline\",\n                                                    \"type\": \"mark\"\n                                                },\n                                                \"iterator\": \"c6\",\n                   
                             \"type\": \"for\"\n                                            },\n                                            \"content\": \"latency\",\n                                            \"type\": \"mark\"\n                                        },\n                                        \"iterator\": \"c7\",\n                                        \"type\": \"for\"\n                                    },\n                                    \"content\": \"latency\",\n                                    \"type\": \"mark\"\n                                },\n                                \"iterator\": \"c5\",\n                                \"type\": \"for\"\n                            },\n                            \"content\": \"pe\",\n                            \"type\": \"mark\"\n                        },\n                        \"content\": \"array\",\n                        \"type\": \"mark\"\n                    },\n                    \"iterator\": \"c2\",\n                    \"type\": \"for\"\n                },\n                \"iterator\": \"c1\",\n                \"type\": \"for\"\n            },\n            \"iterator\": \"c0\",\n            \"type\": \"for\"\n        }\n    },\n    \"memory\": {\n        \"A_IO_L2_in\": {\n            \"array\": \"A\",\n            \"buf_size\": \"(i_t1*k_t2)\",\n            \"data_pack_factor_inter\": \"p9\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(k_t1/k_t2)\"\n        },\n        \"B_IO_L1_in\": {\n            \"array\": \"B\",\n            \"buf_size\": \"(j_t2*k_t2)\",\n            \"data_pack_factor_inter\": \"p10\",\n            \"data_pack_factor_intra\": \"k_t2\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"((k_t1/k_t2)*(j_t1/j_t2))\"\n        },\n      
  \"C_IO_L2_out\": {\n            \"array\": \"C\",\n            \"buf_size\": \"(i_t1*j_t2)\",\n            \"data_pack_factor_inter\": \"p11\",\n            \"data_pack_factor_intra\": \"1\",\n            \"double_buffer\": 1,\n            \"ele_size\": 4,\n            \"ele_type\": \"float\",\n            \"num\": \"(j_t1/j_t2)\"\n        }\n    },\n    \"params\": [\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"i\",\n            \"split_by\": \"i_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"j\",\n            \"split_by\": \"j_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"loop_ub\",\n            \"name\": \"k\",\n            \"split_by\": \"k_t1\",\n            \"tags\": [\n                \"external\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"j\"\n            ],\n            \"name\": \"j_t1\",\n            \"split_by\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"k\"\n            ],\n            \"name\": \"k_t1\",\n            \"split_by\": \"k_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"array_part_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i\"\n            ],\n            \"name\": \"i_t1\",\n            \"split_by\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                
\"j_t1\"\n            ],\n            \"divisors\": [\n                \"j_t1\"\n            ],\n            \"name\": \"j_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"latency_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"i_t1\"\n            ],\n            \"divisors\": [\n                \"i_t1\"\n            ],\n            \"name\": \"i_t2\",\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"SIMD_tiling_factor\",\n            \"bounds\": [\n                \"1\",\n                \"min(k_t1,8)\"\n            ],\n            \"divisors\": [\n                \"k_t1\"\n            ],\n            \"name\": \"k_t2\",\n            \"tags\": [\n                \"power_of_two\"\n            ],\n            \"tunable\": true\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,16),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p9\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"k_t2\",\n                \"max(min(k_t2,4),k_t2)\"\n            ],\n            \"divisors\": [\n                \"k_t2\"\n            ],\n            \"multiples\": [\n                \"k_t2\"\n            ],\n            \"name\": \"p10\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        },\n        {\n            \"attr\": \"data_pack_factor\",\n            \"bounds\": [\n                \"1\",\n                \"max(min(j_t2,16),1)\"\n      
      ],\n            \"divisors\": [\n                \"j_t2\"\n            ],\n            \"name\": \"p11\",\n            \"tags\": [\n                \"power_of_two\",\n                \"auto_infer\"\n            ],\n            \"tunable\": false\n        }\n    ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/explorer.py",
    "content": "import copy\nimport pprint\nimport numpy as np\nimport random\n\nimport utils\nimport tuners\nfrom search_task import SingleTask, MultiTask\n\nclass ArchExplorer(object):\n    \"\"\" Architecture explorer.\n    \"\"\"\n    def __init__(self, cst, search_obj, max_epochs, max_time, search_config, designs, workloads):\n        self.cst = cst\n        self.search_obj = search_obj\n        self.max_epochs = max_epochs\n        self.max_time = max_time\n        self.search_config = search_config\n        self.designs = designs\n        self.workloads = workloads\n\n    def search(self):\n        \"\"\" The gateway function to perform architecture search.\n        The input is a list of design descriptions \"designs\"\n        and a list of searching tasks \"tasks\".\n        \"\"\"\n        best_record = utils.SearchRecord().reset()\n\n        if self.search_config[\"explore_fusion\"]:\n            if self.search_config[\"explore_multi_acc\"]:\n                if self.search_config[\"method\"] == \"customized1\":\n                    best_record = self.search_fusion_multi_acc_customized1()\n                elif self.search_config[\"method\"] == \"customized2\":\n                    best_record = self.search_fusion_multi_acc_customized2()\n                    #best_record = self.search_fusion_multi_acc_customized2(design_idx=4)\n            else:\n                if self.search_config[\"method\"] == \"exhaustive\":\n                    best_record = self.search_fusion_single_acc_exhaustive() # TODO\n                elif self.search_config[\"method\"] == \"customized1\":\n                    #best_record = self.search_fusion_single_acc_customized1(design_idx=4)\n                    best_record = self.search_fusion_single_acc_customized1()\n                elif self.search_config[\"method\"] == \"customized2\":\n                    best_record = self.search_fusion_single_acc_customized2()\n                else:\n                    raise 
NotImplementedError(\"Undefined single-accelerator fusion search method.\")\n        else:\n            if self.search_config[\"explore_programmable\"]:\n                if self.search_config[\"method\"] == \"customized1\":\n                    best_record = self.search_programmable_single_acc_customized1() # TODO\n                else:\n                    raise NotImplementedError(\"Undefined single programmable accelerator search method.\")\n            else:\n                if self.search_config[\"method\"] == \"customized1\":\n                    best_record = self.search_non_fusion_single_acc_customized1(design_idx=self.search_config[\"design_idx\"])\n                    #best_record = self.search_non_fusion_single_acc_customized1(design_idx=4)\n                else:\n                    raise NotImplementedError(\"Undefined single accelerator search method.\")\n\n        return best_record\n\n    def tune(self, search_task, init_tasks=None, silent=0, use_cache=-1, meta=None):\n        \"\"\" Call tuners for the search task.\n        init_tasks contains candidates for the initial population of the genetic search.\n        meta contains additional information used during the tuning.\n        \"\"\"\n        if use_cache == -1:\n            use_cache = self.search_config['use_db']\n        if use_cache:\n            # Check if the search task has been searched\n            if str(search_task) in self.search_config[\"search_records_db\"]:\n                return self.search_config[\"search_records_db\"][str(search_task)]\n                #return self.search_config[\"search_records_db\"][str(search_task)], self.search_config[\"search_records_db\"]\n\n        if isinstance(search_task, SingleTask):\n            if self.search_config['unit_task_method'] == \"genetic\":\n                # Use genetic search\n                search_record = tuners.genetic_search(search_task, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                    n_worker=1, 
silent=silent, profiling=self.search_config[\"profiling\"])\n            elif self.search_config[\"unit_task_method\"] == \"random_pruning\":\n                search_record = tuners.random_search(search_task, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                    n_worker=1, silent=silent, pruning=1, profiling=self.search_config[\"profiling\"])\n            elif self.search_config[\"unit_task_method\"] == \"random\":\n                search_record = tuners.random_search(search_task, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                    n_worker=1, silent=silent, profiling=self.search_config[\"profiling\"])\n            elif self.search_config[\"unit_task_method\"] == \"exhaustive_pruning\":                \n                search_record = tuners.exhaustive_search(search_task, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                    n_worker=1, silent=silent, pruning=1, profiling=self.search_config[\"profiling\"])\n            elif self.search_config[\"unit_task_method\"] == \"annealing\":\n                search_record = tuners.annealing_search(search_task, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                    n_worker=1, silent=silent, profiling=self.search_config[\"profiling\"])\n            elif self.search_config[\"unit_task_method\"] == \"bayesian\":\n                search_record = tuners.bayesian_search(search_task, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                    n_worker=1, silent=silent, profiling=self.search_config[\"profiling\"])\n            elif self.search_config[\"unit_task_method\"] == \"RL\":\n                search_record = tuners.RL_search(search_task, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                    n_worker=1, silent=silent, profiling=self.search_config[\"profiling\"])\n            elif self.search_config[\"unit_task_method\"] == \"open_tuner\":\n                
search_record = tuners.opentuner_search(search_task, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                    n_worker=1, silent=silent, profiling=self.search_config[\"profiling\"], args=self.search_config[\"args\"])\n            else:\n                raise NotImplementedError(\"Undefined unit task method.\")\n        elif isinstance(search_task, MultiTask):\n            if search_task.fuse == 0:\n                if search_task.split == 0:\n                    search_record = tuners.non_fuse_genetic_search(search_task, init_tasks, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                        n_worker=self.search_config['n_worker'], silent=silent, population_size=self.search_config['genetic_params']['population_size'][1], meta=meta)\n                else:\n                    if self.search_config[\"method\"] == \"customized1\":\n                        search_record = tuners.multi_acc_search1(search_task, init_tasks, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                            n_worker=self.search_config['n_worker'], silent=silent, population_size=self.search_config['genetic_params']['population_size'][1], \\\n                            meta=meta, explorer=self, profiling=self.search_config[\"profiling\"])\n                    elif self.search_config[\"method\"] == \"customized2\":\n                        search_record = tuners.multi_acc_search2(search_task, init_tasks, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                            n_worker=self.search_config['n_worker'], silent=silent, population_size=self.search_config['genetic_params']['population_size'][1], \\\n                            meta=meta, explorer=self, profiling=self.search_config[\"profiling\"])\n                    else:\n                        raise NotImplementedError(\"Undefined multi-accelerator search method.\")\n            elif search_task.fuse == 1:\n                search_record = tuners.fuse_genetic_search(search_task, init_tasks, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n        
            n_worker=self.search_config['n_worker'], silent=silent, population_size=self.search_config['genetic_params']['population_size'][1], meta=meta, explorer=self)\n            elif search_task.fuse == 2:\n                search_record = tuners.all_fuse_genetic_search(search_task, init_tasks, self.cst, self.search_obj, self.max_epochs, self.max_time, \\\n                    n_worker=self.search_config['n_worker'], silent=silent, population_size=self.search_config['genetic_params']['population_size'][1], explorer=self)\n            else:\n                raise RuntimeError('Unknown search task type.')\n        else:\n            raise RuntimeError('Unknown search task type.')\n\n        '''\n        # Save the search results\n        if str(search_task) in self.search_config[\"search_records_db\"]:\n            self.search_config[\"search_records_db\"][str(search_task)].update(search_record)\n        else:\n            self.search_config[\"search_records_db\"][str(search_task)] = search_record\n        '''\n\n        return search_record\n        #return search_record, self.search_config[\"search_records_db\"]\n\n    def search_non_fusion_single_acc_exhaustive(self):\n        raise NotImplementedError(\"Unimplemented single accelerator search method.\")\n\n    def search_non_fusion_single_acc_customized1(self, design_idx=-1, search_task_configs=None, early_stop=-1, silent=0, workload_idx=None, prev_array=None, one_gen=False):\n        \"\"\" This function searches for the best single accelerator for the search tasks.\n        We assume the tasks are executed in sequence on the accelerator.\n        The function first searches for the best array configuration for each task.\n        The results serve as the initial candidate pool to kick off the\n        evolutionary search, which searches for the best array configuration\n        that maximizes the overall performance.\n        The search task configurations are modified when search_task_configs is provided.\n\n        
If early_stop is set (not equal to -1), the search will be terminated\n        if the ideal latency is longer than the early_stop threshold.\n        If URAM is used, we first run the non-fuse search once and identify the\n        bottleneck of each layer. In increasing order of CTC ratio,\n        we check three arrays: cin, cout, and w.\n        If any of them is the bottleneck, we try to store it on-chip.\n        This process continues until no more URAM is available on-chip.\n\n        \"prev_array\" is used for the TGPA-style multi-array setting.\n        When prev_array is set, the latency of each workload is adjusted to account\n        for the setup latency when searching for the solution of the current array.\n        \"\"\"\n        design_list = self.designs\n        if design_idx != -1:\n            # Only search a certain design\n            design_list = [self.designs[design_idx]]\n\n        if workload_idx:\n            workloads = [self.workloads[i] for i in workload_idx]\n        else:\n            workloads = self.workloads\n\n        # Test1: Fix r-axis to one\n        #search_task_configs = {}\n        #for i in range(len(self.workloads)):\n        #    search_task_configs[i] = {'fix_param': [['r', 1]]}\n\n        # Test2: Equate c_t1 = r_t1\n        #search_task_configs = {}\n        #for i in range(len(self.workloads)):\n        #    search_task_configs[i] = {'equate_params': [['r_t1', 'c_t1']]}\n\n        def est_URAM(width, depth):\n            \"\"\" Estimate the number of URAM blocks for a buffer that is `width` bits wide\n            and `depth` entries deep, assuming each URAM block is 72 bits x 4096 entries.\n            \"\"\"\n            mem = np.ceil(width / 72) * np.ceil(depth / 4096)\n            return mem\n\n        def modify_task_configs_uram(layer_infos, workloads, configs):\n            if not configs:\n                configs = {}\n                for layer_idx in range(len(layer_infos)):\n                    configs[layer_idx] = {\"cin_read_mode\": 0, \"w_read_mode\": 0, \"cout_write_mode\": 0}\n            c_mem = []\n            for 
layer_idx in range(len(layer_infos)):\n                c_mem.append([0, 0]) # input, output\n            w_mem = 0\n            def take_item(elem):\n                return elem[\"item\"]\n            def take_value(elem):\n                return elem[\"value\"]\n            def cal_c_mem(c_mem):\n                total_c_mem = [m[0] + m[1] for m in c_mem]\n                return max(total_c_mem)\n            for layer_info in layer_infos:\n                workload = workloads[layer_info[\"idx\"]]\n                if cal_c_mem(c_mem) + w_mem >= self.cst.hw_cst[\"URAM\"]:\n                    break\n                PE_latency = layer_info[\"reward_meta\"][\"latency_main\"][\"PE_latency\"]\n                cin_latency = [{\"item\": x, \"value\": layer_info[\"reward_meta\"][\"latency_main\"][x]} for x in layer_info[\"reward_meta\"][\"latency_main\"] if x.startswith(\"cin\")]\n                cin_latency.sort(key=take_value)\n                cout_latency = [{\"item\": x, \"value\": layer_info[\"reward_meta\"][\"latency_main\"][x]} for x in layer_info[\"reward_meta\"][\"latency_main\"] if x.startswith(\"cout\")]\n                cout_latency.sort(key=take_value)\n                w_latency = [{\"item\": x, \"value\": layer_info[\"reward_meta\"][\"latency_main\"][x]} for x in layer_info[\"reward_meta\"][\"latency_main\"] if x.startswith(\"w\")]\n                w_latency.sort(key=take_value)\n                bottlenecks = []\n                if cin_latency[-1]['value'] != cin_latency[-2]['value']:\n                    bottlenecks.append({\"item\": \"cin\", \"value\": cin_latency[-1]['value']})\n                if cout_latency[-1]['value'] != cout_latency[-2]['value']:\n                    bottlenecks.append({\"item\": \"cout\", \"value\": cout_latency[-1]['value']})\n                if w_latency[-1]['value'] != w_latency[-2]['value']:\n                    bottlenecks.append({\"item\": \"w\", \"value\": w_latency[-1]['value']})\n                
bottlenecks.sort(key=take_value, reverse=True)\n                for b in bottlenecks:\n                    if b[\"value\"] <= PE_latency:\n                        break\n                    if b[\"item\"] == \"w\":\n                        # Compute the uram for w\n                        datapack = 8\n                        dw = 4 # Four bytes by default\n                        width = dw * 8 * datapack\n                        depth = workload[\"params\"][\"o\"] * workload[\"params\"][\"i\"] * \\\n                                workload[\"params\"][\"p\"] * workload[\"params\"][\"q\"] / datapack\n                        uram = est_URAM(width, depth)\n                        if cal_c_mem(c_mem) + w_mem + uram < self.cst.hw_cst[\"URAM\"]:\n                            configs[layer_info[\"idx\"]][\"w_read_mode\"] = 1\n                            w_mem += uram\n                    if b[\"item\"] == \"cin\" and layer_info[\"idx\"] > 0:\n                        # Compute the uram for cin\n                        datapack = 8\n                        dw = 4 # Four bytes by default\n                        width = dw * 8 * datapack\n                        depth = workload[\"params\"][\"i\"] * (workload[\"params\"][\"r\"] + workload[\"params\"][\"p\"] - 1) * \\\n                                (workload[\"params\"][\"c\"] + workload[\"params\"][\"q\"] - 1) / datapack\n                        uram = est_URAM(width, depth)\n                        old_c_mem = copy.deepcopy(c_mem)\n                        c_mem[layer_info[\"idx\"]][0] = max(c_mem[layer_info[\"idx\"]][0], uram)\n                        c_mem[layer_info[\"idx\"] - 1][1] = max(c_mem[layer_info[\"idx\"] - 1][1], uram)\n                        if cal_c_mem(c_mem) + w_mem < self.cst.hw_cst[\"URAM\"]:\n                            configs[layer_info[\"idx\"]][\"cin_read_mode\"] = 3\n                            configs[layer_info[\"idx\"] - 1][\"cout_write_mode\"] = 1\n                        else:\n              
              c_mem = old_c_mem\n                    if b[\"item\"] == \"cout\" and layer_info[\"idx\"] < len(workloads) - 1:\n                        # Compute the uram for cout\n                        datapack = 8\n                        dw = 4\n                        width = dw * 8 * datapack\n                        depth = workload[\"params\"][\"o\"] * workload[\"params\"][\"r\"] * workload[\"params\"][\"c\"] / datapack\n                        uram = est_URAM(width, depth)\n                        old_c_mem = copy.deepcopy(c_mem)\n                        c_mem[layer_info[\"idx\"]][1] = max(c_mem[layer_info[\"idx\"]][1], uram)\n                        c_mem[layer_info[\"idx\"] + 1][0] = max(c_mem[layer_info[\"idx\"] + 1][0], uram)\n                        if cal_c_mem(c_mem) + w_mem < self.cst.hw_cst[\"URAM\"]:\n                            configs[layer_info[\"idx\"]][\"cout_write_mode\"] = 1\n                            configs[layer_info[\"idx\"] + 1][\"cin_read_mode\"] = 3\n                        else:\n                            c_mem = old_c_mem\n            return configs, cal_c_mem(c_mem) + w_mem\n\n        def modify_task_configs_prev_array(prev_array, configs):\n            prev_workload = prev_array['workloads']\n            prev_record = prev_array['record']\n            if not configs:\n                configs = {}\n                for layer_idx in range(len(workloads)):\n                    configs[layer_idx] = {\"prev_sol\": None, \"prev_workload\": None, \"prev_latency\": None}\n            for layer_idx in range(len(workloads)):\n                if layer_idx < len(prev_workload):\n                    configs[layer_idx]['prev_workload'] = self.workloads[prev_workload[layer_idx]]\n                    configs[layer_idx]['prev_sol'] = prev_record.task_sols[layer_idx]['sol']\n                    configs[layer_idx]['prev_latency'] = prev_record.task_sols[layer_idx]['latency']\n            return configs\n\n        def one_pass(workloads, 
design_list, silent, early_stop, search_task_configs):\n            # Search the best config for each task\n            repeat = True\n            repeat_iter = 0\n            job_list = []\n            while repeat:\n                search_tasks = []\n                # Single workload task\n                for workload in workloads:\n                    search_task = SingleTask(design_list[i], workload, self.cst)\n                    search_tasks.append(search_task)\n                # Modify the first search task, used for multi-acc search\n                if search_task_configs:\n                    for task_idx in range(len(search_tasks)):\n                        search_tasks[task_idx].configs = search_task_configs[task_idx]\n                # Silent the tuner if the #worker is greater than 1\n                local_silent = silent\n                if silent == 0:\n                    local_silent = 1 if self.search_config[\"n_worker\"] > 1 else 0\n                one_batch_n_job = 0\n                for t in search_tasks:\n                    for job in job_list:\n                        if job['job_hash'] == f'{str(t)}_{repeat_iter}':\n                            # Avoid duplicate task\n                            continue\n                    job_list.append(\n                        {'job_hash': f'{str(t)}_{repeat_iter}', 'func': self.tune, \\\n                         'args': [t, None, local_silent, 0]})\n                    one_batch_n_job += 1\n                # Fill in enough tasks for the initial population\n                #if len(job_list) + one_batch_n_job > self.search_config['genetic_params']['population_size'][1]:\n                #    repeat = False\n                repeat_iter += 1\n                if repeat_iter > 1:\n                    repeat = False\n\n            pool = utils.MyExecutor(self.search_config['n_worker'])\n            results = pool.exec(job_list)\n            init_tasks = []\n            for r in results:\n                if 
results[r].valid:\n                    init_tasks.append(results[r])\n\n            # Search the single array architecture\n            if early_stop != -1:\n                # Test if the ideal latency is longer than the early stop threshold.\n                ideal_latency = utils.compute_tasks_latency(search_tasks, init_tasks)\n                if ideal_latency > early_stop:\n                    return best_record\n\n            # Build the multi-workload search task\n            search_tasks = []\n            for workload in workloads:\n                search_task = SingleTask(design_list[i], workload, self.cst)\n                search_tasks.append(search_task)\n            if search_task_configs:\n                for task_idx in range(len(search_tasks)):\n                    search_tasks[task_idx].configs = search_task_configs[task_idx]\n            search_task = MultiTask(design_list[i], search_tasks, self.cst, fuse=0)\n            meta = {\"one_gen\": one_gen, \"xgb_params\": self.search_config[\"xgb_params\"]}\n            search_record = self.tune(search_task, init_tasks, silent=silent, meta=meta)\n\n            return search_record\n\n        best_record = utils.SearchRecord().reset()\n        if prev_array:\n            search_task_configs = modify_task_configs_prev_array(prev_array, search_task_configs)\n        for i in range(len(design_list)):\n            if len(self.workloads) == 1:\n                # Single task workload\n                search_task = SingleTask(design_list[i], workloads[0], self.cst)\n                if search_task_configs:\n                    search_task.configs = search_task_configs[0]\n                search_record = self.tune(search_task)\n                if search_record.valid:\n                    search_record.arch_sol = search_record.task_sols[0]['sol']\n                    if prev_array:\n                        total_latency = 0\n                        for task_sol in search_record.task_sols:\n                            
task_sol['latency'] = task_sol['reward_meta']['latency']['latency_orig']\n                            total_latency += task_sol['latency']\n                        search_record.latency = total_latency\n                best_record.update(search_record, save=1)\n            else:\n                search_record = one_pass(workloads, design_list, silent, early_stop, search_task_configs)\n                if prev_array:\n                    total_latency = 0\n                    for task_sol in search_record.task_sols:\n                        task_sol['latency'] = task_sol['reward_meta']['latency']['latency_orig']\n                        total_latency += task_sol['latency']\n                    search_record.latency = total_latency\n                    if search_record.metric == \"latency\":\n                        search_record.reward = 1 / total_latency\n                best_record.update(search_record, save=1)\n                if self.search_config['use_uram'] == 1 and \"conv\" in workloads[0][\"tags\"]:\n                    import logging\n                    logger = logging.getLogger('AutoSA-Tuner')\n                    logger.info(\"Search again with URAM...\")\n                    # For CNN we test if any buffers can be fit on-chip\n                    layer_info = []\n                    for task_idx in range(len(search_record.task_sols)):\n                        task_sol = search_record.task_sols[task_idx]\n                        layer_info.append({\n                            \"idx\": task_idx,\n                            \"CTC\": task_sol[\"CTC\"],\n                            \"reward_meta\": task_sol[\"reward_meta\"][\"latency\"]\n                        })\n                    # Sort them by CTC ratio\n                    def getCTC(elem):\n                        return elem[\"CTC\"]\n                    layer_info.sort(key=getCTC)\n                    #pprint.pprint(layer_info)\n                    #exit(0)\n                    
                    search_task_configs, uram = modify_task_configs_uram(layer_info, workloads, search_task_configs)\n                    # Run the search again with updated search configs\n                    search_record = one_pass(workloads, design_list, silent, early_stop, search_task_configs)\n                    search_record.cst[\"URAM\"] = uram\n                    if prev_array:\n                        total_latency = 0\n                        for task_sol in search_record.task_sols:\n                            task_sol['latency'] = task_sol['reward_meta']['latency']['latency_orig']\n                            total_latency += task_sol['latency']\n                        search_record.latency = total_latency\n                        if search_record.metric == \"latency\":\n                            search_record.reward = 1 / total_latency\n                    best_record.update(search_record, save=1)\n\n        return best_record\n\n    def search_fusion_single_acc_customized1(self, design_idx=-1, search_task_configs=None):\n        \"\"\" This function searches for the best single accelerator configuration considering\n        task fusion.\n        Note: We assume a linear dependence in the network.\n        There are two steps.\n        Step 1: Build a candidate pool of all the sub-graphs of interest. Search\n        for the best array configurations of these tasks.\n        Step 2: Use the candidate tasks in the previous step to kick off the\n        evolutionary search. 
For each array config, use the DP to find the best fusion scheme.\n        \"\"\"\n        # Note: Consider FP32 only at 200MHz with 3 DDR ports\n        params = {\n            \"thres_CTC\": self.cst.hw_cst[\"DSP\"] / 5 * 2 * 0.2 / (12.8 * 3)\n        }\n\n        best_record = utils.SearchRecord().reset()\n\n        design_list = self.designs\n        if design_idx != -1:\n            # Only search a certain design\n            design_idx_list = [design_idx]\n        else:\n            design_idx_list = list(range(len(self.designs)))\n\n        for i in design_idx_list:\n            fusion_candidates = []\n            # Enqueue the single-workload tasks\n            repeat = True\n            repeat_iter = 0\n            job_list = []\n            while repeat:\n                search_tasks = []\n                for workload in self.workloads:\n                    search_task = SingleTask(design_list[i], workload, self.cst)\n                    search_tasks.append(search_task)\n                # Modify the first search task, used for multi-acc search\n                if search_task_configs:\n                    search_tasks[0].configs = search_task_configs\n                # Silent the tuner if the #worker is greater than 1\n                silent = 1 if self.search_config[\"n_worker\"] > 1 else 0\n                one_batch_n_job = 0\n                for t in search_tasks:\n                    for job in job_list:\n                        if job['job_hash'] == f'{str(t)}_{repeat_iter}':\n                            # Avoid duplicate task\n                            continue\n                    job_list.append(\n                        {'job_hash': f'{str(t)}_{repeat_iter}', 'func': self.tune, \\\n                         'args': [t, None, silent, 0]})\n                    one_batch_n_job += 1\n                # Fill in enough tasks for the initial population\n                if len(job_list) + one_batch_n_job > 
self.search_config['genetic_params']['population_size'][1]:\n                    repeat = False\n                repeat_iter += 1\n            pool = utils.MyExecutor(self.search_config['n_worker'])\n            results = pool.exec(job_list)\n            init_tasks = []\n            for r in results:\n                if results[r].valid:\n                    init_tasks.append(results[r])\n\n            # Sort the tasks based on the CTC ratio\n            network_best_records = {}\n            for record in init_tasks:\n                if record.task_sols[0]['hash'] in network_best_records:\n                    network_best_records[record.task_sols[0]['hash']].update(record)\n                else:\n                    network_best_records[record.task_sols[0]['hash']] = record\n\n            network_best_records_sorted = []\n            comm_bound_ops = []\n            for k, v in network_best_records.items():\n                network_best_records_sorted.append(v)\n            CTC_thres = params[\"thres_CTC\"]\n            def takeCTC(elem):\n                return elem.ctc\n            network_best_records_sorted.sort(key=takeCTC)\n            for record in network_best_records_sorted:\n                if record.dsp_eff < 0.5:\n                    CTC_thres = max(CTC_thres, record.ctc)\n                else:\n                    break\n            for record in network_best_records_sorted:\n                if record.ctc <= CTC_thres:\n                    comm_bound_ops.append(record)\n\n            # Enqueue the multi-workload tasks\n            comm_bound_layers = []\n            for layer_idx in range(len(self.workloads)):\n                layer = self.workloads[layer_idx]\n                for op in comm_bound_ops:\n                    if layer[\"name\"] in op.task_names:\n                        comm_bound_layers.append({\"ctc\": op.ctc, \"layers\": [layer_idx]})\n\n            searched_layers = []\n            def hash_layers(layer_ids):\n                ret = 
\"\"\n                for id in layer_ids:\n                    params = self.workloads[id][\"params\"]\n                    for k,v in params.items():\n                        ret += f\"{k}{v}\"\n                    for tag in self.workloads[id][\"tags\"]:\n                        ret += tag\n                return ret\n\n            def find_all_pairs(layer_ids):\n                # Find all pairs in the network with the same workload config as the \"layer_ids\"\n                layer_hash = hash_layers(layer_ids)\n                ret = []\n                for idx in range(len(self.workloads) - (len(layer_ids) - 1)):\n                    cmp_layer_ids = list(range(idx, idx + len(layer_ids)))\n                    if hash_layers(cmp_layer_ids) == layer_hash:\n                        task_names = [self.workloads[i][\"name\"] for i in cmp_layer_ids]\n                        ret.append({\"idx\": cmp_layer_ids, \"names\": task_names})\n                return ret\n\n            while len(comm_bound_layers) > 0:\n                # Sort the list based on the increasing order of CTC\n                def takeCTC(elem):\n                    return elem[\"ctc\"]\n                comm_bound_layers.sort(key=takeCTC)\n\n                # Start with the task with the lowest CTC\n                op_to_fuse = comm_bound_layers[0]\n\n                # Fuse it with neighbor layers\n                if op_to_fuse['layers'][0] > 0:\n                    prev_layers = self.workloads[op_to_fuse['layers'][0] - 1: op_to_fuse['layers'][0] + 1]\n                    prev_layers_idx = list(range(op_to_fuse['layers'][0] - 1, op_to_fuse['layers'][0] + 1))\n                    unfused_latency = 0\n                    for layer in prev_layers:\n                        for record in network_best_records_sorted:\n                            if layer[\"name\"] in record.task_names:\n                                unfused_latency += record.latency\n                                break\n                 
   #layer_hash = ''\n                    #for idx in prev_layers_idx:\n                    #    layer_hash += str(idx)\n                    layer_hash = hash_layers(prev_layers_idx)\n                    if layer_hash not in searched_layers:\n                        searched_layers.append(layer_hash)\n                        search_record = self.search_fusion_single_acc_customized2(prev_layers_idx, design_idx=i, search_task_configs=search_task_configs)\n                        if search_record.valid:\n                            if search_record.latency < unfused_latency:\n                                pairs = find_all_pairs(prev_layers_idx)\n                                for pair in pairs:\n                                    fusion_candidates.append(pair[\"names\"])\n                                init_tasks.insert(0, search_record)\n                                if search_record.ctc < CTC_thres:\n                                    for pair in pairs:\n                                        comm_bound_layers.append({\"ctc\": search_record.ctc, \"layers\": pair[\"idx\"]})\n                if op_to_fuse['layers'][-1] < len(self.workloads) - 1:\n                    nxt_layers = self.workloads[op_to_fuse['layers'][-1]: op_to_fuse['layers'][-1] + 2]\n                    nxt_layers_idx = list(range(op_to_fuse['layers'][-1], op_to_fuse['layers'][-1] + 2))\n                    unfused_latency = 0\n                    for layer in nxt_layers:\n                        for record in network_best_records_sorted:\n                            if layer[\"name\"] in record.task_names:\n                                unfused_latency += record.latency\n                                break\n                    layer_hash = hash_layers(nxt_layers_idx)\n                    if layer_hash not in searched_layers:\n                        searched_layers.append(layer_hash)\n                        search_record = self.search_fusion_single_acc_customized2(nxt_layers_idx, 
design_idx=i, search_task_configs=search_task_configs)\n                        if search_record.valid:\n                            if search_record.latency < unfused_latency:\n                                pairs = find_all_pairs(nxt_layers_idx)\n                                for pair in pairs:\n                                    fusion_candidates.append(pair[\"names\"])\n                                init_tasks.insert(0, search_record)\n                                if search_record.ctc < CTC_thres:\n                                    for pair in pairs:\n                                        comm_bound_layers.append({\"ctc\": search_record.ctc, \"layers\": pair[\"idx\"]})\n                # Pop out the op\n                comm_bound_layers = comm_bound_layers[1:]\n\n            # Kick off the local search\n            search_tasks = []\n            for workload in self.workloads:\n                search_task = SingleTask(design_list[i], workload, self.cst)\n                search_tasks.append(search_task)\n            # Modify the first search task, used for multi-acc search\n            if search_task_configs:\n                search_tasks[0].configs = search_task_configs\n            search_task = MultiTask(design_list[i], search_tasks, self.cst, fuse=1)\n            import logging\n            logger = logging.getLogger('AutoSA-Tuner')\n            logger.info(f\"fusion candidates: {fusion_candidates}\")\n\n            for idx in range(len(fusion_candidates)):\n                fusion_candidates[idx] = ''.join(fusion_candidates[idx])\n            meta = {'fusion_candidates': fusion_candidates}\n            search_record = self.tune(search_task, init_tasks, meta=meta)\n\n            best_record.update(search_record, save=1)\n\n        return best_record\n\n    def search_fusion_single_acc_customized2(self, workload_idx=None, design_idx=-1, search_task_configs=None, silent=0):\n        \"\"\" This function searches the best single accelerator 
configuration considering\n        task fusion. All the layers are fused.\n        Note: We assume a linear dependence in the network.\n        There are two steps.\n        Step 1: Build a candidate pool of all the sub-graphs of interest. Search\n        for the best array configurations of these tasks.\n        Step 2: Use the candidate tasks in the previous step to kick off the\n        evolutionary search.\n        \"\"\"\n        best_record = utils.SearchRecord().reset()\n\n        design_list = self.designs\n        if design_idx != -1:\n            # Only search a certain design\n            design_idx_list = [design_idx]\n        else:\n            design_idx_list = list(range(len(self.designs)))\n        workloads = [self.workloads[i] for i in workload_idx]\n\n        for i in design_idx_list:\n            # Enqueue the single-workload tasks\n            repeat = True\n            repeat_iter = 0\n            job_list = []\n            while repeat:\n                search_tasks = []\n                for workload in workloads:\n                    search_task = SingleTask(design_list[i], workload, self.cst)\n                    search_tasks.append(search_task)\n                # Modify the first search task, used for multi-acc search\n                if search_task_configs:\n                    search_tasks[0].configs = search_task_configs\n                # Modify the last layer\n                last_task = copy.deepcopy(search_tasks[-1])\n                last_task.fuse = 1\n                last_task.last_fuse = 1\n                last_task.use_uram = self.search_config[\"use_uram\"]\n                if last_task.use_uram:\n                    last_task.configs['cin_read_mode'] = 3\n                else:\n                    last_task.configs['cin_read_mode'] = 2\n                last_task.configs['cout_write_mode'] = 0\n                last_task.set_aux_func('update_cin_latency', 'update_cin_latency_last')\n                if last_task.use_uram == 0:\n        
            last_task.set_aux_func('update_cin_buf', 'update_cin_buf_bram_last')\n                else:\n                    last_task.set_aux_func('update_cin_buf', 'update_cin_buf_uram_last')\n                search_tasks.append(last_task)\n\n                # Silent the tuner if the #worker is greater than 1\n                local_silent = silent\n                if silent == 0:\n                    local_silent = 1 if self.search_config[\"n_worker\"] > 1 else 0\n                one_batch_n_job = 0\n                for t in search_tasks:\n                    for job in job_list:\n                        if job['job_hash'] == f'{str(t)}_{repeat_iter}':\n                            # Avoid duplicate task\n                            continue\n                    job_list.append(\n                        {'job_hash': f'{str(t)}_{repeat_iter}', 'func': self.tune, \\\n                         'args': [t, None, local_silent, 0]})\n                    one_batch_n_job += 1\n                # Fill in enough tasks for the initial population\n                if len(job_list) + one_batch_n_job > self.search_config['genetic_params']['population_size'][1]:\n                    repeat = False\n                repeat_iter += 1\n\n            pool = utils.MyExecutor(self.search_config['n_worker'])\n            results = pool.exec(job_list)\n            init_tasks = []\n            for r in results:\n                if results[r].valid:\n                    init_tasks.append(results[r])\n\n            # Local search\n            search_tasks = []\n            for workload in workloads:\n                search_task = SingleTask(design_list[i], workload, self.cst)\n                search_tasks.append(search_task)\n            # Modify the first search task, used for multi-acc search\n            if search_task_configs:\n                search_tasks[0].configs = search_task_configs\n            search_task = MultiTask(design_list[i], search_tasks, self.cst, fuse=2, 
use_uram=self.search_config[\"use_uram\"])\n            search_record = self.tune(search_task, init_tasks, silent=silent)\n\n            best_record.update(search_record)\n\n        return best_record\n\n    def search_fusion_multi_acc_customized1(self, design_idx=-1, search_task_configs=None, silent=0):\n        \"\"\" This function searches for the best multi-array configuration.\n        It first runs the single-array search.\n        Then it explores different partition schemes by setting different DSP utilization thresholds.\n        For a given threshold, all layers whose DSP utilization exceeds the threshold are mapped\n        to a homogeneous systolic array. The remaining layers are mapped to separate\n        single systolic arrays.\n        \"\"\"\n        best_record = utils.SearchRecord().reset()\n\n        params = {\n            \"non_fuse_repeat\": 1, # Repeat the single-array search multiple times to stabilize the results\n            \"n_designs\": 4, # Only select the top-k designs for consideration\n            \"util_interval\": 0.1, # DSP utilization interval for generating partition candidates\n            \"n_partition_candidates\": 3, # Only consider the top-k partitioning candidates\n            \"n_array_max\": self.search_config[\"max_n_array\"] # Maximum number of arrays supported\n        }\n\n        import logging\n        logger = logging.getLogger('AutoSA-Tuner')\n\n        design_list = self.designs\n        if design_idx != -1:\n            # Only search a certain design\n            design_idx_list = [design_idx]\n        else:\n            design_idx_list = list(range(len(self.designs)))\n\n        '''\n        # Single array search\n        design_history = []\n        single_array_record = utils.SearchRecord().reset()\n        for i in design_idx_list:\n            local_record = utils.SearchRecord().reset()\n            for repeat in range(params[\"non_fuse_repeat\"]):\n                
#local_record.update(self.search_non_fusion_single_acc_customized1(design_idx=i, silent=silent, one_gen=True))\n                local_record.update(self.search_non_fusion_single_acc_customized1(design_idx=i, silent=silent))\n            design_history.append({\"idx\": i, \"record\": local_record})\n            single_array_record.update(local_record)\n        single_array_record.throughput = 1 / single_array_record.latency\n        '''\n        \n        import pickle\n        #pickle.dump(design_history, open(f'tmp/design_history_{self.search_config[\"workload\"]}', 'wb'))\n        #pickle.dump(single_array_record, open(f'tmp/single_array_record_{self.search_config[\"workload\"]}', 'wb'))\n        design_history = pickle.load(open(f'tmp/design_history_{self.search_config[\"workload\"]}', 'rb'))\n        single_array_record = pickle.load(open(f'tmp/single_array_record_{self.search_config[\"workload\"]}', 'rb'))        \n\n        '''\n        # For scalability, only the top-4 designs are selected\n        # as candidate dataflows for further exploration.\n        def take_record_latency(elem):\n            return elem[\"record\"].latency\n        design_history.sort(key=take_record_latency)\n        design_history = design_history[:min(params[\"n_designs\"], len(design_history))]\n        design_idx_list = [h[\"idx\"] for h in design_history]                \n        logger.info(f\"Selected design idx: {design_idx_list}\")\n        design_list = [self.designs[i] for i in design_idx_list]\n        '''\n\n        # Partition initialization        \n        # Setting 1: Map each of the first x layers to its own array, and place the remaining layers on a single array        \n        # Setting 2: Group similar layers together        \n        def hash_partition(partition):\n            ret = \"\"\n            for p in partition:\n                ret += \"|\"\n                ret += ''.join(str(p))\n                ret += \"|\"\n            return ret\n\n      
  partition_candidates = []    \n\n        # Setting 1\n        '''\n        layer_sols = single_array_record.task_sols\n        dsp_eff_list = [sol[\"DSP_eff\"] for sol in layer_sols]\n        max_dsp_eff = max(dsp_eff_list)\n        op_list = [sol[\"ops\"] for sol in layer_sols]\n        total_ops = np.sum(op_list)\n        for split_pos in range(1, len(layer_sols)):\n            latency_list = []\n            # SL array\n            for sl_idx in range(split_pos):\n                dsp_eff = max_dsp_eff\n                t = op_list[sl_idx] / total_ops * dsp_eff\n                lat = op_list[sl_idx] / t\n                latency_list.append(lat)\n            # ML array\n            dsp_eff = np.mean(dsp_eff_list[split_pos:])\n            t = np.sum(op_list[split_pos:]) / total_ops * dsp_eff\n            lat = np.sum(op_list[split_pos:]) / t\n            latency_list.append(lat)\n            T = 1 / max(latency_list)\n            partition = []\n            for sl_idx in range(split_pos):\n                partition.append([sl_idx])\n            partition.append(list(range(split_pos, len(layer_sols))))\n            if len(partition) > params[\"n_array_max\"]:\n                continue\n            partition_candidates.append({\n                \"idx\": len(partition_candidates),\n                \"partition\": partition,\n                \"hash\": hash_partition(partition),\n                \"throughput\": T,\n                \"n_arrays\": len(partition)\n            })\n        # Sort the partition candidates by throughput\n        def take_throughput(elem):\n            return elem[\"throughput\"]\n        partition_candidates.sort(key=take_throughput, reverse=True)\n        logger.info(f\"Partition candidates:\\n{pprint.pformat(partition_candidates, indent=2)}\")\n        init_partition_candidates = [i for i in range(min(params[\"n_partition_candidates\"], len(partition_candidates)))]\n        '''\n        \n        # Setting 2\n        import statistics\n        
layer_sols = single_array_record.task_sols\n        dsp_eff_list = [sol[\"DSP_eff\"] for sol in layer_sols]\n        op_list = [sol[\"ops\"] for sol in layer_sols]\n        for i in range(len(dsp_eff_list)):\n            print(i, dsp_eff_list[i])   \n        import csv\n        with open(\"dsp_eff.csv\", \"w\") as f:\n            columns = [\"layer\", \"dsp_eff\"]\n            writer = csv.DictWriter(f, fieldnames=columns)\n            writer.writeheader()\n            for i in range(len(dsp_eff_list)):\n                data = {\n                    \"layer\": i + 1,\n                    \"dsp_eff\": dsp_eff_list[i]\n                }\n                writer.writerow(data)\n\n        split_pos_list = []\n        # Always split the first layer, therefore start from the second layer\n        window = [layer_sols[1][\"DSP_eff\"], layer_sols[2][\"DSP_eff\"]]\n        stdev_cur = statistics.stdev(window)\n        for i in range(3, len(self.workloads)):\n            if len(window) > 2 and (\n                dsp_eff_list[i] > max(window) * 1.1 or dsp_eff_list[i] * 1.15 < min(window)):\n                #print(i, max(window))\n                split_pos_list.append(i) # Split before i-th layer\n                window = [layer_sols[i][\"DSP_eff\"]]\n            else:\n                window.append(layer_sols[i][\"DSP_eff\"])        \n        split_pos_list.insert(0, 1) # Always split the first layer\n        split_pos_list.append(len(self.workloads))        \n        print(split_pos_list)\n        #exit(0)\n        max_min_list = [max(dsp_eff_list), min(dsp_eff_list)] \n        #print(max_min_list)\n        stdev_max = statistics.stdev(max_min_list)\n        #print(stdev_max)\n        \n        # Compute the mean and stdev        \n        def profile_partition(split_pos_list, dsp_eff_list):\n            stdev_list = []\n            mean_list = []\n            mean_ratio_list = []\n            for i in range(1, len(split_pos_list)):\n                window = [dsp_eff_list[d] 
for d in range(split_pos_list[i - 1], split_pos_list[i])]                \n                mean_list.append(np.mean(window))\n                if len(window) > 1:\n                    stdev_list.append(statistics.stdev(window))\n                else:\n                    stdev_list.append(0)\n            for i in range(1, len(mean_list)):\n                ratio = abs((mean_list[i] - mean_list[i - 1]) / mean_list[i - 1])\n                mean_ratio_list.append(ratio)\n            return mean_list, stdev_list, mean_ratio_list\n        def estimate_partition_throughput(partition, dsp_eff_list, op_list):\n            latency = []\n            max_dsp_eff = max(dsp_eff_list)\n            for p in partition:\n                ops = 0                \n                for i in p:\n                    ops += op_list[i]\n                if len(p) == 1:\n                    dsp_eff = max_dsp_eff\n                else:\n                    #dsp_eff = np.mean([dsp_eff_list[i] for i in p])\n                    stdev_cur = statistics.stdev([dsp_eff_list[i] for i in p])\n                    dsp_eff = (min(dsp_eff_list) - max(dsp_eff_list)) / 2 * (stdev_cur / stdev_max) + max(dsp_eff_list)\n                    #dsp_eff = (min(dsp_eff_list) - max(dsp_eff_list)) * (stdev_cur / stdev_max) + max(dsp_eff_list)\n                throughput_cur = ops / np.sum(op_list) * dsp_eff\n                latency_cur = ops / throughput_cur\n                latency.append(latency_cur)\n            return 1 / max(latency)\n        \n        split_pos_list_old = copy.deepcopy(split_pos_list)        \n        # Merge \n        mean_list, stdev_list, mean_ratio_list = profile_partition(split_pos_list, dsp_eff_list)\n        cur_n_array = len(mean_list)            \n        while cur_n_array > 0:\n            if cur_n_array <= params[\"n_array_max\"] - 1:\n                partition = [[0]]\n                for i in range(len(split_pos_list) - 1):\n                    partition += 
[list(range(split_pos_list[i], split_pos_list[i + 1]))]\n                throughput = estimate_partition_throughput(partition, dsp_eff_list, op_list)\n                duplicate = False\n                for p_tmp in partition_candidates:\n                    if p_tmp[\"hash\"] == hash_partition(partition):\n                        duplicate = True\n                        break\n                if not duplicate:\n                    partition_candidates.append({\n                        \"idx\": len(partition_candidates),\n                        \"partition\": partition,\n                        \"hash\": hash_partition(partition),\n                        \"throughput\": throughput,\n                        \"n_arrays\": len(partition)\n                    })\n            # Sort the mean_ratio_list and merge the adjacent one with the smallest ratio                \n            if cur_n_array > 1:                       \n                sort_index = np.argsort(mean_ratio_list)\n                array_to_merge_idx = sort_index[0]\n                del(split_pos_list[array_to_merge_idx + 1])\n                mean_list, stdev_list, mean_ratio_list = profile_partition(split_pos_list, dsp_eff_list)    \n                cur_n_array = len(mean_list)\n            else:\n                cur_n_array -= 1           \n\n        # Split\n        split_pos_list = split_pos_list_old\n        mean_list, stdev_list, mean_ratio_list = profile_partition(split_pos_list, dsp_eff_list)\n        cur_n_array = len(mean_list)\n        while cur_n_array <= params[\"n_array_max\"] - 1:\n            partition = [[0]]\n            for i in range(len(split_pos_list) - 1):\n                partition += [list(range(split_pos_list[i], split_pos_list[i + 1]))]\n            throughput = estimate_partition_throughput(partition, dsp_eff_list, op_list)\n            duplicate = False\n            for p_tmp in partition_candidates:\n                if p_tmp[\"hash\"] == hash_partition(partition):\n         
           duplicate = True\n                    break\n            if not duplicate:\n                partition_candidates.append({\n                    \"idx\": len(partition_candidates),\n                    \"partition\": partition,\n                    \"hash\": hash_partition(partition),\n                    \"throughput\": throughput,\n                    \"n_arrays\": len(partition)\n                })\n            \n            #print(stdev_list)\n            sort_index = np.argsort(stdev_list)                \n            array_to_split_index = sort_index[-1]\n            if stdev_list[array_to_split_index] == 0:                    \n                break\n            # Try different positions\n            #print(split_pos_list)\n            #print(stdev_list)\n            #print(array_to_split_index)\n            if split_pos_list[array_to_split_index + 1] - split_pos_list[array_to_split_index] > 2:\n                stdev_tmp_list = []                \n                for i in range(split_pos_list[array_to_split_index], split_pos_list[array_to_split_index + 1]):\n                    dsp_eff_tmp_list = dsp_eff_list[split_pos_list[array_to_split_index]: split_pos_list[array_to_split_index + 1]]                    \n                    del(dsp_eff_tmp_list[i - split_pos_list[array_to_split_index]])                    \n                    if len(dsp_eff_tmp_list) > 1:\n                        stdev_tmp_list.append(statistics.stdev(dsp_eff_tmp_list))\n                    else:\n                        stdev_tmp_list.append(0)\n                        break\n                sort_index = np.argsort(stdev_tmp_list)      \n                insert = 1\n                if sort_index[0] > 0:\n                    split_pos_list.insert(array_to_split_index + insert, split_pos_list[array_to_split_index] + sort_index[0])  \n                    insert += 1\n                if sort_index[0] < len(stdev_tmp_list) - 1:\n                    
split_pos_list.insert(array_to_split_index + insert, split_pos_list[array_to_split_index] + sort_index[0] + 1)  \n                #split_pos_list.insert(array_to_split_index + 1, split_pos_list[array_to_split_index] + sort_index[0] + 1)\n            else:\n                split_pos_list.insert(array_to_split_index + 1, split_pos_list[array_to_split_index] + 1)\n            mean_list, stdev_list, mean_ratio_list = profile_partition(split_pos_list, dsp_eff_list)    \n            cur_n_array = len(mean_list)          \n        \n        #if len(mean_list) >= params[\"n_array_max\"] - 1:\n        #    # If the current number of arrays is greater than the maximum, merge them\n        #    cur_n_array = len(mean_list)            \n        #    while cur_n_array > 0:\n        #        if cur_n_array <= params[\"n_array_max\"] - 1:\n        #            partition = [[0]]\n        #            for i in range(len(split_pos_list) - 1):\n        #                partition += [list(range(split_pos_list[i], split_pos_list[i + 1]))]\n        #            throughput = estimate_partition_throughput(partition, dsp_eff_list, op_list)\n        #            partition_candidates.append({\n        #                \"idx\": len(partition_candidates),\n        #                \"partition\": partition,\n        #                \"hash\": hash_partition(partition),\n        #                \"throughput\": throughput,\n        #                \"n_arrays\": len(partition)\n        #            })\n        #        # Sort the mean_ratio_list and merge the adjacent one with the smallest ratio                \n        #        if cur_n_array > 1:                       \n        #            sort_index = np.argsort(mean_ratio_list)\n        #            array_to_merge_idx = sort_index[0]\n        #            del(split_pos_list[array_to_merge_idx + 1])\n        #            mean_list, stdev_list, mean_ratio_list = profile_partition(split_pos_list, dsp_eff_list)    \n        #            cur_n_array 
= len(mean_list)\n        #        else:\n        #            cur_n_array -= 1                                \n        #else:\n        #    # Else, split the array with the highest stdev\n        #    cur_n_array = len(mean_list)\n        #    while cur_n_array <= params[\"n_array_max\"] - 1:\n        #        partition = [[0]]\n        #        for i in range(len(split_pos_list) - 1):\n        #            partition += [list(range(split_pos_list[i], split_pos_list[i + 1]))]\n        #        throughput = estimate_partition_throughput(partition, dsp_eff_list, op_list)\n        #        partition_candidates.append({\n        #            \"idx\": len(partition_candidates),\n        #            \"partition\": partition,\n        #            \"hash\": hash_partition(partition),\n        #            \"throughput\": throughput,\n        #            \"n_arrays\": len(partition)\n        #        })\n        #        \n        #        #print(stdev_list)\n        #        sort_index = np.argsort(stdev_list)                \n        #        array_to_split_index = sort_index[-1]\n        #        if stdev_list[array_to_split_index] == 0:                    \n        #            break\n        #        # Try different positions\n        #        if split_pos_list[array_to_split_index + 1] - split_pos_list[array_to_split_index] > 2:\n        #            stdev_tmp_list = []                \n        #            for i in range(split_pos_list[array_to_split_index], split_pos_list[array_to_split_index + 1]):\n        #                dsp_eff_tmp_list = dsp_eff_list[split_pos_list[array_to_split_index]: split_pos_list[array_to_split_index + 1]]                    \n        #                del(dsp_eff_tmp_list[i - split_pos_list[array_to_split_index]])                    \n        #                if len(dsp_eff_tmp_list) > 1:\n        #                    stdev_tmp_list.append(statistics.stdev(dsp_eff_tmp_list))\n        #                else:\n        #                  
  stdev_tmp_list.append(0)\n        #                    break\n        #            sort_index = np.argsort(stdev_tmp_list)      \n        #            insert = 1\n        #            if sort_index[0] > 0:\n        #                split_pos_list.insert(array_to_split_index + insert, split_pos_list[array_to_split_index] + sort_index[0])  \n        #                insert += 1\n        #            if sort_index[0] < len(stdev_tmp_list) - 1:\n        #                split_pos_list.insert(array_to_split_index + insert, split_pos_list[array_to_split_index] + sort_index[0] + 1)  \n        #            #split_pos_list.insert(array_to_split_index + 1, split_pos_list[array_to_split_index] + sort_index[0] + 1)\n        #        else:\n        #            split_pos_list.insert(array_to_split_index + 1, split_pos_list[array_to_split_index] + 1)\n        #        mean_list, stdev_list, mean_ratio_list = profile_partition(split_pos_list, dsp_eff_list)    \n        #        cur_n_array = len(mean_list)                \n        \n        #def take_n_array(elem):\n        #    return elem[\"n_arrays\"]\n        #partition_candidates.sort(key=take_n_array, reverse=True)\n        def take_throughput(elem):\n            return elem[\"throughput\"]\n        partition_candidates.sort(key=take_throughput, reverse=True)\n        logger.info(f\"Partition candidates:\\n{pprint.pformat(partition_candidates, indent=2)}\")\n        init_partition_candidates = [i for i in range(min(params[\"n_partition_candidates\"], len(partition_candidates)))]\n        #pprint.pprint(partition_candidates)        \n        #exit(0)\n\n        '''        \n        # Internal testing\n        partition_candidates = []\n        partition = []\n        if self.search_config[\"workload\"] == \"vgg16\":            \n            partition.append([0])\n            partition.append([1])\n            partition.append([2])\n            partition.append([3])\n            partition.append([4])\n            
partition.append(list(range(5, len(self.workloads))))            \n        elif self.search_config[\"workload\"] == \"resnet50\":\n            partition = []\n            partition.append([0])\n            partition.append(list(range(1, 10)))  # 2-10\n            partition.append(list(range(10, 23))) # 11-23\n            partition.append(list(range(23, 40)))  # 24-40\n            partition.append(list(range(40, len(self.workloads)))) # 41-end                    \n        elif self.search_config[\"workload\"] == \"mobilenetv2\":\n            partition = []\n            partition.append([0])            \n            partition.append(list(range(1, 2)))\n            partition.append(list(range(2, 3)))        \n            partition.append(list(range(3, 4)))\n            partition.append(list(range(4, 8)))\n            partition.append(list(range(8, 14)))        \n            partition.append(list(range(14, 22)))\n            partition.append(list(range(22, 28)))                \n            partition.append(list(range(28, len(self.workloads))))            \n        init_partition_candidates = [0]\n        partition_candidates.append({\n            \"idx\": len(partition_candidates),\n            \"partition\": partition,            \n            \"hash\": hash_partition(partition),\n            \"n_arrays\": len(partition)\n            })\n        '''\n        \n        design_idx_list = [4, 5, 6, 8]\n        design_list = [self.designs[i] for i in design_idx_list]\n\n        # Collect the init tasks\n        init_tasks = []\n        for i in range(len(design_list)):\n            job_list = []\n            local_silent = silent\n            if silent == 0:\n                local_silent = 1 if self.search_config[\"n_worker\"] > 1 else 0\n\n            for repeat in range(params[\"non_fuse_repeat\"]):\n                search_tasks = []\n                for workload in self.workloads:\n                    search_task = SingleTask(design_list[i], workload, self.cst)\n      
              search_tasks.append(search_task)\n                for t in search_tasks:\n                    job_hash = f'{str(t)}_{repeat}'\n                    if any(job['job_hash'] == job_hash for job in job_list):\n                        # Avoid duplicate task\n                        continue\n                    job_list.append(\n                        {'job_hash': job_hash, 'func': self.tune, \\\n                         'args': [t, None, local_silent, 0]})\n\n            pool = utils.MyExecutor(self.search_config['n_worker'])\n            results = pool.exec(job_list)\n            for r in results:\n                if results[r].valid:\n                    init_tasks.append(results[r])\n\n        # Local search\n        search_tasks = []\n        for workload in self.workloads:\n            search_task = SingleTask(design_list[0], workload, self.cst)\n            search_tasks.append(search_task)\n        if search_task_configs:\n            search_tasks[0].configs = search_task_configs\n        search_task = MultiTask(design_list, search_tasks, self.cst, split=1)\n        meta = {'partition_candidates': partition_candidates,\n                'design_idx_list': design_idx_list,\n                'init_partition_candidates': init_partition_candidates,\n                \"batch_size\": self.search_config[\"batch_size\"],\n                \"use_uram_all\": self.search_config[\"use_uram_all\"]}\n\n        '''\n        # For internal testing\n        import pickle\n        #pickle.dump(meta, open('tmp/meta', 'wb'))\n        #pickle.dump(init_tasks, open('tmp/init_tasks', 'wb'))\n        #exit(0)\n        meta = pickle.load(open('tmp/meta', 'rb'))\n        init_tasks = pickle.load(open('tmp/init_tasks', 'rb'))\n        meta['init_partition_candidates'] = [5]\n        design_list = [self.designs[i] for i in meta['design_idx_list']]\n        search_tasks = []\n        for workload in self.workloads:\n            search_task = SingleTask(design_list[0], workload, 
self.cst)\n            search_tasks.append(search_task)\n        if search_task_configs:\n            search_tasks[0].configs = search_task_configs\n        search_task = MultiTask(design_list, search_tasks, self.cst, split=1)\n        '''\n\n        search_record = self.tune(search_task, init_tasks, silent=silent, meta=meta)\n\n        best_record.update(search_record)\n\n        return best_record\n\n    def search_fusion_multi_acc_customized2(self, design_idx=-1, search_task_configs=None, silent=0):\n        \"\"\" This function searches for the best multi-array configuration.\n        It schedules the layers onto the systolic arrays in a round-robin manner.\n        \"\"\"\n        best_record = utils.SearchRecord().reset()\n\n        params = {\n            \"non_fuse_repeat\": 1, # Run the single-array search multiple times to stabilize the results\n            \"n_designs\": 4, # Only select the top-k designs for consideration\n            \"n_partition_candidates\": 3, # Only consider the top-k partitioning candidates\n            \"n_array_max\": self.search_config[\"max_n_array\"] # Maximum number of arrays supported\n        }\n\n        import logging\n        logger = logging.getLogger('AutoSA-Tuner')\n\n        design_list = self.designs\n        if design_idx != -1:\n            # Only search a certain design\n            design_idx_list = [design_idx]\n        else:\n            design_idx_list = list(range(len(self.designs)))\n                        \n        # Single array search        \n        design_history = []\n        single_array_record = utils.SearchRecord().reset()\n        search_task_configs = {}\n        #for i in range(len(self.workloads)):\n        #    search_task_configs[i] = {'fix_param': [['r', 1]]}\n        for i in design_idx_list:\n            local_record = utils.SearchRecord().reset()\n            for repeat in range(params[\"non_fuse_repeat\"]):\n                local_record.update(\\\n                    
self.search_non_fusion_single_acc_customized1(design_idx=i, silent=silent, one_gen=True))\n                    #search_task_configs=search_task_configs))\n            design_history.append({\"idx\": i, \"record\": local_record})\n            single_array_record.update(local_record)\n        single_array_record.throughput = 1 / single_array_record.latency                \n\n        # For internal testing\n        import pickle\n        pickle.dump(design_history, open(f'tmp/design_history_{self.search_config[\"workload\"]}', 'wb'))\n        pickle.dump(single_array_record, open(f'tmp/single_array_record_{self.search_config[\"workload\"]}', 'wb'))\n        #design_history = pickle.load(open(f'tmp/design_history_{self.search_config[\"workload\"]}', 'rb'))\n        #single_array_record = pickle.load(open(f'tmp/single_array_record_{self.search_config[\"workload\"]}', 'rb'))        \n\n        '''\n        # For scalability, only the top-4 designs are selected\n        # as candidate dataflows for further exploration.\n        def take_record_latency(elem):\n            return elem[\"record\"].latency\n        design_history.sort(key=take_record_latency)\n        design_history = design_history[:min(params[\"n_designs\"], len(design_history))]\n        design_idx_list = [h[\"idx\"] for h in design_history]                \n        logger.info(f\"Selected design idx: {design_idx_list}\")\n        design_list = [self.designs[i] for i in design_idx_list]\n        '''\n\n        # Try all different #array combinations and rank based on the total ideal latency        \n        def hash_partition(partition):\n            ret = \"\"\n            for p in partition:\n                ret += \"|\"\n                ret += ''.join(str(p))\n                ret += \"|\"\n            return ret\n\n        partition_candidates = []          \n        for n_array in range(2, min(len(self.workloads), params[\"n_array_max\"]) + 1):\n            partition = [[] for i in 
range(n_array)]\n            for i in range(len(self.workloads)):\n                array_idx = i % n_array\n                partition[array_idx].append(i)\n            layer_sols = single_array_record.task_sols\n            dsp_eff_list = [sol[\"DSP_eff\"] for sol in layer_sols]\n            op_list = [sol[\"ops\"] for sol in layer_sols]\n            total_ops = np.sum(op_list)\n            throughput_list = []\n            for i in range(n_array):\n                dsp_eff_list_cur = [dsp_eff_list[p] for p in partition[i]]\n                dsp_eff_cur = np.mean(dsp_eff_list_cur)                \n                op_list_cur = [op_list[p] for p in partition[i]]\n                t_cur = np.sum(op_list_cur) / total_ops * dsp_eff_cur\n                throughput_list.append(t_cur)\n            record_latency = []\n            for i in range(n_array):\n                op_list_cur = [op_list[p] for p in partition[i]]\n                array_latency_cur = [op_cur / throughput_list[i] for op_cur in op_list_cur]\n                record_latency.append(array_latency_cur)\n\n            design_latency = 0\n            max_round = 0\n            for p in partition:\n                max_round = max(max_round, len(p))\n            for round in range(max_round):\n                array_latency = [record_latency[0][round] * self.search_config[\"batch_size\"]]\n                setup_latency = [0]\n                for array_idx in range(1, n_array):\n                    if round >= len(partition[array_idx]):\n                        break\n                    setup = record_latency[array_idx - 1][round] * 0.2\n                    setup_latency.append(setup)                    \n                    array_latency.append(max(record_latency[array_idx][round] * self.search_config[\"batch_size\"], array_latency[array_idx - 1]))\n                design_latency += (sum(setup_latency) + array_latency[-1])                    \n            design_throughput = 1 / design_latency * 
self.search_config[\"batch_size\"]\n            if len(partition) > params[\"n_array_max\"]:\n                continue\n            partition_candidates.append({\n                \"idx\": len(partition_candidates),\n                \"partition\": partition,\n                \"hash\": hash_partition(partition),\n                \"throughput\": design_throughput,\n                \"n_arrays\": len(partition)\n            })                \n\n        def take_throughput(elem):\n            return elem[\"throughput\"]\n        partition_candidates.sort(key=take_throughput, reverse=True)\n        logger.info(f\"Partition candidates:\\n{pprint.pformat(partition_candidates, indent=2)}\")\n        init_partition_candidates = [i for i in range(min(params[\"n_partition_candidates\"], len(partition_candidates)))]\n\n        design_idx_list = [4, 5, 6, 8]\n        design_list = [self.designs[i] for i in design_idx_list]\n\n        # Collect the init tasks\n        init_tasks = []\n        for i in range(len(design_list)):\n            job_list = []\n            local_silent = silent\n            if silent == 0:\n                local_silent = 1 if self.search_config[\"n_worker\"] > 1 else 0\n\n            for repeat in range(params[\"non_fuse_repeat\"]):\n                search_tasks = []\n                for workload in self.workloads:\n                    search_task = SingleTask(design_list[i], workload, self.cst)\n                    search_tasks.append(search_task)\n                for t in search_tasks:\n                    job_hash = f'{str(t)}_{repeat}'\n                    if any(job['job_hash'] == job_hash for job in job_list):\n                        # Avoid duplicate task\n                        continue\n                    job_list.append(\n                        {'job_hash': job_hash, 'func': self.tune, \\\n                         'args': [t, None, local_silent, 0]})\n\n            pool = utils.MyExecutor(self.search_config['n_worker'])\n            results 
= pool.exec(job_list)\n            for r in results:\n                if results[r].valid:\n                    init_tasks.append(results[r])\n\n        # Local search\n        search_tasks = []\n        for workload in self.workloads:\n            search_task = SingleTask(design_list[0], workload, self.cst)\n            search_tasks.append(search_task)\n        if search_task_configs:\n            search_tasks[0].configs = search_task_configs\n        search_task = MultiTask(design_list, search_tasks, self.cst, split=1)\n        meta = {'partition_candidates': partition_candidates,                \n                'design_idx_list': design_idx_list,\n                'init_partition_candidates': init_partition_candidates,                \n                \"batch_size\": self.search_config[\"batch_size\"]}\n        search_record = self.tune(search_task, init_tasks, silent=silent, meta=meta)\n\n        best_record.update(search_record)\n\n        return best_record\n"
  },
  {
    "path": "autosa_scripts/odyssey/main.py",
    "content": "import argparse\nfrom datetime import datetime\nimport logging\nimport numpy as np\nimport os\nimport pickle\nimport concurrent.futures\nimport json\nimport pprint\n\nimport utils\nfrom tuners import Constraint\nfrom design import Design\nfrom explorer import ArchExplorer\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--outdir', type=str, default=\"outdir\", help=\"output directory\")\n    parser.add_argument('--db', type=str, default=\"db\", help=\"search database\")\n    parser.add_argument('--use-db', type=int, default=1, help=\"use database\")\n    parser.add_argument('--objective', type=str, default=\"latency\", help=\"optimization target [latency, off_chip_comm, energy, dsp_num]\")\n    parser.add_argument('--cst', type=str, default=\"hw_cst\", help=\"hardware constraint\")\n    parser.add_argument('--stop-after-epochs', type=int, default=-1, help=\"number of epochs of the unit searching task\")\n    parser.add_argument('--stop-after-time', type=int, default=-1, help=\"time limit (in seconds) of the unit searching task\")\n    parser.add_argument('--n-worker', type=int, default=8, help=\"number of workers for multi-processing\")\n    parser.add_argument('--designs', type=str, default=\"designs\", help=\"systolic array design directory\")\n    parser.add_argument('--design-idx', type=int, default=-1, help=\"systolic array design index\")\n    parser.add_argument('--workload', type=str, required=True, help=\"searching workload\")\n    # Architecture specific options\n    parser.add_argument('--explore-fusion', action=\"store_true\", help=\"explore layer fusion in a single accelerator\")\n    parser.add_argument('--explore-multi-acc', action=\"store_true\", help=\"explore using multiple accelerators\")\n    parser.add_argument('--explore-programmable', action=\"store_true\", help=\"explore programmable systolic array\")\n    parser.add_argument('--multi-array-mode', type=int, default=0, help=\"execution 
mode of the generic array in the multi-acc setting\")\n    parser.add_argument('--use-uram', type=int, default=0, help=\"use URAM for the intermediate data in the fused array\")\n    parser.add_argument('--use-uram-all', action=\"store_true\", help=\"use URAM for all the arrays in the multi-array system\")\n    parser.add_argument('--method', type=str, default=\"customized1\", help=\"searching method\")\n    parser.add_argument('--unit-task-method', type=str, default=\"genetic\", help=\"unit task searching method\")\n    #parser.add_argument('--multi-batch', action=\"store_true\", help=\"use multiple batches in the multi-acc array\")\n    parser.add_argument('--batch-size', type=int, default=1, help=\"use multiple batches in the multi-acc array\")\n    parser.add_argument('--profiling', action=\"store_true\", help=\"profiling\")\n    parser.add_argument('--max-n-array', type=int, default=8, help=\"maximal number of arrays\")\n    # Algorithm specific options\n    parser.add_argument('--xgb-n-gens', type=int, default=5)\n    parser.add_argument('--xgb-thres', type=float, default=0.6)\n    parser.add_argument('--xgb-thres-adjust', type=float, default=0.4)\n\n    args = parser.parse_args()\n\n    search_obj = args.objective\n\n    # Set up the working directory\n    now = datetime.now()\n    outdir = args.outdir\n    os.makedirs(outdir, exist_ok=True)\n    explore_config = \"\"\n    explore_config += \"f1\" if args.explore_fusion else \"f0\"\n    explore_config += \"ma1\" if args.explore_multi_acc else \"ma0\"\n    explore_config += \"p1\" if args.explore_programmable else \"p0\"\n    explore_config += f\"mam{args.multi_array_mode}\"\n    explore_config += f\"u{args.use_uram}\"\n    exp_name = f\"O_{args.objective}-W_{args.workload}-C_{explore_config}-T_{now.date()}-{now.time()}\"\n    outdir = f\"{outdir}/{exp_name}\"\n    os.makedirs(outdir, exist_ok=True)\n    logger = utils.init_logger(outdir)\n\n    # Load the hardware constraints\n    cst = 
Constraint(f'cst/{args.cst}.json')\n\n    # Load the workloads\n    with open(f'workload/{args.workload}.json') as f:\n        data = json.load(f)\n    workloads = []\n    for workload in data['workloads']:\n        workloads.append(workload)\n\n    # Load the designs\n    design_dir = args.designs\n    os.makedirs(f\"{design_dir}/register\", exist_ok=True)\n    designs = []\n    for f in os.listdir(design_dir):\n        if f.endswith(\".json\"):\n            with open(f'{design_dir}/{f}', 'r') as json_f:\n                desp = json.load(json_f)\n            design = Design(f.split(\".\")[0])\n            design.register(desp, f\"{design_dir}/register/{design.name}.py\")\n            designs.append(design)\n    def get_design_name(elem):\n        return elem.name\n    # Sort the designs by name\n    designs.sort(key=get_design_name)\n    if len(designs) == 0:\n        raise RuntimeError(\"No systolic array design was found.\")\n    #for design in designs:\n    #    print(design.name)\n\n    # Update the search stop criteria\n    max_epochs = -1\n    max_time = -1\n    if args.stop_after_epochs > 0:\n        max_epochs = args.stop_after_epochs\n    elif args.stop_after_time > 0:\n        max_time = args.stop_after_time\n    else:\n        max_time = 60 # 60 seconds by default\n\n    # Load the search database if it exists\n    db_file = f'{args.db}/{str(cst)}.db'\n    if os.path.exists(db_file) and args.use_db:\n        search_db = pickle.load(open(db_file, 'rb'))\n        logger.info('Found existing tuning database!')\n    else:\n        search_db = None\n\n    # Start search\n    counter = utils.PerfCounter(logger)\n    counter.init_counter(\"total_search_time\")\n\n    search_config = {\n        \"method\": args.method, # [customized1, customized2, exhaustive]\n        \"n_worker\": args.n_worker,\n        \"unit_task_method\": args.unit_task_method, # [exhaustive_pruning, random, sa, bayesian, opentuner, RL]\n        \"profiling\": args.profiling,\n        
\"workload\": args.workload,\n        \"design_idx\": args.design_idx,\n        \"genetic_params\": {\"population_size\": [200, 20]},\n        \"args\": args,\n        \"search_records_db\": {} if search_db == None else search_db,\n        \"explore_fusion\": args.explore_fusion,\n        \"explore_multi_acc\": args.explore_multi_acc,\n        \"explore_programmable\": args.explore_programmable,\n        \"multi_array_mode\": args.multi_array_mode,        \n        \"use_db\": args.use_db,\n        \"use_uram\": args.use_uram,\n        \"use_uram_all\": args.use_uram_all,\n        \"batch_size\": args.batch_size,\n        \"max_n_array\": args.max_n_array,\n        \"xgb_params\": {\n            \"n_gens\": args.xgb_n_gens,\n            \"thres\": args.xgb_thres,\n            \"thres_adjust\": args.xgb_thres_adjust\n        }\n    }    \n\n    explorer = ArchExplorer(cst, search_obj, max_epochs, max_time, search_config, designs, workloads)\n    search_record = explorer.search()\n\n    # Update the database\n    search_db = explorer.search_config[\"search_records_db\"]\n    if os.path.exists(db_file):\n        old_search_db = pickle.load(open(db_file, 'rb'))\n        for search_task in search_db:\n            if search_task in old_search_db:\n                old_search_db[search_task].update(search_db[search_task])\n            else:\n                old_search_db[search_task] = search_db[search_task]\n        pickle.dump(old_search_db, open(db_file, 'wb'))\n    else:\n        pickle.dump(search_db, open(db_file, 'wb'))\n\n    counter.update_counter(\"total_search_time\")\n    counter.print_counter(\"total_search_time\")\n\n    # Display and dump out the search results\n    #def print_records(record, num):\n    #    num += 1\n    #    if num > 10:\n    #        return\n    #    while record.records:\n    #        print(record.task_names, len(record.records))\n    #        for r in record.records:\n    #            print_records(r, num)\n    
#print_records(search_record, 0)\n\n    logger.info(f'{search_record.to_str()}')\n    with open(f'{outdir}/history.log', 'w') as f:\n        f.write(search_record.to_str())\n"
  },
  {
    "path": "autosa_scripts/odyssey/requirements.txt",
    "content": "bayesian-optimization==1.1.0\ncertifi==2021.10.8\ndill @ file:///home/conda/feedstock_root/build_artifacts/dill_1623610058511/work\njoblib @ file:///tmp/build/80754af9/joblib_1635411271373/work\nmkl-fft==1.3.1\nmkl-random @ file:///tmp/build/80754af9/mkl_random_1626186066731/work\nmkl-service==2.4.0\nmultiprocess @ file:///home/conda/feedstock_root/build_artifacts/multiprocess_1623774446079/work\nnumpy @ file:///tmp/build/80754af9/numpy_and_numpy_base_1634095651905/work\npathos @ file:///home/conda/feedstock_root/build_artifacts/pathos_1623937754918/work\npox @ file:///home/conda/feedstock_root/build_artifacts/pox_1623773830989/work\nppft @ file:///home/conda/feedstock_root/build_artifacts/ppft_1623774454681/work\nscikit-learn @ file:///tmp/build/80754af9/scikit-learn_1635187048948/work\nscipy @ file:///tmp/build/80754af9/scipy_1630606796912/work\nsix @ file:///tmp/build/80754af9/six_1623709665295/work\nthreadpoolctl @ file:///Users/ktietz/demo/mc3/conda-bld/threadpoolctl_1629802263681/work\nxgboost==1.3.3\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/compute_network_info.py",
    "content": "import csv\nimport json\n\ncsv_columns = [\"Layer\", \"Name\", \"i\", \"o\", \"r\", \"c\", \"p\", \"q\", \"ops\", \"parallelism\", \"ai\", \"parallelism_norm\", \"ai_norm\",\n               \"throughput_free\", \"dsp_eff_free\", \"kernel\", \"latency_fixed\", \"dsp_eff_fixed\", \"throughput\", \"throughput_norm\"]\ndict_data = []\nwith open(\"../workload/resnet50.json\", \"r\") as f:\n    network_data = json.load(f)\n#for layer in network_data[\"workloads\"]:\nparallelism_min = float(\"inf\")\nai_min = float(\"inf\")\nfor idx in range(len(network_data[\"workloads\"])):\n    layer = network_data[\"workloads\"][idx]\n    i, o, r, c, p, q = layer[\"params\"][\"i\"], layer[\"params\"][\"o\"], layer[\"params\"][\"r\"], layer[\"params\"][\"c\"], \\\n                       layer[\"params\"][\"p\"], layer[\"params\"][\"q\"]\n    dict_data.append({\n        'Layer': idx + 1,\n        'Name': layer[\"name\"],\n        'i': i, 'o': o, 'r': r, 'c': c, 'p': p, 'q': q,\n        \"ops\": i*o*r*c*p*q, \"parallelism\": o*r*c, \"ai\": i*o*r*c*p*q/(i*(r+p-1)*(c+q-1)+o*r*c+i*o*p*q)\n    })\n    parallelism_min = min(parallelism_min, dict_data[-1][\"parallelism\"])\n    ai_min = min(ai_min, dict_data[-1][\"ai\"])\n# normalize\nfor data in dict_data:\n    data[\"parallelism_norm\"] = data[\"parallelism\"] / parallelism_min\n    data[\"ai_norm\"] = data[\"ai\"] / ai_min\n\n# load the tuning log\nlog_file = \"/home/jaywang/AutoSA_Tuner/refactor2/outdir/O_latency-W_resnet50-C_f0ma0p0mam0u0-T_2021-07-02-11:52:32.005352/tuning.log\"\nthroughput_min = float(\"inf\")\nwith open(log_file, \"r\") as f:\n    lines = f.readlines()\n    total_layer = 0\n    for line_idx in range(len(lines)):\n        line = lines[line_idx]    \n        if line.find(\"DSP_eff\") != -1:\n            dsp_eff = float(line.strip().split(\":\")[-1].strip(\",\"))\n            dict_data[total_layer][\"dsp_eff_fixed\"] = dsp_eff\n            latency = float(lines[line_idx + 
2].strip().split(\":\")[-1].strip(\",\"))\n            dict_data[total_layer][\"latency_fixed\"] = latency\n            dict_data[total_layer][\"throughput\"] = dict_data[total_layer][\"ops\"] / dict_data[total_layer][\"latency_fixed\"]\n            throughput_min = min(throughput_min, dict_data[total_layer][\"throughput\"])\n            total_layer += 1\n            if total_layer >= len(dict_data):\n                break\n\n# normalize\nfor data in dict_data:\n    data[\"throughput_norm\"] = data[\"throughput\"] / throughput_min\n\nwith open(\"../tmp/resnet_info.csv\", \"w\") as csvfile:\n    write = csv.DictWriter(csvfile, fieldnames=csv_columns)\n    write.writeheader()\n    for data in dict_data:\n        write.writerow(data)"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/grid_search_xgb_params.py",
    "content": "import os\nimport subprocess\nimport re\nimport pprint\n\n'''\nfor model_gens in [5, 10, 20, 50]:\n    for xgb_thres in [0.2, 0.4, 0.6, 0.8]:\n        #for xgb_thres_adjust in [0.2, 0.4, 0.6, 0.8]:\n        # data1\n        #for xgb_thres_adjust in [0.2, 0.4]: \n        # data2\n        for xgb_thres_adjust in [0.6, 0.8]:\n            # Call the python command\n            cmd = f\"python main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --design-idx=4 --xgb-n-gens={model_gens} --xgb-thres={xgb_thres} --xgb-thres-adjust={xgb_thres_adjust}\"\n            #os.system(f\"python main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --design-idx=4 --xgb-n-gens={model_gens} --xgb-thres={xgb_thres} --xgb-thres-adjust={xgb_thres_adjust}\")\n            #print(cmd)\n            process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)\n            output, error = process.communicate()\n'''\n\n# Collect the best\nbasepath = \"./outdir/\"\nprjs = os.listdir(basepath)\nprjs.sort()\n#print(prjs)\n\nresults = []\nprj_idx = 0\nfor model_gens in [5, 10, 20, 50]:\n    for xgb_thres in [0.2, 0.4, 0.6, 0.8]:        \n        for xgb_thres_adjust in [0.6, 0.8]:\n            with open(f\"./outdir/{prjs[prj_idx]}/tuning.log\") as f:\n                lines = f.readlines()\n                rewards = []\n                for line in lines:\n                    if line.find(\"new best reward\") != -1:                        \n                        epoch = re.search(r\"Epoch (.+?):\", line).group(1)\n                        latency = re.search(r\"\\((.+?)\\)\", line).group(1)\n                        rewards.append({\"epoch\": int(epoch), \"latency\": float(latency)})\n                results.append({\"configs\": [model_gens, xgb_thres, xgb_thres_adjust], \"rewards\": rewards, \"prj\": prjs[prj_idx]})\n            prj_idx += 1\n\n# Sort the results\ndef takeBestReward(elem):\n    return 
elem[\"rewards\"][-1][\"latency\"]\nresults.sort(key=takeBestReward)\npprint.pprint(results)"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/img2col.py",
    "content": "import json\n\n#with open('workload/vgg16.json') as f:\n#with open('workload/resnet50.json') as f:\nwith open('workload/mobilenetv2.json') as f:\n    data = json.load(f)\n\nfor layer in data[\"workloads\"]:\n    i, o, r, c, p, q = layer[\"params\"][\"i\"], layer[\"params\"][\"o\"], layer[\"params\"][\"r\"], \\\n                       layer[\"params\"][\"c\"], layer[\"params\"][\"p\"], layer[\"params\"][\"q\"]\n    gemm_i = o\n    gemm_j = r * c\n    gemm_k = i * p * q\n    layer[\"params\"] = {\"i\": gemm_i, \"j\": gemm_j, \"k\": gemm_k}\n    layer[\"tags\"] = [\"gemm\"]\n\n\n#with open(\"workload/vgg16_img2col.json\", \"w\") as f:\n#with open(\"workload/resnet50_img2col.json\", \"w\") as f:\nwith open(\"workload/mobilenetv2_img2col.json\", \"w\") as f:\n    json.dump(data, f, indent=2)"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_arch1.sh",
    "content": "#!/bin/bash\n\ncd ..\nrm -rf outdir/*\nrm -rf tmp/*\nfor design_idx in 1 4 7 10 13 16 19 22 25 28\ndo\n    python main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --design-idx=$design_idx\n    python main.py --workload=resnet50 --stop-after-time=10 --use-db=0 --n-worker=32 --design-idx=$design_idx\n    python main.py --workload=mobilenetv2 --stop-after-time=10 --use-db=0 --n-worker=32 --design-idx=$design_idx\ndone\ncp -r outdir/* tmp/\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_arch1_free.sh",
    "content": "#!/bin/bash\n\ncd ..\nrm -rf outdir/*\nrm -rf tmp/*\nfor design_idx in 1 4 7 10 13 16 19 22 25 28\ndo\n    #for layer_idx in {1..49}\n    #do\n    #    python main.py --workload=resnet50_$layer_idx --stop-after-time=10 --use-db=0 --design-idx=$design_idx\n    #done    \n    for layer_idx in {1..36}\n    do\n        python main.py --workload=mobilenetv2_$layer_idx --stop-after-time=10 --use-db=0 --design-idx=$design_idx\n    done    \n    for layer_idx in {1..13}\n    do\n        python main.py --workload=vgg16_$layer_idx --stop-after-time=10 --use-db=0 --design-idx=$design_idx\n    done    \ndone\ncp -r outdir/* tmp/\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_arch1_ml_cmp.sh",
    "content": "#!/bin/bash\n\ncd ..\nrm -rf outdir/*\n#rm -rf tmp/*\nfor design_idx in 1 4 7 10 13 16 19 22 25 28\ndo\n    #python main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --design-idx=$design_idx\n    python main.py --workload=resnet50 --stop-after-time=15 --use-db=0 --n-worker=32 --design-idx=$design_idx\ndone\ncp -r outdir/* tmp/\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_arch2.sh",
    "content": "#!/bin/bash\n\ncd ..\nrm -rf outdir/*\nrm -rf tmp/*\n\npython main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=8\npython main.py --workload=resnet50 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=8\npython main.py --workload=mobilenetv2 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=8\n\npython main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=16\npython main.py --workload=resnet50 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=16\npython main.py --workload=mobilenetv2 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=16\n\n#python main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=24\npython main.py --workload=resnet50 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=24\npython main.py --workload=mobilenetv2 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=24\n\ncp -r outdir/* tmp/\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_arch3.sh",
    "content": "#!/bin/bash\n\ncd ..\nrm -rf outdir/*\nrm -rf tmp/*\n\n#python main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized2 --max-n-array=8\n#python main.py --workload=resnet50 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized2 --max-n-array=8\n#python main.py --workload=mobilenetv2 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized2 --max-n-array=8\n\npython main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized2 --max-n-array=8 --batch-size=16\npython main.py --workload=resnet50 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized2 --max-n-array=8 --batch-size=16\npython main.py --workload=mobilenetv2 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized2 --max-n-array=8 --batch-size=16\n\ncp -r outdir/* tmp/\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_arch4.sh",
    "content": "#!/bin/bash\n\ncd ..\nrm -rf outdir/*\n#rm -rf tmp/*\n\n#python main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=8\n#python main.py --workload=resnet50 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=8\n#python main.py --workload=mobilenetv2 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=8\n\n#python main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=16\n#python main.py --workload=resnet50 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=16\n#python main.py --workload=mobilenetv2 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=16\n\n#python main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=24\n#python main.py --workload=resnet50 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=24\n#python main.py --workload=mobilenetv2 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=24\n\npython main.py --workload=resnet50 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=16 --use-uram-all\npython main.py --workload=resnet50 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=8 --use-uram-all\npython main.py --workload=mobilenetv2 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=16 
--use-uram-all\npython main.py --workload=vgg16 --stop-after-time=10 --use-db=0 --n-worker=32 --explore-multi-acc --explore-fusion --method=customized1 --max-n-array=16\n\ncp -r outdir/* tmp/\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_dataflow_cmp_cnn.sh",
    "content": "#!/bin/bash\n\ncd ..\nrm -rf outdir/*\nrm -rf tmp/*\n#for design_idx in {0..29}\nfor design_idx in 6 7 8 15 16 17 27 28 29\ndo\n    for layer_idx in {1..13} \n    do    \n        python main.py --workload=vgg16_$layer_idx --stop-after-time=10 --use-db=0 --unit-task-method=genetic --design-idx=$design_idx --profiling\n    done\ndone\ncp -r outdir/* tmp/\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_dataflow_cmp_mm.sh",
    "content": "#!/bin/bash\n\ncd ..\nrm -rf outdir/*\nrm -rf tmp/*\n#for design_idx in {0..17}\nfor design_idx in 0\n#for design_idx in {14..14}\n#for design_idx in 6 7 8 12 13 14 15 16 17\ndo\n    #python main.py --workload=mm --stop-after-time=10 --use-db=0 --unit-task-method=genetic --design-idx=$design_idx --profiling\n    # Solver cmp\n    python main.py --workload=mm --stop-after-time=20 --use-db=0 --unit-task-method=genetic --design-idx=$design_idx --profiling \n    # Imperfect pruning\n    #python main.py --workload=mm --stop-after-time=10 --use-db=0 --unit-task-method=genetic --design-idx=$design_idx --profiling --objective=off_chip_comm\ndone\n\ncp -r outdir/* tmp/\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_dataflow_cmp_mm_energy.sh",
    "content": "#!/bin/bash\n\ncd ..\nrm -rf outdir/*\nrm -rf tmp/*\nfor design_idx in {0..17}\ndo    \n    python main.py --workload=mm --stop-after-time=10 --use-db=0 --unit-task-method=genetic --design-idx=$design_idx --objective=energy --profiling\ndone\n\ncp -r outdir/* tmp/\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_img2col_single.sh",
    "content": "#!/bin/bash\n\ncd ..\npython main.py --workload=vgg16_img2col --stop-after-time=10 --use-db=0 --n-worker=32\npython main.py --workload=resnet50_img2col --stop-after-time=10 --use-db=0 --n-worker=32\npython main.py --workload=mobilenetv2_img2col --stop-after-time=10 --use-db=0 --n-worker=32\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_method_cmp.sh",
    "content": "#!/bin/bash\n\ncd ..\nrm -rf outdir/*\nrm -rf tmp/*\nfor design_idx in {0..17}\ndo\n    python main.py --workload=mm --stop-after-time=300 --use-db=0 --unit-task-method=genetic --profiling --design-idx=$design_idx\n    python main.py --workload=mm --stop-after-time=300 --use-db=0 --unit-task-method=random --profiling --design-idx=$design_idx\n    python main.py --workload=mm --stop-after-time=300 --use-db=0 --unit-task-method=random_pruning --profiling --design-idx=$design_idx\n    python main.py --workload=mm --stop-after-epoch=150000 --use-db=0 --unit-task-method=annealing --profiling --design-idx=$design_idx\n    python main.py --workload=mm --stop-after-epoch=300 --use-db=0 --unit-task-method=bayesian --profiling --design-idx=$design_idx\n    python main.py --workload=mm --stop-after-time=300 --use-db=0 --unit-task-method=open_tuner --profiling --design-idx=$design_idx\n    python main.py --workload=mm --stop-after-epoch=50000 --use-db=0 --unit-task-method=RL --profiling --design-idx=$design_idx\ndone\ncp -r outdir/* tmp/\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_metric_cmp.sh",
    "content": "#!/bin/bash\n\ncd ..\nrm -rf outdir/*\nrm -rf tmp/*\n#python main.py --workload=mm --stop-after-time=20 --use-db=0 --unit-task-method=genetic --profiling --design-idx=0\n#python main.py --workload=mm --stop-after-time=20 --use-db=0 --unit-task-method=genetic --profiling --design-idx=1\n#python main.py --workload=mm --stop-after-time=20 --use-db=0 --unit-task-method=genetic --profiling --design-idx=2\n#python main.py --workload=mm --stop-after-time=20 --use-db=0 --unit-task-method=genetic --profiling --design-idx=3 --objective=off_chip_comm\npython main.py --workload=mm --stop-after-time=20 --use-db=0 --unit-task-method=genetic --profiling --design-idx=3 --objective=dsp_num\n#python main.py --workload=mm --stop-after-time=20 --use-db=0 --unit-task-method=genetic --profiling --design-idx=4\n#python main.py --workload=mm --stop-after-time=20 --use-db=0 --unit-task-method=genetic --profiling --design-idx=5\ncp -r outdir/* tmp/\ncd -\n"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/run_mutation_cmp.sh",
    "content": "#!/bin/bash\n\n# Use solver by default\n# Set epsilon to 0 when only using the factorization mutation\ncd ..\nrm -rf outdir/*\nrm -rf tmp/*\npython main.py --workload=mm --stop-after-time=20 --use-db=0 --unit-task-method=genetic --design-idx=3 --profiling\ncp -r outdir/* tmp/\ncd -"
  },
  {
    "path": "autosa_scripts/odyssey/scripts/split_cnn_layers.py",
    "content": "import csv\nimport json\n\n#network = \"resnet50\"\nnetwork = \"mobilenetv2\"\nwith open(f\"../workload/{network}.json\", \"r\") as f:\n    network_data = json.load(f)\nlayer_idx = 1\nfor layer in network_data[\"workloads\"]:\n    data = {}\n    data[\"workloads\"] = [layer]\n    with open(f\"../workload/{network}_{layer_idx}.json\", \"w\") as f:\n        json.dump(data, f, indent=4)\n    layer_idx += 1"
  },
  {
    "path": "autosa_scripts/odyssey/search_task.py",
    "content": "import json\nimport random\nimport numpy as np\nimport bisect\n\nimport utils\nfrom design import Design\n\nclass SingleTask(object):\n    \"\"\" Single workload searching task.\n    \"\"\"\n    def __init__(self, design, workload, hw_cst):\n        self.design = design\n        self.workload = workload\n\n        self.hw_cst = hw_cst\n        self.fre = 300 # 300 MHz\n        self.dw = 4 # bytes\n        self.dt = \"float\"\n        self.fuse = 0\n        self.last_fuse = 0 # the last fusion task in the network\n        self.use_uram = 0\n        self.serialize = 0\n        # Fixed architecture solution\n        self.arch_sol = None\n        self.arch_cst = None\n        self.arch_feature = None\n        self.fixed = 0        \n        # Other configs\n        self.configs = {}\n        self.aux_funcs = {}\n\n    def __repr__(self):\n        #ret = f't_{self.workload[\"name\"]}_'\n        ret = \"\"\n        for param in self.workload[\"params\"]:\n            ret += param            \n            ret += \"_\"\n            ret += f'{self.workload[\"params\"][param]}'\n        ret += f'_d_{self.design.name}'\n        ret += f'_cst_{self.hw_cst}'\n        ret += f'_f_{self.fuse}{self.last_fuse}'\n        ret += f'_u_{self.use_uram}'\n        ret += f'_s_{self.serialize}'\n        if self.fixed == 1:\n            ret += f'_fixed_'\n            for k, v in self.arch_sol.items():\n                ret += f'{k}{v}'\n        if len(self.configs) > 0:\n            ret += f'_config_'\n            for k, v in self.configs.items():\n                if k == \"fix_param\":\n                    ret += \"fix_param_\"\n                    for p_pair in v:\n                        ret += p_pair[0]\n                        ret += \"_\"\n                        ret += str(p_pair[1])\n                elif k == \"equate_params\":\n                    ret += \"equate_params_\"\n                    for p_pair in v:\n                        ret += p_pair[0]\n               
         ret += \"_\"\n                        ret += p_pair[1]\n                elif k == \"prev_workload\":\n                    ret += \"prev_workload_\"\n                    ret += self.configs['prev_workload']['name']\n                elif k == \"prev_sol\":\n                    ret += \"prev_sol_\"\n                    for p in self.configs['prev_sol']:\n                        ret += p\n                        ret += \"_\"\n                        ret += str(self.configs['prev_sol'][p])\n                elif k == \"prev_latency\":\n                    ret += \"prev_latency_\"\n                    ret += str(self.configs['prev_latency'])\n                else:\n                    ret += f'{k}{v}'\n\n        return ret\n\n    def adjust_params(self, params):\n        \"\"\" Adjust the parameters based on its contraints.\n        \"\"\"\n        def filter_non_power_of_two(x):\n            if np.log2(x) != int(np.log2(x)):\n                return True\n            return False\n        \n        # Making all factors to be even numbers to have more divisors\n        #for p, param in self.design.params_config[\"tunable\"].items():\n        #    params[p] = int(np.ceil(params[p] / 2) * 2)        \n        for p in params:\n            params[p] = int(params[p])\n\n        # Making all divisor factors to be divisors of the dependent variable\n        for p, param in self.design.params_config[\"tunable\"].items():\n            #print(param)\n            if \"divisors\" in param:\n                if \"tags\" in param and \"power_of_two\" in param[\"tags\"]:\n                    choices = utils.get_divisors(int(params[param[\"divisors\"][0]]), filter_non_power_of_two)\n                else:\n                    choices = utils.get_divisors(int(params[param[\"divisors\"][0]]), None)\n                idx = bisect.bisect(choices, params[p])\n                if idx >= len(choices):\n                    idx -= 1\n                if idx > 1:\n                    if 
abs(choices[idx - 1] - params[p]) < abs(choices[idx] - params[p]):\n                        idx -= 1\n                #print(params[param[\"divisors\"][0]])\n                #print(\"idx\", idx)\n                #print(\"len\", len(choices))\n                params[p] = choices[idx]\n\n        # Adjust the fixed parameters        \n        if 'fix_param' in self.configs:\n            for fix_p in self.configs['fix_param']:\n                for p, param in self.design.params_config[\"tunable\"].items():                \n                    if p.startswith(fix_p[0]):\n                        params[p] = int(fix_p[1])\n        if 'equate_params' in self.configs:\n            for p_pair in self.configs['equate_params']:\n                params[p_pair[1]] = params[p_pair[0]]\n\n        return params\n\n    def generate_random_sample(self):\n        \"\"\" Generate a random sample in the design space.\n        \"\"\"\n        workload_params = {}\n        for param in self.workload[\"params\"]:\n            workload_params[param] = self.workload[\"params\"][param]\n        return self.design.random_sampling(workload_params)        \n\n    def check_arch_legality(self, arch_features):\n        \"\"\" Check if the current architecture is legal.\n        \"\"\"\n        if self.fixed == 0:\n            return True\n        # dims\n        for idx in range(len(arch_features['dims'])):\n            if arch_features['dims'][idx] > self.arch_cst['dims'][idx]:\n                return False\n        # SIMD\n        if arch_features['SIMD'] > self.arch_cst['SIMD']:\n            return False\n        # data pack\n        for arr in arch_features['data_pack']:\n            for idx in range(len(arch_features['data_pack'][arr])):\n                if arch_features['data_pack'][arr][idx] > self.arch_cst['data_pack'][arr][idx]:\n                    return False\n        # resource usage\n        for module in arch_features['resource']:\n            if module.endswith(\"unit_memory\"):\n  
              if arch_features[\"resource\"][module] > self.arch_cst['resource'][module]:\n                    return False\n\n        return True\n\n    def adjust_latency_buffer(self, latency, latency_meta, params):\n        \"\"\" Adjust latency and for customized search tasks.\n        cin_read_mode:\n        0: normal ping-pong mode, no need to adjust\n        1: load cin one time from the external memory\n        2: load cin from on-chip BRAM buffer\n        3: load cin from on-chip URAM buffer\n        cout_write_mode:\n        0: write to external memory\n        1: write to on-chip buffer\n        w_read_mode:\n        0: normal ping-pong mode, no need to adjust\n        1: load w from on-chip URAM buffer\n        Note: Only works for kernel4\n        \"\"\"\n        if ('cin_read_mode' not in self.configs) or ('cout_write_mode' not in self.configs):\n            return latency, latency_meta\n\n        \"\"\"\n        Latency prologue\n        \"\"\"        \n        w_latency_list = []\n        for item, value in latency_meta[\"latency_prologue\"].items():\n            if item.startswith('w'):\n                w_latency_list.append({\"item\": item, \"value\": value})\n        cin_latency_list = []\n        for item, value in latency_meta['latency_prologue'].items():\n            if item.startswith('cin'):\n                cin_latency_list.append({\"item\": item, \"value\": value})\n        # Sort the latency list by item names\n        def take_item(elem):\n            return elem['item']\n        w_latency_list.sort(key=take_item)\n        cin_latency_list.sort(key=take_item)\n\n        w_latency = 0\n        if 'w_read_mode' not in self.configs or (self.configs['w_read_mode'] == 0):\n            for w in w_latency_list:\n                w_latency = max(w_latency, w['value'])        \n        elif self.configs['w_read_mode'] == 1:\n            w_latency_list = w_latency_list[:-1]\n            for w in w_latency_list:\n                w_latency = 
max(w_latency, w['value'])\n\n        cin_latency = 0\n        if self.configs['cin_read_mode'] == 0:            \n            for cin in cin_latency_list:\n                cin_latency = max(cin_latency, cin['value'])\n        if self.configs['cin_read_mode'] == 1:\n            # Modify the cin latency            \n            for cin in cin_latency_list:\n                cin_latency = max(cin_latency, cin['value'])                            \n            cin_latency = self.call_aux_func('update_cin_latency')(cin_latency, self, params)            \n        elif self.configs['cin_read_mode'] == 2:                        \n            pass\n        elif self.configs['cin_read_mode'] == 3:\n            # Peel off the last one accessing the DRAM\n            cin_latency_list = cin_latency_list[:-1]            \n            for cin in cin_latency_list:\n                cin_latency = max(cin_latency, cin['value'])        \n        latency_prologue = max(w_latency, cin_latency)\n\n        \"\"\"\n        Latency main\n        \"\"\"\n        cout_latency_list = []\n        for item, value in latency_meta['latency_main'].items():\n            if item.startswith('cout'):\n                cout_latency_list.append({\"item\": item, \"value\": value})        \n        w_latency_list = []\n        for item, value in latency_meta['latency_main'].items():\n            if item.startswith('w'):\n                w_latency_list.append({\"item\": item, \"value\": value})                        \n        cin_latency_list = []\n        for item, value in latency_meta['latency_main'].items():\n            if item.startswith('cin'):\n                cin_latency_list.append({\"item\": item, \"value\": value})        \n        cout_latency_list.sort(key=take_item)  \n        w_latency_list.sort(key=take_item)  \n        cin_latency_list.sort(key=take_item)  \n\n        #latency_main = max(latency_meta['latency_main']['PE_latency'], w_latency)\n        latency_main = 
latency_meta['latency_main']['PE_latency']\n        w_latency = 0\n        if 'w_read_mode' not in self.configs or (self.configs['w_read_mode'] == 0):\n            for w in w_latency_list:\n                w_latency = max(w_latency, w['value'])            \n        else:\n            w_latency_list = w_latency_list[:-1]\n            for w in w_latency_list:\n                w_latency = max(w_latency, w['value'])\n\n        cin_latency = 0\n        if self.configs['cin_read_mode'] == 0:            \n            for cin in cin_latency_list:\n                cin_latency = max(cin_latency, cin['value'])            \n        elif self.configs['cin_read_mode'] == 1:\n            pass\n        elif self.configs['cin_read_mode'] == 2:\n            pass\n        elif self.configs['cin_read_mode'] == 3:\n            # Peel off the last one accessing the DRAM\n            cin_latency_list = cin_latency_list[:-1]            \n            for cin in cin_latency_list:\n                cin_latency = max(cin_latency, cin['value'])\n        \n        cout_latency = 0        \n        if self.configs['cout_write_mode'] == 0:            \n            for cout in cout_latency_list:\n                cout_latency = max(cout_latency, cout['value'])            \n        elif self.configs['cout_write_mode'] == 1:\n            # Peel off the last one accessing the DRAM\n            cout_latency_list = cout_latency_list[:-1]            \n            for cout in cout_latency_list:\n                cout_latency = max(cout_latency, cout['value'])\n        latency_main = max(latency_main, cin_latency, w_latency, cout_latency)\n        \n        \"\"\"\n        Latency epilogue\n        \"\"\"\n        cout_latency_list = []\n        for item, value in latency_meta['latency_epilogue'].items():\n            if item.startswith('cout'):\n                cout_latency_list.append({\"item\": item, \"value\": value})        \n        cout_latency_list.sort(key=take_item)\n\n        cout_latency = 0\n    
    if self.configs['cout_write_mode'] == 0:            \n            for cout in cout_latency_list:\n                cout_latency = max(cout_latency, cout['value'])           \n        elif self.configs['cout_write_mode'] == 1:\n            # Peel off the last one accessing the DRAM\n            cout_latency_list = cout_latency_list[:-1]        \n            for cout in cout_latency_list:\n                cout_latency = max(cout_latency, cout['value'])\n        latency_epilogue = cout_latency\n\n        #print(latency_prologue, latency_main, latency_epilogue)\n        if self.fuse == 1 and self.last_fuse == 1:            \n            n_iter = np.ceil(self.workload['params']['r'] / params['r_t1']) * \\\n                     np.ceil(self.workload['params']['c'] / params['c_t1'])\n            latency = n_iter * (latency_prologue + latency_main / n_iter + latency_epilogue) * n_iter\n        else:\n            latency = latency_prologue + latency_main + latency_epilogue\n        \n        latency_meta = {\n            \"latency_prologue\": latency_prologue,\n            \"latency_main\": latency_main,\n            \"latency_epilogue\": latency_epilogue\n        }\n\n        return latency, latency_meta\n\n    def adjust_latency_multi_acc(self, latency, latency_meta, params):\n        \"\"\" Adjust latency for multi-acc setting\n        \"\"\"\n        # Update the setup latency\n        if ('prev_workload' not in self.configs) or ('prev_sol' not in self.configs) or \\\n           ('prev_latency' not in self.configs):\n            return latency, latency_meta\n        \n        prev_workload = self.configs['prev_workload']\n        prev_sol = self.configs['prev_sol']\n        prev_latency = self.configs['prev_latency']\n        o1 = prev_workload[\"params\"]['o']\n        tr1 = min(prev_sol['r_t1'], prev_workload[\"params\"]['r'])\n        tc1 = min(prev_sol['c_t1'], prev_workload[\"params\"]['c'])\n        tr1_post = tr1\n        tc1_post = tc1\n        for tag in 
prev_workload[\"tags\"]:\n            if tag.startswith(\"maxpool\"):\n                stride = int(tag.split('_')[-1])\n                tr1_post /= stride\n                tc1_post /= stride\n        tr1_post = max(int(tr1_post), 1)\n        tc1_post = max(int(tc1_post), 1)\n\n        tr2 = min(params[\"r_t1\"], self.workload[\"params\"][\"r\"])\n        tc2 = min(params[\"c_t1\"], self.workload[\"params\"][\"c\"])\n        k = self.workload[\"params\"][\"p\"]\n        data_pack = params[\"i_t2\"]\n\n        c0 = np.ceil((tr2 + k - 1) / tr1_post)\n        c1 = np.ceil((tc2 + k - 1) / tc1_post)\n        trp = min(c0 * tr1, prev_workload[\"params\"][\"r\"])\n        tcp = min(c1 * tc1, prev_workload[\"params\"][\"c\"])\n        #if (prev_sol[\"r_t1\"] == params[\"r_t1\"]) and \\\n        #   (prev_sol[\"c_t1\"] == params[\"c_t1\"]):\n        #    tri = np.ceil(params[\"i_t1\"] / prev_sol[\"o_t1\"]) * prev_sol[\"o_t1\"]\n        #    setup = prev_latency / (np.ceil(prev_workload[\"params\"]['o'] / tri))\n        #else:\n        setup = prev_latency / (np.ceil(prev_workload[\"params\"][\"r\"] / trp) * np.ceil(prev_workload[\"params\"][\"c\"] / tcp))\n        \n        latency_meta = {\n            \"latency_orig\": latency\n        }\n\n        return latency + setup, latency_meta\n\n    def adjust_latency(self, latency, latency_meta, params):\n        \"\"\" Adjust latency for customized search tasks.
\n        \"\"\"\n        adjust_buffer = False\n        adjust_multi_acc = False\n        for key in ['cin_read_mode', 'cout_write_mode', 'w_read_mode']:\n            for config_key in self.configs:\n                if key == config_key:\n                    adjust_buffer = True\n                    break\n        if adjust_buffer:\n            latency, latency_meta = self.adjust_latency_buffer(latency, latency_meta, params)\n            \n        for key in ['prev_workload', 'prev_sol', 'prev_latency']:\n            for config_key in self.configs:\n                if key == config_key:\n                    adjust_multi_acc = True\n                    break\n        if adjust_multi_acc:\n            latency, latency_meta = self.adjust_latency_multi_acc(latency, latency_meta, params)\n                    \n        return latency, latency_meta\n    \n    def adjust_resource(self, resource, resource_meta, params):\n        \"\"\" Update the cin buffer for fused design.\n        \"\"\"\n        if 'update_cin_buf' in self.aux_funcs:\n            def est_BRAM18K(ele_size, ele_num, pack):\n                #return np.ceil(ele_size * 8 * pack / 18) * np.ceil(ele_num / pack / 1024)\n                return np.ceil(ele_size * 8 * pack / 36) * np.ceil(ele_num / pack / 512)\n\n            if self.use_uram == 0:\n                # Update cin_buf\n                for item in resource_meta:\n                    if item.startswith(\"cin\"):\n                        cin_buf_size = est_BRAM18K(resource_meta[item]['ele_size'], resource_meta[item]['buf_size'], resource_meta[item]['data_pack_factor'])\n                        cin_buf_num = resource_meta[item]['num']\n                        break\n                resource[\"BRAM18K\"] -= (cin_buf_size * cin_buf_num)\n                cin_buf_size = max(self.call_aux_func('update_cin_buf')(self, params, resource_meta[item]['ele_size'] * 8 * resource_meta[item]['data_pack_factor'], resource_meta[item]['buf_size'] / 
resource_meta[item]['data_pack_factor']), cin_buf_size)\n                resource[\"BRAM18K\"] += (cin_buf_size * cin_buf_num)\n            else:\n                # Compute cin_buf\n                uram = resource[\"URAM\"]\n                for item in resource_meta:\n                    if item.startswith(\"cin\"):\n                        data_pack = resource_meta[item]['data_pack_factor']                        \n                        break\n                uram = max(self.call_aux_func('update_cin_buf')(self, params, data_pack) * 2, uram)\n                resource[\"URAM\"] = uram\n\n        return resource\n\n    def compute_arch_cst(self, params):\n        arch_cst = self.design.compute_arch_cst(params)\n        params = self.design.infer_params(params)\n        if params:\n            if not self.design.bound_check(params):\n                arch_cst = None\n            else:\n                resource, resource_meta = self.design.est_resource(params)\n                if len(self.configs) > 0:\n                    resource = self.adjust_resource(resource, resource_meta, params)\n                arch_cst['resource'] = resource\n        else:\n            arch_cst = None\n\n        return arch_cst\n\n    def evaluate(self, params, metric=\"latency\"):\n        if metric not in [\"latency\", \"off_chip_comm\", \"energy\", \"dsp_num\"]:\n            raise RuntimeError(f\"Not supported metric: {metric}\")\n\n        params = self.design.infer_params(params)\n        if params:\n            if not self.design.bound_check(params):                \n                return 0, None, None                \n            latency, latency_meta = self.design.est_latency(params)\n            if len(self.configs) > 0:\n                latency, latency_meta = self.adjust_latency(latency, latency_meta, params)            \n            if self.fixed == 1:\n                # Check the architecture constraints\n                arch_cst_cur = self.compute_arch_cst(params)\n           
     if not self.check_arch_legality(arch_cst_cur):\n                    return 0, None, None                        \n                resource = self.arch_cst['resource']\n            else:\n                resource, resource_meta = self.design.est_resource(params)\n                if len(self.configs) > 0:\n                    resource = self.adjust_resource(resource, resource_meta, params)                \n\n            # Compute the other activity\n            activity = self.design.est_activity(params)\n\n            if metric == \"latency\":                \n                if latency:\n                    return 1 / latency, resource, {'latency': latency_meta, 'activity': activity}\n                else:\n                    return 0, None, None\n            elif metric == \"off_chip_comm\":\n                if activity:\n                    latency_meta['latency'] = latency\n                    return 1 / activity[\"off_chip_acc_num\"], resource, {'latency': latency_meta, 'activity': activity}\n                else:\n                    return 0, None, None\n            elif metric == \"energy\":\n                if activity:\n                    latency_meta['latency'] = latency\n                    energy = self.compute_energy(activity)\n                    return 1 / energy, resource, {'latency': latency_meta, 'activity': activity}\n                else:\n                    return 0, None, None\n            elif metric == \"dsp_num\":\n                if activity:\n                    latency_meta['latency'] = latency\n                    return resource[\"DSP\"], resource, {'latency': latency_meta, 'activity': activity}\n                else:\n                    return 0, None, None\n        else:\n            return 0, None, None        \n\n    def compute_energy(self, activity):\n        \"\"\" Estimate the energy consumption of the design.\n        \"\"\"           \n        '''\n        def est_static_power(x, fre=300):\n            \"\"\"\n       
     returns in Watts\n            \"\"\"\n            x = x * 100\n            return (6.72 - 0.307 * x + 7.24 * 1e-3 * x * x) * (fre / 300)\n\n        # Default values (W at 300MHz)\n        res_unit_power = {\n            \"BRAM18K\": 0.0005033482143,\n\t\t    \"DSP\": 0.0008828125\n        }\n        # Compute the unit transaction energy\n        res_unit_energy = {\n\t\t    \"BRAM18K\": res_unit_power[\"BRAM18K\"] / (300 * 1e6) / 2 * 1e12,\n\t\t    \"DSP\": res_unit_power[\"DSP\"] / (300 * 1e6) * 1e12 * 5 # FP32\n\t    }\n\n        # DRAM default value\n        dram_unit_energy = 427.9 # (pJ) 16-bit 2GB DDR3 at 100MHz (from Wang HPCA)\n        # Scale the value \n        dram_unit_energy *= self.dw / 2\n        hop_unit_energy = 0\n\n        on_chip_energy = res_unit_energy[\"DSP\"] * activity[\"compute_stmt_call_num\"]\n        on_chip_energy += res_unit_energy[\"BRAM18K\"] * activity[\"io_module_mem_acc_num\"] + \\\n                          res_unit_energy[\"BRAM18K\"] * (activity[\"pe_module_mem_acc_num\"] + activity[\"pe_module_reg_acc_num\"])\n        on_chip_energy += hop_unit_energy * activity[\"noc_hop_num\"]\n        off_chip_energy = dram_unit_energy * activity[\"off_chip_acc_num\"]                \n\n        return (on_chip_energy + off_chip_energy) / 1e9        \n        '''\n                \n        # Eyeriss model (normalized)\n        res_unit_energy = {        \n            \"RF\": 1,\n            \"ALU\": 1,\n            \"GlobalBuf\": 6\n        }\n        dram_unit_energy = 200\n        hop_unit_energy = 2\n\n        '''\n        # Interstellar model (pJ)\n        res_unit_energy = {        \n            \"RF\": 0.03, \n            \"ALU\": 0.075,\n            \"GlobalBuf\": 6\n        }\n        dram_unit_energy = 200\n        hop_unit_energy = 0.035              \n        '''\n\n        on_chip_energy = res_unit_energy[\"ALU\"] * activity[\"compute_stmt_call_num\"]\n        on_chip_energy += res_unit_energy[\"GlobalBuf\"] * 
activity[\"io_module_mem_acc_num\"] + \\\n                          res_unit_energy[\"GlobalBuf\"] * activity[\"pe_module_mem_acc_num\"] + \\\n                          res_unit_energy[\"RF\"] * activity[\"pe_module_reg_acc_num\"]        \n        on_chip_energy += hop_unit_energy * activity[\"noc_hop_num\"]\n        off_chip_energy = dram_unit_energy * activity[\"off_chip_acc_num\"]\n\n        return (on_chip_energy + off_chip_energy) / 1e9        \n\n    def compute_dsp_eff(self, latency, dsp):\n        \"\"\" Compute the DSP efficiency of the current design.\n        Note: Only works for FP32 on Xilinx FPGA\n        \"\"\"\n        return (self.compute_ops() / (dsp / 5 * 2)) / latency\n\n    def compute_ops(self):\n        \"\"\" Compute the total amount of operations of the workload.\n        \"\"\"        \n        if \"gemm\" in self.workload[\"tags\"]:\n            return self.workload[\"params\"][\"i\"] * self.workload[\"params\"][\"j\"] * self.workload[\"params\"][\"k\"] * 2\n        elif \"conv\" in self.workload[\"tags\"]:\n            return self.workload[\"params\"][\"i\"] * self.workload[\"params\"][\"o\"] * self.workload[\"params\"][\"r\"] * self.workload[\"params\"][\"c\"] * self.workload[\"params\"][\"p\"] * self.workload[\"params\"][\"q\"] * 2\n        else:\n            raise RuntimeError(f\"Not supported workload: {self.workload['name']}\")\n\n    def compute_bw(self, params):\n        \"\"\" Compute the bandwidth requirement of the task.\n        Note: Only works for 32-bit data\n        \"\"\"\n        latency, _ = self.design.est_latency(params)\n        off_chip_trans = self.est_off_chip_trans(params)\n        bw = off_chip_trans * self.dw / (latency / (self.fre * 1e6)) / 1e9 # GB/s\n        \n        return bw\n\n    def est_off_chip_trans(self, params):        \n        activity = self.design.est_activity(params)\n        off_chip_acc_num_meta = activity['off_chip_acc_num_meta']\n        if \"conv\" in self.workload[\"tags\"]:\n            
cin_trans = 0\n            w_trans = 0\n            cout_trans = 0\n            for module in off_chip_acc_num_meta:\n                if module.startswith(\"cin\"):\n                    cin_trans = off_chip_acc_num_meta[module]\n                if module.startswith(\"w\"):\n                    w_trans = off_chip_acc_num_meta[module]\n                if module.startswith(\"cout\"):\n                    cout_trans = off_chip_acc_num_meta[module]\n            if \"cin_read_mode\" in self.configs:\n                if self.configs[\"cin_read_mode\"] == 2 or self.configs[\"cin_read_mode\"] == 3:\n                    cin_trans = 0\n            if \"cout_write_mode\" in self.configs:\n                if self.configs[\"cout_write_mode\"] == 1:\n                    cout_trans = 0\n            if \"w_read_mode\" in self.configs:\n                if self.configs[\"w_read_mode\"] == 1:\n                    w_trans = 0\n            return cin_trans + w_trans + cout_trans\n        else:\n            return activity[\"off_chip_acc_num\"]        \n        \n        '''\n        if \"gemm\" in self.workload[\"tags\"]:            \n            i, j, k = self.workload[\"params\"]['i'], self.workload[\"params\"]['j'], self.workload[\"params\"]['k']\n            i_t1, j_t1, k_t1 = params['i_t1'], params['j_t1'], params['k_t1']\n            trans = np.ceil(i / i_t1) * np.ceil(j / j_t1) * np.ceil(k / k_t1) * (i_t1 * k_t1 + j_t1 * k_t1) + \\\n                    np.ceil(i / i_t1) * np.ceil(j / j_t1) * (i_t1 * j_t1)\n        elif \"conv\" in self.workload[\"tags\"]:\n            i, o, r, c, p, q = self.workload[\"params\"][\"i\"], self.workload[\"params\"][\"o\"], \\\n                               self.workload[\"params\"][\"r\"], self.workload[\"params\"][\"c\"], \\\n                               self.workload[\"params\"][\"p\"], self.workload[\"params\"][\"q\"]\n            i_t1, o_t1, r_t1, c_t1 = params[\"i_t1\"], params[\"o_t1\"], \\\n                                     
params[\"r_t1\"], params[\"c_t1\"]\n            cin_trans = i_t1 * (r_t1 + p - 1) * (c_t1 + q - 1)\n            w_trans = i_t1 * o_t1 * p * q\n            cout_trans = o_t1 * r_t1 * c_t1\n            if \"cin_read_mode\" in self.configs:\n                if self.configs[\"cin_read_mode\"] == 2 or self.configs[\"cin_read_mode\"] == 3:\n                    cin_trans = 0\n            if \"cout_write_mode\" in self.configs:\n                if self.configs[\"cout_write_mode\"] == 1:\n                    cout_trans = 0\n            if \"w_read_mode\" in self.configs:\n                if self.configs[\"w_read_mode\"] == 1:\n                    w_trans = 0\n            trans = np.ceil(i / i_t1) * np.ceil(o / o_t1) * np.ceil(r / r_t1) * np.ceil(c / c_t1) * \\\n                    (cin_trans + w_trans) + \\\n                    np.ceil(o / o_t1) * np.ceil(r / r_t1) * np.ceil(c / c_t1) * cout_trans\n        else:\n            raise RuntimeError(f\"Not supported task: {self.task['name']}\")\n\n        return trans\n        '''        \n\n    def compute_ctc(self, params):\n        \"\"\" Compute the compute-to-communication ratio of the task.\n        \"\"\"\n        ops = self.compute_ops()\n        off_chip_trans = self.est_off_chip_trans(params)\n        comm = off_chip_trans * self.dw\n        ctc = ops / comm\n\n        return ctc\n\n    def set_arch_cst(self, arch_cst):\n        self.fixed = 1\n        self.arch_cst = arch_cst.copy()\n    \n    def clear_arch_cst(self):\n        self.fixed = 0\n        self.arch_cst = None\n\n    def set_arch_sol(self, sol):\n        self.arch_sol = sol\n\n    def set_aux_func(self, tag, func_name):\n        \"\"\" Set the auxiliary functions.\n        tag refers to the function tag.\n        func_name points to pre-defined functions.\n        \"\"\"\n        self.aux_funcs[tag] = func_name\n\n    def call_aux_func(self, tag):\n        # Preset functions\n        # Update the cin load latency\n        def update_cin_latency_last(lat, 
task, sol):\n            lat *= np.ceil(task.workload[\"params\"]['i'] / sol['i_t1'])\n            return lat\n        # Update the cin on-chip buffer\n        def update_cin_buf_bram_last(task, sol, width, depth):\n            depth *= np.ceil(task.workload[\"params\"]['i'] / sol['i_t1'])\n            #mem = np.ceil(width / 18) * np.ceil(depth / 1024)\n            mem = np.ceil(width / 36) * np.ceil(depth / 512)\n            return mem\n        # Update the cin on-chip buffer\n        def update_cin_buf_uram_last(task, sol, data_pack):\n            depth = task.workload[\"params\"]['i'] * sol['r_t1'] * sol['c_t1']\n            mem = np.ceil(task.dw * 8 * data_pack / 72) * np.ceil(depth / data_pack / 4096)\n            return mem\n        # Update the cin load latency\n        def update_cin_latency(lat, task, sol):\n            lat *= (np.ceil(task.workload[\"params\"]['i'] / sol['i_t1']) * \\\n                    np.ceil(task.workload[\"params\"]['r'] / sol['r_t1']) * \\\n                    np.ceil(task.workload[\"params\"]['c'] / sol['c_t1']))\n            return lat\n        # Update the cin on-chip buffer\n        def update_cin_buf_bram(task, sol, width, depth):\n            depth *= (np.ceil(task.workload[\"params\"]['i'] / sol['i_t1']) * \\\n                      np.ceil(task.workload[\"params\"]['r'] / sol['r_t1']) * \\\n                      np.ceil(task.workload[\"params\"]['c'] / sol['c_t1']))            \n            #mem = np.ceil(width / 18) * np.ceil(depth / 1024)\n            mem = np.ceil(width / 36) * np.ceil(depth / 512)\n            return mem\n        # Update the cin on-chip buffer    \n        def update_cin_buf_uram(task, sol, data_pack):\n            depth = task.workload[\"params\"]['i'] * task.workload[\"params\"]['r'] * task.workload[\"params\"]['c']\n            mem = np.ceil(task.dw * 8 * data_pack / 72) * np.ceil(depth / data_pack / 4096)\n            return mem\n\n        if self.aux_funcs[tag] == 'update_cin_latency_last':\n       
     return update_cin_latency_last\n        elif self.aux_funcs[tag] == 'update_cin_buf_bram_last':\n            return update_cin_buf_bram_last\n        elif self.aux_funcs[tag] == 'update_cin_buf_uram_last':\n            return update_cin_buf_uram_last\n        elif self.aux_funcs[tag] == 'update_cin_latency':\n            return update_cin_latency\n        elif self.aux_funcs[tag] == 'update_cin_buf_bram':\n            return update_cin_buf_bram\n        elif self.aux_funcs[tag] == 'update_cin_buf_uram':\n            return update_cin_buf_uram\n        else:\n            raise RuntimeError(f'Not supported function: {tag}')\n\n    def clear_aux_func(self):\n        self.aux_funcs = {}\n\nclass MultiTask(object):\n    \"\"\" Search task object used by the tuner.\n    # TODO: To be modified\n    \"\"\"\n    def __init__(self, design, search_tasks, hw_cst, fuse=0, max_latency=-1, split=0, use_uram=0):\n        self.design = design\n        self.tasks = search_tasks\n\n        self.hw_cst = hw_cst\n        self.fre = 200 # 200 MHz\n        self.dw = 4 # bytes\n        self.dt = \"float\"\n        self.fuse = fuse\n        self.max_latency = max_latency\n        self.split = split\n        self.use_uram = use_uram\n        for task in self.tasks:\n            task.use_uram = use_uram\n        # Fixed architecture solution\n        self.fixed = 0\n        self.arch_sol = None\n        self.arch_cst = None\n        # Other configs\n        self.configs = {}\n        if isinstance(self.design, Design):\n            # Initialize the external params, using the largest dimensions        \n            self.workload = {\"params\": {}}\n            for p, param in self.design.params_config[\"external\"].items():\n                self.workload[\"params\"][param[\"name\"]] = 1\n            for task in self.tasks:\n                for p, param in self.design.params_config[\"external\"].items():\n                    self.workload[\"params\"][param[\"name\"]] = 
max(self.workload[\"params\"][param[\"name\"]], task.workload[\"params\"][param[\"name\"]])\n\n    def __repr__(self):\n        ret = \"\"\n        for task in self.tasks:\n            ret += str(task)        \n        if isinstance(self.design, Design):\n            ret += f'_d_{self.design.name}'\n        else:\n            for design in self.design:\n                ret += f'_d_{design.name}'\n        ret += f'_cst_{self.hw_cst}'\n        ret += f'_f_{self.fuse}'\n        ret += f'_s_{self.split}'\n        ret += f'_u_{self.use_uram}'\n        if len(self.configs) > 0:\n            ret += f'_config_'\n            for k, v in self.configs.items():\n                ret += f'{k}{v}'\n\n        return ret    \n\n    def generate_random_sample(self):\n        \"\"\" Generate a random sample in the design space.\n        \"\"\"\n        workload_params = {}\n        for param in self.workload[\"params\"]:\n            workload_params[param] = self.workload[\"params\"][param]\n        return self.design.random_sampling(workload_params)    \n\n    def compute_dsp_eff(self, latency, dsp):\n        \"\"\" Compute the DSP efficiency of the current design.\n        Note: Only works for FP32 on Xilinx FPGA\n        \"\"\"\n        return (self.compute_ops() / (dsp / 5 * 2)) / latency\n\n    def compute_ops(self):\n        \"\"\" Compute the total amount of operations of the task.\n        \"\"\"\n        total_ops = 0\n        for task in self.tasks:\n            if \"gemm\" in task.workload[\"tags\"]:\n                total_ops += task.workload[\"params\"][\"i\"] * task.workload[\"params\"][\"j\"] * task.workload[\"params\"][\"k\"] * 2\n            elif \"conv\" in task.workload[\"tags\"]:\n                total_ops += task.workload[\"params\"][\"i\"] * task.workload[\"params\"][\"o\"] * task.workload[\"params\"][\"r\"] * task.workload[\"params\"][\"c\"] * task.workload[\"params\"][\"p\"] * task.workload[\"params\"][\"q\"] * 2            \n            else:\n                raise 
RuntimeError(f\"Not supported workload: {task.workload['tags']}\")\n        return total_ops\n\n    def compute_arch_cst(self, params):\n        \"\"\" Compute the architecture constraints.\n        \"\"\"\n        arch_cst = None\n        for task in self.tasks:\n            cur_arch_cst = task.compute_arch_cst(params)\n            # Take the one with looser constraints\n            if not arch_cst:\n                arch_cst = cur_arch_cst\n            else:\n                # dims\n                for idx in range(len(arch_cst['dims'])):\n                    arch_cst['dims'][idx] = max(arch_cst['dims'][idx], cur_arch_cst['dims'][idx])\n                # SIMD\n                arch_cst[\"SIMD\"] = max(arch_cst[\"SIMD\"], cur_arch_cst[\"SIMD\"])\n                # data pack\n                for arr in arch_cst['data_pack']:\n                    for idx in range(len(arch_cst['data_pack'][arr])):\n                        arch_cst['data_pack'][arr][idx] = max(arch_cst['data_pack'][arr][idx], cur_arch_cst['data_pack'][arr][idx])\n                # resource\n                for module in arch_cst['resource']:\n                    if module.endswith(\"unit_memory\"):\n                        arch_cst[\"resource\"][module] = max(arch_cst[\"resource\"][module], cur_arch_cst[\"resource\"][module])\n        \n        return arch_cst\n\n    def set_arch_cst(self, arch_cst):\n        \"\"\" Set the architecture constraints.\n        \"\"\"\n        self.fixed = 1\n        self.arch_cst = arch_cst.copy()\n        # Set the subtasks\n        for task in self.tasks:\n            task.set_arch_cst(arch_cst.copy())\n\n    def clear_arch_cst(self):\n        self.fixed = 0\n        self.arch_cst = None\n        for task in self.tasks:\n            task.clear_arch_cst()\n\n    def set_arch_sol(self, sol):\n        self.fixed = 1\n        self.arch_sol = sol\n        for task in self.tasks:\n            task.set_arch_sol(sol)"
  },
  {
    "path": "autosa_scripts/odyssey/solver.py",
    "content": "from subprocess import Popen, PIPE\nimport tempfile\nimport shutil\n\ndef off_chip_solver_gemm(search_task, cst, fixed_params=None, save=0):\n    \"\"\" If any parameter found in fixed_params, this parameter will not be tiled.\n    \"\"\"\n    with tempfile.TemporaryDirectory() as tmpdirname:\n        # Generate the model file\n        with open(f'{tmpdirname}/tmp.mod', 'w') as f:\n            for p in [\"i\", \"j\", \"k\"]:\n                f.write(f'param {p};\\n')            \n            f.write('param dsp_bound;\\n')\n            f.write('param bram_bound;\\n')\n            f.write('param data_w;\\n')\n            \n            for p in [\"i\", \"j\", \"k\"]:\n                f.write(f'var {p}1 integer >= 1, <= {p};\\n')\n            for p in [\"i\", \"j\"]:\n                f.write(f'var {p}2 integer >= 1, <= {p};\\n')\n            for p in [\"k\"]:\n                f.write(f'var {p}2 integer >= 1, <= {32/search_task.dw};\\n')\n            \n            for p in [\"i\", \"j\", \"k\"]:            \n                f.write(f'var c{p}1 integer >= 1, <= {p};\\n')\n                f.write(f'var c{p}2 integer >= 1, <= {p};\\n')\n            for p in [\"k\"]:\n                f.write(f'var c{p}3 integer >= 1, <= {p};\\n')            \n            \n            f.write('minimize target:\\n')\n            # off_chip/DSP\n            #f.write('\\t(i*cj1*k+ci1*j*k+i*j)/\\n')\n            #if search_task.design.name.startswith(\"kernel0\"):\n            #    f.write('\\t(ci2*k2);\\n\\n')\n            #elif search_task.design.name.startswith(\"kernel1\"):\n            #    f.write('\\t(cj2*k2);\\n\\n')\n            #elif search_task.design.name.startswith(\"kernel2\"):\n            #    f.write('\\t(k1);\\n\\n')\n            #elif search_task.design.name.startswith(\"kernel3\"):\n            #    f.write('\\t(ci2*cj2*k2);\\n\\n')\n            #elif search_task.design.name.startswith(\"kernel4\"):\n            #    f.write('\\t(ci2*k1);\\n\\n')\n            
#elif search_task.design.name.startswith(\"kernel5\"):\n            #    f.write('\\t(cj2*k1);\\n\\n')\n            \n            # off_chip\n            #f.write('\\t(i*cj1*k+ci1*j*k+i*j);\\n\\n')\n\n            # compute\n            #f.write('\\t-(ci2*cj2*k2);\\n\\n')\n\n            # off_chip - compute            \n            f.write('\\t(i*cj1*k+ci1*j*k+i*j)-\\n')\n            if search_task.design.name.startswith(\"kernel0\"):\n                f.write('\\t(ci2*k2);\\n\\n')\n            elif search_task.design.name.startswith(\"kernel1\"):\n                f.write('\\t(cj2*k2);\\n\\n')\n            elif search_task.design.name.startswith(\"kernel2\"):\n                f.write('\\t(k1);\\n\\n')\n            elif search_task.design.name.startswith(\"kernel3\"):\n                f.write('\\t(ci2*cj2*k2);\\n\\n')\n            elif search_task.design.name.startswith(\"kernel4\"):\n                f.write('\\t(ci2*k1);\\n\\n')\n            elif search_task.design.name.startswith(\"kernel5\"):\n                f.write('\\t(cj2*k1);\\n\\n')\n\n            if search_task.design.name.startswith(\"kernel0\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= ci2*1*k2*5 <= dsp_bound;\\n\\n') # Only works for FP32\n            elif search_task.design.name.startswith(\"kernel1\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= cj2*1*k2*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel2\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= k1*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel3\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= ci2*cj2*k2*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel4\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= ci2*k1*5 <= dsp_bound;\\n\\n')\n   
         elif search_task.design.name.startswith(\"kernel5\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= cj2*k1*5 <= dsp_bound;\\n\\n')\n            \n            f.write('subject to BRAM_cst:\\n')\n            #f.write('\\t0 <= (data_w*i1*k1)/(18*1024)*2+\\n')\n            f.write('\\tceil(data_w/18)*ceil(i1*k1/1024)*2+\\n')\n            #f.write('\\t     (data_w*j1*k1)/(18*1024)*2+\\n')\n            f.write('\\tceil(data_w/18)*ceil(j1*k1/1024)*2+\\n')\n            #f.write('\\t     (data_w*i1*j1)/(18*1024)*2 <= bram_bound;\\n\\n')\n            f.write('\\tceil(data_w/18)*ceil(i1*j1/1024)*2 <= bram_bound;\\n\\n')\n\n            for p in [\"i\", \"j\", \"k\"]:\n                f.write(f'subject to c{p}1_cst:\\n')\n                f.write(f'\\t{p} = c{p}1*{p}1;\\n\\n')\n            for p in [\"i\", \"j\", \"k\"]:\n                f.write(f'subject to c{p}2_cst:\\n')\n                f.write(f'\\t{p}1 = c{p}2*{p}2;\\n\\n')\n            for p in [\"k\"]:\n                f.write(f'subject to c{p}3_cst:\\n')\n                f.write(f'\\t{p}2 = c{p}3*2;\\n\\n') # even number\n\n            if search_task.design.name.startswith(\"kernel0\") or \\\n               search_task.design.name.startswith(\"kernel1\") or \\\n               search_task.design.name.startswith(\"kernel3\"):             \n                f.write('subject to latency_hiding_cst:\\n')\n                f.write('\\ti2*j2 >= 8*k2;\\n\\n') # Only for FP32\n            \n        with open(f'{tmpdirname}/tmp.dat', 'w') as f:\n            for p in [\"i\", \"j\", \"k\"]:\n                f.write(f'param {p} := {search_task.workload[\"params\"][p]};\\n')            \n            f.write(f'param dsp_bound := {int(cst.hw_cst[\"DSP\"])};\\n')\n            f.write(f'param bram_bound := {int(cst.hw_cst[\"BRAM18K\"])};\\n')\n            f.write(f'param data_w := 32;\\n') # Only for FP32           \n\n        # Generate the AMPL script\n        with open(f'{tmpdirname}/tmp.run', 'w') as f:\n            f.write('option solver ipopt;\\n')\n            f.write('reset;\\n')\n            # Load the model and data generated above (paths are relative to tmpdirname, where AMPL is launched)\n            f.write('model tmp.mod;\\n')\n            f.write('data tmp.dat;\\n')\n            f.write('solve;\\n')\n            f.write('display target,i1,j1,k1,i2,j2,k2;\\n')\n        \n        # Call the solver from the temporary directory\n        cmd = [\"ampl\", \"tmp.run\"]\n        pipe = Popen(cmd, stdout=PIPE, stderr=PIPE, cwd=tmpdirname)\n        text = pipe.communicate()[0].decode('ascii')\n\n        # Collect the results\n        text = text.split('\\n')\n        #print(text)\n        opt_dims = [1, 1, 1, 1, 1, 1]\n        update = 0\n        for line in text:\n            if line.startswith(\"i1 = \"):\n                opt_dims[0] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"j1 = \"):\n                opt_dims[1] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"k1 = \"):\n                opt_dims[2] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"i2 = \"):\n                opt_dims[3] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"j2 = \"):\n                opt_dims[4] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"k2 = \"):\n                opt_dims[5] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n        \n        #print(update, opt_dims)\n        if update != len(opt_dims):\n            # The solver did not finish correctly.\n            opt_dims = None\n\n        if save == 1:\n            shutil.copyfile(f'{tmpdirname}/tmp.mod', 'solver/tmp.mod')\n            shutil.copyfile(f'{tmpdirname}/tmp.dat', 'solver/tmp.dat')\n            shutil.copyfile(f'{tmpdirname}/tmp.run', 'solver/tmp.run')\n 
   \n    return opt_dims\n\ndef off_chip_solver_conv(search_task, cst, fixed_params=None, save=0):\n    \"\"\" If any parameter found in fixed_params, this parameter will not be tiled.\n    \"\"\"\n    with tempfile.TemporaryDirectory() as tmpdirname:\n        # Generate the model file\n        with open(f'{tmpdirname}/tmp.mod', 'w') as f:\n            for p in [\"i\", \"o\", \"r\", \"c\", \"p\", \"q\"]:\n                f.write(f'param {p};\\n')            \n            f.write('param dsp_bound;\\n')\n            f.write('param bram_bound;\\n')\n            f.write('param data_w;\\n')\n            \n            for p in [\"i\", \"o\", \"r\", \"c\"]:\n                f.write(f'var {p}1 integer >= 1, <= {p};\\n')            \n            for p in [\"o\", \"r\", \"c\"]:\n                f.write(f'var {p}2 integer >= 1, <= {p};\\n')\n            for p in [\"i\"]:\n                f.write(f'var {p}2 integer >= 1, <= {32/search_task.dw};\\n')\n            \n            for p in [\"i\", \"o\", \"r\", \"c\"]:\n                f.write(f'var c{p}1 integer >= 1, <= {p};\\n')\n                f.write(f'var c{p}2 integer >= 1, <= {p};\\n')\n            for p in [\"i\"]:\n                f.write(f'var c{p}3 integer >= 1, <= {p};\\n')\n            \n            f.write('minimize target:\\n')\n            # off_chip/DSP\n            # Ignore the padded data\n            f.write('\\t(i*r*c*co1+i*o*p*q*cr1*cc1+o*r*c*ci1)/\\n')                        \n            if search_task.design.name.startswith(\"kernel0\"):\n                f.write('\\t(co2*i2);\\n\\n')\n            elif search_task.design.name.startswith(\"kernel1\"):\n                f.write('\\t(cr2*i2);\\n\\n')\n            elif search_task.design.name.startswith(\"kernel2\"):\n                f.write('\\t(cc2*i2);\\n\\n')\n            elif search_task.design.name.startswith(\"kernel3\"):\n                f.write('\\t(ci2*i2);\\n\\n')\n            elif search_task.design.name.startswith(\"kernel4\"):\n                
f.write('\\t(co2*cr2*i2);\\n\\n')\n            elif search_task.design.name.startswith(\"kernel5\"):\n                f.write('\\t(co2*cc2*i2);\\n\\n')                \n            elif search_task.design.name.startswith(\"kernel6\"):\n                f.write('\\t(co2*ci2*i2);\\n\\n')\n            elif search_task.design.name.startswith(\"kernel7\"):\n                f.write('\\t(cr2*cc2*i2);\\n\\n')                \n            elif search_task.design.name.startswith(\"kernel8\"):\n                f.write('\\t(cr2*ci2*i2);\\n\\n')       \n            elif search_task.design.name.startswith(\"kernel9\"):\n                f.write('\\t(cc2*ci2*i2);\\n\\n')\n            else:\n                raise RuntimeError(f\"Not supported design by the solver: {search_task.design.name}\")            \n\n            if search_task.design.name.startswith(\"kernel0\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= co2*i2*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel1\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= cr2*i2*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel2\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= cc2*i2*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel3\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= ci2*i2*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel4\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= co2*cr2*i2*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel5\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= co2*cc2*i2*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel6\"):\n                
f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= co2*ci2*i2*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel7\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= cr2*cc2*i2*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel8\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= cr2*ci2*i2*5 <= dsp_bound;\\n\\n')\n            elif search_task.design.name.startswith(\"kernel9\"):\n                f.write('subject to DSP_cst:\\n')\n                f.write('\\t0 <= cc2*ci2*i2*5 <= dsp_bound;\\n\\n')\n            else:\n                raise RuntimeError(f\"Not supported design by the solver: {search_task.design.name}\")\n                        \n            f.write('subject to BRAM_cst:\\n')\n            f.write('\\t0 <= (data_w*i1*r1*c1)/(18*1024)*2+\\n')\n            f.write('\\t     (data_w*i1*o1*p*q)/(18*1024)*2+\\n')                \n            f.write('\\t     (data_w*o1*r1*c1)/(18*1024)*2 <= bram_bound;\\n\\n')            \n\n            for p in [\"i\", \"o\", \"r\", \"c\"]:\n                f.write(f'subject to c{p}1_cst:\\n')\n                f.write(f'\\t{p} = c{p}1*{p}1;\\n\\n')\n            for p in [\"i\", \"o\", \"r\", \"c\"]:\n                f.write(f'subject to c{p}2_cst:\\n')\n                f.write(f'\\t{p}1 = c{p}2*{p}2;\\n\\n')                \n            for p in [\"i\"]:\n                f.write(f'subject to c{p}3_cst:\\n')\n                f.write(f'\\t{p}2 = c{p}3*2;\\n\\n') # even number   \n\n            # TODO: Add other dataflows\n            if search_task.design.name.startswith(\"kernel0\") or \\\n               search_task.design.name.startswith(\"kernel1\") or \\\n               search_task.design.name.startswith(\"kernel2\") or \\\n               search_task.design.name.startswith(\"kernel4\") or \\\n               search_task.design.name.startswith(\"kernel5\") or \\\n               search_task.design.name.startswith(\"kernel7\"):             \n                f.write('subject to latency_hiding_cst:\\n')\n                f.write('\\to2*r2*c2 >= 8*i2;\\n\\n') # Only for FP32\n            \n        with open(f'{tmpdirname}/tmp.dat', 'w') as f:\n            # p and q are also used by the model (objective and BRAM constraint)\n            for p in [\"i\", \"o\", \"r\", \"c\", \"p\", \"q\"]:\n                f.write(f'param {p} := {search_task.workload[\"params\"][p]};\\n')            \n            f.write(f'param dsp_bound := {int(cst.hw_cst[\"DSP\"])};\\n')\n            f.write(f'param bram_bound := {int(cst.hw_cst[\"BRAM18K\"])};\\n')\n            f.write(f'param data_w := 32;\\n') # Only for FP32           \n\n        # Generate the AMPL script\n        with open(f'{tmpdirname}/tmp.run', 'w') as f:\n            f.write('option solver ipopt;\\n')\n            f.write('reset;\\n')\n            # Load the model and data generated above (paths are relative to tmpdirname, where AMPL is launched)\n            f.write('model tmp.mod;\\n')\n            f.write('data tmp.dat;\\n')\n            f.write('solve;\\n')\n            f.write('display target,i1,o1,r1,c1,i2,o2,r2,c2;\\n')\n        \n        # Call the solver from the temporary directory\n        cmd = [\"ampl\", \"tmp.run\"]\n        pipe = Popen(cmd, stdout=PIPE, stderr=PIPE, cwd=tmpdirname)\n        text = pipe.communicate()[0].decode('ascii')\n\n        # Collect the results\n        text = text.split('\\n')\n        #print(text)\n        opt_dims = [1, 1, 1, 1, 1, 1, 1, 1]\n        update = 0\n        for line in text:\n            if line.startswith(\"i1 = \"):\n                opt_dims[0] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"o1 = \"):\n                opt_dims[1] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"r1 = \"):\n                opt_dims[2] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"c1 = \"):\n                opt_dims[3] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"i2 = \"):\n                opt_dims[4] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"o2 = \"):\n                opt_dims[5] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"r2 = \"):\n                opt_dims[6] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n            if line.startswith(\"c2 = \"):\n                opt_dims[7] = int(float(line.split('=')[-1].strip()) + 0.5)\n                update += 1\n                \n        if update != len(opt_dims):\n            # The solver did not finish correctly.\n            opt_dims = None\n\n        if save == 1:\n            shutil.copyfile(f'{tmpdirname}/tmp.mod', 'solver/tmp.mod')\n            shutil.copyfile(f'{tmpdirname}/tmp.dat', 'solver/tmp.dat')\n            shutil.copyfile(f'{tmpdirname}/tmp.run', 'solver/tmp.run')\n    \n    return opt_dims    \n\ndef off_chip_solver(search_task, cst, fixed_params=None, save=0):\n    \"\"\" Run the solver to minimize the off-chip data communication.\n    \"\"\"\n    if \"gemm\" in search_task.workload[\"tags\"]:\n        return off_chip_solver_gemm(search_task, cst, fixed_params, save)\n    elif \"conv\" in search_task.workload[\"tags\"]:\n        return off_chip_solver_conv(search_task, cst, fixed_params, save)\n    else:\n        raise RuntimeError(f\"Not supported task: {search_task.workload['name']}\")"
  },
  {
    "path": "autosa_scripts/odyssey/tuners.py",
    "content": "import json\nimport numpy as np\nimport xgboost as xgb\nimport random\nimport sys\nimport shutil\nimport copy\nimport pprint\nfrom bayes_opt import BayesianOptimization\nimport itertools\nimport csv\nfrom scipy import optimize\nimport math\nimport time\nfrom datetime import datetime\nfrom collections import deque\n\nimport utils\nfrom solver import off_chip_solver\nfrom search_task import MultiTask, SingleTask\n\n#import opentuner\n#from opentuner import ConfigurationManipulator\n#from opentuner import IntegerParameter\n#from opentuner import MeasurementInterface\n#from opentuner import Result\n#from opentuner.search.manipulator import PowerOfTwoParameter\n#\n#from RL_utils import RLAgent, RLEnv\n\nclass Constraint(object):\n    def __init__(self, cst_path):\n        with open(cst_path) as f:\n            data = json.load(f)\n        # Update the constraints\n        self.hw_cst = {}\n        for res in data:\n            self.hw_cst[res] = data[res][\"total\"] * data[res][\"ratio\"]\n            self.hw_cst[f'{res}_total'] = data[res][\"total\"]\n\n    def __repr__(self):\n        ret = \"\"\n        ret += f\"b{int(self.hw_cst['BRAM18K'])}\"\n        ret += f\"d{int(self.hw_cst['DSP'])}\"\n        ret += f\"u{int(self.hw_cst['URAM'])}\"\n        return ret\n\nclass Tuner(object):\n    def __init__(self, search_task, cst, search_obj, max_epoch, max_time, n_worker=1, silent=0, max=1):\n        self.search_task = search_task\n        self.cst = cst\n        self.search_obj = search_obj\n        self.max_epoch = max_epoch\n        self.max_time = max_time\n        self.max = max\n        if self.max == 1:\n            self.best_reward = 0\n        else:\n            self.best_reward = float('inf')\n        self.best_reward_meta = None\n        self.best_rewards = []\n        self.best_rewards_time = []\n        self.best_sol = None\n        self.best_sol_cst = None\n        self.last_update_epoch = -1\n        self.best_search_record = 
utils.SearchRecord().reset()\n        self.converge_time = 0\n        self.silent = silent\n        self.sub_task_silent = silent\n        self.n_worker = n_worker\n        # If multi-processing, silent the sub tasks\n        if n_worker > 1:\n            self.sub_task_silent = 1\n\n    def log(self, str, force=0):\n        \"\"\" If force is set to 1, we will print the log info regardless of the silence argument.\n        \"\"\"\n        if not self.silent or force:\n            import logging\n            logger = logging.getLogger('AutoSA-Tuner')\n            logger.info(str)\n            sys.stdout.flush()\n\n    def overuse_constraint(self, used_cst):\n        if not used_cst:\n            # If constraint doesn't exist, return True to exclude this design\n            return True\n\n        if used_cst['BRAM18K'] > self.cst.hw_cst['BRAM18K']:\n            return True\n        if used_cst['DSP'] > self.cst.hw_cst['DSP']:\n            return True\n        if used_cst['URAM'] > self.cst.hw_cst['URAM']:\n            return True\n\n        return False\n\ndef exhaustive_search(search_task, cst, search_obj, max_epochs, max_time, n_worker=1, silent=0, time_out=-1, pruning=0, profiling=0):\n    if profiling:\n        repeat_num = 3\n    else:\n        repeat_num = 1\n\n    tuner_params = {\n        \"pruning\": pruning,\n        \"DSP_thres\": [0.95, 1.0]\n    }\n\n    best_record = utils.SearchRecord().reset()\n    for repeat in range(repeat_num):\n        tuner = ExhaustiveTuner(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n        tuner.search()\n\n        search_record = tuner.best_search_record\n        best_record.update(search_record)\n\n        if profiling:\n            config_str = \"_exhaustive\"\n            if pruning:\n                config_str += \"_pruning\"\n\n            config_str += f\"_{search_task.design.name}\"\n            config_str += f\"_r{repeat}\"\n            with 
open(f'tmp/tuning_rewards{config_str}.csv', \"w\", newline='') as f:\n                fieldnames = ['epoch', 'reward', 'time']\n                writer = csv.DictWriter(f, fieldnames=fieldnames)\n                writer.writeheader()\n                for epoch in range(len(tuner.best_rewards)):\n                    writer.writerow({'epoch': epoch, 'reward': tuner.best_rewards[epoch], 'time': tuner.best_rewards_time[epoch]})\n\n    return best_record\n\nclass ExhaustiveTuner(Tuner):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        super().__init__(search_task, cst, obj, max_epoch, max_time, n_worker=n_worker, silent=silent)\n        self.params = params\n        self.epoch = 0\n        if max_epoch > 0:\n            self.stop_criteria = \"epoch\"\n            self.max_epoch = max_epoch\n        else:\n            self.stop_criteria = \"time\"\n            self.max_time = max_time\n        self.counter = utils.PerfCounter()\n        self.param_idx_map = {} # Maps parameter name to its index in the sample\n        self.idx_param_map = {} # Maps the index to the parameter name\n        self.params_history = []\n\n    def search(self):\n        \"\"\" This tuner only works for GEMM (kernel3) \"\"\"\n        self.counter.init_counter('time')\n        self.counter.init_counter('converge_time')\n        self.epoch = 0\n\n        def filter_non_power_of_two(x):\n            if np.log2(x) != int(np.log2(x)):\n                return True\n            return False\n\n        #print(self.cst.hw_cst[\"DSP\"])\n\n        i, j, k = self.search_task.workload[\"params\"][\"i\"], self.search_task.workload[\"params\"][\"j\"], self.search_task.workload[\"params\"][\"k\"]\n        if not self.params[\"pruning\"]:\n            for i_t1 in range(1, i + 1):\n                for j_t1 in range(1, j + 1):\n                    for k_t1 in range(1, k + 1):\n                        for i_t2 in utils.get_divisors(int(i_t1), None):\n                            for j_t2 in utils.get_divisors(int(j_t1), None):\n                                for k_t2 in utils.get_divisors(int(min(k_t1,8)), filter_non_power_of_two):\n                                    latency_factors = 1\n                                    latency_factors *= i_t2\n                                    latency_factors *= j_t2\n                                    simd_factor = k_t2\n                                    # Skip designs that cannot hide the latency (i_t2*j_t2 >= 8*k_t2 is required, as in the pruned branch)\n                                    if latency_factors < 8 * simd_factor:\n                                        continue\n                                    task_params = {\n                                        \"i\": i, \"j\": j, \"k\": k,\n                                        \"i_t1\": i_t1, \"j_t1\": j_t1, \"k_t1\": k_t1,\n                                        \"i_t2\": i_t2, \"j_t2\": j_t2, \"k_t2\": k_t2,\n                                    }\n                                    task_params = self.search_task.adjust_params(task_params)\n                                    reward, used_constraint, reward_meta = self.search_task.evaluate(task_params, self.search_obj)\n                                    if self.overuse_constraint(used_constraint):\n                                        reward = 0\n                                    if reward > self.best_reward:\n                                        self.best_reward = reward\n                                        self.best_reward_meta = reward_meta\n                                        self.best_sol_cst = used_constraint\n                                        self.best_sol = task_params\n                                        self.log(f'Epoch {self.epoch}: new best reward: {self.best_reward} ({1/self.best_reward:.0f})')\n                                        #self.last_update_epoch = self.epoch\n                                        #self.counter.update_counter('converge_time')\n                                        self.best_search_record = 
utils.SearchRecord().extract_from_tuner_single_acc(self)\n                                    #self.best_rewards.append(self.best_reward)\n                                    #self.counter.update_counter('time')\n                                    #self.best_rewards_time.append(self.counter.get_counter('time'))\n\n                                    #if self.stop_criteria == \"epoch\" and self.epoch > self.max_epoch:\n                                    #    break\n                                    #if self.stop_criteria == \"time\":\n                                    #    self.counter.update_counter('time')\n                                    #    if self.counter.get_counter('time') > self.max_time:\n                                    #        break\n        else:\n            #for i_t1 in range(1, i + 1):\n            #for i_t1 in range(int(i/6), int(i/2)):\n            for i_t1 in range(200, 270):\n                if i_t1 % 2 != 0:\n                    continue\n                #for j_t1 in range(1, j + 1):\n                #for j_t1 in range(int(j/6), int(j/2)):\n                for j_t1 in range(200, 270):\n                    if j_t1 % 2 != 0:\n                        continue\n                    #for k_t1 in range(4, int(k/8)):\n                    for k_t1 in range(16, 64):\n                        if k_t1 % 2 != 0:\n                            continue\n                        for i_t2 in utils.get_divisors(int(i_t1), None):\n                            if i_t2 % 2 != 0:\n                                continue\n                            for j_t2 in utils.get_divisors(int(j_t1), None):\n                                if j_t2 % 2 != 0:\n                                    continue\n                                if (i_t1 / i_t2) * (j_t1 / j_t2) < 200:\n                                    continue\n                                if (i_t1 / i_t2) * (j_t1 / j_t2) > 240:\n                                    continue\n                               
 if 8 not in utils.get_divisors(int(min(k_t1,8))):\n                                    continue\n                                #if 4 not in utils.get_divisors(int(min(k_t1,8))):\n                                #    continue\n                                for k_t2 in [8]:\n                                #for k_t2 in utils.get_divisors(int(min(k_t1,8)), filter_non_power_of_two):\n                                    latency_factors = 1\n                                    latency_factors *= i_t2\n                                    latency_factors *= j_t2\n                                    simd_factor = k_t2\n                                    if latency_factors < 8 * simd_factor:\n                                    \tcontinue\n\n                                    dsp_usage = (i_t1 / i_t2) * (j_t1 / j_t2) * k_t2 * 5\n                                    if dsp_usage / self.cst.hw_cst[\"DSP\"] < self.params[\"DSP_thres\"][0] or \\\n                                       dsp_usage / self.cst.hw_cst[\"DSP\"] > self.params[\"DSP_thres\"][1]:\n                                        continue\n\n                                    task_params = {\n                                        \"i\": i, \"j\": j, \"k\": k,\n                                        \"i_t1\": i_t1, \"j_t1\": j_t1, \"k_t1\": k_t1,\n                                        \"i_t2\": i_t2, \"j_t2\": j_t2, \"k_t2\": k_t2,\n                                    }\n                                    task_params = self.search_task.adjust_params(task_params)\n                                    reward, used_constraint, reward_meta = self.search_task.evaluate(task_params, self.search_obj)\n                                    if self.overuse_constraint(used_constraint):\n                                        reward = 0\n                                    if reward > self.best_reward:\n                                        self.best_reward = reward\n                                        
self.best_reward_meta = reward_meta\n                                        self.best_sol_cst = used_constraint\n                                        self.best_sol = task_params\n                                        self.log(f'Epoch {self.epoch}: new best reward: {self.best_reward} ({1/self.best_reward:.0f})')\n                                        self.best_search_record = utils.SearchRecord().extract_from_tuner_single_acc(self)\n\ndef random_search(search_task, cst, search_obj, max_epochs, max_time, n_worker=1, silent=0, time_out=-1, pruning=0, profiling=0):\n    if profiling:\n        repeat_num = 3\n    else:\n        repeat_num = 1\n\n    tuner_params = {\n        \"pruning\": pruning,\n        \"DSP_thres\": [0.6, 1.0]\n    }\n\n    best_record = utils.SearchRecord().reset()\n    for repeat in range(repeat_num):\n        tuner = RandomTuner(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n        tuner.search()\n\n        search_record = tuner.best_search_record\n        best_record.update(search_record)\n\n        if profiling:\n            config_str = \"_random\"\n            if pruning:\n                config_str += \"_pruning\"\n            \n            config_str += f\"_{search_task.design.name}\"\n            config_str += f\"_r{repeat}\"\n            with open(f'tmp/tuning_rewards{config_str}.csv', \"w\", newline='') as f:\n                fieldnames = ['epoch', 'reward', 'time']\n                writer = csv.DictWriter(f, fieldnames=fieldnames)\n                writer.writeheader()\n                for epoch in range(len(tuner.best_rewards)):\n                    writer.writerow({'epoch': epoch, 'reward': tuner.best_rewards[epoch], 'time': tuner.best_rewards_time[epoch]})\n\n    return best_record\n\nclass RandomTuner(Tuner):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        super().__init__(search_task, cst, obj, max_epoch, max_time, 
n_worker=n_worker, silent=silent)\n        self.params = params\n        self.epoch = 0\n        if max_epoch > 0:\n            self.stop_criteria = \"epoch\"\n            self.max_epoch = max_epoch\n        else:\n            self.stop_criteria = \"time\"\n            self.max_time = max_time\n        self.counter = utils.PerfCounter()\n        self.param_idx_map = {} # Maps parameter name to its index in the sample\n        self.idx_param_map = {} # Maps the index to the parameter name\n        self.params_history = []\n\n    def generate_random_sample(self):\n        \"\"\" Generate a random sample from the design space.\n        All searched params are recorded, and a sample that was already searched\n        is redrawn (at most 20 attempts) to avoid duplicated evaluations.\n        \"\"\"\n        duplicate = True\n        cnt = 0\n        task_params = None\n        while duplicate:\n            task_params = self.search_task.generate_random_sample()\n            if not task_params:\n                # No sample returned; the caller treats this as an exhausted design space\n                break\n            # Serialize the params (the separator keeps e.g. (1, 23) and (12, 3) distinct)\n            params_hash = \"_\".join(str(v) for v in task_params.values())\n            if params_hash not in self.params_history:\n                duplicate = False\n                self.params_history.append(params_hash)\n            cnt += 1\n            if cnt > 20:\n                break\n\n        return task_params\n\n    def search(self):\n        self.counter.init_counter('time')\n        self.counter.init_counter('converge_time')\n        self.epoch = 0\n\n        while True:            \n            task_params = None\n            if self.params[\"pruning\"]:\n                while True:\n                    task_params = self.generate_random_sample()\n                    if not task_params:\n                        break\n                    self.epoch += 1\n                    self.best_rewards.append(self.best_reward)\n                    self.counter.update_counter('time')\n                    self.best_rewards_time.append(self.counter.get_counter('time'))\n\n                    task_params = 
self.search_task.adjust_params(task_params)\n                    task_params = self.search_task.design.infer_params(task_params)\n                    dsp_usage = task_params[\"i_t1\"] / task_params[\"i_t2\"] * task_params[\"j_t1\"] / task_params[\"j_t2\"] * task_params[\"k_t2\"] * 5\n                    if task_params[\"k_t2\"] == 8 and \\\n                       dsp_usage / self.cst.hw_cst[\"DSP\"] >= self.params[\"DSP_thres\"][0] and \\\n                       dsp_usage / self.cst.hw_cst[\"DSP\"] <= self.params[\"DSP_thres\"][1]:\n                       break\n                    '''\n                    resource, _ = self.search_task.design.est_resource(task_params)\n                    # Estimate the resource\n                    if resource[\"DSP\"] / self.cst.hw_cst[\"DSP\"] >= self.params[\"DSP_thres\"][0] and \\\n                       resource[\"DSP\"] / self.cst.hw_cst[\"DSP\"] <= self.params[\"DSP_thres\"][1]:\n                        break\n                    '''\n            else:\n                task_params = self.generate_random_sample()\n                self.epoch += 1\n                self.best_rewards.append(self.best_reward)\n                self.counter.update_counter('time')\n                self.best_rewards_time.append(self.counter.get_counter('time'))\n            if not task_params:\n                # Design space is exhausted\n                break\n            task_params = self.search_task.adjust_params(task_params)\n            reward, used_constraint, reward_meta = self.search_task.evaluate(task_params, self.search_obj)\n            if self.overuse_constraint(used_constraint):\n                reward = 0\n            if reward > self.best_reward:\n                self.best_reward = reward\n                self.best_reward_meta = reward_meta\n                self.best_sol_cst = used_constraint\n                self.best_sol = task_params\n                self.log(f'Epoch {self.epoch}: new best reward: {self.best_reward} 
({1/self.best_reward:.0f})')\n                self.last_update_epoch = self.epoch\n                self.counter.update_counter('converge_time')\n                self.converge_time = self.counter.get_counter('converge_time')\n                self.best_search_record = utils.SearchRecord().extract_from_tuner_single_acc(self)            \n\n            if self.stop_criteria == \"epoch\" and self.epoch > self.max_epoch:\n                break\n            if self.stop_criteria == \"time\":\n                self.counter.update_counter('time')\n                if self.counter.get_counter('time') > self.max_time:\n                    break\n\n        return\n\ndef annealing_search(search_task, cst, search_obj, max_epochs, max_time, n_worker=1, silent=0, time_out=-1, profiling=0):\n    if profiling:\n        repeat_num = 3\n    else:\n        repeat_num = 1\n\n    tuner_params = {\n        \"T\": 200,\n        \"stepsize\": 16,\n        \"mutation_probability\": 1.0,\n        \"epsilon\": 0.1,\n        \"mutation_probs\": [0.2, 0.8, 0],\n        \"max_latency\": search_task.compute_ops()*10\n    }\n\n    best_record = utils.SearchRecord().reset()\n    for repeat in range(repeat_num):\n        tuner = AnnealingTuner(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n        tuner.search()\n\n        search_record = tuner.best_search_record\n        best_record.update(search_record)\n\n        if profiling:\n            config_str = \"_annealing\"\n\n            config_str += f\"_{search_task.design.name}\"\n            config_str += f\"_r{repeat}\"\n            with open(f'tmp/tuning_rewards{config_str}.csv', \"w\", newline='') as f:\n                fieldnames = ['epoch', 'reward', 'time']\n                writer = csv.DictWriter(f, fieldnames=fieldnames)\n                writer.writeheader()\n                for epoch in range(len(tuner.best_rewards)):\n                    writer.writerow({'epoch': epoch, 'reward': 
tuner.best_rewards[epoch], 'time': tuner.best_rewards_time[epoch]})\n\n    return best_record\n\nclass AnnealingTuner(Tuner):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        super().__init__(search_task, cst, obj, max_epoch, max_time, n_worker=n_worker, silent=silent)\n        self.params = params\n        self.epoch = 0\n        if max_epoch > 0:\n            self.stop_criteria = \"epoch\"\n            self.max_epoch = max_epoch\n        else:\n            self.stop_criteria = \"time\"\n            self.max_time = max_time\n        self.counter = utils.PerfCounter()\n        self.param_idx_map = {} # Maps parameter name to its index in the sample\n        self.idx_param_map = {} # Maps the index to the parameter name\n\n    def update(self, args):\n        \"\"\" Optimization function\n        \"\"\"\n        if (np.any(np.isnan(args))) or (np.any(np.isneginf(args))) or (np.any(np.isposinf(args))) or (np.any(args[:] == 0)):\n            return self.params[\"max_latency\"]\n            #return float(\"inf\")\n\n        task_params = {}\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            task_params[param[\"name\"]] = args[self.param_idx_map[param[\"name\"]]]\n        for p, param in self.search_task.design.params_config[\"external\"].items():\n            task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n        #print(args)\n        #print(task_params)\n        task_params = self.search_task.adjust_params(task_params)\n        reward, used_constraint, reward_meta = self.search_task.evaluate(task_params, self.search_obj)\n        # SA minimizes the opt target\n        if reward == 0:\n            reward = self.params[\"max_latency\"]\n            #reward = float(\"inf\")\n        else:\n            reward = 1 / reward\n        if self.overuse_constraint(used_constraint):\n            reward = self.params[\"max_latency\"]\n   
         #reward = float(\"inf\")\n\n        return reward\n\n    def bound_check(self, f_new, x_new, f_old, x_old):\n        \"\"\" Check if the parameters are legal.\n        \"\"\"\n        self.epoch += 1\n        self.best_rewards.append(self.best_reward)\n        self.counter.update_counter('time')\n        self.best_rewards_time.append(self.counter.get_counter('time'))\n\n        task_params = {}\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            task_params[param[\"name\"]] = x_new[self.param_idx_map[param[\"name\"]]]\n        for p, param in self.search_task.design.params_config[\"external\"].items():\n            task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n        task_params = self.search_task.adjust_params(task_params)\n        task_params = self.search_task.design.infer_params(task_params)\n        if task_params:\n            status = self.search_task.design.bound_check(task_params)\n            #print(\"bound_check: \", task_params, status)\n            return status\n        else:\n            return False\n\n    def print_minimal(self, x, f, accepted):\n        \"\"\" Update the rewards when a local minimal is found.\n        \"\"\"\n        task_params = {}\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            task_params[param[\"name\"]] = x[self.param_idx_map[param[\"name\"]]]\n        for p, param in self.search_task.design.params_config[\"external\"].items():\n            task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n        task_params = self.search_task.adjust_params(task_params)\n        reward, used_constraint, reward_meta = self.search_task.evaluate(task_params, self.search_obj)\n        if self.overuse_constraint(used_constraint):\n            reward = 0\n        if reward > self.best_reward:\n            self.best_reward = reward\n            self.best_reward_meta 
= reward_meta\n            self.best_sol_cst = used_constraint\n            self.best_sol = task_params\n            self.log(f'Epoch {self.epoch}: new best reward: {self.best_reward} ({1/self.best_reward:.0f})')\n            self.last_update_epoch = self.epoch\n            self.counter.update_counter('converge_time')\n            self.converge_time = self.counter.get_counter('converge_time')\n            self.best_search_record = utils.SearchRecord().extract_from_tuner_single_acc(self)        \n\n    def take_step(self, x):\n        \"\"\" Step-taking routine.\n        Note: Only for gemm.\n        \"\"\"\n        '''\n        s = self.params[\"stepsize\"]\n        x[0:3] += np.random.uniform(-max(1,s), max(1,s), 3)\n        x[3:5] += np.random.uniform(-max(1,int(.5*s)), max(1,int(.5*s)), 2)\n        x[5] += np.random.uniform(-max(1,int(.25*s)), max(1,int(.25*s)))\n        x = np.array([int(a) if int(a) > 0 else 1 for a in x])\n        '''\n        # Reuse the genetic search mutation method\n        if random.random() < self.params[\"mutation_probability\"]:\n            if random.random() < self.params[\"epsilon\"]:\n                task_params = self.search_task.generate_random_sample()\n                for i in range(len(x)):\n                    x[i] = task_params[self.idx_param_map[i]]\n            else:\n                idv = x\n                task_params = {}\n                for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                    task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]\n                for p, param in self.search_task.design.params_config[\"external\"].items():\n                    task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n                # Build the chains\n                # [{\"params\": [p0, p3, p7], \"factors\": [ceil(p0/p3), p3/p7, p7]}, {}]\n                split_chains = []\n                for p, param in 
self.search_task.design.params_config[\"external\"].items():\n                    chain = {\"params\": [param[\"name\"]], \"factors\": []}\n                    cur_param = param\n                    while \"split_by\" in cur_param:\n                        if \"divisors\" in self.search_task.design.params_config[\"tunable\"][cur_param[\"split_by\"]] \\\n                            and cur_param[\"name\"] in self.search_task.design.params_config[\"tunable\"][cur_param[\"split_by\"]][\"divisors\"]:\n                            div = 1\n                        else:\n                            div = 0\n                        chain[\"params\"].append(cur_param[\"split_by\"])\n                        if div:\n                            factor = np.ceil(task_params[cur_param[\"name\"]] / task_params[cur_param[\"split_by\"]])\n                        else:\n                            factor = task_params[cur_param[\"name\"]] / task_params[cur_param[\"split_by\"]]\n                        chain[\"factors\"].append(max(1, int(factor)))\n                        cur_param = self.search_task.design.params_config[\"tunable\"][cur_param[\"split_by\"]]\n                    chain[\"factors\"].append(max(1, int(task_params[cur_param[\"name\"]])))\n                    split_chains.append(chain)\n\n                # Mutation\n                for chain in split_chains:\n                    if len(chain[\"factors\"]) <= 1:\n                        continue\n                    if 'fix_param' in self.search_task.configs:\n                        # Avoid mutating the fixed parameters: skip the whole chain if its root parameter is fixed\n                        if any(fix_p[0] == chain['params'][0] for fix_p in self.search_task.configs['fix_param']):\n                            continue\n                    src_idx, dst_idx = random.sample(range(0, len(chain[\"factors\"])), 2)\n                    #mutation_policy_probs = [0.2, 0, 0.8] #\n                    mutation_policy_probs = 
self.params[\"mutation_probs\"]\n                    mutation_policy_probs = np.cumsum(mutation_policy_probs)\n                    #print(mutation_policy_probs)\n                    select_prob = random.random()\n                    if select_prob < mutation_policy_probs[0]:\n                        # Random\n                        if chain[\"factors\"][dst_idx] == 1:\n                            continue\n                        \"\"\"\n                        inc_stride = max(1, int(chain[\"factors\"][src_idx] * random.random() * 1.0))\n                        dec_stride = max(1, int(chain[\"factors\"][dst_idx] - chain[\"factors\"][src_idx] * chain[\"factors\"][dst_idx] / (chain[\"factors\"][src_idx] + inc_stride)))\n                        chain[\"factors\"][src_idx] += inc_stride\n                        chain[\"factors\"][dst_idx] -= dec_stride\n                        chain[\"factors\"][dst_idx] = max(1, chain[\"factors\"][dst_idx])\n                        \"\"\"\n                        src = max(1, int(chain[\"factors\"][src_idx] * random.random() * 1.0))\n                        dst = max(1, math.ceil(chain[\"factors\"][src_idx] * chain[\"factors\"][dst_idx] / src))                        \n                        chain[\"factors\"][src_idx] = src\n                        chain[\"factors\"][dst_idx] = dst                    \n                    elif select_prob < mutation_policy_probs[2]:\n                        # Factorization\n                        factor = chain[\"factors\"][src_idx]\n                        if factor == 1:\n                            continue\n                        divs = utils.factorization(factor)\n                        div = random.choice(divs)\n                        chain[\"factors\"][src_idx] /= div\n                        chain[\"factors\"][dst_idx] *= div\n                    else:\n                        # Random\n                        chain[\"factors\"][src_idx] = max(1, int(chain[\"factors\"][src_idx] * 
random.random() * 1.0))\n\n                # Revert to the params\n                # [{\"params\": [p0, p3, p7], \"factors\": [ceil(p0/p3), p3/p7, p7]}, {}]\n                for chain in split_chains:\n                    factor = chain[\"factors\"][-1]\n                    param = chain[\"params\"][-1]\n                    if param in self.param_idx_map:\n                        x[self.param_idx_map[param]] = factor\n                    for idx in range(len(chain[\"factors\"]) - 2, -1, -1):\n                        param = chain[\"params\"][idx]\n                        factor *= chain[\"factors\"][idx]\n                        if param in self.param_idx_map:\n                            x[self.param_idx_map[param]] = factor\n\n        return x\n\n    def search(self):\n        self.counter.init_counter('time')\n        self.counter.init_counter('converge_time')\n        self.epoch = 0\n\n        idx = 0\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            self.param_idx_map[param[\"name\"]] = idx\n            self.idx_param_map[idx] = param[\"name\"]\n            idx += 1\n\n        # Init guess\n        init_reward = 0\n        init_params = None\n        for i in range(5):\n            task_params = self.search_task.generate_random_sample()\n            task_params = self.search_task.adjust_params(task_params)\n            reward, used_constraint, reward_meta = self.search_task.evaluate(task_params, self.search_obj)\n            if self.overuse_constraint(used_constraint):\n                reward = 0\n            if reward > init_reward:\n                init_reward = reward\n                init_params = task_params\n\n        # Start from the best random sample; fall back to the last one if none was feasible\n        if init_params is None:\n            init_params = task_params\n        param_arr = []\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            param_arr.append(init_params[param[\"name\"]])\n        x0 = np.array(param_arr)\n        # Search\n        optimize.basinhopping(self.update, x0, niter=self.max_epoch, \\\n         
       accept_test=self.bound_check,\n                stepsize=self.params['stepsize'],\n                T=self.params['T'], callback=self.print_minimal,\n                take_step=self.take_step)\n\n        return\n\ndef bayesian_search(search_task, cst, search_obj, max_epochs, max_time, n_worker=1, silent=0, time_out=-1, profiling=0):\n    if profiling:\n        repeat_num = 3\n    else:\n        repeat_num = 1\n\n    tuner_params = {        \n        \"init_points\": 10,\n        \"mutation_probability\": 1.0,\n        \"epsilon\": 0.1,\n        \"mutation_probs\": [0.2, 0.8, 0],\n        \"max_latency\": search_task.compute_ops()*10\n    }\n\n    best_record = utils.SearchRecord().reset()\n    for repeat in range(repeat_num):\n        tuner = BayesianTuner(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n        tuner.search()\n\n        search_record = tuner.best_search_record\n        best_record.update(search_record)\n\n        if profiling:\n            config_str = \"_bayesian\"\n\n            config_str += f\"_{search_task.design.name}\"\n            config_str += f\"_r{repeat}\"\n            with open(f'tmp/tuning_rewards{config_str}.csv', \"w\", newline='') as f:\n                fieldnames = ['epoch', 'reward', 'time']\n                writer = csv.DictWriter(f, fieldnames=fieldnames)\n                writer.writeheader()\n                for epoch in range(len(tuner.best_rewards)):\n                    writer.writerow({'epoch': epoch, 'reward': tuner.best_rewards[epoch], 'time': tuner.best_rewards_time[epoch]})\n\n    return best_record\n\nclass BayesianTuner(Tuner):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        super().__init__(search_task, cst, obj, max_epoch, max_time, n_worker=n_worker, silent=silent)\n        self.params = params\n        self.epoch = 0\n        if max_epoch > 0:\n            self.stop_criteria = \"epoch\"\n            self.max_epoch 
= max_epoch\n        else:\n            self.stop_criteria = \"time\"\n            self.max_time = max_time\n        self.counter = utils.PerfCounter()\n        self.param_idx_map = {} # Maps parameter name to its index in the sample\n        self.idx_param_map = {} # Maps the index to the parameter name\n\n    def black_box_function(self, i_t1, j_t1, k_t1, i_t2, j_t2, k_t2):        \n        task_params = {\n            \"i_t1\": int(i_t1), \"j_t1\": int(j_t1), \"k_t1\": int(k_t1),\n            \"i_t2\": int(i_t2), \"j_t2\": int(j_t2), \"k_t2\": int(k_t2)\n        }        \n\n        #task_params = {}\n        #for p, param in self.search_task.design.params_config[\"tunable\"].items():\n        #    task_params[param[\"name\"]] = x_new[self.param_idx_map[param[\"name\"]]]\n        for p, param in self.search_task.design.params_config[\"external\"].items():\n            task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n        task_params = self.search_task.adjust_params(task_params)\n        task_params = self.search_task.design.infer_params(task_params)\n        if task_params:\n            status = self.search_task.design.bound_check(task_params)            \n            if not status:\n                return 0\n        else:\n            return 0\n\n        reward, used_constraint, reward_meta = self.search_task.evaluate(task_params, self.search_obj)        \n        if self.overuse_constraint(used_constraint):\n            return 0\n        if reward > self.best_reward:\n            self.best_reward = reward\n            self.best_reward_meta = reward_meta\n            self.best_sol_cst = used_constraint\n            self.best_sol = task_params\n            self.log(f'Epoch {self.epoch}: new best reward: {self.best_reward} ({1/self.best_reward:.0f})')\n            self.last_update_epoch = self.epoch\n            self.counter.update_counter('converge_time')\n            self.converge_time = 
self.counter.get_counter('converge_time')\n            self.best_search_record = utils.SearchRecord().extract_from_tuner_single_acc(self)        \n        self.best_rewards.append(self.best_reward)\n        self.counter.update_counter('time')\n        self.best_rewards_time.append(self.counter.get_counter('time'))\n        self.epoch += 1\n\n        return reward\n\n    def search(self):\n        self.counter.init_counter('time')\n        self.counter.init_counter('converge_time')\n        self.epoch = 0\n\n        idx = 0\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            self.param_idx_map[param[\"name\"]] = idx\n            self.idx_param_map[idx] = param[\"name\"]\n            idx += 1\n\n        init_points = self.params[\"init_points\"]\n        # Only test for mm task\n        pbounds = {'i_t1': (1, self.search_task.workload[\"params\"][\"i\"]), 'j_t1': (1, self.search_task.workload[\"params\"][\"j\"]), 'k_t1': (1, self.search_task.workload[\"params\"][\"k\"]),\\\n                   'i_t2': (1, self.search_task.workload[\"params\"][\"i\"]), 'j_t2': (1, self.search_task.workload[\"params\"][\"j\"]), 'k_t2': (1, min(256 // self.search_task.dw, 64, self.search_task.workload[\"params\"][\"k\"]))}\n        \n        optimizer = BayesianOptimization(\n            f=self.black_box_function,\n            pbounds=pbounds,\n            #verbose=1,\n            random_state=1,\n        )\n\n        optimizer.maximize(\n            init_points=init_points,\n            n_iter=self.max_epoch - init_points,\n        )\n\n        return\n\n'''\ndef opentuner_search(search_task, cst, search_obj, max_epochs, max_time, solver=1, fixed_params=None, n_worker=1, silent=0, time_out=-1, profiling=0, args=None):\n    if profiling:\n        repeat_num = 3\n    else:\n        repeat_num = 1\n\n    tuner_params = {\n        \"args\": args\n    }\n\n    best_record = utils.SearchRecord().reset()\n    for repeat in range(repeat_num):\n      
  tuner = OpenTunerInterface(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n        tuner.search()\n\n        search_record = tuner.best_search_record\n        best_record.update(search_record)\n\n        if profiling:\n            config_str = \"_opentuner\"\n            \n            config_str += f\"_{search_task.design.name}\"\n            config_str += f\"_r{repeat}\"\n            with open(f'tmp/tuning_rewards{config_str}.csv', \"w\", newline='') as f:\n                fieldnames = ['epoch', 'reward', 'time']\n                writer = csv.DictWriter(f, fieldnames=fieldnames)\n                writer.writeheader()\n                for epoch in range(len(tuner.best_rewards)):\n                    writer.writerow({'epoch': epoch, 'reward': tuner.best_rewards[epoch], 'time': tuner.best_rewards_time[epoch]})\n\n    return best_record\n\nclass OpenTunerInterface(Tuner):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        super().__init__(search_task, cst, obj, max_epoch, max_time, n_worker=n_worker, silent=silent)\n        self.params = params\n        self.epoch = 0\n        if max_epoch > 0:\n            self.stop_criteria = \"epoch\"\n            self.max_epoch = max_epoch\n        else:\n            self.stop_criteria = \"time\"\n            self.max_time = max_time\n        self.counter = utils.PerfCounter()\n        self.param_idx_map = {} # Maps parameter name to its index in the sample\n        self.idx_param_map = {} # Maps the index to the parameter name\n\n    def init_args(self, args):\n        args.bail_threshold = 500\n        args.database = None\n        args.display_frequency = None                \n        args.generate_bandit_technique = False\n        args.label = None\n        args.list_techniques = False\n        args.machine_class = None\n        args.no_dups = True\n        args.parallel_compile = False\n        args.parallelism = 4\n        
args.pipelining = 0\n        args.print_params = False\n        args.print_search_space_size = False\n        args.quiet = True\n        args.results_log = None\n        args.results_log_details = None\n        args.seed_configuration = []\n        if self.stop_criteria == \"time\":\n            args.stop_after = self.max_time\n        else:\n            args.stop_after = None\n        args.technique = None\n        args.test_limit = 5000\n\n        return args\n\n    def search(self):\n        self.counter.init_counter('time')\n        self.counter.init_counter('converge_time')\n        self.epoch = 0\n\n        idx = 0\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            self.param_idx_map[param[\"name\"]] = idx\n            self.idx_param_map[idx] = param[\"name\"]\n            idx += 1\n    \n        opentuner_args = self.init_args(self.params[\"args\"])\n        opentuner = OpenTunerInstance(opentuner_args, self)\n        opentuner.main(opentuner_args, self)\n\n        return\n\nclass OpenTunerInstance(MeasurementInterface):\n    def __init__(self, args, tuner):\n        super().__init__(args)\n        self.tuner = tuner\n\n    def manipulator(self):\n        \"\"\"\n        Define the search space by creating a\n        ConfigurationManipulator\n        \"\"\"\n        manipulator = ConfigurationManipulator()\n        tuner = self.tuner\n\n        manipulator.add_parameter(\n            IntegerParameter('i_t1', 1, tuner.search_task.workload[\"params\"][\"i\"]))\n        manipulator.add_parameter(\n            IntegerParameter('j_t1', 1, tuner.search_task.workload[\"params\"][\"j\"]))\n        manipulator.add_parameter(\n            IntegerParameter('k_t1', 1, tuner.search_task.workload[\"params\"][\"k\"]))\n        manipulator.add_parameter(\n            IntegerParameter('i_t2', 1, tuner.search_task.workload[\"params\"][\"i\"]))\n        manipulator.add_parameter(\n            IntegerParameter('j_t2', 1, 
tuner.search_task.workload[\"params\"][\"j\"]))\n        manipulator.add_parameter(\n            PowerOfTwoParameter('k_t2', 1, min(256 // tuner.search_task.dw, 64, tuner.search_task.workload[\"params\"][\"k\"])))\n\n        return manipulator\n\n    def run(self, desired_result, input, limit):\n        \"\"\"\n        Compile and run a given configuration then\n        return performance\n        \"\"\"\n        cfg = desired_result.configuration.data\n        tuner = self.tuner\n\n        x = [int(cfg['i_t1']), int(cfg['j_t1']), int(cfg['k_t1']),\\\n             int(cfg['i_t2']), int(cfg['j_t2']), int(cfg['k_t2'])]\n        \n        task_params = {}\n        for p, param in tuner.search_task.design.params_config[\"tunable\"].items():\n            task_params[param[\"name\"]] = x[tuner.param_idx_map[param[\"name\"]]]\n        for p, param in tuner.search_task.design.params_config[\"external\"].items():\n            task_params[param[\"name\"]] = tuner.search_task.workload[\"params\"][param[\"name\"]]\n        task_params = tuner.search_task.adjust_params(task_params)\n        task_params = tuner.search_task.design.infer_params(task_params)\n        if task_params:\n            status = tuner.search_task.design.bound_check(task_params)            \n            if not status:\n                return Result(state='ERROR', time=float('inf'))\n        else:\n            return Result(state='ERROR', time=float('inf'))\n\n        reward, used_constraint, reward_meta = tuner.search_task.evaluate(task_params, tuner.search_obj)\n        if tuner.overuse_constraint(used_constraint):\n            return Result(state='ERROR', time=float('inf'))\n        result = Result(time=1/reward)\n        if reward > tuner.best_reward:\n            tuner.best_reward = reward\n            tuner.best_reward_meta = reward_meta\n            tuner.best_sol_cst = used_constraint\n            tuner.best_sol = task_params\n            tuner.log(f'Epoch {tuner.epoch}: new best reward: 
{tuner.best_reward} ({1/tuner.best_reward:.0f})')\n            tuner.last_update_epoch = tuner.epoch\n            tuner.counter.update_counter('converge_time')\n            tuner.converge_time = tuner.counter.get_counter('converge_time')\n            tuner.best_search_record = utils.SearchRecord().extract_from_tuner_single_acc(tuner)\n        tuner.best_rewards.append(tuner.best_reward)\n        tuner.counter.update_counter('time')\n        tuner.best_rewards_time.append(tuner.counter.get_counter('time'))\n        tuner.epoch += 1\n\n        return result\n\ndef RL_search(search_task, cst, search_obj, max_epochs, max_time, n_worker=1, silent=0, time_out=-1, profiling=0):\n    if profiling:\n        repeat_num = 3\n    else:\n        repeat_num = 1\n\n    tuner_params = {                \n        \"eps\": 0.0,\n        \"temperature\": 1,\n        \"batch\": 200\n    }\n\n    best_record = utils.SearchRecord().reset()\n    for repeat in range(repeat_num):\n        tuner = RLTuner(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n        tuner.search()\n\n        search_record = tuner.best_search_record\n        best_record.update(search_record)\n\n        if profiling:\n            config_str = \"_RL\"\n            \n            config_str += f\"_{search_task.design.name}\"\n            config_str += f\"_r{repeat}\"\n            with open(f'tmp/tuning_rewards{config_str}.csv', \"w\", newline='') as f:\n                fieldnames = ['epoch', 'reward', 'time']\n                writer = csv.DictWriter(f, fieldnames=fieldnames)\n                writer.writeheader()\n                for epoch in range(len(tuner.best_rewards)):\n                    writer.writerow({'epoch': epoch, 'reward': tuner.best_rewards[epoch], 'time': tuner.best_rewards_time[epoch]})\n\n    return best_record    \n\nclass RLTuner(Tuner):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        
super().__init__(search_task, cst, obj, max_epoch, max_time, n_worker=n_worker, silent=silent)\n        self.params = params\n        self.epoch = 0\n        if max_epoch > 0:\n            self.stop_criteria = \"epoch\"\n            self.max_epoch = max_epoch\n        else:\n            self.stop_criteria = \"time\"\n            self.max_time = max_time\n        self.counter = utils.PerfCounter()\n        self.param_idx_map = {} # Maps parameter name to its index in the sample\n        self.idx_param_map = {} # Maps the index to the parameter name\n\n        self.agent = None\n        self.env = None\n\n    def policy_gradient(self, n_episodes=100000, max_t=1000, print_every=10, eps=0, temperature=1):\n        \"\"\"\n        n_episodes: number of training episodes\n        print_every: maximal number of episodes to keep the record\n        \"\"\"\n        best_score = -2**20\n        scores_window = deque(maxlen=print_every)\n        scores = []\n        has_succeed_history = False\n        for i_episode in range(n_episodes):\n            # Adjust learning rate\n            if i_episode % 100 == 0 and has_succeed_history:\n                eps /= 1.2\n                temperature /= 1.01\n                temperature = max(temperature, 1)\n                self.agent.adjust_lr(ratio=0.8, min_lr=1e-6)\n            \n            score = 0\n            state, infos = self.env.reset()\n            # Max number of attempts in one episode\n            for t in range(max_t):\n                # Generate one action\n                action, log_prob = self.agent.act(state, infos, eps, temperature)\n                # Get rewards from the env\n                next_state, reward, done, infos, sig, impt = self.env.step(action)\n                # Update the agent\n                self.agent.step(state, action, log_prob, reward, next_state, done, sig, impt, infos)\n                state = next_state\n                score += infos[\"reward_raw\"]\n                if done:\n           
         break\n            \n            scores.append(score)\n            if infos[\"succeed\"]:\n                has_succeed_history = True\n                if score > self.best_reward:\n                    self.best_reward = score\n                    self.best_reward_meta = infos[\"reward_meta\"]\n                    self.best_sol_cst = infos[\"cst\"]\n                    self.best_sol = infos[\"sol\"]\n                    self.log(f'Epoch {self.epoch}: new best reward: {self.best_reward} ({1/self.best_reward:.0f})')\n                    self.last_update_epoch = self.epoch\n                    self.counter.update_counter('converge_time')\n                    self.converge_time = self.counter.get_counter('converge_time')\n                    self.best_search_record = utils.SearchRecord().extract_from_tuner_single_acc(self)                \n            self.best_rewards.append(self.best_reward)\n            self.counter.update_counter('time')\n            self.best_rewards_time.append(self.counter.get_counter('time'))\n            self.epoch += 1\n\n        return scores\n\n    def search(self):\n        self.counter.init_counter('time')\n        self.counter.init_counter('converge_time')\n        self.epoch = 0\n\n        idx = 0\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            self.param_idx_map[param[\"name\"]] = idx\n            self.idx_param_map[idx] = param[\"name\"]\n            idx += 1\n\n        # Dimension of the problem space (i, j, k)\n        dim_size = 3\n        # Dimension of the action vector (i_t1, j_t1, k_t1, i_t2, j_t2, k_t2)\n        n_action_steps = 6\n        # Level of each action step\n        action_size = max(self.search_task.workload[\"params\"][\"i\"], \n                          self.search_task.workload[\"params\"][\"j\"], \n                          self.search_task.workload[\"params\"][\"k\"])\n        # Initialize agent and environment \n        self.agent = 
RLAgent(dim_size=dim_size, n_action_steps=n_action_steps, action_size=action_size, seed=random.randint(0, 2**63), batch=self.params['batch'])\n        self.env = RLEnv(self.search_task, self.cst, self.param_idx_map, self.idx_param_map, self.search_obj,\n                         dim_size=dim_size, n_action_steps=n_action_steps, action_size=action_size)                \n        state = self.env.reset()\n        self.agent.reset()\n\n        scores = self.policy_gradient(n_episodes=self.max_epoch, eps=self.params['eps'], temperature=self.params['temperature'])\n\n        return\n'''\n\ndef genetic_search(search_task, cst, search_obj, max_epochs, max_time, solver=1, fixed_params=None, n_worker=1, silent=0, time_out=-1, profiling=0):\n    \"\"\" Genetic search\n    If solver is enabled, we will first call IPOPT solver to generate the initial params to\n    kick off the genetic search.\n    \"\"\"\n    if profiling:\n        solver = 1        \n        repeat_num = 3\n    else:\n        repeat_num = 1\n\n    init_params = None\n    #solver = 0\n    if solver == 1:\n        # Call IPOPT solver\n        init_params = off_chip_solver(search_task, cst, fixed_params, save=1)\n        #init_params = off_chip_solver(search_task, cst, fixed_params)\n    #print(search_task)\n    #print(init_params)\n    \n    if init_params:\n        # Modify it to divisors\n        param_idx_map = {}\n        idx_param_map = {}\n        idx = 0\n        for p, param in search_task.design.params_config[\"tunable\"].items():\n            param_idx_map[param[\"name\"]] = idx\n            idx_param_map[idx] = param[\"name\"]\n            idx += 1\n        import bisect\n        task_params = {}\n        for p, param in search_task.design.params_config[\"tunable\"].items():\n            task_params[param[\"name\"]] = init_params[param_idx_map[param[\"name\"]]]\n        for p, param in search_task.design.params_config[\"external\"].items():\n            task_params[param[\"name\"]] = 
search_task.workload[\"params\"][param[\"name\"]]\n        # Fix the first-level\n        #for p, param in search_task.design.params_config[\"external\"].items():\n        #    split_by_param = param[\"split_by\"]\n        #    choices = utils.get_divisors(int(task_params[p]), None)\n        #    idx = bisect.bisect(choices, task_params[split_by_param])\n        #    if idx >= len(choices):\n        #        idx -= 1\n        #    if idx > 1:\n        #        if abs(choices[idx - 1] - task_params[split_by_param]) < abs(choices[idx] - task_params[split_by_param]):\n        #            idx -= 1\n        #    task_params[split_by_param] = choices[idx]\n\n        ## Fix the first-level: make them multiple of 4 (for solver analysis)\n        #for p, param in search_task.design.params_config[\"external\"].items():\n        #    split_by_param = param[\"split_by\"]\n        #    if split_by_param.startswith(\"k\"):\n        #        task_params[split_by_param] = int(task_params[split_by_param] / 16) * 16            \n\n        # Fix the first-level: make them multiple of 2\n        for p, param in search_task.design.params_config[\"external\"].items():\n            split_by_param = param[\"split_by\"]            \n            task_params[split_by_param] = int(task_params[split_by_param] / 2) * 2\n\n        # Fix the second-level    \n        def filter_non_power_of_two(x):\n            if np.log2(x) != int(np.log2(x)):\n                return True\n            return False\n        for p, param in search_task.design.params_config[\"tunable\"].items():        \n            if \"divisors\" in param:\n                if \"tags\" in param and \"power_of_two\" in param[\"tags\"]:\n                    choices = utils.get_divisors(int(task_params[param[\"divisors\"][0]]), filter_non_power_of_two)\n                else:\n                    choices = utils.get_divisors(int(task_params[param[\"divisors\"][0]]), None)                \n                idx = bisect.bisect(choices, 
task_params[p])\n                if idx >= len(choices):\n                    idx -= 1\n                if idx > 1:\n                    if abs(choices[idx - 1] - task_params[p]) < abs(choices[idx] - task_params[p]):\n                        idx -= 1\n                task_params[p] = choices[idx]\n        init_params = []\n        for p, param in search_task.design.params_config[\"tunable\"].items():\n            init_params.append(task_params[param[\"name\"]])\n        #print(init_params)\n        #exit(0)        \n    \n    # comm\n    #init_params = [1024, 1024, 256, 128, 128, 4] # [1024, 1024, 320, 128, 128, 4]\n    # -comp\n    #init_params = [512, 512, 256, 32, 32, 4] # [520, 520, 320, 26, 26, 4]\n    # comm-comp\n    #init_params = [1024, 1024, 256, 64, 64, 4] # [1024, 1024, 320, 64, 64, 4]\n    # imperfect pruning\n    #init_params = [512, 1024, 8, 512, 512, 8]\n\n    mutation_probs_list = [\n        [0, 1, 0],\n        [0.2, 0.8, 0],\n        [0.4, 0.6, 0],\n        [0.6, 0.4, 0],\n        [0.8, 0.2, 0],\n        [1, 0, 0],\n        #[0, 0.8, 0.2],\n        #[0, 0.6, 0.4],\n        #[0, 0.4, 0.6],\n        #[0, 0.2, 0.8],\n        #[0, 0, 1]\n    ]\n\n    tuner_params = {\n        \"population_size\": 200,\\\n        \"mutation_probability\": 0.5,\\\n        \"parents_ratio\": 0.3,\\\n        \"epsilon\": 0.1,\\\n        #\"epsilon\": 0,\\\n        \"ancestor\": init_params,\\\n        \"fixed_params\": fixed_params,\\\n        \"time_out\": time_out,\n        \"mutation_probs\": mutation_probs_list[1]        \n        #\"mutation_probs\": mutation_probs_list[0]\n    }\n\n    #print(tuner_params)\n\n    best_record = utils.SearchRecord().reset()\n    for repeat in range(repeat_num):\n        tuner = GeneticTuner(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n        tuner.search()\n\n        search_record = tuner.best_search_record\n        best_record.update(search_record)\n\n        if profiling:\n            # 
Mutation methods\n            #config_str = \"\"\n            #for p in tuner_params[\"mutation_probs\"]:\n            #    config_str += \"_\"\n            #    config_str += str(p)\n\n            # Solver\n            #config_str = \"_comm_div_comp\"\n            #config_str = \"_no_solver\"\n            #config_str = \"_comm\"\n            #config_str = \"_comp\"\n            config_str = \"_comm_minus_comp\"\n\n            # Hardware Model\n            #config_str = \"_baseline\"\n            #config_str = \"_divisor_only\"\n            #config_str = \"_simplified_model\"\n\n            # Search Method\n            #config_str = \"_genetic\"            \n            #config_str += f\"_{search_task.design.name}\"\n\n            # Dataflow\n            #config_str = f\"_{search_task.workload['name']}_{search_obj}_{search_task.design.name}\"\n            \n            config_str += f\"_r{repeat}\"\n            with open(f'tmp/tuning_rewards{config_str}.csv', \"w\", newline='') as f:\n                fieldnames = ['epoch', 'reward', 'time']\n                writer = csv.DictWriter(f, fieldnames=fieldnames)\n                writer.writeheader()\n                for epoch in range(len(tuner.best_rewards)):\n                    writer.writerow({'epoch': epoch, 'reward': tuner.best_rewards[epoch], 'time': tuner.best_rewards_time[epoch]})\n\n    return best_record\n\nclass GeneticTuner(Tuner):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        super().__init__(search_task, cst, obj, max_epoch, max_time, n_worker=n_worker, silent=silent)\n        self.params = params\n        self.epoch = 0\n        if max_epoch > 0:\n            self.stop_criteria = \"epoch\"\n            self.max_epoch = max_epoch\n        else:\n            self.stop_criteria = \"time\"\n            self.max_time = max_time\n        self.counter = utils.PerfCounter()\n        self.param_idx_map = {} # Maps parameter name to its index in the 
sample\n        self.idx_param_map = {} # Maps the index to the parameter name\n\n    def select_parents(self, population, fitness, num_parents):\n        \"\"\" Select \"num_parents\" parents with the highest fitness score.\n        \"\"\"\n        fitness_idx_sorted = np.argsort(-fitness)\n        parents = population[fitness_idx_sorted[:num_parents]][:]\n        return parents\n\n    def crossover(self, pool, num_children):\n        \"\"\" Perform uniform crossover: each parameter is inherited together with\n        the parameter it must divide, both taken from the same randomly chosen parent.\n        \"\"\"\n        children = np.empty((num_children, len(self.search_task.design.params_config[\"tunable\"])))\n        # Build the parameter dependency chain\n        param_deps = {} # [\"param\": \"dependent_param (multiple of this parameter)\"]\n        param_cnt = 0\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            if \"divisors\" in param:\n                param_deps[param[\"name\"]] = param[\"divisors\"][0]\n                param_cnt += 2\n        if param_cnt != len(self.search_task.design.params_config[\"tunable\"]):\n            raise RuntimeError(\"Not all tuning parameters can be handled by crossover\")\n        for i in range(num_children):\n            parents_idx = [i % pool.shape[0], np.random.randint(0, pool.shape[0])]\n            for param in param_deps:\n                idx = np.random.randint(0, 2)\n                children[i][self.param_idx_map[param]] = pool[parents_idx[idx]][self.param_idx_map[param]]\n                children[i][self.param_idx_map[param_deps[param]]] = pool[parents_idx[idx]][self.param_idx_map[param_deps[param]]]\n\n        return children\n\n    def mutation(self, pool):\n        \"\"\" Perform mutation\n        \"\"\"\n        for p_idx in range(pool.shape[0]):\n            if random.random() < self.params[\"mutation_probability\"]:\n                if random.random() < self.params[\"epsilon\"]:\n                    task_params = self.search_task.generate_random_sample()\n                    for i in 
range(pool.shape[1]):\n                        pool[p_idx][i] = task_params[self.idx_param_map[i]]\n                else:\n                    idv = pool[p_idx][:]\n                    task_params = {}\n                    for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                        task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]\n                    for p, param in self.search_task.design.params_config[\"external\"].items():\n                        task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n                    # Build the chains\n                    # [{\"params\": [p0, p3, p7], \"factors\": [ceil(p0/p3), p3/p7, p7]}, {}]\n                    split_chains = []\n                    for p, param in self.search_task.design.params_config[\"external\"].items():\n                        chain = {\"params\": [param[\"name\"]], \"factors\": []}\n                        cur_param = param\n                        while \"split_by\" in cur_param:\n                            if \"divisors\" in self.search_task.design.params_config[\"tunable\"][cur_param[\"split_by\"]] \\\n                                and cur_param[\"name\"] in self.search_task.design.params_config[\"tunable\"][cur_param[\"split_by\"]][\"divisors\"]:\n                                div = 1\n                            else:\n                                div = 0\n                            chain[\"params\"].append(cur_param[\"split_by\"])\n                            if div:\n                                factor = np.ceil(task_params[cur_param[\"name\"]] / task_params[cur_param[\"split_by\"]])\n                            else:\n                                factor = task_params[cur_param[\"name\"]] / task_params[cur_param[\"split_by\"]]\n                            chain[\"factors\"].append(max(1, int(factor)))\n                            cur_param = 
self.search_task.design.params_config[\"tunable\"][cur_param[\"split_by\"]]\n                        chain[\"factors\"].append(max(1, int(task_params[cur_param[\"name\"]])))\n                        split_chains.append(chain)\n\n                    # Mutation\n                    for chain in split_chains:\n                        if len(chain[\"factors\"]) <= 1:\n                            continue\n                        if 'fix_param' in self.search_task.configs:\n                            # Avoid mutating the fixed parameters: skip the whole chain\n                            # (a bare `continue` inside the inner loop would be a no-op)\n                            if any(fix_p[0] == chain['params'][0] for fix_p in self.search_task.configs['fix_param']):\n                                continue\n                        src_idx, dst_idx = random.sample(range(0, len(chain[\"factors\"])), 2)                        \n                        #src_idx, dst_idx = random.sample(range(1, len(chain[\"factors\"])), 2)\n                        mutation_policy_probs = self.params[\"mutation_probs\"]\n                        mutation_policy_probs = np.cumsum(mutation_policy_probs)\n                        #print(mutation_policy_probs)\n                        select_prob = random.random()\n                        if select_prob < mutation_policy_probs[0]:\n                            # Random\n                            if chain[\"factors\"][dst_idx] == 1:\n                                continue\n                            \"\"\"\n                            inc_stride = max(1, int(chain[\"factors\"][src_idx] * random.random() * 1.0))\n                            dec_stride = max(1, int(chain[\"factors\"][dst_idx] - chain[\"factors\"][src_idx] * chain[\"factors\"][dst_idx] / (chain[\"factors\"][src_idx] + inc_stride)))\n                            chain[\"factors\"][src_idx] += inc_stride\n                            chain[\"factors\"][dst_idx] -= dec_stride\n                            chain[\"factors\"][dst_idx] = max(1, 
chain[\"factors\"][dst_idx])\n                            \"\"\"\n                            #src = chain[\"factors\"][src_idx] + max(1, int(chain[\"factors\"][src_idx] * random.random() * 1.0))\n                            src = max(1, int(chain[\"factors\"][src_idx] * random.random() * 1.0))\n                            dst = max(1, math.ceil(chain[\"factors\"][src_idx] * chain[\"factors\"][dst_idx] / src))\n                            chain[\"factors\"][src_idx] = src\n                            chain[\"factors\"][dst_idx] = dst                        \n                        elif select_prob < mutation_policy_probs[1]:\n                            # Factorization\n                            factor = chain[\"factors\"][src_idx]\n                            if factor == 1:\n                                continue\n                            divs = utils.factorization(factor)\n                            div = random.choice(divs)\n                            chain[\"factors\"][src_idx] /= div\n                            chain[\"factors\"][dst_idx] *= div\n                        else:\n                            # Random (single)\n                            chain[\"factors\"][src_idx] = max(1, int(chain[\"factors\"][src_idx] * random.random() * 1.0))\n\n                    # Revert to the params\n                    # [{\"params\": [p0, p3, p7], \"factors\": [ceil(p0/p3), p3/p7, p7]}, {}]\n                    for chain in split_chains:\n                        factor = chain[\"factors\"][-1]\n                        param = chain[\"params\"][-1]\n                        if param in self.param_idx_map:\n                            pool[p_idx][self.param_idx_map[param]] = factor\n                        for idx in range(len(chain[\"factors\"]) - 2, -1, -1):\n                            param = chain[\"params\"][idx]\n                            factor *= chain[\"factors\"][idx]\n                            if param in self.param_idx_map:\n                   
             pool[p_idx][self.param_idx_map[param]] = factor\n\n        return pool\n\n    def search(self):\n        \"\"\" Search the design space using genetic algorithms.\n\n        The algorithm is configured by several parameters.\n        @ population_size: the number of trial solutions in each epoch.\n        @ mutation_probability: the chance of each gene in each individual solution\n        to be replaced by a random value.\n        @ crossover_probability: the chance of an existed solution to pass its genome\n        to new trial solutions.\n        @ parents_ratio: the ratio of population filled by the members of the previous\n        generation.\n        \"\"\"\n        self.counter.init_counter('time')\n        self.counter.init_counter('converge_time')\n        self.epoch = 0\n        # Internal testing\n        #local_reward = 0\n\n        # Init the stats\n        num_pop = int(self.params[\"population_size\"])\n        num_gen = int(self.max_epoch // num_pop)\n        num_parents = int(num_pop * self.params[\"parents_ratio\"])\n        self.log(f'Number of generations: {num_gen}')\n        self.log(f'Number of population: {num_pop}')\n        self.log(f'Number of parents: {num_parents}')\n\n        # Init the population\n        population = np.empty((num_pop, len(self.search_task.design.params_config[\"tunable\"])), dtype=int)\n        if \"ancestor\" in self.params and self.params[\"ancestor\"] != None:\n            # Initialize the population with the ancestor\n            ancestor = self.params[\"ancestor\"]\n            task_params = {}\n            idx = 0\n            for p, param in self.search_task.design.params_config[\"external\"].items():\n                task_params[param[\"split_by\"]] = ancestor[idx]\n                idx += 1\n            # Note: We assume only up to two-level tiling\n            for p, param in self.search_task.design.params_config[\"external\"].items():\n                
task_params[self.search_task.design.params_config[\"tunable\"][param[\"split_by\"]][\"split_by\"]] = ancestor[idx]\n                idx += 1\n            #print(task_params)\n            task_params = self.search_task.adjust_params(task_params)\n            #print(task_params)\n            param_arr = []\n            for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                param_arr.append(task_params[param[\"name\"]])\n            for i in range(num_pop):\n                population[i] = np.array(param_arr, dtype=int)\n        else:\n            # Initialize the population randomly\n            pop_cnt = 0\n            while pop_cnt < num_pop:\n                task_params = self.search_task.generate_random_sample()\n                param_arr = []\n                for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                    param_arr.append(task_params[param[\"name\"]])\n                population[pop_cnt] = np.array(param_arr, dtype=int)\n                pop_cnt += 1\n        idx = 0\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            self.param_idx_map[param[\"name\"]] = idx\n            self.idx_param_map[idx] = param[\"name\"]\n            idx += 1\n\n        fitness = np.empty(num_pop, dtype=float)\n\n        terminate = False\n        while True:\n            if self.epoch > 0:\n                # Select the parents\n                parents = self.select_parents(population, fitness, num_parents)\n                if parents.shape[0] == 0:\n                    break\n                # Crossover\n                children = self.crossover(parents, num_pop - parents.shape[0])\n                # Mutation\n                children = self.mutation(children)\n                # Compose the new generation\n                population[0:parents.shape[0], :] = parents\n                population[parents.shape[0]:, :] = children\n\n            # Update the 
fitness\n            for i in range(num_pop):\n                idv = population[i]\n                task_params = {}\n                for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                    task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]\n                for p, param in self.search_task.design.params_config[\"external\"].items():\n                    task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n                task_params = self.search_task.adjust_params(task_params)\n                reward, used_constraint, reward_meta = self.search_task.evaluate(task_params, self.search_obj)\n                #print(reward, used_constraint)\n                #pprint.pprint(reward_meta)\n                #print(task_params)\n                #exit(0)\n                if self.overuse_constraint(used_constraint):\n                    reward = 0\n                # Internal testing\n                #reward_old = reward\n                #if reward:\n                #    latency_tmp = 0\n                #    for lat in reward_meta[\"latency\"][\"latency_main\"]:\n                #        latency_tmp = max(latency_tmp, reward_meta[\"latency\"][\"latency_main\"][lat])\n                #    reward = 1 / latency_tmp\n\n                fitness[i] = reward\n                # Update the record\n                if reward > self.best_reward:\n                    self.best_reward = reward\n                    self.best_reward_meta = reward_meta\n                    self.best_sol_cst = used_constraint\n                    self.best_sol = task_params\n                    self.log(f'Epoch {self.epoch}: new best reward: {self.best_reward} ({1/self.best_reward:.3f})')\n                    self.last_update_epoch = self.epoch\n                    self.counter.update_counter('converge_time')\n                    self.converge_time = self.counter.get_counter('converge_time')\n                    
self.best_search_record = utils.SearchRecord().extract_from_tuner_single_acc(self)\n                    #print(self.best_search_record)\n                    #exit(0)\n                self.best_rewards.append(self.best_reward)\n                self.counter.update_counter('time')\n                self.best_rewards_time.append(self.counter.get_counter('time'))\n\n                # Internal testing\n                #if reward_old > local_reward:\n                #    local_reward = reward_old\n                #self.best_rewards.append(local_reward)\n\n                self.epoch += 1\n                self.counter.update_counter('time')\n                if self.params['time_out'] > 0:\n                    if self.counter.get_counter('time') - self.counter.get_counter('converge_time') > self.params['time_out']:\n                        # Time out if the best reward has not improved for longer than time_out\n                        terminate = True\n            if self.stop_criteria == \"epoch\" and self.epoch > self.max_epoch:\n                break\n            if self.stop_criteria == \"time\":\n                self.counter.update_counter('time')\n                if self.counter.get_counter('time') > self.max_time:\n                    break\n            if terminate:\n                break\n\n        return\n\ndef non_fuse_genetic_search(search_task, init_tasks, cst, search_obj, max_epochs, max_time, \\\n                            n_worker=1, silent=0, population_size=20, policy=0, meta=None):\n    \"\"\" This function finds the best array architecture for a list of tasks.\n    Init_tasks include the search records for each single task.\n    Policy 0: Allocate the init population based on the achieved throughput of each task.\n    Policy 1: Allocate the init population uniformly.\n    \"\"\"\n    import logging\n    logger = logging.getLogger('AutoSA-Tuner')\n    if silent == 0:\n        logger.info(\"Performing cross layer non-fusion genetic search...\")\n\n    # Internal use for profiling 
the init population\n    #logger.info('Init tasks')\n    #policy = 1\n    #for task in init_tasks:\n    #    logger.info(f'{task.to_str()}')\n\n    #import pickle    \n    #pickle.dump(init_tasks, open(f'tmp/{search_task.design.name}_init_tasks', 'wb'))\n    #init_tasks = pickle.load(open(f'tmp/{search_task.design.name}_init_tasks', 'rb'))\n\n    # Extract the init population allocation information\n    init_pop_record = []\n    for record in init_tasks:\n        task_hash = record.task_sols[0]['hash']\n        init_pop_record.append({\n            'latency': record.latency,\n            'ops': record.task_sols[0]['ops'],\n            'params': record.task_sols[0]['sol'],\n            'flops': record.task_sols[0]['ops'] / record.latency\n        })\n\n    best_latency = utils.compute_tasks_latency(search_task.tasks, init_tasks)\n    if silent == 0:\n        logger.info(f'Cross-layer non-fusion ideal latency: {best_latency}')\n\n    if policy == 0:\n        # Sort the records by flops and prune the ones with low throughput.\n        # The heuristic here is that the arch solution with higher throughput\n        # can potentially deliver the best performance for the entire network.\n        thres = 0.5\n        def takeFLOPS(elem):\n            return elem['flops']\n        init_pop_record.sort(key=takeFLOPS, reverse=True)\n        prune_idx = len(init_pop_record)\n        prune_flops = init_pop_record[0]['flops'] * thres\n        for i in range(len(init_pop_record)):\n            if init_pop_record[i]['flops'] < prune_flops:\n                prune_idx = i\n                break\n        init_pop_record = init_pop_record[:prune_idx]\n    elif policy == 1:\n        random.shuffle(init_pop_record)    \n\n    tuner_params = {\n        \"population_size\": max(population_size, len(init_pop_record)),\n        \"mutation_probability\": 0.7,\n        \"parents_ratio\": 0.3,\n        \"hw_parents_ratio\": 0.1, # Maintain the best parents found by the hw models\n        
\"epsilon\": 0.05,\n        \"mutation_probs\": [0.2, 0.8, 0],\n        \"policy\": policy,\n        \"init_pop\": init_pop_record,\n        \"unit_max_epoch\": 0,\n        \"unit_max_time\": max_time,\n        \"best_reward\": 1 / best_latency,\n        \"best_reward_thres\": 0.95, # Terminate if the reward is within xx% compared to the best reward\n        \"use_ml_model\": 1,\n        \"model_gens\": meta[\"xgb_params\"][\"n_gens\"], # Switch to real estimates after every x gens\n        \"prune_params\": {\n            \"reward_thres\": 10, # Prune parents whose reward is more than x times worse than the best\n            \"xgb_n_turns\": population_size, # Use XGBoost model after x epochs\n            \"xgb_thres\": meta[\"xgb_params\"][\"thres\"], # Prune designs below x of the ideal reward\n            \"xgb_thres_adjust\": meta[\"xgb_params\"][\"thres_adjust\"] # Adjust the updated threshold by x\n        },\n        \"one_gen\": meta[\"one_gen\"] if meta else False # Only explore for one generation\n    }    \n\n    if max_epochs > 0:\n        pass\n    else:\n        max_time *= (len(search_task.tasks) * tuner_params[\"population_size\"] * 3)\n        max_time = min(max_time, 180) # 3 min at most\n\n    # Uncomment below if profiling the cost model\n    #tuner_params[\"best_reward_thres\"] = 1\n    #tuner_params[\"prune_params\"][\"xgb_thres\"] = 0\n    #tuner_params[\"policy\"] = 2\n    #tuner_params[\"one_gen\"] = 0\n    #max_time = 1800 # 30min\n\n    # Uncomment below if comparing methods\n    #tuner_params[\"best_reward_thres\"] = 2\n    #tuner_params[\"use_ml_model\"] = 1\n    #max_time = 180 # 3min\n\n    tuner = MultiWorkloadArrayGeneticTuner(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n    tuner.search()\n\n    # Uncomment below if profiling the cost model\n    #np.savetxt('tmp/cost_model_samples.csv', tuner.bst_data['data'], delimiter=',')\n\n    search_record = tuner.best_search_record\n    # Internal use for method comparison\n 
   #config_str= \"thrpt_init\"    \n    #if tuner_params[\"use_ml_model\"]:\n    #    config_str += \"_ml_\"\n    #    #config_str += f\"{meta['xgb_params']['n_gens']}_{meta['xgb_params']['thres']}_{meta['xgb_params']['thres_adjust']}\"\n    #else:\n    #    config_str += \"_no_ml_\"\n    #config_str += f\"{search_task.design.name}\"        \n\n    #with open(f\"tmp/tuning_rewards_{config_str}.csv\", \"w\", newline='') as f:\n    #    fieldnames = ['epoch', 'reward', 'time']\n    #    writer = csv.DictWriter(f, fieldnames=fieldnames)\n    #    writer.writeheader()\n    #    for epoch in range(len(tuner.best_rewards)):\n    #        writer.writerow({'epoch': epoch, 'reward': tuner.best_rewards[epoch], 'time': tuner.best_rewards_time[epoch]})\n\n    return search_record\n\nclass MultiWorkloadArrayGeneticTuner(GeneticTuner):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        super().__init__(search_task, cst, obj, max_epoch, max_time, params, n_worker=n_worker, silent=silent)\n        self.search_cache = {} # Avoid search duplicate sample\n        self.bst_data = {'num': 0, 'valid': 0, 'data': None} # Boost tree information\n        self.bst = None # Boost tree\n        self.gen = 0\n        self.best_hw_sols = []\n\n    def xgboost_add_sample(self, sol, cst, reward):\n        \"\"\" Add the training sample into the training set.\n        \"\"\"\n        feature = []\n        for p, param in self.search_task.design.params_config['tunable'].items():\n            feature.append(sol[param['name']])\n        for dim in cst['dims']:\n            feature.append(dim)\n        feature.append(cst['SIMD'])\n        feature.append(cst['resource']['BRAM18K'])\n        feature.append(cst['resource']['DSP'])\n        for arr in cst['data_pack']:\n            for dp in cst['data_pack'][arr]:\n                feature.append(dp)\n        feature.append(reward)\n        if self.bst_data['num'] == 0:\n            
self.bst_data['data'] = np.array([feature])\n        else:\n            self.bst_data['data'] = np.append(\n                self.bst_data['data'],\n                np.array([feature]), axis=0\n            )\n\n        self.bst_data['num'] += 1\n\n    def xgboost_train(self):\n        \"\"\" Train the XGBoost model.\n        \"\"\"\n        if self.bst_data['num'] == 0:\n            return\n\n        # Build the training set\n        data = self.bst_data['data'][:, :self.bst_data['data'].shape[1] - 1]\n        label = self.bst_data['data'][:, self.bst_data['data'].shape[1] - 1].flatten()\n        if len(label) == 0:\n            return\n\n        dtrain = xgb.DMatrix(data, label=label)\n        param = {'objective':'reg:squarederror', 'nthread': 1}\n        num_round = 10\n        self.bst = xgb.train(param, dtrain, num_round)\n\n        # Disable it when profiling the cost model\n        if self.bst_data['num'] >= self.params['prune_params']['xgb_n_turns']:\n            self.bst_data['valid'] = 1\n\n    def xgboost_predict(self, sol, cst):\n        preds = None\n        if self.bst:\n            feature = []\n            for p, param in self.search_task.design.params_config['tunable'].items():\n                feature.append(sol[param['name']])\n            for dim in cst['dims']:\n                feature.append(dim)\n            feature.append(cst['SIMD'])\n            feature.append(cst['resource']['BRAM18K'])\n            feature.append(cst['resource']['DSP'])\n            for arr in cst['data_pack']:\n                for dp in cst['data_pack'][arr]:\n                    feature.append(dp)\n\n            data = np.array([feature])\n            dtest = xgb.DMatrix(data)\n            preds = self.bst.predict(dtest)[0]\n\n        return preds\n\n    def xgboost_prune(self, sol, cst):\n        \"\"\" Prune the solution by XGBoost model\n        \"\"\"\n        pred = self.xgboost_predict(sol, cst)\n        if pred and self.bst_data['valid'] == 1:\n            if 
pred < self.params['prune_params']['xgb_thres']:\n                return True\n        return False\n\n    def select_parents(self, population, fitness, num_parents, num_hw_parents):\n        \"\"\" Select \"num_parents\" parents with the highest fitness score.\n        If num_hw_parents > 0, enlist the best hw solutions\n        \"\"\"\n        fitness_idx_sorted = np.argsort(-fitness)\n        parents = population[fitness_idx_sorted[:num_parents]][:]\n\n        sorted_fitness = fitness[fitness_idx_sorted[:num_parents]]\n        # Remove illegal parents\n        cut_idx = 0\n        while cut_idx < len(parents) and sorted_fitness[cut_idx] > 0:\n            cut_idx += 1\n        parents = parents[:cut_idx][:]\n\n        # Remove parents with low performance\n        cut_idx = 0\n        while cut_idx < len(parents) and \\\n              sorted_fitness[cut_idx] > sorted_fitness[0] / self.params['prune_params']['reward_thres']:\n            cut_idx += 1\n        parents = parents[:cut_idx][:]\n\n        # Remove redundant parents\n        cur_idx = 1\n        if parents.shape[0] > 1:\n            while cur_idx < parents.shape[0]:\n                if np.array_equal(parents[cur_idx], parents[cur_idx - 1]):\n                    parents = np.delete(parents, (cur_idx), axis=0)\n                else:\n                    cur_idx += 1\n\n        if num_hw_parents > 0:\n            num_hw_parents = min(num_hw_parents, len(self.best_hw_sols))\n            hw_parents = np.zeros((num_hw_parents, parents.shape[1]))\n            for i in range(num_hw_parents):\n                hw_parents[i] = self.best_hw_sols[-1 - i][\"idv\"]\n            #print(hw_parents)\n            #print(parents)\n            cur_idx = 0\n            while cur_idx < hw_parents.shape[0]:\n                redundant = False\n                for i in range(parents.shape[0]):\n                    if np.array_equal(parents[i], hw_parents[cur_idx]):\n                        redundant = True\n                      
  break\n                if redundant:\n                    hw_parents = np.delete(hw_parents, (cur_idx), axis=0)\n                else:\n                    cur_idx += 1\n            parents = np.concatenate((hw_parents, parents))\n            parents = parents[:num_parents][:]\n\n        return parents\n\n    def init_population(self, num_pop):\n        population = np.empty((num_pop, len(self.search_task.design.params_config[\"tunable\"])), dtype=int)\n        if self.params[\"policy\"] in [0, 1]:\n            for i in range(num_pop):\n                sol = self.params[\"init_pop\"][i % len(self.params[\"init_pop\"])][\"params\"]\n                param_arr = []\n                for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                    param_arr.append(sol[param[\"name\"]])\n                population[i] = np.array(param_arr, dtype=int)\n        else:\n            raise RuntimeError(\"Unknown policy number.\")\n\n        return population\n\n    def hash_params(self, sol):\n        \"\"\" Hash the sample to string.\n        \"\"\"\n        hash_str = \"\"\n        for k, v in sol.items():\n            hash_str += f'{k}{v}'\n        return hash_str\n\n    def search_design(self, arch_sol, use_model=0, bst=None):\n        \"\"\" Search the optimal task configuration in the fixed array.\n        \"\"\"\n        network_search_record = utils.SearchRecord(self.max).reset()\n        # Update the hardware constraints\n        search_task = copy.deepcopy(self.search_task)\n        arch_cst = search_task.compute_arch_cst(arch_sol)\n        search_task.set_arch_cst(arch_cst)\n        search_task.set_arch_sol(arch_sol)\n\n        job_list = []\n        for task in search_task.tasks:\n            job_list.append({\n                'job_hash': str(task), 'func': genetic_search,\n                'args': [task, self.cst, self.search_obj, self.params[\"unit_max_epoch\"], self.params[\"unit_max_time\"], 1, None, 1, self.sub_task_silent]\n    
        })\n        pool = utils.MyExecutor(max(int(self.n_worker/2), 2))\n        results = pool.exec(job_list)\n        for task in search_task.tasks:\n            layer_record = results[str(task)]\n            network_search_record = network_search_record.append(layer_record)\n\n        network_search_record.cst = copy.deepcopy(arch_cst[\"resource\"])\n\n        return network_search_record\n\n    def search(self):\n        \"\"\" Search the design space using genetic algorithms.\n\n        The algorithm is configured by several parameters.\n        @ population_size: the number of trial solutions in each epoch.\n        @ mutation_probability: the chance of each gene in each individual solution\n        to be replaced by a random value.\n        @ crossover_probability: the chance of an existed solution to pass its genome\n        to new trial solutions.\n        @ parents_ratio: the ratio of population filled by the members of the previous\n        generation.\n        \"\"\"\n        self.counter.init_counter('time')\n        self.counter.init_counter('converge_time')\n        self.epoch = 0\n\n        # Init the stats\n        num_pop = int(self.params[\"population_size\"])\n        num_gen = int(self.max_epoch // num_pop)\n        num_parents = int(num_pop * self.params[\"parents_ratio\"])\n        if self.params[\"use_ml_model\"]:\n            num_hw_parents = int(num_pop * self.params[\"hw_parents_ratio\"])\n        else:\n            num_hw_parents = 0\n        self.log(f'Number of generations: {num_gen}')\n        self.log(f'Number of population: {num_pop}')\n        self.log(f'Number of parents: {num_parents}')\n        self.log(f'Number of hw parents: {num_hw_parents}')\n\n        idx = 0\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            self.param_idx_map[param[\"name\"]] = idx\n            self.idx_param_map[idx] = param[\"name\"]\n            idx += 1\n\n        # Init the population\n        
population = self.init_population(num_pop)\n        fitness = np.empty(num_pop, dtype=float)\n\n        terminate = False\n        while True:\n            # Update the fitness\n            use_model = self.params[\"use_ml_model\"] and self.bst_data['valid'] and (self.gen % self.params['model_gens'] != 0)\n            if self.epoch > 0:\n                if use_model:\n                    num_pop = int(self.params[\"population_size\"]) * 4\n                    population = np.resize(population, (num_pop, population.shape[1]))\n                    fitness = np.resize(fitness, (num_pop))\n                    num_parents = int(num_pop * self.params[\"parents_ratio\"])\n                else:\n                    num_pop = int(self.params[\"population_size\"])\n                    population = np.resize(population, (num_pop, population.shape[1]))\n                    fitness = np.resize(fitness, (num_pop))\n                    num_parents = int(num_pop * self.params[\"parents_ratio\"])\n                if self.params[\"use_ml_model\"] and not use_model and self.bst_data['valid']:\n                    num_hw_parents = int(num_pop * self.params[\"hw_parents_ratio\"])\n                else:\n                    num_hw_parents = 0\n\n                # Select the parents\n                parents = self.select_parents(population, fitness, num_parents, num_hw_parents)\n                if parents.shape[0] == 0:\n                    break\n                # Crossover\n                children = self.crossover(parents, num_pop - parents.shape[0])\n                # Mutation\n                children = self.mutation(children)\n                # Compose the new generation\n                population[0:parents.shape[0], :] = parents\n                population[parents.shape[0]:, :] = children\n                #if use_model:\n                #    print(\"parents:\")\n                #    print(parents)\n                #    print(\"children:\")\n                #    
print(children)\n\n            job_list = []\n            results = {}\n            for i in range(num_pop):\n                idv = population[i]\n                task_params = {}\n                for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                    task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]\n                idv_hash = self.hash_params(task_params)\n                for p, param in self.search_task.design.params_config[\"external\"].items():\n                    task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n                # Note: XGBoost model has compatibility problem with multi-processing.\n                search_task = copy.deepcopy(self.search_task)\n                # Compute the architecture features\n                arch_cst = search_task.compute_arch_cst(task_params)\n                if not use_model:\n                    if idv_hash in self.search_cache:\n                        continue\n                    else:\n                        search_record = utils.SearchRecord(self.max).reset()\n                        if arch_cst:\n                            if not self.xgboost_prune(task_params, arch_cst):\n                                self.search_cache[idv_hash] = {'status': 'submit', 'value': None}\n                                job_list.append({\n                                    'job_hash': idv_hash,\n                                    'func': self.search_design,\n                                    'args': [task_params, use_model, copy.deepcopy(self.bst)]})\n                            else:\n                                results[idv_hash] = search_record\n                        else:\n                            results[idv_hash] = search_record\n                else:\n                    reward = 0\n                    if arch_cst:\n                        reward = self.xgboost_predict(task_params, arch_cst)\n                    
results[idv_hash] = reward\n\n            if len(job_list) > 0:\n                pool = utils.MyExecutor(self.n_worker)\n                pool_results = pool.exec(job_list)\n                for result in pool_results:\n                    results[result] = pool_results[result]\n\n            # Update the tuner results\n            for i in range(num_pop):\n                idv = population[i]\n                task_params = {}\n                for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                    task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]\n                idv_hash = self.hash_params(task_params)\n                if use_model:\n                    fitness[i] = results[idv_hash]\n                else:\n                    if idv_hash in self.search_cache and self.search_cache[idv_hash]['status'] == 'done':\n                        fitness[i] = self.search_cache[idv_hash]['value']\n                        continue\n                    search_record = results[idv_hash]\n                    if self.overuse_constraint(search_record.cst) or search_record.valid == 0:\n                        search_record.reward = 0\n                    #self.log(f'{search_record}')\n                    if search_record.reward > 0:\n                        if self.search_task.max_latency == -1 or \\\n                           (self.search_task.max_latency != -1 and (self.best_reward < 1 / self.search_task.max_latency)):\n                           if search_record.reward > self.best_reward:\n                                self.best_reward = search_record.reward\n                                self.best_reward_meta = search_record.reward_meta\n                                self.best_sol_cst = search_record.cst\n                                self.best_sol = {\"arch_sol\": search_record.arch_sol, \\\n                                                 \"task_sols\": search_record.task_sols}\n                              
  self.log(f'Epoch {self.epoch}: new best reward: {self.best_reward} ({1/self.best_reward:.0f})')\n                                self.last_update_epoch = self.epoch\n                                self.counter.update_counter('converge_time')\n                                self.best_search_record = search_record\n                                self.best_hw_sols.append({\"idv\": population[i], \"reward\": search_record.reward})\n                                if self.best_reward >= self.params[\"best_reward\"] * self.params[\"best_reward_thres\"]:\n                                    terminate = True\n                        else:\n                            # If max_latency is set, when the best search records\n                            # fall less than the max_latency, the tuner will only\n                            # update the records that use fewer memory resources.\n                            if search_record.cst['BRAM18K'] < self.best_search_record.cst['BRAM18K']:\n                                self.best_reward = search_record.reward\n                                self.best_reward_meta = search_record.reward_meta\n                                self.best_sol_cst = search_record.cst\n                                self.best_sol = {\"arch_sol\": search_record.arch_sol, \\\n                                                 \"task_sols\": search_record.task_sols}\n                                self.log(f'Epoch {self.epoch}: new best reward (less BRAM): {self.best_reward} ({1/self.best_reward:.0f})')\n                                self.last_update_epoch = self.epoch\n                                self.counter.update_counter('converge_time')\n                                self.best_search_record = search_record\n                                self.best_hw_sols.append({\"idv\": population[i], \"reward\": search_record.reward})\n                                if self.best_reward >= self.params[\"best_reward\"] * 
self.params[\"best_reward_thres\"]:\n                                    terminate = True\n\n                    self.best_rewards.append(self.best_reward)\n                    self.counter.update_counter('time')\n                    self.best_rewards_time.append(self.counter.get_counter('time'))\n                    fitness[i] = search_record.reward / self.params['best_reward']\n                    self.search_cache[idv_hash] = {'status': 'done', 'value': fitness[i]}\n                    if terminate:\n                        break\n                self.epoch += 1\n\n            #if use_model:\n            #    print(\"fitness\")\n            #    print(fitness)\n\n            if self.params[\"one_gen\"]:\n                break\n\n            if self.search_task.max_latency != -1 and self.best_search_record.latency < self.search_task.max_latency:\n                break\n\n            # Add training samples\n            if not use_model and self.params[\"use_ml_model\"]:\n                for result in results:\n                    search_record = results[result]\n                    if self.params[\"best_reward\"] and search_record.valid:\n                        arch_cst = self.search_task.compute_arch_cst(search_record.arch_sol)\n                        if search_record.reward > 0:\n                            self.xgboost_add_sample(search_record.arch_sol, arch_cst, search_record.reward / self.params['best_reward'])\n                        else:\n                            self.xgboost_add_sample(search_record.arch_sol, arch_cst, 0)\n\n            # Train the cost model\n            if not use_model and self.params[\"use_ml_model\"]:\n                self.xgboost_train()\n                # Adjust the cost model threshold dynamically\n                if self.best_search_record.valid:\n                    arch_sol = self.best_search_record.arch_sol\n                    arch_cst = self.search_task.compute_arch_cst(arch_sol)\n                    pred = 
self.xgboost_predict(arch_sol, arch_cst)\n                    self.params['prune_params']['xgb_thres'] = pred * self.params['prune_params']['xgb_thres_adjust']\n                    self.log(f'Updated XGB pruning thres: {self.params[\"prune_params\"][\"xgb_thres\"]}')\n\n            self.gen += 1\n            # Uncomment it if profiling the cost model\n            #print(self.bst_data['num'])\n\n            if self.stop_criteria == \"epoch\" and self.epoch > self.max_epoch:\n                break\n            if self.stop_criteria == \"time\":\n                self.counter.update_counter('time')\n                if self.counter.get_counter('time') > self.max_time:\n                    break\n            if terminate:\n                break\n\n        return\n\ndef all_fuse_genetic_search(search_task, init_tasks, cst, search_obj, max_epochs, max_time, \\\n                            n_worker=1, silent=0, population_size=20, policy=0, explorer=None):\n    \"\"\" This function finds the best array architecture for a list of tasks.\n    Init_tasks include the search records for each single task.\n    All the tasks are fused.\n    Policy 0: We search the best config to minimize the latency of the last task, and\n    use it as the array config to search for the best config for the rest of the layers.\n    Then, we perform several epochs of genetic search on top of the arch config.\n    \"\"\"\n    import logging\n    logger = logging.getLogger('AutoSA-Tuner')\n    if silent == 0:\n        logger.info(\"Performing cross layer all-fusion genetic search...\")\n\n    # If init_tasks are provided, use them as the initial population,\n    # otherwise, the architecture is fixed. 
Use the fixed arch sol instead.\n    init_pop_record = []\n    best_latency = None\n    if search_task.fixed == 1:\n        init_pop_record.append({\n            'latency': -1, 'ops': -1, 'params': search_task.arch_sol\n        })\n        # Try to search for the last layer under the fixed constraints and add it\n        # as the candidate sample\n        last_task = copy.deepcopy(search_task.tasks[-1])\n        last_task.fuse = 1\n        last_task.last_fuse = 1\n        if last_task.use_uram:\n            last_task.configs['cin_read_mode'] = 3\n        else:\n            last_task.configs['cin_read_mode'] = 2\n        last_task.configs['cout_write_mode'] = 0\n        last_task.set_aux_func('update_cin_latency', 'update_cin_latency_last')\n        if last_task.use_uram == 0:\n            last_task.set_aux_func('update_cin_buf', 'update_cin_buf_bram_last')\n        else:\n            last_task.set_aux_func('update_cin_buf', 'update_cin_buf_uram_last')\n        local_silent = silent\n        if silent == 0:\n            local_silent = 1 if n_worker > 1 else 0\n        job_list = []\n        for repeat in range(3):\n            job_list.append({'job_hash': f'{str(last_task)}_{repeat}', 'func': explorer.tune, \\\n                             'args': [last_task, None, local_silent, 0]})\n        pool = utils.MyExecutor(n_worker)\n        results = pool.exec(job_list)\n        for r in results:\n            if results[r].valid:\n                init_pop_record.append({\n                    'latency': -1, \"ops\": -1, 'params': results[r].task_sols[0]['sol']\n                })\n    else:\n        for record in init_tasks:\n            task_hash = record.task_sols[0]['hash']\n            init_pop_record.append({\n                'latency': record.latency,\n                'ops': record.task_sols[0]['ops'],\n                'params': record.task_sols[0]['sol'],\n                'flops': record.task_sols[0]['ops'] / record.latency\n            })\n\n        best_latency = 
utils.compute_tasks_latency(search_task.tasks, init_tasks)\n        if silent == 0:\n            logger.info(f'Cross-layer all-fusion ideal latency: {best_latency}')\n\n    tuner_params = {\n        \"population_size\": max(population_size, len(init_pop_record)),\n        \"mutation_probability\": 1.0,\n        \"parents_ratio\": 0.2,\n        \"epsilon\": 0.1,\n        \"policy\": policy,\n        \"init_pop\": init_pop_record,\n        \"unit_max_epoch\": 0,\n        \"unit_max_time\": max_time,\n        \"arch_fixed\": search_task.fixed,\n        \"best_reward\": 1 / best_latency if best_latency else None,\n        \"best_reward_thres\": 0.95, # Terminate if the reward is within xx% compared to the best reward\n        \"model_gens\": 10, # Switch to real estimates after every x gens\n        \"prune_params\": {\n            \"reward_thres\": 10, # Prune parents that is x worse than the best\n            \"xgb_n_turns\": population_size / 2, # Use XGBoost model after x epochs\n            \"xgb_thres\": 0.5, # Prune designs below x of the ideal reward\n            \"xgb_thres_adjust\": 0.8 # Adjust the updated threshold by x\n        }\n    }\n\n    if max_epochs > 0:\n        pass\n    else:\n        max_time *= (len(search_task.tasks) * tuner_params[\"population_size\"] * 3)\n        #if tuner_params[\"arch_fixed\"] == 1:\n        #    max_time = min(max_time, 60) # 60 seconds at most\n        #else:\n        #    max_time = min(max_time, 120) # 120 seconds at most\n        max_time = min(max_time, 120) # 120 seconds at most\n\n    tuner = AllFuseGeneticTuner(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n    tuner.search()\n\n    search_record = tuner.best_search_record\n\n    return search_record\n\nclass AllFuseGeneticTuner(MultiWorkloadArrayGeneticTuner):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        super().__init__(search_task, cst, obj, max_epoch, 
max_time, params, n_worker=n_worker, silent=silent)\n\n    def init_population(self, num_pop):\n        population = np.empty((num_pop, len(self.search_task.design.params_config[\"tunable\"])), dtype=int)\n        # Allocate uniformly\n        for i in range(num_pop):\n            sol = self.params[\"init_pop\"][i % len(self.params[\"init_pop\"])][\"params\"]\n            param_arr = []\n            for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                param_arr.append(sol[param[\"name\"]])\n            population[i] = np.array(param_arr, dtype=int)\n\n        return population\n\n    def update_task_configs(self, tasks):\n        \"\"\" Update the fusion task configurations.\n        \"\"\"\n        for task_idx in range(len(tasks)):\n            task = tasks[task_idx]\n            task.fuse = 1\n            if task_idx == len(tasks) - 1:\n                task.last_fuse = 1\n            if task_idx == 0:\n                if task.use_uram == 0:\n                    task.configs['cin_read_mode'] = 1 # load one time\n                else:\n                    task.configs['cin_read_mode'] = 0 # load in ping-pong fashion\n            else:\n                if task.use_uram == 0:\n                    task.configs['cin_read_mode'] = 2 # load from on-chip BRAM buffers\n                else:\n                    task.configs['cin_read_mode'] = 3 # load from on-chip URAM buffers\n            if task_idx == len(tasks) - 1:\n                task.configs['cout_write_mode'] = 0 # write to off-chip memory\n            else:\n                task.configs['cout_write_mode'] = 1 # write to on-chip buffer\n            if task_idx == len(tasks) - 1:\n                task.set_aux_func('update_cin_latency', 'update_cin_latency_last')\n                if task.use_uram == 0:\n                    task.set_aux_func('update_cin_buf', 'update_cin_buf_bram_last')\n                else:\n                    task.set_aux_func('update_cin_buf', 
'update_cin_buf_uram_last')\n            else:\n                task.set_aux_func('update_cin_latency', 'update_cin_latency')\n                if task.use_uram == 0:\n                    task.set_aux_func('update_cin_buf', 'update_cin_buf_bram')\n                else:\n                    task.set_aux_func('update_cin_buf', 'update_cin_buf_uram')\n\n    def update_fused_task_dims(self, last_sol, last_task, cur_task, partial):\n        \"\"\" Given the solution of the latter layer, update the workload dimensions of the\n        current layer.\n        For fused CNN, we have the or_t and oc_t from the latter layer.\n        We will estimate the or_t' and oc_t' of the former layer by\n        or_t' = or_t + k - 1\n        oc_t' = oc_t + k - 1\n        \"\"\"\n        if partial == 1:\n            or_t = min(last_sol['r_t1'], last_task.workload['params']['r'])\n            oc_t = min(last_sol['c_t1'], last_task.workload['params']['c'])\n        else:\n            or_t = last_task.workload['params']['r']\n            oc_t = last_task.workload['params']['c']\n\n        for tag in cur_task.workload['tags']:\n            if tag.startswith('maxpool'):\n                stride = int(tag.split('_')[-1])\n                or_t *= stride\n                oc_t *= stride\n        k = cur_task.workload['params']['p']\n        or_t_prev = or_t + k - 1\n        oc_t_prev = oc_t + k - 1\n        cur_task.workload['params']['r'] = or_t_prev\n        cur_task.workload['params']['c'] = oc_t_prev\n\n        return cur_task\n\n    def est_latency(self, layer_stats, search_task, mode=0):\n        \"\"\" Estimate the overall latency of the fused tasks.\n        If mode is 1, the last task r/c are set to 1.\n        \"\"\"\n        one_pass_latency = 0\n        for task_id in range(len(search_task.tasks)):\n            task = search_task.tasks[task_id]\n            nxt_task_id = (task_id + 1) % len(search_task.tasks)\n            if task_id == len(search_task.tasks) - 1:\n                
one_pass_latency += layer_stats[task_id].reward_meta['latency']['latency_main'] / \\\n                                    np.ceil(task.workload['params']['r'] / layer_stats[task_id].task_sols[0]['sol']['r_t1']) / \\\n                                    np.ceil(task.workload['params']['c'] / layer_stats[task_id].task_sols[0]['sol']['c_t1']) + \\\n                                    max(layer_stats[nxt_task_id].reward_meta['latency']['latency_prologue'], layer_stats[task_id].reward_meta['latency']['latency_epilogue'])\n            else:\n                one_pass_latency += layer_stats[task_id].reward_meta['latency']['latency_main'] + \\\n                                    max(layer_stats[nxt_task_id].reward_meta['latency']['latency_prologue'],\n                                        layer_stats[task_id].reward_meta['latency']['latency_epilogue'])\n        last_task = search_task.tasks[-1]\n        if mode == 1:\n            # Revert back\n            last_task.workload[\"params\"]['r'] = last_task.workload[\"params\"]['old_r']\n            last_task.workload[\"params\"]['c'] = last_task.workload[\"params\"]['old_c']\n\n        total_latency = np.ceil(last_task.workload['params']['r'] / layer_stats[-1].task_sols[0]['sol']['r_t1']) * \\\n                        np.ceil(last_task.workload['params']['c'] / layer_stats[-1].task_sols[0]['sol']['c_t1']) * \\\n                        one_pass_latency\n        total_latency += layer_stats[0].reward_meta['latency']['latency_prologue']\n\n        return total_latency\n\n    def est_off_chip_trans(self, layer_stats, search_task, mode=0):\n        \"\"\" Compute the total off-chip transactions.\n        \"\"\"\n        total_trans = 0\n        one_pass_trans = 0\n        for task_id in range(len(search_task.tasks) - 1):\n            task = search_task.tasks[task_id]\n            layer_stat = layer_stats[task_id]\n            sol = layer_stat.task_sols[0]['sol']\n            if task_id == 0:\n                # Read cin, weights 
off-chip, write cout on-chip\n                one_pass_trans += np.ceil(task.workload['params']['i'] / sol['i_t1']) * \\\n                                  np.ceil(task.workload['params']['o'] / sol['o_t1']) * \\\n                                  np.ceil(task.workload['params']['r'] / sol['r_t1']) * \\\n                                  np.ceil(task.workload['params']['c'] / sol['c_t1']) * \\\n                                  (sol['i_t1'] * sol['r_t1'] * sol['c_t1'] + sol['i_t1'] * sol['o_t1'] * task.workload['params']['p'] * task.workload['params']['q'])\n            else:\n                # Read cin on-chip, weights off-chip, write cout on-chip\n                one_pass_trans += np.ceil(task.workload['params']['i'] / sol['i_t1']) * \\\n                                  np.ceil(task.workload['params']['o'] / sol['o_t1']) * \\\n                                  np.ceil(task.workload['params']['r'] / sol['r_t1']) * \\\n                                  np.ceil(task.workload['params']['c'] / sol['c_t1']) * \\\n                                  (sol['i_t1'] * sol['o_t1'] * task.workload['params']['p'] * task.workload['params']['q'])\n        last_task = search_task.tasks[-1]\n        if mode == 1:\n            # Revert back\n            last_task.workload[\"params\"]['r'] = last_task.workload[\"params\"]['old_r']\n            last_task.workload[\"params\"]['c'] = last_task.workload[\"params\"]['old_c']\n\n        # The number of passes is determined by the tile size of the last task\n        sol = layer_stats[-1].task_sols[0]['sol']\n        total_trans = np.ceil(last_task.workload[\"params\"]['r'] / sol['r_t1']) * \\\n                      np.ceil(last_task.workload[\"params\"]['c'] / sol['c_t1']) * one_pass_trans\n        # Last task, read cin on-chip, weights off-chip, write cout off-chip\n        total_trans += np.ceil(last_task.workload['params']['i'] / sol['i_t1']) * \\\n                       np.ceil(last_task.workload['params']['o'] / sol['o_t1']) * \\\n                       np.ceil(last_task.workload['params']['r'] / sol['r_t1']) * 
\\\n                       np.ceil(last_task.workload['params']['c'] / sol['c_t1']) * \\\n                       (sol['i_t1'] * sol['o_t1'] * last_task.workload['params']['p'] * last_task.workload['params']['q']) + \\\n                       np.ceil(last_task.workload['params']['o'] / sol['o_t1']) * \\\n                       np.ceil(last_task.workload['params']['r'] / sol['r_t1']) * \\\n                       np.ceil(last_task.workload['params']['c'] / sol['c_t1']) * \\\n                       sol['o_t1'] * sol['r_t1'] * sol['c_t1']\n\n        return total_trans\n\n    def search_fixed_design(self, last_layer_sol, use_model=0, bst=None):\n        \"\"\" This function takes a fixed array and the solution of the last layer,\n        searches the config of the rest of the layers.\n        \"\"\"\n        network_search_record = utils.SearchRecord(self.max).reset()\n        # Update the hardware constraints\n        search_task = copy.deepcopy(self.search_task)\n        # Update the task configs\n        self.update_task_configs(search_task.tasks)\n\n        # Update the workload parameters\n        for p in search_task.tasks[-1].workload[\"params\"]:\n            last_layer_sol[p] = search_task.tasks[-1].workload[\"params\"][p]\n\n        last_sol = last_layer_sol\n        last_task = search_task.tasks[-1]\n\n        succeed = True\n        layer_stats = []\n        total_ops = 0\n        for task in search_task.tasks:\n            total_ops += task.compute_ops()\n        # Build the record of the last layer\n        reward, used_constraint, reward_meta = last_task.evaluate(last_layer_sol, self.search_obj)\n\n        if self.overuse_constraint(used_constraint):\n            reward = 0\n            return network_search_record\n        record = utils.SearchRecord(self.max).reset()\n        record.valid = 1\n        record.metric = self.search_obj\n        record.cst = used_constraint\n        record.reward = reward\n        record.reward_meta = reward_meta\n        
record.latency = 1 / reward\n        record.ops = last_task.compute_ops()\n        record.task_names = [last_task.workload[\"name\"]]\n        record.arch_sol = last_task.arch_sol\n        record.task_sols = [{\n            \"name\": last_task.workload[\"name\"],\n            \"hash\": str(last_task),\n            \"ops\": last_task.compute_ops(),\n            \"sol\": last_layer_sol,\n            \"latency\": record.latency,\n            \"DSP_eff\": 0,\n            #\"reward_meta\": reward_meta,\n            \"BW\": 0\n        }]\n        record.records = None\n        layer_stats.append(record)\n        network_search_record = network_search_record.append(record)\n\n        for task_idx in range(len(search_task.tasks) - 2, -1, -1):\n            task = search_task.tasks[task_idx]\n            # Update the task desp\n            task = self.update_fused_task_dims(last_sol, last_task, task, 1 if task_idx == len(search_task.tasks) - 2 else 0)\n            search_record = genetic_search(task, self.cst, self.search_obj, self.params[\"unit_max_epoch\"], self.params[\"unit_max_time\"], 1, None, 1, self.sub_task_silent)\n            if search_record.valid == 0:\n                succeed = False\n                break\n            last_sol = search_record.task_sols[0]['sol']\n            last_task = task\n            network_search_record = network_search_record.append(search_record)\n            # Update the resource constraints\n            if task.use_uram == 0:\n                if search_record.cst[\"BRAM18K\"] > network_search_record.cst[\"BRAM18K\"]:\n                    network_search_record.cst = search_record.cst\n            else:\n                if search_record.cst[\"URAM\"] > network_search_record.cst[\"URAM\"]:\n                    network_search_record.cst = search_record.cst\n            layer_stats.insert(0, search_record)\n\n        network_search_record.fuse = 1\n        if succeed:\n            total_latency = self.est_latency(layer_stats, 
search_task)\n            network_search_record.reward = 1 / total_latency\n            network_search_record.latency = total_latency\n        else:\n            network_search_record.valid = 0\n\n        return network_search_record\n\n    def search_design1(self, arch_sol, use_model=0, bst=None):\n        \"\"\" This function searches from the last layer, and uses the\n        solution from the latter layer to allocate the fusion task of the previous layer.\n        It tends to allocate large tiles for the latter layers, which may\n        lead to large tiles for the early layers, resulting in no solution.\n        \"\"\"\n        network_search_record = utils.SearchRecord(self.max).reset()\n        # Update the hardware constraints\n        search_task = copy.deepcopy(self.search_task)\n        arch_cst = search_task.compute_arch_cst(arch_sol)\n        search_task.set_arch_cst(arch_cst)\n        search_task.set_arch_sol(arch_sol)\n\n        last_sol = None\n        last_task = None\n        succeed = True\n        layer_stats = []\n        total_ops = 0\n        for task in search_task.tasks:\n            total_ops += task.compute_ops()\n\n        # Update the task configs\n        self.update_task_configs(search_task.tasks)\n\n        for task_idx in range(len(search_task.tasks) - 1, -1, -1):\n            task = search_task.tasks[task_idx]\n            if task_idx < len(search_task.tasks) - 1:\n                # Update the task desp\n                task = self.update_fused_task_dims(last_sol, last_task, task, 1 if task_idx == len(search_task.tasks) - 2 else 0)\n            search_record = genetic_search(task, self.cst, self.search_obj, self.params[\"unit_max_epoch\"], self.params[\"unit_max_time\"], 1, None, 1, self.sub_task_silent)\n            if search_record.valid == 0:\n                succeed = False\n                break\n            last_sol = search_record.task_sols[0]['sol']\n            last_task = task\n            network_search_record = 
network_search_record.append(search_record)\n            # Update the resource constraints\n            if task.use_uram == 0:\n                if search_record.cst[\"BRAM18K\"] > network_search_record.cst[\"BRAM18K\"]:\n                    network_search_record.cst = search_record.cst\n            else:\n                if search_record.cst[\"URAM\"] > network_search_record.cst[\"URAM\"]:\n                    network_search_record.cst = search_record.cst\n            layer_stats.insert(0, search_record)\n\n        network_search_record.fuse = 1\n        if succeed:\n            total_latency = self.est_latency(layer_stats, search_task)\n            total_off_chip_trans = self.est_off_chip_trans(layer_stats, search_task)\n            network_search_record.reward = 1 / total_latency\n            network_search_record.latency = total_latency\n            network_search_record.ctc = total_ops / (total_off_chip_trans * search_task.dw)\n        else:\n            network_search_record.valid = 0\n\n        return network_search_record\n\n    def search_design2(self, arch_sol, use_model=0, bst=None):\n        \"\"\" This function searches from the last layer, and uses the\n        solution from the latter layer to allocate the fusion task of the previous layer.\n        The tile size of the last layer is fixed to 1x1.\n        \"\"\"\n        network_search_record = utils.SearchRecord(self.max).reset()\n        # Update the hardware constraints\n        search_task = copy.deepcopy(self.search_task)\n        arch_cst = search_task.compute_arch_cst(arch_sol)\n        search_task.set_arch_cst(arch_cst)\n        search_task.set_arch_sol(arch_sol)\n\n        last_sol = None\n        last_task = None\n        succeed = True\n        layer_stats = []\n        total_ops = 0\n        for task in search_task.tasks:\n            total_ops += task.compute_ops()\n\n        # Update the task configs\n        self.update_task_configs(search_task.tasks)\n\n        for task_idx in 
range(len(search_task.tasks) - 1, -1, -1):\n            task = search_task.tasks[task_idx]\n            if task_idx == len(search_task.tasks) - 1:\n                # Fix the r/c to 1\n                task.workload[\"params\"]['old_r'] = task.workload[\"params\"]['r']\n                task.workload[\"params\"]['old_c'] = task.workload[\"params\"]['c']\n                task.workload[\"params\"]['r'] = 1\n                task.workload[\"params\"]['c'] = 1\n            else:\n                # Update the task desp\n                task = self.update_fused_task_dims(last_sol, last_task, task, 1 if task_idx == len(search_task.tasks) - 2 else 0)\n            search_record = genetic_search(task, self.cst, self.search_obj, self.params[\"unit_max_epoch\"], self.params[\"unit_max_time\"], 1, None, 1, self.sub_task_silent)\n            if search_record.valid == 0:\n                succeed = False\n                break\n            last_sol = search_record.task_sols[0]['sol']\n            last_task = task\n            network_search_record = network_search_record.append(search_record)\n            # Update the resource constraints\n            if task.use_uram == 0:\n                if search_record.cst[\"BRAM18K\"] > network_search_record.cst[\"BRAM18K\"]:\n                    network_search_record.cst = search_record.cst\n            else:\n                if search_record.cst[\"URAM\"] > network_search_record.cst[\"URAM\"]:\n                    network_search_record.cst = search_record.cst\n            layer_stats.insert(0, search_record)\n\n        network_search_record.fuse = 1\n        if succeed:\n            total_latency = self.est_latency(layer_stats, search_task, mode=1)\n            total_off_chip_trans = self.est_off_chip_trans(layer_stats, search_task, mode=1)\n            network_search_record.reward = 1 / total_latency\n            network_search_record.latency = total_latency\n            network_search_record.ctc = total_ops / (total_off_chip_trans * 
search_task.dw)\n        else:\n            network_search_record.valid = 0\n\n        return network_search_record\n\n    def search(self):\n        self.counter.init_counter('time')\n        self.counter.init_counter('converge_time')\n        self.epoch = 0\n\n        # Init the stats\n        num_pop = int(self.params[\"population_size\"])\n        num_gen = int(self.max_epoch // num_pop)\n        num_parents = int(num_pop * self.params[\"parents_ratio\"])\n        self.log(f'Number of generations: {num_gen}')\n        self.log(f'Number of population: {num_pop}')\n        self.log(f'Number of parents: {num_parents}')\n\n        idx = 0\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            self.param_idx_map[param[\"name\"]] = idx\n            self.idx_param_map[idx] = param[\"name\"]\n            idx += 1\n\n        # Init the population\n        population = self.init_population(num_pop)\n        fitness = np.empty(num_pop, dtype=float)\n\n        terminate = False\n        while True:\n            if self.epoch > 0:\n                # Select the parents\n                parents = self.select_parents(population, fitness, num_parents)\n                if parents.shape[0] == 0:\n                    break\n                # Crossover\n                children = self.crossover(parents, num_pop - parents.shape[0])\n                # Mutation\n                children = self.mutation(children)\n                # Compose the new generation\n                population[0:parents.shape[0], :] = parents\n                population[parents.shape[0]:, :] = children\n\n            # Update the fitness\n            use_model = self.bst_data['valid'] and (self.gen % self.params['model_gens'] != 0)\n            job_list = []\n            results = {}\n            for i in range(num_pop):\n                idv = population[i]\n                task_params = {}\n                for p, param in 
self.search_task.design.params_config[\"tunable\"].items():\n                    task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]\n                idv_hash = self.hash_params(task_params)\n                for p, param in self.search_task.design.params_config[\"external\"].items():\n                    task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n                # Note: XGBoost model has compatibility problem with multi-processing.\n                search_task = copy.deepcopy(self.search_task)\n                # Compute the architecture features\n                arch_cst = search_task.compute_arch_cst(task_params)\n                if not use_model:\n                    if idv_hash in self.search_cache:\n                        continue\n                    else:\n                        search_record = utils.SearchRecord(self.max).reset()\n                        if arch_cst:\n                            if not self.xgboost_prune(task_params, arch_cst):\n                                self.search_cache[idv_hash] = {'status': 'submit', 'value': None}\n                                if self.params[\"arch_fixed\"] == 0:\n                                    job_list.append({\n                                        'job_hash': idv_hash,\n                                        'func': self.search_design1 if self.params['policy'] == 0 else self.search_design2,\n                                        'args': [task_params, use_model, copy.deepcopy(self.bst)]})\n                                else:\n                                    job_list.append({\n                                        'job_hash': idv_hash,\n                                        'func': self.search_fixed_design,\n                                        'args': [task_params, use_model, copy.deepcopy(self.bst)]})\n                            else:\n                                results[idv_hash] = search_record\n                   
     else:\n                            results[idv_hash] = search_record\n                else:\n                    reward = 0\n                    if arch_cst:\n                        reward = self.xgboost_predict(task_params, arch_cst)[0]\n                    results[idv_hash] = reward\n\n            if len(job_list) > 0:\n                pool = utils.MyExecutor(self.n_worker)\n                pool_results = pool.exec(job_list)\n                for result in pool_results:\n                    results[result] = pool_results[result]\n\n            # Update the tuner results\n            for i in range(num_pop):\n                idv = population[i]\n                task_params = {}\n                for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                    task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]\n                idv_hash = self.hash_params(task_params)\n                if use_model:\n                    fitness[i] = results[idv_hash]\n                else:\n                    if idv_hash in self.search_cache and self.search_cache[idv_hash]['status'] == 'done':\n                        fitness[i] = self.search_cache[idv_hash]['value']\n                        continue\n                    search_record = results[idv_hash]\n                    if search_record.valid == 0 or self.overuse_constraint(search_record.cst):\n                        search_record.reward = 0\n                    if search_record.reward > 0:\n                        if search_record.reward > self.best_reward:\n                            self.best_reward = search_record.reward\n                            self.best_reward_meta = search_record.reward_meta\n                            self.best_sol_cst = search_record.cst\n                            self.best_sol = {\"arch_sol\": search_record.arch_sol, \\\n                                             \"task_sols\": search_record.task_sols}\n                            
self.log(f'Epoch {self.epoch}: new best reward: {self.best_reward} ({1/self.best_reward:.0f})')\n                            self.last_update_epoch = self.epoch\n                            self.counter.update_counter('converge_time')\n                            self.best_search_record = search_record\n                            if self.params[\"arch_fixed\"] == 1:\n                                if not self.params[\"best_reward\"]:\n                                    self.params[\"best_reward\"] = search_record.reward\n                            else:\n                                if self.best_reward >= self.params[\"best_reward\"] * self.params[\"best_reward_thres\"]:\n                                    terminate = True\n\n                    self.best_rewards.append(self.best_reward)\n                    if not self.params[\"best_reward\"]:\n                        if search_record.reward == 0:\n                            fitness[i] = 0\n                        else:\n                            raise RuntimeError(\"Best reward is not set.\")\n                    else:\n                        fitness[i] = search_record.reward / self.params['best_reward']\n                    self.search_cache[idv_hash] = {'status': 'done', 'value': fitness[i]}\n                    if terminate:\n                        break\n                self.epoch += 1\n\n            # Add training samples\n            if not use_model:\n                for result in results:\n                    search_record = results[result]\n                    if self.params[\"best_reward\"] and search_record.valid:\n                        arch_cst = self.search_task.compute_arch_cst(search_record.arch_sol)\n                        if search_record.reward > 0:\n                            self.xgboost_add_sample(search_record.arch_sol, arch_cst, search_record.reward / self.params['best_reward'])\n                        else:\n                            
self.xgboost_add_sample(search_record.arch_sol, arch_cst, 0)\n\n            # Train the cost model\n            if not use_model:\n                self.xgboost_train()\n                # Adjust the cost model threshold dynamically\n                if self.best_search_record.valid:\n                    arch_sol = self.best_search_record.arch_sol\n                    arch_cst = self.search_task.compute_arch_cst(arch_sol)\n                    pred = self.xgboost_predict(arch_sol, arch_cst)\n                    self.params['prune_params']['xgb_thres'] = pred * self.params['prune_params']['xgb_thres_adjust']\n                    self.log(f'Updated XGB pruning thres: {self.params[\"prune_params\"][\"xgb_thres\"]}')\n\n            self.gen += 1\n\n            #exit(0)\n            if self.stop_criteria == \"epoch\" and self.epoch > self.max_epoch:\n                break\n            if self.stop_criteria == \"time\":\n                self.counter.update_counter('time')\n                if self.counter.get_counter('time') > self.max_time:\n                    break\n            if terminate:\n                break\n\n        return\n\ndef fuse_genetic_search(search_task, init_tasks, cst, search_obj, max_epochs, max_time, \\\n                        n_worker=1, silent=0, population_size=20, policy=0, meta=None, explorer=None):\n    \"\"\" This function finds the best fused array architecture for a list of tasks.\n    Init_tasks include the search records for each single task.\n    \"\"\"\n    import logging\n    logger = logging.getLogger('AutoSA-Tuner')\n    if silent == 0:\n        logger.info(\"Performing cross layer partial-fusion genetic search...\")\n\n    best_latency = utils.compute_tasks_latency(search_task.tasks, init_tasks)\n    if silent == 0:\n        logger.info(f'Cross-layer partial-fusion ideal latency: {best_latency}')\n\n    thres = 0.5\n    def takeFLOPS(elem):\n        return elem['flops']\n    multi_task_records = []\n    single_task_records = []\n    
for record in init_tasks:\n        if record.valid == 0:\n            continue\n        if len(record.task_sols) > 1:\n            multi_task_records.append(record)\n        else:\n            single_task_records.append(record)\n    init_pop_record = []\n    for record in single_task_records:\n        if record.valid == 0:\n            continue\n        init_pop_record.append({\n            'latency': record.latency,\n            'ops': record.task_sols[0]['ops'],\n            'params': record.task_sols[0]['sol'],\n            'flops': record.task_sols[0]['ops'] / record.latency\n        })\n    init_pop_record.sort(key=takeFLOPS, reverse=True)\n    prune_idx = len(init_pop_record)\n    prune_flops = init_pop_record[0]['flops'] * thres\n    for i in range(len(init_pop_record)):\n        if init_pop_record[i]['flops'] < prune_flops:\n            prune_idx = i\n            break\n    init_pop_record = init_pop_record[:prune_idx]\n\n    for record in multi_task_records:\n        init_pop_record.insert(0, {\n            'latency': record.latency,\n            'ops': 0,\n            'params': record.arch_sol\n        })\n\n    tuner_params = {\n        \"population_size\": max(population_size, len(init_pop_record)),\n        \"mutation_probability\": 1.0,\n        \"parents_ratio\": 0.2,\n        \"epsilon\": 0.1,\n        \"policy\": policy,\n        \"init_pop\": init_pop_record,\n        \"unit_max_epoch\": 0,\n        \"unit_max_time\": max_time,\n        \"explorer\": explorer,\n        \"best_reward\": 1 / best_latency if best_latency else None,\n        \"best_reward_thres\": 0.95, # Terminate once the reward reaches xx% of the best reward\n        \"model_gens\": 10, # Switch to real estimates after every x gens\n        \"prune_params\": {\n            \"reward_thres\": 10, # Prune parents that are x times worse than the best\n            \"xgb_n_turns\": population_size / 2, # Use XGBoost model after x epochs\n            \"xgb_thres\": 0.5, # Prune designs 
below x of the ideal reward\n            \"xgb_thres_adjust\": 0.8 # Adjust the updated threshold by x\n        }\n    }\n\n    if meta:\n        tuner_params[\"fusion_candidates\"] = meta[\"fusion_candidates\"]\n\n    if max_epochs > 0:\n        pass\n    else:\n        max_time *= (len(search_task.tasks) * tuner_params[\"population_size\"] * 3)\n        max_time = min(max_time, 600) # 600 seconds at most\n\n    tuner = FuseGeneticTuner(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n    tuner.search()\n\n    search_record = tuner.best_search_record\n\n    return search_record\n\nclass FuseDPTuner(object):\n    def __init__(self, config, tasks, cst, n_worker=1):\n        self.config = config\n        self.tasks = tasks\n        self.cst = cst\n        self.n_worker = n_worker\n\n    def hash_dp_task(self, tasks):\n        ret = \"\"\n        for task in tasks:\n            ret += str(task)\n        return ret\n\n    def DP(self, cur_tasks, cut_idx):\n        num_tasks = len(cur_tasks)\n        search_record = utils.SearchRecord().reset()\n\n        if num_tasks == 1:\n            new_task = copy.deepcopy(cur_tasks[0])\n            new_task.set_arch_cst(copy.deepcopy(self.config['arch_cst']))\n            new_task.set_arch_sol(new_task.arch_sol)\n            new_task.fuse = 0\n            if str(new_task) in self.config['search_jobs'] and self.config['search_jobs'][str(new_task)]['done'] == 1:\n                search_record = self.config['search_jobs'][str(new_task)]['search_record'].dup()\n                # Correct the task names since cache is used\n                search_record.task_names = [cur_tasks[0].workload[\"name\"]]\n                search_record.exec_model = [cur_tasks[0].workload[\"name\"]]\n                search_record.records = None\n            else:\n                # Submit the task\n                self.config['search_jobs'][str(new_task)] = {'search_task': new_task, 'done': 0}\n        elif cut_idx == 
num_tasks:\n            task_names = []\n            exec_model = []\n            for task in cur_tasks:\n                task_names.append(task.workload[\"name\"])\n                exec_model.append(task.workload[\"name\"])\n            task_names_str = ''.join(task_names)\n            if \"fusion_candidates\" in self.config.keys():\n                # Only fuse the promising candidates\n                if task_names_str not in self.config['fusion_candidates']:\n                    return search_record\n            cur_tasks = copy.deepcopy(cur_tasks)\n            new_task = MultiTask(cur_tasks[0].design, cur_tasks, self.cst, fuse=2, use_uram=self.config['explorer'].search_config[\"use_uram\"])\n            new_task.set_arch_cst(copy.deepcopy(self.config['arch_cst']))\n            new_task.set_arch_sol(cur_tasks[0].arch_sol)\n            if str(new_task) in self.config['search_jobs'] and self.config['search_jobs'][str(new_task)]['done'] == 1:\n                search_record = self.config['search_jobs'][str(new_task)]['search_record'].dup()\n                # Correct the task names since cache is used\n                search_record.task_names = task_names\n                search_record.exec_model = exec_model\n            else:\n                self.config['search_jobs'][str(new_task)] = {'search_task': new_task, 'done': 0}\n        else:\n            for cut_idx in range(1, num_tasks + 1):\n                # Front\n                front = cur_tasks[:cut_idx]\n                front_hash = self.hash_dp_task(front)\n                if front_hash in self.config['DP_tasks']:\n                    search_record_front = self.config['DP_tasks'][front_hash].dup()\n                    # Update the task names\n                    task_names = []\n                    for task in front:\n                        task_names.append(task.workload[\"name\"])\n                    search_record_front.task_names = task_names\n                else:\n                    search_record_front 
= self.DP(front, cut_idx)\n                    self.config['DP_tasks'][front_hash] = search_record_front\n\n                if (cut_idx < num_tasks) and (self.mode == \"submit\" or \\\n                   (self.mode == \"aggregate\" and search_record_front.valid == 1)):\n                    # Back\n                    back = cur_tasks[cut_idx:]\n                    back_hash = self.hash_dp_task(back)\n                    if back_hash in self.config['DP_tasks']:\n                        search_record_back = self.config['DP_tasks'][back_hash].dup()\n                        # Update the task names\n                        task_names = []\n                        for task in back:\n                            task_names.append(task.workload[\"name\"])\n                        search_record_back.task_names = task_names\n                    else:\n                        search_record_back = self.DP(back, cut_idx)\n                        self.config['DP_tasks'][back_hash] = search_record_back\n\n                    local_search_record = utils.SearchRecord().reset().merge(search_record_front, search_record_back)\n                else:\n                    local_search_record = search_record_front\n\n                # Update the task names\n                task_names = []\n                for task in cur_tasks:\n                    task_names.append(task.workload[\"name\"])\n                local_search_record.task_names = task_names\n                search_record.update(local_search_record)\n\n        return search_record\n\n    def exec(self):\n        job_list = []\n        for job in self.config['search_jobs']:\n            explorer = copy.deepcopy(self.config['explorer'])\n            # Reduce the maximal forked processes\n            explorer.search_config['n_worker'] = max(int(self.n_worker / 2), 2)\n            job_list.append(\n                {'job_hash': job, 'func': explorer.tune,\n                 'args': [self.config['search_jobs'][job]['search_task'], None, 
1, 0]}\n            )\n        pool = utils.MyExecutor(max(int(self.n_worker / 2), 2))\n        results = pool.exec(job_list)\n\n        for job in self.config['search_jobs']:\n            self.config['search_jobs'][job]['done'] = 1\n            self.config['search_jobs'][job]['search_record'] = results[job]\n\n    def search(self):\n        # Submit all DP tasks\n        self.mode = \"submit\"\n        self.DP(self.tasks, -1)\n        # Execute tasks\n        self.exec()\n        self.config['DP_tasks'] = {}\n        # Collect the results\n        self.mode = \"aggregate\"\n        search_record = self.DP(self.tasks, -1)\n\n        return search_record\n\nclass FuseGeneticTuner(MultiWorkloadArrayGeneticTuner):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        super().__init__(search_task, cst, obj, max_epoch, max_time, params, n_worker=n_worker, silent=silent)\n\n    def init_population(self, num_pop):\n        population = np.empty((num_pop, len(self.search_task.design.params_config[\"tunable\"])), dtype=int)\n        # Allocate uniformly\n        for i in range(num_pop):\n            sol = self.params[\"init_pop\"][i % len(self.params[\"init_pop\"])][\"params\"]\n            param_arr = []\n            for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                param_arr.append(sol[param[\"name\"]])\n            population[i] = np.array(param_arr, dtype=int)\n\n        return population\n\n    def search_design(self, arch_sol, use_model=0, bst=None):\n        network_search_record = utils.SearchRecord(self.max).reset()\n        # Update the hardware constraints\n        search_task = copy.deepcopy(self.search_task)\n        arch_cst = search_task.compute_arch_cst(arch_sol)\n        search_task.set_arch_cst(arch_cst)\n        search_task.set_arch_sol(arch_sol)\n\n        # Dynamic programming\n        dp_config = {\n            \"explorer\": self.params[\"explorer\"],\n     
       \"arch_cst\": arch_cst,\n            \"DP_tasks\": {},\n            \"search_jobs\": {}\n        }\n        if \"fusion_candidates\" in self.params:\n            dp_config[\"fusion_candidates\"] = self.params[\"fusion_candidates\"]\n\n        DP_tuner = FuseDPTuner(dp_config, search_task.tasks, self.cst, self.n_worker)\n        network_search_record.update(DP_tuner.search())\n\n        return network_search_record\n\n    def search(self):\n        self.counter.init_counter('time')\n        self.counter.init_counter('converge_time')\n        self.epoch = 0\n\n        # Init the stats\n        num_pop = int(self.params[\"population_size\"])\n        num_gen = int(self.max_epoch // num_pop)\n        num_parents = int(num_pop * self.params[\"parents_ratio\"])\n        self.log(f'Number of generations: {num_gen}')\n        self.log(f'Number of population: {num_pop}')\n        self.log(f'Number of parents: {num_parents}')\n\n        idx = 0\n        for p, param in self.search_task.design.params_config[\"tunable\"].items():\n            self.param_idx_map[param[\"name\"]] = idx\n            self.idx_param_map[idx] = param[\"name\"]\n            idx += 1\n\n        # Init the population\n        population = self.init_population(num_pop)\n        fitness = np.empty(num_pop, dtype=float)\n\n        terminate = False\n        while True:\n            if self.epoch > 0:\n                # Select the parents\n                parents = self.select_parents(population, fitness, num_parents)\n                if parents.shape[0] == 0:\n                    break\n                # Crossover\n                children = self.crossover(parents, num_pop - parents.shape[0])\n                # Mutation\n                children = self.mutation(children)\n                # Compose the new generation\n                population[0:parents.shape[0], :] = parents\n                population[parents.shape[0]:, :] = children\n\n            # Update the fitness\n            use_model = 
self.bst_data['valid'] and (self.gen % self.params['model_gens'] != 0)\n            job_list = []\n            results = {}\n            for i in range(num_pop):\n                idv = population[i]\n                task_params = {}\n                for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                    task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]\n                idv_hash = self.hash_params(task_params)\n                for p, param in self.search_task.design.params_config[\"external\"].items():\n                    task_params[param[\"name\"]] = self.search_task.workload[\"params\"][param[\"name\"]]\n                # Note: The XGBoost model has compatibility problems with multiprocessing.\n                search_task = copy.deepcopy(self.search_task)\n                # Compute the architecture features\n                arch_cst = search_task.compute_arch_cst(task_params)\n                if not use_model:\n                    if idv_hash in self.search_cache:\n                        continue\n                    else:\n                        search_record = utils.SearchRecord(self.max).reset()\n                        if arch_cst:\n                            if not self.xgboost_prune(task_params, arch_cst):\n                                self.search_cache[idv_hash] = {'status': 'submit', 'value': None}\n                                job_list.append({\n                                    'job_hash': idv_hash,\n                                    'func': self.search_design,\n                                    'args': [task_params, use_model, copy.deepcopy(self.bst)]})\n                            else:\n                                results[idv_hash] = search_record\n                        else:\n                            results[idv_hash] = search_record\n                else:\n                    reward = 0\n                    if arch_cst:\n                        reward = 
self.xgboost_predict(task_params, arch_cst)[0]\n                    results[idv_hash] = reward\n\n            if len(job_list) > 0:\n                pool = utils.MyExecutor(max(int(self.n_worker / 2), 2))\n                pool_results = pool.exec(job_list)\n                for result in pool_results:\n                    results[result] = pool_results[result]\n\n            # Update the tuner results\n            for i in range(num_pop):\n                idv = population[i]\n                task_params = {}\n                for p, param in self.search_task.design.params_config[\"tunable\"].items():\n                    task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]\n                idv_hash = self.hash_params(task_params)\n                if use_model:\n                    fitness[i] = results[idv_hash]\n                else:\n                    if idv_hash in self.search_cache and self.search_cache[idv_hash]['status'] == 'done':\n                        fitness[i] = self.search_cache[idv_hash]['value']\n                        continue\n                    search_record = results[idv_hash]\n                    if self.overuse_constraint(search_record.cst) or search_record.valid == 0:\n                        search_record.reward = 0\n                    if search_record.reward > 0:\n                        if search_record.reward > self.best_reward:\n                            self.best_reward = search_record.reward\n                            self.best_reward_meta = search_record.reward_meta\n                            self.best_sol_cst = search_record.cst\n                            self.best_sol = {\"arch_sol\": search_record.arch_sol, \\\n                                             \"task_sols\": search_record.task_sols}\n                            self.log(f'Epoch {self.epoch}: new best reward: {self.best_reward} ({1/self.best_reward:.0f})')\n                            self.last_update_epoch = self.epoch\n                      
      self.counter.update_counter('converge_time')\n                            # Update the DSP eff\n                            search_record.dsp_eff = self.search_task.compute_dsp_eff(search_record.latency, search_record.cst[\"DSP\"])\n                            self.best_search_record = search_record\n                            if self.best_reward >= self.params[\"best_reward\"] * self.params[\"best_reward_thres\"]:\n                                terminate = True\n\n                    self.best_rewards.append(self.best_reward)\n                    fitness[i] = search_record.reward / self.params['best_reward']\n                    self.search_cache[idv_hash] = {'status': 'done', 'value': fitness[i]}\n                self.epoch += 1\n\n            # Add training samples\n            if not use_model:\n                for result in results:\n                    search_record = results[result]\n                    if self.params[\"best_reward\"] and search_record.valid:\n                        arch_cst = self.search_task.compute_arch_cst(search_record.arch_sol)\n                        if search_record.reward > 0:\n                            self.xgboost_add_sample(search_record.arch_sol, arch_cst, search_record.reward / self.params['best_reward'])\n                        else:\n                            self.xgboost_add_sample(search_record.arch_sol, arch_cst, 0)\n\n            # Train the cost model\n            if not use_model:\n                self.xgboost_train()\n                # Adjust the cost model threshold dynamically\n                if self.best_search_record.valid:\n                    arch_sol = self.best_search_record.arch_sol\n                    arch_cst = self.search_task.compute_arch_cst(arch_sol)\n                    pred = self.xgboost_predict(arch_sol, arch_cst)\n                    self.params['prune_params']['xgb_thres'] = pred * self.params['prune_params']['xgb_thres_adjust']\n                    self.log(f'Updated XGB pruning 
thres: {self.params[\"prune_params\"][\"xgb_thres\"]}')\n\n            self.gen += 1\n\n            if self.stop_criteria == \"epoch\" and self.epoch > self.max_epoch:\n                break\n            if self.stop_criteria == \"time\":\n                self.counter.update_counter('time')\n                if self.counter.get_counter('time') > self.max_time:\n                    break\n            if terminate:\n                break\n\n        return\n\ndef multi_acc_search1(search_task, init_tasks, cst, search_obj, max_epochs, max_time, \\\n                      n_worker=1, silent=0, population_size=20, policy=0, meta=None, explorer=None, profiling=0):\n    \"\"\" This function finds the best multi-array architecture for a list of tasks.\n    \"\"\"\n    import logging\n    logger = logging.getLogger('AutoSA-Tuner')\n    if silent == 0:\n        logger.info(\"Performing cross layer multi-accelerator genetic search...\")\n\n    best_latency = utils.compute_tasks_latency(search_task.tasks, init_tasks)\n    if silent == 0:\n        logger.info(f'Cross-layer multi-accelerator ideal latency: {best_latency}')\n\n    partition_candidates = meta[\"partition_candidates\"]\n\n    tuner_params = {\n        \"explorer\": explorer,\n        \"probe_points\": meta[\"init_partition_candidates\"],\n        \"best_reward\": 1 / best_latency if best_latency else None,\n        \"partition_candidates\": partition_candidates,\n        \"batch_size\": meta[\"batch_size\"],\n        \"use_uram_all\": meta[\"use_uram_all\"],\n        \"dsp_eff_thres\": 0.85, # If the DSP eff is greater than this thres, no fine-tuning is required.\n        \"latency_stdev_thres\": 0.03,\n        \"reward_stdev_thres\": 0.025,\n        \"max_trial\": 3 # Terminate fine-tuning after more than x trials\n    }\n    if meta:\n        tuner_params[\"design_idx_list\"] = meta['design_idx_list']\n\n    if max_epochs > 0:\n        pass\n    else:\n        max_time = 3600 # 60 minutes at most\n\n    tuner = 
MultiAccTuner1(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n    tuner.search()\n\n    search_record = tuner.best_search_record\n\n    # For internal testing\n    now = datetime.now()\n    config_str = f\"_{explorer.search_config['workload']}_multi1\"\n    with open(f'tmp/tuning_rewards{config_str}.csv', \"w\", newline='') as f:\n        fieldnames = ['epoch', 'reward', 'time']\n        writer = csv.DictWriter(f, fieldnames=fieldnames)\n        writer.writeheader()\n        #for epoch in range(len(tuner.best_rewards)):\n        for epoch in range(len(tuner.bayopt_best_rewards)):\n            writer.writerow({'epoch': epoch, 'reward': tuner.bayopt_best_rewards[epoch], 'time': tuner.bayopt_best_rewards_time[epoch]})\n\n    return search_record\n\nclass MultiAccTuner1(Tuner):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        super().__init__(search_task, cst, obj, max_epoch, max_time, n_worker=n_worker, silent=silent)\n        self.params = params\n        self.epoch = 0\n        if max_epoch > 0:\n            self.stop_criteria = \"epoch\"\n            self.max_epoch = max_epoch\n        else:\n            self.stop_criteria = \"time\"\n            self.max_time = max_time\n        self.counter = utils.PerfCounter()\n        self.bayopt_epoch = 0\n        self.bayopt_best_rewards = []\n        self.bayopt_best_rewards_time = []\n\n        self.search_cache = {} # Store search records\n        self.search_cache_cst = {}\n        self.bay_search_log = {} # Bayesian search log\n\n    def resource_alloc(self, partition):\n        \"\"\" Allocate the initial DSP/BRAM limits.\n        The highest throughput is achieved when all arrays have similar latencies.\n        In the ideal case,\n        ops1/#DSP1 = ops2/#DSP2 = ...\n        The initial DSP is then allocated based on the #ops of each array.\n        #DSPi = opsi/ops_total * #DSP_total\n        \"\"\"\n        DSP_total = 
self.cst.hw_cst['DSP']\n        BRAM_total = self.cst.hw_cst['BRAM18K']\n        array_ops = []\n        for p in partition:\n            cur_ops = 0\n            for idx in p:\n                cur_ops += self.search_task.tasks[idx].compute_ops()\n            array_ops.append(cur_ops)\n\n        total_ops = sum(array_ops)\n        DSP_alloc = [int(n / total_ops * DSP_total) for n in array_ops]\n\n        if len(partition) == 1:\n            step = 1\n        else:\n            step = pow(2, int(np.log2(len(partition))) + 1)\n        BRAM18K_alloc = [self.cst.hw_cst['BRAM18K'] / step for n in array_ops]\n        #BRAM18K_alloc = [self.cst.hw_cst['BRAM18K'] for n in array_ops]\n\n        return {\"DSP\": DSP_alloc, \"BRAM18K\": BRAM18K_alloc, \"state\": 0}\n\n    def est_URAM(self, records):\n        URAM_total = 0\n        for i in range(len(records)):\n            record = records[i]\n            URAM_total += record.cst[\"URAM\"]\n\n        return URAM_total\n\n    def est_mem(self, partition, records, verbose=0):\n        \"\"\" Estimate the total on-chip memory usage (BRAM18K and URAM).\n        On-chip memory is consumed by two parts: arrays and streaming buffers in-between.\n        For two adjacent arrays, suppose their tiling factors are:\n        [tr1, tc1, to1, ti1] and [tr2, tc2, to2, ti2]\n        Compute the tiling factors such that:\n        tr' = c0 * tr1\n        tc' = c1 * tc1\n        (c0 - 1) * tr1 < tr2 + k - 1 <= c0 * tr1\n        (c1 - 1) * tc1 < tc2 + k - 1 <= c1 * tc1\n        Streaming buffers are allocated to hold at least:\n        tr' * tc' * o1(i2) * 2\n        such that when the second array is using the first block of (tr2 + k - 1) * ... 
* i2,\n        the first array will continue to fill the rest of the buffer for the next round.\n        If verbose is set to 1, return the detailed resource usage of each array and streaming buffer.\n        \"\"\"\n        array_bufs = []\n        stream_bufs = []\n        BRAM18K_total = 0\n        URAM_total = 0\n        # array bufs\n        for i in range(len(records)):\n            record = records[i]\n            BRAM18K_total += record.cst[\"BRAM18K\"]\n            array_bufs.append(record.cst[\"BRAM18K\"])\n\n        # URAM buffers for the intermediate results of multi-task arrays\n        if self.params[\"use_uram_all\"]:\n            for i in range(len(records)):\n                if len(partition[i]) == 1:\n                    continue\n                # Allocate URAMs to store the intermediate results of each multi-task array\n                URAM_tmp = 0\n                for layer_idx in partition[i]:\n                    o = self.search_task.tasks[layer_idx].workload['params']['o']\n                    r = self.search_task.tasks[layer_idx].workload['params']['r']\n                    c = self.search_task.tasks[layer_idx].workload['params']['c']\n                    data_pack = records[i].task_sols[0]['sol']['i_t2']\n                    ele_num = o * r * c\n                    # A URAM block is 72 bits wide and 4096 entries deep\n                    URAM_tmp = max(URAM_tmp, 2 * np.ceil(self.search_task.dw * data_pack * 8 / 72) * np.ceil(ele_num / data_pack / 4096))\n                URAM_total += URAM_tmp\n\n        # Streaming buffers (currently mapped to URAM)\n        for i in range(1, len(records)):\n            array1 = records[i - 1]\n            array2 = records[i]\n            # Streaming buffers are only inserted between single-task arrays.\n            if len(array1.task_names) > 1 or len(array2.task_names) > 1:\n                continue\n            layer1_idx = partition[i - 1][0]\n            layer2_idx = partition[i][0]\n            # Extract parameters of array 1\n            o1 = self.search_task.tasks[layer1_idx].workload['params']['o']\n            tr1 = 
min(array1.task_sols[0]['sol']['r_t1'], self.search_task.tasks[layer1_idx].workload['params']['r'])\n            tc1 = min(array1.task_sols[0]['sol']['c_t1'], self.search_task.tasks[layer1_idx].workload['params']['c'])\n            for tag in self.search_task.tasks[layer1_idx].workload['tags']:\n                if tag.startswith('maxpool'):\n                    stride = int(tag.split('_')[-1])\n                    tr1 /= stride\n                    tc1 /= stride\n            tr1 = max(int(tr1), 1)\n            tc1 = max(int(tc1), 1)\n            # Extract parameters of array 2\n            tr2 = min(array2.task_sols[0]['sol']['r_t1'], self.search_task.tasks[layer2_idx].workload['params']['r'])\n            tc2 = min(array2.task_sols[0]['sol']['c_t1'], self.search_task.tasks[layer2_idx].workload['params']['c'])\n            k = self.search_task.tasks[layer2_idx].workload['params']['p']\n            data_pack = array2.task_sols[0]['sol']['i_t2']\n            # Compute the BRAM size\n            c0 = np.ceil((tr2 + k - 1) / tr1)\n            c1 = np.ceil((tc2 + k - 1) / tc1)\n            array1_params = self.search_task.tasks[layer1_idx].workload[\"params\"]\n            array2_params = self.search_task.tasks[layer2_idx].workload[\"params\"]\n            trp = min(c0 * tr1, array1_params['r'])\n            tcp = min(c1 * tc1, array1_params['c'])\n            #ele_num = trp * tcp * o1 * 2\n            ele_num = min(trp * array1_params['c'] * o1, tcp * array1_params['r'] * o1)\n            #buffer = np.ceil(self.search_task.dw * data_pack * 8 / 36) * np.ceil(ele_num / data_pack / 512)\n            buffer = np.ceil(self.search_task.dw * data_pack * 8 / 72) * np.ceil(ele_num / data_pack / 4096)\n            stream_bufs.append(buffer)\n            #BRAM18K_total += buffer\n            URAM_total += buffer\n\n        if verbose == 0:\n            return {\"BRAM18K\": BRAM18K_total, \"URAM\": URAM_total}, None            \n        else:\n            return {\"BRAM18K\": 
BRAM18K_total, \"URAM\": URAM_total}, {\"array_bufs\": array_bufs, \"stream_bufs\": stream_bufs}            \n\n    def overuse_resource(self, partition, records):\n        \"\"\" Return True if any array record is invalid or the total DSP/BRAM18K/URAM usage\n        exceeds the hardware limits.\n        \"\"\"\n        for record in records:\n            if record.valid == 0:\n                return True\n        #BRAM18K = self.est_BRAM18K(partition, records)\n        mem, meta = self.est_mem(partition, records)\n        #URAM = self.est_URAM(records)\n        DSP = 0\n        for record in records:\n            DSP += record.cst[\"DSP\"]\n        BRAM18K = mem[\"BRAM18K\"]\n        URAM = mem[\"URAM\"]\n        if BRAM18K > self.cst.hw_cst[\"BRAM18K\"]:\n            return True\n        if URAM > self.cst.hw_cst[\"URAM\"]:\n            return True\n        if DSP > self.cst.hw_cst[\"DSP\"]:\n            return True\n\n        return False\n\n    def est_resource(self, partition, records):\n        #BRAM18K = self.est_BRAM18K(partition, records)\n        #URAM = self.est_URAM(records)\n        mem, meta = self.est_mem(partition, records)\n        DSP = 0\n        for record in records:\n            DSP += record.cst[\"DSP\"]\n\n        return {\"DSP\": DSP, \"BRAM18K\": mem[\"BRAM18K\"], \"URAM\": mem[\"URAM\"]}\n\n    def est_latency(self, partition, records, in_place=0, adjust=0, verbose=0):\n        \"\"\" Compute the latency and throughput of the design.\n        A single-task array that follows another single-task array starts as soon as\n        the first block of data is ready in the streaming buffer.\n        A multi-task array waits until the previous array finishes.\n        Any array following a multi-task array also waits for the previous array to complete.\n        If in_place is set to 1, the latency of each record is updated in place.\n        If adjust is set to 1, possible stalls between arrays are considered (currently unimplemented).\n        \"\"\"\n        array_latency = [records[0].latency * self.params[\"batch_size\"]]\n        setup_latency = [0]\n        record_latency = [r.latency for r in records]\n\n        # Update array and setup latency\n     
   for i in range(1, len(records)):\n            array1 = records[i - 1]\n            array2 = records[i]\n            if len(array1.task_names) > 1 or len(array2.task_names) > 1:\n                # One of the arrays is a multi-task array\n                # Start only if the previous one finishes\n                setup = array_latency[-1]\n                setup_latency.append(setup)\n\n                array_latency.append(max(record_latency[i] * self.params[\"batch_size\"], array_latency[i - 1]))\n            else:\n                # Both arrays are single-task arrays\n                # Start as long as the first block of data is ready\n                layer1_idx = partition[i - 1][0]\n                layer2_idx = partition[i][0]\n                # Extract parameters of array 1\n                o1 = self.search_task.tasks[layer1_idx].workload['params']['o']\n                tr1 = min(array1.task_sols[0]['sol']['r_t1'], self.search_task.tasks[layer1_idx].workload['params']['r'])\n                tc1 = min(array1.task_sols[0]['sol']['c_t1'], self.search_task.tasks[layer1_idx].workload['params']['c'])\n                tr1_post = tr1\n                tc1_post = tc1\n                for tag in self.search_task.tasks[layer1_idx].workload['tags']:\n                    if tag.startswith('maxpool'):\n                        stride = int(tag.split('_')[-1])\n                        tr1_post /= stride\n                        tc1_post /= stride\n                tr1_post = max(int(tr1_post), 1)\n                tc1_post = max(int(tc1_post), 1)\n                # Extract parameters of array 2\n                tr2 = min(array2.task_sols[0]['sol']['r_t1'], self.search_task.tasks[layer2_idx].workload['params']['r'])\n                tc2 = min(array2.task_sols[0]['sol']['c_t1'], self.search_task.tasks[layer2_idx].workload['params']['c'])\n                k = self.search_task.tasks[layer2_idx].workload['params']['p']\n                data_pack = 
array2.task_sols[0]['sol']['i_t2']\n\n                c0 = np.ceil((tr2 + k - 1) / tr1_post)\n                c1 = np.ceil((tc2 + k - 1) / tc1_post)\n                array1_params = self.search_task.tasks[layer1_idx].workload[\"params\"]\n                array2_params = self.search_task.tasks[layer2_idx].workload[\"params\"]\n                trp = min(c0 * tr1, array1_params['r'])\n                tcp = min(c1 * tc1, array1_params['c'])\n                # Setup latency\n                #setup = record_latency[i - 1] / (np.ceil(array1_params['r'] / trp) * np.ceil(array1_params['c'] / tcp))\n                if trp > tcp:\n                    setup = record_latency[i - 1] / np.ceil(array1_params['c'] / tcp)\n                else:\n                    setup = record_latency[i - 1] / np.ceil(array1_params['r'] / trp)\n                setup_latency.append(setup)\n\n                # Adjust the array latency\n                if adjust:\n                    raise RuntimeError(\"Array latency adjust for multi-array is not implemented.\")\n                    '''\n                    # Consider the fine-grained produce-consume relationship\n                    n_fill_rounds = np.ceil((min(2 * tr2 + k - 1, array1_params['r'] + k - 1) - c0 * tr1_post) / tr1_post) * c1\n                    fill_latency = array_latency[-1] / (np.ceil(array1_params['r'] / tr1 * np.ceil(array1_params['c'] / tc1))) * n_fill_rounds\n                    consume_latency = record_latency[i] / (np.ceil(array2_params['r'] / tr2 * np.ceil(array2_params['c'] / tc2)))\n                    adjusted_latency = max(fill_latency, consume_latency) * np.ceil(array2_params['r'] / tr2) * np.ceil(array2_params['c'] / tc2)\n                    record_latency[i] = adjusted_latency\n                    array_latency.append(adjusted_latency)\n                    '''\n                else:\n                    # Simply compute the max\n                    array_latency.append(max(record_latency[i] * 
self.params[\"batch_size\"], array_latency[i - 1]))\n\n        if in_place:\n            # Update the array latency\n            for i in range(len(records)):\n                records[i].latency = array_latency[i]\n\n        # Design latency = sum of all setup latencies + latency of the last array\n        design_latency = 0\n        for lat in setup_latency:\n            design_latency += lat\n        design_latency += array_latency[-1]\n\n        # Throughput is bounded by the slowest array\n        max_latency = 0\n        for latency in array_latency:\n            if latency > max_latency:\n                max_latency = latency\n        throughput = 1 / max_latency * self.params[\"batch_size\"]\n\n        return design_latency, throughput, None\n\n    def est_dsp_eff(self, throughput, cst):\n        total_ops = 0\n        for task in self.search_task.tasks:\n            total_ops += task.compute_ops()\n        # Note: Only works for FP32\n        dsp_eff = throughput / (cst[\"DSP\"] / 5 * 2 / total_ops)\n\n        return dsp_eff\n\n    def evaluate(self, partition, records, verbose=0):\n        latency, throughput, meta = self.est_latency(partition, records, verbose=verbose)\n        #latency, throughput = self.est_latency(partition, records)\n        resource = self.est_resource(partition, records)\n        return latency, resource, throughput, meta\n\n    def is_finetune_required(self, records, dsp_eff):\n        \"\"\" Check if fine-tuning is required.\n        \"\"\"\n        # Stop if the DSP efficiency reaches the threshold\n        if dsp_eff >= self.params[\"dsp_eff_thres\"]:\n            return False\n\n        return True\n\n    def resource_alloc_adjust(self, partition, resource_alloc, records, overuse_mem):\n        \"\"\" Adjust the resource allocation.\n        State 0: Try to allocate all the available resources to the bottleneck design.\n        If the resource allocation leads to memory overuse, gradually reduce the resources\n        allocated to the bottleneck design until a legal one is found.\n        Switch to state 0.5 
afterwards.\n        If the first attempt leads to a legal design and the bottleneck design remains\n        the bottleneck, switch to state 1.\n\n        State 0.5 (deprecated): Intermediate state. Simply try to increase the resource allocation.\n        If it succeeds, switch back to state 0; otherwise, switch to state 1.\n        This state accounts for the instability of the search results, i.e.,\n        re-running the search for more arrays might lead to a feasible solution.\n\n        State 1: Borrow resources from the fastest designs for the bottleneck design.\n        We keep a cache of all the past records for different arrays with\n        different resource allocations. This cache is prioritized when selecting\n        the reduced resource allocation.\n        If no such option is found in the search log, simply force the fast designs\n        to reduce their resource usage by a bounded ratio.\n\n        \"records\" holds the best feasible array records found so far.\n        \"overuse_mem\" indicates whether the last attempt led to memory overutilization.\n        \"\"\"\n        # Calculate the available resource\n        available_dsp = self.cst.hw_cst[\"DSP\"]\n        available_bram = self.cst.hw_cst[\"BRAM18K\"]\n        if resource_alloc[\"state\"] in [0, 0.5]:\n            available_dsp -= resource_alloc['init']['DSP_total']\n            available_bram -= resource_alloc['init']['BRAM18K_total']\n        else:\n            resource = self.est_resource(partition, records)\n            available_dsp -= resource['DSP']\n            available_bram -= resource['BRAM18K']\n\n        slow_idx_list = resource_alloc[\"slow_idx\"]\n        fast_idx_list = resource_alloc[\"fast_idx\"]\n\n        inc_dsp = 0\n        inc_bram = 0\n        dec_dsp = 0\n        dec_bram = 0\n\n        # State transition\n        if resource_alloc[\"state\"] == 0:\n            if resource_alloc[\"decrease\"][0] == 1 and not overuse_mem:\n                #resource_alloc[\"state\"] = 0.5 # Stale 
state for one more attempt\n                resource_alloc[\"state\"] = 1\n            if resource_alloc[\"n_adjust\"][0] == 1 and not overuse_mem:\n                # Allocating all the available resources is insufficient\n                resource_alloc[\"state\"] = 1\n        elif resource_alloc[\"state\"] == 0.5:\n            if resource_alloc[\"decrease\"][0] == 0 and not overuse_mem:\n                resource_alloc[\"state\"] = 0\n            else:\n                resource_alloc[\"state\"] = 1\n\n        if resource_alloc[\"state\"] in [0, 0.5]:\n            if resource_alloc[\"n_adjust\"][0] > 1:\n                if overuse_mem == 0:\n                    # Increase the lower bound\n                    resource_alloc[\"step\"][0][0] = sum(resource_alloc[\"step\"][0]) / 2\n                else:\n                    # Decrease faster\n                    resource_alloc[\"step\"][0][1] = sum(resource_alloc[\"step\"][0]) / 4\n\n            if resource_alloc[\"n_adjust\"][0] == 0:\n                # At the first attempt, allocate all the available resources to the bottleneck design\n                ratio = resource_alloc[\"step\"][0][1]\n            else:\n                ratio = sum(resource_alloc[\"step\"][0]) / 2\n            inc_dsp = available_dsp\n            inc_bram = int(available_bram * ratio)\n            resource_alloc[\"DSP\"][slow_idx_list[0]] = resource_alloc['init']['DSP'][slow_idx_list[0]] + inc_dsp\n            resource_alloc[\"BRAM18K\"][slow_idx_list[0]] = resource_alloc['init']['BRAM18K'][slow_idx_list[0]] + inc_bram\n\n            if resource_alloc[\"n_adjust\"][0] > 0:\n                if inc_bram > resource_alloc[\"history\"][0]:\n                    resource_alloc[\"decrease\"][0] = 0\n                else:\n                    resource_alloc[\"decrease\"][0] = 1\n            resource_alloc[\"history\"][0] = inc_bram\n            if inc_bram == 0:\n                resource_alloc[\"state\"] = 1\n\n        if resource_alloc[\"state\"] == 1:\n      
      # Calculate the available resource\n            available_dsp = self.cst.hw_cst[\"DSP\"]\n            available_bram = self.cst.hw_cst[\"BRAM18K\"]\n            resource = self.est_resource(partition, records)\n            available_dsp -= resource['DSP']\n            available_bram -= resource['BRAM18K']\n\n        if resource_alloc[\"state\"] == 1:\n            cur_adjust_thres = len(fast_idx_list) * 2\n            update_idx_select = {}\n            while True:\n                total_adjust_num = 0 # Number of successfully adjusted arrays\n                if cur_adjust_thres == 0:\n                    break\n                for idx in fast_idx_list:\n                    history = self.search_cache[idx]\n                    def take_latency(record):\n                        return record.latency\n                    history.sort(key=take_latency)\n                    # Compute the latency upper bound to adjust this array\n                    ub_latency = (records[slow_idx_list[0]].latency - records[idx].latency) / (cur_adjust_thres + 1) + records[idx].latency\n                    #print(\"adjust ub latency: \", ub_latency)\n\n                    # Decrease the memory allocation for this design to increase array latency up to ub_latency\n                    min_mem = records[idx].cst[\"BRAM18K\"]\n                    update_idx = -1\n                    for history_idx in range(len(history)):\n                        r = history[history_idx]\n                        if r.latency > records[slow_idx_list[0]].latency:\n                            break\n                        if r.latency >= records[idx].latency and r.latency <= ub_latency:\n                            if r.cst[\"BRAM18K\"] < min_mem:\n                                min_mem = r.cst[\"BRAM18K\"]\n                                update_idx = history_idx\n                    if update_idx != -1:\n                        total_adjust_num += 1\n                    update_idx_select[idx] = 
update_idx\n                if total_adjust_num < min(len(fast_idx_list), 2):\n                    # Adjust at least min(len(fast_idx_list), 2) arrays each time\n                    # If not enough candidate arrays are found, loosen the latency upper bound\n                    cur_adjust_thres -= 1\n                else:\n                    break\n            for idx in fast_idx_list:\n                history = self.search_cache[idx]\n                def take_latency(record):\n                    return record.latency\n                history.sort(key=take_latency)\n                dec_bram_single = 0\n                dec_dsp_single = 0\n                if update_idx_select[idx] != -1:\n                    #print(\"cur, selected records constraints: \", records[idx].cst[\"BRAM18K\"], r.cst[\"BRAM18K\"])\n                    #print(\"cur, selected records latency: \", records[idx].latency, r.latency)\n                    r = history[update_idx_select[idx]]\n                    dec_bram_single = (records[idx].cst[\"BRAM18K\"] - r.cst[\"BRAM18K\"])\n                    dec_dsp_single = (records[idx].cst[\"DSP\"] - r.cst[\"DSP\"])\n                    resource_alloc[\"DSP\"][idx] = r.cst[\"DSP\"]\n                    resource_alloc[\"BRAM18K\"][idx] = r.cst[\"BRAM18K\"]\n                dec_bram += dec_bram_single\n                dec_dsp += dec_dsp_single\n            if dec_bram == 0:\n                # No available records found in the search cache.\n                # Force the fast designs to spare resources for the bottleneck design.\n                dec_dsp = 0\n                for idx in fast_idx_list:\n                    limit_ratio = min((1 - records[idx].latency / records[slow_idx_list[0]].latency) / 8, resource_alloc[\"step\"][1][0])\n                    dec_bram_single = records[idx].cst[\"BRAM18K\"] * limit_ratio\n                    dec_dsp_single = records[idx].cst[\"DSP\"] * limit_ratio / 2\n                    resource_alloc[\"DSP\"][idx] = records[idx].cst[\"DSP\"] - 
dec_dsp_single\n                    resource_alloc[\"BRAM18K\"][idx] = records[idx].cst[\"BRAM18K\"] - dec_bram_single\n                    dec_bram += dec_bram_single\n                    dec_dsp += dec_dsp_single\n\n            resource_alloc[\"DSP\"][slow_idx_list[0]] = records[slow_idx_list[0]].cst['DSP'] + (dec_dsp + available_dsp)\n            resource_alloc[\"BRAM18K\"][slow_idx_list[0]] = records[slow_idx_list[0]].cst['BRAM18K'] + (dec_bram + available_bram)\n            if resource_alloc[\"n_adjust\"][1] > 0:\n                if resource_alloc[\"BRAM18K\"][slow_idx_list[0]] > resource_alloc[\"history\"][1]:\n                    resource_alloc[\"decrease\"][1] = 0\n                else:\n                    resource_alloc[\"decrease\"][1] = 1\n            resource_alloc[\"history\"][1] = resource_alloc[\"BRAM18K\"][slow_idx_list[0]]\n            if resource_alloc[\"decrease\"][1] == 1 and not overuse_mem:\n                # Stop further tuning\n                return False\n\n        resource_alloc[\"n_adjust\"][math.floor(resource_alloc[\"state\"])] += 1\n        # Only try at most 3 times for each state\n        if resource_alloc[\"n_adjust\"][0] > 3 or resource_alloc[\"n_adjust\"][1] > 3:\n            return False\n\n        return True\n\n    def update_bottleneck_idx(self, records):\n        \"\"\" Return the slowest/fastest design index.\n        Select up to len(records) - 1 fast designs.\n        Select 1 slow design.\n        \"\"\"\n        slow = {'latency': 0, 'idx': []}\n        fast = {'latency': float(\"inf\"), 'idx': []}\n        for i in range(len(records)):\n            record = records[i]\n            if record.latency < fast['latency']:\n                fast['latency'] = record.latency\n                fast['idx'] = [i]\n            if record.latency > slow['latency']:\n                slow['latency'] = record.latency\n                slow['idx'] = [i]\n\n        list_len = len(records) - 1\n        for i in range(len(records)):\n        
    if i in fast['idx']:\n                continue\n            record = records[i]\n            if abs((record.latency - fast['latency']) / fast['latency']) < 0.05 and len(fast['idx']) < list_len:\n                fast['idx'].append(i)\n        if len(fast[\"idx\"]) == 1 and list_len > 1:\n            # Add one more into the list\n            fast_val = float(\"inf\")\n            idx = -1\n            for i in range(len(records)):\n                record = records[i]\n                if i == fast['idx'][0]:\n                    continue\n                if record.latency < fast_val:\n                    fast_val = record.latency\n                    idx = i\n            fast['idx'].append(idx)\n\n        list_len = 1\n        for i in range(len(records)):\n            if i in slow['idx']:\n                continue\n            record = records[i]\n            if abs((record.latency - slow['latency']) / slow['latency']) < 0.02 and len(slow['idx']) < list_len:\n                slow['idx'].append(i)\n\n        return slow[\"idx\"], fast[\"idx\"]\n\n    def find_legal_config(self, partition, resource_alloc, old_records=None, adjust_func=None, fine_tune=0, skip_search=1):\n        \"\"\" Find a legal configuration given the resource allocation.\n        If \"skip_search\" is set to 1, only re-search the designs in the slow/fast idx list.\n        If \"fine_tune\" is set to 1, we will adjust the resource allocation using\n        \"adjust_func\" until the bottleneck array changes or there is no valid resource\n        allocation found.\n        Otherwise, the current function will gradually reduce the BRAM allocation until a\n        valid design is found.\n        \"\"\"\n        legal_records = old_records\n        best_throughput = 0\n        is_first = True\n        n_arrays = len(partition)\n        # Maintain a list of several best designs for each array\n        history = [[] for i in range(n_arrays)]\n        history_thres = 2\n        if n_arrays > 10:\n       
     # Avoid storing too many configs\n            history_thres = 1\n\n        single_task_arrays = []\n        multi_task_arrays = []\n        for i in range(n_arrays):\n            if len(partition[i]) == 1:\n                single_task_arrays.append(i)\n            else:\n                multi_task_arrays.append(i)\n\n        while True:\n            # For internal testing\n            #pprint.pprint(resource_alloc)\n            records = []\n            skip_idx = []\n            job_list = []\n            tasks = []\n            # single task arrays\n            for i in single_task_arrays:\n                # Update the history\n                history_tmp = []\n                for record in history[i]:\n                    if record.cst[\"BRAM18K\"] <= resource_alloc[\"BRAM18K\"][i] and \\\n                       record.cst[\"DSP\"] <= resource_alloc[\"DSP\"][i]:\n                       history_tmp.append(record)\n                if legal_records and is_first:\n                    if legal_records[i].cst[\"BRAM18K\"] <= resource_alloc[\"BRAM18K\"][i] and \\\n                       legal_records[i].cst[\"DSP\"] <= resource_alloc[\"DSP\"][i]:\n                       history_tmp.append(legal_records[i])\n                       self.search_cache[i].append(legal_records[i])\n                history[i] = history_tmp\n                if skip_search == 1:\n                    if (i not in resource_alloc[\"slow_idx\"]) and (i not in resource_alloc[\"fast_idx\"]) and len(history[i]) > 0:\n                        skip_idx.append(i)\n                        continue\n                # Submit the search job\n                explorer_tmp = copy.deepcopy(self.params[\"explorer\"])\n                # Update the constraints\n                explorer_tmp.cst.hw_cst[\"DSP\"] = resource_alloc[\"DSP\"][i]\n                explorer_tmp.cst.hw_cst[\"BRAM18K\"] = resource_alloc[\"BRAM18K\"][i]\n                array_tasks = []\n                for design_idx in 
self.params[\"design_idx_list\"]:\n                    search_task = SingleTask(explorer_tmp.designs[design_idx], self.search_task.tasks[partition[i][0]].workload, explorer_tmp.cst)\n                    # Update the task configs\n                    if i == 0:\n                        # Load from DRAM\n                        search_task.configs[\"cin_read_mode\"] = 0\n                    elif (i > 0 and len(partition[i - 1]) > 1):\n                        # The previous array is a multi-task array: read on-chip if use_uram_all is set, otherwise from DRAM\n                        if self.params[\"use_uram_all\"]:\n                            search_task.configs[\"cin_read_mode\"] = 2\n                        else:\n                            search_task.configs[\"cin_read_mode\"] = 0\n                    else:\n                        # Access on-chip streaming buffers\n                        search_task.configs[\"cin_read_mode\"] = 2\n                    if i == len(partition) - 1:\n                        # Write to DRAM\n                        search_task.configs[\"cout_write_mode\"] = 0\n                    elif (i < len(partition) - 1 and len(partition[i + 1]) > 1):\n                        # The next array is a multi-task array: write on-chip if use_uram_all is set, otherwise to DRAM\n                        if self.params[\"use_uram_all\"]:\n                            search_task.configs[\"cout_write_mode\"] = 1\n                        else:\n                            search_task.configs[\"cout_write_mode\"] = 0\n                    else:\n                        # Write to on-chip streaming buffers\n                        search_task.configs[\"cout_write_mode\"] = 1\n                    # Submit the search job (range(1) runs it once; increase the range to repeat the search)\n                    for repeat in range(1):\n                        job_list.append(\n                            {\n                                \"job_hash\": f\"{str(search_task)}_{repeat}\",\n                                \"func\": explorer_tmp.tune,\n                                \"args\": [search_task, None, self.sub_task_silent, 1]\n                            })\n                    array_tasks.append(search_task)\n                tasks.append(array_tasks)\n\n            pool = utils.MyExecutor(self.n_worker)\n            
results = pool.exec(job_list)\n\n            idx = 0\n            for i in single_task_arrays:\n                if i in skip_idx:\n                    continue\n                history_local = history[i]\n                array_tasks = tasks[idx]\n                for task in array_tasks:\n                    for result in results:\n                        if result.startswith(str(task)):\n                            record = results[result]\n                            if record.valid:\n                                record.arch_sol = record.task_sols[0]\n                                history_local.append(record)\n                                self.search_cache[i].append(record)\n                if len(history_local) == 0:\n                    return legal_records\n                def take_latency(record):\n                    return record.latency\n                history_local.sort(key=take_latency)\n                # Keep at most history_thres designs per array for scalability\n                history_local = history_local[:min(len(history_local), history_thres)]\n                history[i] = history_local\n                idx += 1            \n\n            # multi-task arrays\n            for i in multi_task_arrays:\n                # Update the history\n                history_tmp = []\n                for record in history[i]:\n                    if record.cst[\"BRAM18K\"] <= resource_alloc[\"BRAM18K\"][i] and \\\n                       record.cst[\"DSP\"] <= resource_alloc[\"DSP\"][i]:\n                       history_tmp.append(record)\n                if legal_records and is_first:\n                    if legal_records[i].cst[\"BRAM18K\"] <= resource_alloc[\"BRAM18K\"][i] and \\\n                       legal_records[i].cst[\"DSP\"] <= resource_alloc[\"DSP\"][i]:\n                        history_tmp.append(legal_records[i])\n                        self.search_cache[i].append(legal_records[i])\n                history[i] = history_tmp\n                if 
skip_search == 1:\n                    if (i not in resource_alloc[\"slow_idx\"]) and (i not in resource_alloc[\"fast_idx\"]) and len(history[i]) > 0:\n                        continue\n                explorer_tmp = copy.deepcopy(self.params[\"explorer\"])\n                # Update the constraints\n                explorer_tmp.cst.hw_cst[\"DSP\"] = resource_alloc[\"DSP\"][i]\n                explorer_tmp.cst.hw_cst[\"BRAM18K\"] = resource_alloc[\"BRAM18K\"][i]\n                early_stop = -1\n                if self.params[\"use_uram_all\"]:\n                    search_task_configs = {}\n                    for task_idx in range(len(partition[i])):\n                        search_task_configs[task_idx] = {'cin_read_mode': 2, 'cout_write_mode': 1}\n                    if i == 0:\n                        search_task_configs[0][\"cin_read_mode\"] = 0\n                    if i == n_arrays - 1:\n                        search_task_configs[len(partition[i]) - 1][\"cout_write_mode\"] = 0\n                else:\n                    search_task_configs = None\n                job_list = []\n                for design_idx in self.params[\"design_idx_list\"]:\n                    # Parallel version\n                    job_list.append(\n                        {\n                            \"job_hash\": f\"{design_idx}\",\n                            \"func\": explorer_tmp.search_non_fusion_single_acc_customized1,\n                            \"args\": [design_idx, search_task_configs, -1, self.sub_task_silent, partition[i], None, True]\n                        })\n                    # Sequential version\n                    #search_record = explorer_tmp.search_non_fusion_single_acc_customized1(\\\n                    #    design_idx=design_idx, silent=self.sub_task_silent, \\\n                    #    workload_idx=partition[i], early_stop=early_stop, one_gen=True)\n                    #if search_record.valid:\n                    #    early_stop = search_record.latency\n 
                   #    history[i].append(search_record)\n                    #    self.search_cache[i].append(search_record)\n                pool = utils.MyExecutor(max(int(self.n_worker/8), 4))\n                results = pool.exec(job_list)\n                for design_idx in self.params[\"design_idx_list\"]:\n                    search_record = results[f\"{design_idx}\"]\n                    if search_record.valid:\n                        history[i].append(search_record)\n                        self.search_cache[i].append(search_record)\n                def take_latency(record):\n                    return record.latency\n                history[i].sort(key=take_latency)\n                history[i] = history[i][:min(len(history[i]), history_thres)]            \n\n            # Find the array combination that satisfies the memory usage\n            choices_tmp = [list(range(len(h))) for h in history]\n            choices = list(itertools.product(*choices_tmp))\n            max_bram_tmp = 0\n            min_bram_tmp = float(\"inf\")\n            best_throughput_tmp = 0\n            for choice in choices:\n                records_tmp = []\n                for i in range(n_arrays):\n                    records_tmp.append(history[i][choice[i]])\n                latency, throughput, _ = self.est_latency(partition, records_tmp)                \n                if not self.overuse_resource(partition, records_tmp):\n                    if throughput > best_throughput_tmp:\n                        records = records_tmp\n                        best_throughput_tmp = throughput\n\n            # Search for several designs with fewer resource for tuning\n            if records and fine_tune == 1:\n                # single-task array\n                max_attempt = 3\n                n_attempt_list = [max_attempt for i in range(n_arrays)]\n                for i in multi_task_arrays:\n                    n_attempt_list[i] = 0\n                last_record = [None for i in 
range(n_arrays)]\n                while any(y > 0 for y in n_attempt_list):\n                    job_list = []\n                    skip_idx = []\n                    tasks = []\n                    for i in single_task_arrays:\n                        if i not in resource_alloc[\"fast_idx\"]:\n                            skip_idx.append(i)\n                            n_attempt_list[i] = 0\n                            continue\n                        if int(resource_alloc[\"BRAM18K\"][i]) in self.search_cache_cst[i]:\n                            # Candidate search has been done for this one before\n                            skip_idx.append(i)\n                            n_attempt_list[i] = 0\n                            continue\n                        array_tasks = []\n                        unit_dec_bram = 4 # Decrease by 4 each time\n                        if last_record[i]:\n                            dec_bram = records[i].cst[\"BRAM18K\"]- last_record[i].cst[\"BRAM18K\"] + unit_dec_bram\n                        else:\n                            dec_bram = unit_dec_bram\n                        slow_idx_list = resource_alloc[\"slow_idx\"]\n                        fast_idx_list = resource_alloc[\"fast_idx\"]\n                        ub_latency = (records[slow_idx_list[0]].latency - records[i].latency) / (len(fast_idx_list) + 1) + records[i].latency\n                        n_attempt = n_attempt_list[i]\n                        if n_attempt == max_attempt:\n                            self.search_cache_cst[i].append(int(resource_alloc[\"BRAM18K\"][i]))\n                        if n_attempt > 0:\n                            explorer_tmp = copy.deepcopy(self.params[\"explorer\"])\n                            explorer_tmp.cst.hw_cst[\"DSP\"] = resource_alloc[\"DSP\"][i]\n                            explorer_tmp.cst.hw_cst[\"BRAM18K\"] = records[i].cst[\"BRAM18K\"] - dec_bram\n                            for design_idx in self.params[\"design_idx_list\"]:\n 
                               design = explorer_tmp.designs[design_idx]\n                                if design.name == records[i].design:\n                                    cur_design_idx = design_idx\n                            search_record = None\n                            for r_c in self.search_cache[i]:\n                                if r_c.cst[\"BRAM18K\"] == explorer_tmp.cst.hw_cst[\"BRAM18K\"] and \\\n                                   r_c.cst[\"DSP\"] == explorer_tmp.cst.hw_cst[\"DSP\"] and \\\n                                   r_c.design == explorer_tmp.designs[cur_design_idx].name:\n                                    search_record = r_c\n                                    last_record[i] = search_record\n                                    break\n                            if not search_record:\n                                search_task = SingleTask(explorer_tmp.designs[cur_design_idx], self.search_task.tasks[partition[i][0]].workload, explorer_tmp.cst)\n                                # Update the task configs\n                                if i == 0:\n                                    # Load from DRAM\n                                    search_task.configs[\"cin_read_mode\"] = 0\n                                elif (i > 0 and len(partition[i - 1]) > 1):\n                                    if self.params[\"use_uram_all\"]:\n                                        search_task.configs[\"cin_read_mode\"] = 2\n                                    else:\n                                        search_task.configs[\"cin_read_mode\"] = 0\n                                else:\n                                    # Access on-chip streaming buffers\n                                    search_task.configs[\"cin_read_mode\"] = 2\n                                if i == len(partition) - 1:\n                                    # Write to DRAM\n                                    search_task.configs[\"cout_write_mode\"] = 0\n                    
            elif (i < len(partition) - 1 and len(partition[i + 1]) > 1):\n                                    if self.params[\"use_uram_all\"]:\n                                        search_task.configs[\"cout_write_mode\"] = 1\n                                    else:\n                                        search_task.configs[\"cout_write_mode\"] = 0\n                                else:\n                                    search_task.configs[\"cout_write_mode\"] = 1\n                                for repeat in range(1):\n                                    job_list.append(\n                                    {\n                                        \"job_hash\": f\"{str(search_task)}_{repeat}\",\n                                        \"func\": explorer_tmp.tune,\n                                        \"args\": [search_task, None, self.sub_task_silent, 0]\n                                    })\n                                array_tasks.append(search_task)\n                        tasks.append(array_tasks)\n\n                    pool = utils.MyExecutor(self.n_worker)\n                    results = pool.exec(job_list)\n\n                    idx = 0\n                    for i in single_task_arrays:\n                        if i in skip_idx:\n                            continue\n                        array_tasks = tasks[idx]\n                        no_valid_record = True\n                        for task in array_tasks:\n                            for result in results:\n                                if result.startswith(str(task)):\n                                    record = results[result]\n                                    if record.valid:\n                                        record.arch_sol = record.task_sols[0]\n                                        self.search_cache[i].append(record)\n                                        last_record[i] = record\n                                        no_valid_record = False\n\n             
           idx += 1\n                        if no_valid_record:\n                            n_attempt_list[i] = 0\n                        else:\n                            n_attempt_list[i] -= 1\n\n                # multi-task array\n                for i in multi_task_arrays:\n                    if i not in resource_alloc[\"fast_idx\"]:\n                        continue\n                    if int(resource_alloc[\"BRAM18K\"][i]) in self.search_cache_cst[i]:\n                        continue\n                    unit_dec_bram = 16 # Start with 16\n                    dec_bram = unit_dec_bram\n                    slow_idx_list = resource_alloc[\"slow_idx\"]\n                    fast_idx_list = resource_alloc[\"fast_idx\"]\n                    ub_latency = (records[slow_idx_list[0]].latency - records[i].latency) / (len(fast_idx_list) + 1) + records[i].latency\n                    n_attempt = 2 # Search two designs for multi-task array\n                    while n_attempt > 0:\n                        explorer_tmp = copy.deepcopy(self.params[\"explorer\"])\n                        explorer_tmp.cst.hw_cst[\"DSP\"] = resource_alloc[\"DSP\"][i]\n                        explorer_tmp.cst.hw_cst[\"BRAM18K\"] = records[i].cst[\"BRAM18K\"] - dec_bram\n                        for design_idx in self.params[\"design_idx_list\"]:\n                            design = explorer_tmp.designs[design_idx]\n                            if design.name == records[i].design:\n                                cur_design_idx = design_idx\n                        search_record = None\n                        for r_c in self.search_cache[i]:\n                            if r_c.cst[\"BRAM18K\"] == explorer_tmp.cst.hw_cst[\"BRAM18K\"] and \\\n                               r_c.cst[\"DSP\"] == explorer_tmp.cst.hw_cst[\"DSP\"] and \\\n                               r_c.design == explorer_tmp.designs[cur_design_idx].name:\n                                search_record = r_c\n                     
           break\n                        if not search_record:\n                            if self.params[\"use_uram_all\"]:\n                                search_task_configs = {}\n                                for task_idx in range(len(partition[i])):\n                                    search_task_configs[task_idx] = {'cin_read_mode': 2, 'cout_write_mode': 1}\n                                if i == 0:\n                                    search_task_configs[0][\"cin_read_mode\"] = 0\n                                if i == n_arrays - 1:\n                                    search_task_configs[len(partition[i]) - 1][\"cout_write_mode\"] = 0\n                            else:\n                                search_task_configs = None\n                            search_record = explorer_tmp.search_non_fusion_single_acc_customized1(\\\n                                design_idx=cur_design_idx, search_task_configs=search_task_configs, \\\n                                silent=self.sub_task_silent, \\\n                                workload_idx=partition[i], one_gen=True)\n                            if search_record.valid:\n                                self.search_cache[i].append(search_record)\n                        if search_record.valid:\n                            if n_attempt == 2 and search_record.latency > ub_latency:\n                                unit_dec_bram = 4\n                                dec_bram = unit_dec_bram\n                            else:\n                                dec_bram = records[i].cst[\"BRAM18K\"]- search_record.cst[\"BRAM18K\"] + unit_dec_bram\n                        else:\n                            break\n                        n_attempt -= 1\n                    self.search_cache_cst[i].append(int(resource_alloc[\"BRAM18K\"][i]))\n\n            is_first = False\n            if fine_tune:\n                skip_search = 1\n                if len(records) == 0:\n                    if not 
adjust_func(partition, resource_alloc, legal_records, 1):\n                        break\n                else:\n                    if best_throughput_tmp > best_throughput:\n                        legal_records = copy.deepcopy(records)\n                        best_throughput = best_throughput_tmp\n\n                    old_slow_idx = resource_alloc[\"slow_idx\"][0]\n                    old_slow_record_latency = resource_alloc[\"array_latency\"][old_slow_idx]\n                    slow, fast = self.update_bottleneck_idx(records)\n                    resource_alloc[\"slow_idx\"] = slow\n                    resource_alloc[\"fast_idx\"] = fast\n                    resource_alloc[\"array_latency\"] = [record.latency for record in records]\n                    \n                    # For internal testing\n                    #print(\"****************** Tuning ******************\")\n                    #latency_list = [r.latency for r in records]\n                    #dsp_list = [r.cst[\"DSP\"] for r in records]\n                    #bram_list = [r.cst[\"BRAM18K\"] for r in records]\n                    #dsp_eff_list = [r.dsp_eff for r in records]\n                    #print(\"max latency: \", 1 / best_throughput_tmp)\n                    #print(\"latency list: \", latency_list)\n                    #print(\"bram list: \", bram_list)\n                    #print(\"dsp list: \", dsp_list)\n                    #print(\"dsp eff: \", dsp_eff_list)\n                    #print(\"****************** Tuning ******************\")\n\n                    if resource_alloc[\"slow_idx\"][0] == old_slow_idx:\n                        if records[i].latency <= old_slow_record_latency:\n                            break\n                        if not adjust_func(partition, resource_alloc, records, 0):\n                            break\n                    else:\n                        break\n            else:\n                if len(records) == 0:\n                    
resource_alloc[\"BRAM18K\"] = [n / 2 for n in resource_alloc[\"BRAM18K\"]]\n                    #resource_alloc[\"DSP\"] = [n / 2 for n in resource_alloc[\"DSP\"]]\n                else:\n                    legal_records = records\n                    break\n\n        return legal_records\n\n    def search_design(self, partition_idx):\n        partition_idx = int(partition_idx)\n        if partition_idx in self.bay_search_log:\n            return self.bay_search_log[partition_idx]\n        #print(len(self.params['partition_candidates']))\n        #print(partition_idx)\n        self.log(f\"Partition {partition_idx}: {self.params['partition_candidates'][partition_idx]['partition']}\")\n        rewards_window = []\n        self.counter.init_counter('local_time')\n        local_best_reward = 0\n        partition = self.params['partition_candidates'][partition_idx]['partition']\n        n_arrays = len(partition)\n         # Store all the search records for each array\n        for i in range(n_arrays):\n            self.search_cache[i] = []\n        # Store the resource constraint used for each search to avoid redundant search\n        for i in range(n_arrays):\n            self.search_cache_cst[i] = []\n\n        # Initialize resource allocation\n        resource_alloc = self.resource_alloc(partition)\n\n        # Find a legal config\n        records = self.find_legal_config(partition, resource_alloc, skip_search=0)\n        if records:\n            self.local_epoch = 0\n            self.last_update_epoch = 0\n            last_slow_idx = -1\n            while True:\n                latency, used_constraints, throughput, meta = self.evaluate(partition, records)\n                dsp_eff = self.est_dsp_eff(throughput, used_constraints)\n                reward = throughput\n                search_record = utils.SearchRecord().extract_from_tuner_multi_acc(records, reward, latency, used_constraints, throughput, dsp_eff, partition=partition)\n                # Update global 
reward\n                if reward > self.best_reward:\n                    self.best_reward = reward\n                    self.best_search_record = search_record\n                    self.log(f'Global Epoch {self.epoch} - Partition {partition_idx} - #Array {n_arrays}: new global best reward: {self.best_reward} (latency: {latency:.0f}, throughput: {throughput}, DSP eff: {dsp_eff:.2f}, BRAM: {used_constraints[\"BRAM18K\"]:.2f}, DSP: {used_constraints[\"DSP\"]:.2f}, URAM: {used_constraints[\"URAM\"]:.2f}, BW: {search_record.bw:.2f})')\n                self.best_rewards.append(self.best_reward)\n                self.counter.update_counter('time')\n                self.best_rewards_time.append(self.counter.get_counter('time'))\n                # Update local reward\n                if reward > local_best_reward:\n                    local_best_reward = reward\n                    self.log(f'Local Epoch {self.local_epoch} - Partition {partition_idx} - #Array {n_arrays}: new local best reward: {local_best_reward} (latency: {latency:.0f}, throughput: {throughput}, DSP eff: {dsp_eff:.2f}, BRAM: {used_constraints[\"BRAM18K\"]:.2f}, DSP: {used_constraints[\"DSP\"]:.2f}, URAM: {used_constraints[\"URAM\"]:.2f}, BW: {search_record.bw:.2f})')\n                    self.last_update_epoch = self.local_epoch\n                rewards_window.append(reward)\n\n                if len(rewards_window) > self.params[\"max_trial\"]:\n                    stdev_percent = np.std(rewards_window[-3:]) / np.mean(rewards_window[-3:])\n                    if stdev_percent < self.params[\"reward_stdev_thres\"]:\n                        self.log(f'Minimal improvement after {self.params[\"max_trial\"]} rounds, terminated')\n                        break\n                if self.local_epoch - self.last_update_epoch > self.params[\"max_trial\"]:\n                    self.log(f'No improvement after {self.params[\"max_trial\"]} rounds, terminated')\n                    break\n                # If the tuning 
time is too long, kill it\n                self.counter.update_counter('local_time')\n                if self.counter.get_counter(\"local_time\") > self.max_time:\n                    self.log('Time out, terminated')\n                    break\n\n                # Fine-tuning\n                if self.is_finetune_required(records, dsp_eff):\n                    # Find fastest/slowest design index\n                    slow, fast = self.update_bottleneck_idx(records)\n                    # Update resource alloc to reflect the current usage\n                    for i in range(len(records)):\n                        resource_alloc['DSP'][i] = np.ceil(records[i].cst['DSP'])\n                        resource_alloc['BRAM18K'][i] = np.ceil(records[i].cst['BRAM18K'])\n                    # Adjust resource alloc\n                    resource_alloc[\"init\"] = {\"DSP\": copy.deepcopy(resource_alloc['DSP']),\n                                              \"BRAM18K\": copy.deepcopy(resource_alloc['BRAM18K']),\n                                              \"DSP_total\": used_constraints['DSP'],\n                                              \"BRAM18K_total\": used_constraints['BRAM18K']}\n                    resource_alloc[\"array_latency\"] = [record.latency for record in records]\n                    resource_alloc[\"state\"] = 0\n                    if slow[0] == last_slow_idx:\n                        resource_alloc[\"state\"] = 1\n                    resource_alloc[\"slow_idx\"] = slow\n                    resource_alloc[\"fast_idx\"] = fast\n                    resource_alloc[\"step\"] = [[0, 1], [0.025]] # step for resource adjustment\n                    resource_alloc[\"n_adjust\"] = [0, 0] # number of attempts at each state\n                    resource_alloc[\"decrease\"] = [-1, -1] # indicate if the allocation of bram decreases in the previous round\n                    resource_alloc[\"history\"] = [0, 0] # bram allocation in the last round\n                    
last_slow_idx = slow[0]\n                    if not self.resource_alloc_adjust(partition, resource_alloc, records, 0):\n                        self.log('No valid resource allocation found, terminated')\n                        break\n                    records = self.find_legal_config(partition, resource_alloc, old_records=records, adjust_func=self.resource_alloc_adjust, fine_tune=1, skip_search=0)\n                    if not records:\n                        self.log('No valid records found, terminated')\n                        break\n                else:\n                    self.log('Fine-tuning not required, terminated')\n                    break\n\n                self.epoch += 1\n                self.local_epoch += 1\n\n        self.bay_search_log[partition_idx] = local_best_reward\n        self.bayopt_epoch += 1\n        self.bayopt_best_rewards.append(self.best_reward)\n        self.counter.update_counter('time')\n        self.bayopt_best_rewards_time.append(self.counter.get_counter('time'))\n        return local_best_reward\n\n    def search(self):\n        self.n_layers = len(self.search_task.tasks)\n        if self.n_layers < 2:\n            raise RuntimeError(\"Multi-acc exploration requires at least two conv layers.\")\n        self.counter.init_counter('time')\n        # Bayesian Tuner\n        pbounds = {'partition_idx': (0, len(self.params[\"partition_candidates\"]) - 1)} # Right included\n\n        bay_tuner = BayesianOptimization(\n            f=self.search_design,\n            pbounds=pbounds,\n            random_state=1\n        )\n        for probe_idx in self.params['probe_points']:\n            bay_tuner.probe(\n                params=[probe_idx],\n                lazy=True\n            )\n        bay_tuner.maximize(\n            init_points=0,\n            n_iter=10\n        )\n\ndef multi_acc_search2(search_task, init_tasks, cst, search_obj, max_epochs, max_time, \\\n                      n_worker=1, silent=0, population_size=20, 
policy=0, meta=None, explorer=None, profiling=0):\n    \"\"\" This function finds the best multi-array architecture for a list of tasks.\n    The key difference compared to multi_acc_search1 is as follows. In multi_acc_search1,\n    we restrict the resources of each array and search for the best config of each one.\n    In multi_acc_search2, we search the arrays in sequence: when searching the\n    next array, we take the previous ones into account and penalize the configs\n    that result in a longer overall latency (setup + array latency).\n    \"\"\"\n    import logging\n    logger = logging.getLogger('AutoSA-Tuner')\n    if silent == 0:\n        logger.info(\"Performing cross layer multi-accelerator genetic search...\")\n\n    best_latency = utils.compute_tasks_latency(search_task.tasks, init_tasks)\n    if silent == 0:\n        logger.info(f'Cross-layer multi-accelerator ideal latency: {best_latency}')\n\n    partition_candidates = meta[\"partition_candidates\"]\n\n    tuner_params = {\n        \"explorer\": explorer,\n        \"probe_points\": meta[\"init_partition_candidates\"],        \n        \"best_reward\": 1 / best_latency if best_latency else None,\n        \"partition_candidates\": partition_candidates,\n        \"batch_size\": meta[\"batch_size\"],        \n        \"dsp_eff_thres\": 0.85, # If the DSP eff is greater than this thres, no fine-tuning is required.\n        \"latency_stdev_thres\": 0.03,\n        \"reward_stdev_thres\": 0.025,\n        \"max_trial\": 3 # Terminate fine-tuning after more than this many trials\n    }\n    if meta:\n        tuner_params[\"design_idx_list\"] = meta['design_idx_list']\n\n    if max_epochs <= 0:\n        max_time = 1800 # 30 minutes at most\n\n    tuner = MultiAccTuner2(search_task, cst, search_obj, max_epochs, max_time, tuner_params, n_worker, silent)\n    tuner.search()\n\n    search_record = tuner.best_search_record\n    \n    now = datetime.now()\n    config_str = 
f\"_{explorer.search_config['workload']}_multi2\"        \n    with open(f'tmp/tuning_rewards{config_str}.csv', \"w\", newline='') as f:\n        fieldnames = ['epoch', 'reward', 'time']\n        writer = csv.DictWriter(f, fieldnames=fieldnames)\n        writer.writeheader()\n        for epoch in range(len(tuner.best_rewards)):\n            writer.writerow({'epoch': epoch, 'reward': tuner.best_rewards[epoch], 'time': tuner.best_rewards_time[epoch]})\n\n    return search_record\n\nclass MultiAccTuner2(MultiAccTuner1):\n    def __init__(self, search_task, cst, obj, max_epoch, max_time, params, n_worker=1, silent=0):\n        super().__init__(search_task, cst, obj, max_epoch, max_time, params, n_worker=n_worker, silent=silent)\n\n    def est_mem(self, partition, records, verbose=0):\n        \"\"\" Estimate the total on-chip memory usage.\n        Memory is consumed by two parts: the arrays themselves (BRAM18K) and the streaming buffers in-between.\n        For two adjacent arrays, denote their tiling factors as:\n        [tr1, tc1, to1, ti1] and [tr2, tc2, to2, ti2]\n        Compute the tiling factors tr' and tc' such that:\n        tr' = c0 * tr1\n        tc' = c1 * tc1\n        (c0 - 1) * tr1 < tr2 + k - 1 <= c0 * tr1\n        (c1 - 1) * tc1 < tc2 + k - 1 <= c1 * tc1\n        Streaming buffers are allocated to hold at least:\n        tr' * tc' * o1(i2) * 2\n        such that when the second array is using the first block of (tr2 + k - 1) * ... 
* i2,\n        the first array will continue to fill the rest of the buffer for the next round.\n        If verbose is set to 1, return the detailed resource usage of each array and streaming buffer.\n\n        Streaming buffers are mapped to URAMs for this architecture.\n        \"\"\"\n        array_bufs = []\n        stream_bufs = [0 for i in range(len(records))]\n        BRAM18K_total = 0\n        URAM_total = 0\n        # array bufs\n        for i in range(len(records)):\n            record = records[i]\n            BRAM18K_total += record.cst[\"BRAM18K\"]\n            array_bufs.append(record.cst[\"BRAM18K\"])\n\n        # streaming buffers\n        for round in range(len(partition[0])):\n            for i in range(1, len(records)):\n                if round >= len(partition[i - 1]):\n                    continue\n                layer1_idx = partition[i - 1][round]\n                if round >= len(partition[i]):\n                    continue\n                layer2_idx = partition[i][round]\n                array1 = records[i - 1].task_sols[round]\n                array2 = records[i].task_sols[round]\n                # Extract parameters of array 1\n                o1 = self.search_task.tasks[layer1_idx].workload['params']['o']\n                tr1 = min(array1['sol']['r_t1'], self.search_task.tasks[layer1_idx].workload['params']['r'])\n                tc1 = min(array1['sol']['c_t1'], self.search_task.tasks[layer1_idx].workload['params']['c'])\n                for tag in self.search_task.tasks[layer1_idx].workload['tags']:\n                    if tag.startswith('maxpool'):\n                        stride = int(tag.split('_')[-1])\n                        tr1 /= stride\n                        tc1 /= stride\n                tr1 = max(int(tr1), 1)\n                tc1 = max(int(tc1), 1)\n                # Extract parameters of array 2\n                tr2 = min(array2['sol']['r_t1'], self.search_task.tasks[layer2_idx].workload['params']['r'])\n                
tc2 = min(array2['sol']['c_t1'], self.search_task.tasks[layer2_idx].workload['params']['c'])\n                k = self.search_task.tasks[layer2_idx].workload['params']['p']\n                data_pack = array2['sol']['i_t2']\n                # Compute the BRAM size\n                c0 = np.ceil((tr2 + k - 1) / tr1)\n                c1 = np.ceil((tc2 + k - 1) / tc1)\n                array1_params = self.search_task.tasks[layer1_idx].workload[\"params\"]\n                array2_params = self.search_task.tasks[layer2_idx].workload[\"params\"]\n                trp = min(c0 * tr1, array1_params['r'])\n                tcp = min(c1 * tc1, array1_params['c'])\n                #ele_num = trp * tcp * o1 * 2\n                ele_num = min(trp * array1_params['c'] * o1, tcp * array1_params['r'] * o1)\n\n                #buffer = np.ceil(self.search_task.dw * data_pack * 8 / 36) * np.ceil(ele_num / data_pack / 512)\n                buffer = np.ceil(self.search_task.dw * data_pack * 8 / 72) * np.ceil(ele_num / data_pack / 4096)\n\n                #print(array1['sol'])\n                #print(array2['sol'])\n                #print(c0, c1, tr1, tc1, trp, tcp, o1)\n                #print(i, data_pack, ele_num, buffer)\n                stream_bufs[i] = max(stream_bufs[i], buffer)\n\n        #BRAM18K_total += np.sum(stream_bufs)\n        URAM_total = np.sum(stream_bufs)\n        if verbose == 0:\n            return {\"BRAM18K\": BRAM18K_total, \"URAM\": URAM_total}, None\n        else:\n            return {\"BRAM18K\": BRAM18K_total, \"URAM\": URAM_total}, {\"array_bufs\": array_bufs, \"stream_bufs\": stream_bufs}\n\n    #def overuse_resource(self, partition, records):\n    #    for record in records:\n    #        if record.valid == 0:\n    #            return True\n    #    mem, meta = self.est_mem(partition, records)\n    #    DSP = 0\n    #    for record in records:\n    #        DSP += record.cst[\"DSP\"]\n    #    BRAM18K = mem[\"BRAM18K\"]\n    #    URAM = mem[\"URAM\"]\n    #  
  if BRAM18K > self.cst.hw_cst[\"BRAM18K\"]:\n    #        return True\n    #    if URAM > self.cst.hw_cst[\"URAM\"]:\n    #        return True\n    #    if DSP > self.cst.hw_cst[\"DSP\"]:\n    #        return True\n#\n    #    return False\n\n    #def est_resource(self, partition, records):\n    #    mem, meta = self.est_mem(partition, records)\n    #    DSP = 0\n    #    for record in records:\n    #        DSP += record.cst[\"DSP\"]\n#\n    #    return {\"DSP\": DSP, \"BRAM18K\": mem[\"BRAM18K\"], \"URAM\": mem[\"URAM\"]}\n\n    def est_latency(self, partition, records, in_place=0, adjust=0, verbose=0):\n        \"\"\" Compute the latency of the design.\n        The execution model is that at each round, each array executes the next layer in\n        its partition list. Between arrays, there are streaming buffers that let the computation\n        start as soon as the data are available from the previous array.\n        The next round starts only after all the arrays finish their tasks.\n        If in_place is set to 1, the latency of each record (except the first) is updated in place.\n        If adjust is set to 1, the possible stalls between arrays are considered (not implemented yet;\n        a RuntimeError is raised).\n        \"\"\"\n        record_latency = []\n        for r in records:\n            tmp_latency = [s[\"latency\"] for s in r.task_sols]\n            record_latency.append(tmp_latency)\n\n        total_latency = [0 for i in range(len(records))]\n        for latency in record_latency[0]:\n            total_latency[0] += (latency * self.params[\"batch_size\"])\n        # Store the setup/array latency of each array at each round\n        round_info = []\n\n        design_latency = 0\n        for round in range(len(partition[0])):\n            setup_latency = [0]\n            array_latency = [record_latency[0][round] * self.params[\"batch_size\"]]\n\n            # Update the array and setup latency\n            for i in range(1, len(records)):\n                if round >= len(partition[i - 1]):\n               
     continue\n                layer1_idx = partition[i - 1][round]\n                if round >= len(partition[i]):\n                    continue\n                layer2_idx = partition[i][round]\n                array1 = records[i - 1].task_sols[round]\n                array2 = records[i].task_sols[round]\n                # Extract the parameters of array 1\n                o1 = self.search_task.tasks[layer1_idx].workload['params']['o']\n                tr1 = min(array1['sol']['r_t1'], self.search_task.tasks[layer1_idx].workload['params']['r'])\n                tc1 = min(array1['sol']['c_t1'], self.search_task.tasks[layer1_idx].workload['params']['c'])\n                tr1_post = tr1\n                tc1_post = tc1\n                for tag in self.search_task.tasks[layer1_idx].workload['tags']:\n                    if tag.startswith('maxpool'):\n                        stride = int(tag.split('_')[-1])\n                        tr1_post /= stride\n                        tc1_post /= stride\n                tr1_post = max(int(tr1_post), 1)\n                tc1_post = max(int(tc1_post), 1)\n                # Extract parameters of array 2\n                tr2 = min(array2['sol']['r_t1'], self.search_task.tasks[layer2_idx].workload['params']['r'])\n                tc2 = min(array2['sol']['c_t1'], self.search_task.tasks[layer2_idx].workload['params']['c'])\n                k = self.search_task.tasks[layer2_idx].workload['params']['p']\n                data_pack = array2['sol']['i_t2']\n\n                c0 = np.ceil((tr2 + k - 1) / tr1_post)\n                c1 = np.ceil((tc2 + k - 1) / tc1_post)\n                array1_params = self.search_task.tasks[layer1_idx].workload[\"params\"]\n                array2_params = self.search_task.tasks[layer2_idx].workload[\"params\"]\n                trp = min(c0 * tr1, array1_params['r'])\n                tcp = min(c1 * tc1, array1_params['c'])\n                # Set up latency\n                #if (array1['sol']['r_t1'] == 
array2['sol']['r_t1']) and \\\n                #   (array1['sol']['c_t1'] == array2['sol']['c_t1']):\n                #    tri = np.ceil(array2['sol']['i_t1'] / array1['sol']['o_t1']) * array1['sol']['o_t1']\n                #    setup = array_latency[-1] / (np.ceil(array1_params['o'] / tri))\n                #else:\n                #setup = record_latency[i - 1][round] / (np.ceil(array1_params['r'] / trp) * np.ceil(array1_params['c'] / tcp))\n                if trp > tcp:\n                    setup = record_latency[i - 1][round] / np.ceil(array1_params['c'] / tcp)\n                else:\n                    setup = record_latency[i - 1][round] / np.ceil(array1_params['r'] / trp)\n                #setup = 0\n                setup_latency.append(setup)\n\n                # Adjust the array latency\n                if adjust:\n                    raise RuntimeError(\"Array latency adjust for multi-array is not implemented.\")\n                    \"\"\"\n                    # Consider the fine-grained produce-consume relationship\n                    n_fill_rounds = np.ceil((min(2 * tr2 + k - 1, array1_params['r'] + k - 1) - c0 * tr1_post) / tr1_post) * c1\n                    fill_latency = array_latency[-1] / (np.ceil(array1_params['r'] / tr1 * np.ceil(array1_params['c'] / tc1))) * n_fill_rounds\n                    consume_latency = record_latency[i][round] / (np.ceil(array2_params['r'] / tr2 * np.ceil(array2_params['c'] / tc2)))\n                    adjusted_latency = max(fill_latency, consume_latency) * np.ceil(array2_params['r'] / tr2) * np.ceil(array2_params['c'] / tc2)\n                    record_latency[i][round] = adjusted_latency\n                    array_latency.append(adjusted_latency)\n                    \"\"\"\n                else:\n                    # Simply compute the max\n                    array_latency.append(max(record_latency[i][round] * self.params[\"batch_size\"], array_latency[i - 1]))\n\n                for prev_i in range(i + 1):\n   
                 total_latency[i] += setup_latency[prev_i]\n                total_latency[i] += array_latency[i]\n\n            local_round_latency = 0\n            for lat in setup_latency:\n                local_round_latency += lat\n            local_round_latency += array_latency[-1]\n            design_latency += local_round_latency\n            \n            total_off_chip_trans = 0\n            for i in range(len(records)):\n                if round >= len(partition[i]):\n                    break\n                off_chip_acc_num_meta = records[i].task_sols[round][\"reward_meta\"][\"activity\"][\"off_chip_acc_num_meta\"]\n                cin_trans = 0\n                w_trans = 0\n                cout_trans = 0\n                for module in off_chip_acc_num_meta:\n                    if module.startswith(\"cin\"):\n                        cin_trans = off_chip_acc_num_meta[module]\n                    if module.startswith(\"w\"):\n                        w_trans = off_chip_acc_num_meta[module]\n                    if module.startswith(\"cout\"):\n                        cout_trans = off_chip_acc_num_meta[module]\n                if i in range(1, len(records)):\n                    cin_trans = 0\n                if i in range(len(records) - 1):\n                    cout_trans = 0\n                total_off_chip_trans += (cin_trans + w_trans + cout_trans)\n\n            round_info.append({\"latency\": local_round_latency, \"setup\": setup_latency, \n                               \"total_off_chip_trans\": total_off_chip_trans,\n                               \"array\": [], \"sol\": [], \"params\": []})\n            for i in range(len(records)):\n                if round >= len(partition[i]):\n                    continue\n                round_info[-1][\"array\"].append(records[i].task_sols[round][\"latency\"])\n                #round_info[-1][\"sol\"].append(records[i].task_sols[round][\"sol\"])\n                
#round_info[-1][\"params\"].append(self.search_task.tasks[partition[i][round]].workload[\"params\"])\n\n        if in_place:\n            for i in range(1, len(records)):\n                records[i].latency = total_latency[i]\n\n        # Throughput\n        throughput = 1 / design_latency * self.params[\"batch_size\"]\n\n        if verbose == 1:\n            return design_latency, throughput, {\"total_latency\": total_latency, \"round_info\": round_info}\n        else:\n            return design_latency, throughput, {\"total_latency\": total_latency, \"round_info\": round_info}\n\n    #def evaluate(self, partition, records, verbose=0):\n    #    latency, throughput, meta = self.est_latency(partition, records, verbose=verbose)\n    #    resource = self.est_resource(partition, records)\n    #    return latency, resource, throughput, meta\n\n    def find_legal_config(self, partition, resource_alloc, old_records=None, adjust_func=None, fine_tune=0, skip_search=1):\n        legal_records = old_records\n        best_throughput = 0\n        is_first = True\n        n_arrays = len(partition)\n        # Maintain a list of several best designs for each array\n        history = [[] for i in range(n_arrays)]\n        history_thres = 2\n        #history_thres = 1\n        if n_arrays > 10:\n            # Avoid storing too many configs\n            history_thres = 1\n\n        while True:\n            ## For internal testing\n            #print(\"****************** Allocation ******************\")\n            #pprint.pprint(resource_alloc)\n            #print(\"****************** Allocation ******************\")\n            #start  = time.time()\n\n            records = []\n            skip_idx = []\n            job_list = []\n            tasks = []\n            for i in range(n_arrays):\n                # Update the history\n                history_tmp = []\n                for record in history[i]:\n                    if record.cst[\"BRAM18K\"] <= 
resource_alloc[\"BRAM18K\"][i] and \\\n                       record.cst[\"DSP\"] <= resource_alloc[\"DSP\"][i]:\n                       history_tmp.append(record)\n                if legal_records and is_first:\n                    if legal_records[i].cst[\"BRAM18K\"] <= resource_alloc[\"BRAM18K\"][i] and \\\n                       legal_records[i].cst[\"DSP\"] <= resource_alloc[\"DSP\"][i]:\n                       history_tmp.append(legal_records[i])\n                       self.search_cache[i].append(legal_records[i])\n                history[i] = history_tmp\n                if skip_search == 1:\n                    if (i in resource_alloc[\"fast_idx\"] and len(history[i]) > 0) or \\\n                       ((i not in resource_alloc[\"fast_idx\"]) and (i not in resource_alloc[\"slow_idx\"]) and len(history[i]) > 0):\n                    #if ((i not in resource_alloc[\"slow_idx\"]) and (i not in resource_alloc[\"fast_idx\"])) or len(history[i]) > 0:\n                        #if resource_alloc[\"state\"] == 0:\n                        #    if i < min(resource_alloc[\"slow_idx\"]):\n                        #        skip_idx.append(i)\n                        #        continue\n                        #elif resource_alloc[\"state\"] == 1:\n                        #    if i < min(resource_alloc[\"slow_idx\"]) and i < min(resource_alloc[\"fast_idx\"]):\n                        #        skip_idx.append(i)\n                        #        continue\n\n                        skip_idx.append(i)\n                        continue\n\n                #print(\"skipped: \", skip_idx)\n                #job_list = []\n                for design_idx in self.params[\"design_idx_list\"]:\n                    # Submit the search job\n                    #local_start = time.time()\n                    explorer_tmp = copy.deepcopy(self.params[\"explorer\"])\n                    #local_end = time.time()\n                    #print(\"copy time: \", local_end - local_start)\n            
        # Update the constraints\n                    explorer_tmp.cst.hw_cst[\"DSP\"] = resource_alloc[\"DSP\"][i]\n                    explorer_tmp.cst.hw_cst[\"BRAM18K\"] = resource_alloc[\"BRAM18K\"][i]\n                    early_stop = -1\n                    search_task_configs = {}\n                    for task_idx in range(len(partition[i])):\n                        search_task_configs[task_idx] = {'cin_read_mode': 2, 'cout_write_mode': 1}\n                    if i == 0:\n                        for task_idx in range(len(partition[i])):\n                            # Load from DRAM\n                            search_task_configs[task_idx]['cin_read_mode'] = 0\n                    if i == n_arrays - 1:\n                        for task_idx in range(len(partition[i])):\n                            # Write to DRAM\n                            search_task_configs[task_idx]['cout_write_mode'] = 0\n\n                    if i > 0 and len(history[i - 1]) > 0:\n                        prev_array = {\"record\": history[i - 1][0], \"workloads\": partition[i - 1]}\n                    else:\n                        prev_array = None\n                    # Parallel version\n                    #print(f\"{i}_{design_idx}\")\n                    job_list.append(\n                        {\n                            \"job_hash\": f\"{i}_{design_idx}\",\n                            \"func\": explorer_tmp.search_non_fusion_single_acc_customized1,\n                            \"args\": [design_idx, search_task_configs, -1, self.sub_task_silent, partition[i], None, True]\n                            #\"args\": [design_idx, search_task_configs, -1, self.sub_task_silent, partition[i], prev_array, True]\n                        }\n                    )\n                    # Sequential version\n                    #search_record = explorer_tmp.search_non_fusion_single_acc_customized1(\\\n                    #    design_idx=design_idx, silent=self.sub_task_silent, \\\n         
           #    workload_idx=partition[i], early_stop=early_stop, \\\n                    #    search_task_configs=search_task_configs, prev_array=prev_array, one_gen=True)\n                    #if search_record.valid:\n                    #    early_stop = search_record.latency\n                    #    history[i].append(search_record)\n                    #    self.search_cache[i].append(search_record)\n\n            pool = utils.MyExecutor(max(int(self.n_worker/8), 8))\n            results = pool.exec(job_list)\n            for i in range(n_arrays):\n                if i in skip_idx:\n                    continue\n                for design_idx in self.params[\"design_idx_list\"]:\n                    search_record = results[f\"{i}_{design_idx}\"]\n                    if search_record.valid:\n                        history[i].append(search_record)\n                        self.search_cache[i].append(search_record)\n                def take_latency(record):\n                    return record.latency\n                history[i].sort(key=take_latency)\n                history[i] = history[i][:min(len(history[i]), history_thres)]\n\n            #end = time.time()\n            #print(\"eval time: \", end - start)\n            #start  = time.time()\n\n            # Find the array combination that satisfies the memory usage\n            choices_tmp = [list(range(len(h))) for h in history]\n            choices = list(itertools.product(*choices_tmp))\n            #print(\"total choices: \", len(choices))\n            max_bram_tmp = 0\n            min_bram_tmp = float(\"inf\")\n            best_throughput_tmp = 0\n            for choice in choices:\n                records_tmp = []\n                for i in range(n_arrays):\n                    records_tmp.append(history[i][choice[i]])\n                latency, throughput, _ = self.est_latency(partition, records_tmp)\n                memory, meta = self.est_mem(partition, records_tmp, verbose=0)\n                
#print(\"array_bufs: \", meta[\"array_bufs\"])\n                #print(\"stream_bufs: \", meta[\"stream_bufs\"])\n                #if memory > max_bram_tmp:\n                #    max_bram_tmp = memory[\"BRAM18K\"]\n                #if memory < min_bram_tmp:\n                #    min_bram_tmp = memory[\"BRAM18K\"]\n                #if memory < self.cst.hw_cst[\"BRAM18K\"]:\n                if not self.overuse_resource(partition, records_tmp):\n                    if throughput > best_throughput_tmp:\n                        records = records_tmp\n                        best_throughput_tmp = throughput\n\n            #end = time.time()\n            #print(f\"select time: {end - start} (avg: {(end - start) / len(choices)})\")\n            #start  = time.time()\n\n            # Search for several designs with fewer resource for tuning\n            if records and fine_tune == 1:\n                for i in range(n_arrays):\n                    if i not in resource_alloc[\"fast_idx\"]:\n                        continue\n                    if int(resource_alloc[\"BRAM18K\"][i]) in self.search_cache_cst[i]:\n                        continue\n                    unit_dec_bram = 16 # Start with 16\n                    dec_bram = unit_dec_bram\n                    slow_idx_list = resource_alloc[\"slow_idx\"]\n                    fast_idx_list = resource_alloc[\"fast_idx\"]\n                    ub_latency = (records[slow_idx_list[0]].latency - records[i].latency) / (len(fast_idx_list) + 1) + records[i].latency\n                    #print(\"cache ub latency: \", ub_latency)\n                    n_attempt = 2\n                    while n_attempt > 0:\n                        explorer_tmp = copy.deepcopy(self.params[\"explorer\"])\n                        explorer_tmp.cst.hw_cst[\"DSP\"] = resource_alloc[\"DSP\"][i]\n                        explorer_tmp.cst.hw_cst[\"BRAM18K\"] = records[i].cst[\"BRAM18K\"] - dec_bram\n                        for design_idx in 
self.params[\"design_idx_list\"]:\n                            design = explorer_tmp.designs[design_idx]\n                            if design.name == records[i].design:\n                                cur_design_idx = design_idx\n                        search_record = None\n                        for r_c in self.search_cache[i]:\n                            if r_c.cst[\"BRAM18K\"] == explorer_tmp.cst.hw_cst[\"BRAM18K\"] and \\\n                               r_c.cst[\"DSP\"] == explorer_tmp.cst.hw_cst[\"DSP\"] and \\\n                               r_c.design == explorer_tmp.designs[cur_design_idx].name:\n                                search_record = r_c\n                                break\n                        if not search_record:\n                            search_task_configs = {}\n                            for task_idx in range(len(partition[i])):\n                                search_task_configs[task_idx] = {'cin_read_mode': 2, 'cout_write_mode': 1}\n                            if i == 0:\n                                for task_idx in range(len(partition[i])):\n                                    # Load from DRAM\n                                    search_task_configs[task_idx]['cin_read_mode'] = 0\n                            if i == n_arrays - 1:\n                                for task_idx in range(len(partition[i])):\n                                    # Write to DRAM\n                                    search_task_configs[task_idx]['cout_write_mode'] = 0\n                            if i > 0:\n                                prev_array = {\"record\": records[i - 1], \"workloads\": partition[i - 1]}\n                            else:\n                                prev_array = None\n                            search_record = explorer_tmp.search_non_fusion_single_acc_customized1(\\\n                                design_idx=cur_design_idx, silent=self.sub_task_silent, workload_idx=partition[i], \\\n                              
  #search_task_configs=search_task_configs, prev_array=prev_array, one_gen=True)\n                                search_task_configs=search_task_configs, prev_array=None, one_gen=True)\n                            if search_record.valid:\n                                self.search_cache[i].append(search_record)\n                        if search_record.valid:\n                            #print(\"cache searching: \", search_record.cst[\"BRAM18K\"], search_record.latency)\n                            if n_attempt == 2 and search_record.latency > ub_latency:\n                                unit_dec_bram = 4\n                                dec_bram = unit_dec_bram\n                            else:\n                                dec_bram = records[i].cst[\"BRAM18K\"]- search_record.cst[\"BRAM18K\"] + unit_dec_bram\n                        else:\n                            break\n                        n_attempt -= 1\n                    self.search_cache_cst[i].append(int(resource_alloc[\"BRAM18K\"][i]))\n\n            #end = time.time()\n            #print(\"cache time: \", end - start)\n\n            is_first = False\n            if fine_tune:\n                skip_search = 1\n                if len(records) == 0:\n                    if not adjust_func(partition, resource_alloc, legal_records, 1):\n                        break\n                else:\n                    if best_throughput_tmp > best_throughput:\n                        legal_records = copy.deepcopy(records)\n                        best_throughput = best_throughput_tmp\n\n                    latency_list = [r.latency for r in records]\n                    old_slow_idx = resource_alloc[\"slow_idx\"][0]\n                    old_slow_record_latency = resource_alloc[\"array_latency\"][old_slow_idx]\n                    slow, fast = self.update_bottleneck_idx(records)\n                    resource_alloc[\"slow_idx\"] = slow\n                    resource_alloc[\"fast_idx\"] = fast\n              
      resource_alloc[\"array_latency\"] = [record.latency for record in records]\n\n                    ## For internal testing\n                    #print(\"****************** Tuning ******************\")\n                    #latency, throughput, meta = self.est_latency(partition, records, verbose=1)\n                    #print(\"total_latency: \", meta[\"total_latency\"])\n                    #print(\"latency: \", latency)\n                    #print(\"throughput: \", throughput)\n                    #print(\"round_info: \")\n                    #pprint.pprint(meta[\"round_info\"])\n                    #memory, meta = self.est_mem(partition, records, verbose=1)\n                    #print(\"memory: \", memory)\n                    #print(\"array_bufs: \", meta[\"array_bufs\"])\n                    #print(\"stream_bufs: \", meta[\"stream_bufs\"])\n                    #latency_list = [r.latency for r in records]\n                    #dsp_list = [r.cst[\"DSP\"] for r in records]\n                    #dsf_eff_list = [r.dsp_eff for r in records]\n                    #bram_list = [r.cst[\"BRAM18K\"] for r in records]\n                    #kernel_list = [r.design for r in records]\n                    #print(\"latency list: \", latency_list)\n                    #print(\"bram list: \", bram_list)\n                    #print(\"dsp list: \", dsp_list)\n                    #print(\"dsp eff list: \", dsf_eff_list)\n                    #print(\"kernel list: \", kernel_list)\n                    #print(\"****************** Tuning ******************\")\n\n                    if resource_alloc[\"slow_idx\"][0] == old_slow_idx:\n                        # If the performance is not improved upon the last time, break as well\n                        if records[i].latency <= old_slow_record_latency:\n                            break\n                        if not adjust_func(partition, resource_alloc, records, 0):\n                            break\n                    else:\n    
                    break\n            else:\n                if len(records) == 0:\n                    resource_alloc[\"BRAM18K\"] = [n / 2 for n in resource_alloc[\"BRAM18K\"]]\n                    #resource_alloc[\"DSP\"] = [n / 2 for n in resource_alloc[\"DSP\"]]\n                else:\n                    legal_records = records\n                    break\n\n        return legal_records\n\n    def search_design(self, partition_idx):\n        partition_idx = int(partition_idx)\n        if partition_idx in self.bay_search_log:\n            return self.bay_search_log[partition_idx]\n        #n_array = int(n_array)\n        #if n_array in self.bay_search_log:\n        #    return self.bay_search_log[n_array]\n        self.log(f\"Partition {partition_idx}: {self.params['partition_candidates'][partition_idx]['partition']}, #Array: {len(self.params['partition_candidates'][partition_idx]['partition'])}\")\n        #self.log(f\"#Array: {n_array}\")\n        rewards_window = []\n        self.counter.init_counter('local_time')\n        local_best_reward = 0\n        # Build the partition\n        partition = self.params['partition_candidates'][partition_idx]['partition']\n        n_arrays = len(partition)\n        #partition = [[] for i in range(n_array)]\n        #for i in range(len(self.search_task.tasks)):\n        #    array_idx = i % n_array\n        #    partition[array_idx].append(i)\n        # Store all the search records for each array\n        for i in range(n_arrays):\n            self.search_cache[i] = []\n        # Store the resource constraint used for each search to avoid redundant search\n        for i in range(n_arrays):\n            self.search_cache_cst[i] = []\n\n        # Initialize resource allocation\n        resource_alloc = self.resource_alloc(partition)\n\n        # Find a legal config\n        records = self.find_legal_config(partition, resource_alloc, skip_search=0)\n        if records:\n            self.local_epoch = 0\n            
self.last_update_epoch = 0\n            last_slow_idx = -1\n            while True:\n                latency, used_constraints, throughput, meta = self.evaluate(partition, records, verbose=1)\n                dsp_eff = self.est_dsp_eff(throughput, used_constraints)                \n                reward = throughput\n                search_record = utils.SearchRecord().extract_from_tuner_multi_acc(records, reward, latency, used_constraints, throughput, dsp_eff, partition=partition)\n                # Update global reward\n                if reward > self.best_reward:\n                    self.best_reward = reward\n                    self.best_search_record = search_record                    \n                    self.log(f'Global Epoch {self.epoch} - #Array {n_arrays}: new global best reward: {self.best_reward} (latency: {latency:.0f}, throughput: {throughput}, DSP eff: {dsp_eff:.2f}, BRAM: {used_constraints[\"BRAM18K\"]:.2f}, DSP: {used_constraints[\"DSP\"]:.2f}, URAM: {used_constraints[\"URAM\"]:.2f}, BW: {search_record.bw:.2f})')\n                self.best_rewards.append(self.best_reward)\n                self.counter.update_counter('time')\n                self.best_rewards_time.append(self.counter.get_counter('time'))\n                # Update local reward\n                if reward > local_best_reward:\n                    local_best_reward = reward\n                    self.log(f'Local Epoch {self.local_epoch} - #Array {n_arrays}: new local best reward: {local_best_reward} (latency: {latency:.0f}, throughput: {throughput}, DSP eff: {dsp_eff:.2f}, BRAM: {used_constraints[\"BRAM18K\"]:.2f}, DSP: {used_constraints[\"DSP\"]:.2f}, URAM: {used_constraints[\"URAM\"]:.2f}, BW: {search_record.bw:.2f})')\n                    self.last_update_epoch = self.local_epoch\n                rewards_window.append(reward)\n\n                if len(rewards_window) > self.params[\"max_trial\"]:\n                    stdev_percent = np.std(rewards_window[-3:]) / 
np.mean(rewards_window[-3:])\n                    if stdev_percent < self.params[\"reward_stdev_thres\"]:\n                        self.log(f'Minimal improvement after {self.params[\"max_trial\"]} rounds, terminated')\n                        break\n                if self.local_epoch - self.last_update_epoch > self.params[\"max_trial\"]:\n                    self.log(f'No improvement after {self.params[\"max_trial\"]} rounds, terminated')\n                    break\n                # If the tuning time is too long, kill it\n                self.counter.update_counter('local_time')\n                if self.counter.get_counter(\"local_time\") > self.max_time:\n                    self.log('Time out, terminated')\n                    break\n\n                # Fine-tuning\n                if self.is_finetune_required(records, dsp_eff):\n                    # Find fastest/slowest design index\n                    slow, fast = self.update_bottleneck_idx(records)\n                    # Update resource alloc to reflect the current usage\n                    for i in range(len(records)):\n                        resource_alloc['DSP'][i] = np.ceil(records[i].cst['DSP'])\n                        resource_alloc['BRAM18K'][i] = np.ceil(records[i].cst['BRAM18K'])\n                    # Adjust resource alloc\n                    resource_alloc[\"init\"] = {\"DSP\": copy.deepcopy(resource_alloc['DSP']),\n                                              \"BRAM18K\": copy.deepcopy(resource_alloc['BRAM18K']),\n                                              \"DSP_total\": used_constraints['DSP'],\n                                              \"BRAM18K_total\": used_constraints['BRAM18K'],\n                                              \"URAM_total\": used_constraints['URAM'],\n                                              }\n                    resource_alloc[\"array_latency\"] = [record.latency for record in records]\n                    resource_alloc[\"state\"] = 0\n                 
   if slow[0] == last_slow_idx:\n                        resource_alloc[\"state\"] = 1\n                    resource_alloc[\"slow_idx\"] = slow\n                    resource_alloc[\"fast_idx\"] = fast\n                    resource_alloc[\"step\"] = [[0, 1], [0.025]] # step for resource adjustment\n                    resource_alloc[\"n_adjust\"] = [0, 0] # number of attempts at each state\n                    resource_alloc[\"decrease\"] = [-1, -1] # indicate if the allocation of bram decreases in the previous round\n                    resource_alloc[\"history\"] = [0, 0] # bram allocation in the last round\n                    last_slow_idx = slow[0]\n                    if not self.resource_alloc_adjust(partition, resource_alloc, records, 0):\n                        self.log('No valid resource allocation found, terminated')\n                        break\n                    records = self.find_legal_config(partition, resource_alloc, old_records=records, adjust_func=self.resource_alloc_adjust, fine_tune=1, skip_search=0)\n                    if not records:\n                        self.log('No valid records found, terminated')\n                        break\n                else:\n                    self.log('Fine-tuning not required, terminated')\n                    break\n\n                self.epoch += 1\n                self.local_epoch += 1\n\n        self.bay_search_log[partition_idx] = local_best_reward\n        return local_best_reward\n\n    def search(self):\n        self.n_layers = len(self.search_task.tasks)\n        if self.n_layers < 2:\n            raise RuntimeError(\"Multi-acc exploration requires at least two conv layers.\")\n        self.counter.init_counter('time')\n        # Bayesian Tuner\n        #pbounds = {'n_array': (2, min(self.n_layers, self.params[\"n_array_max\"]))} # Right included\n        pbounds = {'partition_idx': (0, len(self.params[\"partition_candidates\"]) - 1)} # Right included\n\n        bay_tuner = 
BayesianOptimization(\n            f=self.search_design,\n            pbounds=pbounds,\n            random_state=1\n        )\n        for probe_idx in self.params['probe_points']:\n            bay_tuner.probe(\n                params=[probe_idx],\n                lazy=True\n            )\n        bay_tuner.maximize(\n            init_points=0,\n            n_iter=10\n        )\n"
  },
  {
    "path": "autosa_scripts/odyssey/unit_test.py",
    "content": "\nimport copy\nfrom search_task import SingleTask\nfrom design import Design\nimport json\nfrom tuners import Constraint\n\nclass Workload(object):\n    def __init__(self, params):\n        self.params = params\n\n    def __repr__(self):\n        return f\"{self.params}\"\n\nclass SearchTask(object):\n    def __init__(self, workload):\n        self.workload = workload\n\n    def __repr__(self):\n        return str(self.workload)\n\ndef est_mm_performance():\n    params = {\n        \"i\": 1024, \"i_t1\": 129, \"i_t2\": 3,\n        \"j\": 1024, \"j_t1\": 130, \"j_t2\": 13,\n        \"k\": 1024, \"k_t1\": 64, \"k_t2\": 4,\n        \"p9\": 16, \"p10\": 16, \"p11\": 4, \"p12\": 4 # A, B, None, C\n    }\n\n    # comp\n    #params = {\n    #    \"i\": 1024, \"i_t1\": 520, \"i_t2\": 26,\n    #    \"j\": 1024, \"j_t1\": 520, \"j_t2\": 26,\n    #    \"k\": 1024, \"k_t1\": 320, \"k_t2\": 4,\n    #    \"p9\": 16, \"p10\": 16, \"p11\": 4, \"p12\": 4 # A, B, None, C\n    #}\n\n    # comm\n    #params = {\n    #    \"i\": 1024, \"i_t1\": 1024, \"i_t2\": 128,\n    #    \"j\": 1024, \"j_t1\": 1024, \"j_t2\": 128,\n    #    \"k\": 1024, \"k_t1\": 320, \"k_t2\": 4,\n    #    \"p9\": 16, \"p10\": 16, \"p11\": 4, \"p12\": 4 # A, B, None, C\n    #}\n\n    # comm-comp\n    #params = {\n    #    \"i\": 1024, \"i_t1\": 1024, \"i_t2\": 64,\n    #    \"j\": 1024, \"j_t1\": 1024, \"j_t2\": 64,\n    #    \"k\": 1024, \"k_t1\": 320, \"k_t2\": 4,\n    #    \"p9\": 16, \"p10\": 16, \"p11\": 4, \"p12\": 4 # A, B, None, C\n    #}\n\n    workload = {\n        \"name\": \"gemm\",\n        \"tags\": [\"gemm\"],\n        \"params\": {\n            \"i\": 1024, \"j\": 1024, \"k\": 1024\n        }\n    }\n\n    cst = Constraint(\"cst/hw_cst.json\")\n\n    design_dir = \"designs\"\n    kernel_name = \"kernel3_2\"\n    with open(f\"designs/{kernel_name}.json\", \"r\") as json_f:\n        desp = json.load(json_f)\n    design = Design(kernel_name)\n    design.register(desp, 
f\"designs/register/{kernel_name}.py\")\n\n    search_task = SingleTask(design, workload, cst)\n    reward, resource, meta = search_task.evaluate(params)\n    print(1 / reward)\n    print(resource)\n    print(meta)\n\nif __name__ == \"__main__\":\n    est_mm_performance()\n"
  },
  {
    "path": "autosa_scripts/odyssey/utils.py",
    "content": "import time\nimport functools\nimport math\nimport logging\nimport itertools\nfrom datetime import datetime\nfrom subprocess import Popen, PIPE\nimport json\nimport pprint\nimport queue\nimport multiprocessing as mp\nfrom pathos.pools import ProcessPool, ParallelPool\nimport copy\n\ndef factorization(x):\n    if x == 0:\n        raise RuntimeError(f\"Factorization of 0\")\n    prime_factors = []\n    while x % 2 == 0:\n        prime_factors.append(2)\n        x = x / 2\n    \n    for i in range(3, int(math.sqrt(x)) + 1, 2):\n        while x % i == 0:\n            prime_factors.append(int(i))\n            x = x / i\n    \n    if x > 2:\n        prime_factors.append(int(x))\n\n    return prime_factors\n\ndef get_divisors(x, filter=None):\n    \"\"\" Return the divisors of the integer x\n    Call the filter function to filter out the illegal one.\n    \"\"\"\n    divisors = []\n    large_divisors = []\n    for i in range(1, int(math.sqrt(x) + 1)):\n        if x % i == 0:\n            if (filter and not filter(i)) or not filter:\n                divisors.append(int(i))\n            if i * i != x:\n                if (filter and not filter(int(x / i))) or not filter:\n                    large_divisors.append(int(x / i))\n    for d in reversed(large_divisors):\n        divisors.append(d)\n\n    return divisors\n\ndef compute_tasks_latency(search_tasks, init_tasks):\n    \"\"\" Aggregate the best latency of the search tasks.\n    \"\"\"\n    # Collect the best single task latency\n    task_latency = {}\n    for task in search_tasks:\n        found = False\n        cur_latency = []\n        task_prefix = str(task)[:str(task).find('d')]\n        for i_task in init_tasks:\n            if len(i_task.task_sols) == 1:\n                i_task_prefix = i_task.task_sols[0]['hash']\n                i_task_prefix = i_task_prefix[:i_task_prefix.find('d')]\n                if i_task_prefix == task_prefix:\n                    found = True\n                    
cur_latency.append(i_task.task_sols[0]['latency'])\n        if not found:\n            #raise RuntimeError(f\"Task {str(task)} not found in the history.\")\n            return None\n        task_latency[task.workload[\"name\"]] = min(cur_latency)\n    \n    # Init tasks may contain fused tasks.\n    # If the fused tasks help improve the latency, we will replace the old \n    # unfused task pairs with the fused tasks.\n    for i_task in init_tasks:\n        if len(i_task.task_sols) > 1:\n            unfused_latency = 0\n            for name in i_task.task_names:\n                if name not in task_latency:\n                    # This task has been handled by other fusion tasks\n                    unfused_latency = 0\n                    break\n                unfused_latency += task_latency[name]\n            if i_task.latency < unfused_latency:\n                task_latency[''.join(i_task.task_names)] = i_task.latency\n                for name in i_task.task_names:\n                    del task_latency[name]\n\n    latency = 0\n    for k, v in task_latency.items():\n        latency += v\n\n    return latency\n\nclass PerfCounter(object):\n    def __init__(self, logger=None):\n        self.logger = logger\n        self.counters = {}\n    \n    def init_counter(self, name):        \n        self.counters[name] = {'start': time.perf_counter(), 'elapsed': 0}\n        \n    def update_counter(self, name):\n        if name not in self.counters:\n            raise RuntimeError(f\"Counter {name} is not defined\")\n        now = time.perf_counter()\n        self.counters[name]['elapsed'] += (now - self.counters[name]['start'])\n        self.counters[name]['start'] = now\n\n    def get_counter(self, name):\n        if name not in self.counters:\n            raise RuntimeError(f\"Counter {name} is not defined\")\n        return self.counters[name]['elapsed']\n\n    def print_counter(self, name):\n        if name not in self.counters:\n            raise 
RuntimeError(f\"Counter {name} is not defined\")\n        if not self.logger:\n            raise RuntimeError(\"Logger is not defined\")\n        self.logger.info(f'[Event: {name}] Total elapsed time: {self.counters[name][\"elapsed\"]:.4f} s')\n\n    def print_counters(self):\n        if not self.logger:\n            raise RuntimeError(\"Logger is not defined\")\n        for name in self.counters:\n            self.logger.info(f'[Event: {name}] Total elapsed time: {self.counters[name][\"elapsed\"]:.4f} s')\n\ndef init_logger(outdir):\n    logger = logging.getLogger('AutoSA-Tuner')\n    # Remove any existing handlers so that repeated calls do not duplicate the output\n    for handler in logger.handlers[:]:\n        handler.close()\n        logger.removeHandler(handler)\n    formatter = logging.Formatter(\n                '[%(name)s %(asctime)s] %(levelname)s: %(message)s',\n                '%Y-%m-%d %H:%M:%S')\n    logger.setLevel(logging.INFO)\n    s_handler = logging.StreamHandler()\n    f_handler = logging.FileHandler(f'{outdir}/tuning.log', 'a')\n    s_handler.setLevel(level=logging.INFO)\n    f_handler.setLevel(level=logging.INFO)\n    s_handler.setFormatter(formatter)\n    f_handler.setFormatter(formatter)\n    logger.addHandler(s_handler)\n    logger.addHandler(f_handler)\n    \n    return logger\n\nclass SearchRecord(object):\n    \"\"\" Data structure for storing the search results.\n    If max is 1, a larger reward is better; otherwise, a smaller reward is better.\n    \"\"\"\n    def __init__(self, max=1):\n        self.cst = None\n        self.max = max\n        if self.max == 1:\n            self.reward = 0\n        else:\n            self.reward = float(\"inf\")\n        self.reward_meta = None\n        self.latency = 0\n        self.throughput = 0\n        self.energy = 0\n        self.dsp_eff = 0\n        self.design = None\n        self.ops = 0\n        self.task_names = []\n        self.metric = None\n        self.fuse = -1\n        self.split_pos = -1\n        self.partition = None\n        self.n_array = -1\n        self.bw = 0\n        
self.ctc = 0\n        self.exec_model = []\n        self.converge_time = 0\n        # Design frequency (MHz)\n        self.fre = 300\n        self.off_chip_trans = 0\n        # Data width in bytes (4 bytes for FP32)\n        self.dw = 4\n        self.valid = 0\n        \n        # Fixed array architecture solution\n        self.arch_sol = None\n        # Solutions of the mapped tasks\n        self.task_sols = []\n        # Sub-task records\n        self.records = None\n        self.history = []\n\n    def reset(self):\n        self.cst = None\n        if self.max == 1:\n            self.reward = 0\n        else:\n            self.reward = float(\"inf\")\n        self.reward_meta = None\n        self.latency = 0\n        self.throughput = 0\n        self.energy = 0\n        self.dsp_eff = 0\n        self.design = None\n        self.ops = 0\n        self.task_names = []\n        self.metric = None\n        self.fuse = -1\n        self.split_pos = -1\n        self.partition = None\n        self.n_array = -1\n        self.bw = 0\n        self.ctc = 0\n        self.exec_model = []\n        self.converge_time = 0\n        # Design frequency (MHz)\n        self.fre = 300\n        self.off_chip_trans = 0\n        self.valid = 0\n\n        self.arch_sol = None\n        self.task_sols = []\n        self.records = None\n        self.history = []\n\n        return self\n\n    def update(self, new_record, save=0):\n        \"\"\" Replace the current record with new_record if the new one is better.\n        If \"save\" is set to 1, also append new_record to the history.\n        Invalid records are ignored.\n        \"\"\"\n        if new_record.valid == 0:\n            return False\n\n        if self.max != new_record.max:\n            raise RuntimeError(\"Inconsistent search record configuration\")\n        status = False\n        if self.max == 1:\n            if new_record.reward > self.reward:\n                status = True\n        else:\n            if new_record.reward < self.reward:\n                status = True\n        if status:\n            self.cst 
= copy.deepcopy(new_record.cst)\n            self.reward = new_record.reward\n            self.reward_meta = copy.deepcopy(new_record.reward_meta)\n            self.latency = new_record.latency\n            self.throughput = new_record.throughput\n            self.energy = new_record.energy\n            self.dsp_eff = new_record.dsp_eff\n            self.design = new_record.design            \n            self.ops = new_record.ops\n            self.task_names = new_record.task_names\n            self.fuse = new_record.fuse\n            self.split_pos = new_record.split_pos\n            self.partition = new_record.partition\n            self.n_array = new_record.n_array\n            self.bw = new_record.bw\n            self.ctc = new_record.ctc\n            self.exec_model = new_record.exec_model\n            self.metric = new_record.metric\n            self.converge_time = new_record.converge_time\n            self.off_chip_trans = new_record.off_chip_trans\n            self.valid = new_record.valid            \n\n            self.arch_sol = new_record.arch_sol\n            self.task_sols = new_record.task_sols\n            self.records = new_record.records\n        \n        if save == 1:\n            self.history.append(new_record)\n\n        return status\n\n    def dup(self):\n        \"\"\" Duplicate the current record.\n        \"\"\"\n        new_record = SearchRecord()\n        new_record.cst = copy.deepcopy(self.cst)\n        new_record.max = self.max\n        new_record.reward = self.reward\n        new_record.reward_meta = self.reward_meta\n        new_record.latency = self.latency\n        new_record.throughput = self.throughput\n        new_record.energy = self.energy\n        new_record.dsp_eff = self.dsp_eff\n        new_record.design = self.design\n        new_record.ops = self.ops\n        new_record.task_names = copy.deepcopy(self.task_names)\n        new_record.metric = self.metric\n        new_record.fuse = self.fuse\n        
new_record.split_pos = self.split_pos\n        new_record.partition = self.partition\n        new_record.n_array = self.n_array\n        new_record.bw = self.bw\n        new_record.ctc = self.ctc\n        new_record.exec_model = copy.deepcopy(self.exec_model)\n        new_record.converge_time = self.converge_time\n        new_record.off_chip_trans = self.off_chip_trans\n        new_record.valid = self.valid\n        new_record.arch_sol = copy.deepcopy(self.arch_sol)\n        new_record.task_sols = copy.deepcopy(self.task_sols)\n        if self.records:\n            new_record.records = []\n            for record in self.records:\n                new_record.records.append(record.dup())\n        \n        return new_record\n\n    def extract_from_tuner_single_acc(self, tuner):\n        \"\"\" Extract the single-accelerator search results from the tuner.\n        \"\"\"\n        if tuner.best_sol:\n            self.cst = tuner.best_sol_cst\n            self.reward = tuner.best_reward\n            self.reward_meta = tuner.best_reward_meta\n            self.ops = tuner.search_task.compute_ops()\n            if tuner.search_obj == \"latency\":\n                self.latency = 1 / self.reward\n                self.throughput = self.ops / self.latency\n                # Compute the updated DSP efficiency\n                # Note: Only applicable for FP32\n                self.dsp_eff = tuner.search_task.compute_dsp_eff(self.latency, self.cst[\"DSP\"])\n            elif tuner.search_obj in [\"off_chip_comm\", \"dsp_num\"]:\n                self.latency = self.reward_meta[\"latency\"][\"latency\"]\n                self.throughput = self.ops / self.latency\n                self.dsp_eff = tuner.search_task.compute_dsp_eff(self.latency, self.cst[\"DSP\"])\n            elif tuner.search_obj == \"energy\":\n                self.energy = 1 / self.reward\n                self.latency = self.reward_meta[\"latency\"][\"latency\"]\n                self.throughput = self.ops / 
self.latency\n                self.dsp_eff = tuner.search_task.compute_dsp_eff(self.latency, self.cst[\"DSP\"])\n            else:\n                raise RuntimeError(f\"Unsupported search objective: {tuner.search_obj}\")\n            self.design = tuner.search_task.design.name            \n            self.task_names = [tuner.search_task.workload[\"name\"]]\n            #self.fuse = tuner.search_task.fuse\n            self.split_pos = -1\n            self.metric = tuner.search_obj\n            self.bw = tuner.search_task.compute_bw(tuner.best_sol)\n            self.ctc = tuner.search_task.compute_ctc(tuner.best_sol)\n            self.exec_model.append(tuner.search_task.workload[\"name\"])\n            self.converge_time = tuner.converge_time\n            self.off_chip_trans = tuner.search_task.est_off_chip_trans(tuner.best_sol)\n\n            # Solutions\n            self.arch_sol = tuner.search_task.arch_sol\n            self.task_sols = [{\n                \"name\": tuner.search_task.workload[\"name\"],\n                \"hash\": str(tuner.search_task),\n                \"ops\": tuner.search_task.compute_ops(),\n                \"sol\": tuner.best_sol,\n                \"latency\": self.latency,\n                \"CTC\": self.ctc,\n                \"DSP_eff\": self.dsp_eff,\n                \"reward_meta\": tuner.best_reward_meta,\n                \"BW\": self.bw\n            }]            \n            self.records = None\n\n            self.valid = 1\n\n        return self\n\n    def extract_from_tuner_multi_acc(self, records, reward, latency, cst, throughput, dsp_eff, split_pos=-1, partition=None, n_array=-1, meta=None):\n        \"\"\" Extract multi-acc search records from the tuner.\n        If meta is set, the design is Arch3 (multi2), and BW is calculated as the maximum per-round bandwidth.\n        \"\"\"\n        self.valid = 1\n        for record in records:\n            if record.valid == 0:\n                self.valid = 0\n        self.cst = cst\n        self.latency = 
latency\n        self.reward = reward\n        self.dsp_eff = dsp_eff\n        self.throughput = throughput\n        self.split_pos = split_pos\n        self.partition = partition\n        self.n_array = n_array\n        self.metric = records[0].metric\n        for record in records:\n            self.task_names += copy.deepcopy(record.task_names)\n        #for record in records:\n        #    self.bw += record.bw\n        # Use 1/throughput as the maximal latency.\n        # Accumulate the total off-chip data communication of all the arrays.\n        # For single-workload arrays, check whether the on-chip streaming buffers are used.\n        if not meta:\n            max_latency = 1 / throughput\n            total_off_chip_trans = 0\n            for record in records:\n                total_off_chip_trans += record.off_chip_trans \n            self.bw = total_off_chip_trans * self.dw / (max_latency / (self.fre * 1e6)) / 1e9 # GB/s\n        else:\n            bw = 0\n            for r in range(len(meta['round_info'])):\n                total_off_chip_trans = meta['round_info'][r]['total_off_chip_trans']\n                round_latency = meta['round_info'][r]['latency']\n                bw = max(bw, total_off_chip_trans * self.dw / (round_latency / (self.fre * 1e6)) / 1e9)\n            self.bw = bw                \n\n        self.records = copy.deepcopy(records)\n\n        return self\n\n    def __repr__(self):\n        return self.to_str()\n\n    def to_str(self):\n        to_print = \"\"\n        if self.valid:        \n            to_print += f\"\\nreward: {self.reward}\"\n            #to_print += f\"\\nreward meta: {self.reward_meta}\"\n            to_print += f\"\\ncst: {pprint.pformat(self.cst, indent=2)}\"\n            to_print += f\"\\nlatency: {self.latency}\"\n            to_print += f\"\\nthroughput: {self.throughput}\"            \n            to_print += f\"\\nenergy(mJ/normalized): {self.energy:.6f}\"\n            to_print += f\"\\nDSP efficiency: 
{self.dsp_eff:.2f}\"\n            to_print += f\"\\nBW(GB/s): {self.bw:.2f}\"\n            to_print += f\"\\nops: {self.ops:.2f}\"\n            to_print += f\"\\nCTC(FLOP/byte): {self.ctc:.2f}\"\n            to_print += f\"\\ndesign: {self.design}\"\n            to_print += f\"\\nconverge time: {self.converge_time}\"\n            to_print += f\"\\noff-chip communication (Bytes): {self.off_chip_trans * self.dw}\"\n            if self.fuse != -1:\n                to_print += f\"\\nfuse: {self.fuse}\"\n            if self.split_pos != -1:\n                to_print += f\"\\nsplit position: {self.split_pos}\"            \n            if self.partition:\n                to_print += f\"\\npartition: {self.partition}\"\n            if self.n_array != -1:\n                to_print += f\"\\n#array: {self.n_array}\"            \n            if len(self.exec_model) > 0:\n                to_print += f\"\\nexec model: {self.exec_model}\"\n            to_print += f\"\\ntask names: {self.task_names}\"\n            if self.arch_sol:\n                to_print += f\"\\narch sol: {pprint.pformat(self.arch_sol, indent=2)}\"\n            if self.task_sols:\n                to_print += f\"\\ntask sols: \\n{pprint.pformat(self.task_sols, indent=2)}\"\n            if self.records:                \n                to_print += f\"\\nrecords: \"\n                for record_idx in range(len(self.records)):\n                    to_print += f\"\\n<record{record_idx}><begin>\"\n                    to_print += f\"{self.records[record_idx].to_str()}\"\n                    to_print += f\"<record{record_idx}><end>\"                \n            if len(self.history) > 1:\n                to_print += f\"\\nhistory records: \"\n                for record_idx in range(len(self.history)):\n                    to_print += f\"\\n<record{record_idx}><begin>\"\n                    to_print += f\"{self.history[record_idx].to_str()}\"\n                    to_print += f\"<record{record_idx}><end>\"\n        
else:\n            to_print += f\"\\ninvalid record\"\n        to_print += \"\\n\"\n\n        return to_print\n\n    def append(self, record):\n        \"\"\" Append another record to the current record.\n        All the records should share the same architecture.\n        The task solutions of the given record are appended to the current record.\n        \"\"\"\n        if record.valid == 0:\n            self.valid = 0\n\n        if len(self.task_sols) == 0:\n            # Copy the fields in place; rebinding self would not update the caller's object\n            self.__dict__.update(copy.deepcopy(record).__dict__)\n        else:\n            if self.max != 1:\n                raise RuntimeError(\"Appending records is only supported under the max mode.\")\n            if self.metric == \"latency\":\n                if record.latency != 0:\n                    self.dsp_eff = (self.dsp_eff * self.latency + record.dsp_eff * record.latency) / (self.latency + record.latency)\n                self.latency += record.latency\n                if self.latency != 0:\n                    self.reward = 1 / self.latency\n            else:\n                raise RuntimeError(f\"Unsupported metric: {self.metric}.\")\n            self.ops += record.ops\n            self.throughput = self.ops / self.latency\n            self.off_chip_trans += record.off_chip_trans\n            self.bw = max(self.bw, record.bw)\n            self.task_names += copy.deepcopy(record.task_names)\n            self.exec_model += copy.deepcopy(record.exec_model)\n\n            # Solutions\n            self.task_sols += copy.deepcopy(record.task_sols)\n\n        return self\n\n    def merge(self, record1, record2):\n        \"\"\" Merge two records into the current record.\n        Both records should share the same architecture.\n        Copies of both records are stored as the sub-records of the current record.\n        \"\"\"\n        if record1.valid == 0 or record2.valid == 0:\n            self.valid = 0\n            return self\n                \n        self.valid = 1\n        # Update the metadata\n     
   # Copy record1.cst so that taking the element-wise maximum below does not modify it\n        self.cst = copy.deepcopy(record1.cst)\n        for item in self.cst:\n            if record2.cst[item] > self.cst[item]:\n                self.cst[item] = record2.cst[item]\n        self.metric = record1.metric\n        if self.metric == \"latency\":\n            self.latency = record1.latency + record2.latency\n            self.reward = 1 / self.latency\n            # Update the DSP efficiency\n            self.dsp_eff = (record1.dsp_eff * record1.latency + record2.dsp_eff * record2.latency) / (record1.latency + record2.latency)\n        else:\n            #print(self)\n            raise RuntimeError(f\"Unsupported metric: {self.metric}\")        \n        self.ops = record1.ops + record2.ops\n        self.off_chip_trans = record1.off_chip_trans + record2.off_chip_trans\n        self.bw = max(record1.bw, record2.bw)        \n        self.design = record1.design\n        for t_name in record1.task_names:\n            self.task_names.append(t_name)\n        for t_name in record2.task_names:\n            self.task_names.append(t_name)     \n\n        self.exec_model = copy.deepcopy(record1.exec_model)        \n        if record1.fuse == 1 or record2.fuse == 1:            \n            #print(record1.exec_model)\n            #print(record2.exec_model)\n            #print(record1.fuse, record2.fuse)\n            self.exec_model = [self.exec_model, record2.exec_model]\n            #print(self.exec_model)\n        else:\n            self.exec_model += record2.exec_model         \n        self.arch_sol = record1.arch_sol\n\n        # Solutions                \n        #new_record.records = [copy.deepcopy(self), copy.deepcopy(record)]\n        #self.records = [record1, record2]\n        self.records = [record1.dup(), record2.dup()]\n\n        return self\n\nclass NoDaemonProcess(mp.Process):\n\t# Make \"daemon\" attribute always return false\n\tdef _get_daemon(self):\n\t\treturn False\n\tdef _set_daemon(self, value):\n\t\tpass\n\tdaemon = property(_get_daemon, 
_set_daemon)\n\nclass MyExecutor(object):\n\tdef __init__(self, n_thread):\n\t\tself.n_thread = n_thread\n\t\tself.timeout = 1800 # 30 minutes\n\t\tself.task_queue = mp.Queue()\n\t\tself.ret_queue = mp.Queue()\n\t\tself.proc_list = []\t\t\n\t\tself.ret = {}\t\t\n\t\tif n_thread > 1:\n\t\t\tmanager = mp.Manager()\n\t\t\tself.return_dict = manager.dict()\n\t\t\tfor i in range(self.n_thread):\t\t\t\t\n\t\t\t\tp = NoDaemonProcess(target=self.runner, args=(self.task_queue, self.return_dict))\n\t\t\t\tself.proc_list.append(p)\t\t\t\n\t\t\tfor i in range(self.n_thread):\n\t\t\t\tself.proc_list[i].start()\n\t\n\tdef runner(self, q, return_dict):\n\t\twhile True:\n\t\t\ttask = q.get()\n\t\t\tif task is None:\n\t\t\t\tbreak\n\t\t\ttask_hash = task[0]\n\t\t\ttask_func = task[1]\n\t\t\ttask_args = task[2]\n\t\t\tret = task_func(*task_args)\t\t\t\n\t\t\treturn_dict[task_hash] = ret\n\n\tdef prune_jobs(self, jobs):\n\t\t\"\"\" Remove jobs with duplicate hashes, keeping the first occurrence.\n\t\t\"\"\"\n\t\tjob_list = []\n\t\tcache = set()\n\n\t\tfor job in jobs:\n\t\t\tif job['job_hash'] in cache:\n\t\t\t\tcontinue\n\t\t\telse:\n\t\t\t\tjob_list.append(job)\n\t\t\t\tcache.add(job['job_hash'])\n\n\t\treturn job_list\n\n\tdef exec(self, job_list):\n\t\t\"\"\" Submit the jobs to the executor.\n\t\tjob_list is a list of dicts with the keys 'job_hash', 'func', and 'args'.\n\t\tReturn a dict that maps each job hash to its result.\n\t\t\"\"\"\n\t\t# Prune away redundant jobs\n\t\tjob_list = self.prune_jobs(job_list)\t\t\t\n\t\t\t\t\n\t\tresults = {}\n\t\tif self.n_thread > 1:\t\t\t\n\t\t\tfor job in job_list:\n\t\t\t\tself.task_queue.put((job['job_hash'], job['func'], job['args']))\t\t\t\n\t\t\tfor i in range(self.n_thread):\n\t\t\t\tself.task_queue.put(None)\t\t\t\n\t\t\tstart = time.time()\n\t\t\twhile time.time() - start <= self.timeout:\t\t\t\t\n\t\t\t\tif not any(p.is_alive() for p in self.proc_list):\n\t\t\t\t\tbreak\n\t\t\t\ttime.sleep(.1)\t\t\t\t\n\t\t\telse:\n\t\t\t\t# Timeout, kill all the processes\n\t\t\t\tfor p in 
self.proc_list:\n\t\t\t\t\tp.terminate()\t\t\t\t\t\t\t\t\n\t\t\tfor p in self.proc_list:\n\t\t\t\tp.join()\t\t\t\n\t\t\t\n\t\t\tfor job in job_list:\n\t\t\t\tif job['job_hash'] in self.return_dict:\n\t\t\t\t\tresults[job['job_hash']] = self.return_dict[job['job_hash']]\n\t\t\t\telse:\n\t\t\t\t\tresults[job['job_hash']] = SearchRecord().reset()\n\t\telse:\n\t\t\tfor job in job_list:\n\t\t\t\tjob_args = job['args']\n\t\t\t\tresults[job['job_hash']] = job['func'](*job_args)\n\t\t\n\t\treturn results"
  },
  {
    "path": "autosa_scripts/odyssey/workload/conv.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 1,\n        \"o\": 6,\n        \"r\": 5,\n        \"c\": 5,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mm.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"gemm\",\n      \"tags\": [\"gemm\"],\n      \"params\": {\n        \"i\": 1024,\n        \"j\": 1024,\n        \"k\": 1024\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mm64.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"gemm\",\n      \"tags\": [\"gemm\"],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 64,\n        \"k\": 64\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 3,\n        \"o\": 32,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 32,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 16,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 16,\n        \"o\": 96,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 96,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 144,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 
1\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 144,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 144,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 192,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      
\"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 96,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-1\",\n      \"tags\": [\n        \"conv\"\n 
     ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 96,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 96,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 576,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"o\": 160,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"o\": 576,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"o\": 160,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"o\": 576,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        
\"i\": 576,\n        \"o\": 160,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv8_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"o\": 960,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv8_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 960,\n        \"o\": 320,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv9\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 320,\n        \"o\": 1280,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_1.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 3,\n                \"o\": 32,\n                \"r\": 112,\n                \"c\": 112,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_10.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_1-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 32,\n                \"o\": 144,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_11.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_3-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 144,\n                \"o\": 32,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_12.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_1-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 32,\n                \"o\": 144,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_13.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_3-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 144,\n                \"o\": 32,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_14.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_1-0\",\n            \"tags\": [\n                \"conv\",\n                \"maxpool_2\"\n            ],\n            \"params\": {\n                \"i\": 32,\n                \"o\": 192,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_15.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_3-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 192,\n                \"o\": 64,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_16.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_1-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 64,\n                \"o\": 192,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_17.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_3-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 192,\n                \"o\": 64,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_18.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_1-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 64,\n                \"o\": 192,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_19.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_3-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 192,\n                \"o\": 64,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_2.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv2_1-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 32,\n                \"o\": 32,\n                \"r\": 112,\n                \"c\": 112,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_20.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_1-3\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 64,\n                \"o\": 192,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_21.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_3-3\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 192,\n                \"o\": 64,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_22.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv6_1-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 64,\n                \"o\": 384,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_23.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv6_3-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 384,\n                \"o\": 96,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_24.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv6_1-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 96,\n                \"o\": 384,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_25.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv6_3-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 384,\n                \"o\": 96,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_26.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv6_1-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 96,\n                \"o\": 384,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_27.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv6_3-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 384,\n                \"o\": 96,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_28.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv7_1-0\",\n            \"tags\": [\n                \"conv\",\n                \"maxpool_2\"\n            ],\n            \"params\": {\n                \"i\": 96,\n                \"o\": 576,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_29.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv7_3-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 576,\n                \"o\": 160,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_3.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv2_3-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 32,\n                \"o\": 16,\n                \"r\": 112,\n                \"c\": 112,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_30.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv7_1-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 160,\n                \"o\": 576,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_31.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv7_3-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 576,\n                \"o\": 160,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_32.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv7_1-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 160,\n                \"o\": 576,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_33.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv7_3-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 576,\n                \"o\": 160,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_34.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv8_1-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 160,\n                \"o\": 960,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_35.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv8_3-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 960,\n                \"o\": 320,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_36.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv9\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 320,\n                \"o\": 1280,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_4.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_1-0\",\n            \"tags\": [\n                \"conv\",\n                \"maxpool_2\"\n            ],\n            \"params\": {\n                \"i\": 16,\n                \"o\": 96,\n                \"r\": 112,\n                \"c\": 112,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_47.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 96,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 144,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_5.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_3-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 96,\n                \"o\": 24,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_6.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_1-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 24,\n                \"o\": 96,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_7.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_3-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 96,\n                \"o\": 24,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_8.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_1-0\",\n            \"tags\": [\n                \"conv\",\n                \"maxpool_2\"\n            ],\n            \"params\": {\n                \"i\": 24,\n                \"o\": 144,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_9.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_3-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 144,\n                \"o\": 32,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_complete.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 3,\n        \"o\": 32,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 32,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 16,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 16,\n        \"o\": 96,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 16,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 16,\n        \"o\": 96,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 144,\n        \"r\": 56,\n        \"c\": 56,\n        
\"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 24,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 144,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 24,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 144,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 192,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 32,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n  
  },\n    {\n      \"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 32,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 32,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-1\",\n      \"tags\": 
[\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 96,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 576,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"o\": 96,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 576,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"o\": 96,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 576,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      
\"params\": {\n        \"i\": 576,\n        \"o\": 160,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv8_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"o\": 960,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv8_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 960,\n        \"o\": 320,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv9\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 320,\n        \"o\": 1280,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv10\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1280,\n        \"o\": 1000,\n        \"r\": 1,\n        \"c\": 1,\n        \"p\": 1,\n        \"q\": 1\n      }\n    }\n  ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_conv3_1_0.json",
    "content": "{\n  \"workloads\": [    \n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 16,\n        \"o\": 96,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    }    \n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_first.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 16,\n        \"o\": 96,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 16,\n        \"o\": 96,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_first1.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 16,\n        \"o\": 96,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },    \n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 16,\n        \"o\": 96,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }    \n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_first2.json",
    "content": "{\n  \"workloads\": [    \n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },    \n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_half.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 144,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 144,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 144,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }  \n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_img2col.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"j\": 50176,\n        \"k\": 27\n      }\n    },\n    {\n      \"name\": \"conv2_1-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"j\": 12544,\n        \"k\": 288\n      }\n    },\n    {\n      \"name\": \"conv2_3-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 16,\n        \"j\": 12544,\n        \"k\": 288\n      }\n    },\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"j\": 12544,\n        \"k\": 144\n      }\n    },\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"j\": 3136,\n        \"k\": 864\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"j\": 12544,\n        \"k\": 144\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"j\": 3136,\n        \"k\": 864\n      }\n    },\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"j\": 3136,\n        \"k\": 216\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"j\": 784,\n        \"k\": 1296\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"j\": 3136,\n        \"k\": 216\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      
\"params\": {\n        \"i\": 32,\n        \"j\": 784,\n        \"k\": 1296\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"j\": 3136,\n        \"k\": 216\n      }\n    },\n    {\n      \"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"j\": 784,\n        \"k\": 1296\n      }\n    },\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"j\": 784,\n        \"k\": 288\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 196,\n        \"k\": 1728\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"j\": 784,\n        \"k\": 288\n      }\n    },\n    {\n      \"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 196,\n        \"k\": 1728\n      }\n    },\n    {\n      \"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"j\": 784,\n        \"k\": 288\n      }\n    },\n    {\n      \"name\": \"conv5_3-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 196,\n        \"k\": 1728\n      }\n    },\n    {\n      \"name\": \"conv5_1-3\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"j\": 784,\n        \"k\": 288\n      }\n    },\n    {\n      \"name\": \"conv5_3-3\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 196,\n        \"k\": 1728\n      }\n    },\n    {\n      \"name\": \"conv6_1-0\",\n      
\"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"j\": 196,\n        \"k\": 576\n      }\n    },\n    {\n      \"name\": \"conv6_3-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"j\": 196,\n        \"k\": 3456\n      }\n    },\n    {\n      \"name\": \"conv6_1-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"j\": 196,\n        \"k\": 576\n      }\n    },\n    {\n      \"name\": \"conv6_3-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"j\": 196,\n        \"k\": 3456\n      }\n    },\n    {\n      \"name\": \"conv6_1-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"j\": 196,\n        \"k\": 576\n      }\n    },\n    {\n      \"name\": \"conv6_3-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"j\": 196,\n        \"k\": 3456\n      }\n    },\n    {\n      \"name\": \"conv7_1-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"j\": 196,\n        \"k\": 864\n      }\n    },\n    {\n      \"name\": \"conv7_3-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"j\": 49,\n        \"k\": 5184\n      }\n    },\n    {\n      \"name\": \"conv7_1-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"j\": 196,\n        \"k\": 864\n      }\n    },\n    {\n      \"name\": \"conv7_3-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"j\": 49,\n        \"k\": 5184\n      }\n    },\n    {\n      \"name\": \"conv7_1-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"j\": 196,\n        \"k\": 864\n      }\n    
},\n    {\n      \"name\": \"conv7_3-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"j\": 49,\n        \"k\": 5184\n      }\n    },\n    {\n      \"name\": \"conv8_1-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 960,\n        \"j\": 49,\n        \"k\": 1440\n      }\n    },\n    {\n      \"name\": \"conv8_3-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 320,\n        \"j\": 49,\n        \"k\": 8640\n      }\n    },\n    {\n      \"name\": \"conv9\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 1280,\n        \"j\": 49,\n        \"k\": 2880\n      }\n    }\n  ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_no_first.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv2_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 32,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 16,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 16,\n        \"o\": 96,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 96,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 144,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        
\"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 144,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 144,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 192,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      
\"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 96,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 96,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-2\",\n      \"tags\": [\n        \"conv\"\n 
     ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 96,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 576,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"o\": 160,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"o\": 576,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"o\": 160,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"o\": 576,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"o\": 160,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv8_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 
160,\n        \"o\": 960,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv8_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 960,\n        \"o\": 320,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv9\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 320,\n        \"o\": 1280,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_original.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 3,\n        \"o\": 32,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 32,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 16,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 16,\n        \"o\": 96,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 96,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 24,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 24,\n        \"o\": 144,\n        \"r\": 56,\n        \"c\": 56,\n        
\"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 144,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 144,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 144,\n        \"o\": 32,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 32,\n        \"o\": 192,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n  
  },\n    {\n      \"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 192,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_3-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 192,\n        \"o\": 64,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 96,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-1\",\n      \"tags\": 
[\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 96,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 384,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv6_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 384,\n        \"o\": 96,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-0\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 96,\n        \"o\": 576,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"o\": 160,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"o\": 576,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 576,\n        \"o\": 160,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"o\": 576,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv7_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      
\"params\": {\n        \"i\": 576,\n        \"o\": 160,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv8_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 160,\n        \"o\": 960,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv8_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 960,\n        \"o\": 320,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv9\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 320,\n        \"o\": 1280,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    }\n  ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_test.json",
    "content": "{\n    \"workloads\": [      \n      {\n        \"name\": \"conv2_1-0\",\n        \"tags\": [\n          \"conv\"\n        ],\n        \"params\": {\n          \"i\": 32,\n          \"o\": 32,\n          \"r\": 112,\n          \"c\": 112,\n          \"p\": 1,\n          \"q\": 1\n        }\n      },\n      {\n        \"name\": \"conv2_3-0\",\n        \"tags\": [\n          \"conv\"\n        ],\n        \"params\": {\n          \"i\": 32,\n          \"o\": 16,\n          \"r\": 112,\n          \"c\": 112,\n          \"p\": 1,\n          \"q\": 1\n        }\n      },\n      {\n        \"name\": \"conv3_1-0\",\n        \"tags\": [\n          \"conv\",\n          \"maxpool_2\"\n        ],\n        \"params\": {\n          \"i\": 16,\n          \"o\": 96,\n          \"r\": 112,\n          \"c\": 112,\n          \"p\": 1,\n          \"q\": 1\n        }\n      },\n      {\n        \"name\": \"conv3_3-0\",\n        \"tags\": [\n          \"conv\"\n        ],\n        \"params\": {\n          \"i\": 96,\n          \"o\": 24,\n          \"r\": 56,\n          \"c\": 56,\n          \"p\": 1,\n          \"q\": 1\n        }\n      },\n      {\n        \"name\": \"conv3_1-1\",\n        \"tags\": [\n          \"conv\"\n        ],\n        \"params\": {\n          \"i\": 24,\n          \"o\": 96,\n          \"r\": 56,\n          \"c\": 56,\n          \"p\": 1,\n          \"q\": 1\n        }\n      },\n      {\n        \"name\": \"conv3_3-1\",\n        \"tags\": [\n          \"conv\"\n        ],\n        \"params\": {\n          \"i\": 96,\n          \"o\": 24,\n          \"r\": 56,\n          \"c\": 56,\n          \"p\": 1,\n          \"q\": 1\n        }\n      },\n      {\n        \"name\": \"conv4_1-0\",\n        \"tags\": [\n          \"conv\",\n          \"maxpool_2\"\n        ],\n        \"params\": {\n          \"i\": 24,\n          \"o\": 144,\n          \"r\": 56,\n          \"c\": 56,\n          \"p\": 1,\n          \"q\": 1\n        }\n      },\n      {\n  
      \"name\": \"conv4_3-0\",\n        \"tags\": [\n          \"conv\"\n        ],\n        \"params\": {\n          \"i\": 144,\n          \"o\": 32,\n          \"r\": 28,\n          \"c\": 28,\n          \"p\": 1,\n          \"q\": 1\n        }\n      }\n    ]\n  }\n  "
  },
  {
    "path": "autosa_scripts/odyssey/workload/mobilenetv2_test_single.json",
    "content": "{\n    \"workloads\": [          \n      {\n        \"name\": \"conv3_1-0\",\n        \"tags\": [\n          \"conv\",\n          \"maxpool_2\"\n        ],\n        \"params\": {\n          \"i\": 16,\n          \"o\": 96,\n          \"r\": 112,\n          \"c\": 112,\n          \"p\": 1,\n          \"q\": 1\n        }\n      }\n    ]\n  }\n  "
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet152.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_4\"\n      ],\n      \"params\": {\n        \"i\": 3,\n        \"o\": 64,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 7,\n        \"q\": 7\n      }\n    },\n    {\n      \"name\": \"conv2_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n 
     \"name\": \"conv2_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-2\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-2\",\n    
  \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      
\"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-6\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-6\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-6\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-7\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-7\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 
128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-7\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n  
      \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        
\"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-6\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-6\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-6\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-7\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-7\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": 
\"conv4_3-7\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-8\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-8\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-8\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-9\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-9\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-9\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-10\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-10\",\n      \"tags\": [\n        
\"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-10\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-11\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-11\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-11\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-12\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-12\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-12\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-13\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n  
      \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-13\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-13\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-14\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-14\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-14\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-15\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-15\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-15\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n  
      \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-16\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-16\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-16\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-17\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-17\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-17\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-18\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-18\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n     
   \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-18\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-19\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-19\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-19\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-20\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-20\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-20\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-21\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n 
   },\n    {\n      \"name\": \"conv4_2-21\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-21\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-22\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-22\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-22\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-23\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-23\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-23\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": 
\"conv4_1-24\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-24\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-24\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-25\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-25\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-25\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-26\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-26\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-26\",\n      \"tags\": [\n       
 \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-27\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-27\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-27\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-28\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-28\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-28\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-29\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-29\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": 
{\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-29\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-30\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-30\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-30\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-31\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-31\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-31\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-32\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 
256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-32\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-32\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-33\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-33\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-33\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-34\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-34\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-34\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 
14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-35\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-35\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-35\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 2048,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        
\"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 2048,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    }\n  ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 3,\n        \"o\": 64,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 7,\n        \"q\": 7\n      }\n    },\n    {\n      \"name\": \"conv2_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n 
     \"name\": \"conv2_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-2\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-2\",\n    
  \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-3\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": [\n        
\"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        
\"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-5\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        
\"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 2048,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 2048,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 
1,\n        \"q\": 1\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_1.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv1\",\n            \"tags\": [\n                \"conv\",\n                \"maxpool_2\"\n            ],\n            \"params\": {\n                \"i\": 3,\n                \"o\": 64,\n                \"r\": 112,\n                \"c\": 112,\n                \"p\": 7,\n                \"q\": 7\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_10.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv2_3-2\",\n            \"tags\": [\n                \"conv\",\n                \"maxpool_2\"\n            ],\n            \"params\": {\n                \"i\": 64,\n                \"o\": 256,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_11.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_1-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 128,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_12.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_2-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 128,\n                \"o\": 128,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_13.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_3-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 128,\n                \"o\": 512,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_14.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_1-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 512,\n                \"o\": 128,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_15.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_2-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 128,\n                \"o\": 128,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_16.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_3-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 128,\n                \"o\": 512,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_17.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_1-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 512,\n                \"o\": 128,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_18.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_2-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 128,\n                \"o\": 128,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_19.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_3-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 128,\n                \"o\": 512,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_2.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv2_1-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 64,\n                \"o\": 64,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_20.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_1-3\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 512,\n                \"o\": 128,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_21.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_2-3\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 128,\n                \"o\": 128,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_22.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv3_3-3\",\n            \"tags\": [\n                \"conv\",\n                \"maxpool_2\"\n            ],\n            \"params\": {\n                \"i\": 128,\n                \"o\": 512,\n                \"r\": 28,\n                \"c\": 28,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_23.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_1-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 512,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_24.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_2-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_25.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_3-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 1024,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_26.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_1-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 1024,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_27.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_2-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_28.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_3-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 1024,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_29.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_1-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 1024,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_3.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv2_2-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 64,\n                \"o\": 64,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_30.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_2-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_31.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_3-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 1024,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_32.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_1-3\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 1024,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_33.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_2-3\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_34.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_3-3\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 1024,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_35.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_1-4\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 1024,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_36.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_2-4\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_37.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_3-4\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 1024,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_38.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_1-5\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 1024,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_39.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_2-5\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 256,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_4.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv2_3-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 64,\n                \"o\": 256,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_40.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv4_3-5\",\n            \"tags\": [\n                \"conv\",\n                \"maxpool_2\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 1024,\n                \"r\": 14,\n                \"c\": 14,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_41.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_1-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 1024,\n                \"o\": 512,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_42.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_2-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 512,\n                \"o\": 512,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_43.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_3-0\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 512,\n                \"o\": 2048,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_44.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_1-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 2048,\n                \"o\": 512,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_45.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_2-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 512,\n                \"o\": 512,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_46.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_3-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 512,\n                \"o\": 2048,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_47.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_1-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 2048,\n                \"o\": 512,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_48.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_2-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 512,\n                \"o\": 512,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_49.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv5_3-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 512,\n                \"o\": 2048,\n                \"r\": 7,\n                \"c\": 7,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_5.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv2_1-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 64,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_6.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv2_2-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 64,\n                \"o\": 64,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_7.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv2_3-1\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 64,\n                \"o\": 256,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_8.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv2_1-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 256,\n                \"o\": 64,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 1,\n                \"q\": 1\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_9.json",
    "content": "{\n    \"workloads\": [\n        {\n            \"name\": \"conv2_2-2\",\n            \"tags\": [\n                \"conv\"\n            ],\n            \"params\": {\n                \"i\": 64,\n                \"o\": 64,\n                \"r\": 56,\n                \"c\": 56,\n                \"p\": 3,\n                \"q\": 3\n            }\n        }\n    ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_batch4.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_4\"\n      ],\n      \"params\": {\n        \"i\": 3,\n        \"o\": 64,\n        \"r\": 448,\n        \"c\": 448,\n        \"p\": 7,\n        \"q\": 7\n      }\n    },\n    {\n      \"name\": \"conv2_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    
},\n    {\n      \"name\": \"conv2_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-2\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 128,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 128,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": 
\"conv3_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 128,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 128,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-3\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": 
[\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n 
       \"i\": 256,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-5\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        
\"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 14,\n        \"c\": 
14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_conv5_1.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 2048,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    }\n  ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_img2col.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 50176,\n        \"k\": 147\n      }\n    },\n    {\n      \"name\": \"conv2_1-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 3136,\n        \"k\": 64\n      }\n    },\n    {\n      \"name\": \"conv2_2-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 3136,\n        \"k\": 576\n      }\n    },\n    {\n      \"name\": \"conv2_3-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 3136,\n        \"k\": 64\n      }\n    },\n    {\n      \"name\": \"conv2_1-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 3136,\n        \"k\": 64\n      }\n    },\n    {\n      \"name\": \"conv2_2-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 3136,\n        \"k\": 576\n      }\n    },\n    {\n      \"name\": \"conv2_3-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 3136,\n        \"k\": 64\n      }\n    },\n    {\n      \"name\": \"conv2_1-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 3136,\n        \"k\": 64\n      }\n    },\n    {\n      \"name\": \"conv2_2-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 3136,\n        \"k\": 576\n      }\n    },\n    {\n      \"name\": \"conv2_3-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 3136,\n        \"k\": 64\n      }\n    },\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": 
{\n        \"i\": 128,\n        \"j\": 784,\n        \"k\": 256\n      }\n    },\n    {\n      \"name\": \"conv3_2-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"j\": 784,\n        \"k\": 1152\n      }\n    },\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 784,\n        \"k\": 128\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"j\": 784,\n        \"k\": 256\n      }\n    },\n    {\n      \"name\": \"conv3_2-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"j\": 784,\n        \"k\": 1152\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 784,\n        \"k\": 128\n      }\n    },\n    {\n      \"name\": \"conv3_1-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"j\": 784,\n        \"k\": 256\n      }\n    },\n    {\n      \"name\": \"conv3_2-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"j\": 784,\n        \"k\": 1152\n      }\n    },\n    {\n      \"name\": \"conv3_3-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 784,\n        \"k\": 128\n      }\n    },\n    {\n      \"name\": \"conv3_1-3\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"j\": 784,\n        \"k\": 256\n      }\n    },\n    {\n      \"name\": \"conv3_2-3\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"j\": 784,\n        \"k\": 1152\n      }\n    },\n    {\n      \"name\": \"conv3_3-3\",\n      
\"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 784,\n        \"k\": 128\n      }\n    },\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 512\n      }\n    },\n    {\n      \"name\": \"conv4_2-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 2304\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"j\": 196,\n        \"k\": 256\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 512\n      }\n    },\n    {\n      \"name\": \"conv4_2-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 2304\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"j\": 196,\n        \"k\": 256\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 512\n      }\n    },\n    {\n      \"name\": \"conv4_2-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 2304\n      }\n    },\n    {\n      \"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"j\": 196,\n        \"k\": 256\n      }\n    },\n    {\n      \"name\": \"conv4_1-3\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 512\n      }\n 
   },\n    {\n      \"name\": \"conv4_2-3\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 2304\n      }\n    },\n    {\n      \"name\": \"conv4_3-3\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"j\": 196,\n        \"k\": 256\n      }\n    },\n    {\n      \"name\": \"conv4_1-4\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 512\n      }\n    },\n    {\n      \"name\": \"conv4_2-4\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 2304\n      }\n    },\n    {\n      \"name\": \"conv4_3-4\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"j\": 196,\n        \"k\": 256\n      }\n    },\n    {\n      \"name\": \"conv4_1-5\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 512\n      }\n    },\n    {\n      \"name\": \"conv4_2-5\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 196,\n        \"k\": 2304\n      }\n    },\n    {\n      \"name\": \"conv4_3-5\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"j\": 196,\n        \"k\": 256\n      }\n    },\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 49,\n        \"k\": 1024\n      }\n    },\n    {\n      \"name\": \"conv5_2-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 49,\n        \"k\": 4608\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 
2048,\n        \"j\": 49,\n        \"k\": 512\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 49,\n        \"k\": 1024\n      }\n    },\n    {\n      \"name\": \"conv5_2-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 49,\n        \"k\": 4608\n      }\n    },\n    {\n      \"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 2048,\n        \"j\": 49,\n        \"k\": 512\n      }\n    },\n    {\n      \"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 49,\n        \"k\": 1024\n      }\n    },\n    {\n      \"name\": \"conv5_2-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 49,\n        \"k\": 4608\n      }\n    },\n    {\n      \"name\": \"conv5_3-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 2048,\n        \"j\": 49,\n        \"k\": 512\n      }\n    }\n  ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_last.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": 
\"conv5_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_last2.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      
\"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-5\",\n      \"tags\": [\n        
\"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-5\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": 
{\n        \"i\": 1024,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/resnet50_original.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_4\"\n      ],\n      \"params\": {\n        \"i\": 3,\n        \"o\": 64,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 7,\n        \"q\": 7\n      }\n    },\n    {\n      \"name\": \"conv2_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv2_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n 
     \"name\": \"conv2_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2_3-2\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-2\",\n    
  \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv3_2-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3_3-3\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-0\",\n      \"tags\": [\n        
\"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        
\"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-3\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-4\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_1-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv4_2-5\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4_3-5\",\n      \"tags\": [\n        \"conv\",\n        \"maxpool_2\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 1024,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 1024,\n        
\"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-0\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 2048,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-1\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_1-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 2048,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 1,\n        \"q\": 1\n      }\n    },\n    {\n      \"name\": \"conv5_2-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5_3-2\",\n      \"tags\": [\n        \"conv\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 2048,\n        \"r\": 7,\n        \"c\": 7,\n        \"p\": 
1,\n        \"q\": 1\n      }\n    }\n  ]\n}"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16-2-img2col.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1-1\",\n      \"tags\": [\"gemm\", \"img2col\"],\n      \"params\": {\n        \"p0\": 64,\n        \"p1\": 50176,\n        \"p2\": 27\n      }\n    },\n    {\n      \"name\": \"conv1-2\",\n      \"tags\": [\"gemm\", \"img2col\"],\n      \"params\": {\n        \"p0\": 64,\n        \"p1\": 50176,\n        \"p2\": 576\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16-3.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 3,\n        \"o\": 64,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv1-2\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 128,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16-4.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 3,\n        \"o\": 64,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv1-2\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 128,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2-2\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 3,\n        \"o\": 64,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv1-2\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 128,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv2-2\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3-2\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv3-3\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4-2\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n 
       \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv4-3\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5-2\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    },\n    {\n      \"name\": \"conv5-3\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_1.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 3,\n        \"o\": 64,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }  \n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_10.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv4-3\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_11.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv5-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_12.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv5-2\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_13.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv5-3\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 14,\n        \"c\": 14,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_2.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1-2\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 64,\n        \"r\": 224,\n        \"c\": 224,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_3.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv2-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 64,\n        \"o\": 128,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_4.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv2-2\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 128,\n        \"r\": 112,\n        \"c\": 112,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_5.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv3-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 128,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_6.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv3-2\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_7.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv3-3\",\n      \"tags\": [\"conv\", \"maxpool_2\"],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 256,\n        \"r\": 56,\n        \"c\": 56,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_8.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv4-1\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 256,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_9.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv4-2\",\n      \"tags\": [\"conv\"],\n      \"params\": {\n        \"i\": 512,\n        \"o\": 512,\n        \"r\": 28,\n        \"c\": 28,\n        \"p\": 3,\n        \"q\": 3\n      }\n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/odyssey/workload/vgg16_img2col.json",
    "content": "{\n  \"workloads\": [\n    {\n      \"name\": \"conv1-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 50176,\n        \"k\": 27\n      }\n    },\n    {\n      \"name\": \"conv1-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 64,\n        \"j\": 50176,\n        \"k\": 576\n      }\n    },\n    {\n      \"name\": \"conv2-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"j\": 12544,\n        \"k\": 576\n      }\n    },\n    {\n      \"name\": \"conv2-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 128,\n        \"j\": 12544,\n        \"k\": 1152\n      }\n    },\n    {\n      \"name\": \"conv3-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 3136,\n        \"k\": 1152\n      }\n    },\n    {\n      \"name\": \"conv3-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 3136,\n        \"k\": 2304\n      }\n    },\n    {\n      \"name\": \"conv3-3\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 256,\n        \"j\": 3136,\n        \"k\": 2304\n      }\n    },\n    {\n      \"name\": \"conv4-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 784,\n        \"k\": 2304\n      }\n    },\n    {\n      \"name\": \"conv4-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 784,\n        \"k\": 4608\n      }\n    },\n    {\n      \"name\": \"conv4-3\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 784,\n        \"k\": 4608\n      }\n    },\n    {\n      \"name\": \"conv5-1\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n 
       \"i\": 512,\n        \"j\": 196,\n        \"k\": 4608\n      }\n    },\n    {\n      \"name\": \"conv5-2\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 196,\n        \"k\": 4608\n      }\n    },\n    {\n      \"name\": \"conv5-3\",\n      \"tags\": [\n        \"gemm\"\n      ],\n      \"params\": {\n        \"i\": 512,\n        \"j\": 196,\n        \"k\": 4608\n      }\n    }\n  ]\n}"
  },
  {
    "path": "autosa_scripts/optimizer.py",
    "content": "#!/usr/bin/env python3\n\nimport sys\nimport argparse\nimport re\nimport os\nimport json\nimport subprocess\nimport itertools\nimport numpy as np\nimport pandas as pd\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn import metrics\nimport joblib\nimport xml.etree.ElementTree as ET\nimport time\nimport multiprocessing\nimport random\nfrom statistics import mean\nimport copy\nimport logging\nimport functools\nimport shutil\nimport datetime\nfrom pathlib import Path\n\nimport optimizer_prune as opt_prune\nimport resource_model as res_model\nimport latency_model as lat_model\n\ndef timer(func):\n    \"\"\" Print the runtime of the decorated function.\n\n    \"\"\"\n    @functools.wraps(func)\n    def wrapper_timer(*args, **kwargs):\n        start_time = time.perf_counter()\n        value = func(*args, **kwargs)\n        end_time = time.perf_counter()\n        run_time = end_time - start_time\n        print(\n            f'[AutoSA-Optimizer {datetime.datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\")}] INFO: Finished function: {func.__name__} in {run_time:.4f} secs')\n        return value\n    return wrapper_timer\n\ndef generate_loop_candidates(loops, config, stage):\n    \"\"\" Generate candidate loops\n\n    This function samples each loop dimension given the sample numbers set in\n    the config, then builds a Cartesian product of all sampled loops to generate\n    all possible loop combinations to search.\n\n    Due to current implementation limitations, the loop candidates are subject\n    to the following constraints:\n    - Array partitioning: the loop candidates should be left-exclusive and right-inclusive.\n      This prevents generating a single PE along a dimension, which breaks\n      code generation.\n    - Latency hiding: the loop candidates should be left-inclusive and right-exclusive.\n      Similarly, it is made right-exclusive to avoid a possible single-PE case.\n    
- SIMD, L2 array partitioning: both left- and right-inclusive\n    Note: for both latency hiding and SIMD, if the tiling factor is 1, the\n    corresponding stage is skipped in AutoSA.\n\n    If the sample mode is set to 'exhaustive', we search all divisors of\n    the loop bound.\n    If the sample mode is set to 'log', we generate samples that are powers of 2\n    dividing the loop bound.\n    If the sample mode is set to 'linear', we uniformly sample about 'n' of the divisors.\n    If the sample mode is set to 'random', we randomly sample 'n' of the divisors.\n\n    Parameters\n    ----------\n    loops: list\n        A list of loop upper bounds\n    config: dict\n        Global configuration\n    stage: str\n        Optimization stage name\n    \"\"\"\n    if stage not in [\n        'space_time',\n        'array_part',\n        'array_part_L2',\n        'latency_hiding',\n        'SIMD_vectorization']:\n        raise NameError(f'Stage {stage} is not defined.')\n\n    sample_mode = config['setting'][config['mode']]['sample'][stage]['mode']\n    sample_n = config['setting'][config['mode']]['sample'][stage]['n']\n    sample_loop_limit = config['setting'][config['mode']]['sample'][stage]['loop_limit']\n\n    l_inclusive = 1\n    r_inclusive = 1\n    if stage == 'array_part':\n        l_inclusive = 0\n    elif stage == 'latency_hiding':\n        r_inclusive = 0\n\n    # Sample each loop dim\n    sample_list = []\n    for loop in loops:\n        if sample_mode == 'log':\n            ub = int(\n                np.floor(\n                    np.log2(\n                        loop if sample_loop_limit == -1 else min(loop, sample_loop_limit))))\n            lb = 0\n        else:\n            ub = loop if sample_loop_limit == -1 else min(loop, sample_loop_limit)\n            lb = 1\n        if not r_inclusive:\n            ub = ub - 1\n        if not l_inclusive:\n            lb = lb + 1\n        if sample_mode == 'exhaustive':\n            samples = [s for s in range(lb, ub + 1) if loop % s 
== 0]\n        elif sample_mode == 'log':\n            samples = [\n                np.power(\n                    2,\n                    int(s)) for s in range(\n                    lb,\n                    ub +\n                    1) if loop %\n                np.power(\n                    2,\n                    int(s)) == 0]\n        elif sample_mode == 'linear':\n            samples = [s for s in range(lb, ub + 1) if loop % s == 0]\n            # Uniformly sample 'n' factors\n            stride = 1 if len(samples) <= sample_n else int(\n                len(samples) / sample_n)\n            samples = [samples[i] for i in range(0, len(samples), stride)]\n        elif sample_mode == 'random':\n            samples = [s for s in range(lb, ub + 1) if loop % s == 0]\n            # Randomly sample 'n' factors\n            if sample_n < len(samples):\n                samples = random.sample(samples, sample_n)\n        else:\n            raise NameError(f'Sample mode {sample_mode} is not defined.')\n        sample_list.append(samples)\n\n    # Generate Cartesian product\n    sample_loops = list(itertools.product(*sample_list))\n    sample_loops = [list(tup) for tup in sample_loops]\n\n    return sample_loops\n\ndef multi_process(loops, func, config):\n    \"\"\" Perform multi-processing for function \"func\".\n\n    Parameters\n    ----------\n    loops:\n        A list of loop candidates.\n    func:\n        The function to be executed by each process.\n    config: dict\n        Global configuration.\n    \"\"\"\n    num_proc = min(multiprocessing.cpu_count(),\n                   config['setting'][config['mode']]['multiprocess']['n_job'])\n    # Split the loops into chunks\n    chunk_size = int(np.ceil(float(len(loops)) / num_proc))\n    loop_chunks = [loops[i: i + min(chunk_size, len(loops) - i)]\n                   for i in range(0, len(loops), chunk_size)]\n    pool = multiprocessing.Pool(processes=num_proc)\n    # Allocate new work spaces for each forked 
process\n    for i in range(num_proc):\n        if i == 0:\n            continue\n        prj_dir = config['work_dir'][:-1] + str(i)\n        if os.path.exists(prj_dir):\n            continue\n        os.mkdir(f'{prj_dir}')\n        os.mkdir(f'{prj_dir}/output')\n        os.mkdir(f'{prj_dir}/output/latency_est')\n        os.mkdir(f'{prj_dir}/output/resource_est')\n        os.mkdir(f'{prj_dir}/output/src')\n        ret = execute_sys_cmd(\n            f'cp {config[\"work_dir\"]}/autosa_config.json {prj_dir}/', config)\n\n    config['logger'].info(f'Forking {num_proc} processes...')\n    verbose = config['verbose']\n    stdout = config['stdout']\n    logger = config['logger']\n    config['verbose'] = 0\n    config['stdout'] = subprocess.DEVNULL\n    config['logger'] = None\n    n_designs = config['monitor']['n_designs']\n    config['monitor']['n_designs'] = 0\n\n    # Execute the function\n    results = pool.starmap(func, [(loop_chunks[i], copy.deepcopy(config),\n        config['work_dir'][:-1] + str(i), 1) for i in range(len(loop_chunks))])\n    # Aggregate the monitor information\n    for result in results:\n        n_designs += result['monitor']['n_designs']\n    config['monitor']['n_designs'] = n_designs\n\n    if config['mode'] == 'search':\n        # Aggregate the results\n        config['search_results'] = merge_search_results(\n            [result['search_results'] for result in results],\n            config['setting']['search']['metric'],\n            config['setting']['search']['log']['n_record'],\n            config['hw_info'])\n\n    config['verbose'] = verbose\n    config['stdout'] = stdout\n    config['logger'] = logger\n\n    return\n\ndef cmp_designs(design1, design2, metric):\n    \"\"\" Compare two designs.\n\n    Parameters\n    ----------\n    design1: dict\n        Design 1.\n    design2: dict\n        Design 2.\n    metric: str\n        Metric to evaluate the design.\n    \"\"\"\n    if design1['found'] == False:\n        return design2\n    if 
design2['found'] == False:\n        return design1\n\n    if metric == 'latency':\n        if design1['latency'] < design2['latency']:\n            return design1\n        else:\n            return design2\n        # TODO: if the latencies are equal, we could pick the design with lower resource usage.\n    elif metric == 'power':\n        if design1['power'] < design2['power']:\n            return design1\n        else:\n            return design2\n\ndef generate_sa_sizes_cmd(sa_sizes):\n    \"\"\" Generate the command line argument to specify the sa_sizes.\n\n    Concatenate each size in the sa_sizes to generate the final argument.\n\n    Parameters\n    ----------\n    sa_sizes: list\n        A list containing the sizes for each optimization stage.\n    \"\"\"\n    first = 1\n    cmd = '--sa-sizes=\"{'\n    for size in sa_sizes:\n        if not first:\n            cmd += ';'\n        cmd += size\n        first = 0\n\n    cmd += '}\"'\n    return cmd\n\n\n@timer\ndef train_resource_models_xilinx(config):\n    \"\"\" Train the resource model for Xilinx program.\n\n    This function first collects all HLS synthesized designs from the previous stage.\n    These designs are grouped by kernels.\n    Then, it trains a resource model for each kernel using linear regression.\n    The trained models are placed in {tmp_dir}/optimizer/training/resource_models/\n\n    \"\"\"\n    tmp_dir = config['tmp_dir']\n    config['work_dir'] = f'{tmp_dir}/optimizer/synth'\n    jobs = os.listdir(config['work_dir'])\n    training_samples = {}\n    for job in jobs:\n        job_dir = f'{config[\"work_dir\"]}/{job}'\n        kernels = os.listdir(job_dir)\n        for kernel in kernels:\n            kernel_dir = f'{job_dir}/{kernel}'\n            designs = os.listdir(kernel_dir)\n            if kernel not in training_samples:\n                training_samples[kernel] = []\n            for design in designs:\n                design_dir = f'{kernel_dir}/{design}/output'\n          
      training_samples[kernel].append(design_dir)\n    # Train the resource model for each kernel\n    work_dir = f'{tmp_dir}/optimizer/training/resource_models'\n    if os.path.exists(work_dir):\n        shutil.rmtree(work_dir)\n    os.mkdir(work_dir)\n    for kernel in training_samples:\n        # Create the directory\n        cur_work_dir = f'{work_dir}/{kernel}'\n        os.mkdir(cur_work_dir)\n        # Collect the design infos\n        designs = training_samples[kernel]\n        design_infos = []\n        for design_dir in designs:\n            design_info = res_model.extract_design_info(design_dir, 1)\n            design_infos.append(design_info)\n            config['logger'].info(design_dir)\n        # Convert the design infos to a dataframe\n        modules, fifos, df = res_model.convert_design_infos_to_df(design_infos)\n        # Train the models\n        config['logger'].info(f'Train the resource models for {kernel}...')\n        res_model.train(df, modules, fifos, design_infos, cur_work_dir, config['logger'])\n\n@timer\ndef train_latency_models_xilinx(config):\n    \"\"\" Train the latency model\n\n    Note: We will assume all loops with II = 1 and depth = 1.\n    \"\"\"\n    return\n\ndef execute_autosa_cmd(config):\n    \"\"\" Compose the AutoSA command and run.\n\n    Parameters\n    ----------\n    config: dict\n        Global configuration.\n\n    Returns\n    -------\n    ret: int\n        The command return code.\n    \"\"\"\n    # Check if time out\n    if config['monitor']['time_out_start'] != -1:\n        elapsed_time = time.time() - config['monitor']['time_out_start']\n        if float(elapsed_time) / 60 > config['setting']['search']['time_out']:\n            return -1\n\n    cmd = ' '.join(config['cmds'])\n    #config['logger'].info(f'Execute CMD: {cmd}')\n    config['logger'].debug(f'Execute CMD: {cmd}')\n    p = subprocess.Popen(cmd, shell=True, stdout=config['stdout'])\n    ret = p.wait()\n    return ret\n\ndef execute_sys_cmd(cmd, 
config):\n    \"\"\" Execute the system command.\n\n    Parameters\n    ----------\n    cmd: str\n        Command to execute.\n    config: dict\n        Global configuration\n    \"\"\"\n    config['logger'].debug(f'Execute CMD: {cmd}')\n    p = subprocess.Popen(cmd, shell=True, stdout=config['stdout'])\n    ret = p.wait()\n    return ret\n\ndef generate_autosa_cmd_str(cmds):\n    \"\"\" Generate the cmd to print.    \n    \"\"\"\n    cmd_str = ''\n    is_first = True\n    for cmd in cmds:\n        #if cmd.find(' --tuning') != -1:\n        #    cmd = cmd.replace(' --tuning', '')\n        if not is_first:\n            cmd_str += ' '\n        cmd_str += cmd\n        is_first = False\n\n    return cmd_str\n\ndef save_design_files(config):\n    \"\"\" Save the current design.\n\n    \"\"\"\n    # Load the kernel id\n    design_dir = f'{config[\"work_dir\"]}/output'\n    with open(f'{design_dir}/resource_est/design_info.json', 'r') as f:\n        design_info = json.load(f)\n    kernel_id = design_info['kernel_id']\n    if not os.path.exists(f'{config[\"work_dir\"]}/kernel{kernel_id}'):\n        os.mkdir(f'{config[\"work_dir\"]}/kernel{kernel_id}')\n    prj_path = f'{config[\"work_dir\"]}/kernel{kernel_id}'\n    designs = os.listdir(prj_path)\n    design_id = len(designs)\n    design_path = f'{config[\"work_dir\"]}/kernel{kernel_id}/design{design_id}'\n    os.mkdir(design_path)\n\n    # Save the cmd\n    with open(design_path + '/design.info', 'w') as f:\n        f.write(generate_autosa_cmd_str(config['cmds']))\n\n    # if config['mode'] == 'search':\n        # Store the estimated latency and resource info\n        # TODO\n\n    # Copy the files\n    ret = execute_sys_cmd(\n        f'cp -r {config[\"work_dir\"]}/output {design_path}/',\n        config)\n\ndef clear_design_files(config):\n    \"\"\" Clean up the design folder files\n\n    \"\"\"\n    execute_sys_cmd(f'rm {config[\"work_dir\"]}/output/latency_est/*', config)\n    execute_sys_cmd(f'rm 
{config[\"work_dir\"]}/output/resource_est/*', config)\n    execute_sys_cmd(f'rm {config[\"work_dir\"]}/output/src/*', config)\n\ndef explore_design(config):\n    \"\"\" Explore the final design.\n\n    In the training mode, we will save the current design.\n    Later, we will sample some designs to be synthesized for\n    training the resource/latency models.\n    In the search mode, we will evaluate the resource and latency of the current\n    design and update the config accordingly.\n\n    \"\"\"\n    tmp_dir = config['tmp_dir']\n    # Update the monitor\n    config['monitor']['n_designs'] += 1\n\n    if config['mode'] == 'training':\n        save_design_files(config)\n        clear_design_files(config)\n        return\n    elif config['mode'] == 'search':\n        cur_design = {\n            'latency': -1,\n            'resource': {},\n            'power': -1,\n            'cmd': generate_autosa_cmd_str(config['cmds'])\n        }\n        config['monitor']['last_design'] = cur_design\n        design_dir = f'{config[\"work_dir\"]}/output'\n        if config['setting']['search']['metric'] == 'latency':\n            #start_time = time.perf_counter()\n            # Predict the latency\n            latency_info = lat_model.extract_latency_info(design_dir)\n            latency = lat_model.predict_design_latency(\n                latency_info, config['setting']['search']['cycle_period'],\n                config['search_results']['opt']['latency'])\n            #runtime = time.perf_counter() - start_time\n            #print(f'resource runtime: {runtime}')\n            if config['search_results']['opt']['found']:\n                if latency > config['search_results']['opt']['latency']:\n                    clear_design_files(config)\n                    return\n            cur_design['latency'] = int(latency)\n        elif config['setting']['search']['metric'] == 'power':\n            # Predict the power\n            clear_design_files(config)\n            raise 
NotImplementedError(f'DSE for power is not supported.')\n\n        # Predict the resource usage\n        #start_time = time.perf_counter()\n        design_info = res_model.extract_design_info(design_dir, 0)\n        modules, fifos, df = res_model.convert_design_infos_to_df([design_info])\n        kernel_id = design_info['kernel_id']\n        # Resource model path\n        res_model_path = f'{tmp_dir}/optimizer/training/resource_models/kernel{kernel_id}'\n        res = res_model.predict_design_resource_usage(\n            df, modules, fifos, design_info,\n            res_model_path,\n            config['setting']['search']['resource_target'])\n        cur_design['resource'] = res\n\n        if not res_model.resource_valid(res, config['hw_info'], \\\n            config['setting']['search']['pruning']['resource']['range'],\n            config['setting']['search']['resource_target']):\n            clear_design_files(config)\n            return\n        #runtime = time.perf_counter() - start_time\n        #print(f'resource runtime: {runtime}')\n\n        # Compare and update the search results\n        config['search_results'] = update_search_results(\n            config['search_results'], cur_design,\n            config['setting']['search']['log']['n_record'],\n            'latency', config['hw_info'])\n\n        # For certain time interval, print out the best design found so far\n        if config['setting']['search']['update_time_interval'] != -1:\n            if 'update_last_time' not in config['monitor']:\n                config['monitor']['update_last_time'] = time.time()\n            else:\n                elapsed_time = time.time() - config['monitor']['update_last_time']\n                if float(elapsed_time) / 60 > config['setting']['search']['update_time_interval']:\n                    # print the best results so far\n                    config['logger'].info(print_best_design(config['search_results']['opt'], config['hw_info']))\n                    
config['monitor']['update_last_time'] = time.time()\n\n    clear_design_files(config)\n    return\n\ndef simd_loop_filter(loops, tuning):\n    \"\"\" Filter out the SIMD candidate loops based on the tuning information.\n\n    We select the legal simd loop with the highest score.\n    If there is no such loop, we will set \"loops\" to all \"1\"s.\n    AutoSA will not tile loops with the tiling factor as one for latency hiding or\n    SIMD vectorization.\n    If one such loop is found, we will set all loop bounds to 1 except the target loop.\n\n    Parameters\n    ----------\n    loops: list\n        upper bounds of all candidate SIMD loops\n    tuning: dict\n        tuning information for the SIMD stage\n    \"\"\"\n    scores = tuning['simd']['scores']\n    legal = tuning['simd']['legal']\n    # Find the candidate loop with the highest score\n    simd_loop_idx = -1\n    max_score = -1\n    for i in range(len(legal)):\n        if legal[i] == 0:\n            continue\n        if scores[i] > max_score:\n            max_score = scores[i]\n            simd_loop_idx = i\n\n    if simd_loop_idx < 0:\n        filter_loops = [1 for i in range(len(loops))]\n    else:\n        filter_loops = [1 for i in range(len(loops))]\n        filter_loops[simd_loop_idx] = loops[simd_loop_idx]\n\n    return filter_loops\n\n\ndef explore_simd_vectorization(config):\n    \"\"\" Explore the stage of SIMD vectorization.\n\n    When AutoSA reaches this stage, we will have the systolic array dimension\n    in the tuning information. If the pruning is enabled at this stage,\n    we will first filter out the designs not satisfying the pruning requirements\n    for the PE structures. (SIMD_vectorization_PE_pruning)\n    Next, we will limit the candidate loop upperbounds by examining the scores and\n    legality information in the tuning info. Only the upperbound for the legal loop\n    with the maximal score is kept, and all the rest is set to 1. 
(simd_loop_filter)\n    After the above steps, we follow the standard procedure to generate\n    the candidate loops, compile the program, and move on to the next stage.\n\n    \"\"\"\n    pruning_en = config['setting'][config['mode']]['pruning']['SIMD_vectorization']['enable']\n    if config['autosa_config']['simd']['mode'] == 'manual':\n        with open(f'{config[\"work_dir\"]}/output/tuning.json') as f:\n            tuning = json.load(f)\n        if 'simd' not in tuning:\n            # No SIMD opportunities found, we will skip this stage\n            explore_design(config)\n        else:\n            PE_pruning_postpone = 0\n            if pruning_en:\n                # Perform early pruning based on the PE numbers\n                config['tuning'] = tuning\n                if 'sa_dims' in config['tuning']['simd']:\n                    if opt_prune.SIMD_vectorization_PE_pruning(config):\n                        return\n                else:\n                    # The array dimensions are not known yet; defer the PE pruning\n                    # until the design has been regenerated below\n                    PE_pruning_postpone = 1\n            loops = tuning['simd']['tilable_loops']\n            # Filter the SIMD loops\n            loops = simd_loop_filter(loops, tuning)\n            loops_pool = generate_loop_candidates(\n                loops, config, \"SIMD_vectorization\")\n\n            if len(loops_pool) == 0:\n                # No SIMD candidates left: disable SIMD vectorization and regenerate the design\n                simd_en = config['autosa_config']['simd']['enable']\n                sa_sizes = config['sa_sizes'].copy()\n                config['autosa_config']['simd']['enable'] = 0\n                with open(f'{config[\"work_dir\"]}/autosa_config.json', 'w') as f:\n                    json.dump(config['autosa_config'], f, indent=4)\n\n                ret = execute_autosa_cmd(config)\n                if ret != 0:\n                    config['logger'].error(f'CMD failed with error code {ret}')\n                    
config['autosa_config']['simd']['enable'] = simd_en\n                    config['sa_sizes'] = sa_sizes\n                    return\n                if PE_pruning_postpone:\n                    with open(f'{config[\"work_dir\"]}/output/tuning.json') as f:\n                        tuning = json.load(f)              \n                    config['tuning'] = tuning  \n                    if opt_prune.SIMD_vectorization_PE_pruning(config, 1):\n                        config['autosa_config']['simd']['enable'] = simd_en\n                        config['sa_sizes'] = sa_sizes\n                        return\n                explore_design(config)\n                config['autosa_config']['simd']['enable'] = simd_en\n                config['sa_sizes'] = sa_sizes\n                with open(f'{config[\"work_dir\"]}/autosa_config.json', 'w') as f:\n                    json.dump(config['autosa_config'], f, indent=4)\n            else:\n                if config['mode'] == 'search' and config['setting']['search']['metric'] == 'latency' \\\n                    and pruning_en:\n                    loops_pool = opt_prune.reorder_simd_loops(loops_pool)\n                for loop in loops_pool:\n                    sa_sizes = config['sa_sizes'].copy()\n                    config['sa_sizes'].append(\n                        f'kernel[]->simd{str(loop).replace(\" \", \"\")}')\n                    config['cmds'][3] = generate_sa_sizes_cmd(config['sa_sizes'])\n\n                    #start_time = time.perf_counter()\n                    ret = execute_autosa_cmd(config)\n                    #run_time = time.perf_counter() - start_time\n                    #print(f'runtime: {run_time}')\n\n                    if ret != 0:\n                        config['logger'].error(f'CMD failed with error code {ret}')\n                        config['sa_sizes'] = sa_sizes\n                        continue\n                    if PE_pruning_postpone:\n                        with 
open(f'{config[\"work_dir\"]}/output/tuning.json') as f:\n                            tuning = json.load(f)\n                        config['tuning'] = tuning\n                        if opt_prune.SIMD_vectorization_PE_pruning(config, 1):\n                            config['sa_sizes'] = sa_sizes\n                            continue\n\n                    explore_design(config)\n                    config['sa_sizes'] = sa_sizes\n\n                    if config['mode'] == 'search' and config['setting']['search']['metric'] == 'latency' \\\n                        and pruning_en:\n                        if opt_prune.SIMD_vectorization_latency_pruning(config):\n                            return\n    else:\n        explore_design(config)\n\n    return\n\n\ndef explore_latency_hiding(config):\n    \"\"\" Explore the stage of latency hiding.\n\n    If this stage is set in Manual mode, we will load the tuning info, generate\n    the candidate latency-hiding tiling factors (pruned if pruning is enabled),\n    and compile each candidate before moving on to SIMD vectorization.\n    If no candidate remains, exploration stops, since latency hiding is required.\n    If AutoSA skips this stage, we disable it and go directly to SIMD vectorization.\n    Otherwise, we jump directly to SIMD vectorization.\n    \"\"\"\n    if config['autosa_config']['latency']['mode'] == 'manual':\n        # Fetch the tuning info\n        with open(f'{config[\"work_dir\"]}/output/tuning.json') as f:\n            tuning = json.load(f)\n        if 'latency' not in tuning:\n            # This stage is skipped by AutoSA, we will also skip it\n            latency_hiding_en = config['autosa_config']['latency']['enable']\n            sa_sizes = config['sa_sizes'].copy()\n            config['autosa_config']['latency']['enable'] = 0\n            with open(f'{config[\"work_dir\"]}/autosa_config.json', 'w') as f:\n                json.dump(config['autosa_config'], f, indent=4)\n            ret = 
execute_autosa_cmd(config)\n            if ret != 0:\n                config['logger'].error(f'CMD failed with error code {ret}')\n                config['autosa_config']['latency']['enable'] = latency_hiding_en\n                config['sa_sizes'] = sa_sizes\n                return\n            explore_simd_vectorization(config)\n\n            config['autosa_config']['latency']['enable'] = latency_hiding_en\n            
config['sa_sizes'] = sa_sizes\n            with open(f'{config[\"work_dir\"]}/autosa_config.json', 'w') as f:\n                json.dump(config['autosa_config'], f, indent=4)\n            return\n\n        loops = tuning['latency']['tilable_loops']        \n        loops_pool = generate_loop_candidates(loops, config, \"latency_hiding\")\n        if config['setting'][config['mode']\n                             ]['pruning']['latency_hiding']['enable']:\n            config['tuning'] = tuning\n            loops_pool = opt_prune.latency_hiding_loops_pruning(\n                loops_pool, config)\n\n        if len(loops_pool) == 0:\n            # Latency hiding is a must. In this case, we will stop exploration and return.\n            return\n        else:\n            for loop in loops_pool:\n                # Hack: For GEMM4\n                #loop[-1] = 1\n\n                sa_sizes = config['sa_sizes'].copy()\n                config['sa_sizes'].append(\n                    f'kernel[]->latency{str(loop).replace(\" \", \"\")}')\n                config['cmds'][3] = generate_sa_sizes_cmd(config['sa_sizes'])\n                ret = execute_autosa_cmd(config)\n                if ret != 0:\n                    config['logger'].error(f'CMD failed with error code {ret}')\n                    config['sa_sizes'] = sa_sizes\n                    continue\n                explore_simd_vectorization(config)\n                config['sa_sizes'] = sa_sizes\n    else:\n        explore_simd_vectorization(config)\n\n    return\n\n\ndef explore_array_part_L2(config):\n    \"\"\" Explore the stage of second-level array partitioning.\n\n    \"\"\"\n    if config['autosa_config']['array_part_L2']['mode'] == 'manual':\n        # Fetch the tuning info\n        with open(f'{config[\"work_dir\"]}/output/tuning.json') as f:\n            tuning = json.load(f)\n        loops = tuning['array_part_L2']['tilable_loops']\n        coincident = tuning['array_part_L2']['coincident']\n        # Generate the 
tiling factors to proceed\n        loops_pool = generate_loop_candidates(loops, config, 'array_part_L2')\n        if config['setting'][config['mode']\n                             ]['pruning']['array_part_L2']['enable']:\n            config['tuning'] = tuning\n            loops_pool = opt_prune.array_part_L2_loops_pruning(\n                loops_pool, config)\n\n        if len(loops_pool) == 0:\n            # No available tiling options, we will disable this step and skip\n            # it.\n            array_part_L2_en = config['autosa_config']['array_part_L2']['enable']\n            sa_sizes = config['sa_sizes'].copy()\n            config['autosa_config']['array_part_L2']['enable'] = 0\n            with open(f'{config[\"work_dir\"]}/autosa_config.json', 'w') as f:\n                json.dump(config['autosa_config'], f, indent=4)\n\n            ret = execute_autosa_cmd(config)\n            if ret != 0:\n                config['logger'].error(f'CMD failed with error code {ret}')\n                config['autosa_config']['array_part_L2']['enable'] = array_part_L2_en\n                config['sa_sizes'] = sa_sizes\n                return\n            explore_latency_hiding(config)\n            # Revert the changes\n            config['autosa_config']['array_part_L2']['enable'] = array_part_L2_en\n            config['sa_sizes'] = sa_sizes\n            with open(f'{config[\"work_dir\"]}/autosa_config.json', 'w') as f:\n                json.dump(config['autosa_config'], f, indent=4)\n        else:\n            for loop in loops_pool:\n                sa_sizes = config['sa_sizes'].copy()\n                config['sa_sizes'].append(\n                    f'kernel[]->array_part_L2{str(loop).replace(\" \", \"\")}')\n                config['cmds'][3] = generate_sa_sizes_cmd(config['sa_sizes'])\n                ret = execute_autosa_cmd(config)\n                if ret != 0:\n                    config['logger'].error(f'CMD failed with error code {ret}')\n                    
config['sa_sizes'] = sa_sizes\n                    continue\n                explore_latency_hiding(config)\n                config['sa_sizes'] = sa_sizes\n    else:\n        explore_latency_hiding(config)\n\n\ndef explore_array_part_single_job(loops, config, work_dir, is_multi_process=0):\n    \"\"\" Explore the stage of array partitioning with single process.\n\n    Parameters\n    ----------\n    loops:\n        Candidate loops.\n    config:\n        Global configuration.\n    work_dir: str\n        The current work directory.\n    is_multi_process: int\n        Is multi process launched.\n    \"\"\"\n    # Modify the commands\n    config['cmds'][1] = f'--config={work_dir}/autosa_config.json'\n    config['cmds'][2] = f'--output-dir={work_dir}/output'\n    config['work_dir'] = work_dir\n    config['logger'] = logging.getLogger('AutoSA-Optimizer')\n\n    # Progress meter\n    total_tasks = len(loops)\n    finished_tasks = 0\n    for loop in loops:\n        sa_sizes = config['sa_sizes'].copy()\n        config['sa_sizes'].append(\n            f'kernel[]->array_part{str(loop).replace(\" \", \"\")}')\n        config['cmds'][3] = generate_sa_sizes_cmd(config['sa_sizes'])\n        ret = execute_autosa_cmd(config)\n        if ret != 0:\n            config['logger'].error(f'CMD failed with error code {ret}')\n            config['sa_sizes'] = sa_sizes\n            continue\n        if config['two_level_buffer']:\n            explore_array_part_L2(config)\n        else:\n            explore_latency_hiding(config)\n        config['sa_sizes'] = sa_sizes\n        finished_tasks += 1\n        config['logger'].info(f'Progress(PID: {os.getpid()}): [{finished_tasks}/{total_tasks}]')\n\n    if is_multi_process:\n        config['logger'] = None\n    return config\n\n\ndef explore_array_part(config):\n    \"\"\" Explore the stage of array partitioning.\n\n    If this stage is set in Manual mode, this function will load the tuning\n    info which contains all the tilable loops.\n    
This function will then generate all possible loop tiling combinations.\n    If stage pruning is enabled, these loop candidates will be pruned\n    based on certain heuristics.\n    Next, this function will iterate through these combinations and proceed to\n    the next stage.\n    If multi-processing is enabled, the optimizer folder directory will\n    be updated to allocate a workspace for each forked process, and the\n    candidate loops are distributed evenly across these processes.\n\n    If this stage is not set in Manual mode, we skip it and jump directly to the next stage.\n    In both cases, the next stage is:\n    - array_part_L2 if config['two_level_buffer'] is enabled\n    - latency_hiding if config['two_level_buffer'] is disabled\n\n    We apply the following heuristic to prune the candidate loops:\n    - The product of tiling factors should be no less than the #PE lower bound.\n\n    Parameters\n    ----------\n    config: dict\n        Global configuration.\n    \"\"\"\n    if config['autosa_config']['array_part']['mode'] == 'manual':\n        # The program will terminate after array partitioning\n        # Fetch the tuning info\n        with open(f'{config[\"work_dir\"]}/output/tuning.json') as f:\n            tuning = json.load(f)\n        loops = tuning['array_part']['tilable_loops']\n        # Generate the tiling factors to proceed\n        loops_pool = generate_loop_candidates(loops, config, 'array_part')\n        if config['setting'][config['mode']\n                             ]['pruning']['array_part']['enable']:\n            # Apply pruning on the candidate loops\n            loops_pool = opt_prune.array_part_loops_pruning(loops_pool, config)\n\n        if len(loops_pool) == 0:\n            # No available tiling options, we will disable this step and skip it.\n            # At the same time, two-level-buffer is also disabled\n            array_part_en = config['autosa_config']['array_part']['enable']\n            array_part_L2_en = 
config['autosa_config']['array_part_L2']['enable']\n            sa_sizes = config['sa_sizes'].copy()\n            config['autosa_config']['array_part']['enable'] = 0\n            config['autosa_config']['array_part_L2']['enable'] = 0\n            with open(f'{config[\"work_dir\"]}/autosa_config.json', 'w') as f:\n                json.dump(config['autosa_config'], f, indent=4)\n\n            ret = execute_autosa_cmd(config)\n            if ret != 0:\n                config['logger'].error(f'CMD failed with error code {ret}')\n                config['autosa_config']['array_part']['enable'] = array_part_en\n                config['autosa_config']['array_part_L2']['enable'] = array_part_L2_en\n                config['sa_sizes'] = sa_sizes\n                return\n            explore_latency_hiding(config)\n            # Revert the changes\n            config['autosa_config']['array_part']['enable'] = array_part_en\n            config['autosa_config']['array_part_L2']['enable'] = array_part_L2_en\n            config['sa_sizes'] = sa_sizes\n            with open(f'{config[\"work_dir\"]}/autosa_config.json', 'w') as f:\n                json.dump(config['autosa_config'], f, indent=4)\n        else:\n            if config['setting'][config['mode']]['multiprocess']['n_job'] > 1 and len(loops_pool) > 1:\n                multi_process(\n                    loops_pool,\n                    explore_array_part_single_job,\n                    config)\n            else:\n                explore_array_part_single_job(\n                    loops_pool, config, config['work_dir'])\n    else:\n        if config['autosa_config']['array_part_L2']['enable']:\n            explore_array_part_L2(config)\n        else:\n            explore_latency_hiding(config)\n\n\ndef explore_space_time(config):\n    \"\"\" Explore the stage of space-time transformation.\n\n    If this stage is set in Manual mode, we will load the tuning info\n    and iterate through all possible kernels to 
proceed.\n    
Otherwise, AutoSA automatically selects one kernel to proceed.\n    We will directly jump to the next stage: array partitioning.\n\n    Parameters\n    ----------\n    config: dict\n        Global configuration.\n    \"\"\"\n    if config['autosa_config']['space_time']['mode'] == 'manual':\n        # The program will terminate after the space-time transformation\n        # Fetch the tuning info\n        with open(f'{config[\"work_dir\"]}/output/tuning.json') as f:\n            tuning = json.load(f)\n        if 'space_time' not in tuning:\n            # Users have assigned the space-time options, we will skip this stage\n            explore_array_part(config)\n        else:\n            n_kernel = tuning['space_time']['n_kernel']\n\n            # Iterate through different kernels\n            #for kernel_id in [0]:\n            for kernel_id in range(n_kernel):\n                config['logger'].info(f'Search kernel {kernel_id}...')\n                sa_sizes = config['sa_sizes'].copy()\n                config['sa_sizes'].append(f'kernel[]->space_time[{kernel_id}]')\n                config['cmds'][3] = generate_sa_sizes_cmd(config['sa_sizes'])\n                ret = execute_autosa_cmd(config)\n                if ret != 0:\n                    config['logger'].error(f'CMD failed with error code {ret}')\n                    config['sa_sizes'] = sa_sizes\n                    continue\n                explore_array_part(config)\n                config['sa_sizes'] = sa_sizes\n    else:\n        explore_array_part(config)\n\n\n@timer\ndef explore_design_space(config):\n    \"\"\" Explore the design space through multiple stages\n\n    We will expand the design space through multiple stages:\n    space-time transformation ->\n    array partitioning ->\n    latency hiding ->\n    SIMD vectorization\n\n    At each stage, we will generate a new cmd and execute it to obtain the tuning\n    information for the next stage.\n    The cmd list:\n    - config['cmds'][0]: the original 
user command\n    - config['cmds'][1]: the AutoSA config file\n    - config['cmds'][2]: the AutoSA output directory\n    - config['cmds'][3]: the AutoSA sizes\n\n    Parameters\n    ----------\n    config: dict\n        Global configuration.\n    \"\"\"\n    # Execute the cmd\n    config['cmds'][3] = generate_sa_sizes_cmd(config['sa_sizes'])\n    ret = execute_autosa_cmd(config)\n    if ret != 0:\n        config['logger'].error(f'CMD failed with error code {ret}')\n        config['sa_sizes'] = []\n        return\n    # Enter the first stage: space-time transformation\n    explore_space_time(config)\n\ndef synth_train_samples_single_job(config, job_id):\n    \"\"\" Launch HLS synthesis for each single process\n\n    \"\"\"\n    config['logger'] = logging.getLogger('AutoSA-Optimizer')\n    autosa_prj_path = os.environ['AUTOSA_ROOT']\n    work_dir = f'{config[\"work_dir\"]}/job{job_id}'\n    kernels = os.listdir(work_dir)\n    for kernel in kernels:\n        path = f'{work_dir}/{kernel}'\n        designs = os.listdir(path)\n        for design in designs:\n            prj_path = f'{path}/{design}/output'\n            # Copy the HLS TCL script to the project\n            ret = execute_sys_cmd(\n                f'cp {autosa_prj_path}/autosa_scripts/hls_scripts/hls_script_synth.tcl {prj_path}/hls_script.tcl',\n                config)\n            # Execute the TCL\n            cwd = os.getcwd()\n            os.chdir(prj_path)\n            ret = execute_sys_cmd('vivado_hls -f hls_script.tcl', config)\n            os.chdir(cwd)\n\n@timer\ndef generate_train_samples(config):\n    \"\"\" Generate the training samples.\n\n    \"\"\"\n    # Prepare the directory and files\n    tmp_dir = config['tmp_dir']\n    if os.path.exists(f'{tmp_dir}/optimizer/training'):\n        shutil.rmtree(f'{tmp_dir}/optimizer/training')\n    os.mkdir(f'{tmp_dir}/optimizer/training')\n    os.mkdir(f'{tmp_dir}/optimizer/training/job0')\n    # Initialize file directory\n    
Path(f'{config[\"work_dir\"]}/output').mkdir(exist_ok=True)\n    Path(f'{config[\"work_dir\"]}/output/src').mkdir(exist_ok=True)\n    Path(f'{config[\"work_dir\"]}/output/latency_est').mkdir(exist_ok=True)\n    Path(f'{config[\"work_dir\"]}/output/resource_est').mkdir(exist_ok=True)\n    with open(f'{config[\"work_dir\"]}/autosa_config.json', 'w') as f:\n        json.dump(config['autosa_config'], f, indent=4)\n\n    while config['monitor']['n_designs'] < config['setting']['synth']['sample']['n']:\n        # Collect enough training samples\n        explore_design_space(config)\n    config['logger'].info(f'{config[\"monitor\"][\"n_designs\"]} designs are generated.')\n\n@timer\ndef synth_train_samples(config):\n    \"\"\" Synthesize the training samples.\n\n    We will sample a few designs generated from the previous training exploration.\n    Next, we call Vivado HLS to synthesize each design.\n\n    \"\"\"\n    tmp_dir = config['tmp_dir']\n    config['work_dir'] = f'{tmp_dir}/optimizer/training'\n    # Collect the paths of all designs, grouped by kernel\n    design_paths = {}\n    for n in range(config['setting']['training']['multiprocess']['n_job']):\n        f_path = f'{config[\"work_dir\"]}/job{n}'\n        f_list = os.listdir(f_path)\n        for f in f_list:\n            if 'kernel' in f:\n                if f not in design_paths:\n                    design_paths[f] = []\n                d_path = f'{f_path}/{f}'\n                d_list = os.listdir(d_path)\n                for d in d_list:\n                    prj_path = f'{d_path}/{d}'\n                    design_paths[f].append(prj_path)\n    # Randomly sample a few designs for each kernel and build the synthesis folder\n    config['work_dir'] = f'{tmp_dir}/optimizer/synth'\n    if os.path.exists(config['work_dir']):\n        shutil.rmtree(config['work_dir'])\n    os.mkdir(config['work_dir'])\n    num_proc = min(multiprocessing.cpu_count(),\n                   
config['setting']['synth']['multiprocess']['n_job'])\n    for i in 
range(num_proc):\n        prj_dir = config['work_dir'] + f'/job{i}'\n        os.mkdir(prj_dir)\n    tasks = []\n    for kernel in design_paths:\n        designs = design_paths[kernel]\n        n_sample = config['setting']['synth']['sample']['n']\n        if n_sample < len(designs):\n            designs = random.sample(designs, n_sample)\n        # Push to the list\n        for design in designs:\n            tasks.append((kernel, design))\n    # Uniformly distribute the tasks to each processor\n    chunk_size = int(np.ceil(float(len(tasks)) / num_proc))\n    task_chunks = [tasks[i: i + min(chunk_size, len(tasks) - i)]\n                   for i in range(0, len(tasks), chunk_size)]\n    for job_id in range(len(task_chunks)):\n        task_chunk = task_chunks[job_id]\n        for task in task_chunk:\n            kernel = task[0]\n            design_path = task[1]\n            design = design_path.rsplit('/', 1)[-1]\n            if not os.path.exists(\n                    f'{config[\"work_dir\"]}/job{job_id}/{kernel}'):\n                os.mkdir(f'{config[\"work_dir\"]}/job{job_id}/{kernel}')\n            new_design_path = f'{config[\"work_dir\"]}/job{job_id}/{kernel}/{design}'\n            # copy the design files\n            ret = execute_sys_cmd(\n                f'cp -r {design_path} {new_design_path}', config)\n\n    # Execute the HLS synthesis\n    pool = multiprocessing.Pool(processes=num_proc)\n    config['logger'].info(f'Launch HLS synthesis with {num_proc} processes...')\n    logger = config['logger']\n    config['logger'] = None\n    ret = pool.starmap(\n        synth_train_samples_single_job, [\n            (config, i) for i in range(num_proc)])\n    config['logger'] = logger\n\n\ndef train_xilinx(config):\n    \"\"\" Train the resource and latency models on Xilinx platforms.\n\n    This function first creates training samples by randomly sampling all\n    the design points.\n    Then it calls Vivado HLS to synthesize all designs.\n    Next it collects the 
results and trains the resource and latency models\n    using linear regression.\n\n    Parameters\n    ----------\n    config: dict\n        Global configuration.\n    \"\"\"\n    config['mode'] = 'training'\n\n    # Generate sample designs\n    config['logger'].info('Generate training samples...')\n    generate_train_samples(config)\n\n    # Synthesize designs\n    config['logger'].info('Synthesize training samples...')\n    synth_train_samples(config)\n\n    # Train the resource models\n    config['logger'].info('Train resource models...')\n    train_resource_models_xilinx(config)\n\n    ## Train the latency models\n    # config['logger'].info('Train latency models...')\n    # train_latency_models_xilinx(config) # TODO\n\ndef get_default_pruning_policy(mode):\n    \"\"\" Return the default search pruning policy.\n\n    \"\"\"\n    #TODO\n    return\n\ndef get_sample_policy(mode, n_random=2):\n    \"\"\" Return the search sampling policy.\n\n    Parameters\n    ----------\n    mode: str\n        Sampling mode.\n    n_random: int\n        The higher the random level, the more samples are generated.\n    \"\"\"\n    if mode == 'random':\n        ret = {\n            \"array_part\": {\n                \"mode\": \"random\",\n                \"n\": n_random,\n                \"loop_limit\": -1\n            },\n            \"array_part_L2\": {\n                \"mode\": \"random\",\n                \"n\": n_random,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"random\",\n                \"n\": n_random,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"random\",\n                \"n\": n_random,\n                \"loop_limit\": 8\n            }\n        }\n    elif mode == 'exhaustive':\n        ret = {\n            \"array_part\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                
\"loop_limit\": -1\n            },\n            \"array_part_L2\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": -1\n            },\n            \"latency_hiding\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 64\n            },\n            \"SIMD_vectorization\": {\n                \"mode\": \"exhaustive\",\n                \"n\": -1,\n                \"loop_limit\": 8\n            }\n        }\n    else:\n        raise RuntimeError(f'Unknown sampling mode: {mode}')\n\n    return ret\n\ndef print_best_design(opt_design, hw_info=None):\n    \"\"\" Pretty print the best design.\n\n    Parameters\n    ----------\n    opt_design: dict\n        Optimal design.\n\n    Returns\n    -------\n    ret: str\n        Printed design in a string.\n    \"\"\"\n    ret = (\n        f\"\\n======== Best design ========\\n\"\n        f\"Latency(Cycle): {int(opt_design['latency'])}\\n\"\n        f\"Power(W): {opt_design['power']}\\n\"\n        f\"Resource:\\n\"\n    )\n\n    if 'FF' in opt_design['resource']:\n        ret += f\"\\tFF: {int(opt_design['resource']['FF'])}\"\n        if hw_info:\n            ratio = float(opt_design['resource']['FF']) / hw_info['FF']\n            ret += f\" ({ratio:.2f})\"\n        ret += \"\\n\"\n    if 'LUT' in opt_design['resource']:\n        ret += f\"\\tLUT: {int(opt_design['resource']['LUT'])}\"\n        if hw_info:\n            ratio = float(opt_design['resource']['LUT']) / hw_info['LUT']\n            ret += f\" ({ratio:.2f})\"\n        ret += \"\\n\"\n    if 'BRAM18K' in opt_design['resource']:\n        ret += f\"\\tBRAM18K: {int(opt_design['resource']['BRAM18K'])}\"\n        if hw_info:\n            ratio = float(opt_design['resource']['BRAM18K']) / hw_info['BRAM18K']\n            ret += f\" ({ratio:.2f})\"\n        ret += \"\\n\"\n    if 'URAM' in opt_design['resource']:\n        ret += f\"\\tURAM: 
{int(opt_design['resource']['URAM'])}\"\n        if hw_info:\n            ratio = float(opt_design['resource']['URAM']) / hw_info['URAM']\n            ret += f\" ({ratio:.2f})\"\n        ret += \"\\n\"\n    if 'DSP' in opt_design['resource']:\n        ret += f\"\\tDSP: {int(opt_design['resource']['DSP'])}\"\n        if hw_info:\n            ratio = float(opt_design['resource']['DSP']) / hw_info['DSP']\n            ret += f\" ({ratio:.2f})\"\n        ret += \"\\n\"\n    ret += f\"=============================\"\n\n    return ret\n\ndef save_search_log(records, log):\n    \"\"\" Save the DSE design records to log file.\n\n    Parameters\n    ----------\n    records: list\n        A list of best designs found in the tuning process.\n    log: str\n        Path to the log file.\n    \"\"\"\n    with open(log, 'w') as f:\n        json.dump(records, f, indent=4)\n\ndef search_xilinx(config):\n    \"\"\" Perform search phase on Xilinx platform.\n\n    \"\"\"\n    # Prepare the directory and files\n    tmp_dir = config['tmp_dir']\n    if os.path.exists(f'{tmp_dir}/optimizer/search'):\n        shutil.rmtree(f'{tmp_dir}/optimizer/search')\n    os.mkdir(f'{tmp_dir}/optimizer/search')\n    os.mkdir(f'{tmp_dir}/optimizer/search/job0')\n    # Initialize file directory\n    Path(f'{config[\"work_dir\"]}/output').mkdir(exist_ok=True)\n    Path(f'{config[\"work_dir\"]}/output/src').mkdir(exist_ok=True)\n    Path(f'{config[\"work_dir\"]}/output/latency_est').mkdir(exist_ok=True)\n    Path(f'{config[\"work_dir\"]}/output/resource_est').mkdir(exist_ok=True)\n    with open(f'{config[\"work_dir\"]}/autosa_config.json', 'w') as f:\n        json.dump(config['autosa_config'], f, indent=4)\n\n    config['mode'] = 'search'\n    config['search_results'] = init_search_results()\n    # Modify the command\n    #config['cmds'][0] += ' --tuning'\n\n    if config['setting'][config['mode']]['pruning']['random_start']['enable']:\n        # Random search the design space\n        
config['search_results'] = init_search_results()\n        # Update the sampling strategy\n        user_policy = copy.deepcopy(config['setting'][config['mode']]['sample'])\n        config['setting'][config['mode']]['sample'] = get_sample_policy('random',\n            config['setting'][config['mode']]['pruning']['random_start']['n_random'])\n        n_trial = 0\n        while n_trial < config['setting'][config['mode']]['pruning']['random_start']['n_trial']:\n            config['logger'].info(f'Run random search to warm up... [{n_trial + 1}/{config[\"setting\"][config[\"mode\"]][\"pruning\"][\"random_start\"][\"n_trial\"]}]')\n            explore_design_space(config)\n            config['logger'].info(print_best_design(config['search_results']['opt'], config['hw_info']))\n            n_trial += 1\n        config['setting'][config['mode']]['sample'] = user_policy\n\n    config['logger'].info('Start searching...')\n    # Set up the time-out counter\n    if config['setting']['search']['time_out'] != -1:\n        config['monitor']['time_out_start'] = time.time()\n    if config['setting'][config['mode']]['mode'] == 'exhaustive':\n        config['logger'].info('Search mode: Exhaustive')\n        config['setting'][config['mode']]['sample'] = \\\n            get_sample_policy(config['setting'][config['mode']]['mode'])\n        explore_design_space(config)\n    elif config['setting'][config['mode']]['mode'] == 'random':\n        config['logger'].info('Search mode: Random')\n        config['setting'][config['mode']]['sample'] = \\\n            get_sample_policy(config['setting'][config['mode']]['mode'],\n                config['setting'][config['mode']]['n_random'])\n        explore_design_space(config)\n    elif config['setting'][config['mode']]['mode'] == 'customized':\n        config['logger'].info('Search mode: Customized')\n        explore_design_space(config)\n\n    #print(config['monitor']['n_designs'])\n\n    # Print out the best design\n    
config['logger'].info(print_best_design(config['search_results']['opt'], config['hw_info']))\n    # Store the tuning log\n    tmp_dir = config['tmp_dir']\n    log_path = f'{tmp_dir}/optimizer/search/DSE.log'\n    config['logger'].info(f'Saving the DSE results to: {log_path}')\n    save_search_log(config['search_results']['records'], log_path)\n\n    return\n\n\ndef init_logger(training, search, verbose, tmp_dir):\n    \"\"\" Init AutoSA logger.\n\n    Initialize the AutoSA logger.\n\n    Parameters\n    ----------\n    training: boolean\n        Enable training phase.\n    search: boolean\n        Enable search phase.\n    verbose: int\n        Logger verbose level.\n        0: Print minimal information from Optimizer.\n        1: Print all information from Optimizer.\n        2: Print information from Optimizer and AutoSA.\n    tmp_dir: str\n        Path to the temp files.\n\n    Returns\n    -------\n    logger:\n        AutoSA logger.\n    \"\"\"\n    logger = logging.getLogger('AutoSA-Optimizer')\n    formatter = logging.Formatter(\n        '[%(name)s %(asctime)s] %(levelname)s: %(message)s',\n        '%Y-%m-%d %H:%M:%S')\n    logger.setLevel(logging.INFO)\n\n    s_handler = logging.StreamHandler()\n    if training:\n        f_handler = logging.FileHandler(\n            f'{tmp_dir}/optimizer/training.log', 'w')\n    elif search:\n        f_handler = logging.FileHandler(f'{tmp_dir}/optimizer/search.log', 'w')\n    if verbose > 1:\n        s_handler.setLevel(level=logging.DEBUG)\n        f_handler.setLevel(level=logging.DEBUG)\n    elif verbose == 1:\n        s_handler.setLevel(level=logging.INFO)\n        f_handler.setLevel(level=logging.INFO)\n    else:\n        s_handler.setLevel(level=logging.WARNING)\n        f_handler.setLevel(level=logging.WARNING)\n\n    s_handler.setFormatter(formatter)\n    f_handler.setFormatter(formatter)\n    logger.addHandler(s_handler)\n    logger.addHandler(f_handler)\n\n    return logger\n\n\ndef init_monitor():\n    \"\"\" Init 
monitor for DSE.\n\n    Returns\n    -------\n    monitor: dict\n        \"n_designs\": number of designs that are examined\n        \"time_out_start\": the starting time for time-out counter\n    \"\"\"\n    monitor = {\"n_designs\": 0, \"time_out_start\": -1}\n\n    return monitor\n\ndef init_search_results():\n    \"\"\" Init search results for DSE.\n\n    Note: The search results contain two parts: the opt design and the tuning\n    records. The opt design is the best design found during the search process.\n    The records contain the top designs found during the search process.\n\n    \"\"\"\n    ret = {\n        'opt': {\n            'found': False,\n            'latency': -1,\n            'resource': {'FF': -1, 'LUT': -1, 'BRAM18K': -1, 'URAM': -1, 'DSP': -1},\n            'power': -1,\n            'cmd': None\n        },\n        'records': []\n    }\n\n    return ret\n\ndef update_search_results(results, cur_design, n_record, metric, hw_info):\n    \"\"\" Update the search results.\n\n    Parameters\n    ----------\n    results: dict\n        A dict containing the current search results.\n    cur_design: dict\n        The current design to be compared.\n    n_record: int\n        The number of records to be logged in the search results.\n    metric: str\n        Evaluation metric.\n    hw_info: dict\n        A dictionary containing the hardware information.\n    \"\"\"\n    if metric == 'latency':\n        update_design = False\n        if not results['opt']['found']:\n            results['opt']['found'] = True\n            update_design = True\n        else:\n            update_design = False\n            if cur_design['latency'] < results['opt']['latency']:\n                update_design = True\n            elif cur_design['latency'] == results['opt']['latency']:\n                # We compute a score for the resource usage.\n                cur_res_score = res_model.compute_res_util_score(cur_design['resource'], hw_info)\n                opt_res_score = 
res_model.compute_res_util_score(results['opt']['resource'], hw_info)\n                if cur_res_score < opt_res_score:\n                    update_design = True\n\n        if update_design:\n            # Update the opt\n            results['opt']['latency'] = cur_design['latency']\n            results['opt']['resource'] = cur_design['resource']\n            results['opt']['cmd'] = cur_design['cmd']\n            # Update the records\n            results['records'].insert(0, results['opt'].copy())\n            results['records'] = results['records'][:n_record]\n    else:\n        raise NotImplementedError(f'Update search results for metric {metric} is not supported.')\n\n    return results\n\ndef merge_search_results(results, metric, n_record, hw_info):\n    \"\"\" Merge search results from DSE.\n\n    We will first merge the records and then update the opt design.\n    Each result is already sorted. Therefore, we will initialize the return list\n    with the first result. For the following results, we will insert them into the\n    return list by comparing the metrics.\n\n    Parameters\n    ----------\n    results: list\n        A list of results to merge.\n    metric: str\n        The DSE evaluation metric.\n    n_record: int\n        Number of top records to keep.\n    hw_info: dict\n        Hardware information.\n\n    Returns\n    -------\n    ret: dict\n        A dict containing the merged search results\n    \"\"\"\n    ret = init_search_results()\n    if metric == 'latency':\n        is_first = 1\n        for result in results:\n            if len(result['records']) == 0:\n                continue\n\n            if is_first == 1:\n                ret = result\n                is_first = 0\n            else:\n                records = result['records']\n                for record in records:\n                    inserted = False\n                    for cmp_id in range(len(ret['records'])):\n                        cmp_record = ret['records'][cmp_id]\n               
         # Check if it is a duplicate record\n                        if record['cmd'] == cmp_record['cmd']:\n                            inserted = True\n                            break\n\n                        if record['latency'] < cmp_record['latency']:\n                            ret['records'].insert(cmp_id, record)\n                            inserted = True\n                            break\n                        elif record['latency'] == cmp_record['latency']:\n                            cur_res_score = res_model.compute_res_util_score(record['resource'], hw_info)\n                            cmp_res_score = res_model.compute_res_util_score(cmp_record['resource'], hw_info)\n                            if cur_res_score < cmp_res_score:\n                                ret['records'].insert(cmp_id, record)\n                                inserted = True\n                                break\n                            elif cur_res_score == cmp_res_score:\n                                # Duplicated\n                                inserted = True\n                                break\n\n                    if inserted == False:\n                        ret['records'].append(record)\n\n                ret['opt'] = ret['records'][0]\n                ret['records'] = ret['records'][:n_record]\n\n        return ret\n    else:\n        raise NotImplementedError(f'Merge results for metric {metric} is not supported.')\n\ndef init_config(setting, verbose, hw_info, cmd, training, search, tmp_dir):\n    \"\"\" Init AutoSA Optimizer global configuration.\n\n    Init the global configuration used in Optimizer.\n\n    Parameters\n    ----------\n    setting: dict\n        AutoSA Optimizer setting.\n    verbose: int\n        Print verbose level.\n    tmp_dir: str\n        Path to the temporary files.\n\n    Note\n    ----\n    Configuration is a dictionary containing the following info:\n      setting: dict\n        AutoSA Optimizer setting.\n      verbose: 
int\n        Print verbose level.\n      stdout:\n        Stdout pipe.\n      work_dir: str\n        The default working directory.\n      hw_info: dict\n        The hardware configuration.\n      logger:\n        The default logger.\n      cmds: list\n        A list of AutoSA commands.\n          [0]: The user input command.\n          [1]: AutoSA configuration file.\n          [2]: AutoSA output directory.\n          [3]: AutoSA sizes.\n      sa_sizes: list\n        A list of AutoSA tiling factors.\n      two_level_buffer: boolean\n        Is two_level_buffer enabled.\n      hbm: boolean\n        Is HBM enabled.\n      kernel_file_path: str\n        Input kernel file path.\n      simd_info: dict\n        Kernel SIMD information.\n      tuning: dict\n        Temporary tuning information from AutoSA.\n      monitor: dict\n        A dictionary storing the monitoring information of the DSE\n          \"n_designs\": number of designs that are examined\n\n    Returns\n    -------\n    config: dict\n        Initialized global configuration.\n    \"\"\"\n    config = {}\n    config['setting'] = setting\n    config['verbose'] = verbose\n    config['tmp_dir'] = tmp_dir\n    if verbose == 2:\n        # Print AutoSA info\n        config['stdout'] = None\n    else:\n        config['stdout'] = subprocess.DEVNULL\n    if training:\n        config['work_dir'] = f'{tmp_dir}/optimizer/training/job0'\n    else:\n        config['work_dir'] = f'{tmp_dir}/optimizer/search/job0'\n    with open(hw_info) as f:\n        config['hw_info'] = json.load(f)\n    config['cmds'] = [cmd]\n    config['cmds'].append(\n        f'--autosa-config={config[\"work_dir\"]}/autosa_config.json')\n    config['cmds'].append(f'--autosa-output-dir={config[\"work_dir\"]}/output')\n    config['cmds'].append('')\n    config['sa_sizes'] = []\n    # Look up if sa_sizes are pre-set in the cmd\n    if config['cmds'][0].find('sa-sizes') != -1:\n        m = re.search(r'--sa-sizes=\"{(.+?)}\"', config['cmds'][0])\n       
 if m:\n            for size in m.group(1).split(';'):\n                config['sa_sizes'].append(size)\n            # delete the sa_sizes from the cmd\n            config['cmds'][0] = re.sub(r'--sa-sizes=\".+?\"', '', config['cmds'][0])\n    if cmd.find('two-level-buffer') != -1:\n        config['two_level_buffer'] = 1\n    else:\n        config['two_level_buffer'] = 0\n    if cmd.find('hbm') != -1:\n        config['hbm'] = 1\n    else:\n        config['hbm'] = 0\n    # Load SIMD info file\n    kernel_file_path = cmd.split()[1]\n    kernel_file_path = kernel_file_path.rsplit('/', 1)[0]\n    config['kernel_file_path'] = kernel_file_path\n    config['simd_info'] = None\n    with open(kernel_file_path + '/simd_info.json', 'r') as f:\n        config['simd_info'] = json.load(f)\n\n    return config\n\n\ndef xilinx_run(cmd, hw_info, setting, training, search, verbose, tmp_dir):\n    \"\"\" Design space exploration on Xilinx platform.\n\n    The following four stages are explored in the DSE:\n    - space-time transformation\n    - array partitioning\n    - latency hiding\n    - SIMD vectorization\n\n    The DSE includes two phases: the training phase and the search phase.\n    In the training phase, for each systolic array candidate, we generate\n    a set of tuning parameters for the last three stages. This step\n    creates a suite of designs, which serve as training samples for the\n    regression models of the design latency and resource usage.\n\n    After the training phase is done, we enter the search phase.\n    In this phase, for each systolic array, we explore all the different\n    tiling factors in the last three stages.
After the tuning parameters\n    of each stage are determined, we estimate the latency and resource usage\n    of the design using the pre-trained regression models.\n    Finally, the design with the lowest latency that satisfies the resource constraints\n    is selected.\n\n    Folder structure:\n    autosa.tmp\n    - optimizer\n      - [training.log | search.log]\n      - training\n        - job0\n          - autosa_config.json\n          - output\n            - src\n            - latency_est\n            - resource_est\n          - design0\n          - design1\n      - search\n        - job0\n        - job1\n\n    Parameters\n    ----------\n    cmd: str\n        Command line to run AutoSA.\n    hw_info: str\n        Path to FPGA platform hardware resource information file.\n    setting: dict\n        Optimizer settings.\n    training: boolean\n        Enable training phase.\n    search: boolean\n        Enable search phase.\n    verbose: int\n        Print verbose information.\n    tmp_dir: str\n        Path to the folder that stores the temp files.\n    \"\"\"\n\n    if not os.path.exists(f'{tmp_dir}/optimizer'):\n        os.mkdir(f'{tmp_dir}/optimizer')\n\n    # Init logger and optimizer config\n    logger = init_logger(training, search, verbose, tmp_dir)\n    config = init_config(setting, verbose, hw_info, cmd, training, search, tmp_dir)\n    config['logger'] = logger\n    # Init monitor\n    config['monitor'] = init_monitor()\n\n    # Init the AutoSA configuration\n    autosa_config = {\"space_time\": {\"mode\": \"manual\"},\n                     \"array_part\": {\"enable\": 1, \"mode\": \"manual\"},\n                     \"array_part_L2\": {\n        \"enable\": config['two_level_buffer'],\n        \"mode\": \"manual\"},\n        \"latency\": {\"enable\": 1, \"mode\": \"manual\"},\n        \"simd\": {\"enable\": 1, \"mode\": \"manual\"},\n        \"hbm\": {\"enable\": config['hbm'], \"mode\": \"manual\"}}\n    config['autosa_config'] = autosa_config\n\n    # Training 
phase\n    if training:\n        config['logger'].info(f'Run training phase...')\n        train_xilinx(config)\n\n    # Search phase\n    if search:\n        config['logger'].info(f'Run search phase...')\n        search_xilinx(config)\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser(description='==== AutoSA Optimizer ====')\n    parser.add_argument(\n        '-c',\n        '--cmd',\n        metavar='CMD',\n        required=True,\n        help='AutoSA command line')\n    parser.add_argument(\n        '-i',\n        '--info',\n        metavar='INFO',\n        required=True,\n        help='hardware resource information')\n    parser.add_argument(\n        '-s',\n        '--setting',\n        metavar='SETTING',\n        required=False,\n        default='autosa_config/optimizer_settings.json',\n        help='optimizer settings')\n    parser.add_argument(\n        '-p',\n        '--platform',\n        metavar='PLATFORM',\n        required=True,\n        help='hardware platform: intel/xilinx')\n    parser.add_argument(\n        '--training',\n        action='store_true',\n        help='run training phase')\n    parser.add_argument(\n        '--search',\n        action='store_true',\n        help='run search phase')\n    parser.add_argument(\n        '--verbose',\n        type=int,\n        required=False,\n        default=1,\n        help='provide verbose information [0-2]')\n    parser.add_argument(\n        '--tmp-dir',\n        required=False,\n        default='./autosa.tmp',\n        help='temporary file directory')\n\n    args = parser.parse_args()\n\n    # Parse the settings into a dict\n    with open(args.setting) as f:\n        setting = json.load(f)\n\n    if args.platform == 'intel':\n        print(\"Intel platform is not supported yet!\")  # TODO\n    elif args.platform == 'xilinx':\n        xilinx_run(\n            args.cmd,\n            args.info,\n            setting,\n            args.training,\n            args.search,\n            
args.verbose,\n            args.tmp_dir)\n"
  },
  {
    "path": "autosa_scripts/optimizer_prune.py",
    "content": "#!/usr/bin/env python3\n\ndef array_part_loops_pruning(loops, config):\n    \"\"\" Apply pruning on array partitioning candidate loops.\n\n    At present, we apply the following heuristics:\n    - The product of all array_part tiling factors should be no less than the lower bound of the PE number\n    - TODO: Prune based on off-chip traffic\n\n    Parameters\n    ----------\n    loops: list\n        A list of candidate loops\n    config:\n        Global configuration\n    \"\"\"\n    pruned_loops = []\n\n    PE_lb = config['setting'][config['mode']\n                              ]['pruning']['array_part']['PE_num'][0]\n    for loop in loops:\n        if PE_lb == -1:\n            pruned_loops.append(loop)\n        else:\n            prod = 1\n            for l in loop:\n                if l > 1:\n                    prod *= l\n            if prod < PE_lb:\n                continue\n            pruned_loops.append(loop)\n\n    return pruned_loops\n\n\ndef array_part_L2_loops_pruning(loops, config):\n    \"\"\" Apply pruning on L2 array partitioning candidate loops.\n\n    At present, we apply the following heuristics:\n    - We only apply L2 array partitioning on parallel loops to save off-chip communication.\n      We examine from outer loops to inner loops.
Once we meet a non-parallel loop,\n      we stop there and set the tiling factors of that loop and all inner loops to their upper bounds.\n\n    Parameters\n    ----------\n    loops: list\n        A list of candidate loops\n    config:\n        Global configuration\n    \"\"\"\n    pruned_loops = []\n    tuning = config['tuning']\n    loop_stop = 0\n    for c in tuning['array_part_L2']['coincident']:\n        if not c:\n            break\n        loop_stop += 1\n    ubs = tuning['array_part_L2']['tilable_loops'][loop_stop:]\n    for loop in loops:\n        # Examine loop[loop_stop:] and keep only the candidates whose factors equal the upper bounds\n        loop_cut = loop[loop_stop:]\n        if loop_cut != ubs:\n            continue\n        pruned_loops.append(loop)\n\n    return pruned_loops\n\n\ndef latency_hiding_loops_pruning(loops, config):\n    \"\"\" Apply pruning on latency hiding candidate loops.\n\n    At present, we apply the following heuristics:\n    - We compute the latency hiding register size (the product of the tiling factors)\n      and prune the candidate when the size falls outside the pre-set [lower, upper] bounds.\n\n    Parameters\n    ----------\n    loops: list\n        A list of candidate loops\n    config:\n        Global configuration\n    \"\"\"\n    pruned_loops = []\n    reg_size_lb = config['setting'][config['mode']\n                                    ]['pruning']['latency_hiding']['reg_size'][0]\n    reg_size_ub = config['setting'][config['mode']\n                                    ]['pruning']['latency_hiding']['reg_size'][1]\n    for loop in loops:\n        size = 1\n        for l in loop:\n            size *= l\n        if reg_size_lb != -1:\n            if size < reg_size_lb:\n                continue\n        if reg_size_ub != -1:\n            if size > reg_size_ub:\n                continue\n        pruned_loops.append(loop)\n\n    return pruned_loops\n\n\ndef SIMD_vectorization_PE_pruning(config, postpone=0):\n    \"\"\" Apply pruning based on the PE structures at the SIMD vectorization stage.\n\n    At 
present, we apply the following heuristics:\n    - We restrict the total PE number to the pre-set range\n    - We restrict the PE aspect ratio of 2D arrays\n\n    Parameters\n    ----------\n    config: dict\n        Global configuration\n    postpone: int\n        If the pruning is postponed after the SIMD optimization\n\n    Returns\n    -------\n    ret: boolean\n        True if this configuration is to be pruned.\n    \"\"\"\n    tuning = config['tuning']\n    ret = False\n    PE_num_lb = config['setting'][config['mode']\n                                  ]['pruning']['SIMD_vectorization']['PE_num'][0]\n    PE_num_ub = config['setting'][config['mode']\n                                  ]['pruning']['SIMD_vectorization']['PE_num'][1]\n    if postpone == 0:\n        sa_dims = tuning['simd']['sa_dims']\n    else:\n        sa_dims = tuning['sa_dims']\n\n    n_pe = 1\n    for dim in sa_dims:\n        n_pe *= int(dim)\n    if PE_num_lb != -1:\n        if n_pe < PE_num_lb:\n            return True\n    if PE_num_ub != -1:\n        if n_pe > PE_num_ub:\n            return True\n\n    if len(sa_dims) > 1:\n        sa_dims.sort(reverse=True)\n        pe_ratio = sa_dims[0] / sa_dims[1]\n        if config['setting'][config['mode']]['pruning']['SIMD_vectorization']['PE_ratio'] != -1:\n            if pe_ratio > config['setting'][config['mode']]['pruning']['SIMD_vectorization']['PE_ratio']:\n                return True\n\n    return ret\n\n\ndef reorder_simd_loops(loops):\n    \"\"\" Reorder the simd loops for pruning.\n\n    The input `loops` contains a list of candidate loops.\n    Each candidate loop is in the format [1, 1, X].\n    We sort the loops by the non-one element in descending order.
\n\n    Parameters\n    ----------\n    loops: list\n        A list containing all candidate SIMD loops to be evaluated.\n    \"\"\"\n    # Find the position of the non-one element.\n    pos = -1\n    for loop in loops:\n        for i in range(len(loop)):\n            if loop[i] != 1:\n                pos = i\n                break\n        if pos != -1:\n            break\n\n    if pos == -1:\n        # All the loops are ones.\n        return loops\n\n    loops.sort(key=lambda x: x[pos], reverse=True)\n    return loops\n\n\ndef SIMD_vectorization_latency_pruning(config):\n    \"\"\" Perform latency-based pruning at the SIMD vectorization stage.\n\n    We have already reordered the SIMD candidate loops in descending order.\n    Therefore, if the last design evaluated is slower than the opt design found\n    so far, none of the remaining candidates, which have smaller SIMD factors,\n    can beat the opt design, so we stop exploring these loops and return.\n    Otherwise, if the resource usage is legal, we have already found the design\n    with the least latency in the current group: the remaining designs have\n    smaller SIMD factors, so their latency is no less than that of the current design.\n    We stop exploring these loops and return.\n    However, a design with a smaller SIMD factor might achieve the same latency\n    with less resource usage (for a communication-bound design).\n    At present, we ignore such cases.\n\n    \"\"\"\n    last_design = config['monitor']['last_design']\n    if last_design['latency'] == -1:\n        # The current design is already slower than opt., stop exploration.\n        return True\n    else:\n        # The current design is resource-legal, stop exploration.\n        if not last_design['resource']:\n            return True\n    return False\n"
  },
  {
    "path": "autosa_scripts/pe_group.py",
    "content": "#!/usr/bin/env python3\n\nimport sympy\nimport sys\nimport argparse\nimport re\nimport numpy as np\n\ndef locate_data_trans_block(line_id, lines):\n    prev_line_id = line_id - 1\n    while prev_line_id >= 0:\n        prev_line = lines[prev_line_id]\n        if prev_line.find('{') != -1:\n            block_start = prev_line_id\n            break\n        prev_line_id -= 1\n    nxt_line_id = line_id + 1\n    while nxt_line_id < len(lines):\n        nxt_line = lines[nxt_line_id]\n        if nxt_line.find('}') != -1:\n            block_end = nxt_line_id\n            break\n        nxt_line_id += 1\n\n    return block_start, block_end\n\ndef modify_index(lines, var_map, PE_dims):\n    #print(var_map)\n\n    new_lines = []\n    for line in lines:\n        for var in var_map:\n            new_var = var\n            for dim_idx in range(len(PE_dims)):\n                new_var += f'[s{dim_idx}]'\n            line = re.sub(rf'{var}', f'{new_var}', line)\n            if line.find(var) != -1 and var_map[var]['simd'] == 1:\n                # TODO: Consider the index only appears once\n                pos = line.find(var)\n                end_pos = pos\n                for p in range(pos, len(line)):\n                    if line[p] == ' ' or line[p] == ')':\n                        end_pos = p - 1\n                        break\n                #print(pos)\n                #print(end_pos)\n                ref = line[pos : end_pos + 1]\n                #print(ref)\n                index = ref[ref.find('['):]\n                indices = []\n                while len(index) > 0:\n                    start_pos = index.find('[')\n                    end_pos = index.find(']')\n                    indices.append(index[start_pos:end_pos+1])\n                    index = index[end_pos + 1:]\n                #print(index)\n                #print(indices)\n                last_index = indices[-1][1:-1]\n                new_ref = ref[:ref.find('[')]\n                for index 
in indices[:-1]:\n                    new_ref += index\n                new_ref += '.data['\n                new_ref += last_index\n                new_ref += ']'\n                #print(ref)\n                #print(new_ref)\n                line = line.replace(ref, new_ref)\n\n        new_lines.append(line)\n\n    return new_lines\n\ndef insert_data_trans(lines, data_trans_info, PE_dims):    \n    for group_name in data_trans_info:\n        info = data_trans_info[group_name]\n        #print(group_name)\n        #print(data_trans_info[group_name]['PE_index_start'])\n        #print(data_trans_info[group_name]['PE_index_end'])\n        dir = [info['PE_index_end'][dim] - info['PE_index_start'][dim] for dim in range(len(info['PE_index_start']))]\n        #print(dir)\n        if dir == [0, 1]:\n            new_lines = [\\\n                '#pragma unroll\\n',\n                f'for (int s0 = 0; s0 < {PE_dims[0]}; s0++) {{\\n',\n                f'  local_{group_name}[s0][0][0] = read_channel_intel(fifo_{group_name}_PE[s0][0]);\\n',\n                '}\\n'\n                '#pragma unroll\\n',\n                f'for (int s0 = 0; s0 < {PE_dims[0]}; s0++) {{\\n',\n                '  #pragma unroll\\n',\n                f'  for (int s1 = 1; s1 < {PE_dims[1]}; s1++) {{\\n',\n                f'    local_{group_name}[s0][s1][0] = __fpga_reg(__fpga_reg(local_{group_name}[s0][0][0]));\\n'\n                '  }\\n'\n                '}\\n'\n            ]            \n        elif dir == [1, 0]:\n            new_lines = [\\\n                '#pragma unroll\\n',\n                f'for (int s1 = 0; s1 < {PE_dims[1]}; s1++) {{\\n',\n                f'  local_{group_name}[0][s1][0] = read_channel_intel(fifo_{group_name}_PE[0][s1]);\\n',\n                '}\\n'\n                '#pragma unroll\\n',\n                f'for (int s0 = 1; s0 < {PE_dims[0]}; s0++) {{\\n',\n                '  #pragma unroll\\n',\n                f'  for (int s1 = 0; s1 < {PE_dims[1]}; s1++) {{\\n',\n            
    f'    local_{group_name}[s0][s1][0] = __fpga_reg(__fpga_reg(local_{group_name}[0][s1][0]));\\n'\n                '  }\\n'\n                '}\\n'\n            ]            \n        else:\n            raise NotImplementedError('Unsupport Direction')\n        lines = new_lines + lines\n\n    return lines\n\ndef modify_channels(lines, data_trans_info, PE_dims):\n    # In the channel declaration, delete all the fifo_{group}_PE\n    new_lines = []\n    drain_groups = []\n    for line in lines:\n        find = False\n        for group in data_trans_info:\n            if line.find('/* PE fifo */') != -1 and line.find(f'fifo_{group}_PE') != -1:\n                find = True\n        if line.find('/* PE fifo */') != -1 and line.find(f'_PE_') != -1 and line.find('drain') != -1:\n            m = re.search(r'fifo_(.+?)_PE', line)\n            drain_group = m.group(1)\n            if drain_group not in drain_groups:\n                drain_groups.append(drain_group)\n            find = True\n        if not find:\n            new_lines.append(line)    \n\n    for line_id in range(len(lines)):\n        line = lines[line_id]\n        if line.find('/* Channel Declaration */') != -1:\n            channel_decl_start = line_id\n            break\n    for group in data_trans_info:\n        info = data_trans_info[group]\n        dir = [info['PE_index_end'][dim] - info['PE_index_start'][dim] for dim in range(len(info['PE_index_start']))]\n        if dir == [0, 1]:\n            line = f'/* PE fifo */ channel {info[\"data_type\"]} fifo_{group}_PE[{PE_dims[0]}][1] __attribute__((depth(2)));\\n'\n        elif dir == [1, 0]:\n            line = f'/* PE fifo */ channel {info[\"data_type\"]} fifo_{group}_PE[1][{PE_dims[1]}] __attribute__((depth(2)));\\n'\n        else:\n            raise NotImplementedError('Unsupport Direction')\n        new_lines.insert(channel_decl_start + 1, line)\n    for group in drain_groups:\n        line = f'/* PE fifo */ channel float 
fifo_{group}_PE[{PE_dims[0]}][{PE_dims[1]}] __attribute__((depth(2)));\\n'\n        new_lines.insert(channel_decl_start + 1, line)\n\n    # Replace all channel calls\n    for group in data_trans_info:\n        fifo_prefix = f'fifo_{group}_PE_'\n        for line_id in range(len(new_lines)):\n            line = new_lines[line_id]\n            if line.find(fifo_prefix) != -1:\n                modify = False\n                if line.find('write_channel_intel') != -1:\n                    m = re.search(r'\\((.+?)\\)', line)\n                    fifo_name = m.group(1).split(',')[0]                    \n                    modify = True\n                elif line.find('read_channel_intel') != -1:\n                    m = re.search(r'\\((.+?)\\)', line)\n                    fifo_name = m.group(1)\n                    modify = True\n                if modify:                    \n                    #print(fifo_name)\n                    index = fifo_name[len(fifo_prefix):].split('_')\n                    new_fifo_name = fifo_prefix[:-1]\n                    for ind in index:\n                        new_fifo_name += f'[{ind}]'\n                    #print(new_fifo_name)\n                    line = line.replace(fifo_name, new_fifo_name)\n                    new_lines[line_id] = line\n\n    #print(lines)\n    #print(drain_groups)\n    for group in drain_groups:\n        fifo_prefix = f'fifo_{group}_PE_'\n        for line_id in range(len(new_lines)):\n            line = new_lines[line_id]\n            if line.find(fifo_prefix) != -1:\n                modify = False\n                if line.find('write_channel_intel') != -1:\n                    m = re.search(r'\\((.+?)\\)', line)\n                    fifo_name = m.group(1).split(',')[0]                    \n                    modify = True         \n                elif line.find('read_channel_intel') != -1:\n                    m = re.search(r'\\((.+?)\\)', line)\n                    fifo_name = m.group(1)\n                  
  modify = True                           \n                if modify:       \n                    # Check if inside a PE definition\n                    inside_PE = False\n                    prev_line_id = line_id - 1                                        \n                    while prev_line_id >= 0:                        \n                        prev_line = new_lines[prev_line_id]                                                             \n                        if prev_line.find('/* Module') != -1:\n                            break\n                        if prev_line.find('void PE') != -1:\n                            inside_PE = True\n                            break      \n                        prev_line_id -= 1                                                      \n                    #print(inside_PE)                        \n                    #print(fifo_prefix)\n                    #print(fifo_name)\n                    index = fifo_name[len(fifo_prefix):].split('_')\n                    new_fifo_name = fifo_prefix[:-1]\n                    if inside_PE:\n                        for i in range(len(PE_dims)):\n                            new_fifo_name += f'[s{i}]'\n                    else:\n                        for ind in index:                        \n                            new_fifo_name += f'[{ind}]'\n                    #print(new_fifo_name)\n                    line = line.replace(fifo_name, new_fifo_name)\n                    new_lines[line_id] = line\n\n    # Delete all dummy functions\n    module_start = False\n    delete_module = False\n    delete_pos = []\n    for line_id in range(len(new_lines)):\n        line = new_lines[line_id]\n        if line.find('/* Module Definition */') != -1:\n            module_start = not module_start\n            if module_start:\n                module_start_pos = line_id\n                delete_module = False\n            if not module_start:\n                module_end_pos = line_id\n      
          if delete_module:\n                    delete_pos.append([module_start_pos, module_end_pos])\n            if module_start:\n                nxt_line = new_lines[line_id + 3]            \n                if nxt_line.find('dummy') != -1:\n                    delete_module = True\n    offset = 0\n    for p in delete_pos:\n        new_lines = new_lines[:p[0] - offset] + new_lines[p[1] + 1 - offset:]\n        offset += (p[1] - p[0] + 1)                \n\n    return new_lines\n\ndef modify_body(lines, PE_dims, var_map):\n    \"\"\"\n    This function modifies the PE body.\n    The user statement is wrapped with unrolled space loops.\n    The data transfer statements are replaced with two loop blocks,\n    one for initializing the boundary, the other for reusing the data.\n    \"\"\"    \n    loop_bodies = []\n    # Locate the user statements\n    for line_id in range(len(lines)):\n        line = lines[line_id]\n        if line.find('hls_pipeline') != -1:\n            # extract the loop body\n            body_start = line_id\n            r_minus_l = -1\n            nxt_line_id = line_id + 1            \n            while nxt_line_id < len(lines):\n                nxt_line = lines[nxt_line_id]\n                if nxt_line.find('}') != -1:\n                    r_minus_l += 1\n                if nxt_line.find('{') != -1:\n                    r_minus_l -= 1\n                if r_minus_l == 0:\n                    body_end = nxt_line_id - 1\n                    break\n                nxt_line_id += 1\n            loop_body = lines[body_start : body_end + 1]\n            #print(loop_body)\n            loop_bodies.append({'pos': [body_start, body_end], 'lines': loop_body})\n    \n    # Modify the loop bodies\n    #for body in loop_bodies:\n    body_offset = 0\n    for idx in range(len(loop_bodies)):\n        body = loop_bodies[idx]\n        body_lines = body['lines']        \n        group_names = []\n        has_data_trans = True\n        
data_trans_info = extract_data_trans_info(body_lines, PE_dims)\n        # Remove the in transfer\n        while has_data_trans:\n            has_data_trans = False\n            for line_id in range(len(body_lines)):\n                line = body_lines[line_id]\n                if line.find('read_channel_intel') != -1:\n                    has_data_trans = True\n                    # Locate the read block and the write block\n                    block_start, block_end = locate_data_trans_block(line_id, body_lines)\n                    m = re.search(r'\\((.+?)\\)', line)    \n                    fifo_name = m.group(1)\n                    group_name = fifo_name.split('_')[1]\n                    group_names.append(group_name)\n                    break\n            if has_data_trans:\n                body_lines = body_lines[:block_start] + body_lines[block_end + 1:]\n        # Remove the out transfer\n        has_data_trans = True\n        while has_data_trans:\n            has_data_trans = False\n            for line_id in range(len(body_lines)):\n                line = body_lines[line_id]\n                if line.find('write_channel_intel') != -1:\n                    m = re.search(r'\\((.+?)\\)', line)\n                    fifo_name = m.group(1).split(',')[0]\n                    group_name = fifo_name.split('_')[1]\n                    if group_name in group_names:\n                        has_data_trans = True\n                        block_start, block_end = locate_data_trans_block(line_id, body_lines)\n            if has_data_trans:\n                body_lines = body_lines[:block_start] + body_lines[block_end + 1:]\n        #print(body_lines)\n        # Wrap the body with space loops\n        for dim_idx in range(len(PE_dims)):\n            dim = PE_dims[dim_idx]            \n            line = f'#pragma unroll\\nfor (int s{dim_idx} = 0; s{dim_idx} < {dim}; s{dim_idx}++) {{\\n'\n            body_lines.insert(dim_idx, line)                        \n        for 
dim in PE_dims:\n            body_lines.append('}\\n')\n\n        # Modify the index\n        body_lines = modify_index(body_lines, var_map, PE_dims)\n        #print(body_lines)\n\n        # Insert the data transfer stmts\n        body_lines = insert_data_trans(body_lines, data_trans_info, PE_dims)\n        #loop_bodies[idx]['lines'] = body_lines\n\n        # Replace the loop bodies\n        body_pos = body['pos']        \n        lines = lines[: body_offset + body_pos[0]] \\\n                + body_lines \\\n                + lines[body_offset + body_pos[1] + 1 :]   \n        body_offset += len(body_lines) - (body_pos[1] - body_pos[0] + 1)\n\n    return lines\n\ndef extract_data_trans_info(lines, PE_dims):\n    \"\"\" Extract the data transfer information \n\n    \"\"\"\n    data_trans_info = {}\n    for line_id in range(len(lines)):\n        line = lines[line_id]\n        if line.find('read_channel_intel') != -1:\n            # Check the start and end of the block\n            block_start, block_end = locate_data_trans_block(line_id, lines)            \n            block_lines = lines[block_start : block_end + 1]\n            # Parse the data type\n            block_line = block_lines[1]\n            data_type = block_line.strip().split(' ')[0]\n            #print(data_type)\n            # Parse the start PE index\n            block_line = block_lines[2]\n            m = re.search(r'\\((.+?)\\)', block_line)\n            fifo_name = m.group(1)\n            PE_index_start = fifo_name.split('_')[-len(PE_dims):]\n            PE_index_start = [int(s) for s in PE_index_start]\n            #print(PE_index_start)\n            # Parse the IO group name\n            group_name = fifo_name.split('_')[1]\n            #print(group_name)\n            data_trans_info[group_name] = {\\\n                'in_block_lines': block_lines, 'in_block_pos': [block_start, block_end], \\\n                'PE_index_start': PE_index_start, 'data_type': data_type}\n        if 
line.find('write_channel_intel') != -1:\n            m = re.search(r'\\((.+?)\\)', line)\n            fifo_name = m.group(1).split(',')[0]\n            group_name = fifo_name.split('_')[1]\n            if group_name in data_trans_info:                \n                # Check the start and end of the block\n                block_start, block_end = locate_data_trans_block(line_id, lines)\n                block_lines = lines[block_start : block_end + 1]\n                # Parse the end PE index\n                block_line = block_lines[3]\n                m = re.search(r'\\((.+?)\\)', block_line)\n                fifo_name = m.group(1).split(',')[0]\n                PE_index_end = fifo_name.split('_')[-len(PE_dims):]\n                PE_index_end = [int(s) for s in PE_index_end]\n                #print(PE_index_end)\n                group_name = fifo_name.split('_')[1]\n                data_trans_info[group_name]['PE_index_end'] = PE_index_end\n                data_trans_info[group_name]['out_block_lines'] = block_lines\n                data_trans_info[group_name]['out_block_pos'] = [block_start, block_end]\n\n    return data_trans_info\n\ndef compose_PE(data_trans_info, PE_dims, PE_defs):\n    PE_def = PE_defs[0]\n    # Extract the variable declaration and main body\n    PE_lines = PE_def['def']\n    var_start = False\n    var_end = False\n    var_lines = []\n    body_lines = []\n    for line_id in range(len(PE_lines)):\n        line = PE_lines[line_id]\n        if line.find('Variable Declaration') != -1:\n            var_start = not var_start\n            if not var_start:\n                var_end = True\n            continue\n        if var_start:\n            var_lines.append(line)\n        if var_end:\n            body_lines.append(line)\n    var_lines = var_lines[1:] # Remove the module id\n    body_lines = body_lines[:-2] # Remove the function end bracket\n\n    lines = []\n    lines.append('/* Module Definition */\\n')\n    
lines.append('__attribute__((max_global_work_dim(0)))\\n')\n    lines.append('__attribute__((autorun))\\n')\n    lines.append('kernel void PE()\\n')\n    lines.append('{\\n')\n\n    var_map = {}\n    # Print the variable definitions \n    lines.append('  /* Variable Declaration */\\n')\n    for var_line in var_lines:\n        simd = 0\n        m = re.search(r'local_(.+?)\\[', var_line)\n        group_name = m.group(1)\n        data_type = var_line.strip().split(' ')[0]\n        index = var_line[var_line.find('['):var_line.find(';')]\n        indices = []\n        while len(index) > 0:\n            start_pos = index.find('[')\n            end_pos = index.find(']')\n            indices.append(index[start_pos:end_pos+1])\n            index = index[end_pos + 1:]\n        #print(group_name, data_type, indices)            \n        # Prepend the PE dims in reverse so the declaration reads [PE_dims[0]][PE_dims[1]]...,\n        # matching the [s0][s1]... accesses generated elsewhere\n        for dim in reversed(PE_dims):\n            index = f'[{dim}]'\n            indices.insert(0, index)\n        if group_name in data_trans_info:\n            if data_trans_info[group_name]['data_type'] != data_type:\n                # SIMD > 1\n                simd = 1\n                data_type = data_trans_info[group_name]['data_type']\n                indices = indices[:-1]\n        #print(group_name, data_type, indices)            \n        new_index = ''\n        for index in indices:\n            new_index += index\n        new_var_line = f'  {data_type} local_{group_name}{new_index};'\n        #print(new_var_line)      \n        var_map[f'local_{group_name}'] = {'simd': simd}\n        lines.append(new_var_line + '\\n')\n        \n    lines.append('  /* Variable Declaration */\\n')\n\n    # Print the body\n    new_body_lines = modify_body(body_lines, PE_dims, var_map)\n    for line in new_body_lines:\n        lines.append(line)\n\n    lines.append('}\\n')\n    lines.append('/* Module Definition */\\n')\n\n    return lines\n\ndef run(input_f, output_f):\n    
\"\"\" Group PEs into a Monolithic Function\n\n    This function is only used for the 
following case:\n    - Intel OpenCL\n    - The systolic array should be an output-stationary rectangular array\n    We will first collect the array dims and the data transfer direction for each IO group.\n    Next we will generate a new monolithic function of PE:\n    - Variable declaration: \n      - Remove module ids\n      - Extend all the local arrays with array dimensions. \n        - If the array is an external array, we will repack the array with the SIMD factor\n    - For each statement, add the space loops with unroll pragma      \n    \"\"\"\n    with open(input_f) as f:\n        lines = f.readlines()\n\n    # Collect the array dims\n    PE_defs = []\n    module_start = False\n    is_PE = False    \n    PE_indices = []\n    for line_id in range(len(lines)):\n        line = lines[line_id]\n        if line.find('Module Definition') != -1:\n            module_start = not module_start\n            if module_start:\n                module_start_pos = line_id\n                is_PE = False\n            else:\n                module_end_pos = line_id\n                if is_PE:\n                    PE_defs.append({'def': lines[module_start_pos : module_end_pos + 1], \\\n                                    'pos': [module_start_pos, module_end_pos]})\n            if module_start:\n                #print(line_id)\n                nxt_line_id = line_id + 1\n                while nxt_line_id < len(lines):                    \n                    nxt_line = lines[nxt_line_id]\n                    if nxt_line.find('kernel void PE') != -1:\n                        is_PE = True\n                        m = re.search(r'void PE(.+?)\\(', nxt_line)\n                        #print(nxt_line)\n                        if m:\n                            PE_index = m.group(1).split('_')[1:]\n                            PE_indices.append(PE_index)\n                        if is_PE:\n                            break\n                    if nxt_line.find('Module Definition') != 
-1:\n                        break\n                    nxt_line_id += 1\n\n    #print(PE_indices)\n    PE_dims = [int(d) for d in PE_indices[0]]\n    for ind in PE_indices:\n        for dim in range(len(PE_dims)):\n            PE_dims[dim] = max(PE_dims[dim], int(ind[dim]) + 1)\n    #print(PE_dims)\n    \n    PE_lines = PE_defs[0]['def']\n    # Parse the data transfer information\n    data_trans_info = extract_data_trans_info(PE_lines, PE_dims)    \n\n    # Compose the new PE function\n    PE_lines = compose_PE(data_trans_info, PE_dims, PE_defs)\n\n    line_offset = 0\n    for PE_def in PE_defs:\n        lines = lines[:PE_def['pos'][0] - line_offset] + lines[PE_def['pos'][1] + 1 - line_offset:]\n        line_offset += (PE_def['pos'][1] - PE_def['pos'][0] + 1)\n\n    lines = lines + PE_lines\n\n    # Modify the channels\n    lines = modify_channels(lines, data_trans_info, PE_dims)\n\n    with open(output_f, 'w') as f:\n        for line in lines:\n            f.write(line)\n    #    f.writelines(PE_lines)\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser(description='Group PEs into a Monolithic Function')\n    parser.add_argument('-i', required=True, help='input kernel function')\n    parser.add_argument('-o', required=True, help='output kernel function')\n\n    args = parser.parse_args()\n    run(args.i, args.o)"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/ast_type.h",
    "content": "#ifndef ISL_AST_TYPE_H\n#define ISL_AST_TYPE_H\n\n#include <isl/list.h>\n\n#if defined(__cplusplus)\nextern \"C\" {\n#endif\n\n/* AutoSA Extended */\nenum autosa_loop_type {\n\tautosa_loop_error = -1,\n  autosa_loop_default = 0,\n  autosa_loop_time,\n  autosa_loop_space,\n  autosa_loop_latency,\n  autosa_loop_simd,\n  autosa_loop_array_part\t\n};\n/* AutoSA Extended */\n\nstruct __isl_export isl_ast_expr;\ntypedef struct isl_ast_expr isl_ast_expr;\n\nstruct __isl_export isl_ast_node;\ntypedef struct isl_ast_node isl_ast_node;\n\nenum isl_ast_expr_op_type {\n\tisl_ast_expr_op_error = -1,\n\tisl_ast_expr_op_and,\n\tisl_ast_expr_op_and_then,\n\tisl_ast_expr_op_or,\n\tisl_ast_expr_op_or_else,\n\tisl_ast_expr_op_max,\n\tisl_ast_expr_op_min,\n\tisl_ast_expr_op_minus,\n\tisl_ast_expr_op_add,\n\tisl_ast_expr_op_sub,\n\tisl_ast_expr_op_mul,\n\tisl_ast_expr_op_div,\n\tisl_ast_expr_op_fdiv_q,\t/* Round towards -infty */\n\tisl_ast_expr_op_pdiv_q,\t/* Dividend is non-negative */\n\tisl_ast_expr_op_pdiv_r,\t/* Dividend is non-negative */\n\tisl_ast_expr_op_zdiv_r,\t/* Result only compared against zero */\n\tisl_ast_expr_op_cond,\n\tisl_ast_expr_op_select,\n\tisl_ast_expr_op_eq,\n\tisl_ast_expr_op_le,\n\tisl_ast_expr_op_lt,\n\tisl_ast_expr_op_ge,\n\tisl_ast_expr_op_gt,\n\tisl_ast_expr_op_call,\n\tisl_ast_expr_op_access,\n\tisl_ast_expr_op_member,\n\tisl_ast_expr_op_address_of\n};\n\n#define isl_ast_op_type\t\tisl_ast_expr_op_type\n#define isl_ast_op_error\tisl_ast_expr_op_error\n#define isl_ast_op_and\t\tisl_ast_expr_op_and\n#define isl_ast_op_and_then\tisl_ast_expr_op_and_then\n#define isl_ast_op_or\t\tisl_ast_expr_op_or\n#define isl_ast_op_or_else\tisl_ast_expr_op_or_else\n#define isl_ast_op_max\t\tisl_ast_expr_op_max\n#define isl_ast_op_min\t\tisl_ast_expr_op_min\n#define isl_ast_op_minus\tisl_ast_expr_op_minus\n#define isl_ast_op_add\t\tisl_ast_expr_op_add\n#define isl_ast_op_sub\t\tisl_ast_expr_op_sub\n#define isl_ast_op_mul\t\tisl_ast_expr_op_mul\n#define 
isl_ast_op_div\t\tisl_ast_expr_op_div\n#define isl_ast_op_fdiv_q\tisl_ast_expr_op_fdiv_q\n#define isl_ast_op_pdiv_q\tisl_ast_expr_op_pdiv_q\n#define isl_ast_op_pdiv_r\tisl_ast_expr_op_pdiv_r\n#define isl_ast_op_zdiv_r\tisl_ast_expr_op_zdiv_r\n#define isl_ast_op_cond\t\tisl_ast_expr_op_cond\n#define isl_ast_op_select\tisl_ast_expr_op_select\n#define isl_ast_op_eq\t\tisl_ast_expr_op_eq\n#define isl_ast_op_le\t\tisl_ast_expr_op_le\n#define isl_ast_op_lt\t\tisl_ast_expr_op_lt\n#define isl_ast_op_ge\t\tisl_ast_expr_op_ge\n#define isl_ast_op_gt\t\tisl_ast_expr_op_gt\n#define isl_ast_op_call\t\tisl_ast_expr_op_call\n#define isl_ast_op_access\tisl_ast_expr_op_access\n#define isl_ast_op_member\tisl_ast_expr_op_member\n#define isl_ast_op_address_of\tisl_ast_expr_op_address_of\n\nenum isl_ast_expr_type {\n\tisl_ast_expr_error = -1,\n\tisl_ast_expr_op,\n\tisl_ast_expr_id,\n\tisl_ast_expr_int\n};\n\nenum isl_ast_node_type {\n\tisl_ast_node_error = -1,\n\tisl_ast_node_for = 1,\n\tisl_ast_node_if,\n\tisl_ast_node_block,\n\tisl_ast_node_mark,\n\tisl_ast_node_user\n};\n\nenum isl_ast_loop_type {\n\tisl_ast_loop_error = -1,\n\tisl_ast_loop_default = 0,\n\tisl_ast_loop_atomic,\n\tisl_ast_loop_unroll,\n\tisl_ast_loop_separate\n};\n\nstruct isl_ast_print_options;\ntypedef struct isl_ast_print_options isl_ast_print_options;\n\nISL_DECLARE_LIST(ast_expr)\nISL_DECLARE_EXPORTED_LIST(ast_node)\n\n#if defined(__cplusplus)\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/files.txt",
    "content": "include/isl/schedule_node.h\ninclude/isl/ast_type.h\ninclude/isl/schedule.h\nisl_schedule_tree.c\nisl_schedule_tree.h\nisl_schedule_node.c\nisl_schedule_band.c\nisl_schedule_band.h\nisl_schedule.c\n"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/isl_patch.sh",
    "content": "#!/bin/sh\ncp ast_type.h ../../../src/isl/include/isl/\ncp schedule_node.h ../../../src/isl/include/isl/\ncp schedule.h ../../../src/isl/include/isl/\ncp vec.h ../../../src/isl/include/isl/\ncp isl_schedule_tree.c ../../../src/isl/\ncp isl_schedule_tree.h ../../../src/isl/\ncp isl_schedule_node.c ../../../src/isl/\ncp isl_schedule_band.c ../../../src/isl/\ncp isl_schedule_band.h ../../../src/isl/\ncp isl_schedule.c ../../../src/isl/\n"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/isl_schedule.c",
    "content": "/*\n * Copyright 2011      INRIA Saclay\n * Copyright 2012-2014 Ecole Normale Superieure\n * Copyright 2016      Sven Verdoolaege\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege, INRIA Saclay - Ile-de-France,\n * Parc Club Orsay Universite, ZAC des vignes, 4 rue Jacques Monod,\n * 91893 Orsay, France\n * and Ecole Normale Superieure, 45 rue d'Ulm, 75230 Paris, France\n */\n\n#include <isl/ctx.h>\n#include <isl/val.h>\n#include <isl_aff_private.h>\n#include <isl/map.h>\n#include <isl/set.h>\n#include <isl/schedule.h>\n#include <isl/schedule_node.h>\n#include <isl_sort.h>\n#include <isl/printer.h>\n#include <isl_schedule_private.h>\n#include <isl_schedule_tree.h>\n#include <isl_schedule_node_private.h>\n\n/* Return a schedule encapsulating the given schedule tree.\n *\n * We currently only allow schedule trees with a domain or extension as root.\n *\n * The leaf field is initialized as a leaf node so that it can be\n * used to represent leaves in the constructed schedule.\n * The reference count is set to -1 since the isl_schedule_tree\n * should never be freed.  
It is up to the (internal) users of\n * these leaves to ensure that they are only used while the schedule\n * is still alive.\n */\n__isl_give isl_schedule *isl_schedule_from_schedule_tree(isl_ctx *ctx,\n\t__isl_take isl_schedule_tree *tree)\n{\n\tenum isl_schedule_node_type type;\n\tisl_schedule *schedule;\n\n\tif (!tree)\n\t\treturn NULL;\n\ttype = isl_schedule_tree_get_type(tree);\n\tif (type != isl_schedule_node_domain &&\n\t    type != isl_schedule_node_extension)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_unsupported,\n\t\t\t\"root of schedule tree should be a domain or extension\",\n\t\t\tgoto error);\n\n\tschedule = isl_calloc_type(ctx, isl_schedule);\n\tif (!schedule)\n\t\tgoto error;\n\n\tschedule->ref = 1;\n\tschedule->root = tree;\n\tschedule->leaf = isl_schedule_tree_leaf(ctx);\n\n\tif (!schedule->leaf)\n\t\treturn isl_schedule_free(schedule);\n\treturn schedule;\nerror:\n\tisl_schedule_tree_free(tree);\n\treturn NULL;\n}\n\n/* Return a pointer to a schedule with as single node\n * a domain node with the given domain.\n */\n__isl_give isl_schedule *isl_schedule_from_domain(\n\t__isl_take isl_union_set *domain)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\n\tctx = isl_union_set_get_ctx(domain);\n\ttree = isl_schedule_tree_from_domain(domain);\n\treturn isl_schedule_from_schedule_tree(ctx, tree);\n}\n\n/* Return a pointer to a schedule with as single node\n * a domain node with an empty domain.\n */\n__isl_give isl_schedule *isl_schedule_empty(__isl_take isl_space *space)\n{\n\treturn isl_schedule_from_domain(isl_union_set_empty(space));\n}\n\n/* Return a new reference to \"sched\".\n */\n__isl_give isl_schedule *isl_schedule_copy(__isl_keep isl_schedule *sched)\n{\n\tif (!sched)\n\t\treturn NULL;\n\n\tsched->ref++;\n\treturn sched;\n}\n\n/* Return an isl_schedule that is equal to \"schedule\" and that has only\n * a single reference.\n */\n__isl_give isl_schedule *isl_schedule_cow(__isl_take isl_schedule *schedule)\n{\n\tisl_ctx 
*ctx;\n\tisl_schedule_tree *tree;\n\n\tif (!schedule)\n\t\treturn NULL;\n\tif (schedule->ref == 1)\n\t\treturn schedule;\n\n\tctx = isl_schedule_get_ctx(schedule);\n\tschedule->ref--;\n\ttree = isl_schedule_tree_copy(schedule->root);\n\treturn isl_schedule_from_schedule_tree(ctx, tree);\n}\n\n__isl_null isl_schedule *isl_schedule_free(__isl_take isl_schedule *sched)\n{\n\tif (!sched)\n\t\treturn NULL;\n\n\tif (--sched->ref > 0)\n\t\treturn NULL;\n\n\tisl_schedule_tree_free(sched->root);\n\tisl_schedule_tree_free(sched->leaf);\n\tfree(sched);\n\treturn NULL;\n}\n\n/* Replace the root of \"schedule\" by \"tree\".\n */\n__isl_give isl_schedule *isl_schedule_set_root(\n\t__isl_take isl_schedule *schedule, __isl_take isl_schedule_tree *tree)\n{\n\tif (!schedule || !tree)\n\t\tgoto error;\n\tif (schedule->root == tree) {\n\t\tisl_schedule_tree_free(tree);\n\t\treturn schedule;\n\t}\n\n\tschedule = isl_schedule_cow(schedule);\n\tif (!schedule)\n\t\tgoto error;\n\tisl_schedule_tree_free(schedule->root);\n\tschedule->root = tree;\n\n\treturn schedule;\nerror:\n\tisl_schedule_free(schedule);\n\tisl_schedule_tree_free(tree);\n\treturn NULL;\n}\n\nisl_ctx *isl_schedule_get_ctx(__isl_keep isl_schedule *schedule)\n{\n\treturn schedule ? isl_schedule_tree_get_ctx(schedule->leaf) : NULL;\n}\n\n/* Return a pointer to the leaf of \"schedule\".\n */\n__isl_keep isl_schedule_tree *isl_schedule_peek_leaf(\n\t__isl_keep isl_schedule *schedule)\n{\n\treturn schedule ? 
schedule->leaf : NULL;\n}\n\n/* Are \"schedule1\" and \"schedule2\" obviously equal to each other?\n */\nisl_bool isl_schedule_plain_is_equal(__isl_keep isl_schedule *schedule1,\n\t__isl_keep isl_schedule *schedule2)\n{\n\tif (!schedule1 || !schedule2)\n\t\treturn isl_bool_error;\n\tif (schedule1 == schedule2)\n\t\treturn isl_bool_true;\n\treturn isl_schedule_tree_plain_is_equal(schedule1->root,\n\t\t\t\t\t\tschedule2->root);\n}\n\n/* Return the (parameter) space of the schedule, i.e., the space\n * of the root domain.\n */\n__isl_give isl_space *isl_schedule_get_space(\n\t__isl_keep isl_schedule *schedule)\n{\n\tenum isl_schedule_node_type type;\n\tisl_space *space;\n\tisl_union_set *domain;\n\n\tif (!schedule)\n\t\treturn NULL;\n\ttype = isl_schedule_tree_get_type(schedule->root);\n\tif (type != isl_schedule_node_domain)\n\t\tisl_die(isl_schedule_get_ctx(schedule), isl_error_internal,\n\t\t\t\"root node not a domain node\", return NULL);\n\n\tdomain = isl_schedule_tree_domain_get_domain(schedule->root);\n\tspace = isl_union_set_get_space(domain);\n\tisl_union_set_free(domain);\n\n\treturn space;\n}\n\n/* Return a pointer to the root of \"schedule\".\n */\n__isl_give isl_schedule_node *isl_schedule_get_root(\n\t__isl_keep isl_schedule *schedule)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\tisl_schedule_tree_list *ancestors;\n\n\tif (!schedule)\n\t\treturn NULL;\n\n\tctx = isl_schedule_get_ctx(schedule);\n\ttree = isl_schedule_tree_copy(schedule->root);\n\tschedule = isl_schedule_copy(schedule);\n\tancestors = isl_schedule_tree_list_alloc(ctx, 0);\n\treturn isl_schedule_node_alloc(schedule, tree, ancestors, NULL);\n}\n\n/* Return the domain of the root domain node of \"schedule\".\n */\n__isl_give isl_union_set *isl_schedule_get_domain(\n\t__isl_keep isl_schedule *schedule)\n{\n\tif (!schedule)\n\t\treturn NULL;\n\treturn isl_schedule_tree_domain_get_domain(schedule->root);\n}\n\n/* Traverse all nodes of \"sched\" in depth first preorder.\n *\n * If \"fn\" 
returns -1 on any of the nodes, then the traversal is aborted.\n * If \"fn\" returns 0 on any of the nodes, then the subtree rooted\n * at that node is skipped.\n *\n * Return 0 on success and -1 on failure.\n */\nisl_stat isl_schedule_foreach_schedule_node_top_down(\n\t__isl_keep isl_schedule *sched,\n\tisl_bool (*fn)(__isl_keep isl_schedule_node *node, void *user),\n\tvoid *user)\n{\n\tisl_schedule_node *node;\n\tisl_stat r;\n\n\tif (!sched)\n\t\treturn isl_stat_error;\n\n\tnode = isl_schedule_get_root(sched);\n\tr = isl_schedule_node_foreach_descendant_top_down(node, fn, user);\n\tisl_schedule_node_free(node);\n\n\treturn r;\n}\n\n/* Traverse the node of \"sched\" in depth first postorder,\n * allowing the user to modify the visited node.\n * The traversal continues from the node returned by the callback function.\n * It is the responsibility of the user to ensure that this does not\n * lead to an infinite loop.  It is safest to always return a pointer\n * to the same position (same ancestors and child positions) as the input node.\n */\n__isl_give isl_schedule *isl_schedule_map_schedule_node_bottom_up(\n\t__isl_take isl_schedule *schedule,\n\t__isl_give isl_schedule_node *(*fn)(\n\t\t__isl_take isl_schedule_node *node, void *user), void *user)\n{\n\tisl_schedule_node *node;\n\n\tnode = isl_schedule_get_root(schedule);\n\tisl_schedule_free(schedule);\n\n\tnode = isl_schedule_node_map_descendant_bottom_up(node, fn, user);\n\tschedule = isl_schedule_node_get_schedule(node);\n\tisl_schedule_node_free(node);\n\n\treturn schedule;\n}\n\n/* Wrapper around isl_schedule_node_reset_user for use as\n * an isl_schedule_map_schedule_node_bottom_up callback.\n */\nstatic __isl_give isl_schedule_node *reset_user(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\treturn isl_schedule_node_reset_user(node);\n}\n\n/* Reset the user pointer on all identifiers of parameters and tuples\n * in the schedule \"schedule\".\n */\n__isl_give isl_schedule 
*isl_schedule_reset_user(\n\t__isl_take isl_schedule *schedule)\n{\n\treturn isl_schedule_map_schedule_node_bottom_up(schedule, &reset_user,\n\t\t\t\t\t\t\tNULL);\n}\n\n/* Wrapper around isl_schedule_node_align_params for use as\n * an isl_schedule_map_schedule_node_bottom_up callback.\n */\nstatic __isl_give isl_schedule_node *align_params(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tisl_space *space = user;\n\n\treturn isl_schedule_node_align_params(node, isl_space_copy(space));\n}\n\n/* Align the parameters of all nodes in schedule \"schedule\"\n * to those of \"space\".\n */\n__isl_give isl_schedule *isl_schedule_align_params(\n\t__isl_take isl_schedule *schedule, __isl_take isl_space *space)\n{\n\tschedule = isl_schedule_map_schedule_node_bottom_up(schedule,\n\t\t\t\t\t\t    &align_params, space);\n\tisl_space_free(space);\n\treturn schedule;\n}\n\n/* Wrapper around isl_schedule_node_pullback_union_pw_multi_aff for use as\n * an isl_schedule_map_schedule_node_bottom_up callback.\n */\nstatic __isl_give isl_schedule_node *pullback_upma(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tisl_union_pw_multi_aff *upma = user;\n\n\treturn isl_schedule_node_pullback_union_pw_multi_aff(node,\n\t\t\t\t\tisl_union_pw_multi_aff_copy(upma));\n}\n\n/* Compute the pullback of \"schedule\" by the function represented by \"upma\".\n * In other words, plug in \"upma\" in the iteration domains of \"schedule\".\n *\n * The schedule tree is not allowed to contain any expansion nodes.\n */\n__isl_give isl_schedule *isl_schedule_pullback_union_pw_multi_aff(\n\t__isl_take isl_schedule *schedule,\n\t__isl_take isl_union_pw_multi_aff *upma)\n{\n\tschedule = isl_schedule_map_schedule_node_bottom_up(schedule,\n\t\t\t\t\t\t&pullback_upma, upma);\n\tisl_union_pw_multi_aff_free(upma);\n\treturn schedule;\n}\n\n/* Expand the schedule \"schedule\" by extending all leaves\n * with an expansion node with as subtree the tree of \"expansion\".\n * The expansion of the 
expansion node is determined by \"contraction\"\n * and the domain of \"expansion\".  That is, the domain of \"expansion\"\n * is contracted according to \"contraction\".\n *\n * Call isl_schedule_node_expand after extracting the required\n * information from \"expansion\".\n */\n__isl_give isl_schedule *isl_schedule_expand(__isl_take isl_schedule *schedule,\n\t__isl_take isl_union_pw_multi_aff *contraction,\n\t__isl_take isl_schedule *expansion)\n{\n\tisl_union_set *domain;\n\tisl_schedule_node *node;\n\tisl_schedule_tree *tree;\n\n\tdomain = isl_schedule_get_domain(expansion);\n\n\tnode = isl_schedule_get_root(expansion);\n\tnode = isl_schedule_node_child(node, 0);\n\ttree = isl_schedule_node_get_tree(node);\n\tisl_schedule_node_free(node);\n\tisl_schedule_free(expansion);\n\n\tnode = isl_schedule_get_root(schedule);\n\tisl_schedule_free(schedule);\n\tnode = isl_schedule_node_expand(node, contraction, domain, tree);\n\tschedule = isl_schedule_node_get_schedule(node);\n\tisl_schedule_node_free(node);\n\n\treturn schedule;\n}\n\n/* Intersect the domain of the schedule \"schedule\" with \"domain\".\n * The root of \"schedule\" is required to be a domain node.\n */\n__isl_give isl_schedule *isl_schedule_intersect_domain(\n\t__isl_take isl_schedule *schedule, __isl_take isl_union_set *domain)\n{\n\tenum isl_schedule_node_type root_type;\n\tisl_schedule_node *node;\n\n\tif (!schedule || !domain)\n\t\tgoto error;\n\n\troot_type = isl_schedule_tree_get_type(schedule->root);\n\tif (root_type != isl_schedule_node_domain)\n\t\tisl_die(isl_schedule_get_ctx(schedule), isl_error_invalid,\n\t\t\t\"root node must be a domain node\", goto error);\n\n\tnode = isl_schedule_get_root(schedule);\n\tisl_schedule_free(schedule);\n\tnode = isl_schedule_node_domain_intersect_domain(node, domain);\n\tschedule = isl_schedule_node_get_schedule(node);\n\tisl_schedule_node_free(node);\n\n\treturn schedule;\nerror:\n\tisl_schedule_free(schedule);\n\tisl_union_set_free(domain);\n\treturn 
NULL;\n}\n\n/* Replace the domain of the schedule \"schedule\" with the gist\n * of the original domain with respect to the parameter domain \"context\".\n */\n__isl_give isl_schedule *isl_schedule_gist_domain_params(\n\t__isl_take isl_schedule *schedule, __isl_take isl_set *context)\n{\n\tenum isl_schedule_node_type root_type;\n\tisl_schedule_node *node;\n\n\tif (!schedule || !context)\n\t\tgoto error;\n\n\troot_type = isl_schedule_tree_get_type(schedule->root);\n\tif (root_type != isl_schedule_node_domain)\n\t\tisl_die(isl_schedule_get_ctx(schedule), isl_error_invalid,\n\t\t\t\"root node must be a domain node\", goto error);\n\n\tnode = isl_schedule_get_root(schedule);\n\tisl_schedule_free(schedule);\n\tnode = isl_schedule_node_domain_gist_params(node, context);\n\tschedule = isl_schedule_node_get_schedule(node);\n\tisl_schedule_node_free(node);\n\n\treturn schedule;\nerror:\n\tisl_schedule_free(schedule);\n\tisl_set_free(context);\n\treturn NULL;\n}\n\n/* Return an isl_union_map representation of the schedule. In particular,\n * return an isl_union_map corresponding to the subtree schedule of the child\n * of the root domain node.  
That is, we do not intersect the domain\n * of the returned isl_union_map with the domain constraints.\n */\n__isl_give isl_union_map *isl_schedule_get_map(__isl_keep isl_schedule *sched)\n{\n\tenum isl_schedule_node_type type;\n\tisl_schedule_node *node;\n\tisl_union_map *umap;\n\n\tif (!sched)\n\t\treturn NULL;\n\ttype = isl_schedule_tree_get_type(sched->root);\n\tif (type != isl_schedule_node_domain)\n\t\tisl_die(isl_schedule_get_ctx(sched), isl_error_internal,\n\t\t\t\"root node not a domain node\", return NULL);\n\n\tnode = isl_schedule_get_root(sched);\n\tnode = isl_schedule_node_child(node, 0);\n\tumap = isl_schedule_node_get_subtree_schedule_union_map(node);\n\tisl_schedule_node_free(node);\n\n\treturn umap;\n}\n\n/* Insert a band node with partial schedule \"partial\" between the domain\n * root node of \"schedule\" and its single child.\n * Return a pointer to the updated schedule.\n *\n * If any of the nodes in the tree depend on the set of outer band nodes\n * then we refuse to insert the band node.\n */\n__isl_give isl_schedule *isl_schedule_insert_partial_schedule(\n\t__isl_take isl_schedule *schedule,\n\t__isl_take isl_multi_union_pw_aff *partial)\n{\n\tisl_schedule_node *node;\n\tint anchored;\n\n\tnode = isl_schedule_get_root(schedule);\n\tisl_schedule_free(schedule);\n\tif (!node)\n\t\tgoto error;\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_domain)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_internal,\n\t\t\t\"root node not a domain node\", goto error);\n\n\tnode = isl_schedule_node_child(node, 0);\n\tanchored = isl_schedule_node_is_subtree_anchored(node);\n\tif (anchored < 0)\n\t\tgoto error;\n\tif (anchored)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot insert band node in anchored subtree\",\n\t\t\tgoto error);\n\tnode = isl_schedule_node_insert_partial_schedule(node, partial);\n\n\tschedule = isl_schedule_node_get_schedule(node);\n\tisl_schedule_node_free(node);\n\n\treturn 
schedule;\nerror:\n\tisl_schedule_node_free(node);\n\tisl_multi_union_pw_aff_free(partial);\n\treturn NULL;\n}\n\n/* Insert a context node with constraints \"context\" between the domain\n * root node of \"schedule\" and its single child.\n * Return a pointer to the updated schedule.\n */\n__isl_give isl_schedule *isl_schedule_insert_context(\n\t__isl_take isl_schedule *schedule, __isl_take isl_set *context)\n{\n\tisl_schedule_node *node;\n\n\tnode = isl_schedule_get_root(schedule);\n\tisl_schedule_free(schedule);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = isl_schedule_node_insert_context(node, context);\n\tschedule = isl_schedule_node_get_schedule(node);\n\tisl_schedule_node_free(node);\n\n\treturn schedule;\n}\n\n/* Insert a guard node with constraints \"guard\" between the domain\n * root node of \"schedule\" and its single child.\n * Return a pointer to the updated schedule.\n */\n__isl_give isl_schedule *isl_schedule_insert_guard(\n\t__isl_take isl_schedule *schedule, __isl_take isl_set *guard)\n{\n\tisl_schedule_node *node;\n\n\tnode = isl_schedule_get_root(schedule);\n\tisl_schedule_free(schedule);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = isl_schedule_node_insert_guard(node, guard);\n\tschedule = isl_schedule_node_get_schedule(node);\n\tisl_schedule_node_free(node);\n\n\treturn schedule;\n}\n\n/* Return a tree with as top-level node a filter corresponding to \"filter\" and\n * as child, the (single) child of \"tree\".\n * However, if this single child is of type \"type\", then the filter is inserted\n * in the children of this single child instead.\n */\nstatic __isl_give isl_schedule_tree *insert_filter_in_child_of_type(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *filter,\n\tenum isl_schedule_node_type type)\n{\n\tif (!isl_schedule_tree_has_children(tree)) {\n\t\tisl_schedule_tree_free(tree);\n\t\treturn isl_schedule_tree_from_filter(filter);\n\t} else {\n\t\ttree = isl_schedule_tree_child(tree, 0);\n\t}\n\n\tif 
(isl_schedule_tree_get_type(tree) == type)\n\t\ttree = isl_schedule_tree_children_insert_filter(tree, filter);\n\telse\n\t\ttree = isl_schedule_tree_insert_filter(tree, filter);\n\n\treturn tree;\n}\n\n/* Construct a schedule that combines the schedules \"schedule1\" and \"schedule2\"\n * with a top-level node (underneath the domain node) of type \"type\",\n * either isl_schedule_node_sequence or isl_schedule_node_set.\n * The domains of the two schedules are assumed to be disjoint.\n *\n * The new schedule has as domain the union of the domains of the two\n * schedules.  The child of the domain node is a node of type \"type\"\n * with two filters corresponding to the domains of the input schedules.\n * If one (or both) of the top-level nodes of the two schedules is itself\n * of type \"type\", then the filter is pushed into the children of that\n * node and the sequence or set is flattened.\n */\n__isl_give isl_schedule *isl_schedule_pair(enum isl_schedule_node_type type,\n\t__isl_take isl_schedule *schedule1, __isl_take isl_schedule *schedule2)\n{\n\tint disjoint;\n\tisl_ctx *ctx;\n\tenum isl_schedule_node_type root_type;\n\tisl_schedule_tree *tree1, *tree2;\n\tisl_union_set *filter1, *filter2, *domain;\n\n\tif (!schedule1 || !schedule2)\n\t\tgoto error;\n\n\troot_type = isl_schedule_tree_get_type(schedule1->root);\n\tif (root_type != isl_schedule_node_domain)\n\t\tisl_die(isl_schedule_get_ctx(schedule1), isl_error_internal,\n\t\t\t\"root node not a domain node\", goto error);\n\troot_type = isl_schedule_tree_get_type(schedule2->root);\n\tif (root_type != isl_schedule_node_domain)\n\t\tisl_die(isl_schedule_get_ctx(schedule1), isl_error_internal,\n\t\t\t\"root node not a domain node\", goto error);\n\n\tctx = isl_schedule_get_ctx(schedule1);\n\ttree1 = isl_schedule_tree_copy(schedule1->root);\n\tfilter1 = isl_schedule_tree_domain_get_domain(tree1);\n\ttree2 = isl_schedule_tree_copy(schedule2->root);\n\tfilter2 = 
isl_schedule_tree_domain_get_domain(tree2);\n\n\tisl_schedule_free(schedule1);\n\tisl_schedule_free(schedule2);\n\n\tdisjoint = isl_union_set_is_disjoint(filter1, filter2);\n\tif (disjoint < 0)\n\t\tfilter1 = isl_union_set_free(filter1);\n\tif (!disjoint)\n\t\tisl_die(ctx, isl_error_invalid,\n\t\t\t\"schedule domains not disjoint\",\n\t\t\tfilter1 = isl_union_set_free(filter1));\n\n\tdomain = isl_union_set_union(isl_union_set_copy(filter1),\n\t\t\t\t    isl_union_set_copy(filter2));\n\tfilter1 = isl_union_set_gist(filter1, isl_union_set_copy(domain));\n\tfilter2 = isl_union_set_gist(filter2, isl_union_set_copy(domain));\n\n\ttree1 = insert_filter_in_child_of_type(tree1, filter1, type);\n\ttree2 = insert_filter_in_child_of_type(tree2, filter2, type);\n\n\ttree1 = isl_schedule_tree_from_pair(type, tree1, tree2);\n\ttree1 = isl_schedule_tree_insert_domain(tree1, domain);\n\n\treturn isl_schedule_from_schedule_tree(ctx, tree1);\nerror:\n\tisl_schedule_free(schedule1);\n\tisl_schedule_free(schedule2);\n\treturn NULL;\n}\n\n/* Construct a schedule that combines the schedules \"schedule1\" and \"schedule2\"\n * through a sequence node.\n * The domains of the input schedules are assumed to be disjoint.\n */\n__isl_give isl_schedule *isl_schedule_sequence(\n\t__isl_take isl_schedule *schedule1, __isl_take isl_schedule *schedule2)\n{\n\treturn isl_schedule_pair(isl_schedule_node_sequence,\n\t\t\t\tschedule1, schedule2);\n}\n\n/* Construct a schedule that combines the schedules \"schedule1\" and \"schedule2\"\n * through a set node.\n * The domains of the input schedules are assumed to be disjoint.\n */\n__isl_give isl_schedule *isl_schedule_set(\n\t__isl_take isl_schedule *schedule1, __isl_take isl_schedule *schedule2)\n{\n\treturn isl_schedule_pair(isl_schedule_node_set, schedule1, schedule2);\n}\n\n/* Print \"schedule\" to \"p\".\n */\n__isl_give isl_printer *isl_printer_print_schedule(__isl_take isl_printer *p,\n\t__isl_keep isl_schedule *schedule)\n{\n\tif 
(!schedule)\n\t\treturn isl_printer_free(p);\n\n\treturn isl_printer_print_schedule_tree(p, schedule->root);\n}\n\n/* AutoSA Extended */\n/* Return a new duplicate schedule of \"sched\".\n */\n__isl_give isl_schedule *isl_schedule_dup(__isl_keep isl_schedule *sched)\n{\n\tisl_schedule_tree *tree;\n\n\tif (!sched)\n\t\treturn NULL;\n\n\ttree = isl_schedule_tree_dup(sched->root);\n\treturn isl_schedule_from_schedule_tree(isl_schedule_get_ctx(sched),\n\t\t\t\t\t\ttree);\n}\n/* AutoSA Extended */\n\n#undef BASE\n#define BASE schedule\n#include <print_templ_yaml.c>\n"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/isl_schedule_band.c",
    "content": "/*\n * Copyright 2013-2014 Ecole Normale Superieure\n * Copyright 2014      INRIA Rocquencourt\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege,\n * Ecole Normale Superieure, 45 rue d'Ulm, 75230 Paris, France\n * and Inria Paris - Rocquencourt, Domaine de Voluceau - Rocquencourt,\n * B.P. 105 - 78153 Le Chesnay, France\n */\n\n#include <string.h>\n#include <isl/val.h>\n#include <isl/space.h>\n#include <isl/map.h>\n#include <isl/schedule_node.h>\n#include <isl_schedule_band.h>\n#include <isl_schedule_private.h>\n\nisl_ctx *isl_schedule_band_get_ctx(__isl_keep isl_schedule_band *band)\n{\n\treturn band ? isl_multi_union_pw_aff_get_ctx(band->mupa) : NULL;\n}\n\n/* Return a new uninitialized isl_schedule_band.\n */\nstatic __isl_give isl_schedule_band *isl_schedule_band_alloc(isl_ctx *ctx)\n{\n\tisl_schedule_band *band;\n\n\tband = isl_calloc_type(ctx, isl_schedule_band);\n\tif (!band)\n\t\treturn NULL;\n\n\tband->ref = 1;\n\n\treturn band;\n}\n\n/* Return a new isl_schedule_band with partial schedule \"mupa\".\n * First replace \"mupa\" by its greatest integer part to ensure\n * that the schedule is always integral.\n * The band is not marked permutable, the dimensions are not\n * marked coincident and the AST build options are empty.\n * Since there are no build options, the node is not anchored.\n */\n__isl_give isl_schedule_band *isl_schedule_band_from_multi_union_pw_aff(\n\t__isl_take isl_multi_union_pw_aff *mupa)\n{\n\tisl_size dim;\n\tisl_ctx *ctx;\n\tisl_schedule_band *band;\n\tisl_space *space;\n\n\tmupa = isl_multi_union_pw_aff_floor(mupa);\n\tdim = isl_multi_union_pw_aff_size(mupa);\n\tif (dim < 0)\n\t\tgoto error;\n\tctx = isl_multi_union_pw_aff_get_ctx(mupa);\n\tband = isl_schedule_band_alloc(ctx);\n\tif (!band)\n\t\tgoto error;\n\n\tband->n = dim;\n\tband->coincident = isl_calloc_array(ctx, int, band->n);\n\t/* AutoSA Extended */\n\tband->space_time = isl_calloc_array(ctx, enum 
autosa_loop_type, band->n);\n\tband->pe_opt = isl_calloc_array(ctx, enum autosa_loop_type, band->n);\n\tband->sched_pos = isl_calloc_array(ctx, int, band->n);\n\tif (band->n &&\n\t    (!band->space_time || !band->pe_opt || !band->sched_pos)) {\n\t\tisl_schedule_band_free(band);\n\t\tgoto error;\n\t}\n\tfor (int i = 0; i < band->n; ++i) {\n\t\tband->sched_pos[i] = -1;\n\t\tband->iter[i] = NULL;\n\t}\n\t/* AutoSA Extended */\n\tband->mupa = mupa;\n\tspace = isl_space_params_alloc(ctx, 0);\n\tband->ast_build_options = isl_union_set_empty(space);\n\tband->anchored = 0;\n\n\tif ((band->n && !band->coincident) || !band->ast_build_options)\n\t\treturn isl_schedule_band_free(band);\n\n\treturn band;\nerror:\n\tisl_multi_union_pw_aff_free(mupa);\n\treturn NULL;\n}\n\n/* Create a duplicate of the given isl_schedule_band.\n */\n__isl_give isl_schedule_band *isl_schedule_band_dup(\n\t__isl_keep isl_schedule_band *band)\n{\n\tint i;\n\tisl_ctx *ctx;\n\tisl_schedule_band *dup;\n\n\tif (!band)\n\t\treturn NULL;\n\n\tctx = isl_schedule_band_get_ctx(band);\n\tdup = isl_schedule_band_alloc(ctx);\n\tif (!dup)\n\t\treturn NULL;\n\n\tdup->n = band->n;\n\tdup->coincident = isl_alloc_array(ctx, int, band->n);\n\tif (band->n && !dup->coincident)\n\t\treturn isl_schedule_band_free(dup);\n\n\tfor (i = 0; i < band->n; ++i)\n\t\tdup->coincident[i] = band->coincident[i];\n\tdup->permutable = band->permutable;\n\n\t/* AutoSA Extended */\n\tif (band->space_time) {\n\t\tdup->space_time = isl_alloc_array(ctx,\n\t\t\t\t\tenum autosa_loop_type, band->n);\n\t\tif (band->n && !dup->space_time)\n\t\t\treturn isl_schedule_band_free(dup);\n\t\tfor (i = 0; i < band->n; ++i)\n\t\t\tdup->space_time[i] = band->space_time[i];\n\t}\n\tif (band->pe_opt) {\n\t\tdup->pe_opt = isl_alloc_array(ctx,\n\t\t\t\t\tenum autosa_loop_type, band->n);\n\t\tif (band->n && !dup->pe_opt)\n\t\t\treturn isl_schedule_band_free(dup);\n\t\tfor (i = 0; i < band->n; ++i)\n\t\t\tdup->pe_opt[i] = band->pe_opt[i];\n\t}\n\tif (band->sched_pos) {\n\t\tdup->sched_pos = isl_alloc_array(ctx, int, band->n);\n\t\tif (band->n && !dup->sched_pos)\n\t\t\treturn isl_schedule_band_free(dup);\n\t\tfor (i = 0; i < band->n; ++i)\n\t\t\tdup->sched_pos[i] = band->sched_pos[i];\n\t}\n\tif (band->iter) {\n\t\tfor (i = 0; i < band->n; ++i)\n\t\t\tdup->iter[i] = band->iter[i];\n\t}\n\t/* AutoSA Extended 
*/\n\n\tdup->mupa = isl_multi_union_pw_aff_copy(band->mupa);\n\tdup->ast_build_options = isl_union_set_copy(band->ast_build_options);\n\tif (!dup->mupa || !dup->ast_build_options)\n\t\treturn isl_schedule_band_free(dup);\n\n\tif (band->loop_type) {\n\t\tdup->loop_type = isl_alloc_array(ctx,\n\t\t\t\t\t    enum isl_ast_loop_type, band->n);\n\t\tif (band->n && !dup->loop_type)\n\t\t\treturn isl_schedule_band_free(dup);\n\t\tfor (i = 0; i < band->n; ++i)\n\t\t\tdup->loop_type[i] = band->loop_type[i];\n\t}\n\tif (band->isolate_loop_type) {\n\t\tdup->isolate_loop_type = isl_alloc_array(ctx,\n\t\t\t\t\t    enum isl_ast_loop_type, band->n);\n\t\tif (band->n && !dup->isolate_loop_type)\n\t\t\treturn isl_schedule_band_free(dup);\n\t\tfor (i = 0; i < band->n; ++i)\n\t\t\tdup->isolate_loop_type[i] = band->isolate_loop_type[i];\n\t}\n\n\treturn dup;\n}\n\n/* Return an isl_schedule_band that is equal to \"band\" and that has only\n * a single reference.\n */\n__isl_give isl_schedule_band *isl_schedule_band_cow(\n\t__isl_take isl_schedule_band *band)\n{\n\tif (!band)\n\t\treturn NULL;\n\n\tif (band->ref == 1)\n\t\treturn band;\n\tband->ref--;\n\treturn isl_schedule_band_dup(band);\n}\n\n/* Return a new reference to \"band\".\n */\n__isl_give isl_schedule_band *isl_schedule_band_copy(\n\t__isl_keep isl_schedule_band *band)\n{\n\tif (!band)\n\t\treturn NULL;\n\n\tband->ref++;\n\treturn band;\n}\n\n/* Free a reference to \"band\" and return NULL.\n */\n__isl_null isl_schedule_band *isl_schedule_band_free(\n\t__isl_take isl_schedule_band *band)\n{\n\tif (!band)\n\t\treturn NULL;\n\n\tif (--band->ref > 0)\n\t\treturn NULL;\n\n\tisl_multi_union_pw_aff_free(band->mupa);\n\tisl_union_set_free(band->ast_build_options);\n\tfree(band->loop_type);\n\tfree(band->isolate_loop_type);\n\tfree(band->coincident);\n\t/* AutoSA Extended */\n\tfree(band->space_time);\n\tfree(band->pe_opt);\n\tfree(band->sched_pos);\n\t/* AutoSA Extended */\n\tfree(band);\n\n\treturn NULL;\n}\n\n/* Are \"band1\" and 
\"band2\" obviously equal?\n */\nisl_bool isl_schedule_band_plain_is_equal(__isl_keep isl_schedule_band *band1,\n\t__isl_keep isl_schedule_band *band2)\n{\n\tint i;\n\tisl_bool equal;\n\n\tif (!band1 || !band2)\n\t\treturn isl_bool_error;\n\tif (band1 == band2)\n\t\treturn isl_bool_true;\n\n\tif (band1->n != band2->n)\n\t\treturn isl_bool_false;\n\tfor (i = 0; i < band1->n; ++i)\n\t\tif (band1->coincident[i] != band2->coincident[i])\n\t\t\treturn isl_bool_false;\n\tif (band1->permutable != band2->permutable)\n\t\treturn isl_bool_false;\n\n\tequal = isl_multi_union_pw_aff_plain_is_equal(band1->mupa, band2->mupa);\n\tif (equal < 0 || !equal)\n\t\treturn equal;\n\n\tif (!band1->loop_type != !band2->loop_type)\n\t\treturn isl_bool_false;\n\tif (band1->loop_type)\n\t\tfor (i = 0; i < band1->n; ++i)\n\t\t\tif (band1->loop_type[i] != band2->loop_type[i])\n\t\t\t\treturn isl_bool_false;\n\n\tif (!band1->isolate_loop_type != !band2->isolate_loop_type)\n\t\treturn isl_bool_false;\n\tif (band1->isolate_loop_type)\n\t\tfor (i = 0; i < band1->n; ++i)\n\t\t\tif (band1->isolate_loop_type[i] !=\n\t\t\t\t\t\tband2->isolate_loop_type[i])\n\t\t\t\treturn isl_bool_false;\n\n\treturn isl_union_set_is_equal(band1->ast_build_options,\n\t\t\t\t\tband2->ast_build_options);\n}\n\n/* Return the number of scheduling dimensions in the band.\n */\nisl_size isl_schedule_band_n_member(__isl_keep isl_schedule_band *band)\n{\n\treturn band ? 
band->n : isl_size_error;\n}\n\n/* Is the given scheduling dimension coincident within the band and\n * with respect to the coincidence constraints?\n */\nisl_bool isl_schedule_band_member_get_coincident(\n\t__isl_keep isl_schedule_band *band, int pos)\n{\n\tif (!band)\n\t\treturn isl_bool_error;\n\n\tif (pos < 0 || pos >= band->n)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"invalid member position\", return isl_bool_error);\n\n\treturn isl_bool_ok(band->coincident[pos]);\n}\n\n/* Mark the given scheduling dimension as being coincident or not\n * according to \"coincident\".\n */\n__isl_give isl_schedule_band *isl_schedule_band_member_set_coincident(\n\t__isl_take isl_schedule_band *band, int pos, int coincident)\n{\n\tif (!band)\n\t\treturn NULL;\n\tif (isl_schedule_band_member_get_coincident(band, pos) == coincident)\n\t\treturn band;\n\tband = isl_schedule_band_cow(band);\n\tif (!band)\n\t\treturn NULL;\n\n\tif (pos < 0 || pos >= band->n)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"invalid member position\",\n\t\t\treturn isl_schedule_band_free(band));\n\n\tband->coincident[pos] = coincident;\n\n\treturn band;\n}\n\n/* Is the schedule band marked permutable?\n */\nisl_bool isl_schedule_band_get_permutable(__isl_keep isl_schedule_band *band)\n{\n\tif (!band)\n\t\treturn isl_bool_error;\n\treturn isl_bool_ok(band->permutable);\n}\n\n/* Mark the schedule band permutable or not according to \"permutable\".\n */\n__isl_give isl_schedule_band *isl_schedule_band_set_permutable(\n\t__isl_take isl_schedule_band *band, int permutable)\n{\n\tif (!band)\n\t\treturn NULL;\n\tif (band->permutable == permutable)\n\t\treturn band;\n\tband = isl_schedule_band_cow(band);\n\tif (!band)\n\t\treturn NULL;\n\n\tband->permutable = permutable;\n\n\treturn band;\n}\n\n/* Is the band \"band\" anchored?  
That is, does it reference\n * the outer band nodes?\n */\nint isl_schedule_band_is_anchored(__isl_keep isl_schedule_band *band)\n{\n\treturn band ? band->anchored : -1;\n}\n\n/* Return the schedule space of the band.\n */\n__isl_give isl_space *isl_schedule_band_get_space(\n\t__isl_keep isl_schedule_band *band)\n{\n\tif (!band)\n\t\treturn NULL;\n\treturn isl_multi_union_pw_aff_get_space(band->mupa);\n}\n\n/* Intersect the domain of the band schedule of \"band\" with \"domain\".\n */\n__isl_give isl_schedule_band *isl_schedule_band_intersect_domain(\n\t__isl_take isl_schedule_band *band, __isl_take isl_union_set *domain)\n{\n\tband = isl_schedule_band_cow(band);\n\tif (!band || !domain)\n\t\tgoto error;\n\n\tband->mupa = isl_multi_union_pw_aff_intersect_domain(band->mupa,\n\t\t\t\t\t\t\t\tdomain);\n\tif (!band->mupa)\n\t\treturn isl_schedule_band_free(band);\n\n\treturn band;\nerror:\n\tisl_schedule_band_free(band);\n\tisl_union_set_free(domain);\n\treturn NULL;\n}\n\n/* Return the schedule of the band in isolation.\n */\n__isl_give isl_multi_union_pw_aff *isl_schedule_band_get_partial_schedule(\n\t__isl_keep isl_schedule_band *band)\n{\n\treturn band ? 
isl_multi_union_pw_aff_copy(band->mupa) : NULL;\n}\n\n/* Replace the schedule of \"band\" by \"schedule\".\n */\n__isl_give isl_schedule_band *isl_schedule_band_set_partial_schedule(\n\t__isl_take isl_schedule_band *band,\n\t__isl_take isl_multi_union_pw_aff *schedule)\n{\n\tband = isl_schedule_band_cow(band);\n\tif (!band || !schedule)\n\t\tgoto error;\n\n\tisl_multi_union_pw_aff_free(band->mupa);\n\tband->mupa = schedule;\n\n\treturn band;\nerror:\n\tisl_schedule_band_free(band);\n\tisl_multi_union_pw_aff_free(schedule);\n\treturn NULL;\n}\n\n/* Return the loop AST generation type for the band member of \"band\"\n * at position \"pos\".\n */\nenum isl_ast_loop_type isl_schedule_band_member_get_ast_loop_type(\n\t__isl_keep isl_schedule_band *band, int pos)\n{\n\tif (!band)\n\t\treturn isl_ast_loop_error;\n\n\tif (pos < 0 || pos >= band->n)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"invalid member position\", return isl_ast_loop_error);\n\n\tif (!band->loop_type)\n\t\treturn isl_ast_loop_default;\n\n\treturn band->loop_type[pos];\n}\n\n/* Set the loop AST generation type for the band member of \"band\"\n * at position \"pos\" to \"type\".\n */\n__isl_give isl_schedule_band *isl_schedule_band_member_set_ast_loop_type(\n\t__isl_take isl_schedule_band *band, int pos,\n\tenum isl_ast_loop_type type)\n{\n\tif (!band)\n\t\treturn NULL;\n\tif (isl_schedule_band_member_get_ast_loop_type(band, pos) == type)\n\t\treturn band;\n\n\tif (pos < 0 || pos >= band->n)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"invalid member position\",\n\t\t\treturn isl_schedule_band_free(band));\n\n\tband = isl_schedule_band_cow(band);\n\tif (!band)\n\t\treturn isl_schedule_band_free(band);\n\n\tif (!band->loop_type) {\n\t\tisl_ctx *ctx;\n\n\t\tctx = isl_schedule_band_get_ctx(band);\n\t\tband->loop_type = isl_calloc_array(ctx,\n\t\t\t\t\t    enum isl_ast_loop_type, band->n);\n\t\tif (band->n && !band->loop_type)\n\t\t\treturn 
isl_schedule_band_free(band);\n\t}\n\n\tband->loop_type[pos] = type;\n\n\treturn band;\n}\n\n/* Return the loop AST generation type for the band member of \"band\"\n * at position \"pos\" for the part that has been isolated by the isolate option.\n */\nenum isl_ast_loop_type isl_schedule_band_member_get_isolate_ast_loop_type(\n\t__isl_keep isl_schedule_band *band, int pos)\n{\n\tif (!band)\n\t\treturn isl_ast_loop_error;\n\n\tif (pos < 0 || pos >= band->n)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"invalid member position\", return isl_ast_loop_error);\n\n\tif (!band->isolate_loop_type)\n\t\treturn isl_ast_loop_default;\n\n\treturn band->isolate_loop_type[pos];\n}\n\n/* Set the loop AST generation type for the band member of \"band\"\n * at position \"pos\" to \"type\" for the part that has been isolated\n * by the isolate option.\n */\n__isl_give isl_schedule_band *\nisl_schedule_band_member_set_isolate_ast_loop_type(\n\t__isl_take isl_schedule_band *band, int pos,\n\tenum isl_ast_loop_type type)\n{\n\tif (!band)\n\t\treturn NULL;\n\tif (isl_schedule_band_member_get_isolate_ast_loop_type(band, pos) ==\n\t\t\t\t\t\t\t\t\ttype)\n\t\treturn band;\n\n\tif (pos < 0 || pos >= band->n)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"invalid member position\",\n\t\t\treturn isl_schedule_band_free(band));\n\n\tband = isl_schedule_band_cow(band);\n\tif (!band)\n\t\treturn isl_schedule_band_free(band);\n\n\tif (!band->isolate_loop_type) {\n\t\tisl_ctx *ctx;\n\n\t\tctx = isl_schedule_band_get_ctx(band);\n\t\tband->isolate_loop_type = isl_calloc_array(ctx,\n\t\t\t\t\t    enum isl_ast_loop_type, band->n);\n\t\tif (band->n && !band->isolate_loop_type)\n\t\t\treturn isl_schedule_band_free(band);\n\t}\n\n\tband->isolate_loop_type[pos] = type;\n\n\treturn band;\n}\n\nstatic const char *option_str[] = {\n\t[isl_ast_loop_atomic] = \"atomic\",\n\t[isl_ast_loop_unroll] = \"unroll\",\n\t[isl_ast_loop_separate] = 
\"separate\"\n};\n\n/* Given a parameter space \"space\", extend it to a set space\n *\n *\t{ type[x] }\n *\n * or\n *\n *\t{ [isolate[] -> type[x]] }\n *\n * depending on whether \"isolate\" is set.\n * These can be used to encode loop AST generation options of the given type.\n */\nstatic __isl_give isl_space *loop_type_space(__isl_take isl_space *space,\n\tenum isl_ast_loop_type type, int isolate)\n{\n\tconst char *name;\n\n\tname = option_str[type];\n\tspace = isl_space_set_from_params(space);\n\tspace = isl_space_add_dims(space, isl_dim_set, 1);\n\tspace = isl_space_set_tuple_name(space, isl_dim_set, name);\n\tif (!isolate)\n\t\treturn space;\n\tspace = isl_space_from_range(space);\n\tspace = isl_space_set_tuple_name(space, isl_dim_in, \"isolate\");\n\tspace = isl_space_wrap(space);\n\n\treturn space;\n}\n\n/* Add encodings of the \"n\" loop AST generation options \"type\" to \"options\".\n * If \"isolate\" is set, then these options refer to the isolated part.\n *\n * In particular, for each sequence of consecutive identical types \"t\",\n * different from the default, add an option\n *\n *\t{ t[x] : first <= x <= last }\n *\n * or\n *\n *\t{ [isolate[] -> t[x]] : first <= x <= last }\n */\nstatic __isl_give isl_union_set *add_loop_types(\n\t__isl_take isl_union_set *options, int n, enum isl_ast_loop_type *type,\n\tint isolate)\n{\n\tint i;\n\n\tif (!type)\n\t\treturn options;\n\tif (!options)\n\t\treturn NULL;\n\n\tfor (i = 0; i < n; ++i) {\n\t\tint first;\n\t\tisl_space *space;\n\t\tisl_set *option;\n\n\t\tif (type[i] == isl_ast_loop_default)\n\t\t\tcontinue;\n\n\t\tfirst = i;\n\t\twhile (i + 1 < n && type[i + 1] == type[i])\n\t\t\t++i;\n\n\t\tspace = isl_union_set_get_space(options);\n\t\tspace = loop_type_space(space, type[i], isolate);\n\t\toption = isl_set_universe(space);\n\t\toption = isl_set_lower_bound_si(option, isl_dim_set, 0, first);\n\t\toption = isl_set_upper_bound_si(option, isl_dim_set, 0, i);\n\t\toptions = isl_union_set_add_set(options, 
option);\n\t}\n\n\treturn options;\n}\n\n/* Return the AST build options associated to \"band\".\n */\n__isl_give isl_union_set *isl_schedule_band_get_ast_build_options(\n\t__isl_keep isl_schedule_band *band)\n{\n\tisl_union_set *options;\n\n\tif (!band)\n\t\treturn NULL;\n\n\toptions = isl_union_set_copy(band->ast_build_options);\n\toptions = add_loop_types(options, band->n, band->loop_type, 0);\n\toptions = add_loop_types(options, band->n, band->isolate_loop_type, 1);\n\n\treturn options;\n}\n\n/* Internal data structure for not().\n */\nstruct isl_not_data {\n\tisl_bool (*is)(__isl_keep isl_set *set);\n};\n\n/* Does \"set\" not satisfy data->is()?\n */\nstatic isl_bool not(__isl_keep isl_set *set, void *user)\n{\n\tstruct isl_not_data *data = user;\n\n\treturn isl_bool_not(data->is(set));\n}\n\n/* Does \"uset\" contain any set that satisfies \"is\"?\n * In other words, is it not the case that all of them do not satisfy \"is\"?\n */\nstatic isl_bool has_any(__isl_keep isl_union_set *uset,\n\tisl_bool (*is)(__isl_keep isl_set *set))\n{\n\tstruct isl_not_data data = { is };\n\n\treturn isl_bool_not(isl_union_set_every_set(uset, &not, &data));\n}\n\n/* Does \"set\" live in a space of the form\n *\n *\tisolate[[...] -> [...]]\n *\n * ?\n */\nstatic isl_bool is_isolate(__isl_keep isl_set *set)\n{\n\tif (isl_set_has_tuple_name(set)) {\n\t\tconst char *name;\n\t\tname = isl_set_get_tuple_name(set);\n\t\tif (isl_set_is_wrapping(set) && !strcmp(name, \"isolate\"))\n\t\t\treturn isl_bool_true;\n\t}\n\n\treturn isl_bool_false;\n}\n\n/* Does \"options\" include an option of the form\n *\n *\tisolate[[...] 
-> [...]]\n *\n * ?\n */\nstatic isl_bool has_isolate_option(__isl_keep isl_union_set *options)\n{\n\treturn has_any(options, &is_isolate);\n}\n\n/* Does \"set\" encode a loop AST generation option?\n */\nstatic isl_bool is_loop_type_option(__isl_keep isl_set *set)\n{\n\tisl_size dim;\n\n\tdim = isl_set_dim(set, isl_dim_set);\n\tif (dim < 0)\n\t\treturn isl_bool_error;\n\tif (dim == 1 && isl_set_has_tuple_name(set)) {\n\t\tconst char *name;\n\t\tenum isl_ast_loop_type type;\n\t\tname = isl_set_get_tuple_name(set);\n\t\tfor (type = isl_ast_loop_atomic;\n\t\t    type <= isl_ast_loop_separate; ++type) {\n\t\t\tif (strcmp(name, option_str[type]))\n\t\t\t\tcontinue;\n\t\t\treturn isl_bool_true;\n\t\t}\n\t}\n\n\treturn isl_bool_false;\n}\n\n/* Does \"set\" encode a loop AST generation option for the isolated part?\n * That is, is it of the form\n *\n *\t{ [isolate[] -> t[x]] }\n *\n * with t equal to \"atomic\", \"unroll\" or \"separate\"?\n */\nstatic isl_bool is_isolate_loop_type_option(__isl_keep isl_set *set)\n{\n\tconst char *name;\n\tenum isl_ast_loop_type type;\n\tisl_map *map;\n\n\tif (!isl_set_is_wrapping(set))\n\t\treturn isl_bool_false;\n\tmap = isl_set_unwrap(isl_set_copy(set));\n\tif (!isl_map_has_tuple_name(map, isl_dim_in) ||\n\t    !isl_map_has_tuple_name(map, isl_dim_out)) {\n\t\tisl_map_free(map);\n\t\treturn isl_bool_false;\n\t}\n\tname = isl_map_get_tuple_name(map, isl_dim_in);\n\tif (!strcmp(name, \"isolate\")) {\n\t\tname = isl_map_get_tuple_name(map, isl_dim_out);\n\t\tfor (type = isl_ast_loop_atomic;\n\t\t    type <= isl_ast_loop_separate; ++type) {\n\t\t\tif (strcmp(name, option_str[type]))\n\t\t\t\tcontinue;\n\t\t\tisl_map_free(map);\n\t\t\treturn isl_bool_true;\n\t\t}\n\t}\n\tisl_map_free(map);\n\n\treturn isl_bool_false;\n}\n\n/* Does \"options\" encode any loop AST generation options\n * for the isolated part?\n */\nstatic isl_bool has_isolate_loop_type_options(__isl_keep isl_union_set *options)\n{\n\treturn has_any(options, 
&is_isolate_loop_type_option);\n}\n\n/* Does \"options\" encode any loop AST generation options?\n */\nstatic isl_bool has_loop_type_options(__isl_keep isl_union_set *options)\n{\n\treturn has_any(options, &is_loop_type_option);\n}\n\n/* Extract the loop AST generation type for the band member\n * at position \"pos\" from \"options\".\n * If \"isolate\" is set, then extract the loop types for the isolated part.\n */\nstatic enum isl_ast_loop_type extract_loop_type(\n\t__isl_keep isl_union_set *options, int pos, int isolate)\n{\n\tisl_ctx *ctx;\n\tenum isl_ast_loop_type type, res = isl_ast_loop_default;\n\n\tctx = isl_union_set_get_ctx(options);\n\tfor (type = isl_ast_loop_atomic;\n\t    type <= isl_ast_loop_separate; ++type) {\n\t\tisl_space *space;\n\t\tisl_set *option;\n\t\tint empty;\n\n\t\tspace = isl_union_set_get_space(options);\n\t\tspace = loop_type_space(space, type, isolate);\n\t\toption = isl_union_set_extract_set(options, space);\n\t\toption = isl_set_fix_si(option, isl_dim_set, 0, pos);\n\t\tempty = isl_set_is_empty(option);\n\t\tisl_set_free(option);\n\n\t\tif (empty < 0)\n\t\t\treturn isl_ast_loop_error;\n\t\tif (empty)\n\t\t\tcontinue;\n\t\tif (res != isl_ast_loop_default)\n\t\t\tisl_die(ctx, isl_error_invalid,\n\t\t\t\t\"conflicting loop type options\",\n\t\t\t\treturn isl_ast_loop_error);\n\t\tres = type;\n\t}\n\n\treturn res;\n}\n\n/* Extract the loop AST generation types for the members of \"band\"\n * from \"options\" and store them in band->loop_type.\n * Return -1 on error.\n */\nstatic int extract_loop_types(__isl_keep isl_schedule_band *band,\n\t__isl_keep isl_union_set *options)\n{\n\tint i;\n\n\tif (!band->loop_type) {\n\t\tisl_ctx *ctx = isl_schedule_band_get_ctx(band);\n\t\tband->loop_type = isl_alloc_array(ctx,\n\t\t\t\t\t    enum isl_ast_loop_type, band->n);\n\t\tif (band->n && !band->loop_type)\n\t\t\treturn -1;\n\t}\n\tfor (i = 0; i < band->n; ++i) {\n\t\tband->loop_type[i] = extract_loop_type(options, i, 0);\n\t\tif 
(band->loop_type[i] == isl_ast_loop_error)\n\t\t\treturn -1;\n\t}\n\n\treturn 0;\n}\n\n/* Extract the loop AST generation types for the members of \"band\"\n * from \"options\" for the isolated part and\n * store them in band->isolate_loop_type.\n * Return -1 on error.\n */\nstatic int extract_isolate_loop_types(__isl_keep isl_schedule_band *band,\n\t__isl_keep isl_union_set *options)\n{\n\tint i;\n\n\tif (!band->isolate_loop_type) {\n\t\tisl_ctx *ctx = isl_schedule_band_get_ctx(band);\n\t\tband->isolate_loop_type = isl_alloc_array(ctx,\n\t\t\t\t\t    enum isl_ast_loop_type, band->n);\n\t\tif (band->n && !band->isolate_loop_type)\n\t\t\treturn -1;\n\t}\n\tfor (i = 0; i < band->n; ++i) {\n\t\tband->isolate_loop_type[i] = extract_loop_type(options, i, 1);\n\t\tif (band->isolate_loop_type[i] == isl_ast_loop_error)\n\t\t\treturn -1;\n\t}\n\n\treturn 0;\n}\n\n/* Construct universe sets of the spaces that encode loop AST generation\n * types (for the isolated part if \"isolate\" is set).  That is, construct\n *\n *\t{ atomic[x]; separate[x]; unroll[x] }\n *\n * or\n *\n *\t{ [isolate[] -> atomic[x]]; [isolate[] -> separate[x]];\n *\t  [isolate[] -> unroll[x]] }\n */\nstatic __isl_give isl_union_set *loop_types(__isl_take isl_space *space,\n\tint isolate)\n{\n\tenum isl_ast_loop_type type;\n\tisl_union_set *types;\n\n\ttypes = isl_union_set_empty(space);\n\tfor (type = isl_ast_loop_atomic;\n\t    type <= isl_ast_loop_separate; ++type) {\n\t\tisl_set *set;\n\n\t\tspace = isl_union_set_get_space(types);\n\t\tspace = loop_type_space(space, type, isolate);\n\t\tset = isl_set_universe(space);\n\t\ttypes = isl_union_set_add_set(types, set);\n\t}\n\n\treturn types;\n}\n\n/* Remove all elements from spaces that encode loop AST generation types\n * from \"options\".\n */\nstatic __isl_give isl_union_set *clear_loop_types(\n\t__isl_take isl_union_set *options)\n{\n\tisl_union_set *types;\n\n\ttypes = loop_types(isl_union_set_get_space(options), 0);\n\toptions = 
isl_union_set_subtract(options, types);\n\n\treturn options;\n}\n\n/* Remove all elements from spaces that encode loop AST generation types\n * for the isolated part from \"options\".\n */\nstatic __isl_give isl_union_set *clear_isolate_loop_types(\n\t__isl_take isl_union_set *options)\n{\n\tisl_union_set *types;\n\n\ttypes = loop_types(isl_union_set_get_space(options), 1);\n\toptions = isl_union_set_subtract(options, types);\n\n\treturn options;\n}\n\n/* Replace the AST build options associated to \"band\" by \"options\".\n * If there are any loop AST generation type options, then they\n * are extracted and stored in band->loop_type.  Otherwise,\n * band->loop_type is removed to indicate that the default applies\n * to all members.  Similarly for the loop AST generation type options\n * for the isolated part, which are stored in band->isolate_loop_type.\n * The remaining options are stored in band->ast_build_options.\n *\n * Set anchored if the options include an isolate option since the\n * domain of the wrapped map references the outer band node schedules.\n */\n__isl_give isl_schedule_band *isl_schedule_band_set_ast_build_options(\n\t__isl_take isl_schedule_band *band, __isl_take isl_union_set *options)\n{\n\tisl_bool has_isolate, has_loop_type, has_isolate_loop_type;\n\n\tband = isl_schedule_band_cow(band);\n\tif (!band || !options)\n\t\tgoto error;\n\thas_isolate = has_isolate_option(options);\n\tif (has_isolate < 0)\n\t\tgoto error;\n\thas_loop_type = has_loop_type_options(options);\n\tif (has_loop_type < 0)\n\t\tgoto error;\n\thas_isolate_loop_type = has_isolate_loop_type_options(options);\n\tif (has_isolate_loop_type < 0)\n\t\tgoto error;\n\n\tif (!has_loop_type) {\n\t\tfree(band->loop_type);\n\t\tband->loop_type = NULL;\n\t} else {\n\t\tif (extract_loop_types(band, options) < 0)\n\t\t\tgoto error;\n\t\toptions = clear_loop_types(options);\n\t\tif (!options)\n\t\t\tgoto error;\n\t}\n\n\tif (!has_isolate_loop_type) 
{\n\t\tfree(band->isolate_loop_type);\n\t\tband->isolate_loop_type = NULL;\n\t} else {\n\t\tif (extract_isolate_loop_types(band, options) < 0)\n\t\t\tgoto error;\n\t\toptions = clear_isolate_loop_types(options);\n\t\tif (!options)\n\t\t\tgoto error;\n\t}\n\n\tisl_union_set_free(band->ast_build_options);\n\tband->ast_build_options = options;\n\tband->anchored = has_isolate;\n\n\treturn band;\nerror:\n\tisl_schedule_band_free(band);\n\tisl_union_set_free(options);\n\treturn NULL;\n}\n\n/* Return the \"isolate\" option associated to \"band\", assuming\n * it appears at schedule depth \"depth\".\n *\n * The isolate option is of the form\n *\n *\tisolate[[flattened outer bands] -> band]\n */\n__isl_give isl_set *isl_schedule_band_get_ast_isolate_option(\n\t__isl_keep isl_schedule_band *band, int depth)\n{\n\tisl_space *space;\n\tisl_set *isolate;\n\n\tif (!band)\n\t\treturn NULL;\n\n\tspace = isl_schedule_band_get_space(band);\n\tspace = isl_space_from_range(space);\n\tspace = isl_space_add_dims(space, isl_dim_in, depth);\n\tspace = isl_space_wrap(space);\n\tspace = isl_space_set_tuple_name(space, isl_dim_set, \"isolate\");\n\n\tisolate = isl_union_set_extract_set(band->ast_build_options, space);\n\n\treturn isolate;\n}\n\n/* Replace the option \"drop\" in the AST build options by \"add\".\n * That is, remove \"drop\" and add \"add\".\n */\n__isl_give isl_schedule_band *isl_schedule_band_replace_ast_build_option(\n\t__isl_take isl_schedule_band *band, __isl_take isl_set *drop,\n\t__isl_take isl_set *add)\n{\n\tisl_union_set *options;\n\n\tband = isl_schedule_band_cow(band);\n\tif (!band)\n\t\tgoto error;\n\n\toptions = band->ast_build_options;\n\toptions = isl_union_set_subtract(options, isl_union_set_from_set(drop));\n\toptions = isl_union_set_union(options, isl_union_set_from_set(add));\n\tband->ast_build_options = options;\n\n\tif (!band->ast_build_options)\n\t\treturn isl_schedule_band_free(band);\n\n\treturn 
band;\nerror:\n\tisl_schedule_band_free(band);\n\tisl_set_free(drop);\n\tisl_set_free(add);\n\treturn NULL;\n}\n\n/* Multiply the partial schedule of \"band\" with the factors in \"mv\".\n * Replace the result by its greatest integer part to ensure\n * that the schedule is always integral.\n */\n__isl_give isl_schedule_band *isl_schedule_band_scale(\n\t__isl_take isl_schedule_band *band, __isl_take isl_multi_val *mv)\n{\n\tband = isl_schedule_band_cow(band);\n\tif (!band || !mv)\n\t\tgoto error;\n\tband->mupa = isl_multi_union_pw_aff_scale_multi_val(band->mupa, mv);\n\tband->mupa = isl_multi_union_pw_aff_floor(band->mupa);\n\tif (!band->mupa)\n\t\treturn isl_schedule_band_free(band);\n\treturn band;\nerror:\n\tisl_schedule_band_free(band);\n\tisl_multi_val_free(mv);\n\treturn NULL;\n}\n\n/* Divide the partial schedule of \"band\" by the factors in \"mv\".\n * Replace the result by its greatest integer part to ensure\n * that the schedule is always integral.\n */\n__isl_give isl_schedule_band *isl_schedule_band_scale_down(\n\t__isl_take isl_schedule_band *band, __isl_take isl_multi_val *mv)\n{\n\tband = isl_schedule_band_cow(band);\n\tif (!band || !mv)\n\t\tgoto error;\n\tband->mupa = isl_multi_union_pw_aff_scale_down_multi_val(band->mupa,\n\t\t\t\t\t\t\t\tmv);\n\tband->mupa = isl_multi_union_pw_aff_floor(band->mupa);\n\tif (!band->mupa)\n\t\treturn isl_schedule_band_free(band);\n\treturn band;\nerror:\n\tisl_schedule_band_free(band);\n\tisl_multi_val_free(mv);\n\treturn NULL;\n}\n\n/* Reduce the partial schedule of \"band\" modulo the factors in \"mv\".\n */\n__isl_give isl_schedule_band *isl_schedule_band_mod(\n\t__isl_take isl_schedule_band *band, __isl_take isl_multi_val *mv)\n{\n\tband = isl_schedule_band_cow(band);\n\tif (!band || !mv)\n\t\tgoto error;\n\tband->mupa = isl_multi_union_pw_aff_mod_multi_val(band->mupa, mv);\n\tif (!band->mupa)\n\t\treturn isl_schedule_band_free(band);\n\treturn 
band;\nerror:\n\tisl_schedule_band_free(band);\n\tisl_multi_val_free(mv);\n\treturn NULL;\n}\n\n/* Shift the partial schedule of \"band\" by \"shift\" after checking\n * that the domain of the partial schedule would not be affected\n * by this shift.\n */\n__isl_give isl_schedule_band *isl_schedule_band_shift(\n\t__isl_take isl_schedule_band *band,\n\t__isl_take isl_multi_union_pw_aff *shift)\n{\n\tisl_union_set *dom1, *dom2;\n\tisl_bool subset;\n\n\tband = isl_schedule_band_cow(band);\n\tif (!band || !shift)\n\t\tgoto error;\n\tdom1 = isl_multi_union_pw_aff_domain(\n\t\t\t\tisl_multi_union_pw_aff_copy(band->mupa));\n\tdom2 = isl_multi_union_pw_aff_domain(\n\t\t\t\tisl_multi_union_pw_aff_copy(shift));\n\tsubset = isl_union_set_is_subset(dom1, dom2);\n\tisl_union_set_free(dom1);\n\tisl_union_set_free(dom2);\n\tif (subset < 0)\n\t\tgoto error;\n\tif (!subset)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"domain of shift needs to include domain of \"\n\t\t\t\"partial schedule\", goto error);\n\tband->mupa = isl_multi_union_pw_aff_add(band->mupa, shift);\n\tif (!band->mupa)\n\t\treturn isl_schedule_band_free(band);\n\treturn band;\nerror:\n\tisl_schedule_band_free(band);\n\tisl_multi_union_pw_aff_free(shift);\n\treturn NULL;\n}\n\n/* Given the schedule of a band, construct the corresponding\n * schedule for the tile loops based on the given tile sizes\n * and return the result.\n *\n * If the scale tile loops options is set, then the tile loops\n * are scaled by the tile sizes.\n *\n * That is replace each schedule dimension \"i\" by either\n * \"floor(i/s)\" or \"s * floor(i/s)\".\n */\nstatic isl_multi_union_pw_aff *isl_multi_union_pw_aff_tile(\n\t__isl_take isl_multi_union_pw_aff *sched,\n\t__isl_take isl_multi_val *sizes)\n{\n\tisl_ctx *ctx;\n\tint i;\n\tisl_size n;\n\tisl_val *v;\n\tint scale;\n\n\tctx = isl_multi_val_get_ctx(sizes);\n\tscale = isl_options_get_tile_scale_tile_loops(ctx);\n\n\tn = 
isl_multi_union_pw_aff_size(sched);\n\tif (n < 0)\n\t\tsched = isl_multi_union_pw_aff_free(sched);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_union_pw_aff *upa;\n\n\t\tupa = isl_multi_union_pw_aff_get_union_pw_aff(sched, i);\n\t\tv = isl_multi_val_get_val(sizes, i);\n\n\t\tupa = isl_union_pw_aff_scale_down_val(upa, isl_val_copy(v));\n\t\tupa = isl_union_pw_aff_floor(upa);\n\t\tif (scale)\n\t\t\tupa = isl_union_pw_aff_scale_val(upa, isl_val_copy(v));\n\t\tisl_val_free(v);\n\n\t\tsched = isl_multi_union_pw_aff_set_union_pw_aff(sched, i, upa);\n\t}\n\n\tisl_multi_val_free(sizes);\n\treturn sched;\n}\n\n/* Replace \"band\" by a band corresponding to the tile loops of a tiling\n * with the given tile sizes.\n */\n__isl_give isl_schedule_band *isl_schedule_band_tile(\n\t__isl_take isl_schedule_band *band, __isl_take isl_multi_val *sizes)\n{\n\tband = isl_schedule_band_cow(band);\n\tif (!band || !sizes)\n\t\tgoto error;\n\tband->mupa = isl_multi_union_pw_aff_tile(band->mupa, sizes);\n\tif (!band->mupa)\n\t\treturn isl_schedule_band_free(band);\n\treturn band;\nerror:\n\tisl_schedule_band_free(band);\n\tisl_multi_val_free(sizes);\n\treturn NULL;\n}\n\n/* Replace \"band\" by a band corresponding to the point loops of a tiling\n * with the given tile sizes.\n * \"tile\" is the corresponding tile loop band.\n *\n * If the shift point loops option is set, then the point loops\n * are shifted to start at zero.  
That is, each schedule dimension \"i\"\n * is replaced by \"i - s * floor(i/s)\".\n * The expression \"floor(i/s)\" (or \"s * floor(i/s)\") is extracted from\n * the tile band.\n *\n * Otherwise, the band is left untouched.\n */\n__isl_give isl_schedule_band *isl_schedule_band_point(\n\t__isl_take isl_schedule_band *band, __isl_keep isl_schedule_band *tile,\n\t__isl_take isl_multi_val *sizes)\n{\n\tisl_ctx *ctx;\n\tisl_multi_union_pw_aff *scaled;\n\n\tif (!band || !sizes)\n\t\tgoto error;\n\n\tctx = isl_schedule_band_get_ctx(band);\n\tif (!isl_options_get_tile_shift_point_loops(ctx)) {\n\t\tisl_multi_val_free(sizes);\n\t\treturn band;\n\t}\n\tband = isl_schedule_band_cow(band);\n\tif (!band)\n\t\tgoto error;\n\n\tscaled = isl_schedule_band_get_partial_schedule(tile);\n\tif (!isl_options_get_tile_scale_tile_loops(ctx))\n\t\tscaled = isl_multi_union_pw_aff_scale_multi_val(scaled, sizes);\n\telse\n\t\tisl_multi_val_free(sizes);\n\tband->mupa = isl_multi_union_pw_aff_sub(band->mupa, scaled);\n\tif (!band->mupa)\n\t\treturn isl_schedule_band_free(band);\n\treturn band;\nerror:\n\tisl_schedule_band_free(band);\n\tisl_multi_val_free(sizes);\n\treturn NULL;\n}\n\n/* Drop the \"n\" dimensions starting at \"pos\" from \"band\".\n *\n * We apply the transformation even if \"n\" is zero to ensure consistent\n * behavior with respect to changes in the schedule space.\n *\n * The caller is responsible for updating the isolate option.\n */\n__isl_give isl_schedule_band *isl_schedule_band_drop(\n\t__isl_take isl_schedule_band *band, int pos, int n)\n{\n\tint i;\n\n\tif (pos < 0 || n < 0 || pos + n > band->n)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_internal,\n\t\t\t\"range out of bounds\",\n\t\t\treturn isl_schedule_band_free(band));\n\n\tband = isl_schedule_band_cow(band);\n\tif (!band)\n\t\treturn NULL;\n\n\tband->mupa = isl_multi_union_pw_aff_drop_dims(band->mupa,\n\t\t\t\t\t\t\tisl_dim_set, pos, n);\n\tif (!band->mupa)\n\t\treturn 
isl_schedule_band_free(band);\n\n\tfor (i = pos + n; i < band->n; ++i)\n\t\tband->coincident[i - n] = band->coincident[i];\n\tif (band->loop_type)\n\t\tfor (i = pos + n; i < band->n; ++i)\n\t\t\tband->loop_type[i - n] = band->loop_type[i];\n\tif (band->isolate_loop_type)\n\t\tfor (i = pos + n; i < band->n; ++i)\n\t\t\tband->isolate_loop_type[i - n] =\n\t\t\t\t\t\t    band->isolate_loop_type[i];\n\t/* AutoSA Extended */\n\tif (band->space_time)\n\t\tfor (i = pos + n; i < band->n; ++i)\n\t\t\tband->space_time[i - n] = band->space_time[i];\n\tif (band->pe_opt)\n\t\tfor (i = pos + n; i < band->n; ++i)\n\t\t\tband->pe_opt[i - n] = band->pe_opt[i];\n\tif (band->sched_pos)\n\t\tfor (i = pos + n; i < band->n; ++i)\n\t\t\tband->sched_pos[i - n] = band->sched_pos[i];\n\t/* \"iter\" is a fixed-size array of 20 entries, so it is always\n\t * present, but positions 20 and beyond must never be accessed.\n\t */\n\tfor (i = pos + n; i < band->n && i < 20; ++i)\n\t\tband->iter[i - n] = band->iter[i];\n\t/* AutoSA Extended */\n\n\tband->n -= n;\n\n\treturn band;\n}\n\n/* Reset the user pointer on all identifiers of parameters and tuples\n * in \"band\".\n */\n__isl_give isl_schedule_band *isl_schedule_band_reset_user(\n\t__isl_take isl_schedule_band *band)\n{\n\tband = isl_schedule_band_cow(band);\n\tif (!band)\n\t\treturn NULL;\n\n\tband->mupa = isl_multi_union_pw_aff_reset_user(band->mupa);\n\tband->ast_build_options =\n\t\tisl_union_set_reset_user(band->ast_build_options);\n\tif (!band->mupa || !band->ast_build_options)\n\t\treturn isl_schedule_band_free(band);\n\n\treturn band;\n}\n\n/* Align the parameters of \"band\" to those of \"space\".\n */\n__isl_give isl_schedule_band *isl_schedule_band_align_params(\n\t__isl_take isl_schedule_band *band, __isl_take isl_space *space)\n{\n\tband = isl_schedule_band_cow(band);\n\tif (!band || !space)\n\t\tgoto error;\n\n\tband->mupa = isl_multi_union_pw_aff_align_params(band->mupa,\n\t\t\t\t\t\tisl_space_copy(space));\n\tband->ast_build_options =\n\t\tisl_union_set_align_params(band->ast_build_options, space);\n\tif (!band->mupa 
|| !band->ast_build_options)\n\t\treturn isl_schedule_band_free(band);\n\n\treturn band;\nerror:\n\tisl_space_free(space);\n\tisl_schedule_band_free(band);\n\treturn NULL;\n}\n\n/* Compute the pullback of \"band\" by the function represented by \"upma\".\n * In other words, plug in \"upma\" in the iteration domains of \"band\".\n */\n__isl_give isl_schedule_band *isl_schedule_band_pullback_union_pw_multi_aff(\n\t__isl_take isl_schedule_band *band,\n\t__isl_take isl_union_pw_multi_aff *upma)\n{\n\tband = isl_schedule_band_cow(band);\n\tif (!band || !upma)\n\t\tgoto error;\n\n\tband->mupa =\n\t\tisl_multi_union_pw_aff_pullback_union_pw_multi_aff(band->mupa,\n\t\t\t\t\t\t\t\t\tupma);\n\tif (!band->mupa)\n\t\treturn isl_schedule_band_free(band);\n\n\treturn band;\nerror:\n\tisl_union_pw_multi_aff_free(upma);\n\tisl_schedule_band_free(band);\n\treturn NULL;\n}\n\n/* Compute the gist of \"band\" with respect to \"context\".\n * In particular, compute the gist of the associated partial schedule.\n */\n__isl_give isl_schedule_band *isl_schedule_band_gist(\n\t__isl_take isl_schedule_band *band, __isl_take isl_union_set *context)\n{\n\tif (!band || !context)\n\t\tgoto error;\n\tif (band->n == 0) {\n\t\tisl_union_set_free(context);\n\t\treturn band;\n\t}\n\tband = isl_schedule_band_cow(band);\n\tif (!band)\n\t\tgoto error;\n\tband->mupa = isl_multi_union_pw_aff_gist(band->mupa, context);\n\tif (!band->mupa)\n\t\treturn isl_schedule_band_free(band);\n\treturn band;\nerror:\n\tisl_union_set_free(context);\n\tisl_schedule_band_free(band);\n\treturn NULL;\n}\n\n/* AutoSA Extended */\n/* Return the space_time property of the scheduling dimension within\n * the band.\n */\nenum autosa_loop_type isl_schedule_band_member_get_space_time(\n  __isl_keep isl_schedule_band *band, int pos)\n{\n  if (!band)\n    return autosa_loop_error;\n\n  if (pos < 0 || pos >= band->n)\n    isl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n        \"invalid member position\", return 
autosa_loop_error);\n\n  if (!band->space_time)\n    return autosa_loop_error;\n\n  return band->space_time[pos];\n}\n\n/* Set the space_time property of the scheduling dimension at position\n * \"pos\" within the band to \"loop_type\".\n * The space_time array is allocated (zero-initialized) on first use.\n */\n__isl_give isl_schedule_band *isl_schedule_band_member_set_space_time(\n  __isl_take isl_schedule_band *band, int pos, enum autosa_loop_type loop_type)\n{\n  if (!band)\n    return NULL;\n  band = isl_schedule_band_cow(band);\n  if (!band)\n    return NULL;\n\n  if (pos < 0 || pos >= band->n)\n    isl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n        \"invalid member position\",\n        return isl_schedule_band_free(band));\n\n  if (!band->space_time) {\n    band->space_time = isl_calloc_array(isl_schedule_band_get_ctx(band),\n        enum autosa_loop_type, band->n);\n    if (!band->space_time)\n      return isl_schedule_band_free(band);\n  }\n  band->space_time[pos] = loop_type;\n\n  return band;\n}\n\n/* Return the pe_opt property of the scheduling dimension within\n * the band.\n */\nenum autosa_loop_type isl_schedule_band_member_get_pe_opt(\n  __isl_keep isl_schedule_band *band, int pos)\n{\n  if (!band)\n    return autosa_loop_error;\n\n  if (pos < 0 || pos >= band->n)\n    isl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n        \"invalid member position\", return autosa_loop_error);\n\n  if (!band->pe_opt)\n    return autosa_loop_error;\n\n  return band->pe_opt[pos];\n}\n\n/* Set the pe_opt property of the scheduling dimension at position\n * \"pos\" within the band to \"loop_type\".\n * The pe_opt array is allocated (zero-initialized) on first use.\n */\n__isl_give isl_schedule_band *isl_schedule_band_member_set_pe_opt(\n  __isl_take isl_schedule_band *band, int pos, enum autosa_loop_type loop_type)\n{\n  if (!band)\n    return NULL;\n  band = isl_schedule_band_cow(band);\n  if (!band)\n    return NULL;\n\n  if (pos < 0 || pos >= band->n)\n    isl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n        \"invalid member position\",\n        return isl_schedule_band_free(band));\n\n  if (!band->pe_opt) {\n    band->pe_opt = isl_calloc_array(isl_schedule_band_get_ctx(band),\n        enum autosa_loop_type, band->n);\n    if (!band->pe_opt)\n      return isl_schedule_band_free(band);\n  }\n  band->pe_opt[pos] = loop_type;\n\n  return band;\n}\n\n/* Return the sched_pos property of the scheduling dimension within the band.\n */\nint isl_schedule_band_member_get_sched_pos(\n\t__isl_keep isl_schedule_band *band, int pos)\n{\n\tif (!band)\n\t\treturn -1;\n\n\tif (pos < 0 || pos >= band->n)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"invalid member position\", return -1);\n\n\tif (!band->sched_pos)\n\t\treturn -1;\n\n\treturn band->sched_pos[pos];\n}\n\n/* Set the sched_pos property of the scheduling dimension at position\n * \"pos\" within the band to \"sched_pos\".\n * The sched_pos array is allocated (zero-initialized) on first use.\n */\n__isl_give isl_schedule_band *isl_schedule_band_member_set_sched_pos(\n\t__isl_take isl_schedule_band *band, int pos, int sched_pos)\n{\n\tif (!band)\n\t\treturn NULL;\n\tband = isl_schedule_band_cow(band);\n\tif (!band)\n\t\treturn NULL;\n\n\tif (pos < 0 || pos >= band->n)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"invalid member position\",\n\t\t\treturn isl_schedule_band_free(band));\n\n\tif (!band->sched_pos) {\n\t\tband->sched_pos = isl_calloc_array(isl_schedule_band_get_ctx(band),\n\t\t\t\t\t\t\tint, band->n);\n\t\tif (!band->sched_pos)\n\t\t\treturn isl_schedule_band_free(band);\n\t}\n\tband->sched_pos[pos] = sched_pos;\n\n\treturn band;\n}\n\n/* Return the iter property of the scheduling dimension within the band.\n * \"iter\" is a fixed-size array of 20 entries, so positions 20 and\n * beyond are rejected.\n */\nvoid *isl_schedule_band_member_get_iter(\n\t__isl_keep isl_schedule_band *band, int pos)\n{\n\tif (!band)\n\t\treturn NULL;\n\n\tif (pos < 0 || pos >= band->n)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"invalid member position\", return NULL);\n\tif (pos >= 20)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"maximal band dim 20 surpassed\", return NULL);\n\n\treturn band->iter[pos];\n}\n\n/* Set the iter property of the scheduling dimension at position \"pos\"\n * within the band to \"iter\".\n * \"iter\" is stored in a fixed-size array of 20 entries, so valid\n * positions range from 0 to 19.\n */\n__isl_give isl_schedule_band *isl_schedule_band_member_set_iter(\n\t__isl_take isl_schedule_band *band, int pos, void *iter)\n{\n\tif (!band)\n\t\treturn NULL;\n\tband = isl_schedule_band_cow(band);\n\tif (!band)\n\t\treturn NULL;\n\n\tif (pos < 0 || pos >= band->n)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"invalid member position\",\n\t\t\treturn isl_schedule_band_free(band));\n\tif (pos >= 20)\n\t\tisl_die(isl_schedule_band_get_ctx(band), isl_error_invalid,\n\t\t\t\"maximal band dim 20 surpassed\",\n\t\t\treturn isl_schedule_band_free(band));\n\tband->iter[pos] = iter;\n\n\treturn band;\n}\n/* AutoSA Extended */"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/isl_schedule_band.h",
    "content": "#ifndef ISL_SCHEDULE_BAND_H\n#define ISL_SCHEDULE_BAND_H\n\n#include <isl/aff.h>\n#include <isl/ast_type.h>\n#include <isl/union_map.h>\n\n/* Information about a band within a schedule.\n *\n * n is the number of scheduling dimensions within the band.\n * coincident is an array of length n, indicating whether a scheduling dimension\n *\tsatisfies the coincidence constraints in the sense that\n *\tthe corresponding dependence distances are zero.\n * permutable is set if the band is permutable.\n * mupa is the partial schedule corresponding to this band.  The dimension\n *\tof mupa is equal to n.\n * loop_type contains the loop AST generation types for the members\n * in the band.  It may be NULL, if all members are\n * of type isl_ast_loop_default.\n * isolate_loop_type contains the loop AST generation types for the members\n * in the band for the isolated part.  It may be NULL, if all members are\n * of type isl_ast_loop_default.\n * ast_build_options are the remaining AST build options associated\n * to the band.\n * anchored is set if the node depends on its position in the schedule tree.\n *\tIn particular, it is set if the AST build options include\n *\tan isolate option.\n */\nstruct isl_schedule_band {\n\tint ref;\n\n\tint n;\n\tint *coincident;\n\tint permutable;\n\n\tisl_multi_union_pw_aff *mupa;\n\n\tint anchored;\n\tisl_union_set *ast_build_options;\n\tenum isl_ast_loop_type *loop_type;\n\tenum isl_ast_loop_type *isolate_loop_type;\n\n\t/* AutoSA Extended */\n\tenum autosa_loop_type *space_time;\n\tenum autosa_loop_type *pe_opt;\n\tint *sched_pos;\n\tvoid *iter[20];\n\t/* AutoSA Extended */\n};\ntypedef struct isl_schedule_band isl_schedule_band;\n\n__isl_give isl_schedule_band *isl_schedule_band_from_multi_union_pw_aff(\n\t__isl_take isl_multi_union_pw_aff *mupa);\n__isl_give isl_schedule_band *isl_schedule_band_copy(\n\t__isl_keep isl_schedule_band *band);\n__isl_null isl_schedule_band *isl_schedule_band_free(\n\t__isl_take 
isl_schedule_band *band);\n\nisl_ctx *isl_schedule_band_get_ctx(__isl_keep isl_schedule_band *band);\n\nisl_bool isl_schedule_band_plain_is_equal(__isl_keep isl_schedule_band *band1,\n\t__isl_keep isl_schedule_band *band2);\n\nint isl_schedule_band_is_anchored(__isl_keep isl_schedule_band *band);\n\n__isl_give isl_space *isl_schedule_band_get_space(\n\t__isl_keep isl_schedule_band *band);\n__isl_give isl_schedule_band *isl_schedule_band_intersect_domain(\n\t__isl_take isl_schedule_band *band, __isl_take isl_union_set *domain);\n__isl_give isl_multi_union_pw_aff *isl_schedule_band_get_partial_schedule(\n\t__isl_keep isl_schedule_band *band);\n__isl_give isl_schedule_band *isl_schedule_band_set_partial_schedule(\n\t__isl_take isl_schedule_band *band,\n\t__isl_take isl_multi_union_pw_aff *schedule);\nenum isl_ast_loop_type isl_schedule_band_member_get_ast_loop_type(\n\t__isl_keep isl_schedule_band *band, int pos);\n__isl_give isl_schedule_band *isl_schedule_band_member_set_ast_loop_type(\n\t__isl_take isl_schedule_band *band, int pos,\n\tenum isl_ast_loop_type type);\nenum isl_ast_loop_type isl_schedule_band_member_get_isolate_ast_loop_type(\n\t__isl_keep isl_schedule_band *band, int pos);\n__isl_give isl_schedule_band *\nisl_schedule_band_member_set_isolate_ast_loop_type(\n\t__isl_take isl_schedule_band *band, int pos,\n\tenum isl_ast_loop_type type);\n__isl_give isl_union_set *isl_schedule_band_get_ast_build_options(\n\t__isl_keep isl_schedule_band *band);\n__isl_give isl_schedule_band *isl_schedule_band_set_ast_build_options(\n\t__isl_take isl_schedule_band *band, __isl_take isl_union_set *options);\n__isl_give isl_set *isl_schedule_band_get_ast_isolate_option(\n\t__isl_keep isl_schedule_band *band, int depth);\n__isl_give isl_schedule_band *isl_schedule_band_replace_ast_build_option(\n\t__isl_take isl_schedule_band *band, __isl_take isl_set *drop,\n\t__isl_take isl_set *add);\n\nisl_size isl_schedule_band_n_member(__isl_keep isl_schedule_band *band);\nisl_bool 
isl_schedule_band_member_get_coincident(\n\t__isl_keep isl_schedule_band *band, int pos);\n__isl_give isl_schedule_band *isl_schedule_band_member_set_coincident(\n\t__isl_take isl_schedule_band *band, int pos, int coincident);\nisl_bool isl_schedule_band_get_permutable(__isl_keep isl_schedule_band *band);\n__isl_give isl_schedule_band *isl_schedule_band_set_permutable(\n\t__isl_take isl_schedule_band *band, int permutable);\n\n__isl_give isl_schedule_band *isl_schedule_band_scale(\n\t__isl_take isl_schedule_band *band, __isl_take isl_multi_val *mv);\n__isl_give isl_schedule_band *isl_schedule_band_scale_down(\n\t__isl_take isl_schedule_band *band, __isl_take isl_multi_val *mv);\n__isl_give isl_schedule_band *isl_schedule_band_mod(\n\t__isl_take isl_schedule_band *band, __isl_take isl_multi_val *mv);\n__isl_give isl_schedule_band *isl_schedule_band_tile(\n\t__isl_take isl_schedule_band *band, __isl_take isl_multi_val *sizes);\n__isl_give isl_schedule_band *isl_schedule_band_point(\n\t__isl_take isl_schedule_band *band, __isl_keep isl_schedule_band *tile,\n\t__isl_take isl_multi_val *sizes);\n__isl_give isl_schedule_band *isl_schedule_band_shift(\n\t__isl_take isl_schedule_band *band,\n\t__isl_take isl_multi_union_pw_aff *shift);\n__isl_give isl_schedule_band *isl_schedule_band_drop(\n\t__isl_take isl_schedule_band *band, int pos, int n);\n__isl_give isl_schedule_band *isl_schedule_band_gist(\n\t__isl_take isl_schedule_band *band, __isl_take isl_union_set *context);\n\n__isl_give isl_schedule_band *isl_schedule_band_reset_user(\n\t__isl_take isl_schedule_band *band);\n__isl_give isl_schedule_band *isl_schedule_band_align_params(\n\t__isl_take isl_schedule_band *band, __isl_take isl_space *space);\n__isl_give isl_schedule_band *isl_schedule_band_pullback_union_pw_multi_aff(\n\t__isl_take isl_schedule_band *band,\n\t__isl_take isl_union_pw_multi_aff *upma);\n\n/* AutoSA Extended */\nenum autosa_loop_type isl_schedule_band_member_get_space_time(\n\t__isl_keep 
isl_schedule_band *band, int pos);\n__isl_give isl_schedule_band *isl_schedule_band_member_set_space_time(\n\t__isl_take isl_schedule_band *band, int pos, enum autosa_loop_type loop_type);\nenum autosa_loop_type isl_schedule_band_member_get_pe_opt(\n\t__isl_keep isl_schedule_band *band, int pos);\n__isl_give isl_schedule_band *isl_schedule_band_member_set_pe_opt(\n\t__isl_take isl_schedule_band *band, int pos, enum autosa_loop_type loop_type);\nint isl_schedule_band_member_get_sched_pos(\n\t__isl_keep isl_schedule_band *band, int pos);\n__isl_give isl_schedule_band *isl_schedule_band_member_set_sched_pos(\n\t__isl_take isl_schedule_band *band, int pos, int sched_pos);\nvoid *isl_schedule_band_member_get_iter(\n\t__isl_keep isl_schedule_band *band, int pos);\n__isl_give isl_schedule_band *isl_schedule_band_member_set_iter(\n\t__isl_take isl_schedule_band *band, int pos, void *iter);\t\n/* AutoSA Extended */\n\n#endif\n"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/isl_schedule_node.c",
    "content": "/*\n * Copyright 2013-2014 Ecole Normale Superieure\n * Copyright 2014      INRIA Rocquencourt\n * Copyright 2016      Sven Verdoolaege\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege,\n * Ecole Normale Superieure, 45 rue d'Ulm, 75230 Paris, France\n * and Inria Paris - Rocquencourt, Domaine de Voluceau - Rocquencourt,\n * B.P. 105 - 78153 Le Chesnay, France\n */\n\n#include <isl/id.h>\n#include <isl/val.h>\n#include <isl/space.h>\n#include <isl/set.h>\n#include <isl/ast_type.h>\n#include <isl_schedule_band.h>\n#include <isl_schedule_private.h>\n#include <isl_schedule_node_private.h>\n\n/* Create a new schedule node in the given schedule, point at the given\n * tree with given ancestors and child positions.\n * \"child_pos\" may be NULL if there are no ancestors.\n */\n__isl_give isl_schedule_node *isl_schedule_node_alloc(\n\t__isl_take isl_schedule *schedule, __isl_take isl_schedule_tree *tree,\n\t__isl_take isl_schedule_tree_list *ancestors, int *child_pos)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_node *node;\n\tint i;\n\tisl_size n;\n\n\tn = isl_schedule_tree_list_n_schedule_tree(ancestors);\n\tif (!schedule || !tree || n < 0)\n\t\tgoto error;\n\tif (n > 0 && !child_pos)\n\t\tgoto error;\n\tctx = isl_schedule_get_ctx(schedule);\n\tnode = isl_calloc_type(ctx, isl_schedule_node);\n\tif (!node)\n\t\tgoto error;\n\tnode->ref = 1;\n\tnode->schedule = schedule;\n\tnode->tree = tree;\n\tnode->ancestors = ancestors;\n\tnode->child_pos = isl_alloc_array(ctx, int, n);\n\tif (n && !node->child_pos)\n\t\treturn isl_schedule_node_free(node);\n\tfor (i = 0; i < n; ++i)\n\t\tnode->child_pos[i] = child_pos[i];\n\n\treturn node;\nerror:\n\tisl_schedule_free(schedule);\n\tisl_schedule_tree_free(tree);\n\tisl_schedule_tree_list_free(ancestors);\n\treturn NULL;\n}\n\n/* Return a pointer to the root of a schedule tree with as single\n * node a domain node with the given domain.\n */\n__isl_give isl_schedule_node 
*isl_schedule_node_from_domain(\n\t__isl_take isl_union_set *domain)\n{\n\tisl_schedule *schedule;\n\tisl_schedule_node *node;\n\n\tschedule = isl_schedule_from_domain(domain);\n\tnode = isl_schedule_get_root(schedule);\n\tisl_schedule_free(schedule);\n\n\treturn node;\n}\n\n/* Return a pointer to the root of a schedule tree with as single\n * node a extension node with the given extension.\n */\n__isl_give isl_schedule_node *isl_schedule_node_from_extension(\n\t__isl_take isl_union_map *extension)\n{\n\tisl_ctx *ctx;\n\tisl_schedule *schedule;\n\tisl_schedule_tree *tree;\n\tisl_schedule_node *node;\n\n\tif (!extension)\n\t\treturn NULL;\n\n\tctx = isl_union_map_get_ctx(extension);\n\ttree = isl_schedule_tree_from_extension(extension);\n\tschedule = isl_schedule_from_schedule_tree(ctx, tree);\n\tnode = isl_schedule_get_root(schedule);\n\tisl_schedule_free(schedule);\n\n\treturn node;\n}\n\n/* Return the isl_ctx to which \"node\" belongs.\n */\nisl_ctx *isl_schedule_node_get_ctx(__isl_keep isl_schedule_node *node)\n{\n\treturn node ? isl_schedule_get_ctx(node->schedule) : NULL;\n}\n\n/* Return a pointer to the leaf of the schedule into which \"node\" points.\n */\n__isl_keep isl_schedule_tree *isl_schedule_node_peek_leaf(\n\t__isl_keep isl_schedule_node *node)\n{\n\treturn node ? isl_schedule_peek_leaf(node->schedule) : NULL;\n}\n\n/* Return a copy of the leaf of the schedule into which \"node\" points.\n */\n__isl_give isl_schedule_tree *isl_schedule_node_get_leaf(\n\t__isl_keep isl_schedule_node *node)\n{\n\treturn isl_schedule_tree_copy(isl_schedule_node_peek_leaf(node));\n}\n\n/* Return the type of the node or isl_schedule_node_error on error.\n */\nenum isl_schedule_node_type isl_schedule_node_get_type(\n\t__isl_keep isl_schedule_node *node)\n{\n\treturn node ? 
isl_schedule_tree_get_type(node->tree)\n\t\t    : isl_schedule_node_error;\n}\n\n/* Return the type of the parent of \"node\" or isl_schedule_node_error on error.\n */\nenum isl_schedule_node_type isl_schedule_node_get_parent_type(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_size n;\n\tint pos;\n\tint has_parent;\n\tisl_schedule_tree *parent;\n\tenum isl_schedule_node_type type;\n\n\tif (!node)\n\t\treturn isl_schedule_node_error;\n\thas_parent = isl_schedule_node_has_parent(node);\n\tif (has_parent < 0)\n\t\treturn isl_schedule_node_error;\n\tif (!has_parent)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"node has no parent\", return isl_schedule_node_error);\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0)\n\t\treturn isl_schedule_node_error;\n\n\tpos = n - 1;\n\tparent = isl_schedule_tree_list_get_schedule_tree(node->ancestors, pos);\n\ttype = isl_schedule_tree_get_type(parent);\n\tisl_schedule_tree_free(parent);\n\n\treturn type;\n}\n\n/* Return a copy of the subtree that this node points to.\n */\n__isl_give isl_schedule_tree *isl_schedule_node_get_tree(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_copy(node->tree);\n}\n\n/* Return a copy of the schedule into which \"node\" points.\n */\n__isl_give isl_schedule *isl_schedule_node_get_schedule(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\treturn isl_schedule_copy(node->schedule);\n}\n\n/* Return a fresh copy of \"node\".\n */\n__isl_take isl_schedule_node *isl_schedule_node_dup(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_node_alloc(isl_schedule_copy(node->schedule),\n\t\t\t\tisl_schedule_tree_copy(node->tree),\n\t\t\t\tisl_schedule_tree_list_copy(node->ancestors),\n\t\t\t\tnode->child_pos);\n}\n\n/* Return an isl_schedule_node that is equal to \"node\" and that has only\n * a single reference.\n 
*/\n__isl_give isl_schedule_node *isl_schedule_node_cow(\n\t__isl_take isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\tif (node->ref == 1)\n\t\treturn node;\n\tnode->ref--;\n\treturn isl_schedule_node_dup(node);\n}\n\n/* Return a new reference to \"node\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_copy(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\tnode->ref++;\n\treturn node;\n}\n\n/* Free \"node\" and return NULL.\n */\n__isl_null isl_schedule_node *isl_schedule_node_free(\n\t__isl_take isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\tif (--node->ref > 0)\n\t\treturn NULL;\n\n\tisl_schedule_tree_list_free(node->ancestors);\n\tfree(node->child_pos);\n\tisl_schedule_tree_free(node->tree);\n\tisl_schedule_free(node->schedule);\n\tfree(node);\n\n\treturn NULL;\n}\n\n/* Do \"node1\" and \"node2\" point to the same position in the same\n * schedule?\n */\nisl_bool isl_schedule_node_is_equal(__isl_keep isl_schedule_node *node1,\n\t__isl_keep isl_schedule_node *node2)\n{\n\tint i;\n\tisl_size n1, n2;\n\n\tif (!node1 || !node2)\n\t\treturn isl_bool_error;\n\tif (node1 == node2)\n\t\treturn isl_bool_true;\n\tif (node1->schedule != node2->schedule)\n\t\treturn isl_bool_false;\n\n\tn1 = isl_schedule_node_get_tree_depth(node1);\n\tn2 = isl_schedule_node_get_tree_depth(node2);\n\tif (n1 < 0 || n2 < 0)\n\t\treturn isl_bool_error;\n\tif (n1 != n2)\n\t\treturn isl_bool_false;\n\tfor (i = 0; i < n1; ++i)\n\t\tif (node1->child_pos[i] != node2->child_pos[i])\n\t\t\treturn isl_bool_false;\n\n\treturn isl_bool_true;\n}\n\n/* Return the number of outer schedule dimensions of \"node\"\n * in its schedule tree.\n *\n * Return isl_size_error on error.\n */\nisl_size isl_schedule_node_get_schedule_depth(\n\t__isl_keep isl_schedule_node *node)\n{\n\tint i;\n\tisl_size n;\n\tint depth = 0;\n\n\tif (!node)\n\t\treturn isl_size_error;\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 
0)\n\t\treturn isl_size_error;\n\tfor (i = n - 1; i >= 0; --i) {\n\t\tisl_schedule_tree *tree;\n\t\tisl_size n;\n\n\t\ttree = isl_schedule_tree_list_get_schedule_tree(\n\t\t\t\t\t\t    node->ancestors, i);\n\t\tif (!tree)\n\t\t\treturn isl_size_error;\n\t\tn = 0;\n\t\tif (tree->type == isl_schedule_node_band)\n\t\t\tn = isl_schedule_tree_band_n_member(tree);\n\t\tdepth += n;\n\t\tisl_schedule_tree_free(tree);\n\t\tif (n < 0)\n\t\t\treturn isl_size_error;\n\t}\n\n\treturn depth;\n}\n\n/* Internal data structure for\n * isl_schedule_node_get_prefix_schedule_union_pw_multi_aff\n *\n * \"initialized\" is set if the filter field has been initialized.\n * If \"universe_domain\" is not set, then the collected filter is intersected\n * with the domain of the root domain node.\n * \"universe_filter\" is set if we are only collecting the universes of filters.\n * \"collect_prefix\" is set if we are collecting prefixes.\n * \"filter\" collects all outer filters and is NULL until \"initialized\" is set.\n * \"prefix\" collects all outer band partial schedules (if \"collect_prefix\"\n * is set).  
If it is used, then it is initialized by the caller\n * of collect_filter_prefix to a zero-dimensional function.\n */\nstruct isl_schedule_node_get_filter_prefix_data {\n\tint initialized;\n\tint universe_domain;\n\tint universe_filter;\n\tint collect_prefix;\n\tisl_union_set *filter;\n\tisl_multi_union_pw_aff *prefix;\n};\n\nstatic isl_stat collect_filter_prefix(__isl_keep isl_schedule_tree_list *list,\n\tint n, struct isl_schedule_node_get_filter_prefix_data *data);\n\n/* Update the filter and prefix information in \"data\" based on the first \"n\"\n * elements in \"list\" and the expansion tree root \"tree\".\n *\n * We first collect the information from the elements in \"list\",\n * initializing the filter based on the domain of the expansion.\n * Then we map the results to the expanded space and combine them\n * with the results already in \"data\".\n */\nstatic isl_stat collect_filter_prefix_expansion(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_keep isl_schedule_tree_list *list, int n,\n\tstruct isl_schedule_node_get_filter_prefix_data *data)\n{\n\tstruct isl_schedule_node_get_filter_prefix_data contracted;\n\tisl_union_pw_multi_aff *c;\n\tisl_union_map *exp, *universe;\n\tisl_union_set *filter;\n\n\tc = isl_schedule_tree_expansion_get_contraction(tree);\n\texp = isl_schedule_tree_expansion_get_expansion(tree);\n\n\tcontracted.initialized = 1;\n\tcontracted.universe_domain = data->universe_domain;\n\tcontracted.universe_filter = data->universe_filter;\n\tcontracted.collect_prefix = data->collect_prefix;\n\tuniverse = isl_union_map_universe(isl_union_map_copy(exp));\n\tfilter = isl_union_map_domain(universe);\n\tif (data->collect_prefix) {\n\t\tisl_space *space = isl_union_set_get_space(filter);\n\t\tspace = isl_space_set_from_params(space);\n\t\tcontracted.prefix = isl_multi_union_pw_aff_zero(space);\n\t}\n\tcontracted.filter = filter;\n\n\tif (collect_filter_prefix(list, n, &contracted) < 0)\n\t\tcontracted.filter = 
isl_union_set_free(contracted.filter);\n\tif (data->collect_prefix) {\n\t\tisl_multi_union_pw_aff *prefix;\n\n\t\tprefix = contracted.prefix;\n\t\tprefix =\n\t\t    isl_multi_union_pw_aff_pullback_union_pw_multi_aff(prefix,\n\t\t\t\t\t\tisl_union_pw_multi_aff_copy(c));\n\t\tdata->prefix = isl_multi_union_pw_aff_flat_range_product(\n\t\t\t\t\t\tprefix, data->prefix);\n\t}\n\tfilter = contracted.filter;\n\tif (data->universe_domain)\n\t\tfilter = isl_union_set_preimage_union_pw_multi_aff(filter,\n\t\t\t\t\t\tisl_union_pw_multi_aff_copy(c));\n\telse\n\t\tfilter = isl_union_set_apply(filter, isl_union_map_copy(exp));\n\tif (!data->initialized)\n\t\tdata->filter = filter;\n\telse\n\t\tdata->filter = isl_union_set_intersect(filter, data->filter);\n\tdata->initialized = 1;\n\n\tisl_union_pw_multi_aff_free(c);\n\tisl_union_map_free(exp);\n\tisl_schedule_tree_free(tree);\n\n\treturn isl_stat_ok;\n}\n\n/* Update the filter information in \"data\" based on the first \"n\"\n * elements in \"list\" and the extension tree root \"tree\", in case\n * data->universe_domain is set and data->collect_prefix is not.\n *\n * We collect the universe domain of the elements in \"list\" and\n * add it to the universe range of the extension (intersected\n * with the already collected filter, if any).\n */\nstatic isl_stat collect_universe_domain_extension(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_keep isl_schedule_tree_list *list, int n,\n\tstruct isl_schedule_node_get_filter_prefix_data *data)\n{\n\tstruct isl_schedule_node_get_filter_prefix_data data_outer;\n\tisl_union_map *extension;\n\tisl_union_set *filter;\n\n\tdata_outer.initialized = 0;\n\tdata_outer.universe_domain = 1;\n\tdata_outer.universe_filter = data->universe_filter;\n\tdata_outer.collect_prefix = 0;\n\tdata_outer.filter = NULL;\n\tdata_outer.prefix = NULL;\n\n\tif (collect_filter_prefix(list, n, &data_outer) < 0)\n\t\tdata_outer.filter = isl_union_set_free(data_outer.filter);\n\n\textension = 
isl_schedule_tree_extension_get_extension(tree);\n\textension = isl_union_map_universe(extension);\n\tfilter = isl_union_map_range(extension);\n\tif (data_outer.initialized)\n\t\tfilter = isl_union_set_union(filter, data_outer.filter);\n\tif (data->initialized)\n\t\tfilter = isl_union_set_intersect(filter, data->filter);\n\n\tdata->filter = filter;\n\n\tisl_schedule_tree_free(tree);\n\n\treturn isl_stat_ok;\n}\n\n/* Update \"data\" based on the tree node \"tree\" in case \"data\" has\n * not been initialized yet.\n *\n * Return 0 on success and -1 on error.\n *\n * If \"tree\" is a filter, then we set data->filter to this filter\n * (or its universe).\n * If \"tree\" is a domain, then this means we have reached the root\n * of the schedule tree without being able to extract any information.\n * We therefore initialize data->filter to the universe of the domain,\n * or the domain itself if data->universe_domain is not set.\n * If \"tree\" is a band with at least one member, then we set data->filter\n * to the universe of the schedule domain and replace the zero-dimensional\n * data->prefix by the band schedule (if data->collect_prefix is set).\n */\nstatic isl_stat collect_filter_prefix_init(__isl_keep isl_schedule_tree *tree,\n\tstruct isl_schedule_node_get_filter_prefix_data *data)\n{\n\tenum isl_schedule_node_type type;\n\tisl_multi_union_pw_aff *mupa;\n\tisl_union_set *filter;\n\tisl_size n;\n\n\ttype = isl_schedule_tree_get_type(tree);\n\tswitch (type) {\n\tcase isl_schedule_node_error:\n\t\treturn isl_stat_error;\n\tcase isl_schedule_node_expansion:\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"should be handled by caller\", return isl_stat_error);\n\tcase isl_schedule_node_extension:\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"cannot handle extension nodes\", return isl_stat_error);\n\tcase isl_schedule_node_context:\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_guard:\n\tcase 
isl_schedule_node_mark:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\treturn isl_stat_ok;\n\tcase isl_schedule_node_domain:\n\t\tfilter = isl_schedule_tree_domain_get_domain(tree);\n\t\tif (data->universe_domain)\n\t\t\tfilter = isl_union_set_universe(filter);\n\t\tdata->filter = filter;\n\t\tbreak;\n\tcase isl_schedule_node_band:\n\t\tn = isl_schedule_tree_band_n_member(tree);\n\t\tif (n < 0)\n\t\t\treturn isl_stat_error;\n\t\tif (n == 0)\n\t\t\treturn isl_stat_ok;\n\t\tmupa = isl_schedule_tree_band_get_partial_schedule(tree);\n\t\tif (data->collect_prefix) {\n\t\t\tisl_multi_union_pw_aff_free(data->prefix);\n\t\t\tmupa = isl_multi_union_pw_aff_reset_tuple_id(mupa,\n\t\t\t\t\t\t\t\tisl_dim_set);\n\t\t\tdata->prefix = isl_multi_union_pw_aff_copy(mupa);\n\t\t}\n\t\tfilter = isl_multi_union_pw_aff_domain(mupa);\n\t\tfilter = isl_union_set_universe(filter);\n\t\tdata->filter = filter;\n\t\tbreak;\n\tcase isl_schedule_node_filter:\n\t\tfilter = isl_schedule_tree_filter_get_filter(tree);\n\t\tif (data->universe_filter)\n\t\t\tfilter = isl_union_set_universe(filter);\n\t\tdata->filter = filter;\n\t\tbreak;\n\t}\n\n\tif ((data->collect_prefix && !data->prefix) || !data->filter)\n\t\treturn isl_stat_error;\n\n\tdata->initialized = 1;\n\n\treturn isl_stat_ok;\n}\n\n/* Update \"data\" based on the tree node \"tree\" in case \"data\" has\n * already been initialized.\n *\n * Return 0 on success and -1 on error.\n *\n * If \"tree\" is a domain and data->universe_domain is not set, then\n * intersect data->filter with the domain.\n * If \"tree\" is a filter, then we intersect data->filter with this filter\n * (or its universe).\n * If \"tree\" is a band with at least one member and data->collect_prefix\n * is set, then we extend data->prefix with the band schedule.\n * If \"tree\" is an extension, then we make sure that we are not collecting\n * information on any extended domain elements.\n */\nstatic isl_stat collect_filter_prefix_update(__isl_keep 
isl_schedule_tree *tree,\n\tstruct isl_schedule_node_get_filter_prefix_data *data)\n{\n\tenum isl_schedule_node_type type;\n\tisl_multi_union_pw_aff *mupa;\n\tisl_union_set *filter;\n\tisl_union_map *extension;\n\tisl_bool empty;\n\tisl_size n;\n\n\ttype = isl_schedule_tree_get_type(tree);\n\tswitch (type) {\n\tcase isl_schedule_node_error:\n\t\treturn isl_stat_error;\n\tcase isl_schedule_node_expansion:\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"should be handled by caller\", return isl_stat_error);\n\tcase isl_schedule_node_extension:\n\t\textension = isl_schedule_tree_extension_get_extension(tree);\n\t\textension = isl_union_map_intersect_range(extension,\n\t\t\t\t\tisl_union_set_copy(data->filter));\n\t\tempty = isl_union_map_is_empty(extension);\n\t\tisl_union_map_free(extension);\n\t\tif (empty < 0)\n\t\t\treturn isl_stat_error;\n\t\tif (empty)\n\t\t\tbreak;\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"cannot handle extension nodes\", return isl_stat_error);\n\tcase isl_schedule_node_context:\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_guard:\n\tcase isl_schedule_node_mark:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\tbreak;\n\tcase isl_schedule_node_domain:\n\t\tif (data->universe_domain)\n\t\t\tbreak;\n\t\tfilter = isl_schedule_tree_domain_get_domain(tree);\n\t\tdata->filter = isl_union_set_intersect(data->filter, filter);\n\t\tbreak;\n\tcase isl_schedule_node_band:\n\t\tn = isl_schedule_tree_band_n_member(tree);\n\t\tif (n < 0)\n\t\t\treturn isl_stat_error;\n\t\tif (n == 0)\n\t\t\tbreak;\n\t\tif (!data->collect_prefix)\n\t\t\tbreak;\n\t\tmupa = isl_schedule_tree_band_get_partial_schedule(tree);\n\t\tdata->prefix = isl_multi_union_pw_aff_flat_range_product(mupa,\n\t\t\t\t\t\t\t\tdata->prefix);\n\t\tif (!data->prefix)\n\t\t\treturn isl_stat_error;\n\t\tbreak;\n\tcase isl_schedule_node_filter:\n\t\tfilter = isl_schedule_tree_filter_get_filter(tree);\n\t\tif 
(data->universe_filter)\n\t\t\tfilter = isl_union_set_universe(filter);\n\t\tdata->filter = isl_union_set_intersect(data->filter, filter);\n\t\tif (!data->filter)\n\t\t\treturn isl_stat_error;\n\t\tbreak;\n\t}\n\n\treturn isl_stat_ok;\n}\n\n/* Collect filter and/or prefix information from the first \"n\"\n * elements in \"list\" (which represent the ancestors of a node).\n * Store the results in \"data\".\n *\n * Extension nodes are only supported if they do not affect the outcome,\n * i.e., if we are collecting information on non-extended domain elements,\n * or if we are collecting the universe domain (without prefix).\n *\n * Return 0 on success and -1 on error.\n *\n * We traverse the list from innermost ancestor (last element)\n * to outermost ancestor (first element), calling collect_filter_prefix_init\n * on each node as long as we have not been able to extract any information\n * yet and collect_filter_prefix_update afterwards.\n * If we come across an expansion node, then we interrupt the traversal\n * and call collect_filter_prefix_expansion to restart the traversal\n * over the remaining ancestors and to combine the results with those\n * that have already been collected.\n * If we come across an extension node and we are only computing\n * the universe domain, then we interrupt the traversal and call\n * collect_universe_domain_extension to restart the traversal\n * over the remaining ancestors and to combine the results with those\n * that have already been collected.\n * On successful return, data->initialized will be set since the outermost\n * ancestor is a domain node, which always results in an initialization.\n */\nstatic isl_stat collect_filter_prefix(__isl_keep isl_schedule_tree_list *list,\n\tint n, struct isl_schedule_node_get_filter_prefix_data *data)\n{\n\tint i;\n\n\tif (!list)\n\t\treturn isl_stat_error;\n\n\tfor (i = n - 1; i >= 0; --i) {\n\t\tisl_schedule_tree *tree;\n\t\tenum isl_schedule_node_type type;\n\t\tisl_stat r;\n\n\t\ttree = 
isl_schedule_tree_list_get_schedule_tree(list, i);\n\t\tif (!tree)\n\t\t\treturn isl_stat_error;\n\t\ttype = isl_schedule_tree_get_type(tree);\n\t\tif (type == isl_schedule_node_expansion)\n\t\t\treturn collect_filter_prefix_expansion(tree, list, i,\n\t\t\t\t\t\t\t\tdata);\n\t\tif (type == isl_schedule_node_extension &&\n\t\t    data->universe_domain && !data->collect_prefix)\n\t\t\treturn collect_universe_domain_extension(tree, list, i,\n\t\t\t\t\t\t\t\tdata);\n\t\tif (!data->initialized)\n\t\t\tr = collect_filter_prefix_init(tree, data);\n\t\telse\n\t\t\tr = collect_filter_prefix_update(tree, data);\n\t\tisl_schedule_tree_free(tree);\n\t\tif (r < 0)\n\t\t\treturn isl_stat_error;\n\t}\n\n\treturn isl_stat_ok;\n}\n\n/* Return the concatenation of the partial schedules of all outer band\n * nodes of \"node\" intersected with all outer filters\n * as an isl_multi_union_pw_aff.\n * None of the ancestors of \"node\" may be an extension node, unless\n * there is also a filter ancestor that filters out all the extended\n * domain elements.\n *\n * If \"node\" is pointing at the root of the schedule tree, then\n * there are no domain elements reaching the current node, so\n * we return an empty result.\n *\n * We collect all the filters and partial schedules in collect_filter_prefix\n * and intersect the domain of the combined schedule with the combined filter.\n */\n__isl_give isl_multi_union_pw_aff *\nisl_schedule_node_get_prefix_schedule_multi_union_pw_aff(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_size n;\n\tisl_space *space;\n\tstruct isl_schedule_node_get_filter_prefix_data data;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tspace = isl_schedule_get_space(node->schedule);\n\tspace = isl_space_set_from_params(space);\n\tif (node->tree == node->schedule->root)\n\t\treturn isl_multi_union_pw_aff_zero(space);\n\n\tdata.initialized = 0;\n\tdata.universe_domain = 1;\n\tdata.universe_filter = 0;\n\tdata.collect_prefix = 1;\n\tdata.filter = NULL;\n\tdata.prefix = 
isl_multi_union_pw_aff_zero(space);\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0 || collect_filter_prefix(node->ancestors, n, &data) < 0)\n\t\tdata.prefix = isl_multi_union_pw_aff_free(data.prefix);\n\n\tdata.prefix = isl_multi_union_pw_aff_intersect_domain(data.prefix,\n\t\t\t\t\t\t\t\tdata.filter);\n\n\treturn data.prefix;\n}\n\n/* Return the concatenation of the partial schedules of all outer band\n * nodes of \"node\" intersected with all outer filters\n * as an isl_union_pw_multi_aff.\n * None of the ancestors of \"node\" may be an extension node, unless\n * there is also a filter ancestor that filters out all the extended\n * domain elements.\n *\n * If \"node\" is pointing at the root of the schedule tree, then\n * there are no domain elements reaching the current node, so\n * we return an empty result.\n *\n * We collect all the filters and partial schedules in collect_filter_prefix.\n * The partial schedules are collected as an isl_multi_union_pw_aff.\n * If this isl_multi_union_pw_aff is zero-dimensional, then it does not\n * contain any domain information, so we construct the isl_union_pw_multi_aff\n * result as a zero-dimensional function on the collected filter.\n * Otherwise, we convert the isl_multi_union_pw_aff to\n * an isl_union_pw_multi_aff and intersect the domain with the filter.\n */\n__isl_give isl_union_pw_multi_aff *\nisl_schedule_node_get_prefix_schedule_union_pw_multi_aff(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_size n, dim;\n\tisl_space *space;\n\tisl_union_pw_multi_aff *prefix;\n\tstruct isl_schedule_node_get_filter_prefix_data data;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tspace = isl_schedule_get_space(node->schedule);\n\tif (node->tree == node->schedule->root)\n\t\treturn isl_union_pw_multi_aff_empty(space);\n\n\tspace = isl_space_set_from_params(space);\n\tdata.initialized = 0;\n\tdata.universe_domain = 1;\n\tdata.universe_filter = 0;\n\tdata.collect_prefix = 1;\n\tdata.filter = 
NULL;\n\tdata.prefix = isl_multi_union_pw_aff_zero(space);\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0 || collect_filter_prefix(node->ancestors, n, &data) < 0)\n\t\tdata.prefix = isl_multi_union_pw_aff_free(data.prefix);\n\n\tdim = isl_multi_union_pw_aff_dim(data.prefix, isl_dim_set);\n\tif (dim < 0)\n\t\tdata.prefix = isl_multi_union_pw_aff_free(data.prefix);\n\tif (data.prefix && dim == 0) {\n\t\tisl_multi_union_pw_aff_free(data.prefix);\n\t\tprefix = isl_union_pw_multi_aff_from_domain(data.filter);\n\t} else {\n\t\tprefix =\n\t\t    isl_union_pw_multi_aff_from_multi_union_pw_aff(data.prefix);\n\t\tprefix = isl_union_pw_multi_aff_intersect_domain(prefix,\n\t\t\t\t\t\t\t\tdata.filter);\n\t}\n\n\treturn prefix;\n}\n\n/* Return the concatenation of the partial schedules of all outer band\n * nodes of \"node\" intersected with all outer filters\n * as an isl_union_map.\n */\n__isl_give isl_union_map *isl_schedule_node_get_prefix_schedule_union_map(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_union_pw_multi_aff *upma;\n\n\tupma = isl_schedule_node_get_prefix_schedule_union_pw_multi_aff(node);\n\treturn isl_union_map_from_union_pw_multi_aff(upma);\n}\n\n/* Return the concatenation of the partial schedules of all outer band\n * nodes of \"node\" intersected with all outer domain constraints.\n * None of the ancestors of \"node\" may be an extension node, unless\n * there is also a filter ancestor that filters out all the extended\n * domain elements.\n *\n * Essentially, this function intersects the domain of the output\n * of isl_schedule_node_get_prefix_schedule_union_map with the output\n * of isl_schedule_node_get_domain, except that it only traverses\n * the ancestors of \"node\" once.\n */\n__isl_give isl_union_map *isl_schedule_node_get_prefix_schedule_relation(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_size n, dim;\n\tisl_space *space;\n\tisl_union_map *prefix;\n\tstruct isl_schedule_node_get_filter_prefix_data 
data;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tspace = isl_schedule_get_space(node->schedule);\n\tif (node->tree == node->schedule->root)\n\t\treturn isl_union_map_empty(space);\n\n\tspace = isl_space_set_from_params(space);\n\tdata.initialized = 0;\n\tdata.universe_domain = 0;\n\tdata.universe_filter = 0;\n\tdata.collect_prefix = 1;\n\tdata.filter = NULL;\n\tdata.prefix = isl_multi_union_pw_aff_zero(space);\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0 || collect_filter_prefix(node->ancestors, n, &data) < 0)\n\t\tdata.prefix = isl_multi_union_pw_aff_free(data.prefix);\n\n\tdim = isl_multi_union_pw_aff_dim(data.prefix, isl_dim_set);\n\tif (dim < 0)\n\t\tdata.prefix = isl_multi_union_pw_aff_free(data.prefix);\n\tif (data.prefix && dim == 0) {\n\t\tisl_multi_union_pw_aff_free(data.prefix);\n\t\tprefix = isl_union_map_from_domain(data.filter);\n\t} else {\n\t\tprefix = isl_union_map_from_multi_union_pw_aff(data.prefix);\n\t\tprefix = isl_union_map_intersect_domain(prefix, data.filter);\n\t}\n\n\treturn prefix;\n}\n\n/* Return the domain elements that reach \"node\".\n *\n * If \"node\" is pointing at the root of the schedule tree, then\n * there are no domain elements reaching the current node, so\n * we return an empty result.\n * None of the ancestors of \"node\" may be an extension node, unless\n * there is also a filter ancestor that filters out all the extended\n * domain elements.\n *\n * Otherwise, we collect all filters reaching the node,\n * intersected with the root domain in collect_filter_prefix.\n */\n__isl_give isl_union_set *isl_schedule_node_get_domain(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_size n;\n\tstruct isl_schedule_node_get_filter_prefix_data data;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tif (node->tree == node->schedule->root) {\n\t\tisl_space *space;\n\n\t\tspace = isl_schedule_get_space(node->schedule);\n\t\treturn isl_union_set_empty(space);\n\t}\n\n\tdata.initialized = 0;\n\tdata.universe_domain = 
0;\n\tdata.universe_filter = 0;\n\tdata.collect_prefix = 0;\n\tdata.filter = NULL;\n\tdata.prefix = NULL;\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0 || collect_filter_prefix(node->ancestors, n, &data) < 0)\n\t\tdata.filter = isl_union_set_free(data.filter);\n\n\treturn data.filter;\n}\n\n/* Return the union of universe sets of the domain elements that reach \"node\".\n *\n * If \"node\" is pointing at the root of the schedule tree, then\n * there are no domain elements reaching the current node, so\n * we return an empty result.\n *\n * Otherwise, we collect the universes of all filters reaching the node\n * in collect_filter_prefix.\n */\n__isl_give isl_union_set *isl_schedule_node_get_universe_domain(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_size n;\n\tstruct isl_schedule_node_get_filter_prefix_data data;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tif (node->tree == node->schedule->root) {\n\t\tisl_space *space;\n\n\t\tspace = isl_schedule_get_space(node->schedule);\n\t\treturn isl_union_set_empty(space);\n\t}\n\n\tdata.initialized = 0;\n\tdata.universe_domain = 1;\n\tdata.universe_filter = 1;\n\tdata.collect_prefix = 0;\n\tdata.filter = NULL;\n\tdata.prefix = NULL;\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0 || collect_filter_prefix(node->ancestors, n, &data) < 0)\n\t\tdata.filter = isl_union_set_free(data.filter);\n\n\treturn data.filter;\n}\n\n/* Return the subtree schedule of \"node\".\n *\n * Since isl_schedule_tree_get_subtree_schedule_union_map does not handle\n * trees that do not contain any schedule information, we first\n * move down to the first relevant descendant and handle leaves ourselves.\n *\n * If the subtree rooted at \"node\" contains any expansion nodes, then\n * the returned subtree schedule is formulated in terms of the expanded\n * domains.\n * The subtree is not allowed to contain any extension nodes.\n */\n__isl_give isl_union_map 
*isl_schedule_node_get_subtree_schedule_union_map(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_schedule_tree *tree, *leaf;\n\tisl_union_map *umap;\n\n\ttree = isl_schedule_node_get_tree(node);\n\tleaf = isl_schedule_node_peek_leaf(node);\n\ttree = isl_schedule_tree_first_schedule_descendant(tree, leaf);\n\tif (!tree)\n\t\treturn NULL;\n\tif (tree == leaf) {\n\t\tisl_union_set *domain;\n\t\tdomain = isl_schedule_node_get_universe_domain(node);\n\t\tisl_schedule_tree_free(tree);\n\t\treturn isl_union_map_from_domain(domain);\n\t}\n\n\tumap = isl_schedule_tree_get_subtree_schedule_union_map(tree);\n\tisl_schedule_tree_free(tree);\n\treturn umap;\n}\n\n/* Return the number of ancestors of \"node\" in its schedule tree.\n */\nisl_size isl_schedule_node_get_tree_depth(__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn isl_size_error;\n\treturn isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n}\n\n/* Does \"node\" have a parent?\n *\n * That is, does it point to any node of the schedule other than the root?\n */\nisl_bool isl_schedule_node_has_parent(__isl_keep isl_schedule_node *node)\n{\n\tisl_size depth;\n\n\tdepth = isl_schedule_node_get_tree_depth(node);\n\tif (depth < 0)\n\t\treturn isl_bool_error;\n\treturn isl_bool_ok(depth != 0);\n}\n\n/* Return the position of \"node\" among the children of its parent.\n */\nisl_size isl_schedule_node_get_child_position(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_size n;\n\tisl_bool has_parent;\n\n\tif (!node)\n\t\treturn isl_size_error;\n\thas_parent = isl_schedule_node_has_parent(node);\n\tif (has_parent < 0)\n\t\treturn isl_size_error;\n\tif (!has_parent)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"node has no parent\", return isl_size_error);\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\treturn n < 0 ? 
isl_size_error : node->child_pos[n - 1];\n}\n\n/* Does the parent (if any) of \"node\" have any children with a smaller child\n * position than this one?\n */\nisl_bool isl_schedule_node_has_previous_sibling(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_size n;\n\tisl_bool has_parent;\n\n\tif (!node)\n\t\treturn isl_bool_error;\n\thas_parent = isl_schedule_node_has_parent(node);\n\tif (has_parent < 0 || !has_parent)\n\t\treturn has_parent;\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0)\n\t\treturn isl_bool_error;\n\n\treturn isl_bool_ok(node->child_pos[n - 1] > 0);\n}\n\n/* Does the parent (if any) of \"node\" have any children with a greater child\n * position than this one?\n */\nisl_bool isl_schedule_node_has_next_sibling(__isl_keep isl_schedule_node *node)\n{\n\tisl_size n, n_child;\n\tisl_bool has_parent;\n\tisl_schedule_tree *tree;\n\n\tif (!node)\n\t\treturn isl_bool_error;\n\thas_parent = isl_schedule_node_has_parent(node);\n\tif (has_parent < 0 || !has_parent)\n\t\treturn has_parent;\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0)\n\t\treturn isl_bool_error;\n\ttree = isl_schedule_tree_list_get_schedule_tree(node->ancestors, n - 1);\n\tn_child = isl_schedule_tree_n_children(tree);\n\tisl_schedule_tree_free(tree);\n\tif (n_child < 0)\n\t\treturn isl_bool_error;\n\n\treturn isl_bool_ok(node->child_pos[n - 1] + 1 < n_child);\n}\n\n/* Does \"node\" have any children?\n *\n * Any node other than the leaf nodes is considered to have at least\n * one child, even if the corresponding isl_schedule_tree does not\n * have any children.\n */\nisl_bool isl_schedule_node_has_children(__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn isl_bool_error;\n\treturn isl_bool_ok(!isl_schedule_tree_is_leaf(node->tree));\n}\n\n/* Return the number of children of \"node\".\n *\n * Any node other than the leaf nodes is considered to have at least\n * one child, even if the corresponding 
isl_schedule_tree does not\n * have any children.  That is, the number of children of \"node\" is\n * only zero if its tree is the explicit empty tree.  Otherwise,\n * if the isl_schedule_tree has any children, then it is equal\n * to the number of children of \"node\".  If it has zero children,\n * then \"node\" still has a leaf node as child.\n */\nisl_size isl_schedule_node_n_children(__isl_keep isl_schedule_node *node)\n{\n\tisl_size n;\n\n\tif (!node)\n\t\treturn isl_size_error;\n\n\tif (isl_schedule_tree_is_leaf(node->tree))\n\t\treturn 0;\n\n\tn = isl_schedule_tree_n_children(node->tree);\n\tif (n < 0)\n\t\treturn isl_size_error;\n\tif (n == 0)\n\t\treturn 1;\n\n\treturn n;\n}\n\n/* Move the \"node\" pointer to the ancestor of the given generation\n * of the node it currently points to, where generation 0 is the node\n * itself and generation 1 is its parent.\n */\n__isl_give isl_schedule_node *isl_schedule_node_ancestor(\n\t__isl_take isl_schedule_node *node, int generation)\n{\n\tisl_size n;\n\tisl_schedule_tree *tree;\n\n\tif (!node)\n\t\treturn NULL;\n\tif (generation == 0)\n\t\treturn node;\n\tn = isl_schedule_node_get_tree_depth(node);\n\tif (n < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (generation < 0 || generation > n)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"generation out of bounds\",\n\t\t\treturn isl_schedule_node_free(node));\n\tnode = isl_schedule_node_cow(node);\n\tif (!node)\n\t\treturn NULL;\n\n\ttree = isl_schedule_tree_list_get_schedule_tree(node->ancestors,\n\t\t\t\t\t\t\tn - generation);\n\tisl_schedule_tree_free(node->tree);\n\tnode->tree = tree;\n\tnode->ancestors = isl_schedule_tree_list_drop(node->ancestors,\n\t\t\t\t\t\t    n - generation, generation);\n\tif (!node->ancestors || !node->tree)\n\t\treturn isl_schedule_node_free(node);\n\n\treturn node;\n}\n\n/* Move the \"node\" pointer to the parent of the node it currently points to.\n */\n__isl_give isl_schedule_node 
*isl_schedule_node_parent(\n\t__isl_take isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\tif (!isl_schedule_node_has_parent(node))\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"node has no parent\",\n\t\t\treturn isl_schedule_node_free(node));\n\treturn isl_schedule_node_ancestor(node, 1);\n}\n\n/* Move the \"node\" pointer to the root of its schedule tree.\n */\n__isl_give isl_schedule_node *isl_schedule_node_root(\n\t__isl_take isl_schedule_node *node)\n{\n\tisl_size n;\n\n\tif (!node)\n\t\treturn NULL;\n\tn = isl_schedule_node_get_tree_depth(node);\n\tif (n < 0)\n\t\treturn isl_schedule_node_free(node);\n\treturn isl_schedule_node_ancestor(node, n);\n}\n\n/* Move the \"node\" pointer to the child at position \"pos\" of the node\n * it currently points to.\n */\n__isl_give isl_schedule_node *isl_schedule_node_child(\n\t__isl_take isl_schedule_node *node, int pos)\n{\n\tisl_size n;\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\tint *child_pos;\n\n\tnode = isl_schedule_node_cow(node);\n\tif (!node)\n\t\treturn NULL;\n\tif (!isl_schedule_node_has_children(node))\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"node has no children\",\n\t\t\treturn isl_schedule_node_free(node));\n\n\tctx = isl_schedule_node_get_ctx(node);\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0)\n\t\treturn isl_schedule_node_free(node);\n\tchild_pos = isl_realloc_array(ctx, node->child_pos, int, n + 1);\n\tif (!child_pos)\n\t\treturn isl_schedule_node_free(node);\n\tnode->child_pos = child_pos;\n\tnode->child_pos[n] = pos;\n\n\tnode->ancestors = isl_schedule_tree_list_add(node->ancestors,\n\t\t\t\tisl_schedule_tree_copy(node->tree));\n\ttree = node->tree;\n\tif (isl_schedule_tree_has_children(tree))\n\t\ttree = isl_schedule_tree_get_child(tree, pos);\n\telse\n\t\ttree = isl_schedule_node_get_leaf(node);\n\tisl_schedule_tree_free(node->tree);\n\tnode->tree = tree;\n\n\tif (!node->tree || 
!node->ancestors)\n\t\treturn isl_schedule_node_free(node);\n\n\treturn node;\n}\n\n/* Move the \"node\" pointer to the first child of the node\n * it currently points to.\n */\n__isl_give isl_schedule_node *isl_schedule_node_first_child(\n\t__isl_take isl_schedule_node *node)\n{\n\treturn isl_schedule_node_child(node, 0);\n}\n\n/* Move the \"node\" pointer to the child of this node's parent in\n * the previous child position.\n */\n__isl_give isl_schedule_node *isl_schedule_node_previous_sibling(\n\t__isl_take isl_schedule_node *node)\n{\n\tisl_size n;\n\tisl_schedule_tree *parent, *tree;\n\n\tnode = isl_schedule_node_cow(node);\n\tif (!node)\n\t\treturn NULL;\n\tif (!isl_schedule_node_has_previous_sibling(node))\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"node has no previous sibling\",\n\t\t\treturn isl_schedule_node_free(node));\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0)\n\t\treturn isl_schedule_node_free(node);\n\tparent = isl_schedule_tree_list_get_schedule_tree(node->ancestors,\n\t\t\t\t\t\t\t\t\tn - 1);\n\tif (!parent)\n\t\treturn isl_schedule_node_free(node);\n\tnode->child_pos[n - 1]--;\n\ttree = isl_schedule_tree_list_get_schedule_tree(parent->children,\n\t\t\t\t\t\t\tnode->child_pos[n - 1]);\n\tisl_schedule_tree_free(parent);\n\tif (!tree)\n\t\treturn isl_schedule_node_free(node);\n\tisl_schedule_tree_free(node->tree);\n\tnode->tree = tree;\n\n\treturn node;\n}\n\n/* Move the \"node\" pointer to the child of this node's parent in\n * the next child position.\n */\n__isl_give isl_schedule_node *isl_schedule_node_next_sibling(\n\t__isl_take isl_schedule_node *node)\n{\n\tisl_size n;\n\tisl_schedule_tree *parent, *tree;\n\n\tnode = isl_schedule_node_cow(node);\n\tif (!node)\n\t\treturn NULL;\n\tif (!isl_schedule_node_has_next_sibling(node))\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"node has no next sibling\",\n\t\t\treturn 
isl_schedule_node_free(node));\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0)\n\t\treturn isl_schedule_node_free(node);\n\tparent = isl_schedule_tree_list_get_schedule_tree(node->ancestors,\n\t\t\t\t\t\t\t\t\tn - 1);\n\tif (!parent)\n\t\treturn isl_schedule_node_free(node);\n\tnode->child_pos[n - 1]++;\n\ttree = isl_schedule_tree_list_get_schedule_tree(parent->children,\n\t\t\t\t\t\t\tnode->child_pos[n - 1]);\n\tisl_schedule_tree_free(parent);\n\tif (!tree)\n\t\treturn isl_schedule_node_free(node);\n\tisl_schedule_tree_free(node->tree);\n\tnode->tree = tree;\n\n\treturn node;\n}\n\n/* Return a copy of the child at position \"pos\" of \"node\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_get_child(\n\t__isl_keep isl_schedule_node *node, int pos)\n{\n\treturn isl_schedule_node_child(isl_schedule_node_copy(node), pos);\n}\n\n/* Traverse the descendants of \"node\" in depth-first order, including\n * \"node\" itself.  Call \"enter\" whenever a node is entered and \"leave\"\n * whenever a node is left.  
The callback \"enter\" is responsible\n * for moving to the deepest initial subtree of its argument that\n * should be traversed.\n */\nstatic __isl_give isl_schedule_node *traverse(\n\t__isl_take isl_schedule_node *node,\n\t__isl_give isl_schedule_node *(*enter)(\n\t\t__isl_take isl_schedule_node *node, void *user),\n\t__isl_give isl_schedule_node *(*leave)(\n\t\t__isl_take isl_schedule_node *node, void *user),\n\tvoid *user)\n{\n\tisl_size depth;\n\tisl_size node_depth;\n\n\tdepth = isl_schedule_node_get_tree_depth(node);\n\tif (depth < 0)\n\t\treturn isl_schedule_node_free(node);\n\n\tdo {\n\t\tnode = enter(node, user);\n\t\tnode = leave(node, user);\n\t\twhile ((node_depth = isl_schedule_node_get_tree_depth(node)) >\n\t\t\t\tdepth &&\n\t\t\t\t!isl_schedule_node_has_next_sibling(node)) {\n\t\t\tnode = isl_schedule_node_parent(node);\n\t\t\tnode = leave(node, user);\n\t\t}\n\t\tif (node_depth < 0)\n\t\t\treturn isl_schedule_node_free(node);\n\t\tif (node_depth > depth)\n\t\t\tnode = isl_schedule_node_next_sibling(node);\n\t} while (node_depth > depth);\n\n\treturn node;\n}\n\n/* Internal data structure for isl_schedule_node_foreach_descendant_top_down.\n *\n * \"fn\" is the user-specified callback function.\n * \"user\" is the user-specified argument for the callback.\n */\nstruct isl_schedule_node_preorder_data {\n\tisl_bool (*fn)(__isl_keep isl_schedule_node *node, void *user);\n\tvoid *user;\n};\n\n/* Callback for \"traverse\" to enter a node and to move\n * to the deepest initial subtree that should be traversed\n * for use in a preorder visit.\n *\n * If the user callback returns a negative value, then we abort\n * the traversal.  If this callback returns zero, then we skip\n * the subtree rooted at the current node.  
Otherwise, we move\n * down to the first child and repeat the process until a leaf\n * is reached.\n */\nstatic __isl_give isl_schedule_node *preorder_enter(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tstruct isl_schedule_node_preorder_data *data = user;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tdo {\n\t\tisl_bool r;\n\n\t\tr = data->fn(node, data->user);\n\t\tif (r < 0)\n\t\t\treturn isl_schedule_node_free(node);\n\t\tif (r == isl_bool_false)\n\t\t\treturn node;\n\t} while (isl_schedule_node_has_children(node) &&\n\t\t(node = isl_schedule_node_first_child(node)) != NULL);\n\n\treturn node;\n}\n\n/* Callback for \"traverse\" to leave a node\n * for use in a preorder visit.\n * Since we already visited the node when we entered it,\n * we do not need to do anything here.\n */\nstatic __isl_give isl_schedule_node *preorder_leave(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\treturn node;\n}\n\n/* Traverse the descendants of \"node\" (including the node itself)\n * in depth first preorder.\n *\n * If \"fn\" returns isl_bool_error on any of the nodes,\n * then the traversal is aborted.\n * If \"fn\" returns isl_bool_false on any of the nodes, then the subtree rooted\n * at that node is skipped.\n *\n * Return isl_stat_ok on success and isl_stat_error on failure.\n */\nisl_stat isl_schedule_node_foreach_descendant_top_down(\n\t__isl_keep isl_schedule_node *node,\n\tisl_bool (*fn)(__isl_keep isl_schedule_node *node, void *user),\n\tvoid *user)\n{\n\tstruct isl_schedule_node_preorder_data data = { fn, user };\n\n\tnode = isl_schedule_node_copy(node);\n\tnode = traverse(node, &preorder_enter, &preorder_leave, &data);\n\tisl_schedule_node_free(node);\n\n\treturn node ? 
isl_stat_ok : isl_stat_error;\n}\n\n/* Internal data structure for isl_schedule_node_every_descendant.\n *\n * \"test\" is the user-specified callback function.\n * \"user\" is the user-specified callback function argument.\n *\n * \"failed\" is initialized to 0 and set to 1 if \"test\" fails\n * on any node.\n */\nstruct isl_union_map_every_data {\n\tisl_bool (*test)(__isl_keep isl_schedule_node *node, void *user);\n\tvoid *user;\n\tint failed;\n};\n\n/* isl_schedule_node_foreach_descendant_top_down callback\n * that sets data->failed if data->test returns false and\n * subsequently aborts the traversal.\n */\nstatic isl_bool call_every(__isl_keep isl_schedule_node *node, void *user)\n{\n\tstruct isl_union_map_every_data *data = user;\n\tisl_bool r;\n\n\tr = data->test(node, data->user);\n\tif (r < 0)\n\t\treturn isl_bool_error;\n\tif (r)\n\t\treturn isl_bool_true;\n\tdata->failed = 1;\n\treturn isl_bool_error;\n}\n\n/* Does \"test\" succeed on every descendant of \"node\" (including \"node\" itself)?\n */\nisl_bool isl_schedule_node_every_descendant(__isl_keep isl_schedule_node *node,\n\tisl_bool (*test)(__isl_keep isl_schedule_node *node, void *user),\n\tvoid *user)\n{\n\tstruct isl_union_map_every_data data = { test, user, 0 };\n\tisl_stat r;\n\n\tr = isl_schedule_node_foreach_descendant_top_down(node, &call_every,\n\t\t\t\t\t\t\t&data);\n\tif (r >= 0)\n\t\treturn isl_bool_true;\n\tif (data.failed)\n\t\treturn isl_bool_false;\n\treturn isl_bool_error;\n}\n\n/* Internal data structure for isl_schedule_node_map_descendant_bottom_up.\n *\n * \"fn\" is the user-specified callback function.\n * \"user\" is the user-specified argument for the callback.\n */\nstruct isl_schedule_node_postorder_data {\n\t__isl_give isl_schedule_node *(*fn)(__isl_take isl_schedule_node *node,\n\t\tvoid *user);\n\tvoid *user;\n};\n\n/* Callback for \"traverse\" to enter a node and to move\n * to the deepest initial subtree that should be traversed\n * for use in a postorder visit.\n *\n 
* Since we are performing a postorder visit, we only need\n * to move to the deepest initial leaf here.\n */\nstatic __isl_give isl_schedule_node *postorder_enter(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\twhile (node && isl_schedule_node_has_children(node))\n\t\tnode = isl_schedule_node_first_child(node);\n\n\treturn node;\n}\n\n/* Callback for \"traverse\" to leave a node\n * for use in a postorder visit.\n *\n * Since we are performing a postorder visit, we need\n * to call the user callback here.\n */\nstatic __isl_give isl_schedule_node *postorder_leave(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tstruct isl_schedule_node_postorder_data *data = user;\n\n\treturn data->fn(node, data->user);\n}\n\n/* Traverse the descendants of \"node\" (including the node itself)\n * in depth first postorder, allowing the user to modify the visited node.\n * The traversal continues from the node returned by the callback function.\n * It is the responsibility of the user to ensure that this does not\n * lead to an infinite loop.  
It is safest to always return a pointer\n * to the same position (same ancestors and child positions) as the input node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_map_descendant_bottom_up(\n\t__isl_take isl_schedule_node *node,\n\t__isl_give isl_schedule_node *(*fn)(__isl_take isl_schedule_node *node,\n\t\tvoid *user), void *user)\n{\n\tstruct isl_schedule_node_postorder_data data = { fn, user };\n\n\treturn traverse(node, &postorder_enter, &postorder_leave, &data);\n}\n\n/* Traverse the ancestors of \"node\" from the root down to and including\n * the parent of \"node\", calling \"fn\" on each of them.\n *\n * If \"fn\" returns -1 on any of the nodes, then the traversal is aborted.\n *\n * Return 0 on success and -1 on failure.\n */\nisl_stat isl_schedule_node_foreach_ancestor_top_down(\n\t__isl_keep isl_schedule_node *node,\n\tisl_stat (*fn)(__isl_keep isl_schedule_node *node, void *user),\n\tvoid *user)\n{\n\tint i;\n\tisl_size n;\n\n\tn = isl_schedule_node_get_tree_depth(node);\n\tif (n < 0)\n\t\treturn isl_stat_error;\n\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_schedule_node *ancestor;\n\t\tisl_stat r;\n\n\t\tancestor = isl_schedule_node_copy(node);\n\t\tancestor = isl_schedule_node_ancestor(ancestor, n - i);\n\t\tr = fn(ancestor, user);\n\t\tisl_schedule_node_free(ancestor);\n\t\tif (r < 0)\n\t\t\treturn isl_stat_error;\n\t}\n\n\treturn isl_stat_ok;\n}\n\n/* Is any node in the subtree rooted at \"node\" anchored?\n * That is, do any of these nodes reference the outer band nodes?\n */\nisl_bool isl_schedule_node_is_subtree_anchored(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn isl_bool_error;\n\treturn isl_schedule_tree_is_subtree_anchored(node->tree);\n}\n\n/* Return the number of members in the given band node.\n */\nisl_size isl_schedule_node_band_n_member(__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn isl_size_error;\n\treturn isl_schedule_tree_band_n_member(node->tree);\n}\n\n/* Is the band member at 
position \"pos\" of the band node \"node\"\n * marked coincident?\n */\nisl_bool isl_schedule_node_band_member_get_coincident(\n\t__isl_keep isl_schedule_node *node, int pos)\n{\n\tif (!node)\n\t\treturn isl_bool_error;\n\treturn isl_schedule_tree_band_member_get_coincident(node->tree, pos);\n}\n\n/* Mark the band member at position \"pos\" of the band node \"node\"\n * as being coincident or not according to \"coincident\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_member_set_coincident(\n\t__isl_take isl_schedule_node *node, int pos, int coincident)\n{\n\tint c;\n\tisl_schedule_tree *tree;\n\n\tif (!node)\n\t\treturn NULL;\n\tc = isl_schedule_node_band_member_get_coincident(node, pos);\n\tif (c == coincident)\n\t\treturn node;\n\n\ttree = isl_schedule_tree_copy(node->tree);\n\ttree = isl_schedule_tree_band_member_set_coincident(tree, pos,\n\t\t\t\t\t\t\t    coincident);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Is the band node \"node\" marked permutable?\n */\nisl_bool isl_schedule_node_band_get_permutable(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn isl_bool_error;\n\n\treturn isl_schedule_tree_band_get_permutable(node->tree);\n}\n\n/* Mark the band node \"node\" permutable or not according to \"permutable\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_set_permutable(\n\t__isl_take isl_schedule_node *node, int permutable)\n{\n\tisl_schedule_tree *tree;\n\n\tif (!node)\n\t\treturn NULL;\n\tif (isl_schedule_node_band_get_permutable(node) == permutable)\n\t\treturn node;\n\n\ttree = isl_schedule_tree_copy(node->tree);\n\ttree = isl_schedule_tree_band_set_permutable(tree, permutable);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Return the schedule space of the band node.\n */\n__isl_give isl_space *isl_schedule_node_band_get_space(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn 
isl_schedule_tree_band_get_space(node->tree);\n}\n\n/* Return the schedule of the band node in isolation.\n */\n__isl_give isl_multi_union_pw_aff *isl_schedule_node_band_get_partial_schedule(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_band_get_partial_schedule(node->tree);\n}\n\n/* Return the schedule of the band node in isolation in the form of\n * an isl_union_map.\n *\n * If the band does not have any members, then we construct a universe map\n * with the universe of the domain elements reaching the node as domain.\n * Otherwise, we extract an isl_multi_union_pw_aff representation and\n * convert that to an isl_union_map.\n */\n__isl_give isl_union_map *isl_schedule_node_band_get_partial_schedule_union_map(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_size n;\n\tisl_multi_union_pw_aff *mupa;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"not a band node\", return NULL);\n\tn = isl_schedule_node_band_n_member(node);\n\tif (n < 0)\n\t\treturn NULL;\n\tif (n == 0) {\n\t\tisl_union_set *domain;\n\n\t\tdomain = isl_schedule_node_get_universe_domain(node);\n\t\treturn isl_union_map_from_domain(domain);\n\t}\n\n\tmupa = isl_schedule_node_band_get_partial_schedule(node);\n\treturn isl_union_map_from_multi_union_pw_aff(mupa);\n}\n\n/* Return the loop AST generation type for the band member of band node \"node\"\n * at position \"pos\".\n */\nenum isl_ast_loop_type isl_schedule_node_band_member_get_ast_loop_type(\n\t__isl_keep isl_schedule_node *node, int pos)\n{\n\tif (!node)\n\t\treturn isl_ast_loop_error;\n\n\treturn isl_schedule_tree_band_member_get_ast_loop_type(node->tree, pos);\n}\n\n/* Set the loop AST generation type for the band member of band node \"node\"\n * at position \"pos\" to \"type\".\n */\n__isl_give isl_schedule_node 
*isl_schedule_node_band_member_set_ast_loop_type(\n\t__isl_take isl_schedule_node *node, int pos,\n\tenum isl_ast_loop_type type)\n{\n\tisl_schedule_tree *tree;\n\n\tif (!node)\n\t\treturn NULL;\n\n\ttree = isl_schedule_tree_copy(node->tree);\n\ttree = isl_schedule_tree_band_member_set_ast_loop_type(tree, pos, type);\n\treturn isl_schedule_node_graft_tree(node, tree);\n}\n\n/* Return the loop AST generation type for the band member of band node \"node\"\n * at position \"pos\" for the isolated part.\n */\nenum isl_ast_loop_type isl_schedule_node_band_member_get_isolate_ast_loop_type(\n\t__isl_keep isl_schedule_node *node, int pos)\n{\n\tif (!node)\n\t\treturn isl_ast_loop_error;\n\n\treturn isl_schedule_tree_band_member_get_isolate_ast_loop_type(\n\t\t\t\t\t\t\t    node->tree, pos);\n}\n\n/* Set the loop AST generation type for the band member of band node \"node\"\n * at position \"pos\" for the isolated part to \"type\".\n */\n__isl_give isl_schedule_node *\nisl_schedule_node_band_member_set_isolate_ast_loop_type(\n\t__isl_take isl_schedule_node *node, int pos,\n\tenum isl_ast_loop_type type)\n{\n\tisl_schedule_tree *tree;\n\n\tif (!node)\n\t\treturn NULL;\n\n\ttree = isl_schedule_tree_copy(node->tree);\n\ttree = isl_schedule_tree_band_member_set_isolate_ast_loop_type(tree,\n\t\t\t\t\t\t\t\t    pos, type);\n\treturn isl_schedule_node_graft_tree(node, tree);\n}\n\n/* Return the AST build options associated to band node \"node\".\n */\n__isl_give isl_union_set *isl_schedule_node_band_get_ast_build_options(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_band_get_ast_build_options(node->tree);\n}\n\n/* Replace the AST build options associated to band node \"node\" by \"options\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_set_ast_build_options(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *options)\n{\n\tisl_schedule_tree *tree;\n\n\tif (!node || !options)\n\t\tgoto 
error;\n\n\ttree = isl_schedule_tree_copy(node->tree);\n\ttree = isl_schedule_tree_band_set_ast_build_options(tree, options);\n\treturn isl_schedule_node_graft_tree(node, tree);\nerror:\n\tisl_schedule_node_free(node);\n\tisl_union_set_free(options);\n\treturn NULL;\n}\n\n/* Return the \"isolate\" option associated to band node \"node\".\n */\n__isl_give isl_set *isl_schedule_node_band_get_ast_isolate_option(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_size depth;\n\n\tdepth = isl_schedule_node_get_schedule_depth(node);\n\tif (depth < 0)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_band_get_ast_isolate_option(node->tree, depth);\n}\n\n/* Make sure that the spaces of \"node\" and \"mv\" are the same.\n * Return -1 on error, reporting the error to the user.\n */\nstatic int check_space_multi_val(__isl_keep isl_schedule_node *node,\n\t__isl_keep isl_multi_val *mv)\n{\n\tisl_space *node_space, *mv_space;\n\tint equal;\n\n\tnode_space = isl_schedule_node_band_get_space(node);\n\tmv_space = isl_multi_val_get_space(mv);\n\tequal = isl_space_tuple_is_equal(node_space, isl_dim_set,\n\t\t\t\t\tmv_space, isl_dim_set);\n\tisl_space_free(mv_space);\n\tisl_space_free(node_space);\n\tif (equal < 0)\n\t\treturn -1;\n\tif (!equal)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"spaces don't match\", return -1);\n\n\treturn 0;\n}\n\n/* Multiply the partial schedule of the band node \"node\"\n * with the factors in \"mv\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_scale(\n\t__isl_take isl_schedule_node *node, __isl_take isl_multi_val *mv)\n{\n\tisl_schedule_tree *tree;\n\tint anchored;\n\n\tif (!node || !mv)\n\t\tgoto error;\n\tif (check_space_multi_val(node, mv) < 0)\n\t\tgoto error;\n\tanchored = isl_schedule_node_is_subtree_anchored(node);\n\tif (anchored < 0)\n\t\tgoto error;\n\tif (anchored)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot scale band node with anchored subtree\",\n\t\t\tgoto 
error);\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_band_scale(tree, mv);\n\treturn isl_schedule_node_graft_tree(node, tree);\nerror:\n\tisl_multi_val_free(mv);\n\tisl_schedule_node_free(node);\n\treturn NULL;\n}\n\n/* Divide the partial schedule of the band node \"node\"\n * by the factors in \"mv\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_scale_down(\n\t__isl_take isl_schedule_node *node, __isl_take isl_multi_val *mv)\n{\n\tisl_schedule_tree *tree;\n\tint anchored;\n\n\tif (!node || !mv)\n\t\tgoto error;\n\tif (check_space_multi_val(node, mv) < 0)\n\t\tgoto error;\n\tanchored = isl_schedule_node_is_subtree_anchored(node);\n\tif (anchored < 0)\n\t\tgoto error;\n\tif (anchored)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot scale down band node with anchored subtree\",\n\t\t\tgoto error);\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_band_scale_down(tree, mv);\n\treturn isl_schedule_node_graft_tree(node, tree);\nerror:\n\tisl_multi_val_free(mv);\n\tisl_schedule_node_free(node);\n\treturn NULL;\n}\n\n/* Reduce the partial schedule of the band node \"node\"\n * modulo the factors in \"mv\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_mod(\n\t__isl_take isl_schedule_node *node, __isl_take isl_multi_val *mv)\n{\n\tisl_schedule_tree *tree;\n\tisl_bool anchored;\n\n\tif (!node || !mv)\n\t\tgoto error;\n\tif (check_space_multi_val(node, mv) < 0)\n\t\tgoto error;\n\tanchored = isl_schedule_node_is_subtree_anchored(node);\n\tif (anchored < 0)\n\t\tgoto error;\n\tif (anchored)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot perform mod on band node with anchored subtree\",\n\t\t\tgoto error);\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_band_mod(tree, mv);\n\treturn isl_schedule_node_graft_tree(node, tree);\nerror:\n\tisl_multi_val_free(mv);\n\tisl_schedule_node_free(node);\n\treturn 
NULL;\n}\n\n/* Make sure that the spaces of \"node\" and \"mupa\" are the same.\n * Return isl_stat_error on error, reporting the error to the user.\n */\nstatic isl_stat check_space_multi_union_pw_aff(\n\t__isl_keep isl_schedule_node *node,\n\t__isl_keep isl_multi_union_pw_aff *mupa)\n{\n\tisl_space *node_space, *mupa_space;\n\tisl_bool equal;\n\n\tnode_space = isl_schedule_node_band_get_space(node);\n\tmupa_space = isl_multi_union_pw_aff_get_space(mupa);\n\tequal = isl_space_tuple_is_equal(node_space, isl_dim_set,\n\t\t\t\t\tmupa_space, isl_dim_set);\n\tisl_space_free(mupa_space);\n\tisl_space_free(node_space);\n\tif (equal < 0)\n\t\treturn isl_stat_error;\n\tif (!equal)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"spaces don't match\", return isl_stat_error);\n\n\treturn isl_stat_ok;\n}\n\n/* Shift the partial schedule of the band node \"node\" by \"shift\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_shift(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_multi_union_pw_aff *shift)\n{\n\tisl_schedule_tree *tree;\n\tint anchored;\n\n\tif (!node || !shift)\n\t\tgoto error;\n\tif (check_space_multi_union_pw_aff(node, shift) < 0)\n\t\tgoto error;\n\tanchored = isl_schedule_node_is_subtree_anchored(node);\n\tif (anchored < 0)\n\t\tgoto error;\n\tif (anchored)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot shift band node with anchored subtree\",\n\t\t\tgoto error);\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_band_shift(tree, shift);\n\treturn isl_schedule_node_graft_tree(node, tree);\nerror:\n\tisl_multi_union_pw_aff_free(shift);\n\tisl_schedule_node_free(node);\n\treturn NULL;\n}\n\n/* Tile \"node\" with tile sizes \"sizes\".\n *\n * The current node is replaced by two nested nodes corresponding\n * to the tile dimensions and the point dimensions.\n *\n * Return a pointer to the outer (tile) node.\n *\n * If any of the descendants of \"node\" depend 
on the set of outer band nodes,\n * then we refuse to tile the node.\n *\n * If the scale tile loops option is set, then the tile loops\n * are scaled by the tile sizes.  If the shift point loops option is set,\n * then the point loops are shifted to start at zero.\n * In particular, these options affect the tile and point loop schedules\n * as follows\n *\n *\tscale\tshift\toriginal\ttile\t\tpoint\n *\n *\t0\t0\ti\t\tfloor(i/s)\ti\n *\t1\t0\ti\t\ts * floor(i/s)\ti\n *\t0\t1\ti\t\tfloor(i/s)\ti - s * floor(i/s)\n *\t1\t1\ti\t\ts * floor(i/s)\ti - s * floor(i/s)\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_tile(\n\t__isl_take isl_schedule_node *node, __isl_take isl_multi_val *sizes)\n{\n\tisl_schedule_tree *tree;\n\tint anchored;\n\n\tif (!node || !sizes)\n\t\tgoto error;\n\tanchored = isl_schedule_node_is_subtree_anchored(node);\n\tif (anchored < 0)\n\t\tgoto error;\n\tif (anchored)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot tile band node with anchored subtree\",\n\t\t\tgoto error);\n\n\tif (check_space_multi_val(node, sizes) < 0)\n\t\tgoto error;\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_band_tile(tree, sizes);\n\treturn isl_schedule_node_graft_tree(node, tree);\nerror:\n\tisl_multi_val_free(sizes);\n\tisl_schedule_node_free(node);\n\treturn NULL;\n}\n\n/* Move the band node \"node\" down to all the leaves in the subtree\n * rooted at \"node\".\n * Return a pointer to the node in the resulting tree that is in the same\n * position as the node pointed to by \"node\" in the original tree.\n *\n * If the node only has a leaf child, then nothing needs to be done.\n * Otherwise, the child of the node is removed and the result is\n * appended to all the leaves in the subtree rooted at the original child.\n * Since the node is moved to the leaves, it needs to be expanded\n * according to the expansion, if any, defined by that subtree.\n * In the end, the original node is replaced by the 
result of\n * attaching copies of the expanded node to the leaves.\n *\n * If any of the nodes in the subtree rooted at \"node\" depend on\n * the set of outer band nodes then we refuse to sink the band node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_sink(\n\t__isl_take isl_schedule_node *node)\n{\n\tenum isl_schedule_node_type type;\n\tisl_schedule_tree *tree, *child;\n\tisl_union_pw_multi_aff *contraction;\n\tisl_bool anchored;\n\tisl_size n;\n\n\tif (!node)\n\t\treturn NULL;\n\n\ttype = isl_schedule_node_get_type(node);\n\tif (type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"not a band node\", return isl_schedule_node_free(node));\n\tanchored = isl_schedule_node_is_subtree_anchored(node);\n\tif (anchored < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (anchored)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot sink band node in anchored subtree\",\n\t\t\treturn isl_schedule_node_free(node));\n\tn = isl_schedule_tree_n_children(node->tree);\n\tif (n < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (n == 0)\n\t\treturn node;\n\n\tcontraction = isl_schedule_node_get_subtree_contraction(node);\n\n\ttree = isl_schedule_node_get_tree(node);\n\tchild = isl_schedule_tree_get_child(tree, 0);\n\ttree = isl_schedule_tree_reset_children(tree);\n\ttree = isl_schedule_tree_pullback_union_pw_multi_aff(tree, contraction);\n\ttree = isl_schedule_tree_append_to_leaves(child, tree);\n\n\treturn isl_schedule_node_graft_tree(node, tree);\n}\n\n/* Split \"node\" into two nested band nodes, one with the first \"pos\"\n * dimensions and one with the remaining dimensions.\n * The schedules of the two band nodes live in anonymous spaces.\n * The loop AST generation type options and the isolate option\n * are split over the two band nodes.\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_split(\n\t__isl_take isl_schedule_node *node, int pos)\n{\n\tisl_size 
depth;\n\tisl_schedule_tree *tree;\n\n\tdepth = isl_schedule_node_get_schedule_depth(node);\n\tif (depth < 0)\n\t\treturn isl_schedule_node_free(node);\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_band_split(tree, pos, depth);\n\treturn isl_schedule_node_graft_tree(node, tree);\n}\n\n/* Return the context of the context node \"node\".\n */\n__isl_give isl_set *isl_schedule_node_context_get_context(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_context_get_context(node->tree);\n}\n\n/* Return the domain of the domain node \"node\".\n */\n__isl_give isl_union_set *isl_schedule_node_domain_get_domain(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_domain_get_domain(node->tree);\n}\n\n/* Return the expansion map of expansion node \"node\".\n */\n__isl_give isl_union_map *isl_schedule_node_expansion_get_expansion(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_expansion_get_expansion(node->tree);\n}\n\n/* Return the contraction of expansion node \"node\".\n */\n__isl_give isl_union_pw_multi_aff *isl_schedule_node_expansion_get_contraction(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_expansion_get_contraction(node->tree);\n}\n\n/* Replace the contraction and the expansion of the expansion node \"node\"\n * by \"contraction\" and \"expansion\".\n */\n__isl_give isl_schedule_node *\nisl_schedule_node_expansion_set_contraction_and_expansion(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_union_pw_multi_aff *contraction,\n\t__isl_take isl_union_map *expansion)\n{\n\tisl_schedule_tree *tree;\n\n\tif (!node || !contraction || !expansion)\n\t\tgoto error;\n\n\ttree = isl_schedule_tree_copy(node->tree);\n\ttree = isl_schedule_tree_expansion_set_contraction_and_expansion(tree,\n\t\t\t\t\t\t\tcontraction, 
expansion);\n\treturn isl_schedule_node_graft_tree(node, tree);\nerror:\n\tisl_schedule_node_free(node);\n\tisl_union_pw_multi_aff_free(contraction);\n\tisl_union_map_free(expansion);\n\treturn NULL;\n}\n\n/* Return the extension of the extension node \"node\".\n */\n__isl_give isl_union_map *isl_schedule_node_extension_get_extension(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_extension_get_extension(node->tree);\n}\n\n/* Replace the extension of extension node \"node\" by \"extension\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_extension_set_extension(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_map *extension)\n{\n\tisl_schedule_tree *tree;\n\n\tif (!node || !extension)\n\t\tgoto error;\n\n\ttree = isl_schedule_tree_copy(node->tree);\n\ttree = isl_schedule_tree_extension_set_extension(tree, extension);\n\treturn isl_schedule_node_graft_tree(node, tree);\nerror:\n\tisl_schedule_node_free(node);\n\tisl_union_map_free(extension);\n\treturn NULL;\n}\n\n/* Return the filter of the filter node \"node\".\n */\n__isl_give isl_union_set *isl_schedule_node_filter_get_filter(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_filter_get_filter(node->tree);\n}\n\n/* Replace the filter of filter node \"node\" by \"filter\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_filter_set_filter(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *filter)\n{\n\tisl_schedule_tree *tree;\n\n\tif (!node || !filter)\n\t\tgoto error;\n\n\ttree = isl_schedule_tree_copy(node->tree);\n\ttree = isl_schedule_tree_filter_set_filter(tree, filter);\n\treturn isl_schedule_node_graft_tree(node, tree);\nerror:\n\tisl_schedule_node_free(node);\n\tisl_union_set_free(filter);\n\treturn NULL;\n}\n\n/* Intersect the filter of filter node \"node\" with \"filter\".\n *\n * If the filter of the node is already a subset of \"filter\",\n * then leave 
the node unchanged.\n */\n__isl_give isl_schedule_node *isl_schedule_node_filter_intersect_filter(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *filter)\n{\n\tisl_union_set *node_filter = NULL;\n\tisl_bool subset;\n\n\tif (!node || !filter)\n\t\tgoto error;\n\n\tnode_filter = isl_schedule_node_filter_get_filter(node);\n\tsubset = isl_union_set_is_subset(node_filter, filter);\n\tif (subset < 0)\n\t\tgoto error;\n\tif (subset) {\n\t\tisl_union_set_free(node_filter);\n\t\tisl_union_set_free(filter);\n\t\treturn node;\n\t}\n\tnode_filter = isl_union_set_intersect(node_filter, filter);\n\tnode = isl_schedule_node_filter_set_filter(node, node_filter);\n\treturn node;\nerror:\n\tisl_schedule_node_free(node);\n\tisl_union_set_free(node_filter);\n\tisl_union_set_free(filter);\n\treturn NULL;\n}\n\n/* Return the guard of the guard node \"node\".\n */\n__isl_give isl_set *isl_schedule_node_guard_get_guard(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_guard_get_guard(node->tree);\n}\n\n/* Return the mark identifier of the mark node \"node\".\n */\n__isl_give isl_id *isl_schedule_node_mark_get_id(\n\t__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn NULL;\n\n\treturn isl_schedule_tree_mark_get_id(node->tree);\n}\n\n/* Replace the child at position \"pos\" of the sequence node \"node\"\n * by the children of the sequence root node of \"tree\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_sequence_splice(\n\t__isl_take isl_schedule_node *node, int pos,\n\t__isl_take isl_schedule_tree *tree)\n{\n\tisl_schedule_tree *node_tree;\n\n\tif (!node || !tree)\n\t\tgoto error;\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_sequence)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"not a sequence node\", goto error);\n\tif (isl_schedule_tree_get_type(tree) != isl_schedule_node_sequence)\n\t\tisl_die(isl_schedule_node_get_ctx(node), 
isl_error_invalid,\n\t\t\t\"not a sequence node\", goto error);\n\tnode_tree = isl_schedule_node_get_tree(node);\n\tnode_tree = isl_schedule_tree_sequence_splice(node_tree, pos, tree);\n\tnode = isl_schedule_node_graft_tree(node, node_tree);\n\n\treturn node;\nerror:\n\tisl_schedule_node_free(node);\n\tisl_schedule_tree_free(tree);\n\treturn NULL;\n}\n\n/* Given a sequence node \"node\", with a child at position \"pos\" that\n * is also a sequence node, attach the children of that node directly\n * as children of \"node\" at that position, replacing the original child.\n *\n * The filters of these children are intersected with the filter\n * of the child at position \"pos\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_sequence_splice_child(\n\t__isl_take isl_schedule_node *node, int pos)\n{\n\tint i;\n\tisl_size n;\n\tisl_union_set *filter;\n\tisl_schedule_node *child;\n\tisl_schedule_tree *tree;\n\n\tif (!node)\n\t\treturn NULL;\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_sequence)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"not a sequence node\",\n\t\t\treturn isl_schedule_node_free(node));\n\tnode = isl_schedule_node_child(node, pos);\n\tnode = isl_schedule_node_child(node, 0);\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_sequence)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"not a sequence node\",\n\t\t\treturn isl_schedule_node_free(node));\n\tn = isl_schedule_node_n_children(node);\n\tif (n < 0)\n\t\treturn isl_schedule_node_free(node);\n\tchild = isl_schedule_node_copy(node);\n\tnode = isl_schedule_node_parent(node);\n\tfilter = isl_schedule_node_filter_get_filter(node);\n\tfor (i = 0; i < n; ++i) {\n\t\tchild = isl_schedule_node_child(child, i);\n\t\tchild = isl_schedule_node_filter_intersect_filter(child,\n\t\t\t\t\t\tisl_union_set_copy(filter));\n\t\tchild = isl_schedule_node_parent(child);\n\t}\n\tisl_union_set_free(filter);\n\ttree = 
isl_schedule_node_get_tree(child);\n\tisl_schedule_node_free(child);\n\tnode = isl_schedule_node_parent(node);\n\tnode = isl_schedule_node_sequence_splice(node, pos, tree);\n\n\treturn node;\n}\n\n/* Update the ancestors of \"node\" to point to the tree that \"node\"\n * now points to.\n * That is, replace the child in the original parent that corresponds\n * to the current tree position by node->tree and continue updating\n * the ancestors in the same way until the root is reached.\n *\n * If \"fn\" is not NULL, then it is called on each ancestor as we move up\n * the tree so that it can modify the ancestor before it is added\n * to the list of ancestors of the modified node.\n * The additional \"pos\" argument records the position\n * of the \"tree\" argument in the original schedule tree.\n *\n * If \"node\" originally points to a leaf of the schedule tree, then make sure\n * that in the end it points to a leaf in the updated schedule tree.\n */\nstatic __isl_give isl_schedule_node *update_ancestors(\n\t__isl_take isl_schedule_node *node,\n\t__isl_give isl_schedule_tree *(*fn)(__isl_take isl_schedule_tree *tree,\n\t\t__isl_keep isl_schedule_node *pos, void *user), void *user)\n{\n\tint i;\n\tisl_size n;\n\tint is_leaf;\n\tisl_schedule_tree *tree;\n\tisl_schedule_node *pos = NULL;\n\n\tif (fn)\n\t\tpos = isl_schedule_node_copy(node);\n\n\tnode = isl_schedule_node_cow(node);\n\tif (!node)\n\t\treturn isl_schedule_node_free(pos);\n\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0)\n\t\treturn isl_schedule_node_free(pos);\n\ttree = isl_schedule_tree_copy(node->tree);\n\n\tfor (i = n - 1; i >= 0; --i) {\n\t\tisl_schedule_tree *parent;\n\n\t\tparent = isl_schedule_tree_list_get_schedule_tree(\n\t\t\t\t\t\t    node->ancestors, i);\n\t\tparent = isl_schedule_tree_replace_child(parent,\n\t\t\t\t\t\t    node->child_pos[i], tree);\n\t\tif (fn) {\n\t\t\tpos = isl_schedule_node_parent(pos);\n\t\t\tparent = fn(parent, pos, 
user);\n\t\t}\n\t\tnode->ancestors = isl_schedule_tree_list_set_schedule_tree(\n\t\t\t    node->ancestors, i, isl_schedule_tree_copy(parent));\n\n\t\ttree = parent;\n\t}\n\n\tif (fn)\n\t\tisl_schedule_node_free(pos);\n\n\tis_leaf = isl_schedule_tree_is_leaf(node->tree);\n\tnode->schedule = isl_schedule_set_root(node->schedule, tree);\n\tif (is_leaf) {\n\t\tisl_schedule_tree_free(node->tree);\n\t\tnode->tree = isl_schedule_node_get_leaf(node);\n\t}\n\n\tif (!node->schedule || !node->ancestors)\n\t\treturn isl_schedule_node_free(node);\n\n\treturn node;\n}\n\n/* Replace the subtree that \"pos\" points to by \"tree\", updating\n * the ancestors to maintain a consistent state.\n */\n__isl_give isl_schedule_node *isl_schedule_node_graft_tree(\n\t__isl_take isl_schedule_node *pos, __isl_take isl_schedule_tree *tree)\n{\n\tif (!tree || !pos)\n\t\tgoto error;\n\tif (pos->tree == tree) {\n\t\tisl_schedule_tree_free(tree);\n\t\treturn pos;\n\t}\n\n\tpos = isl_schedule_node_cow(pos);\n\tif (!pos)\n\t\tgoto error;\n\n\tisl_schedule_tree_free(pos->tree);\n\tpos->tree = tree;\n\n\treturn update_ancestors(pos, NULL, NULL);\nerror:\n\tisl_schedule_node_free(pos);\n\tisl_schedule_tree_free(tree);\n\treturn NULL;\n}\n\n/* Make sure we can insert a node between \"node\" and its parent.\n * Return -1 on error, reporting the reason why we cannot insert a node.\n */\nstatic int check_insert(__isl_keep isl_schedule_node *node)\n{\n\tint has_parent;\n\tenum isl_schedule_node_type type;\n\n\thas_parent = isl_schedule_node_has_parent(node);\n\tif (has_parent < 0)\n\t\treturn -1;\n\tif (!has_parent)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot insert node outside of root\", return -1);\n\n\ttype = isl_schedule_node_get_parent_type(node);\n\tif (type == isl_schedule_node_error)\n\t\treturn -1;\n\tif (type == isl_schedule_node_set || type == isl_schedule_node_sequence)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot insert 
node between set or sequence node \"\n\t\t\t\"and its filter children\", return -1);\n\n\treturn 0;\n}\n\n/* Insert a band node with partial schedule \"mupa\" between \"node\" and\n * its parent.\n * Return a pointer to the new band node.\n *\n * If any of the nodes in the subtree rooted at \"node\" depend on\n * the set of outer band nodes then we refuse to insert the band node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_insert_partial_schedule(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_multi_union_pw_aff *mupa)\n{\n\tint anchored;\n\tisl_schedule_band *band;\n\tisl_schedule_tree *tree;\n\n\tif (check_insert(node) < 0)\n\t\tnode = isl_schedule_node_free(node);\n\tanchored = isl_schedule_node_is_subtree_anchored(node);\n\tif (anchored < 0)\n\t\tgoto error;\n\tif (anchored)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot insert band node in anchored subtree\",\n\t\t\tgoto error);\n\n\ttree = isl_schedule_node_get_tree(node);\n\tband = isl_schedule_band_from_multi_union_pw_aff(mupa);\n\ttree = isl_schedule_tree_insert_band(tree, band);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\nerror:\n\tisl_schedule_node_free(node);\n\tisl_multi_union_pw_aff_free(mupa);\n\treturn NULL;\n}\n\n/* Insert a context node with context \"context\" between \"node\" and its parent.\n * Return a pointer to the new context node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_insert_context(\n\t__isl_take isl_schedule_node *node, __isl_take isl_set *context)\n{\n\tisl_schedule_tree *tree;\n\n\tif (check_insert(node) < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_insert_context(tree, context);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Insert an expansion node with the given \"contraction\" and \"expansion\"\n * between \"node\" and its parent.\n * Return a pointer to the new expansion node.\n 
*\n * Typically the domain and range spaces of the expansion are different.\n * This means that only one of them can refer to the current domain space\n * in a consistent tree.  It is up to the caller to ensure that the tree\n * returns to a consistent state.\n */\n__isl_give isl_schedule_node *isl_schedule_node_insert_expansion(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_union_pw_multi_aff *contraction,\n\t__isl_take isl_union_map *expansion)\n{\n\tisl_schedule_tree *tree;\n\n\tif (check_insert(node) < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_insert_expansion(tree, contraction, expansion);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Insert an extension node with extension \"extension\" between \"node\" and\n * its parent.\n * Return a pointer to the new extension node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_insert_extension(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_union_map *extension)\n{\n\tisl_schedule_tree *tree;\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_insert_extension(tree, extension);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Insert a filter node with filter \"filter\" between \"node\" and its parent.\n * Return a pointer to the new filter node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_insert_filter(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *filter)\n{\n\tisl_schedule_tree *tree;\n\n\tif (check_insert(node) < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_insert_filter(tree, filter);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Insert a guard node with guard \"guard\" between \"node\" and its parent.\n * Return a pointer to the new guard node.\n */\n__isl_give isl_schedule_node 
*isl_schedule_node_insert_guard(\n\t__isl_take isl_schedule_node *node, __isl_take isl_set *guard)\n{\n\tisl_schedule_tree *tree;\n\n\tif (check_insert(node) < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_insert_guard(tree, guard);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Insert a mark node with mark identifier \"mark\" between \"node\" and\n * its parent.\n * Return a pointer to the new mark node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_insert_mark(\n\t__isl_take isl_schedule_node *node, __isl_take isl_id *mark)\n{\n\tisl_schedule_tree *tree;\n\n\tif (check_insert(node) < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_insert_mark(tree, mark);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Attach the current subtree of \"node\" to a sequence of filter tree nodes\n * with filters described by \"filters\", attach this sequence\n * of filter tree nodes as children to a new tree of type \"type\" and\n * replace the original subtree of \"node\" by this new tree.\n * Each copy of the original subtree is simplified with respect\n * to the corresponding filter.\n */\nstatic __isl_give isl_schedule_node *isl_schedule_node_insert_children(\n\t__isl_take isl_schedule_node *node,\n\tenum isl_schedule_node_type type,\n\t__isl_take isl_union_set_list *filters)\n{\n\tint i;\n\tisl_size n;\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\tisl_schedule_tree_list *list;\n\n\tif (check_insert(node) < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\tn = isl_union_set_list_n_union_set(filters);\n\tif (!node || n < 0)\n\t\tgoto error;\n\n\tctx = isl_schedule_node_get_ctx(node);\n\tlist = isl_schedule_tree_list_alloc(ctx, n);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_schedule_node *node_i;\n\t\tisl_schedule_tree *tree;\n\t\tisl_union_set *filter;\n\n\t\tfilter = 
isl_union_set_list_get_union_set(filters, i);\n\t\tnode_i = isl_schedule_node_copy(node);\n\t\tnode_i = isl_schedule_node_gist(node_i,\n\t\t\t\t\t\tisl_union_set_copy(filter));\n\t\ttree = isl_schedule_node_get_tree(node_i);\n\t\tisl_schedule_node_free(node_i);\n\t\ttree = isl_schedule_tree_insert_filter(tree, filter);\n\t\tlist = isl_schedule_tree_list_add(list, tree);\n\t}\n\ttree = isl_schedule_tree_from_children(type, list);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\tisl_union_set_list_free(filters);\n\treturn node;\nerror:\n\tisl_union_set_list_free(filters);\n\tisl_schedule_node_free(node);\n\treturn NULL;\n}\n\n/* Insert a sequence node with child filters \"filters\" between \"node\" and\n * its parent.  That is, the tree that \"node\" points to is attached\n * to each of the child nodes of the filter nodes.\n * Return a pointer to the new sequence node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_insert_sequence(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_union_set_list *filters)\n{\n\treturn isl_schedule_node_insert_children(node,\n\t\t\t\t\tisl_schedule_node_sequence, filters);\n}\n\n/* Insert a set node with child filters \"filters\" between \"node\" and\n * its parent.  
That is, the tree that \"node\" points to is attached\n * to each of the child nodes of the filter nodes.\n * Return a pointer to the new set node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_insert_set(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_union_set_list *filters)\n{\n\treturn isl_schedule_node_insert_children(node,\n\t\t\t\t\tisl_schedule_node_set, filters);\n}\n\n/* Remove \"node\" from its schedule tree and return a pointer\n * to the leaf at the same position in the updated schedule tree.\n *\n * It is not allowed to remove the root of a schedule tree or\n * a child of a set or sequence node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_cut(\n\t__isl_take isl_schedule_node *node)\n{\n\tisl_schedule_tree *leaf;\n\tenum isl_schedule_node_type parent_type;\n\n\tif (!node)\n\t\treturn NULL;\n\tif (!isl_schedule_node_has_parent(node))\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot cut root\", return isl_schedule_node_free(node));\n\n\tparent_type = isl_schedule_node_get_parent_type(node);\n\tif (parent_type == isl_schedule_node_set ||\n\t    parent_type == isl_schedule_node_sequence)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot cut child of set or sequence\",\n\t\t\treturn isl_schedule_node_free(node));\n\n\tleaf = isl_schedule_node_get_leaf(node);\n\treturn isl_schedule_node_graft_tree(node, leaf);\n}\n\n/* Remove a single node from the schedule tree, attaching the child\n * of \"node\" directly to its parent.\n * Return a pointer to this former child or to the leaf at the position\n * of the original node if there was no child.\n * It is not allowed to remove the root of a schedule tree,\n * a set or sequence node, a child of a set or sequence node or\n * a band node with an anchored subtree.\n */\n__isl_give isl_schedule_node *isl_schedule_node_delete(\n\t__isl_take isl_schedule_node *node)\n{\n\tisl_size n, depth;\n\tisl_schedule_tree *tree;\n\tenum 
isl_schedule_node_type type;\n\n\tdepth = isl_schedule_node_get_tree_depth(node);\n\tn = isl_schedule_node_n_children(node);\n\tif (depth < 0 || n < 0)\n\t\treturn isl_schedule_node_free(node);\n\n\tif (depth == 0)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot delete root node\",\n\t\t\treturn isl_schedule_node_free(node));\n\tif (n != 1)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"can only delete node with a single child\",\n\t\t\treturn isl_schedule_node_free(node));\n\ttype = isl_schedule_node_get_parent_type(node);\n\tif (type == isl_schedule_node_sequence || type == isl_schedule_node_set)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"cannot delete child of set or sequence\",\n\t\t\treturn isl_schedule_node_free(node));\n\tif (isl_schedule_node_get_type(node) == isl_schedule_node_band) {\n\t\tint anchored;\n\n\t\tanchored = isl_schedule_node_is_subtree_anchored(node);\n\t\tif (anchored < 0)\n\t\t\treturn isl_schedule_node_free(node);\n\t\tif (anchored)\n\t\t\tisl_die(isl_schedule_node_get_ctx(node),\n\t\t\t\tisl_error_invalid,\n\t\t\t\t\"cannot delete band node with anchored subtree\",\n\t\t\t\treturn isl_schedule_node_free(node));\n\t}\n\n\ttree = isl_schedule_node_get_tree(node);\n\tif (!tree || isl_schedule_tree_has_children(tree)) {\n\t\ttree = isl_schedule_tree_child(tree, 0);\n\t} else {\n\t\tisl_schedule_tree_free(tree);\n\t\ttree = isl_schedule_node_get_leaf(node);\n\t}\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Internal data structure for the group_ancestor callback.\n *\n * If \"finished\" is set, then we no longer need to modify\n * any further ancestors.\n *\n * \"contraction\" and \"expansion\" represent the expansion\n * that reflects the grouping.\n *\n * \"domain\" contains the domain elements that reach the position\n * where the grouping is performed.  
That is, it is the range\n * of the resulting expansion.\n * \"domain_universe\" is the universe of \"domain\".\n * \"group\" is the set of group elements, i.e., the domain\n * of the resulting expansion.\n * \"group_universe\" is the universe of \"group\".\n *\n * \"sched\" is the schedule for the group elements, in practice\n * an identity mapping on \"group_universe\".\n * \"dim\" is the dimension of \"sched\".\n */\nstruct isl_schedule_group_data {\n\tint finished;\n\n\tisl_union_map *expansion;\n\tisl_union_pw_multi_aff *contraction;\n\n\tisl_union_set *domain;\n\tisl_union_set *domain_universe;\n\tisl_union_set *group;\n\tisl_union_set *group_universe;\n\n\tint dim;\n\tisl_multi_aff *sched;\n};\n\n/* Is domain covered by data->domain within data->domain_universe?\n */\nstatic isl_bool locally_covered_by_domain(__isl_keep isl_union_set *domain,\n\tstruct isl_schedule_group_data *data)\n{\n\tisl_bool is_subset;\n\tisl_union_set *test;\n\n\ttest = isl_union_set_copy(domain);\n\ttest = isl_union_set_intersect(test,\n\t\t\t    isl_union_set_copy(data->domain_universe));\n\tis_subset = isl_union_set_is_subset(test, data->domain);\n\tisl_union_set_free(test);\n\n\treturn is_subset;\n}\n\n/* Update the band tree root \"tree\" to refer to the group instances\n * in data->group rather than the original domain elements in data->domain.\n * \"pos\" is the position in the original schedule tree where the modified\n * \"tree\" will be attached.\n *\n * Add the part of the identity schedule on the group instances data->sched\n * that corresponds to this band node to the band schedule.\n * If the domain elements that reach the node and that are part\n * of data->domain_universe are all elements of data->domain (and therefore\n * replaced by the group instances) then this data->domain_universe\n * is removed from the domain of the band schedule.\n */\nstatic __isl_give isl_schedule_tree *group_band(\n\t__isl_take isl_schedule_tree *tree, __isl_keep isl_schedule_node 
*pos,\n\tstruct isl_schedule_group_data *data)\n{\n\tisl_union_set *domain;\n\tisl_multi_aff *ma;\n\tisl_multi_union_pw_aff *mupa, *partial;\n\tisl_bool is_covered;\n\tisl_size depth, n;\n\tisl_bool has_id;\n\n\tdomain = isl_schedule_node_get_domain(pos);\n\tis_covered = locally_covered_by_domain(domain, data);\n\tif (is_covered >= 0 && is_covered) {\n\t\tdomain = isl_union_set_universe(domain);\n\t\tdomain = isl_union_set_subtract(domain,\n\t\t\t    isl_union_set_copy(data->domain_universe));\n\t\ttree = isl_schedule_tree_band_intersect_domain(tree, domain);\n\t} else\n\t\tisl_union_set_free(domain);\n\tif (is_covered < 0)\n\t\treturn isl_schedule_tree_free(tree);\n\tdepth = isl_schedule_node_get_schedule_depth(pos);\n\tn = isl_schedule_tree_band_n_member(tree);\n\tif (depth < 0 || n < 0)\n\t\treturn isl_schedule_tree_free(tree);\n\tma = isl_multi_aff_copy(data->sched);\n\tma = isl_multi_aff_drop_dims(ma, isl_dim_out, 0, depth);\n\tma = isl_multi_aff_drop_dims(ma, isl_dim_out, n, data->dim - depth - n);\n\tmupa = isl_multi_union_pw_aff_from_multi_aff(ma);\n\tpartial = isl_schedule_tree_band_get_partial_schedule(tree);\n\thas_id = isl_multi_union_pw_aff_has_tuple_id(partial, isl_dim_set);\n\tif (has_id < 0) {\n\t\tpartial = isl_multi_union_pw_aff_free(partial);\n\t} else if (has_id) {\n\t\tisl_id *id;\n\t\tid = isl_multi_union_pw_aff_get_tuple_id(partial, isl_dim_set);\n\t\tmupa = isl_multi_union_pw_aff_set_tuple_id(mupa,\n\t\t\t\t\t\t\t    isl_dim_set, id);\n\t}\n\tpartial = isl_multi_union_pw_aff_union_add(partial, mupa);\n\ttree = isl_schedule_tree_band_set_partial_schedule(tree, partial);\n\n\treturn tree;\n}\n\n/* Drop the parameters in \"uset\" that are not also in \"space\".\n * \"n\" is the number of parameters in \"space\".\n */\nstatic __isl_give isl_union_set *union_set_drop_extra_params(\n\t__isl_take isl_union_set *uset, __isl_keep isl_space *space, int n)\n{\n\tisl_size n2;\n\n\tuset = isl_union_set_align_params(uset, isl_space_copy(space));\n\tn2 = 
isl_union_set_dim(uset, isl_dim_param);\n\tif (n2 < 0)\n\t\treturn isl_union_set_free(uset);\n\tuset = isl_union_set_project_out(uset, isl_dim_param, n, n2 - n);\n\n\treturn uset;\n}\n\n/* Update the context tree root \"tree\" to refer to the group instances\n * in data->group rather than the original domain elements in data->domain.\n * \"pos\" is the position in the original schedule tree where the modified\n * \"tree\" will be attached.\n *\n * We do not actually need to update \"tree\" since a context node only\n * refers to the schedule space.  However, we may need to update \"data\"\n * to not refer to any parameters introduced by the context node.\n */\nstatic __isl_give isl_schedule_tree *group_context(\n\t__isl_take isl_schedule_tree *tree, __isl_keep isl_schedule_node *pos,\n\tstruct isl_schedule_group_data *data)\n{\n\tisl_space *space;\n\tisl_union_set *domain;\n\tisl_size n1, n2;\n\tisl_bool involves;\n\tisl_size depth;\n\n\tdepth = isl_schedule_node_get_tree_depth(pos);\n\tif (depth < 0)\n\t\treturn isl_schedule_tree_free(tree);\n\tif (depth == 1)\n\t\treturn tree;\n\n\tdomain = isl_schedule_node_get_universe_domain(pos);\n\tspace = isl_union_set_get_space(domain);\n\tisl_union_set_free(domain);\n\n\tn1 = isl_space_dim(space, isl_dim_param);\n\tdata->expansion = isl_union_map_align_params(data->expansion, space);\n\tn2 = isl_union_map_dim(data->expansion, isl_dim_param);\n\n\tif (n1 < 0 || n2 < 0)\n\t\treturn isl_schedule_tree_free(tree);\n\tif (n1 == n2)\n\t\treturn tree;\n\n\tinvolves = isl_union_map_involves_dims(data->expansion,\n\t\t\t\tisl_dim_param, n1, n2 - n1);\n\tif (involves < 0)\n\t\treturn isl_schedule_tree_free(tree);\n\tif (involves)\n\t\tisl_die(isl_schedule_node_get_ctx(pos), isl_error_invalid,\n\t\t\t\"grouping cannot only refer to global parameters\",\n\t\t\treturn isl_schedule_tree_free(tree));\n\n\tdata->expansion = isl_union_map_project_out(data->expansion,\n\t\t\t\tisl_dim_param, n1, n2 - n1);\n\tspace = 
isl_union_map_get_space(data->expansion);\n\n\tdata->contraction = isl_union_pw_multi_aff_align_params(\n\t\t\t\tdata->contraction, isl_space_copy(space));\n\tn2 = isl_union_pw_multi_aff_dim(data->contraction, isl_dim_param);\n\tif (n2 < 0)\n\t\tdata->contraction =\n\t\t\t\tisl_union_pw_multi_aff_free(data->contraction);\n\tdata->contraction = isl_union_pw_multi_aff_drop_dims(data->contraction,\n\t\t\t\tisl_dim_param, n1, n2 - n1);\n\n\tdata->domain = union_set_drop_extra_params(data->domain, space, n1);\n\tdata->domain_universe =\n\t\tunion_set_drop_extra_params(data->domain_universe, space, n1);\n\tdata->group = union_set_drop_extra_params(data->group, space, n1);\n\tdata->group_universe =\n\t\tunion_set_drop_extra_params(data->group_universe, space, n1);\n\n\tdata->sched = isl_multi_aff_align_params(data->sched,\n\t\t\t\tisl_space_copy(space));\n\tn2 = isl_multi_aff_dim(data->sched, isl_dim_param);\n\tif (n2 < 0)\n\t\tdata->sched = isl_multi_aff_free(data->sched);\n\tdata->sched = isl_multi_aff_drop_dims(data->sched,\n\t\t\t\tisl_dim_param, n1, n2 - n1);\n\n\tisl_space_free(space);\n\n\treturn tree;\n}\n\n/* Update the domain tree root \"tree\" to refer to the group instances\n * in data->group rather than the original domain elements in data->domain.\n * \"pos\" is the position in the original schedule tree where the modified\n * \"tree\" will be attached.\n *\n * We first double-check that all grouped domain elements are actually\n * part of the root domain and then replace those elements by the group\n * instances.\n */\nstatic __isl_give isl_schedule_tree *group_domain(\n\t__isl_take isl_schedule_tree *tree, __isl_keep isl_schedule_node *pos,\n\tstruct isl_schedule_group_data *data)\n{\n\tisl_union_set *domain;\n\tisl_bool is_subset;\n\n\tdomain = isl_schedule_tree_domain_get_domain(tree);\n\tis_subset = isl_union_set_is_subset(data->domain, domain);\n\tisl_union_set_free(domain);\n\tif (is_subset < 0)\n\t\treturn isl_schedule_tree_free(tree);\n\tif 
(!is_subset)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"grouped domain should be part of outer domain\",\n\t\t\treturn isl_schedule_tree_free(tree));\n\tdomain = isl_schedule_tree_domain_get_domain(tree);\n\tdomain = isl_union_set_subtract(domain,\n\t\t\t\tisl_union_set_copy(data->domain));\n\tdomain = isl_union_set_union(domain, isl_union_set_copy(data->group));\n\ttree = isl_schedule_tree_domain_set_domain(tree, domain);\n\n\treturn tree;\n}\n\n/* Update the expansion tree root \"tree\" to refer to the group instances\n * in data->group rather than the original domain elements in data->domain.\n * \"pos\" is the position in the original schedule tree where the modified\n * \"tree\" will be attached.\n *\n * Let G_1 -> D_1 be the expansion of \"tree\" and G_2 -> D_2 the newly\n * introduced expansion in a descendant of \"tree\".\n * We first double-check that D_2 is a subset of D_1.\n * Then we remove D_2 from the range of G_1 -> D_1 and add the mapping\n * G_1 -> D_1 . D_2 -> G_2.\n * Similarly, we restrict the domain of the contraction to the universe\n * of the range of the updated expansion and add G_2 -> D_2 . 
D_1 -> G_1,\n * attempting to remove the domain constraints of this additional part.\n */\nstatic __isl_give isl_schedule_tree *group_expansion(\n\t__isl_take isl_schedule_tree *tree, __isl_keep isl_schedule_node *pos,\n\tstruct isl_schedule_group_data *data)\n{\n\tisl_union_set *domain;\n\tisl_union_map *expansion, *umap;\n\tisl_union_pw_multi_aff *contraction, *upma;\n\tint is_subset;\n\n\texpansion = isl_schedule_tree_expansion_get_expansion(tree);\n\tdomain = isl_union_map_range(expansion);\n\tis_subset = isl_union_set_is_subset(data->domain, domain);\n\tisl_union_set_free(domain);\n\tif (is_subset < 0)\n\t\treturn isl_schedule_tree_free(tree);\n\tif (!is_subset)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"grouped domain should be part \"\n\t\t\t\"of outer expansion domain\",\n\t\t\treturn isl_schedule_tree_free(tree));\n\texpansion = isl_schedule_tree_expansion_get_expansion(tree);\n\tumap = isl_union_map_from_union_pw_multi_aff(\n\t\t\tisl_union_pw_multi_aff_copy(data->contraction));\n\tumap = isl_union_map_apply_range(expansion, umap);\n\texpansion = isl_schedule_tree_expansion_get_expansion(tree);\n\texpansion = isl_union_map_subtract_range(expansion,\n\t\t\t\tisl_union_set_copy(data->domain));\n\texpansion = isl_union_map_union(expansion, umap);\n\tumap = isl_union_map_universe(isl_union_map_copy(expansion));\n\tdomain = isl_union_map_range(umap);\n\tcontraction = isl_schedule_tree_expansion_get_contraction(tree);\n\tumap = isl_union_map_from_union_pw_multi_aff(contraction);\n\tumap = isl_union_map_apply_range(isl_union_map_copy(data->expansion),\n\t\t\t\t\tumap);\n\tupma = isl_union_pw_multi_aff_from_union_map(umap);\n\tcontraction = isl_schedule_tree_expansion_get_contraction(tree);\n\tcontraction = isl_union_pw_multi_aff_intersect_domain(contraction,\n\t\t\t\t\t\t\t\tdomain);\n\tdomain = isl_union_pw_multi_aff_domain(\n\t\t\t\tisl_union_pw_multi_aff_copy(upma));\n\tupma = isl_union_pw_multi_aff_gist(upma, 
domain);\n\tcontraction = isl_union_pw_multi_aff_union_add(contraction, upma);\n\ttree = isl_schedule_tree_expansion_set_contraction_and_expansion(tree,\n\t\t\t\t\t\t\tcontraction, expansion);\n\n\treturn tree;\n}\n\n/* Update the tree root \"tree\" to refer to the group instances\n * in data->group rather than the original domain elements in data->domain.\n * \"pos\" is the position in the original schedule tree where the modified\n * \"tree\" will be attached.\n *\n * If we have come across a domain or expansion node before (data->finished\n * is set), then we no longer need to perform any modifications.\n *\n * If \"tree\" is a filter, then we add data->group_universe to the filter.\n * We also remove data->domain_universe from the filter if all the domain\n * elements in this universe that reach the filter node are part of\n * the elements that are being grouped by data->expansion.\n * If \"tree\" is a band, context, domain or expansion node, then it is handled\n * in a separate function.\n */\nstatic __isl_give isl_schedule_tree *group_ancestor(\n\t__isl_take isl_schedule_tree *tree, __isl_keep isl_schedule_node *pos,\n\tvoid *user)\n{\n\tstruct isl_schedule_group_data *data = user;\n\tisl_union_set *domain;\n\tisl_bool is_covered;\n\n\tif (!tree || !pos)\n\t\treturn isl_schedule_tree_free(tree);\n\n\tif (data->finished)\n\t\treturn tree;\n\n\tswitch (isl_schedule_tree_get_type(tree)) {\n\tcase isl_schedule_node_error:\n\t\treturn isl_schedule_tree_free(tree);\n\tcase isl_schedule_node_extension:\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_unsupported,\n\t\t\t\"grouping not allowed in extended tree\",\n\t\t\treturn isl_schedule_tree_free(tree));\n\tcase isl_schedule_node_band:\n\t\ttree = group_band(tree, pos, data);\n\t\tbreak;\n\tcase isl_schedule_node_context:\n\t\ttree = group_context(tree, pos, data);\n\t\tbreak;\n\tcase isl_schedule_node_domain:\n\t\ttree = group_domain(tree, pos, data);\n\t\tdata->finished = 1;\n\t\tbreak;\n\tcase 
isl_schedule_node_filter:\n\t\tdomain = isl_schedule_node_get_domain(pos);\n\t\tis_covered = locally_covered_by_domain(domain, data);\n\t\tisl_union_set_free(domain);\n\t\tif (is_covered < 0)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tdomain = isl_schedule_tree_filter_get_filter(tree);\n\t\tif (is_covered)\n\t\t\tdomain = isl_union_set_subtract(domain,\n\t\t\t\t    isl_union_set_copy(data->domain_universe));\n\t\tdomain = isl_union_set_union(domain,\n\t\t\t\t    isl_union_set_copy(data->group_universe));\n\t\ttree = isl_schedule_tree_filter_set_filter(tree, domain);\n\t\tbreak;\n\tcase isl_schedule_node_expansion:\n\t\ttree = group_expansion(tree, pos, data);\n\t\tdata->finished = 1;\n\t\tbreak;\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_guard:\n\tcase isl_schedule_node_mark:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\tbreak;\n\t}\n\n\treturn tree;\n}\n\n/* Group the domain elements that reach \"node\" into instances\n * of a single statement with identifier \"group_id\".\n * In particular, group the domain elements according to their\n * prefix schedule.\n *\n * That is, introduce an expansion node with as contraction\n * the prefix schedule (with the target space replaced by \"group_id\")\n * and as expansion the inverse of this contraction (with its range\n * intersected with the domain elements that reach \"node\").\n * The outer nodes are then modified to refer to the group instances\n * instead of the original domain elements.\n *\n * No instance of \"group_id\" is allowed to reach \"node\" prior\n * to the grouping.\n * No ancestor of \"node\" is allowed to be an extension node.\n *\n * Return a pointer to original node in tree, i.e., the child\n * of the newly introduced expansion node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_group(\n\t__isl_take isl_schedule_node *node, __isl_take isl_id *group_id)\n{\n\tstruct isl_schedule_group_data data = { 0 };\n\tisl_space *space;\n\tisl_union_set 
*domain;\n\tisl_union_pw_multi_aff *contraction;\n\tisl_union_map *expansion;\n\tisl_bool disjoint;\n\tisl_size depth;\n\n\tdepth = isl_schedule_node_get_schedule_depth(node);\n\tif (depth < 0 || !group_id)\n\t\tgoto error;\n\tif (check_insert(node) < 0)\n\t\tgoto error;\n\n\tdomain = isl_schedule_node_get_domain(node);\n\tdata.domain = isl_union_set_copy(domain);\n\tdata.domain_universe = isl_union_set_copy(domain);\n\tdata.domain_universe = isl_union_set_universe(data.domain_universe);\n\n\tdata.dim = depth;\n\tif (data.dim == 0) {\n\t\tisl_ctx *ctx;\n\t\tisl_set *set;\n\t\tisl_union_set *group;\n\t\tisl_union_map *univ;\n\n\t\tctx = isl_schedule_node_get_ctx(node);\n\t\tspace = isl_space_set_alloc(ctx, 0, 0);\n\t\tspace = isl_space_set_tuple_id(space, isl_dim_set, group_id);\n\t\tset = isl_set_universe(isl_space_copy(space));\n\t\tgroup = isl_union_set_from_set(set);\n\t\texpansion = isl_union_map_from_domain_and_range(domain, group);\n\t\tuniv = isl_union_map_universe(isl_union_map_copy(expansion));\n\t\tcontraction = isl_union_pw_multi_aff_from_union_map(univ);\n\t\texpansion = isl_union_map_reverse(expansion);\n\t} else {\n\t\tisl_multi_union_pw_aff *prefix;\n\t\tisl_union_set *univ;\n\n\t\tprefix =\n\t\tisl_schedule_node_get_prefix_schedule_multi_union_pw_aff(node);\n\t\tprefix = isl_multi_union_pw_aff_set_tuple_id(prefix,\n\t\t\t\t\t\t\tisl_dim_set, group_id);\n\t\tspace = isl_multi_union_pw_aff_get_space(prefix);\n\t\tcontraction = isl_union_pw_multi_aff_from_multi_union_pw_aff(\n\t\t\t\t\t\t\tprefix);\n\t\tuniv = isl_union_set_universe(isl_union_set_copy(domain));\n\t\tcontraction =\n\t\t    isl_union_pw_multi_aff_intersect_domain(contraction, univ);\n\t\texpansion = isl_union_map_from_union_pw_multi_aff(\n\t\t\t\t    isl_union_pw_multi_aff_copy(contraction));\n\t\texpansion = isl_union_map_reverse(expansion);\n\t\texpansion = isl_union_map_intersect_range(expansion, domain);\n\t}\n\tspace = isl_space_map_from_set(space);\n\tdata.sched = 
isl_multi_aff_identity(space);\n\tdata.group = isl_union_map_domain(isl_union_map_copy(expansion));\n\tdata.group = isl_union_set_coalesce(data.group);\n\tdata.group_universe = isl_union_set_copy(data.group);\n\tdata.group_universe = isl_union_set_universe(data.group_universe);\n\tdata.expansion = isl_union_map_copy(expansion);\n\tdata.contraction = isl_union_pw_multi_aff_copy(contraction);\n\tnode = isl_schedule_node_insert_expansion(node, contraction, expansion);\n\n\tdisjoint = isl_union_set_is_disjoint(data.domain_universe,\n\t\t\t\t\t    data.group_universe);\n\n\tnode = update_ancestors(node, &group_ancestor, &data);\n\n\tisl_union_set_free(data.domain);\n\tisl_union_set_free(data.domain_universe);\n\tisl_union_set_free(data.group);\n\tisl_union_set_free(data.group_universe);\n\tisl_multi_aff_free(data.sched);\n\tisl_union_map_free(data.expansion);\n\tisl_union_pw_multi_aff_free(data.contraction);\n\n\tnode = isl_schedule_node_child(node, 0);\n\n\tif (!node || disjoint < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (!disjoint)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"group instances already reach node\",\n\t\t\treturn isl_schedule_node_free(node));\n\n\treturn node;\nerror:\n\tisl_schedule_node_free(node);\n\tisl_id_free(group_id);\n\treturn NULL;\n}\n\n/* Compute the gist of the given band node with respect to \"context\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_gist(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *context)\n{\n\tisl_schedule_tree *tree;\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_band_gist(tree, context);\n\treturn isl_schedule_node_graft_tree(node, tree);\n}\n\n/* Internal data structure for isl_schedule_node_gist.\n * \"n_expansion\" is the number of outer expansion nodes\n * with respect to the current position\n * \"filters\" contains an element for each outer filter, expansion or\n * extension node with respect to the current 
position, each representing\n * the intersection of the previous element and the filter on the filter node\n * or the expansion/extension of the previous element.\n * The first element is the original context passed to isl_schedule_node_gist.\n */\nstruct isl_node_gist_data {\n\tint n_expansion;\n\tisl_union_set_list *filters;\n};\n\n/* Enter the expansion node \"node\" during a isl_schedule_node_gist traversal.\n *\n * In particular, add an extra element to data->filters containing\n * the expansion of the previous element and replace the expansion\n * and contraction on \"node\" by the gist with respect to these filters.\n * Also keep track of the fact that we have entered another expansion.\n */\nstatic __isl_give isl_schedule_node *gist_enter_expansion(\n\t__isl_take isl_schedule_node *node, struct isl_node_gist_data *data)\n{\n\tisl_size n;\n\tisl_union_set *inner;\n\tisl_union_map *expansion;\n\tisl_union_pw_multi_aff *contraction;\n\n\tdata->n_expansion++;\n\n\tn = isl_union_set_list_n_union_set(data->filters);\n\tif (n < 0)\n\t\treturn isl_schedule_node_free(node);\n\tinner = isl_union_set_list_get_union_set(data->filters, n - 1);\n\texpansion = isl_schedule_node_expansion_get_expansion(node);\n\tinner = isl_union_set_apply(inner, expansion);\n\n\tcontraction = isl_schedule_node_expansion_get_contraction(node);\n\tcontraction = isl_union_pw_multi_aff_gist(contraction,\n\t\t\t\t\t\tisl_union_set_copy(inner));\n\n\tdata->filters = isl_union_set_list_add(data->filters, inner);\n\n\tinner = isl_union_set_list_get_union_set(data->filters, n - 1);\n\texpansion = isl_schedule_node_expansion_get_expansion(node);\n\texpansion = isl_union_map_gist_domain(expansion, inner);\n\tnode = isl_schedule_node_expansion_set_contraction_and_expansion(node,\n\t\t\t\t\t\tcontraction, expansion);\n\n\treturn node;\n}\n\n/* Leave the expansion node \"node\" during a isl_schedule_node_gist traversal.\n *\n * In particular, remove the element in data->filters that was added by\n * 
gist_enter_expansion and decrement the number of outer expansions.\n *\n * The expansion has already been simplified in gist_enter_expansion.\n * If this simplification results in an identity expansion, then\n * it is removed here.\n */\nstatic __isl_give isl_schedule_node *gist_leave_expansion(\n\t__isl_take isl_schedule_node *node, struct isl_node_gist_data *data)\n{\n\tisl_size n;\n\tisl_bool identity;\n\tisl_union_map *expansion;\n\n\texpansion = isl_schedule_node_expansion_get_expansion(node);\n\tidentity = isl_union_map_is_identity(expansion);\n\tisl_union_map_free(expansion);\n\n\tif (identity < 0)\n\t\tnode = isl_schedule_node_free(node);\n\telse if (identity)\n\t\tnode = isl_schedule_node_delete(node);\n\n\tn = isl_union_set_list_n_union_set(data->filters);\n\tif (n < 0)\n\t\treturn isl_schedule_node_free(node);\n\tdata->filters = isl_union_set_list_drop(data->filters, n - 1, 1);\n\n\tdata->n_expansion--;\n\n\treturn node;\n}\n\n/* Enter the extension node \"node\" during a isl_schedule_node_gist traversal.\n *\n * In particular, add an extra element to data->filters containing\n * the union of the previous element with the additional domain elements\n * introduced by the extension.\n */\nstatic __isl_give isl_schedule_node *gist_enter_extension(\n\t__isl_take isl_schedule_node *node, struct isl_node_gist_data *data)\n{\n\tisl_size n;\n\tisl_union_set *inner, *extra;\n\tisl_union_map *extension;\n\n\tn = isl_union_set_list_n_union_set(data->filters);\n\tif (n < 0)\n\t\treturn isl_schedule_node_free(node);\n\tinner = isl_union_set_list_get_union_set(data->filters, n - 1);\n\textension = isl_schedule_node_extension_get_extension(node);\n\textra = isl_union_map_range(extension);\n\tinner = isl_union_set_union(inner, extra);\n\n\tdata->filters = isl_union_set_list_add(data->filters, inner);\n\n\treturn node;\n}\n\n/* Can we finish gisting at this node?\n * That is, is the filter on the current filter node a subset of\n * the original context passed to 
isl_schedule_node_gist?\n * If we have gone through any expansions, then we cannot perform\n * this test since the current domain elements are incomparable\n * to the domain elements in the original context.\n */\nstatic isl_bool gist_done(__isl_keep isl_schedule_node *node,\n\tstruct isl_node_gist_data *data)\n{\n\tisl_union_set *filter, *outer;\n\tisl_bool subset;\n\n\tif (data->n_expansion != 0)\n\t\treturn isl_bool_false;\n\n\tfilter = isl_schedule_node_filter_get_filter(node);\n\touter = isl_union_set_list_get_union_set(data->filters, 0);\n\tsubset = isl_union_set_is_subset(filter, outer);\n\tisl_union_set_free(outer);\n\tisl_union_set_free(filter);\n\n\treturn subset;\n}\n\n/* Callback for \"traverse\" to enter a node and to move\n * to the deepest initial subtree that should be traversed\n * by isl_schedule_node_gist.\n *\n * The \"filters\" list is extended by one element each time\n * we come across a filter node by the result of intersecting\n * the last element in the list with the filter on the filter node.\n *\n * If the filter on the current filter node is a subset of\n * the original context passed to isl_schedule_node_gist,\n * then there is no need to go into its subtree since it cannot\n * be further simplified by the context.  The \"filters\" list is\n * still extended for consistency, but the actual value of the\n * added element is immaterial since it will not be used.\n *\n * Otherwise, the filter on the current filter node is replaced by\n * the gist of the original filter with respect to the intersection\n * of the original context with the intermediate filters.\n *\n * If the new element in the \"filters\" list is empty, then no elements\n * can reach the descendants of the current filter node.  
The subtree\n * underneath the filter node is therefore removed.\n *\n * Each expansion node we come across is handled by\n * gist_enter_expansion.\n *\n * Each extension node we come across is handled by\n * gist_enter_extension.\n */\nstatic __isl_give isl_schedule_node *gist_enter(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tstruct isl_node_gist_data *data = user;\n\n\tdo {\n\t\tisl_union_set *filter, *inner;\n\t\tisl_bool done, empty;\n\t\tisl_size n;\n\n\t\tswitch (isl_schedule_node_get_type(node)) {\n\t\tcase isl_schedule_node_error:\n\t\t\treturn isl_schedule_node_free(node);\n\t\tcase isl_schedule_node_expansion:\n\t\t\tnode = gist_enter_expansion(node, data);\n\t\t\tcontinue;\n\t\tcase isl_schedule_node_extension:\n\t\t\tnode = gist_enter_extension(node, data);\n\t\t\tcontinue;\n\t\tcase isl_schedule_node_band:\n\t\tcase isl_schedule_node_context:\n\t\tcase isl_schedule_node_domain:\n\t\tcase isl_schedule_node_guard:\n\t\tcase isl_schedule_node_leaf:\n\t\tcase isl_schedule_node_mark:\n\t\tcase isl_schedule_node_sequence:\n\t\tcase isl_schedule_node_set:\n\t\t\tcontinue;\n\t\tcase isl_schedule_node_filter:\n\t\t\tbreak;\n\t\t}\n\t\tdone = gist_done(node, data);\n\t\tfilter = isl_schedule_node_filter_get_filter(node);\n\t\tn = isl_union_set_list_n_union_set(data->filters);\n\t\tif (n < 0 || done < 0 || done) {\n\t\t\tdata->filters = isl_union_set_list_add(data->filters,\n\t\t\t\t\t\t\t\tfilter);\n\t\t\tif (n < 0 || done < 0)\n\t\t\t\treturn isl_schedule_node_free(node);\n\t\t\treturn node;\n\t\t}\n\t\tinner = isl_union_set_list_get_union_set(data->filters, n - 1);\n\t\tfilter = isl_union_set_gist(filter, isl_union_set_copy(inner));\n\t\tnode = isl_schedule_node_filter_set_filter(node,\n\t\t\t\t\t\tisl_union_set_copy(filter));\n\t\tfilter = isl_union_set_intersect(filter, inner);\n\t\tempty = isl_union_set_is_empty(filter);\n\t\tdata->filters = isl_union_set_list_add(data->filters, filter);\n\t\tif (empty < 0)\n\t\t\treturn 
isl_schedule_node_free(node);\n\t\tif (!empty)\n\t\t\tcontinue;\n\t\tnode = isl_schedule_node_child(node, 0);\n\t\tnode = isl_schedule_node_cut(node);\n\t\tnode = isl_schedule_node_parent(node);\n\t\treturn node;\n\t} while (isl_schedule_node_has_children(node) &&\n\t\t(node = isl_schedule_node_first_child(node)) != NULL);\n\n\treturn node;\n}\n\n/* Callback for \"traverse\" to leave a node for isl_schedule_node_gist.\n *\n * In particular, if the current node is a filter node, then we remove\n * the element on the \"filters\" list that was added when we entered\n * the node.  There is no need to compute any gist here, since we\n * already did that when we entered the node.\n *\n * Expansion nodes are handled by gist_leave_expansion.\n *\n * If the current node is an extension, then remove the element\n * in data->filters that was added by gist_enter_extension.\n *\n * If the current node is a band node, then we compute the gist of\n * the band node with respect to the intersection of the original context\n * and the intermediate filters.\n *\n * If the current node is a sequence or set node, then some of\n * the filter children may have become empty and so they are removed.\n * If only one child is left, then the set or sequence node along with\n * the single remaining child filter is removed.  
The filter can be\n * removed because the filters on a sequence or set node are supposed\n * to partition the incoming domain instances.\n * In principle, it should then be impossible for there to be zero\n * remaining children, but should this happen, we replace the entire\n * subtree with an empty filter.\n */\nstatic __isl_give isl_schedule_node *gist_leave(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tstruct isl_node_gist_data *data = user;\n\tisl_schedule_tree *tree;\n\tint i;\n\tisl_size n;\n\tisl_union_set *filter;\n\n\tswitch (isl_schedule_node_get_type(node)) {\n\tcase isl_schedule_node_error:\n\t\treturn isl_schedule_node_free(node);\n\tcase isl_schedule_node_expansion:\n\t\tnode = gist_leave_expansion(node, data);\n\t\tbreak;\n\tcase isl_schedule_node_extension:\n\tcase isl_schedule_node_filter:\n\t\tn = isl_union_set_list_n_union_set(data->filters);\n\t\tif (n < 0)\n\t\t\treturn isl_schedule_node_free(node);\n\t\tdata->filters = isl_union_set_list_drop(data->filters,\n\t\t\t\t\t\t\tn - 1, 1);\n\t\tbreak;\n\tcase isl_schedule_node_band:\n\t\tn = isl_union_set_list_n_union_set(data->filters);\n\t\tif (n < 0)\n\t\t\treturn isl_schedule_node_free(node);\n\t\tfilter = isl_union_set_list_get_union_set(data->filters, n - 1);\n\t\tnode = isl_schedule_node_band_gist(node, filter);\n\t\tbreak;\n\tcase isl_schedule_node_set:\n\tcase isl_schedule_node_sequence:\n\t\ttree = isl_schedule_node_get_tree(node);\n\t\tn = isl_schedule_tree_n_children(tree);\n\t\tif (n < 0)\n\t\t\ttree = isl_schedule_tree_free(tree);\n\t\tfor (i = n - 1; i >= 0; --i) {\n\t\t\tisl_schedule_tree *child;\n\t\t\tisl_union_set *filter;\n\t\t\tisl_bool empty;\n\n\t\t\tchild = isl_schedule_tree_get_child(tree, i);\n\t\t\tfilter = isl_schedule_tree_filter_get_filter(child);\n\t\t\tempty = isl_union_set_is_empty(filter);\n\t\t\tisl_union_set_free(filter);\n\t\t\tisl_schedule_tree_free(child);\n\t\t\tif (empty < 0)\n\t\t\t\ttree = isl_schedule_tree_free(tree);\n\t\t\telse if 
(empty)\n\t\t\t\ttree = isl_schedule_tree_drop_child(tree, i);\n\t\t}\n\t\tn = isl_schedule_tree_n_children(tree);\n\t\tif (n < 0)\n\t\t\ttree = isl_schedule_tree_free(tree);\n\t\tnode = isl_schedule_node_graft_tree(node, tree);\n\t\tif (n == 1) {\n\t\t\tnode = isl_schedule_node_delete(node);\n\t\t\tnode = isl_schedule_node_delete(node);\n\t\t} else if (n == 0) {\n\t\t\tisl_space *space;\n\n\t\t\tfilter =\n\t\t\t    isl_union_set_list_get_union_set(data->filters, 0);\n\t\t\tspace = isl_union_set_get_space(filter);\n\t\t\tisl_union_set_free(filter);\n\t\t\tfilter = isl_union_set_empty(space);\n\t\t\tnode = isl_schedule_node_cut(node);\n\t\t\tnode = isl_schedule_node_insert_filter(node, filter);\n\t\t}\n\t\tbreak;\n\tcase isl_schedule_node_context:\n\tcase isl_schedule_node_domain:\n\tcase isl_schedule_node_guard:\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_mark:\n\t\tbreak;\n\t}\n\n\treturn node;\n}\n\n/* Compute the gist of the subtree at \"node\" with respect to\n * the reaching domain elements in \"context\".\n * In particular, compute the gist of all band and filter nodes\n * in the subtree with respect to \"context\".  
Children of set or sequence\n * nodes that end up with an empty filter are removed completely.\n *\n * We keep track of the intersection of \"context\" with all outer filters\n * of the current node within the subtree in the final element of \"filters\".\n * Initially, this list contains the single element \"context\" and it is\n * extended or shortened each time we enter or leave a filter node.\n */\n__isl_give isl_schedule_node *isl_schedule_node_gist(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *context)\n{\n\tstruct isl_node_gist_data data;\n\n\tdata.n_expansion = 0;\n\tdata.filters = isl_union_set_list_from_union_set(context);\n\tnode = traverse(node, &gist_enter, &gist_leave, &data);\n\tisl_union_set_list_free(data.filters);\n\treturn node;\n}\n\n/* Intersect the domain of domain node \"node\" with \"domain\".\n *\n * If the domain of \"node\" is already a subset of \"domain\",\n * then nothing needs to be changed.\n *\n * Otherwise, we replace the domain of the domain node by the intersection\n * and simplify the subtree rooted at \"node\" with respect to this intersection.\n */\n__isl_give isl_schedule_node *isl_schedule_node_domain_intersect_domain(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *domain)\n{\n\tisl_schedule_tree *tree;\n\tisl_union_set *uset;\n\tint is_subset;\n\n\tif (!node || !domain)\n\t\tgoto error;\n\n\tuset = isl_schedule_tree_domain_get_domain(node->tree);\n\tis_subset = isl_union_set_is_subset(uset, domain);\n\tisl_union_set_free(uset);\n\tif (is_subset < 0)\n\t\tgoto error;\n\tif (is_subset) {\n\t\tisl_union_set_free(domain);\n\t\treturn node;\n\t}\n\n\ttree = isl_schedule_tree_copy(node->tree);\n\tuset = isl_schedule_tree_domain_get_domain(tree);\n\tuset = isl_union_set_intersect(uset, domain);\n\ttree = isl_schedule_tree_domain_set_domain(tree,\n\t\t\t\t\t\t    isl_union_set_copy(uset));\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = 
isl_schedule_node_gist(node, uset);\n\tnode = isl_schedule_node_parent(node);\n\n\treturn node;\nerror:\n\tisl_schedule_node_free(node);\n\tisl_union_set_free(domain);\n\treturn NULL;\n}\n\n/* Replace the domain of domain node \"node\" with the gist\n * of the original domain with respect to the parameter domain \"context\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_domain_gist_params(\n\t__isl_take isl_schedule_node *node, __isl_take isl_set *context)\n{\n\tisl_union_set *domain;\n\tisl_schedule_tree *tree;\n\n\tif (!node || !context)\n\t\tgoto error;\n\n\ttree = isl_schedule_tree_copy(node->tree);\n\tdomain = isl_schedule_tree_domain_get_domain(node->tree);\n\tdomain = isl_union_set_gist_params(domain, context);\n\ttree = isl_schedule_tree_domain_set_domain(tree, domain);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\nerror:\n\tisl_schedule_node_free(node);\n\tisl_set_free(context);\n\treturn NULL;\n}\n\n/* Internal data structure for isl_schedule_node_get_subtree_expansion.\n * \"expansions\" contains a list of accumulated expansions\n * for each outer expansion, set or sequence node.  
The first element\n * in the list is an identity mapping on the reaching domain elements.\n * \"res\" collects the results.\n */\nstruct isl_subtree_expansion_data {\n\tisl_union_map_list *expansions;\n\tisl_union_map *res;\n};\n\n/* Callback for \"traverse\" to enter a node and to move\n * to the deepest initial subtree that should be traversed\n * by isl_schedule_node_get_subtree_expansion.\n *\n * Whenever we come across an expansion node, the last element\n * of data->expansions is combined with the expansion\n * on the expansion node.\n *\n * Whenever we come across a filter node that is the child\n * of a set or sequence node, data->expansions is extended\n * with a new element that restricts the previous element\n * to the elements selected by the filter.\n * The previous element can then be reused while backtracking.\n */\nstatic __isl_give isl_schedule_node *subtree_expansion_enter(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tstruct isl_subtree_expansion_data *data = user;\n\n\tdo {\n\t\tenum isl_schedule_node_type type;\n\t\tisl_union_set *filter;\n\t\tisl_union_map *inner, *expansion;\n\t\tisl_size n;\n\n\t\tswitch (isl_schedule_node_get_type(node)) {\n\t\tcase isl_schedule_node_error:\n\t\t\treturn isl_schedule_node_free(node);\n\t\tcase isl_schedule_node_filter:\n\t\t\ttype = isl_schedule_node_get_parent_type(node);\n\t\t\tif (type != isl_schedule_node_set &&\n\t\t\t    type != isl_schedule_node_sequence)\n\t\t\t\tbreak;\n\t\t\tfilter = isl_schedule_node_filter_get_filter(node);\n\t\t\tn = isl_union_map_list_n_union_map(data->expansions);\n\t\t\tif (n < 0)\n\t\t\t\tdata->expansions =\n\t\t\t\t    isl_union_map_list_free(data->expansions);\n\t\t\tinner =\n\t\t\t    isl_union_map_list_get_union_map(data->expansions,\n\t\t\t\t\t\t\t\tn - 1);\n\t\t\tinner = isl_union_map_intersect_range(inner, filter);\n\t\t\tdata->expansions =\n\t\t\t    isl_union_map_list_add(data->expansions, inner);\n\t\t\tbreak;\n\t\tcase 
isl_schedule_node_expansion:\n\t\t\tn = isl_union_map_list_n_union_map(data->expansions);\n\t\t\tif (n < 0)\n\t\t\t\tdata->expansions =\n\t\t\t\t    isl_union_map_list_free(data->expansions);\n\t\t\texpansion =\n\t\t\t\tisl_schedule_node_expansion_get_expansion(node);\n\t\t\tinner =\n\t\t\t    isl_union_map_list_get_union_map(data->expansions,\n\t\t\t\t\t\t\t\tn - 1);\n\t\t\tinner = isl_union_map_apply_range(inner, expansion);\n\t\t\tdata->expansions =\n\t\t\t    isl_union_map_list_set_union_map(data->expansions,\n\t\t\t\t\t\t\t\tn - 1, inner);\n\t\t\tbreak;\n\t\tcase isl_schedule_node_band:\n\t\tcase isl_schedule_node_context:\n\t\tcase isl_schedule_node_domain:\n\t\tcase isl_schedule_node_extension:\n\t\tcase isl_schedule_node_guard:\n\t\tcase isl_schedule_node_leaf:\n\t\tcase isl_schedule_node_mark:\n\t\tcase isl_schedule_node_sequence:\n\t\tcase isl_schedule_node_set:\n\t\t\tbreak;\n\t\t}\n\t} while (isl_schedule_node_has_children(node) &&\n\t\t(node = isl_schedule_node_first_child(node)) != NULL);\n\n\treturn node;\n}\n\n/* Callback for \"traverse\" to leave a node for\n * isl_schedule_node_get_subtree_expansion.\n *\n * If we come across a filter node that is the child\n * of a set or sequence node, then we remove the element\n * of data->expansions that was added in subtree_expansion_enter.\n *\n * If we reach a leaf node, then the accumulated expansion is\n * added to data->res.\n */\nstatic __isl_give isl_schedule_node *subtree_expansion_leave(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tstruct isl_subtree_expansion_data *data = user;\n\tisl_size n;\n\tisl_union_map *inner;\n\tenum isl_schedule_node_type type;\n\n\tswitch (isl_schedule_node_get_type(node)) {\n\tcase isl_schedule_node_error:\n\t\treturn isl_schedule_node_free(node);\n\tcase isl_schedule_node_filter:\n\t\ttype = isl_schedule_node_get_parent_type(node);\n\t\tif (type != isl_schedule_node_set &&\n\t\t    type != isl_schedule_node_sequence)\n\t\t\tbreak;\n\t\tn = 
isl_union_map_list_n_union_map(data->expansions);\n\t\tif (n < 0)\n\t\t\tdata->expansions =\n\t\t\t\t    isl_union_map_list_free(data->expansions);\n\t\tdata->expansions = isl_union_map_list_drop(data->expansions,\n\t\t\t\t\t\t\tn - 1, 1);\n\t\tbreak;\n\tcase isl_schedule_node_leaf:\n\t\tn = isl_union_map_list_n_union_map(data->expansions);\n\t\tif (n < 0)\n\t\t\tdata->expansions =\n\t\t\t\t    isl_union_map_list_free(data->expansions);\n\t\tinner = isl_union_map_list_get_union_map(data->expansions,\n\t\t\t\t\t\t\tn - 1);\n\t\tdata->res = isl_union_map_union(data->res, inner);\n\t\tbreak;\n\tcase isl_schedule_node_band:\n\tcase isl_schedule_node_context:\n\tcase isl_schedule_node_domain:\n\tcase isl_schedule_node_expansion:\n\tcase isl_schedule_node_extension:\n\tcase isl_schedule_node_guard:\n\tcase isl_schedule_node_mark:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\tbreak;\n\t}\n\n\treturn node;\n}\n\n/* Return a mapping from the domain elements that reach \"node\"\n * to the corresponding domain elements in the leaves of the subtree\n * rooted at \"node\" obtained by composing the intermediate expansions.\n *\n * We start out with an identity mapping between the domain elements\n * that reach \"node\" and compose it with all the expansions\n * on a path from \"node\" to a leaf while traversing the subtree.\n * Within the children of a sequence or set node, the\n * accumulated expansion is restricted to the elements selected\n * by the filter child.\n */\n__isl_give isl_union_map *isl_schedule_node_get_subtree_expansion(\n\t__isl_keep isl_schedule_node *node)\n{\n\tstruct isl_subtree_expansion_data data;\n\tisl_space *space;\n\tisl_union_set *domain;\n\tisl_union_map *expansion;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tdomain = isl_schedule_node_get_universe_domain(node);\n\tspace = isl_union_set_get_space(domain);\n\texpansion = isl_union_set_identity(domain);\n\tdata.res = isl_union_map_empty(space);\n\tdata.expansions = 
isl_union_map_list_from_union_map(expansion);\n\n\tnode = isl_schedule_node_copy(node);\n\tnode = traverse(node, &subtree_expansion_enter,\n\t\t\t&subtree_expansion_leave, &data);\n\tif (!node)\n\t\tdata.res = isl_union_map_free(data.res);\n\tisl_schedule_node_free(node);\n\n\tisl_union_map_list_free(data.expansions);\n\n\treturn data.res;\n}\n\n/* Internal data structure for isl_schedule_node_get_subtree_contraction.\n * \"contractions\" contains a list of accumulated contractions\n * for each outer expansion, set or sequence node.  The first element\n * in the list is an identity mapping on the reaching domain elements.\n * \"res\" collects the results.\n */\nstruct isl_subtree_contraction_data {\n\tisl_union_pw_multi_aff_list *contractions;\n\tisl_union_pw_multi_aff *res;\n};\n\n/* Callback for \"traverse\" to enter a node and to move\n * to the deepest initial subtree that should be traversed\n * by isl_schedule_node_get_subtree_contraction.\n *\n * Whenever we come across an expansion node, the last element\n * of data->contractions is combined with the contraction\n * on the expansion node.\n *\n * Whenever we come across a filter node that is the child\n * of a set or sequence node, data->contractions is extended\n * with a new element that restricts the previous element\n * to the elements selected by the filter.\n * The previous element can then be reused while backtracking.\n */\nstatic __isl_give isl_schedule_node *subtree_contraction_enter(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tstruct isl_subtree_contraction_data *data = user;\n\n\tdo {\n\t\tenum isl_schedule_node_type type;\n\t\tisl_union_set *filter;\n\t\tisl_union_pw_multi_aff *inner, *contraction;\n\t\tisl_size n;\n\n\t\tswitch (isl_schedule_node_get_type(node)) {\n\t\tcase isl_schedule_node_error:\n\t\t\treturn isl_schedule_node_free(node);\n\t\tcase isl_schedule_node_filter:\n\t\t\ttype = isl_schedule_node_get_parent_type(node);\n\t\t\tif (type != isl_schedule_node_set 
&&\n\t\t\t    type != isl_schedule_node_sequence)\n\t\t\t\tbreak;\n\t\t\tfilter = isl_schedule_node_filter_get_filter(node);\n\t\t\tn = isl_union_pw_multi_aff_list_n_union_pw_multi_aff(\n\t\t\t\t\t\tdata->contractions);\n\t\t\tif (n < 0)\n\t\t\t\tdata->contractions =\n\t\t\t\t    isl_union_pw_multi_aff_list_free(\n\t\t\t\t\t\t\t    data->contractions);\n\t\t\tinner =\n\t\t\t    isl_union_pw_multi_aff_list_get_union_pw_multi_aff(\n\t\t\t\t\t\tdata->contractions, n - 1);\n\t\t\tinner = isl_union_pw_multi_aff_intersect_domain(inner,\n\t\t\t\t\t\t\t\tfilter);\n\t\t\tdata->contractions =\n\t\t\t    isl_union_pw_multi_aff_list_add(data->contractions,\n\t\t\t\t\t\t\t\tinner);\n\t\t\tbreak;\n\t\tcase isl_schedule_node_expansion:\n\t\t\tn = isl_union_pw_multi_aff_list_n_union_pw_multi_aff(\n\t\t\t\t\t\tdata->contractions);\n\t\t\tif (n < 0)\n\t\t\t\tdata->contractions =\n\t\t\t\t    isl_union_pw_multi_aff_list_free(\n\t\t\t\t\t\t\t    data->contractions);\n\t\t\tcontraction =\n\t\t\t    isl_schedule_node_expansion_get_contraction(node);\n\t\t\tinner =\n\t\t\t    isl_union_pw_multi_aff_list_get_union_pw_multi_aff(\n\t\t\t\t\t\tdata->contractions, n - 1);\n\t\t\tinner =\n\t\t\t    isl_union_pw_multi_aff_pullback_union_pw_multi_aff(\n\t\t\t\t\t\tinner, contraction);\n\t\t\tdata->contractions =\n\t\t\t    isl_union_pw_multi_aff_list_set_union_pw_multi_aff(\n\t\t\t\t\tdata->contractions, n - 1, inner);\n\t\t\tbreak;\n\t\tcase isl_schedule_node_band:\n\t\tcase isl_schedule_node_context:\n\t\tcase isl_schedule_node_domain:\n\t\tcase isl_schedule_node_extension:\n\t\tcase isl_schedule_node_guard:\n\t\tcase isl_schedule_node_leaf:\n\t\tcase isl_schedule_node_mark:\n\t\tcase isl_schedule_node_sequence:\n\t\tcase isl_schedule_node_set:\n\t\t\tbreak;\n\t\t}\n\t} while (isl_schedule_node_has_children(node) &&\n\t\t(node = isl_schedule_node_first_child(node)) != NULL);\n\n\treturn node;\n}\n\n/* Callback for \"traverse\" to leave a node for\n * 
isl_schedule_node_get_subtree_contraction.\n *\n * If we come across a filter node that is the child\n * of a set or sequence node, then we remove the element\n * of data->contractions that was added in subtree_contraction_enter.\n *\n * If we reach a leaf node, then the accumulated contraction is\n * added to data->res.\n */\nstatic __isl_give isl_schedule_node *subtree_contraction_leave(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tstruct isl_subtree_contraction_data *data = user;\n\tisl_size n;\n\tisl_union_pw_multi_aff *inner;\n\tenum isl_schedule_node_type type;\n\n\tswitch (isl_schedule_node_get_type(node)) {\n\tcase isl_schedule_node_error:\n\t\treturn isl_schedule_node_free(node);\n\tcase isl_schedule_node_filter:\n\t\ttype = isl_schedule_node_get_parent_type(node);\n\t\tif (type != isl_schedule_node_set &&\n\t\t    type != isl_schedule_node_sequence)\n\t\t\tbreak;\n\t\tn = isl_union_pw_multi_aff_list_n_union_pw_multi_aff(\n\t\t\t\t\t\tdata->contractions);\n\t\tif (n < 0)\n\t\t\tdata->contractions = isl_union_pw_multi_aff_list_free(\n\t\t\t\t\t\t\t    data->contractions);\n\t\tdata->contractions =\n\t\t\tisl_union_pw_multi_aff_list_drop(data->contractions,\n\t\t\t\t\t\t\tn - 1, 1);\n\t\tbreak;\n\tcase isl_schedule_node_leaf:\n\t\tn = isl_union_pw_multi_aff_list_n_union_pw_multi_aff(\n\t\t\t\t\t\tdata->contractions);\n\t\tif (n < 0)\n\t\t\tdata->contractions = isl_union_pw_multi_aff_list_free(\n\t\t\t\t\t\t\t    data->contractions);\n\t\tinner = isl_union_pw_multi_aff_list_get_union_pw_multi_aff(\n\t\t\t\t\t\tdata->contractions, n - 1);\n\t\tdata->res = isl_union_pw_multi_aff_union_add(data->res, inner);\n\t\tbreak;\n\tcase isl_schedule_node_band:\n\tcase isl_schedule_node_context:\n\tcase isl_schedule_node_domain:\n\tcase isl_schedule_node_expansion:\n\tcase isl_schedule_node_extension:\n\tcase isl_schedule_node_guard:\n\tcase isl_schedule_node_mark:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\tbreak;\n\t}\n\n\treturn 
node;\n}\n\n/* Return a mapping from the domain elements in the leaves of the subtree\n * rooted at \"node\" to the corresponding domain elements that reach \"node\"\n * obtained by composing the intermediate contractions.\n *\n * We start out with an identity mapping between the domain elements\n * that reach \"node\" and compose it with all the contractions\n * on a path from \"node\" to a leaf while traversing the subtree.\n * Within the children of an a sequence or set node, the\n * accumulated contraction is restricted to the elements selected\n * by the filter child.\n */\n__isl_give isl_union_pw_multi_aff *isl_schedule_node_get_subtree_contraction(\n\t__isl_keep isl_schedule_node *node)\n{\n\tstruct isl_subtree_contraction_data data;\n\tisl_space *space;\n\tisl_union_set *domain;\n\tisl_union_pw_multi_aff *contraction;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tdomain = isl_schedule_node_get_universe_domain(node);\n\tspace = isl_union_set_get_space(domain);\n\tcontraction = isl_union_set_identity_union_pw_multi_aff(domain);\n\tdata.res = isl_union_pw_multi_aff_empty(space);\n\tdata.contractions =\n\t    isl_union_pw_multi_aff_list_from_union_pw_multi_aff(contraction);\n\n\tnode = isl_schedule_node_copy(node);\n\tnode = traverse(node, &subtree_contraction_enter,\n\t\t\t&subtree_contraction_leave, &data);\n\tif (!node)\n\t\tdata.res = isl_union_pw_multi_aff_free(data.res);\n\tisl_schedule_node_free(node);\n\n\tisl_union_pw_multi_aff_list_free(data.contractions);\n\n\treturn data.res;\n}\n\n/* Do the nearest \"n\" ancestors of \"node\" have the types given in \"types\"\n * (starting at the parent of \"node\")?\n */\nstatic isl_bool has_ancestors(__isl_keep isl_schedule_node *node,\n\tint n, enum isl_schedule_node_type *types)\n{\n\tint i;\n\tisl_size n_ancestor;\n\n\tif (!node)\n\t\treturn isl_bool_error;\n\n\tn_ancestor = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n_ancestor < 0)\n\t\treturn isl_bool_error;\n\tif (n_ancestor < n)\n\t\treturn 
isl_bool_false;\n\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_schedule_tree *tree;\n\t\tint correct_type;\n\n\t\ttree = isl_schedule_tree_list_get_schedule_tree(node->ancestors,\n\t\t\t\t\t\t\t    n_ancestor - 1 - i);\n\t\tif (!tree)\n\t\t\treturn isl_bool_error;\n\t\tcorrect_type = isl_schedule_tree_get_type(tree) == types[i];\n\t\tisl_schedule_tree_free(tree);\n\t\tif (!correct_type)\n\t\t\treturn isl_bool_false;\n\t}\n\n\treturn isl_bool_true;\n}\n\n/* Given a node \"node\" that appears in an extension (i.e., it is the child\n * of a filter in a sequence inside an extension node), are the spaces\n * of the extension specified by \"extension\" disjoint from those\n * of both the original extension and the domain elements that reach\n * that original extension?\n */\nstatic int is_disjoint_extension(__isl_keep isl_schedule_node *node,\n\t__isl_keep isl_union_map *extension)\n{\n\tisl_union_map *old;\n\tisl_union_set *domain;\n\tint empty;\n\n\tnode = isl_schedule_node_copy(node);\n\tnode = isl_schedule_node_parent(node);\n\tnode = isl_schedule_node_parent(node);\n\tnode = isl_schedule_node_parent(node);\n\told = isl_schedule_node_extension_get_extension(node);\n\tdomain = isl_schedule_node_get_universe_domain(node);\n\tisl_schedule_node_free(node);\n\told = isl_union_map_universe(old);\n\tdomain = isl_union_set_union(domain, isl_union_map_range(old));\n\textension = isl_union_map_copy(extension);\n\textension = isl_union_map_intersect_range(extension, domain);\n\tempty = isl_union_map_is_empty(extension);\n\tisl_union_map_free(extension);\n\n\treturn empty;\n}\n\n/* Given a node \"node\" that is governed by an extension node, extend\n * that extension node with \"extension\".\n *\n * In particular, \"node\" is the child of a filter in a sequence that\n * is in turn a child of an extension node.  
Extend that extension node\n * with \"extension\".\n *\n * Return a pointer to the parent of the original node (i.e., a filter).\n */\nstatic __isl_give isl_schedule_node *extend_extension(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_map *extension)\n{\n\tisl_size pos;\n\tisl_bool disjoint;\n\tisl_union_map *node_extension;\n\n\tnode = isl_schedule_node_parent(node);\n\tpos = isl_schedule_node_get_child_position(node);\n\tif (pos < 0)\n\t\tnode = isl_schedule_node_free(node);\n\tnode = isl_schedule_node_parent(node);\n\tnode = isl_schedule_node_parent(node);\n\tnode_extension = isl_schedule_node_extension_get_extension(node);\n\tdisjoint = isl_union_map_is_disjoint(extension, node_extension);\n\textension = isl_union_map_union(extension, node_extension);\n\tnode = isl_schedule_node_extension_set_extension(node, extension);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = isl_schedule_node_child(node, pos);\n\n\tif (disjoint < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (!node)\n\t\treturn NULL;\n\tif (!disjoint)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"extension domain should be disjoint from earlier \"\n\t\t\t\"extensions\", return isl_schedule_node_free(node));\n\n\treturn node;\n}\n\n/* Return the universe of \"uset\" if this universe is disjoint from \"ref\".\n * Otherwise, return \"uset\".\n *\n * Also check if \"uset\" itself is disjoint from \"ref\", reporting\n * an error if it is not.\n */\nstatic __isl_give isl_union_set *replace_by_universe_if_disjoint(\n\t__isl_take isl_union_set *uset, __isl_keep isl_union_set *ref)\n{\n\tint disjoint;\n\tisl_union_set *universe;\n\n\tdisjoint = isl_union_set_is_disjoint(uset, ref);\n\tif (disjoint < 0)\n\t\treturn isl_union_set_free(uset);\n\tif (!disjoint)\n\t\tisl_die(isl_union_set_get_ctx(uset), isl_error_invalid,\n\t\t\t\"extension domain should be disjoint from \"\n\t\t\t\"current domain\", return isl_union_set_free(uset));\n\n\tuniverse = 
isl_union_set_universe(isl_union_set_copy(uset));\n\tdisjoint = isl_union_set_is_disjoint(universe, ref);\n\tif (disjoint >= 0 && disjoint) {\n\t\tisl_union_set_free(uset);\n\t\treturn universe;\n\t}\n\tisl_union_set_free(universe);\n\n\tif (disjoint < 0)\n\t\treturn isl_union_set_free(uset);\n\treturn uset;\n}\n\n/* Insert an extension node on top of \"node\" with extension \"extension\".\n * In addition, insert a filter that separates node from the extension\n * between the extension node and \"node\".\n * Return a pointer to the inserted filter node.\n *\n * If \"node\" already appears in an extension (i.e., if it is the child\n * of a filter in a sequence inside an extension node), then extend that\n * extension with \"extension\" instead.\n * In this case, a pointer to the original filter node is returned.\n * Note that if some of the elements in the new extension live in the\n * same space as those of the original extension or the domain elements\n * reaching the original extension, then we insert a new extension anyway.\n * Otherwise, we would have to adjust the filters in the sequence child\n * of the extension to ensure that the elements in the new extension\n * are filtered out.\n */\nstatic __isl_give isl_schedule_node *insert_extension(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_map *extension)\n{\n\tenum isl_schedule_node_type ancestors[] =\n\t\t{ isl_schedule_node_filter, isl_schedule_node_sequence,\n\t\t  isl_schedule_node_extension };\n\tisl_union_set *domain;\n\tisl_union_set *filter;\n\tisl_bool in_ext;\n\n\tin_ext = has_ancestors(node, 3, ancestors);\n\tif (in_ext < 0)\n\t\tgoto error;\n\tif (in_ext) {\n\t\tint disjoint;\n\n\t\tdisjoint = is_disjoint_extension(node, extension);\n\t\tif (disjoint < 0)\n\t\t\tgoto error;\n\t\tif (disjoint)\n\t\t\treturn extend_extension(node, extension);\n\t}\n\n\tfilter = isl_schedule_node_get_domain(node);\n\tdomain = isl_union_map_range(isl_union_map_copy(extension));\n\tfilter = 
replace_by_universe_if_disjoint(filter, domain);\n\tisl_union_set_free(domain);\n\n\tnode = isl_schedule_node_insert_filter(node, filter);\n\tnode = isl_schedule_node_insert_extension(node, extension);\n\tnode = isl_schedule_node_child(node, 0);\n\treturn node;\nerror:\n\tisl_schedule_node_free(node);\n\tisl_union_map_free(extension);\n\treturn NULL;\n}\n\n/* Replace the subtree that \"node\" points to by \"tree\" (which has\n * a sequence root with two children), except if the parent of \"node\"\n * is a sequence as well, in which case \"tree\" is spliced at the position\n * of \"node\" in its parent.\n * Return a pointer to the child of the \"tree_pos\" (filter) child of \"tree\"\n * in the updated schedule tree.\n */\nstatic __isl_give isl_schedule_node *graft_or_splice(\n\t__isl_take isl_schedule_node *node, __isl_take isl_schedule_tree *tree,\n\tint tree_pos)\n{\n\tisl_size pos;\n\n\tif (isl_schedule_node_get_parent_type(node) ==\n\t    isl_schedule_node_sequence) {\n\t\tpos = isl_schedule_node_get_child_position(node);\n\t\tif (pos < 0)\n\t\t\tnode = isl_schedule_node_free(node);\n\t\tnode = isl_schedule_node_parent(node);\n\t\tnode = isl_schedule_node_sequence_splice(node, pos, tree);\n\t} else {\n\t\tpos = 0;\n\t\tnode = isl_schedule_node_graft_tree(node, tree);\n\t}\n\tnode = isl_schedule_node_child(node, pos + tree_pos);\n\tnode = isl_schedule_node_child(node, 0);\n\n\treturn node;\n}\n\n/* Insert a node \"graft\" into the schedule tree of \"node\" such that it\n * is executed before (if \"before\" is set) or after (if \"before\" is not set)\n * the node that \"node\" points to.\n * The root of \"graft\" is an extension node.\n * Return a pointer to the node that \"node\" pointed to.\n *\n * We first insert an extension node on top of \"node\" (or extend\n * the extension node if there already is one), with a filter on \"node\"\n * separating it from the extension.\n * We then insert a filter in the graft to separate it from the original\n * domain 
elements and combine the original and new tree in a sequence.\n * If we have extended an extension node, then the children of this\n * sequence are spliced in the sequence of the extended extension\n * at the position where \"node\" appears in the original extension.\n * Otherwise, the sequence pair is attached to the new extension node.\n */\nstatic __isl_give isl_schedule_node *graft_extension(\n\t__isl_take isl_schedule_node *node, __isl_take isl_schedule_node *graft,\n\tint before)\n{\n\tisl_union_map *extension;\n\tisl_union_set *graft_domain;\n\tisl_union_set *node_domain;\n\tisl_schedule_tree *tree, *tree_graft;\n\n\textension = isl_schedule_node_extension_get_extension(graft);\n\tgraft_domain = isl_union_map_range(isl_union_map_copy(extension));\n\tnode_domain = isl_schedule_node_get_universe_domain(node);\n\tnode = insert_extension(node, extension);\n\n\tgraft_domain = replace_by_universe_if_disjoint(graft_domain,\n\t\t\t\t\t\t\tnode_domain);\n\tisl_union_set_free(node_domain);\n\n\ttree = isl_schedule_node_get_tree(node);\n\tif (!isl_schedule_node_has_children(graft)) {\n\t\ttree_graft = isl_schedule_tree_from_filter(graft_domain);\n\t} else {\n\t\tgraft = isl_schedule_node_child(graft, 0);\n\t\ttree_graft = isl_schedule_node_get_tree(graft);\n\t\ttree_graft = isl_schedule_tree_insert_filter(tree_graft,\n\t\t\t\t\t\t\t\tgraft_domain);\n\t}\n\tif (before)\n\t\ttree = isl_schedule_tree_sequence_pair(tree_graft, tree);\n\telse\n\t\ttree = isl_schedule_tree_sequence_pair(tree, tree_graft);\n\tnode = graft_or_splice(node, tree, before);\n\n\tisl_schedule_node_free(graft);\n\n\treturn node;\n}\n\n/* Replace the root domain node of \"node\" by an extension node suitable\n * for insertion at \"pos\".\n * That is, create an extension node that maps the outer band nodes\n * at \"pos\" to the domain of the root node of \"node\" and attach\n * the child of this root node to the extension node.\n */\nstatic __isl_give isl_schedule_node 
*extension_from_domain(\n\t__isl_take isl_schedule_node *node, __isl_keep isl_schedule_node *pos)\n{\n\tisl_union_set *universe;\n\tisl_union_set *domain;\n\tisl_union_map *ext;\n\tisl_size depth;\n\tisl_bool anchored;\n\tisl_space *space;\n\tisl_schedule_node *res;\n\tisl_schedule_tree *tree;\n\n\tdepth = isl_schedule_node_get_schedule_depth(pos);\n\tanchored = isl_schedule_node_is_subtree_anchored(node);\n\tif (depth < 0 || anchored < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (anchored)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_unsupported,\n\t\t\t\"cannot graft anchored tree with domain root\",\n\t\t\treturn isl_schedule_node_free(node));\n\n\tdomain = isl_schedule_node_domain_get_domain(node);\n\tspace = isl_union_set_get_space(domain);\n\tspace = isl_space_set_from_params(space);\n\tspace = isl_space_add_dims(space, isl_dim_set, depth);\n\tuniverse = isl_union_set_from_set(isl_set_universe(space));\n\text = isl_union_map_from_domain_and_range(universe, domain);\n\tres = isl_schedule_node_from_extension(ext);\n\tnode = isl_schedule_node_child(node, 0);\n\tif (!node)\n\t\treturn isl_schedule_node_free(res);\n\tif (!isl_schedule_tree_is_leaf(node->tree)) {\n\t\ttree = isl_schedule_node_get_tree(node);\n\t\tres = isl_schedule_node_child(res, 0);\n\t\tres = isl_schedule_node_graft_tree(res, tree);\n\t\tres = isl_schedule_node_parent(res);\n\t}\n\tisl_schedule_node_free(node);\n\n\treturn res;\n}\n\n/* Insert a node \"graft\" into the schedule tree of \"node\" such that it\n * is executed before (if \"before\" is set) or after (if \"before\" is not set)\n * the node that \"node\" points to.\n * The root of \"graft\" may be either a domain or an extension node.\n * In the latter case, the domain of the extension needs to correspond\n * to the outer band nodes of \"node\".\n * The elements of the domain or the range of the extension may not\n * intersect with the domain elements that reach \"node\".\n * The schedule tree of \"graft\" may not be 
anchored.\n *\n * The schedule tree of \"node\" is modified to include an extension node\n * corresponding to the root node of \"graft\" as a child of the original\n * parent of \"node\".  The original node that \"node\" points to and the\n * child of the root node of \"graft\" are attached to this extension node\n * through a sequence, with appropriate filters and with the child\n * of \"graft\" appearing before or after the original \"node\".\n *\n * If \"node\" already appears inside a sequence that is the child of\n * an extension node and if the spaces of the new domain elements\n * do not overlap with those of the original domain elements,\n * then that extension node is extended with the new extension\n * rather than introducing a new segment of extension and sequence nodes.\n *\n * Return a pointer to the same node in the modified tree that\n * \"node\" pointed to in the original tree.\n */\nstatic __isl_give isl_schedule_node *isl_schedule_node_graft_before_or_after(\n\t__isl_take isl_schedule_node *node, __isl_take isl_schedule_node *graft,\n\tint before)\n{\n\tif (!node || !graft)\n\t\tgoto error;\n\tif (check_insert(node) < 0)\n\t\tgoto error;\n\n\tif (isl_schedule_node_get_type(graft) == isl_schedule_node_domain)\n\t\tgraft = extension_from_domain(graft, node);\n\n\tif (!graft)\n\t\tgoto error;\n\tif (isl_schedule_node_get_type(graft) != isl_schedule_node_extension)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"expecting domain or extension as root of graft\",\n\t\t\tgoto error);\n\n\treturn graft_extension(node, graft, before);\nerror:\n\tisl_schedule_node_free(node);\n\tisl_schedule_node_free(graft);\n\treturn NULL;\n}\n\n/* Insert a node \"graft\" into the schedule tree of \"node\" such that it\n * is executed before the node that \"node\" points to.\n * The root of \"graft\" may be either a domain or an extension node.\n * In the latter case, the domain of the extension needs to correspond\n * to the outer band nodes of 
\"node\".\n * The elements of the domain or the range of the extension may not\n * intersect with the domain elements that reach \"node\".\n * The schedule tree of \"graft\" may not be anchored.\n *\n * Return a pointer to the same node in the modified tree that\n * \"node\" pointed to in the original tree.\n */\n__isl_give isl_schedule_node *isl_schedule_node_graft_before(\n\t__isl_take isl_schedule_node *node, __isl_take isl_schedule_node *graft)\n{\n\treturn isl_schedule_node_graft_before_or_after(node, graft, 1);\n}\n\n/* Insert a node \"graft\" into the schedule tree of \"node\" such that it\n * is executed after the node that \"node\" points to.\n * The root of \"graft\" may be either a domain or an extension node.\n * In the latter case, the domain of the extension needs to correspond\n * to the outer band nodes of \"node\".\n * The elements of the domain or the range of the extension may not\n * intersect with the domain elements that reach \"node\".\n * The schedule tree of \"graft\" may not be anchored.\n *\n * Return a pointer to the same node in the modified tree that\n * \"node\" pointed to in the original tree.\n */\n__isl_give isl_schedule_node *isl_schedule_node_graft_after(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_schedule_node *graft)\n{\n\treturn isl_schedule_node_graft_before_or_after(node, graft, 0);\n}\n\n/* Split the domain elements that reach \"node\" into those that satisfy\n * \"filter\" and those that do not.  Arrange for the first subset to be\n * executed before or after the second subset, depending on the value\n * of \"before\".\n * Return a pointer to the tree corresponding to the second subset,\n * except when this subset is empty in which case the original pointer\n * is returned.\n * If both subsets are non-empty, then a sequence node is introduced\n * to impose the order.  
If the grandparent of the original node was\n * itself a sequence, then the original child is replaced by two children\n * in this sequence instead.\n * The children in the sequence are copies of the original subtree,\n * simplified with respect to their filters.\n */\nstatic __isl_give isl_schedule_node *isl_schedule_node_order_before_or_after(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *filter,\n\tint before)\n{\n\tenum isl_schedule_node_type ancestors[] =\n\t\t{ isl_schedule_node_filter, isl_schedule_node_sequence };\n\tisl_union_set *node_domain, *node_filter = NULL, *parent_filter;\n\tisl_schedule_node *node2;\n\tisl_schedule_tree *tree1, *tree2;\n\tisl_bool empty1, empty2;\n\tisl_bool in_seq;\n\n\tif (!node || !filter)\n\t\tgoto error;\n\tif (check_insert(node) < 0)\n\t\tgoto error;\n\n\tin_seq = has_ancestors(node, 2, ancestors);\n\tif (in_seq < 0)\n\t\tgoto error;\n\tnode_domain = isl_schedule_node_get_domain(node);\n\tfilter = isl_union_set_gist(filter, isl_union_set_copy(node_domain));\n\tnode_filter = isl_union_set_copy(node_domain);\n\tnode_filter = isl_union_set_subtract(node_filter,\n\t\t\t\t\t\tisl_union_set_copy(filter));\n\tnode_filter = isl_union_set_gist(node_filter, node_domain);\n\tempty1 = isl_union_set_is_empty(filter);\n\tempty2 = isl_union_set_is_empty(node_filter);\n\tif (empty1 < 0 || empty2 < 0)\n\t\tgoto error;\n\tif (empty1 || empty2) {\n\t\tisl_union_set_free(filter);\n\t\tisl_union_set_free(node_filter);\n\t\treturn node;\n\t}\n\n\tif (in_seq) {\n\t\tnode = isl_schedule_node_parent(node);\n\t\tparent_filter = isl_schedule_node_filter_get_filter(node);\n\t\tnode_filter = isl_union_set_intersect(node_filter,\n\t\t\t\t\t    isl_union_set_copy(parent_filter));\n\t\tfilter = isl_union_set_intersect(filter, parent_filter);\n\t}\n\n\tnode2 = isl_schedule_node_copy(node);\n\tnode = isl_schedule_node_gist(node, isl_union_set_copy(node_filter));\n\tnode2 = isl_schedule_node_gist(node2, isl_union_set_copy(filter));\n\ttree1 
= isl_schedule_node_get_tree(node);\n\ttree2 = isl_schedule_node_get_tree(node2);\n\ttree1 = isl_schedule_tree_insert_filter(tree1, node_filter);\n\ttree2 = isl_schedule_tree_insert_filter(tree2, filter);\n\tisl_schedule_node_free(node2);\n\n\tif (before) {\n\t\ttree1 = isl_schedule_tree_sequence_pair(tree2, tree1);\n\t\tnode = graft_or_splice(node, tree1, 1);\n\t} else {\n\t\ttree1 = isl_schedule_tree_sequence_pair(tree1, tree2);\n\t\tnode = graft_or_splice(node, tree1, 0);\n\t}\n\n\treturn node;\nerror:\n\tisl_schedule_node_free(node);\n\tisl_union_set_free(filter);\n\tisl_union_set_free(node_filter);\n\treturn NULL;\n}\n\n/* Split the domain elements that reach \"node\" into those that satisfy\n * \"filter\" and those that do not.  Arrange for the first subset to be\n * executed before the second subset.\n * Return a pointer to the tree corresponding to the second subset,\n * except when this subset is empty in which case the original pointer\n * is returned.\n */\n__isl_give isl_schedule_node *isl_schedule_node_order_before(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *filter)\n{\n\treturn isl_schedule_node_order_before_or_after(node, filter, 1);\n}\n\n/* Split the domain elements that reach \"node\" into those that satisfy\n * \"filter\" and those that do not.  
Arrange for the first subset to be\n * executed after the second subset.\n * Return a pointer to the tree corresponding to the second subset,\n * except when this subset is empty in which case the original pointer\n * is returned.\n */\n__isl_give isl_schedule_node *isl_schedule_node_order_after(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *filter)\n{\n\treturn isl_schedule_node_order_before_or_after(node, filter, 0);\n}\n\n/* Reset the user pointer on all identifiers of parameters and tuples\n * in the schedule node \"node\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_reset_user(\n\t__isl_take isl_schedule_node *node)\n{\n\tisl_schedule_tree *tree;\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_reset_user(tree);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Align the parameters of the schedule node \"node\" to those of \"space\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_align_params(\n\t__isl_take isl_schedule_node *node, __isl_take isl_space *space)\n{\n\tisl_schedule_tree *tree;\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_align_params(tree, space);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Compute the pullback of schedule node \"node\"\n * by the function represented by \"upma\".\n * In other words, plug in \"upma\" in the iteration domains\n * of schedule node \"node\".\n * We currently do not handle expansion nodes.\n *\n * Note that this is only a helper function for\n * isl_schedule_pullback_union_pw_multi_aff.  
In order to maintain consistency,\n * this function should not be called on a single node without also\n * calling it on all the other nodes.\n */\n__isl_give isl_schedule_node *isl_schedule_node_pullback_union_pw_multi_aff(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_union_pw_multi_aff *upma)\n{\n\tisl_schedule_tree *tree;\n\n\ttree = isl_schedule_node_get_tree(node);\n\ttree = isl_schedule_tree_pullback_union_pw_multi_aff(tree, upma);\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\n\treturn node;\n}\n\n/* Internal data structure for isl_schedule_node_expand.\n * \"tree\" is the tree that needs to be plugged in in all the leaves.\n * \"domain\" is the set of domain elements in the original leaves\n * to which the tree applies.\n */\nstruct isl_schedule_expand_data {\n\tisl_schedule_tree *tree;\n\tisl_union_set *domain;\n};\n\n/* If \"node\" is a leaf, then plug in data->tree, simplifying it\n * within its new context.\n *\n * If there are any domain elements at the leaf where the tree\n * should not be plugged in (i.e., there are elements not in data->domain)\n * then first extend the tree to only apply to the elements in data->domain\n * by constructing a set node that selects data->tree for elements\n * in data->domain and a leaf for the other elements.\n */\nstatic __isl_give isl_schedule_node *expand(__isl_take isl_schedule_node *node,\n\tvoid *user)\n{\n\tstruct isl_schedule_expand_data *data = user;\n\tisl_schedule_tree *tree, *leaf;\n\tisl_union_set *domain, *left;\n\tisl_bool empty;\n\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n\t\treturn node;\n\n\tdomain = isl_schedule_node_get_domain(node);\n\ttree = isl_schedule_tree_copy(data->tree);\n\n\tleft = isl_union_set_copy(domain);\n\tleft = isl_union_set_subtract(left, isl_union_set_copy(data->domain));\n\tempty = isl_union_set_is_empty(left);\n\tif (empty >= 0 && !empty) {\n\t\tleaf = isl_schedule_node_get_leaf(node);\n\t\tleaf = 
isl_schedule_tree_insert_filter(leaf, left);\n\t\tleft = isl_union_set_copy(data->domain);\n\t\ttree = isl_schedule_tree_insert_filter(tree, left);\n\t\ttree = isl_schedule_tree_set_pair(tree, leaf);\n\t} else {\n\t\tif (empty < 0)\n\t\t\tnode = isl_schedule_node_free(node);\n\t\tisl_union_set_free(left);\n\t}\n\n\tnode = isl_schedule_node_graft_tree(node, tree);\n\tnode = isl_schedule_node_gist(node, domain);\n\n\treturn node;\n}\n\n/* Expand the tree rooted at \"node\" by extending all leaves\n * with an expansion node with as child \"tree\".\n * The expansion is determined by \"contraction\" and \"domain\".\n * That is, the elements of \"domain\" are contracted according\n * to \"contraction\".  The expansion relation is then the inverse\n * of \"contraction\" with its range intersected with \"domain\".\n *\n * Insert the appropriate expansion node on top of \"tree\" and\n * then plug in the result in all leaves of \"node\".\n */\n__isl_give isl_schedule_node *isl_schedule_node_expand(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_union_pw_multi_aff *contraction,\n\t__isl_take isl_union_set *domain,\n\t__isl_take isl_schedule_tree *tree)\n{\n\tstruct isl_schedule_expand_data data;\n\tisl_union_map *expansion;\n\tisl_union_pw_multi_aff *copy;\n\n\tif (!node || !contraction || !tree)\n\t\tnode = isl_schedule_node_free(node);\n\n\tcopy = isl_union_pw_multi_aff_copy(contraction);\n\texpansion = isl_union_map_from_union_pw_multi_aff(copy);\n\texpansion = isl_union_map_reverse(expansion);\n\texpansion = isl_union_map_intersect_range(expansion, domain);\n\tdata.domain = isl_union_map_domain(isl_union_map_copy(expansion));\n\n\ttree = isl_schedule_tree_insert_expansion(tree, contraction, expansion);\n\tdata.tree = tree;\n\n\tnode = isl_schedule_node_map_descendant_bottom_up(node, &expand, &data);\n\tisl_union_set_free(data.domain);\n\tisl_schedule_tree_free(data.tree);\n\treturn node;\n}\n\n/* Return the position of the subtree containing \"node\" among the 
children\n * of \"ancestor\".  \"node\" is assumed to be a descendant of \"ancestor\".\n * In particular, both nodes should point to the same schedule tree.\n *\n * Return isl_size_error on error.\n */\nisl_size isl_schedule_node_get_ancestor_child_position(\n\t__isl_keep isl_schedule_node *node,\n\t__isl_keep isl_schedule_node *ancestor)\n{\n\tisl_size n1, n2;\n\tisl_schedule_tree *tree;\n\n\tn1 = isl_schedule_node_get_tree_depth(ancestor);\n\tn2 = isl_schedule_node_get_tree_depth(node);\n\tif (n1 < 0 || n2 < 0)\n\t\treturn isl_size_error;\n\n\tif (node->schedule != ancestor->schedule)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"not a descendant\", return isl_size_error);\n\n\tif (n1 >= n2)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"not a descendant\", return isl_size_error);\n\ttree = isl_schedule_tree_list_get_schedule_tree(node->ancestors, n1);\n\tisl_schedule_tree_free(tree);\n\tif (tree != ancestor->tree)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"not a descendant\", return isl_size_error);\n\n\treturn node->child_pos[n1];\n}\n\n/* Given two nodes that point to the same schedule tree, return their\n * closest shared ancestor.\n *\n * Since the two nodes point to the same schedule, they share at least\n * one ancestor, the root of the schedule.  We move down from the root\n * to the first ancestor where the respective children have a different\n * child position.  
This is the requested ancestor.\n * If there is no ancestor where the children have a different position,\n * then one node is an ancestor of the other and then this node is\n * the requested ancestor.\n */\n__isl_give isl_schedule_node *isl_schedule_node_get_shared_ancestor(\n\t__isl_keep isl_schedule_node *node1,\n\t__isl_keep isl_schedule_node *node2)\n{\n\tint i;\n\tisl_size n1, n2;\n\n\tn1 = isl_schedule_node_get_tree_depth(node1);\n\tn2 = isl_schedule_node_get_tree_depth(node2);\n\tif (n1 < 0 || n2 < 0)\n\t\treturn NULL;\n\tif (node1->schedule != node2->schedule)\n\t\tisl_die(isl_schedule_node_get_ctx(node1), isl_error_invalid,\n\t\t\t\"not part of same schedule\", return NULL);\n\tif (n2 < n1)\n\t\treturn isl_schedule_node_get_shared_ancestor(node2, node1);\n\tif (n1 == 0)\n\t\treturn isl_schedule_node_copy(node1);\n\tif (isl_schedule_node_is_equal(node1, node2))\n\t\treturn isl_schedule_node_copy(node1);\n\n\tfor (i = 0; i < n1; ++i)\n\t\tif (node1->child_pos[i] != node2->child_pos[i])\n\t\t\tbreak;\n\n\tnode1 = isl_schedule_node_copy(node1);\n\treturn isl_schedule_node_ancestor(node1, n1 - i);\n}\n\n/* Print \"node\" to \"p\".\n */\n__isl_give isl_printer *isl_printer_print_schedule_node(\n\t__isl_take isl_printer *p, __isl_keep isl_schedule_node *node)\n{\n\tisl_size n;\n\n\tif (!node)\n\t\treturn isl_printer_free(p);\n\tn = isl_schedule_tree_list_n_schedule_tree(node->ancestors);\n\tif (n < 0)\n\t\treturn isl_printer_free(p);\n\treturn isl_printer_print_schedule_tree_mark(p, node->schedule->root, n,\n\t\t\tnode->child_pos);\n}\n\nvoid isl_schedule_node_dump(__isl_keep isl_schedule_node *node)\n{\n\tisl_ctx *ctx;\n\tisl_printer *printer;\n\n\tif (!node)\n\t\treturn;\n\n\tctx = isl_schedule_node_get_ctx(node);\n\tprinter = isl_printer_to_file(ctx, stderr);\n\tprinter = isl_printer_set_yaml_style(printer, ISL_YAML_STYLE_BLOCK);\n\tprinter = isl_printer_print_schedule_node(printer, node);\n\n\tisl_printer_free(printer);\n}\n\n/* Return a string 
representation of \"node\".\n * Print the schedule node in block format as it would otherwise\n * look identical to the entire schedule.\n */\n__isl_give char *isl_schedule_node_to_str(__isl_keep isl_schedule_node *node)\n{\n\tisl_printer *printer;\n\tchar *s;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tprinter = isl_printer_to_str(isl_schedule_node_get_ctx(node));\n\tprinter = isl_printer_set_yaml_style(printer, ISL_YAML_STYLE_BLOCK);\n\tprinter = isl_printer_print_schedule_node(printer, node);\n\ts = isl_printer_get_str(printer);\n\tisl_printer_free(printer);\n\n\treturn s;\n}\n\n/* AutoSA Extended */\n/* Return the space_time property of the band member at position \"pos\"\n * of the band node \"node\", or autosa_loop_error if \"node\" is NULL.\n */\nenum autosa_loop_type isl_schedule_node_band_member_get_space_time(\n  __isl_keep isl_schedule_node *node, int pos)\n{\n  if (!node)\n    return autosa_loop_error;\n  return isl_schedule_tree_band_member_get_space_time(node->tree, pos);\n}\n\n/* Set the space_time property of the band member at position \"pos\"\n * of the band node \"node\" to \"loop_type\".\n * If the property already has this value, \"node\" is returned unchanged.\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_member_set_space_time(\n  __isl_take isl_schedule_node *node, int pos, enum autosa_loop_type loop_type)\n{\n  enum autosa_loop_type t;\n  isl_schedule_tree *tree;\n\n  if (!node)\n    return NULL;\n  t = isl_schedule_node_band_member_get_space_time(node, pos);\n  if (t == loop_type)\n    return node;\n\n  tree = isl_schedule_tree_copy(node->tree);\n  tree = isl_schedule_tree_band_member_set_space_time(tree, pos, loop_type);\n  node = isl_schedule_node_graft_tree(node, tree);\n\n  return node;\n}\n\n/* Return the pe_opt property of the band member at position \"pos\"\n * of the band node \"node\", or autosa_loop_error if \"node\" is NULL.
\n */\nenum autosa_loop_type isl_schedule_node_band_member_get_pe_opt(\n  __isl_keep isl_schedule_node *node, int pos)\n{\n  if (!node)\n    return autosa_loop_error;\n  return isl_schedule_tree_band_member_get_pe_opt(node->tree, pos);\n}\n\n/* Set the pe_opt property of the band member at position \"pos\"\n * of the band node \"node\" to \"loop_type\".\n * If the property already has this value, \"node\" is returned unchanged.\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_member_set_pe_opt(\n  __isl_take isl_schedule_node *node, int pos, enum autosa_loop_type loop_type)\n{\n  enum autosa_loop_type t;\n  isl_schedule_tree *tree;\n\n  if (!node)\n    return NULL;\n  t = isl_schedule_node_band_member_get_pe_opt(node, pos);\n  if (t == loop_type)\n    return node;\n\n  tree = isl_schedule_tree_copy(node->tree);\n  tree = isl_schedule_tree_band_member_set_pe_opt(tree, pos, loop_type);\n  node = isl_schedule_node_graft_tree(node, tree);\n\n  return node;\n}\n\n/* Return the sched_pos property of the band member at position \"pos\"\n * of the band node \"node\", or -1 if \"node\" is NULL.\n */\nint isl_schedule_node_band_member_get_sched_pos(\n  __isl_keep isl_schedule_node *node, int pos)\n{\n  if (!node)\n    return -1;\n  return isl_schedule_tree_band_member_get_sched_pos(node->tree, pos);\n}\n\n/* Set the sched_pos property of the band member at position \"pos\"\n * of the band node \"node\" to \"sched_pos\".\n * If the property already has this value, \"node\" is returned unchanged.\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_member_set_sched_pos(\n  __isl_take isl_schedule_node *node, int pos, int sched_pos)\n{\n  int sp;\n  isl_schedule_tree *tree;\n\n  if (!node)\n    return NULL;\n  sp = isl_schedule_node_band_member_get_sched_pos(node, pos);\n  if (sp == sched_pos)\n    return node;\n\n  tree = isl_schedule_tree_copy(node->tree);\n  tree = isl_schedule_tree_band_member_set_sched_pos(tree, pos, sched_pos);\n  node = isl_schedule_node_graft_tree(node, tree);\n\n  return node;\n}\n\n/* Return the iter pointer of the band member at position \"pos\"\n * of the band node \"node\", or NULL if \"node\" is NULL.\n */\nvoid *isl_schedule_node_band_member_get_iter(\n  __isl_keep isl_schedule_node *node, int pos)\n{\n  if (!node)\n    return NULL;\n  return isl_schedule_tree_band_member_get_iter(node->tree, pos);\n}\n\n/* Set the iter pointer of the band member at position \"pos\"\n * of the band node \"node\" to \"iter\".\n * If the pointer already has this value, \"node\" is returned unchanged.\n */\n__isl_give isl_schedule_node *isl_schedule_node_band_member_set_iter(\n  __isl_take isl_schedule_node *node, int pos, void *iter)\n{\n  void *it;\n  isl_schedule_tree *tree;\n\n  if (!node)\n    return NULL;\n  it = isl_schedule_node_band_member_get_iter(node, pos);\n  if (it == iter)\n    return node;\n\n  tree = isl_schedule_tree_copy(node->tree);\n  tree = isl_schedule_tree_band_member_set_iter(tree, pos, iter);\n  node = isl_schedule_node_graft_tree(node, tree);\n\n  return node;\n}\n/* AutoSA Extended */"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/isl_schedule_tree.c",
    "content": "/*\n * Copyright 2013-2014 Ecole Normale Superieure\n * Copyright 2014      INRIA Rocquencourt\n * Copyright 2016      INRIA Paris\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege,\n * Ecole Normale Superieure, 45 rue d'Ulm, 75230 Paris, France\n * and Inria Paris - Rocquencourt, Domaine de Voluceau - Rocquencourt,\n * B.P. 105 - 78153 Le Chesnay, France\n * and Centre de Recherche Inria de Paris, 2 rue Simone Iff - Voie DQ12,\n * CS 42112, 75589 Paris Cedex 12, France\n */\n\n#include <isl/id.h>\n#include <isl/val.h>\n#include <isl/space.h>\n#include <isl/map.h>\n#include <isl_schedule_band.h>\n#include <isl_schedule_private.h>\n\n#undef EL\n#define EL isl_schedule_tree\n\n#include <isl_list_templ.h>\n\n#undef EL_BASE\n#define EL_BASE schedule_tree\n\n#include <isl_list_templ.c>\n\n/* Is \"tree\" the leaf of a schedule tree?\n */\nint isl_schedule_tree_is_leaf(__isl_keep isl_schedule_tree *tree)\n{\n\treturn isl_schedule_tree_get_type(tree) == isl_schedule_node_leaf;\n}\n\n/* Create a new schedule tree of type \"type\".\n * The caller is responsible for filling in the type specific fields and\n * the children.\n *\n * By default, the single node tree does not have any anchored nodes.\n * The caller is responsible for updating the anchored field if needed.\n */\nstatic __isl_give isl_schedule_tree *isl_schedule_tree_alloc(isl_ctx *ctx,\n\tenum isl_schedule_node_type type)\n{\n\tisl_schedule_tree *tree;\n\n\tif (type == isl_schedule_node_error)\n\t\treturn NULL;\n\n\ttree = isl_calloc_type(ctx, isl_schedule_tree);\n\tif (!tree)\n\t\treturn NULL;\n\n\ttree->ref = 1;\n\ttree->ctx = ctx;\n\tisl_ctx_ref(ctx);\n\ttree->type = type;\n\ttree->anchored = 0;\n\n\treturn tree;\n}\n\n/* Return a fresh copy of \"tree\".\n */\n__isl_take isl_schedule_tree *isl_schedule_tree_dup(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *dup;\n\n\tif (!tree)\n\t\treturn NULL;\n\n\tctx = 
isl_schedule_tree_get_ctx(tree);\n\tdup = isl_schedule_tree_alloc(ctx, tree->type);\n\tif (!dup)\n\t\treturn NULL;\n\n\tswitch (tree->type) {\n\tcase isl_schedule_node_error:\n\t\tisl_die(ctx, isl_error_internal,\n\t\t\t\"allocation should have failed\",\n\t\t\treturn isl_schedule_tree_free(dup));\n\tcase isl_schedule_node_band:\n\t\tdup->band = isl_schedule_band_copy(tree->band);\n\t\tif (!dup->band)\n\t\t\treturn isl_schedule_tree_free(dup);\n\t\tbreak;\n\tcase isl_schedule_node_context:\n\t\tdup->context = isl_set_copy(tree->context);\n\t\tif (!dup->context)\n\t\t\treturn isl_schedule_tree_free(dup);\n\t\tbreak;\n\tcase isl_schedule_node_domain:\n\t\tdup->domain = isl_union_set_copy(tree->domain);\n\t\tif (!dup->domain)\n\t\t\treturn isl_schedule_tree_free(dup);\n\t\tbreak;\n\tcase isl_schedule_node_expansion:\n\t\tdup->contraction =\n\t\t\tisl_union_pw_multi_aff_copy(tree->contraction);\n\t\tdup->expansion = isl_union_map_copy(tree->expansion);\n\t\tif (!dup->contraction || !dup->expansion)\n\t\t\treturn isl_schedule_tree_free(dup);\n\t\tbreak;\n\tcase isl_schedule_node_extension:\n\t\tdup->extension = isl_union_map_copy(tree->extension);\n\t\tif (!dup->extension)\n\t\t\treturn isl_schedule_tree_free(dup);\n\t\tbreak;\n\tcase isl_schedule_node_filter:\n\t\tdup->filter = isl_union_set_copy(tree->filter);\n\t\tif (!dup->filter)\n\t\t\treturn isl_schedule_tree_free(dup);\n\t\tbreak;\n\tcase isl_schedule_node_guard:\n\t\tdup->guard = isl_set_copy(tree->guard);\n\t\tif (!dup->guard)\n\t\t\treturn isl_schedule_tree_free(dup);\n\t\tbreak;\n\tcase isl_schedule_node_mark:\n\t\tdup->mark = isl_id_copy(tree->mark);\n\t\tif (!dup->mark)\n\t\t\treturn isl_schedule_tree_free(dup);\n\t\tbreak;\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\tbreak;\n\t}\n\n\tif (tree->children) {\n\t\tdup->children = isl_schedule_tree_list_copy(tree->children);\n\t\tif (!dup->children)\n\t\t\treturn 
isl_schedule_tree_free(dup);\n\t}\n\tdup->anchored = tree->anchored;\n\n\treturn dup;\n}\n\n/* Return an isl_schedule_tree that is equal to \"tree\" and that has only\n * a single reference.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_cow(\n\t__isl_take isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->ref == 1)\n\t\treturn tree;\n\ttree->ref--;\n\treturn isl_schedule_tree_dup(tree);\n}\n\n/* Return a new reference to \"tree\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_copy(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\ttree->ref++;\n\treturn tree;\n}\n\n/* Free \"tree\" and return NULL.\n */\n__isl_null isl_schedule_tree *isl_schedule_tree_free(\n\t__isl_take isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\tif (--tree->ref > 0)\n\t\treturn NULL;\n\n\tswitch (tree->type) {\n\tcase isl_schedule_node_band:\n\t\tisl_schedule_band_free(tree->band);\n\t\tbreak;\n\tcase isl_schedule_node_context:\n\t\tisl_set_free(tree->context);\n\t\tbreak;\n\tcase isl_schedule_node_domain:\n\t\tisl_union_set_free(tree->domain);\n\t\tbreak;\n\tcase isl_schedule_node_expansion:\n\t\tisl_union_pw_multi_aff_free(tree->contraction);\n\t\tisl_union_map_free(tree->expansion);\n\t\tbreak;\n\tcase isl_schedule_node_extension:\n\t\tisl_union_map_free(tree->extension);\n\t\tbreak;\n\tcase isl_schedule_node_filter:\n\t\tisl_union_set_free(tree->filter);\n\t\tbreak;\n\tcase isl_schedule_node_guard:\n\t\tisl_set_free(tree->guard);\n\t\tbreak;\n\tcase isl_schedule_node_mark:\n\t\tisl_id_free(tree->mark);\n\t\tbreak;\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\tcase isl_schedule_node_error:\n\tcase isl_schedule_node_leaf:\n\t\tbreak;\n\t}\n\tisl_schedule_tree_list_free(tree->children);\n\tisl_ctx_deref(tree->ctx);\n\tfree(tree);\n\n\treturn NULL;\n}\n\n/* Create and return a new leaf schedule tree.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_leaf(isl_ctx *ctx)\n{\n\treturn 
isl_schedule_tree_alloc(ctx, isl_schedule_node_leaf);\n}\n\n/* Create a new band schedule tree referring to \"band\"\n * with no children.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_from_band(\n\t__isl_take isl_schedule_band *band)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\n\tif (!band)\n\t\treturn NULL;\n\n\tctx = isl_schedule_band_get_ctx(band);\n\ttree = isl_schedule_tree_alloc(ctx, isl_schedule_node_band);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->band = band;\n\ttree->anchored = isl_schedule_band_is_anchored(band);\n\n\treturn tree;\nerror:\n\tisl_schedule_band_free(band);\n\treturn NULL;\n}\n\n/* Create a new context schedule tree with the given context and no children.\n * Since the context references the outer schedule dimension,\n * the tree is anchored.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_from_context(\n\t__isl_take isl_set *context)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\n\tif (!context)\n\t\treturn NULL;\n\n\tctx = isl_set_get_ctx(context);\n\ttree = isl_schedule_tree_alloc(ctx, isl_schedule_node_context);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->context = context;\n\ttree->anchored = 1;\n\n\treturn tree;\nerror:\n\tisl_set_free(context);\n\treturn NULL;\n}\n\n/* Create a new domain schedule tree with the given domain and no children.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_from_domain(\n\t__isl_take isl_union_set *domain)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\n\tif (!domain)\n\t\treturn NULL;\n\n\tctx = isl_union_set_get_ctx(domain);\n\ttree = isl_schedule_tree_alloc(ctx, isl_schedule_node_domain);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->domain = domain;\n\n\treturn tree;\nerror:\n\tisl_union_set_free(domain);\n\treturn NULL;\n}\n\n/* Create a new expansion schedule tree with the given contraction and\n * expansion and no children.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_from_expansion(\n\t__isl_take isl_union_pw_multi_aff *contraction,\n\t__isl_take 
isl_union_map *expansion)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\n\tif (!contraction || !expansion)\n\t\tgoto error;\n\n\tctx = isl_union_map_get_ctx(expansion);\n\ttree = isl_schedule_tree_alloc(ctx, isl_schedule_node_expansion);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->contraction = contraction;\n\ttree->expansion = expansion;\n\n\treturn tree;\nerror:\n\tisl_union_pw_multi_aff_free(contraction);\n\tisl_union_map_free(expansion);\n\treturn NULL;\n}\n\n/* Create a new extension schedule tree with the given extension and\n * no children.\n * Since the domain of the extension refers to the outer schedule dimension,\n * the tree is anchored.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_from_extension(\n\t__isl_take isl_union_map *extension)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\n\tif (!extension)\n\t\treturn NULL;\n\n\tctx = isl_union_map_get_ctx(extension);\n\ttree = isl_schedule_tree_alloc(ctx, isl_schedule_node_extension);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->extension = extension;\n\ttree->anchored = 1;\n\n\treturn tree;\nerror:\n\tisl_union_map_free(extension);\n\treturn NULL;\n}\n\n/* Create a new filter schedule tree with the given filter and no children.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_from_filter(\n\t__isl_take isl_union_set *filter)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\n\tif (!filter)\n\t\treturn NULL;\n\n\tctx = isl_union_set_get_ctx(filter);\n\ttree = isl_schedule_tree_alloc(ctx, isl_schedule_node_filter);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->filter = filter;\n\n\treturn tree;\nerror:\n\tisl_union_set_free(filter);\n\treturn NULL;\n}\n\n/* Create a new guard schedule tree with the given guard and no children.\n * Since the guard references the outer schedule dimension,\n * the tree is anchored.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_from_guard(\n\t__isl_take isl_set *guard)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\n\tif (!guard)\n\t\treturn NULL;\n\n\tctx 
= isl_set_get_ctx(guard);\n\ttree = isl_schedule_tree_alloc(ctx, isl_schedule_node_guard);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->guard = guard;\n\ttree->anchored = 1;\n\n\treturn tree;\nerror:\n\tisl_set_free(guard);\n\treturn NULL;\n}\n\n/* Create a new mark schedule tree with the given mark identifier and\n * no children.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_from_mark(\n\t__isl_take isl_id *mark)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\n\tif (!mark)\n\t\treturn NULL;\n\n\tctx = isl_id_get_ctx(mark);\n\ttree = isl_schedule_tree_alloc(ctx, isl_schedule_node_mark);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->mark = mark;\n\n\treturn tree;\nerror:\n\tisl_id_free(mark);\n\treturn NULL;\n}\n\n/* Does \"tree\" have any node that depends on its position\n * in the complete schedule tree?\n */\nisl_bool isl_schedule_tree_is_subtree_anchored(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\treturn tree ? isl_bool_ok(tree->anchored) : isl_bool_error;\n}\n\n/* Does the root node of \"tree\" depend on its position in the complete\n * schedule tree?\n * Band nodes may be anchored depending on the associated AST build options.\n * Context, extension and guard nodes are always anchored.\n */\nint isl_schedule_tree_is_anchored(__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn -1;\n\n\tswitch (isl_schedule_tree_get_type(tree)) {\n\tcase isl_schedule_node_error:\n\t\treturn -1;\n\tcase isl_schedule_node_band:\n\t\treturn isl_schedule_band_is_anchored(tree->band);\n\tcase isl_schedule_node_context:\n\tcase isl_schedule_node_extension:\n\tcase isl_schedule_node_guard:\n\t\treturn 1;\n\tcase isl_schedule_node_domain:\n\tcase isl_schedule_node_expansion:\n\tcase isl_schedule_node_filter:\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_mark:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\treturn 0;\n\t}\n\n\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\"unhandled case\", return -1);\n}\n\n/* 
Update the anchored field of \"tree\" based on whether the root node\n * itself in anchored and the anchored fields of the children.\n *\n * This function should be called whenever the children of a tree node\n * are changed or the anchoredness of the tree root itself changes.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_update_anchored(\n\t__isl_take isl_schedule_tree *tree)\n{\n\tint i;\n\tisl_size n;\n\tint anchored;\n\n\tanchored = isl_schedule_tree_is_anchored(tree);\n\tn = isl_schedule_tree_n_children(tree);\n\tif (anchored < 0 || n < 0)\n\t\treturn isl_schedule_tree_free(tree);\n\n\tfor (i = 0; !anchored && i < n; ++i) {\n\t\tisl_schedule_tree *child;\n\n\t\tchild = isl_schedule_tree_get_child(tree, i);\n\t\tif (!child)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tanchored = child->anchored;\n\t\tisl_schedule_tree_free(child);\n\t}\n\n\tif (anchored == tree->anchored)\n\t\treturn tree;\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\treturn NULL;\n\ttree->anchored = anchored;\n\treturn tree;\n}\n\n/* Create a new tree of the given type (isl_schedule_node_sequence or\n * isl_schedule_node_set) with the given children.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_from_children(\n\tenum isl_schedule_node_type type,\n\t__isl_take isl_schedule_tree_list *list)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree *tree;\n\n\tif (!list)\n\t\treturn NULL;\n\n\tctx = isl_schedule_tree_list_get_ctx(list);\n\ttree = isl_schedule_tree_alloc(ctx, type);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->children = list;\n\ttree = isl_schedule_tree_update_anchored(tree);\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_list_free(list);\n\treturn NULL;\n}\n\n/* Construct a tree with a root node of type \"type\" and as children\n * \"tree1\" and \"tree2\".\n * If the root of one (or both) of the input trees is itself of type \"type\",\n * then the tree is replaced by its children.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_from_pair(\n\tenum 
isl_schedule_node_type type, __isl_take isl_schedule_tree *tree1,\n\t__isl_take isl_schedule_tree *tree2)\n{\n\tisl_ctx *ctx;\n\tisl_schedule_tree_list *list;\n\n\tif (!tree1 || !tree2)\n\t\tgoto error;\n\n\tctx = isl_schedule_tree_get_ctx(tree1);\n\tif (isl_schedule_tree_get_type(tree1) == type) {\n\t\tlist = isl_schedule_tree_list_copy(tree1->children);\n\t\tisl_schedule_tree_free(tree1);\n\t} else {\n\t\tlist = isl_schedule_tree_list_alloc(ctx, 2);\n\t\tlist = isl_schedule_tree_list_add(list, tree1);\n\t}\n\tif (isl_schedule_tree_get_type(tree2) == type) {\n\t\tisl_schedule_tree_list *children;\n\n\t\tchildren = isl_schedule_tree_list_copy(tree2->children);\n\t\tlist = isl_schedule_tree_list_concat(list, children);\n\t\tisl_schedule_tree_free(tree2);\n\t} else {\n\t\tlist = isl_schedule_tree_list_add(list, tree2);\n\t}\n\n\treturn isl_schedule_tree_from_children(type, list);\nerror:\n\tisl_schedule_tree_free(tree1);\n\tisl_schedule_tree_free(tree2);\n\treturn NULL;\n}\n\n/* Construct a tree with a sequence root node and as children\n * \"tree1\" and \"tree2\".\n * If the root of one (or both) of the input trees is itself a sequence,\n * then the tree is replaced by its children.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_sequence_pair(\n\t__isl_take isl_schedule_tree *tree1,\n\t__isl_take isl_schedule_tree *tree2)\n{\n\treturn isl_schedule_tree_from_pair(isl_schedule_node_sequence,\n\t\t\t\t\t\ttree1, tree2);\n}\n\n/* Construct a tree with a set root node and as children\n * \"tree1\" and \"tree2\".\n * If the root of one (or both) of the input trees is itself a set,\n * then the tree is replaced by its children.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_set_pair(\n\t__isl_take isl_schedule_tree *tree1,\n\t__isl_take isl_schedule_tree *tree2)\n{\n\treturn isl_schedule_tree_from_pair(isl_schedule_node_set, tree1, tree2);\n}\n\n/* Return the isl_ctx to which \"tree\" belongs.\n */\nisl_ctx *isl_schedule_tree_get_ctx(__isl_keep 
isl_schedule_tree *tree)\n{\n\treturn tree ? tree->ctx : NULL;\n}\n\n/* Return the type of the root of the tree or isl_schedule_node_error\n * on error.\n */\nenum isl_schedule_node_type isl_schedule_tree_get_type(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\treturn tree ? tree->type : isl_schedule_node_error;\n}\n\n/* Are \"tree1\" and \"tree2\" obviously equal to each other?\n */\nisl_bool isl_schedule_tree_plain_is_equal(__isl_keep isl_schedule_tree *tree1,\n\t__isl_keep isl_schedule_tree *tree2)\n{\n\tisl_bool equal;\n\tint i;\n\tisl_size n1, n2;\n\n\tif (!tree1 || !tree2)\n\t\treturn isl_bool_error;\n\tif (tree1 == tree2)\n\t\treturn isl_bool_true;\n\tif (tree1->type != tree2->type)\n\t\treturn isl_bool_false;\n\n\tswitch (tree1->type) {\n\tcase isl_schedule_node_band:\n\t\tequal = isl_schedule_band_plain_is_equal(tree1->band,\n\t\t\t\t\t\t\ttree2->band);\n\t\tbreak;\n\tcase isl_schedule_node_context:\n\t\tequal = isl_set_is_equal(tree1->context, tree2->context);\n\t\tbreak;\n\tcase isl_schedule_node_domain:\n\t\tequal = isl_union_set_is_equal(tree1->domain, tree2->domain);\n\t\tbreak;\n\tcase isl_schedule_node_expansion:\n\t\tequal = isl_union_map_is_equal(tree1->expansion,\n\t\t\t\t\t\ttree2->expansion);\n\t\tif (equal >= 0 && equal)\n\t\t\tequal = isl_union_pw_multi_aff_plain_is_equal(\n\t\t\t\t    tree1->contraction, tree2->contraction);\n\t\tbreak;\n\tcase isl_schedule_node_extension:\n\t\tequal = isl_union_map_is_equal(tree1->extension,\n\t\t\t\t\t\ttree2->extension);\n\t\tbreak;\n\tcase isl_schedule_node_filter:\n\t\tequal = isl_union_set_is_equal(tree1->filter, tree2->filter);\n\t\tbreak;\n\tcase isl_schedule_node_guard:\n\t\tequal = isl_set_is_equal(tree1->guard, tree2->guard);\n\t\tbreak;\n\tcase isl_schedule_node_mark:\n\t\tequal = isl_bool_ok(tree1->mark == tree2->mark);\n\t\tbreak;\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\tequal = isl_bool_true;\n\t\tbreak;\n\tcase 
isl_schedule_node_error:\n\t\tequal = isl_bool_error;\n\t\tbreak;\n\t}\n\n\tif (equal < 0 || !equal)\n\t\treturn equal;\n\n\tn1 = isl_schedule_tree_n_children(tree1);\n\tn2 = isl_schedule_tree_n_children(tree2);\n\tif (n1 < 0 || n2 < 0)\n\t\treturn isl_bool_error;\n\tif (n1 != n2)\n\t\treturn isl_bool_false;\n\tfor (i = 0; i < n1; ++i) {\n\t\tisl_schedule_tree *child1, *child2;\n\n\t\tchild1 = isl_schedule_tree_get_child(tree1, i);\n\t\tchild2 = isl_schedule_tree_get_child(tree2, i);\n\t\tequal = isl_schedule_tree_plain_is_equal(child1, child2);\n\t\tisl_schedule_tree_free(child1);\n\t\tisl_schedule_tree_free(child2);\n\n\t\tif (equal < 0 || !equal)\n\t\t\treturn equal;\n\t}\n\n\treturn isl_bool_true;\n}\n\n/* Does \"tree\" have any children, other than an implicit leaf.\n */\nint isl_schedule_tree_has_children(__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn -1;\n\n\treturn tree->children != NULL;\n}\n\n/* Return the number of children of \"tree\", excluding implicit leaves.\n * The \"children\" field is NULL if there are\n * no children (except for the implicit leaves).\n */\nisl_size isl_schedule_tree_n_children(__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn isl_size_error;\n\n\tif (!tree->children)\n\t\treturn 0;\n\treturn isl_schedule_tree_list_n_schedule_tree(tree->children);\n}\n\n/* Return a copy of the (explicit) child at position \"pos\" of \"tree\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_get_child(\n\t__isl_keep isl_schedule_tree *tree, int pos)\n{\n\tif (!tree)\n\t\treturn NULL;\n\tif (!tree->children)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"schedule tree has no explicit children\", return NULL);\n\treturn isl_schedule_tree_list_get_schedule_tree(tree->children, pos);\n}\n\n/* Return a copy of the (explicit) child at position \"pos\" of \"tree\" and\n * free \"tree\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_child(\n\t__isl_take isl_schedule_tree *tree, int 
pos)\n{\n\tisl_schedule_tree *child;\n\n\tchild = isl_schedule_tree_get_child(tree, pos);\n\tisl_schedule_tree_free(tree);\n\treturn child;\n}\n\n/* Remove all (explicit) children from \"tree\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_reset_children(\n\t__isl_take isl_schedule_tree *tree)\n{\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\treturn NULL;\n\ttree->children = isl_schedule_tree_list_free(tree->children);\n\treturn tree;\n}\n\n/* Remove the child at position \"pos\" from the children of \"tree\".\n * If there was only one child to begin with, then remove all children.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_drop_child(\n\t__isl_take isl_schedule_tree *tree, int pos)\n{\n\tisl_size n;\n\n\ttree = isl_schedule_tree_cow(tree);\n\n\tn = isl_schedule_tree_n_children(tree);\n\tif (n < 0)\n\t\treturn isl_schedule_tree_free(tree);\n\tif (n == 0)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"tree does not have any explicit children\",\n\t\t\treturn isl_schedule_tree_free(tree));\n\tif (pos < 0 || pos >= n)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"position out of bounds\",\n\t\t\treturn isl_schedule_tree_free(tree));\n\tif (n == 1)\n\t\treturn isl_schedule_tree_reset_children(tree);\n\n\ttree->children = isl_schedule_tree_list_drop(tree->children, pos, 1);\n\tif (!tree->children)\n\t\treturn isl_schedule_tree_free(tree);\n\n\treturn tree;\n}\n\n/* Replace the child at position \"pos\" of \"tree\" by \"child\".\n *\n * If the new child is a leaf, then it is not explicitly\n * recorded in the list of children.  
Instead, the list of children\n * (which is assumed to have only one element) is removed.\n * Note that the children of set and sequence nodes are always\n * filters, so they cannot be replaced by empty trees.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_replace_child(\n\t__isl_take isl_schedule_tree *tree, int pos,\n\t__isl_take isl_schedule_tree *child)\n{\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree || !child)\n\t\tgoto error;\n\n\tif (isl_schedule_tree_is_leaf(child)) {\n\t\tisl_size n;\n\n\t\tisl_schedule_tree_free(child);\n\t\tif (!tree->children && pos == 0)\n\t\t\treturn tree;\n\t\tn = isl_schedule_tree_n_children(tree);\n\t\tif (n < 0)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tif (n != 1)\n\t\t\tisl_die(isl_schedule_tree_get_ctx(tree),\n\t\t\t\tisl_error_internal,\n\t\t\t\t\"can only replace single child by leaf\",\n\t\t\t\tgoto error);\n\t\treturn isl_schedule_tree_reset_children(tree);\n\t}\n\n\tif (!tree->children && pos == 0)\n\t\ttree->children =\n\t\t\tisl_schedule_tree_list_from_schedule_tree(child);\n\telse\n\t\ttree->children = isl_schedule_tree_list_set_schedule_tree(\n\t\t\t\ttree->children, pos, child);\n\n\tif (!tree->children)\n\t\treturn isl_schedule_tree_free(tree);\n\ttree = isl_schedule_tree_update_anchored(tree);\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_schedule_tree_free(child);\n\treturn NULL;\n}\n\n/* Replace the (explicit) children of \"tree\" by \"children\"?\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_set_children(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_schedule_tree_list *children)\n{\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree || !children)\n\t\tgoto error;\n\tisl_schedule_tree_list_free(tree->children);\n\ttree->children = children;\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_schedule_tree_list_free(children);\n\treturn NULL;\n}\n\n/* Create a new band schedule tree referring to \"band\"\n * with \"tree\" as single child.\n 
*/\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_band(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_schedule_band *band)\n{\n\tisl_schedule_tree *res;\n\n\tres = isl_schedule_tree_from_band(band);\n\treturn isl_schedule_tree_replace_child(res, 0, tree);\n}\n\n/* Create a new context schedule tree with the given context and\n * with \"tree\" as single child.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_context(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_set *context)\n{\n\tisl_schedule_tree *res;\n\n\tres = isl_schedule_tree_from_context(context);\n\treturn isl_schedule_tree_replace_child(res, 0, tree);\n}\n\n/* Create a new domain schedule tree with the given domain and\n * with \"tree\" as single child.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_domain(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *domain)\n{\n\tisl_schedule_tree *res;\n\n\tres = isl_schedule_tree_from_domain(domain);\n\treturn isl_schedule_tree_replace_child(res, 0, tree);\n}\n\n/* Create a new expansion schedule tree with the given contraction and\n * expansion and with \"tree\" as single child.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_expansion(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_union_pw_multi_aff *contraction,\n\t__isl_take isl_union_map *expansion)\n{\n\tisl_schedule_tree *res;\n\n\tres = isl_schedule_tree_from_expansion(contraction, expansion);\n\treturn isl_schedule_tree_replace_child(res, 0, tree);\n}\n\n/* Create a new extension schedule tree with the given extension and\n * with \"tree\" as single child.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_extension(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_map *extension)\n{\n\tisl_schedule_tree *res;\n\n\tres = isl_schedule_tree_from_extension(extension);\n\treturn isl_schedule_tree_replace_child(res, 0, tree);\n}\n\n/* Create a new filter schedule tree with the given filter and single 
child.\n *\n * If the root of \"tree\" is itself a filter node, then the two\n * filter nodes are merged into one node.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_filter(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *filter)\n{\n\tisl_schedule_tree *res;\n\n\tif (isl_schedule_tree_get_type(tree) == isl_schedule_node_filter) {\n\t\tisl_union_set *tree_filter;\n\n\t\ttree_filter = isl_schedule_tree_filter_get_filter(tree);\n\t\ttree_filter = isl_union_set_intersect(tree_filter, filter);\n\t\ttree = isl_schedule_tree_filter_set_filter(tree, tree_filter);\n\t\treturn tree;\n\t}\n\n\tres = isl_schedule_tree_from_filter(filter);\n\treturn isl_schedule_tree_replace_child(res, 0, tree);\n}\n\n/* Insert a filter node with filter set \"filter\"\n * in each of the children of \"tree\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_children_insert_filter(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *filter)\n{\n\tint i;\n\tisl_size n;\n\n\tn = isl_schedule_tree_n_children(tree);\n\tif (n < 0 || !filter)\n\t\tgoto error;\n\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_schedule_tree *child;\n\n\t\tchild = isl_schedule_tree_get_child(tree, i);\n\t\tchild = isl_schedule_tree_insert_filter(child,\n\t\t\t\t\t\t    isl_union_set_copy(filter));\n\t\ttree = isl_schedule_tree_replace_child(tree, i, child);\n\t}\n\n\tisl_union_set_free(filter);\n\treturn tree;\nerror:\n\tisl_union_set_free(filter);\n\tisl_schedule_tree_free(tree);\n\treturn NULL;\n}\n\n/* Create a new guard schedule tree with the given guard and\n * with \"tree\" as single child.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_guard(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_set *guard)\n{\n\tisl_schedule_tree *res;\n\n\tres = isl_schedule_tree_from_guard(guard);\n\treturn isl_schedule_tree_replace_child(res, 0, tree);\n}\n\n/* Create a new mark schedule tree with the given mark identifier and\n * single child.\n */\n__isl_give 
isl_schedule_tree *isl_schedule_tree_insert_mark(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_id *mark)\n{\n\tisl_schedule_tree *res;\n\n\tres = isl_schedule_tree_from_mark(mark);\n\treturn isl_schedule_tree_replace_child(res, 0, tree);\n}\n\n/* Return the number of members in the band tree root.\n */\nisl_size isl_schedule_tree_band_n_member(__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn isl_size_error;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return isl_size_error);\n\n\treturn isl_schedule_band_n_member(tree->band);\n}\n\n/* Is the band member at position \"pos\" of the band tree root\n * marked coincident?\n */\nisl_bool isl_schedule_tree_band_member_get_coincident(\n\t__isl_keep isl_schedule_tree *tree, int pos)\n{\n\tif (!tree)\n\t\treturn isl_bool_error;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return isl_bool_error);\n\n\treturn isl_schedule_band_member_get_coincident(tree->band, pos);\n}\n\n/* Mark the given band member as being coincident or not\n * according to \"coincident\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_coincident(\n\t__isl_take isl_schedule_tree *tree, int pos, int coincident)\n{\n\tif (!tree)\n\t\treturn NULL;\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return isl_schedule_tree_free(tree));\n\tif (isl_schedule_tree_band_member_get_coincident(tree, pos) ==\n\t\t\t\t\t\t\t\t    coincident)\n\t\treturn tree;\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\treturn NULL;\n\n\ttree->band = isl_schedule_band_member_set_coincident(tree->band, pos,\n\t\t\t\t\t\t\tcoincident);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\treturn tree;\n}\n\n/* Is the band tree root marked 
permutable?\n */\nisl_bool isl_schedule_tree_band_get_permutable(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn isl_bool_error;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return isl_bool_error);\n\n\treturn isl_schedule_band_get_permutable(tree->band);\n}\n\n/* Mark the band tree root permutable or not according to \"permutable\"?\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_set_permutable(\n\t__isl_take isl_schedule_tree *tree, int permutable)\n{\n\tif (!tree)\n\t\treturn NULL;\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return isl_schedule_tree_free(tree));\n\tif (isl_schedule_tree_band_get_permutable(tree) == permutable)\n\t\treturn tree;\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\treturn NULL;\n\n\ttree->band = isl_schedule_band_set_permutable(tree->band, permutable);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\treturn tree;\n}\n\n/* Return the schedule space of the band tree root.\n */\n__isl_give isl_space *isl_schedule_tree_band_get_space(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return NULL);\n\n\treturn isl_schedule_band_get_space(tree->band);\n}\n\n/* Intersect the domain of the band schedule of the band tree root\n * with \"domain\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_intersect_domain(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *domain)\n{\n\tif (!tree || !domain)\n\t\tgoto error;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", goto error);\n\n\ttree->band = 
isl_schedule_band_intersect_domain(tree->band, domain);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_union_set_free(domain);\n\treturn NULL;\n}\n\n/* Return the schedule of the band tree root in isolation.\n */\n__isl_give isl_multi_union_pw_aff *isl_schedule_tree_band_get_partial_schedule(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return NULL);\n\n\treturn isl_schedule_band_get_partial_schedule(tree->band);\n}\n\n/* Replace the schedule of the band tree root by \"schedule\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_set_partial_schedule(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_multi_union_pw_aff *schedule)\n{\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree || !schedule)\n\t\tgoto error;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", goto error);\n\ttree->band = isl_schedule_band_set_partial_schedule(tree->band,\n\t\t\t\t\t\t\t\tschedule);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_multi_union_pw_aff_free(schedule);\n\treturn NULL;\n}\n\n/* Return the loop AST generation type for the band member\n * of the band tree root at position \"pos\".\n */\nenum isl_ast_loop_type isl_schedule_tree_band_member_get_ast_loop_type(\n\t__isl_keep isl_schedule_tree *tree, int pos)\n{\n\tif (!tree)\n\t\treturn isl_ast_loop_error;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return isl_ast_loop_error);\n\n\treturn isl_schedule_band_member_get_ast_loop_type(tree->band, pos);\n}\n\n/* Set the loop AST generation type for the band member of the band tree root\n * at position \"pos\" to 
\"type\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_ast_loop_type(\n\t__isl_take isl_schedule_tree *tree, int pos,\n\tenum isl_ast_loop_type type)\n{\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return isl_schedule_tree_free(tree));\n\n\ttree->band = isl_schedule_band_member_set_ast_loop_type(tree->band,\n\t\t\t\t\t\t\t\tpos, type);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\n\treturn tree;\n}\n\n/* Return the loop AST generation type for the band member\n * of the band tree root at position \"pos\" for the isolated part.\n */\nenum isl_ast_loop_type isl_schedule_tree_band_member_get_isolate_ast_loop_type(\n\t__isl_keep isl_schedule_tree *tree, int pos)\n{\n\tif (!tree)\n\t\treturn isl_ast_loop_error;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return isl_ast_loop_error);\n\n\treturn isl_schedule_band_member_get_isolate_ast_loop_type(tree->band,\n\t\t\t\t\t\t\t\t\tpos);\n}\n\n/* Set the loop AST generation type for the band member of the band tree root\n * at position \"pos\" for the isolated part to \"type\".\n */\n__isl_give isl_schedule_tree *\nisl_schedule_tree_band_member_set_isolate_ast_loop_type(\n\t__isl_take isl_schedule_tree *tree, int pos,\n\tenum isl_ast_loop_type type)\n{\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return isl_schedule_tree_free(tree));\n\n\ttree->band = isl_schedule_band_member_set_isolate_ast_loop_type(\n\t\t\t\t\t\t\ttree->band, pos, type);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\n\treturn tree;\n}\n\n/* Return the AST build options 
associated to the band tree root.\n */\n__isl_give isl_union_set *isl_schedule_tree_band_get_ast_build_options(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return NULL);\n\n\treturn isl_schedule_band_get_ast_build_options(tree->band);\n}\n\n/* Replace the AST build options associated to the band tree root by \"options\".\n * Update the anchored field if the anchoredness of the root node itself\n * changes.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_set_ast_build_options(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *options)\n{\n\tint was_anchored;\n\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree || !options)\n\t\tgoto error;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", goto error);\n\n\twas_anchored = isl_schedule_tree_is_anchored(tree);\n\ttree->band = isl_schedule_band_set_ast_build_options(tree->band,\n\t\t\t\t\t\t\t\toptions);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\tif (isl_schedule_tree_is_anchored(tree) != was_anchored)\n\t\ttree = isl_schedule_tree_update_anchored(tree);\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_union_set_free(options);\n\treturn NULL;\n}\n\n/* Return the \"isolate\" option associated to the band tree root of \"tree\",\n * which is assumed to appear at schedule depth \"depth\".\n */\n__isl_give isl_set *isl_schedule_tree_band_get_ast_isolate_option(\n\t__isl_keep isl_schedule_tree *tree, int depth)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return NULL);\n\n\treturn isl_schedule_band_get_ast_isolate_option(tree->band, depth);\n}\n\n/* Return the context of the 
context tree root.\n */\n__isl_give isl_set *isl_schedule_tree_context_get_context(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_context)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a context node\", return NULL);\n\n\treturn isl_set_copy(tree->context);\n}\n\n/* Return the domain of the domain tree root.\n */\n__isl_give isl_union_set *isl_schedule_tree_domain_get_domain(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_domain)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a domain node\", return NULL);\n\n\treturn isl_union_set_copy(tree->domain);\n}\n\n/* Replace the domain of domain tree root \"tree\" by \"domain\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_domain_set_domain(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *domain)\n{\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree || !domain)\n\t\tgoto error;\n\n\tif (tree->type != isl_schedule_node_domain)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a domain node\", goto error);\n\n\tisl_union_set_free(tree->domain);\n\ttree->domain = domain;\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_union_set_free(domain);\n\treturn NULL;\n}\n\n/* Return the contraction of the expansion tree root.\n */\n__isl_give isl_union_pw_multi_aff *isl_schedule_tree_expansion_get_contraction(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_expansion)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not an expansion node\", return NULL);\n\n\treturn isl_union_pw_multi_aff_copy(tree->contraction);\n}\n\n/* Return the expansion of the expansion tree root.\n */\n__isl_give isl_union_map *isl_schedule_tree_expansion_get_expansion(\n\t__isl_keep isl_schedule_tree 
*tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_expansion)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not an expansion node\", return NULL);\n\n\treturn isl_union_map_copy(tree->expansion);\n}\n\n/* Replace the contraction and the expansion of the expansion tree root \"tree\"\n * by \"contraction\" and \"expansion\".\n */\n__isl_give isl_schedule_tree *\nisl_schedule_tree_expansion_set_contraction_and_expansion(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_union_pw_multi_aff *contraction,\n\t__isl_take isl_union_map *expansion)\n{\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree || !contraction || !expansion)\n\t\tgoto error;\n\n\tif (tree->type != isl_schedule_node_expansion)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not an expansion node\", goto error);\n\n\tisl_union_pw_multi_aff_free(tree->contraction);\n\ttree->contraction = contraction;\n\tisl_union_map_free(tree->expansion);\n\ttree->expansion = expansion;\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_union_pw_multi_aff_free(contraction);\n\tisl_union_map_free(expansion);\n\treturn NULL;\n}\n\n/* Return the extension of the extension tree root.\n */\n__isl_give isl_union_map *isl_schedule_tree_extension_get_extension(\n\t__isl_take isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_extension)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not an extension node\", return NULL);\n\n\treturn isl_union_map_copy(tree->extension);\n}\n\n/* Replace the extension of the extension tree root \"tree\" by \"extension\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_extension_set_extension(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_map *extension)\n{\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree || !extension)\n\t\tgoto error;\n\n\tif (tree->type != 
isl_schedule_node_extension)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not an extension node\", goto error);\n\tisl_union_map_free(tree->extension);\n\ttree->extension = extension;\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_union_map_free(extension);\n\treturn NULL;\n}\n\n/* Return the filter of the filter tree root.\n */\n__isl_give isl_union_set *isl_schedule_tree_filter_get_filter(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_filter)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a filter node\", return NULL);\n\n\treturn isl_union_set_copy(tree->filter);\n}\n\n/* Replace the filter of the filter tree root by \"filter\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_filter_set_filter(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *filter)\n{\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree || !filter)\n\t\tgoto error;\n\n\tif (tree->type != isl_schedule_node_filter)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a filter node\", goto error);\n\n\tisl_union_set_free(tree->filter);\n\ttree->filter = filter;\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_union_set_free(filter);\n\treturn NULL;\n}\n\n/* Return the guard of the guard tree root.\n */\n__isl_give isl_set *isl_schedule_tree_guard_get_guard(\n\t__isl_take isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != isl_schedule_node_guard)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a guard node\", return NULL);\n\n\treturn isl_set_copy(tree->guard);\n}\n\n/* Return the mark identifier of the mark tree root \"tree\".\n */\n__isl_give isl_id *isl_schedule_tree_mark_get_id(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn NULL;\n\n\tif (tree->type != 
isl_schedule_node_mark)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a mark node\", return NULL);\n\n\treturn isl_id_copy(tree->mark);\n}\n\n/* Set dim to the range dimension of \"map\" and abort the search.\n */\nstatic isl_stat set_range_dim(__isl_take isl_map *map, void *user)\n{\n\tisl_size *dim = user;\n\n\t*dim = isl_map_dim(map, isl_dim_out);\n\tisl_map_free(map);\n\n\treturn isl_stat_error;\n}\n\n/* Return the dimension of the range of \"umap\".\n * \"umap\" is assumed not to be empty and\n * all maps inside \"umap\" are assumed to have the same range.\n *\n * We extract the range dimension from the first map in \"umap\".\n */\nstatic isl_size range_dim(__isl_keep isl_union_map *umap)\n{\n\tisl_size dim = isl_size_error;\n\tisl_size n;\n\n\tn = isl_union_map_n_map(umap);\n\tif (n < 0)\n\t\treturn isl_size_error;\n\tif (n == 0)\n\t\tisl_die(isl_union_map_get_ctx(umap), isl_error_internal,\n\t\t\t\"unexpected empty input\", return isl_size_error);\n\n\tisl_union_map_foreach_map(umap, &set_range_dim, &dim);\n\n\treturn dim;\n}\n\n/* Append an \"extra\" number of zeros to the range of \"umap\" and\n * return the result.\n */\nstatic __isl_give isl_union_map *append_range(__isl_take isl_union_map *umap,\n\tint extra)\n{\n\tisl_union_set *dom;\n\tisl_space *space;\n\tisl_multi_val *mv;\n\tisl_union_pw_multi_aff *suffix;\n\tisl_union_map *universe;\n\tisl_union_map *suffix_umap;\n\n\tuniverse = isl_union_map_universe(isl_union_map_copy(umap));\n\tdom = isl_union_map_domain(universe);\n\tspace = isl_union_set_get_space(dom);\n\tspace = isl_space_set_from_params(space);\n\tspace = isl_space_add_dims(space, isl_dim_set, extra);\n\tmv = isl_multi_val_zero(space);\n\n\tsuffix = isl_union_pw_multi_aff_multi_val_on_domain(dom, mv);\n\tsuffix_umap = isl_union_map_from_union_pw_multi_aff(suffix);\n\tumap = isl_union_map_flat_range_product(umap, suffix_umap);\n\n\treturn umap;\n}\n\n/* Should we skip the root of \"tree\" while looking for 
the first\n * descendant with schedule information?\n * That is, is it impossible to derive any information about\n * the iteration domain from this node?\n *\n * We do not want to skip leaf or error nodes because there is\n * no point in looking any deeper from these nodes.\n * We can only extract partial iteration domain information\n * from an extension node, but extension nodes are not supported\n * by the caller and it will error out on them.\n */\nstatic isl_bool domain_less(__isl_keep isl_schedule_tree *tree)\n{\n\tenum isl_schedule_node_type type;\n\tisl_size n;\n\n\ttype = isl_schedule_tree_get_type(tree);\n\tswitch (type) {\n\tcase isl_schedule_node_band:\n\t\tn = isl_schedule_tree_band_n_member(tree);\n\t\treturn n < 0 ? isl_bool_error : isl_bool_ok(n == 0);\n\tcase isl_schedule_node_context:\n\tcase isl_schedule_node_guard:\n\tcase isl_schedule_node_mark:\n\t\treturn isl_bool_true;\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_error:\n\tcase isl_schedule_node_domain:\n\tcase isl_schedule_node_expansion:\n\tcase isl_schedule_node_extension:\n\tcase isl_schedule_node_filter:\n\tcase isl_schedule_node_set:\n\tcase isl_schedule_node_sequence:\n\t\treturn isl_bool_false;\n\t}\n\n\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\"unhandled case\", return isl_bool_error);\n}\n\n/* Move down to the first descendant of \"tree\" that contains any schedule\n * information or return \"leaf\" if there is no such descendant.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_first_schedule_descendant(\n\t__isl_take isl_schedule_tree *tree, __isl_keep isl_schedule_tree *leaf)\n{\n\tisl_bool down;\n\n\twhile ((down = domain_less(tree)) == isl_bool_true) {\n\t\tif (!isl_schedule_tree_has_children(tree)) {\n\t\t\tisl_schedule_tree_free(tree);\n\t\t\treturn isl_schedule_tree_copy(leaf);\n\t\t}\n\t\ttree = isl_schedule_tree_child(tree, 0);\n\t}\n\n\tif (down < 0)\n\t\treturn isl_schedule_tree_free(tree);\n\n\treturn tree;\n}\n\nstatic 
__isl_give isl_union_map *subtree_schedule_extend(\n\t__isl_keep isl_schedule_tree *tree, __isl_take isl_union_map *outer);\n\n/* Extend the schedule map \"outer\" with the subtree schedule\n * of the (single) child of \"tree\", if any.\n *\n * If \"tree\" does not have any descendants (apart from those that\n * do not carry any schedule information), then we simply return \"outer\".\n * Otherwise, we extend the schedule map \"outer\" with the subtree schedule\n * of the single child.\n */\nstatic __isl_give isl_union_map *subtree_schedule_extend_child(\n\t__isl_keep isl_schedule_tree *tree, __isl_take isl_union_map *outer)\n{\n\tisl_schedule_tree *child;\n\tisl_union_map *res;\n\n\tif (!tree)\n\t\treturn isl_union_map_free(outer);\n\tif (!isl_schedule_tree_has_children(tree))\n\t\treturn outer;\n\tchild = isl_schedule_tree_get_child(tree, 0);\n\tif (!child)\n\t\treturn isl_union_map_free(outer);\n\tres = subtree_schedule_extend(child, outer);\n\tisl_schedule_tree_free(child);\n\treturn res;\n}\n\n/* Extract the parameter space from one of the children of \"tree\",\n * which are assumed to be filters.\n */\nstatic __isl_give isl_space *extract_space_from_filter_child(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tisl_space *space;\n\tisl_union_set *dom;\n\tisl_schedule_tree *child;\n\n\tchild = isl_schedule_tree_list_get_schedule_tree(tree->children, 0);\n\tdom = isl_schedule_tree_filter_get_filter(child);\n\tspace = isl_union_set_get_space(dom);\n\tisl_union_set_free(dom);\n\tisl_schedule_tree_free(child);\n\n\treturn space;\n}\n\n/* Extend the schedule map \"outer\" with the subtree schedule\n * of a set or sequence node.\n *\n * The schedule for the set or sequence node itself is composed of\n * pieces of the form\n *\n *\tfilter -> []\n *\n * or\n *\n *\tfilter -> [index]\n *\n * The first form is used if there is only a single child or\n * if the current node is a set node and the schedule_separate_components\n * option is not set.\n *\n * Each of the pieces 
above is extended with the subtree schedule of\n * the child of the corresponding filter, if any, padded with zeros\n * to ensure that all pieces have the same range dimension.\n */\nstatic __isl_give isl_union_map *subtree_schedule_extend_from_children(\n\t__isl_keep isl_schedule_tree *tree, __isl_take isl_union_map *outer)\n{\n\tint i;\n\tisl_size n;\n\tisl_size dim;\n\tint separate;\n\tisl_ctx *ctx;\n\tisl_val *v = NULL;\n\tisl_multi_val *mv;\n\tisl_space *space;\n\tisl_union_map *umap;\n\n\tn = isl_schedule_tree_n_children(tree);\n\tif (n < 0)\n\t\treturn isl_union_map_free(outer);\n\tif (n == 0)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"missing children\", return isl_union_map_free(outer));\n\n\tctx = isl_schedule_tree_get_ctx(tree);\n\tseparate = n > 1 && (tree->type == isl_schedule_node_sequence ||\n\t\t\t    isl_options_get_schedule_separate_components(ctx));\n\n\tspace = isl_space_params_alloc(ctx, 0);\n\n\tumap = isl_union_map_empty(isl_space_copy(space));\n\tspace = isl_space_set_from_params(space);\n\tif (separate) {\n\t\tspace = isl_space_add_dims(space, isl_dim_set, 1);\n\t\tv = isl_val_zero(ctx);\n\t}\n\tmv = isl_multi_val_zero(space);\n\n\tdim = isl_multi_val_dim(mv, isl_dim_set);\n\tif (dim < 0)\n\t\tumap = isl_union_map_free(umap);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_multi_val *mv_copy;\n\t\tisl_union_pw_multi_aff *upma;\n\t\tisl_union_map *umap_i;\n\t\tisl_union_set *dom;\n\t\tisl_schedule_tree *child;\n\t\tisl_size dim_i;\n\t\tisl_bool empty;\n\n\t\tchild = isl_schedule_tree_list_get_schedule_tree(\n\t\t\t\t\t\t\ttree->children, i);\n\t\tdom = isl_schedule_tree_filter_get_filter(child);\n\n\t\tif (separate) {\n\t\t\tmv = isl_multi_val_set_val(mv, 0, isl_val_copy(v));\n\t\t\tv = isl_val_add_ui(v, 1);\n\t\t}\n\t\tmv_copy = isl_multi_val_copy(mv);\n\t\tspace = isl_union_set_get_space(dom);\n\t\tmv_copy = isl_multi_val_align_params(mv_copy, space);\n\t\tupma = isl_union_pw_multi_aff_multi_val_on_domain(dom, 
mv_copy);\n\t\tumap_i = isl_union_map_from_union_pw_multi_aff(upma);\n\t\tumap_i = isl_union_map_flat_range_product(\n\t\t\t\t\t    isl_union_map_copy(outer), umap_i);\n\t\tumap_i = subtree_schedule_extend_child(child, umap_i);\n\t\tisl_schedule_tree_free(child);\n\n\t\tempty = isl_union_map_is_empty(umap_i);\n\t\tif (empty < 0)\n\t\t\tumap_i = isl_union_map_free(umap_i);\n\t\telse if (empty) {\n\t\t\tisl_union_map_free(umap_i);\n\t\t\tcontinue;\n\t\t}\n\n\t\tdim_i = range_dim(umap_i);\n\t\tif (dim_i < 0) {\n\t\t\tumap = isl_union_map_free(umap);\n\t\t} else if (dim < dim_i) {\n\t\t\tumap = append_range(umap, dim_i - dim);\n\t\t\tdim = dim_i;\n\t\t} else if (dim_i < dim) {\n\t\t\tumap_i = append_range(umap_i, dim - dim_i);\n\t\t}\n\t\tumap = isl_union_map_union(umap, umap_i);\n\t}\n\n\tisl_val_free(v);\n\tisl_multi_val_free(mv);\n\tisl_union_map_free(outer);\n\n\treturn umap;\n}\n\n/* Extend the schedule map \"outer\" with the subtree schedule of \"tree\".\n *\n * If the root of the tree is a set or a sequence, then we extend\n * the schedule map in subtree_schedule_extend_from_children.\n * Otherwise, we extend the schedule map with the partial schedule\n * corresponding to the root of the tree and then continue with\n * the single child of this root.\n * In the special case of an expansion, the schedule map is \"extended\"\n * by applying the expansion to the domain of the schedule map.\n */\nstatic __isl_give isl_union_map *subtree_schedule_extend(\n\t__isl_keep isl_schedule_tree *tree, __isl_take isl_union_map *outer)\n{\n\tisl_multi_union_pw_aff *mupa;\n\tisl_union_map *umap;\n\tisl_union_set *domain;\n\tisl_size n;\n\n\tif (!tree)\n\t\treturn NULL;\n\n\tswitch (tree->type) {\n\tcase isl_schedule_node_error:\n\t\treturn isl_union_map_free(outer);\n\tcase isl_schedule_node_extension:\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"cannot construct subtree schedule of tree \"\n\t\t\t\"with extension nodes\",\n\t\t\treturn 
isl_union_map_free(outer));\n\tcase isl_schedule_node_context:\n\tcase isl_schedule_node_guard:\n\tcase isl_schedule_node_mark:\n\t\treturn subtree_schedule_extend_child(tree, outer);\n\tcase isl_schedule_node_band:\n\t\tn = isl_schedule_tree_band_n_member(tree);\n\t\tif (n < 0)\n\t\t\treturn isl_union_map_free(outer);\n\t\tif (n == 0)\n\t\t\treturn subtree_schedule_extend_child(tree, outer);\n\t\tmupa = isl_schedule_band_get_partial_schedule(tree->band);\n\t\tumap = isl_union_map_from_multi_union_pw_aff(mupa);\n\t\touter = isl_union_map_flat_range_product(outer, umap);\n\t\tumap = subtree_schedule_extend_child(tree, outer);\n\t\tbreak;\n\tcase isl_schedule_node_domain:\n\t\tdomain = isl_schedule_tree_domain_get_domain(tree);\n\t\tumap = isl_union_map_from_domain(domain);\n\t\touter = isl_union_map_flat_range_product(outer, umap);\n\t\tumap = subtree_schedule_extend_child(tree, outer);\n\t\tbreak;\n\tcase isl_schedule_node_expansion:\n\t\tumap = isl_schedule_tree_expansion_get_expansion(tree);\n\t\touter = isl_union_map_apply_domain(outer, umap);\n\t\tumap = subtree_schedule_extend_child(tree, outer);\n\t\tbreak;\n\tcase isl_schedule_node_filter:\n\t\tdomain = isl_schedule_tree_filter_get_filter(tree);\n\t\tumap = isl_union_map_from_domain(domain);\n\t\touter = isl_union_map_flat_range_product(outer, umap);\n\t\tumap = subtree_schedule_extend_child(tree, outer);\n\t\tbreak;\n\tcase isl_schedule_node_leaf:\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"leaf node should be handled by caller\", return NULL);\n\tcase isl_schedule_node_set:\n\tcase isl_schedule_node_sequence:\n\t\tumap = subtree_schedule_extend_from_children(tree, outer);\n\t\tbreak;\n\t}\n\n\treturn umap;\n}\n\nstatic __isl_give isl_union_set *initial_domain(\n\t__isl_keep isl_schedule_tree *tree);\n\n/* Extract a universe domain from the children of the tree root \"tree\",\n * which is a set or sequence, meaning that its children are filters.\n * In particular, return the 
union of the universes of the filters.\n */\nstatic __isl_give isl_union_set *initial_domain_from_children(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tint i;\n\tisl_size n;\n\tisl_space *space;\n\tisl_union_set *domain;\n\n\tn = isl_schedule_tree_n_children(tree);\n\tif (n < 0)\n\t\treturn NULL;\n\tif (n == 0)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"missing children\", return NULL);\n\n\tspace = extract_space_from_filter_child(tree);\n\tdomain = isl_union_set_empty(space);\n\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_schedule_tree *child;\n\t\tisl_union_set *domain_i;\n\n\t\tchild = isl_schedule_tree_get_child(tree, i);\n\t\tdomain_i = initial_domain(child);\n\t\tdomain = isl_union_set_union(domain, domain_i);\n\t\tisl_schedule_tree_free(child);\n\t}\n\n\treturn domain;\n}\n\n/* Extract a universe domain from the tree root \"tree\".\n * The caller is responsible for making sure that this node\n * would not be skipped by isl_schedule_tree_first_schedule_descendant\n * and that it is not a leaf node.\n */\nstatic __isl_give isl_union_set *initial_domain(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tisl_multi_union_pw_aff *mupa;\n\tisl_union_set *domain;\n\tisl_union_map *exp;\n\tisl_size n;\n\n\tif (!tree)\n\t\treturn NULL;\n\n\tswitch (tree->type) {\n\tcase isl_schedule_node_error:\n\t\treturn NULL;\n\tcase isl_schedule_node_context:\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"context node should be handled by caller\",\n\t\t\treturn NULL);\n\tcase isl_schedule_node_guard:\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"guard node should be handled by caller\",\n\t\t\treturn NULL);\n\tcase isl_schedule_node_mark:\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"mark node should be handled by caller\",\n\t\t\treturn NULL);\n\tcase isl_schedule_node_extension:\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"cannot construct 
subtree schedule of tree \"\n\t\t\t\"with extension nodes\", return NULL);\n\tcase isl_schedule_node_band:\n\t\tn = isl_schedule_tree_band_n_member(tree);\n\t\tif (n < 0)\n\t\t\treturn NULL;\n\t\tif (n == 0)\n\t\t\tisl_die(isl_schedule_tree_get_ctx(tree),\n\t\t\t\tisl_error_internal,\n\t\t\t\t\"0D band should be handled by caller\",\n\t\t\t\treturn NULL);\n\t\tmupa = isl_schedule_band_get_partial_schedule(tree->band);\n\t\tdomain = isl_multi_union_pw_aff_domain(mupa);\n\t\tdomain = isl_union_set_universe(domain);\n\t\tbreak;\n\tcase isl_schedule_node_domain:\n\t\tdomain = isl_schedule_tree_domain_get_domain(tree);\n\t\tdomain = isl_union_set_universe(domain);\n\t\tbreak;\n\tcase isl_schedule_node_expansion:\n\t\texp = isl_schedule_tree_expansion_get_expansion(tree);\n\t\texp = isl_union_map_universe(exp);\n\t\tdomain = isl_union_map_domain(exp);\n\t\tbreak;\n\tcase isl_schedule_node_filter:\n\t\tdomain = isl_schedule_tree_filter_get_filter(tree);\n\t\tdomain = isl_union_set_universe(domain);\n\t\tbreak;\n\tcase isl_schedule_node_leaf:\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\t\"leaf node should be handled by caller\", return NULL);\n\tcase isl_schedule_node_set:\n\tcase isl_schedule_node_sequence:\n\t\tdomain = initial_domain_from_children(tree);\n\t\tbreak;\n\t}\n\n\treturn domain;\n}\n\n/* Return the subtree schedule of a node that contains some schedule\n * information, i.e., a node that would not be skipped by\n * isl_schedule_tree_first_schedule_descendant and that is not a leaf.\n *\n * If the tree contains any expansions, then the returned subtree\n * schedule is formulated in terms of the expanded domains.\n * The tree is not allowed to contain any extension nodes.\n *\n * We start with an initial zero-dimensional subtree schedule based\n * on the domain information in the root node and then extend it\n * based on the schedule information in the root node and its descendants.\n */\n__isl_give isl_union_map 
*isl_schedule_tree_get_subtree_schedule_union_map(\n\t__isl_keep isl_schedule_tree *tree)\n{\n\tisl_union_set *domain;\n\tisl_union_map *umap;\n\n\tdomain = initial_domain(tree);\n\tumap = isl_union_map_from_domain(domain);\n\treturn subtree_schedule_extend(tree, umap);\n}\n\n/* Multiply the partial schedule of the band root node of \"tree\"\n * with the factors in \"mv\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_scale(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_multi_val *mv)\n{\n\tif (!tree || !mv)\n\t\tgoto error;\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", goto error);\n\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->band = isl_schedule_band_scale(tree->band, mv);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_multi_val_free(mv);\n\treturn NULL;\n}\n\n/* Divide the partial schedule of the band root node of \"tree\"\n * by the factors in \"mv\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_scale_down(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_multi_val *mv)\n{\n\tif (!tree || !mv)\n\t\tgoto error;\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", goto error);\n\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->band = isl_schedule_band_scale_down(tree->band, mv);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_multi_val_free(mv);\n\treturn NULL;\n}\n\n/* Reduce the partial schedule of the band root node of \"tree\"\n * modulo the factors in \"mv\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_mod(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_multi_val *mv)\n{\n\tif (!tree || 
!mv)\n\t\tgoto error;\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", goto error);\n\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->band = isl_schedule_band_mod(tree->band, mv);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_multi_val_free(mv);\n\treturn NULL;\n}\n\n/* Shift the partial schedule of the band root node of \"tree\" by \"shift\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_shift(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_multi_union_pw_aff *shift)\n{\n\tif (!tree || !shift)\n\t\tgoto error;\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", goto error);\n\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->band = isl_schedule_band_shift(tree->band, shift);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_multi_union_pw_aff_free(shift);\n\treturn NULL;\n}\n\n/* Given two trees with sequence roots, replace the child at position\n * \"pos\" of \"tree\" with the children of \"child\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_sequence_splice(\n\t__isl_take isl_schedule_tree *tree, int pos,\n\t__isl_take isl_schedule_tree *child)\n{\n\tisl_size n;\n\tisl_schedule_tree_list *list1, *list2;\n\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree || !child)\n\t\tgoto error;\n\tif (isl_schedule_tree_get_type(tree) != isl_schedule_node_sequence)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a sequence node\", goto error);\n\tn = isl_schedule_tree_n_children(tree);\n\tif (n < 0)\n\t\tgoto error;\n\tif (pos < 0 || pos >= n)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), 
isl_error_invalid,\n\t\t\t\"position out of bounds\", goto error);\n\tif (isl_schedule_tree_get_type(child) != isl_schedule_node_sequence)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a sequence node\", goto error);\n\n\tlist1 = isl_schedule_tree_list_copy(tree->children);\n\tlist1 = isl_schedule_tree_list_drop(list1, pos, n - pos);\n\tlist2 = isl_schedule_tree_list_copy(tree->children);\n\tlist2 = isl_schedule_tree_list_drop(list2, 0, pos + 1);\n\tlist1 = isl_schedule_tree_list_concat(list1,\n\t\t\t\tisl_schedule_tree_list_copy(child->children));\n\tlist1 = isl_schedule_tree_list_concat(list1, list2);\n\n\tisl_schedule_tree_free(tree);\n\tisl_schedule_tree_free(child);\n\treturn isl_schedule_tree_from_children(isl_schedule_node_sequence,\n\t\t\t\t\t\tlist1);\nerror:\n\tisl_schedule_tree_free(tree);\n\tisl_schedule_tree_free(child);\n\treturn NULL;\n}\n\n/* Tile the band root node of \"tree\" with tile sizes \"sizes\".\n *\n * We duplicate the band node, change the schedule of one of them\n * to the tile schedule and the other to the point schedule and then\n * attach the point band as a child to the tile band.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_tile(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_multi_val *sizes)\n{\n\tisl_schedule_tree *child = NULL;\n\n\tif (!tree || !sizes)\n\t\tgoto error;\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", goto error);\n\n\tchild = isl_schedule_tree_copy(tree);\n\ttree = isl_schedule_tree_cow(tree);\n\tchild = isl_schedule_tree_cow(child);\n\tif (!tree || !child)\n\t\tgoto error;\n\n\ttree->band = isl_schedule_band_tile(tree->band,\n\t\t\t\t\t    isl_multi_val_copy(sizes));\n\tif (!tree->band)\n\t\tgoto error;\n\tchild->band = isl_schedule_band_point(child->band, tree->band, sizes);\n\tif (!child->band)\n\t\tchild = isl_schedule_tree_free(child);\n\n\ttree = 
isl_schedule_tree_replace_child(tree, 0, child);\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(child);\n\tisl_schedule_tree_free(tree);\n\tisl_multi_val_free(sizes);\n\treturn NULL;\n}\n\n/* Given an isolate AST generation option \"isolate\" for a band of size pos + n,\n * return the corresponding option for a band covering the first \"pos\"\n * members.\n *\n * The input isolate option is of the form\n *\n *\tisolate[[flattened outer bands] -> [pos; n]]\n *\n * The output isolate option is of the form\n *\n *\tisolate[[flattened outer bands] -> [pos]]\n */\nstatic __isl_give isl_set *isolate_initial(__isl_keep isl_set *isolate,\n\tint pos, int n)\n{\n\tisl_id *id;\n\tisl_map *map;\n\n\tisolate = isl_set_copy(isolate);\n\tid = isl_set_get_tuple_id(isolate);\n\tmap = isl_set_unwrap(isolate);\n\tmap = isl_map_project_out(map, isl_dim_out, pos, n);\n\tisolate = isl_map_wrap(map);\n\tisolate = isl_set_set_tuple_id(isolate, id);\n\n\treturn isolate;\n}\n\n/* Given an isolate AST generation option \"isolate\" for a band of size pos + n,\n * return the corresponding option for a band covering the final \"n\"\n * members within a band covering the first \"pos\" members.\n *\n * The input isolate option is of the form\n *\n *\tisolate[[flattened outer bands] -> [pos; n]]\n *\n * The output isolate option is of the form\n *\n *\tisolate[[flattened outer bands; pos] -> [n]]\n *\n *\n * The range is first split into\n *\n *\tisolate[[flattened outer bands] -> [[pos] -> [n]]]\n *\n * and then the first pos members are moved to the domain\n *\n *\tisolate[[[flattened outer bands] -> [pos]] -> [n]]\n *\n * after which the domain is flattened to obtain the desired output.\n */\nstatic __isl_give isl_set *isolate_final(__isl_keep isl_set *isolate,\n\tint pos, int n)\n{\n\tisl_id *id;\n\tisl_space *space;\n\tisl_multi_aff *ma1, *ma2;\n\tisl_map *map;\n\n\tisolate = isl_set_copy(isolate);\n\tid = isl_set_get_tuple_id(isolate);\n\tmap = isl_set_unwrap(isolate);\n\tspace = 
isl_space_range(isl_map_get_space(map));\n\tma1 = isl_multi_aff_project_out_map(isl_space_copy(space),\n\t\t\t\t\t\t   isl_dim_set, pos, n);\n\tma2 = isl_multi_aff_project_out_map(space, isl_dim_set, 0, pos);\n\tma1 = isl_multi_aff_range_product(ma1, ma2);\n\tmap = isl_map_apply_range(map, isl_map_from_multi_aff(ma1));\n\tmap = isl_map_uncurry(map);\n\tmap = isl_map_flatten_domain(map);\n\tisolate = isl_map_wrap(map);\n\tisolate = isl_set_set_tuple_id(isolate, id);\n\n\treturn isolate;\n}\n\n/* Split the band root node of \"tree\" into two nested band nodes,\n * one with the first \"pos\" dimensions and\n * one with the remaining dimensions.\n * The tree is itself positioned at schedule depth \"depth\".\n *\n * The loop AST generation type options and the isolate option\n * are split over the two band nodes.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_split(\n\t__isl_take isl_schedule_tree *tree, int pos, int depth)\n{\n\tisl_size n;\n\tisl_set *isolate, *tree_isolate, *child_isolate;\n\tisl_schedule_tree *child;\n\n\tif (!tree)\n\t\treturn NULL;\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", return isl_schedule_tree_free(tree));\n\n\tn = isl_schedule_tree_band_n_member(tree);\n\tif (n < 0)\n\t\treturn isl_schedule_tree_free(tree);\n\tif (pos < 0 || pos > n)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"position out of bounds\",\n\t\t\treturn isl_schedule_tree_free(tree));\n\n\tchild = isl_schedule_tree_copy(tree);\n\ttree = isl_schedule_tree_cow(tree);\n\tchild = isl_schedule_tree_cow(child);\n\tif (!tree || !child)\n\t\tgoto error;\n\n\tisolate = isl_schedule_tree_band_get_ast_isolate_option(tree, depth);\n\ttree_isolate = isolate_initial(isolate, pos, n - pos);\n\tchild_isolate = isolate_final(isolate, pos, n - pos);\n\tchild->band = isl_schedule_band_drop(child->band, 0, pos);\n\tchild->band = 
isl_schedule_band_replace_ast_build_option(child->band,\n\t\t\t\t\tisl_set_copy(isolate), child_isolate);\n\ttree->band = isl_schedule_band_drop(tree->band, pos, n - pos);\n\ttree->band = isl_schedule_band_replace_ast_build_option(tree->band,\n\t\t\t\t\tisl_set_copy(isolate), tree_isolate);\n\tisl_set_free(isolate);\n\tif (!child->band || !tree->band)\n\t\tgoto error;\n\n\ttree = isl_schedule_tree_replace_child(tree, 0, child);\n\n\treturn tree;\nerror:\n\tisl_schedule_tree_free(child);\n\tisl_schedule_tree_free(tree);\n\treturn NULL;\n}\n\n/* Attach \"tree2\" at each of the leaves of \"tree1\".\n *\n * If \"tree1\" does not have any explicit children, then make \"tree2\"\n * its single child.  Otherwise, attach \"tree2\" to the leaves of\n * each of the children of \"tree1\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_append_to_leaves(\n\t__isl_take isl_schedule_tree *tree1,\n\t__isl_take isl_schedule_tree *tree2)\n{\n\tint i;\n\tisl_size n;\n\n\tn = isl_schedule_tree_n_children(tree1);\n\tif (n < 0 || !tree2)\n\t\tgoto error;\n\tif (n == 0) {\n\t\tisl_schedule_tree_list *list;\n\t\tlist = isl_schedule_tree_list_from_schedule_tree(tree2);\n\t\ttree1 = isl_schedule_tree_set_children(tree1, list);\n\t\treturn tree1;\n\t}\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_schedule_tree *child;\n\n\t\tchild = isl_schedule_tree_get_child(tree1, i);\n\t\tchild = isl_schedule_tree_append_to_leaves(child,\n\t\t\t\t\tisl_schedule_tree_copy(tree2));\n\t\ttree1 = isl_schedule_tree_replace_child(tree1, i, child);\n\t}\n\n\tisl_schedule_tree_free(tree2);\n\treturn tree1;\nerror:\n\tisl_schedule_tree_free(tree1);\n\tisl_schedule_tree_free(tree2);\n\treturn NULL;\n}\n\n/* Reset the user pointer on all identifiers of parameters and tuples\n * in the root of \"tree\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_reset_user(\n\t__isl_take isl_schedule_tree *tree)\n{\n\tif (isl_schedule_tree_is_leaf(tree))\n\t\treturn tree;\n\n\ttree = isl_schedule_tree_cow(tree);\n\tif 
(!tree)\n\t\treturn NULL;\n\n\tswitch (tree->type) {\n\tcase isl_schedule_node_error:\n\t\treturn isl_schedule_tree_free(tree);\n\tcase isl_schedule_node_band:\n\t\ttree->band = isl_schedule_band_reset_user(tree->band);\n\t\tif (!tree->band)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_context:\n\t\ttree->context = isl_set_reset_user(tree->context);\n\t\tif (!tree->context)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_domain:\n\t\ttree->domain = isl_union_set_reset_user(tree->domain);\n\t\tif (!tree->domain)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_expansion:\n\t\ttree->contraction =\n\t\t\tisl_union_pw_multi_aff_reset_user(tree->contraction);\n\t\ttree->expansion = isl_union_map_reset_user(tree->expansion);\n\t\tif (!tree->contraction || !tree->expansion)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_extension:\n\t\ttree->extension = isl_union_map_reset_user(tree->extension);\n\t\tif (!tree->extension)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_filter:\n\t\ttree->filter = isl_union_set_reset_user(tree->filter);\n\t\tif (!tree->filter)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_guard:\n\t\ttree->guard = isl_set_reset_user(tree->guard);\n\t\tif (!tree->guard)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_mark:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\tbreak;\n\t}\n\n\treturn tree;\n}\n\n/* Align the parameters of the root of \"tree\" to those of \"space\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_align_params(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_space *space)\n{\n\tif (!space)\n\t\tgoto error;\n\n\tif (isl_schedule_tree_is_leaf(tree)) {\n\t\tisl_space_free(space);\n\t\treturn tree;\n\t}\n\n\ttree = 
isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\tgoto error;\n\n\tswitch (tree->type) {\n\tcase isl_schedule_node_error:\n\t\tgoto error;\n\tcase isl_schedule_node_band:\n\t\ttree->band = isl_schedule_band_align_params(tree->band, space);\n\t\tif (!tree->band)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_context:\n\t\ttree->context = isl_set_align_params(tree->context, space);\n\t\tif (!tree->context)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_domain:\n\t\ttree->domain = isl_union_set_align_params(tree->domain, space);\n\t\tif (!tree->domain)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_expansion:\n\t\ttree->contraction =\n\t\t\tisl_union_pw_multi_aff_align_params(tree->contraction,\n\t\t\t\t\t\t\tisl_space_copy(space));\n\t\ttree->expansion = isl_union_map_align_params(tree->expansion,\n\t\t\t\t\t\t\t\tspace);\n\t\tif (!tree->contraction || !tree->expansion)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_extension:\n\t\ttree->extension = isl_union_map_align_params(tree->extension,\n\t\t\t\t\t\t\t\tspace);\n\t\tif (!tree->extension)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_filter:\n\t\ttree->filter = isl_union_set_align_params(tree->filter, space);\n\t\tif (!tree->filter)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_guard:\n\t\ttree->guard = isl_set_align_params(tree->guard, space);\n\t\tif (!tree->guard)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t\tbreak;\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_mark:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\tisl_space_free(space);\n\t\tbreak;\n\t}\n\n\treturn tree;\nerror:\n\tisl_space_free(space);\n\tisl_schedule_tree_free(tree);\n\treturn NULL;\n}\n\n/* Does \"tree\" involve the iteration domain?\n * That is, does it need to be modified\n * by 
isl_schedule_tree_pullback_union_pw_multi_aff?\n */\nstatic int involves_iteration_domain(__isl_keep isl_schedule_tree *tree)\n{\n\tif (!tree)\n\t\treturn -1;\n\n\tswitch (tree->type) {\n\tcase isl_schedule_node_error:\n\t\treturn -1;\n\tcase isl_schedule_node_band:\n\tcase isl_schedule_node_domain:\n\tcase isl_schedule_node_expansion:\n\tcase isl_schedule_node_extension:\n\tcase isl_schedule_node_filter:\n\t\treturn 1;\n\tcase isl_schedule_node_context:\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_guard:\n\tcase isl_schedule_node_mark:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\treturn 0;\n\t}\n\n\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_internal,\n\t\t\"unhandled case\", return -1);\n}\n\n/* Compute the pullback of the root node of \"tree\" by the function\n * represented by \"upma\".\n * In other words, plug in \"upma\" in the iteration domains of\n * the root node of \"tree\".\n * We currently do not handle expansion nodes.\n *\n * We first check if the root node involves any iteration domains.\n * If so, we handle the specific cases.\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_pullback_union_pw_multi_aff(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_union_pw_multi_aff *upma)\n{\n\tint involves;\n\n\tif (!tree || !upma)\n\t\tgoto error;\n\n\tinvolves = involves_iteration_domain(tree);\n\tif (involves < 0)\n\t\tgoto error;\n\tif (!involves) {\n\t\tisl_union_pw_multi_aff_free(upma);\n\t\treturn tree;\n\t}\n\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\tgoto error;\n\n\tif (tree->type == isl_schedule_node_band) {\n\t\ttree->band = isl_schedule_band_pullback_union_pw_multi_aff(\n\t\t\t\t\t\t\t    tree->band, upma);\n\t\tif (!tree->band)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t} else if (tree->type == isl_schedule_node_domain) {\n\t\ttree->domain =\n\t\t\tisl_union_set_preimage_union_pw_multi_aff(tree->domain,\n\t\t\t\t\t\t\t\t\tupma);\n\t\tif (!tree->domain)\n\t\t\treturn 
isl_schedule_tree_free(tree);\n\t} else if (tree->type == isl_schedule_node_expansion) {\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_unsupported,\n\t\t\t\"cannot pullback expansion node\", goto error);\n\t} else if (tree->type == isl_schedule_node_extension) {\n\t\ttree->extension =\n\t\t\tisl_union_map_preimage_range_union_pw_multi_aff(\n\t\t\t    tree->extension, upma);\n\t\tif (!tree->extension)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t} else if (tree->type == isl_schedule_node_filter) {\n\t\ttree->filter =\n\t\t\tisl_union_set_preimage_union_pw_multi_aff(tree->filter,\n\t\t\t\t\t\t\t\t\tupma);\n\t\tif (!tree->filter)\n\t\t\treturn isl_schedule_tree_free(tree);\n\t}\n\n\treturn tree;\nerror:\n\tisl_union_pw_multi_aff_free(upma);\n\tisl_schedule_tree_free(tree);\n\treturn NULL;\n}\n\n/* Compute the gist of the band tree root with respect to \"context\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_gist(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *context)\n{\n\tif (!tree)\n\t\treturn NULL;\n\tif (tree->type != isl_schedule_node_band)\n\t\tisl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n\t\t\t\"not a band node\", goto error);\n\ttree = isl_schedule_tree_cow(tree);\n\tif (!tree)\n\t\tgoto error;\n\n\ttree->band = isl_schedule_band_gist(tree->band, context);\n\tif (!tree->band)\n\t\treturn isl_schedule_tree_free(tree);\n\treturn tree;\nerror:\n\tisl_union_set_free(context);\n\tisl_schedule_tree_free(tree);\n\treturn NULL;\n}\n\n/* Are any members in \"band\" marked coincident?\n */\nstatic isl_bool any_coincident(__isl_keep isl_schedule_band *band)\n{\n\tint i;\n\tisl_size n;\n\n\tn = isl_schedule_band_n_member(band);\n\tif (n < 0)\n\t\treturn isl_bool_error;\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_bool coincident;\n\n\t\tcoincident = isl_schedule_band_member_get_coincident(band, i);\n\t\tif (coincident < 0 || coincident)\n\t\t\treturn coincident;\n\t}\n\n\treturn isl_bool_false;\n}\n\n/* AutoSA 
Extended */\n/* Are any members in \"band\" marked as space or time loops?\n */\nstatic isl_bool any_space_time(__isl_keep isl_schedule_band *band)\n{\n  int i;\n  isl_size n;\n\n  n = isl_schedule_band_n_member(band);\n  if (n < 0)\n    return isl_bool_error;\n  for (i = 0; i < n; ++i) {\n    enum autosa_loop_type space_time;\n    \n    space_time = isl_schedule_band_member_get_space_time(band, i);\n    if (space_time == autosa_loop_time || space_time == autosa_loop_space)\n      return isl_bool_true;\n  }\n\n  return isl_bool_false;\n}\n\n/* Are any members in \"band\" marked with a pe_opt type\n * (latency, simd or array_part)?\n */\nstatic isl_bool any_pe_opt(__isl_keep isl_schedule_band *band)\n{\n  int i;\n  isl_size n;\n\n  n = isl_schedule_band_n_member(band);\n  if (n < 0)\n    return isl_bool_error;\n  for (i = 0; i < n; ++i) {\n    enum autosa_loop_type pe_opt;\n    \n    pe_opt = isl_schedule_band_member_get_pe_opt(band, i);\n    if (pe_opt == autosa_loop_latency || pe_opt == autosa_loop_simd || \n\t\t\t\tpe_opt == autosa_loop_array_part)\n      return isl_bool_true;\n  }\n  \n  return isl_bool_false;\n}\n\n/* Do any members in \"band\" have a valid sched_pos, i.e., one in [0, n)?
\n */\nstatic isl_bool any_sched_pos(__isl_keep isl_schedule_band *band)\n{\n\tint i;\n\tisl_size n;\n\n\tn = isl_schedule_band_n_member(band);\n\tif (n < 0)\n\t\treturn isl_bool_error;\n\tfor (i = 0; i < n; ++i) {\n\t\tint sched_pos;\n\n\t\tsched_pos = isl_schedule_band_member_get_sched_pos(band, i);\n\t\tif (sched_pos >= 0 && sched_pos < n)\n\t\t\treturn isl_bool_true;\n\t}\n\n\treturn isl_bool_false;\n}\n/* AutoSA Extended */\n\n/* Print the band node \"band\" to \"p\".\n *\n * The permutable and coincident properties are only printed if they\n * are different from the defaults.\n * The coincident property is always printed in YAML flow style.\n */\nstatic __isl_give isl_printer *print_tree_band(__isl_take isl_printer *p,\n\t__isl_keep isl_schedule_band *band)\n{\n\tisl_union_set *options;\n\tisl_bool empty;\n\tisl_bool coincident;\n\t/* AutoSA Extended */\n\tisl_bool pe_opt;\n\tisl_bool space_time;\n\tisl_bool sched_pos;\n\t/* AutoSA Extended */\n\n\tp = isl_printer_print_str(p, \"schedule\");\n\tp = isl_printer_yaml_next(p);\n\tp = isl_printer_print_str(p, \"\\\"\");\n\tp = isl_printer_print_multi_union_pw_aff(p, band->mupa);\n\tp = isl_printer_print_str(p, \"\\\"\");\n\tif (isl_schedule_band_get_permutable(band)) {\n\t\tp = isl_printer_yaml_next(p);\n\t\tp = isl_printer_print_str(p, \"permutable\");\n\t\tp = isl_printer_yaml_next(p);\n\t\tp = isl_printer_print_int(p, 1);\n\t}\n\tcoincident = any_coincident(band);\n\tif (coincident < 0)\n\t\treturn isl_printer_free(p);\n\tif (coincident) {\n\t\tint i;\n\t\tisl_size n;\n\t\tint style;\n\n\t\tp = isl_printer_yaml_next(p);\n\t\tp = isl_printer_print_str(p, \"coincident\");\n\t\tp = isl_printer_yaml_next(p);\n\t\tstyle = isl_printer_get_yaml_style(p);\n\t\tp = isl_printer_set_yaml_style(p, ISL_YAML_STYLE_FLOW);\n\t\tp = isl_printer_yaml_start_sequence(p);\n\t\tn = isl_schedule_band_n_member(band);\n\t\tif (n < 0)\n\t\t\treturn isl_printer_free(p);\n\t\tfor (i = 0; i < n; ++i) {\n\t\t\tp = 
isl_printer_print_int(p,\n\t\t\t    isl_schedule_band_member_get_coincident(band, i));\n\t\t\tp = isl_printer_yaml_next(p);\n\t\t}\n\t\tp = isl_printer_yaml_end_sequence(p);\n\t\tp = isl_printer_set_yaml_style(p, style);\n\t}\n\t/* AutoSA Extended */\n  space_time = any_space_time(band);\n  if (space_time < 0)\n    return isl_printer_free(p);\n  if (space_time) {\n    int i;\n    isl_size n;\n    int style;\n\n    p = isl_printer_yaml_next(p);\n    p = isl_printer_print_str(p, \"space_time\");\n    p = isl_printer_yaml_next(p);\n    style = isl_printer_get_yaml_style(p);\n    p = isl_printer_set_yaml_style(p, ISL_YAML_STYLE_FLOW);\n    p = isl_printer_yaml_start_sequence(p);\n    n = isl_schedule_band_n_member(band);\n    if (n < 0)\n      return isl_printer_free(p);\n    for (i = 0; i < n; ++i) {\n      switch(isl_schedule_band_member_get_space_time(band, i)) {\n        case autosa_loop_default:\n          p = isl_printer_print_str(p, \"default\");\n          p = isl_printer_yaml_next(p);\n          break;\n        case autosa_loop_error:\n          p = isl_printer_print_str(p, \"error\");\n          p = isl_printer_yaml_next(p);\n          break;\n        case autosa_loop_time:\n          p = isl_printer_print_str(p, \"time\");\n          p = isl_printer_yaml_next(p);\n          break;\n        case autosa_loop_space:\n          p = isl_printer_print_str(p, \"space\");\n          p = isl_printer_yaml_next(p);\n          break;\n        default:\n          p = isl_printer_print_str(p, \"unknown\");\n          p = isl_printer_yaml_next(p);\n          break;\n      }\n    }\n    p = isl_printer_yaml_end_sequence(p);\n    p = isl_printer_set_yaml_style(p, style);\n  }\n  pe_opt = any_pe_opt(band);\n  if (pe_opt < 0)\n    return isl_printer_free(p);\n  if (pe_opt) {\n    int i;\n    isl_size n;\n    int style;\n\n    p = isl_printer_yaml_next(p);\n    p = isl_printer_print_str(p, \"pe_opt\");\n    p = isl_printer_yaml_next(p);\n    style = 
isl_printer_get_yaml_style(p);\n    p = isl_printer_set_yaml_style(p, ISL_YAML_STYLE_FLOW);\n    p = isl_printer_yaml_start_sequence(p);\n    n = isl_schedule_band_n_member(band);\n    if (n < 0)\n      return isl_printer_free(p);\n    for (i = 0; i < n; ++i) {\n      switch(isl_schedule_band_member_get_pe_opt(band, i)) {\n        case autosa_loop_default:\n          p = isl_printer_print_str(p, \"default\");\n          p = isl_printer_yaml_next(p);\n          break;\n        case autosa_loop_error:\n          p = isl_printer_print_str(p, \"error\");\n          p = isl_printer_yaml_next(p);\n          break;\n        case autosa_loop_latency:\n          p = isl_printer_print_str(p, \"latency\");\n          p = isl_printer_yaml_next(p);\n          break;\n        case autosa_loop_simd:\n          p = isl_printer_print_str(p, \"simd\");\n          p = isl_printer_yaml_next(p);\n          break;\n        case autosa_loop_array_part:\n          p = isl_printer_print_str(p, \"array_part\");\n          p = isl_printer_yaml_next(p);\n          break;\n        default:\n          p = isl_printer_print_str(p, \"unknown\");\n          p = isl_printer_yaml_next(p);\n          break;\n      }\n    }\n    p = isl_printer_yaml_end_sequence(p);\n    p = isl_printer_set_yaml_style(p, style);\n  }\n\tsched_pos = any_sched_pos(band);\n\tif (sched_pos < 0)\n\t\treturn isl_printer_free(p);\n\tif (sched_pos)\t\t {\n\t\tint i;\n\t\tisl_size n;\n\t\tint style;\n\n\t\tp = isl_printer_yaml_next(p);\n\t\tp = isl_printer_print_str(p, \"sched_pos\");\n\t\tp = isl_printer_yaml_next(p);\n\t\tstyle = isl_printer_get_yaml_style(p);\n\t\tp = isl_printer_set_yaml_style(p, ISL_YAML_STYLE_FLOW);\n\t\tp = isl_printer_yaml_start_sequence(p);\n\t\tn = isl_schedule_band_n_member(band);\n\t\tif (n < 0)\n\t\t\treturn isl_printer_free(p);\n\t\tfor (i = 0; i < n; ++i) {\n\t\t\tp = isl_printer_print_int(p, isl_schedule_band_member_get_sched_pos(band, i));\n\t\t\tp = isl_printer_yaml_next(p);\n\t\t}\n\t\tp = 
isl_printer_yaml_end_sequence(p);\n\t\tp = isl_printer_set_yaml_style(p, style);\n\t}\n\t/* AutoSA Extended */\n\n\toptions = isl_schedule_band_get_ast_build_options(band);\n\tempty = isl_union_set_is_empty(options);\n\tif (empty < 0)\n\t\tp = isl_printer_free(p);\n\tif (!empty) {\n\t\tp = isl_printer_yaml_next(p);\n\t\tp = isl_printer_print_str(p, \"options\");\n\t\tp = isl_printer_yaml_next(p);\n\t\tp = isl_printer_print_str(p, \"\\\"\");\n\t\tp = isl_printer_print_union_set(p, options);\n\t\tp = isl_printer_print_str(p, \"\\\"\");\n\t}\n\tisl_union_set_free(options);\n\n\treturn p;\n}\n\n#undef BASE\n#define BASE str\n#define isl_str const char\n#include \"print_yaml_field_templ.c\"\n\n#undef BASE\n#define BASE set\n#include \"print_yaml_field_templ.c\"\n\n#undef BASE\n#define BASE union_set\n#include \"print_yaml_field_templ.c\"\n\n#undef BASE\n#define BASE union_map\n#include \"print_yaml_field_templ.c\"\n\n#undef BASE\n#define BASE union_pw_multi_aff\n#include \"print_yaml_field_templ.c\"\n\n/* Print \"tree\" to \"p\".\n *\n * If \"n_ancestor\" is non-negative, then \"child_pos\" contains the child\n * positions of a descendant of the current node that should be marked\n * (by the comment \"YOU ARE HERE\").  
In particular, if \"n_ancestor\"\n * is zero, then the current node should be marked.\n * The marking is only printed in YAML block format.\n *\n * Implicit leaf nodes are not printed, except if they correspond\n * to the node that should be marked.\n */\n__isl_give isl_printer *isl_printer_print_schedule_tree_mark(\n\t__isl_take isl_printer *p, __isl_keep isl_schedule_tree *tree,\n\tint n_ancestor, int *child_pos)\n{\n\tint i;\n\tisl_size n;\n\tint sequence = 0;\n\tint block;\n\n\tblock = isl_printer_get_yaml_style(p) == ISL_YAML_STYLE_BLOCK;\n\n\tp = isl_printer_yaml_start_mapping(p);\n\tif (n_ancestor == 0 && block) {\n\t\tp = isl_printer_print_str(p, \"# YOU ARE HERE\");\n\t\tp = isl_printer_end_line(p);\n\t\tp = isl_printer_start_line(p);\n\t}\n\tswitch (tree->type) {\n\tcase isl_schedule_node_error:\n\t\tp = isl_printer_print_str(p, \"ERROR\");\n\t\tp = isl_printer_yaml_next(p);\n\t\tbreak;\n\tcase isl_schedule_node_leaf:\n\t\tp = isl_printer_print_str(p, \"leaf\");\n\t\tp = isl_printer_yaml_next(p);\n\t\tbreak;\n\tcase isl_schedule_node_sequence:\n\t\tp = isl_printer_print_str(p, \"sequence\");\n\t\tp = isl_printer_yaml_next(p);\n\t\tsequence = 1;\n\t\tbreak;\n\tcase isl_schedule_node_set:\n\t\tp = isl_printer_print_str(p, \"set\");\n\t\tp = isl_printer_yaml_next(p);\n\t\tsequence = 1;\n\t\tbreak;\n\tcase isl_schedule_node_context:\n\t\tp = print_yaml_field_set(p, \"context\", tree->context);\n\t\tbreak;\n\tcase isl_schedule_node_domain:\n\t\tp = print_yaml_field_union_set(p, \"domain\", tree->domain);\n\t\tbreak;\n\tcase isl_schedule_node_expansion:\n\t\tp = print_yaml_field_union_pw_multi_aff(p, \"contraction\",\n\t\t\t\t\t\t\ttree->contraction);\n\t\tp = print_yaml_field_union_map(p, \"expansion\", tree->expansion);\n\t\tbreak;\n\tcase isl_schedule_node_extension:\n\t\tp = print_yaml_field_union_map(p, \"extension\", tree->extension);\n\t\tbreak;\n\tcase isl_schedule_node_filter:\n\t\tp = print_yaml_field_union_set(p, \"filter\", 
tree->filter);\n\t\tbreak;\n\tcase isl_schedule_node_guard:\n\t\tp = print_yaml_field_set(p, \"guard\", tree->guard);\n\t\tbreak;\n\tcase isl_schedule_node_mark:\n\t\tp = print_yaml_field_str(p, \"mark\",\n\t\t\t\t\tisl_id_get_name(tree->mark));\n\t\tbreak;\n\tcase isl_schedule_node_band:\n\t\tp = print_tree_band(p, tree->band);\n\t\tp = isl_printer_yaml_next(p);\n\t\tbreak;\n\t}\n\n\tn = isl_schedule_tree_n_children(tree);\n\tif (n < 0)\n\t\treturn isl_printer_free(p);\n\tif (n == 0) {\n\t\tif (n_ancestor > 0 && block) {\n\t\t\tisl_schedule_tree *leaf;\n\n\t\t\tp = isl_printer_print_str(p, \"child\");\n\t\t\tp = isl_printer_yaml_next(p);\n\t\t\tleaf = isl_schedule_tree_leaf(isl_printer_get_ctx(p));\n\t\t\tp = isl_printer_print_schedule_tree_mark(p,\n\t\t\t\t\tleaf, 0, NULL);\n\t\t\tisl_schedule_tree_free(leaf);\n\t\t\tp = isl_printer_yaml_next(p);\n\t\t}\n\t\treturn isl_printer_yaml_end_mapping(p);\n\t}\n\n\tif (sequence) {\n\t\tp = isl_printer_yaml_start_sequence(p);\n\t} else {\n\t\tp = isl_printer_print_str(p, \"child\");\n\t\tp = isl_printer_yaml_next(p);\n\t}\n\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_schedule_tree *t;\n\n\t\tt = isl_schedule_tree_get_child(tree, i);\n\t\tif (n_ancestor > 0 && child_pos[0] == i)\n\t\t\tp = isl_printer_print_schedule_tree_mark(p, t,\n\t\t\t\t\t\tn_ancestor - 1, child_pos + 1);\n\t\telse\n\t\t\tp = isl_printer_print_schedule_tree_mark(p, t,\n\t\t\t\t\t\t-1, NULL);\n\t\tisl_schedule_tree_free(t);\n\n\t\tp = isl_printer_yaml_next(p);\n\t}\n\n\tif (sequence)\n\t\tp = isl_printer_yaml_end_sequence(p);\n\tp = isl_printer_yaml_end_mapping(p);\n\n\treturn p;\n}\n\n/* Print \"tree\" to \"p\".\n */\n__isl_give isl_printer *isl_printer_print_schedule_tree(\n\t__isl_take isl_printer *p, __isl_keep isl_schedule_tree *tree)\n{\n\treturn isl_printer_print_schedule_tree_mark(p, tree, -1, NULL);\n}\n\nvoid isl_schedule_tree_dump(__isl_keep isl_schedule_tree *tree)\n{\n\tisl_ctx *ctx;\n\tisl_printer *printer;\n\n\tif (!tree)\n\t\treturn;\n\n\tctx 
= isl_schedule_tree_get_ctx(tree);\n\tprinter = isl_printer_to_file(ctx, stderr);\n\tprinter = isl_printer_set_yaml_style(printer, ISL_YAML_STYLE_BLOCK);\n\tprinter = isl_printer_print_schedule_tree(printer, tree);\n\n\tisl_printer_free(printer);\n}\n\n/* AutoSA Extended */\n/* Return the space_time property of the band member at position \n * \"pos\" of the band tree root.\n */\nenum autosa_loop_type isl_schedule_tree_band_member_get_space_time(\n  __isl_keep isl_schedule_tree *tree, int pos)\n{\n  if (!tree)\n    return autosa_loop_error;\n  \n  if (tree->type != isl_schedule_node_band)\n    isl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n        \"not a band node\", return autosa_loop_error);\n\n  return isl_schedule_band_member_get_space_time(tree->band, pos);\n}\n\n/* Set the space_time property of the band member according to \"loop_type\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_space_time(\n  __isl_take isl_schedule_tree *tree, int pos, enum autosa_loop_type loop_type)\n{\n  if (!tree)\n    return NULL;\n  if (tree->type != isl_schedule_node_band)\n    isl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n        \"not a band node\", return isl_schedule_tree_free(tree));\n  if (isl_schedule_tree_band_member_get_space_time(tree, pos) == \n      loop_type)\n    return tree;\n  tree = isl_schedule_tree_cow(tree);\n  if (!tree)\n    return NULL;\n\n  tree->band = isl_schedule_band_member_set_space_time(tree->band, pos,\n      loop_type);\n  if (!tree->band)\n    return isl_schedule_tree_free(tree);\n  \n  return tree;\n}\n\n/* Return the pe_opt property of the band member at position \n * \"pos\" of the band tree root.\n */\nenum autosa_loop_type isl_schedule_tree_band_member_get_pe_opt(\n  __isl_keep isl_schedule_tree *tree, int pos)\n{\n  if (!tree)\n    return autosa_loop_error;\n  \n  if (tree->type != isl_schedule_node_band)\n    isl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n        \"not a band node\", return autosa_loop_error);\n\n  return isl_schedule_band_member_get_pe_opt(tree->band, pos);\n}\n\n/* Set the pe_opt property of the band member according to \"loop_type\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_pe_opt(\n  __isl_take isl_schedule_tree *tree, int pos, enum autosa_loop_type loop_type)\n{\n  if (!tree)\n    return NULL;\n  if (tree->type != isl_schedule_node_band)\n    isl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n        \"not a band node\", return isl_schedule_tree_free(tree));\n  if (isl_schedule_tree_band_member_get_pe_opt(tree, pos) == \n      loop_type)\n    return tree;\n  tree = isl_schedule_tree_cow(tree);\n  if (!tree)\n    return NULL;\n\n  tree->band = isl_schedule_band_member_set_pe_opt(tree->band, pos,\n      loop_type);\n  if (!tree->band)\n    return isl_schedule_tree_free(tree);\n  \n  return tree;\n}\n\n/* Return the sched_pos property of the band member at position \n * \"pos\" of the band tree root.\n */\nint isl_schedule_tree_band_member_get_sched_pos(\n  __isl_keep isl_schedule_tree *tree, int pos)\n{\n  if (!tree)\n    return isl_size_error;\n  \n  if (tree->type != isl_schedule_node_band)\n    isl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n        \"not a band node\", return isl_size_error);\n\n  return isl_schedule_band_member_get_sched_pos(tree->band, pos);\n}\n\n/* Set the sched_pos property of the band member according to \"sched_pos\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_sched_pos(\n  __isl_take isl_schedule_tree *tree, int pos, int sched_pos)\n{\n  if (!tree)\n    return NULL;\n  if (tree->type != isl_schedule_node_band)\n    isl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n        \"not a band node\", return isl_schedule_tree_free(tree));\n  if (isl_schedule_tree_band_member_get_sched_pos(tree, pos) == \n      sched_pos)\n    return tree;\n  tree = isl_schedule_tree_cow(tree);\n  if (!tree)\n    return NULL;\n\n  tree->band = isl_schedule_band_member_set_sched_pos(tree->band, pos,\n      sched_pos);\n  if (!tree->band)\n    return isl_schedule_tree_free(tree);\n  \n  return tree;\n}\n\n/* Return the iter property of the band member at position \n * \"pos\" of the band tree root.\n */\nvoid *isl_schedule_tree_band_member_get_iter(\n  __isl_keep isl_schedule_tree *tree, int pos)\n{\n  if (!tree)\n    return NULL;\n  \n  if (tree->type != isl_schedule_node_band)\n    isl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n        \"not a band node\", return NULL);\n\n  return isl_schedule_band_member_get_iter(tree->band, pos);\n}\n\n/* Set the iter property of the band member according to \"iter\".\n */\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_iter(\n  __isl_take isl_schedule_tree *tree, int pos, void *iter)\n{\n  if (!tree)\n    return NULL;\n  if (tree->type != isl_schedule_node_band)\n    isl_die(isl_schedule_tree_get_ctx(tree), isl_error_invalid,\n        \"not a band node\", return isl_schedule_tree_free(tree));\n  if (isl_schedule_tree_band_member_get_iter(tree, pos) == \n      iter)\n    return tree;\n  tree = isl_schedule_tree_cow(tree);\n  if (!tree)\n    return NULL;\n\n  tree->band = isl_schedule_band_member_set_iter(tree->band, pos,\n      iter);\n  if (!tree->band)\n    return isl_schedule_tree_free(tree);\n  \n  return tree;\n}\n/* AutoSA Extended */"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/isl_schedule_tree.h",
    "content": "#ifndef ISL_SCHEDLUE_TREE_H\n#define ISL_SCHEDLUE_TREE_H\n\n#include <isl_schedule_band.h>\n#include <isl/schedule.h>\n#include <isl/set.h>\n#include <isl/union_set.h>\n\nstruct isl_schedule_tree;\ntypedef struct isl_schedule_tree isl_schedule_tree;\n\nISL_DECLARE_LIST(schedule_tree)\n\n/* A schedule (sub)tree.\n *\n * The leaves of a tree are not explicitly represented inside\n * the isl_schedule_tree, except when the tree consists of only a leaf.\n *\n * The \"band\" field is valid when type is isl_schedule_node_band.\n * The \"context\" field is valid when type is isl_schedule_node_context\n * and represents constraints on the flat product of the outer band nodes,\n * possibly introducing additional parameters.\n * The \"domain\" field is valid when type is isl_schedule_node_domain\n * and introduces the statement instances scheduled by the tree.\n *\n * The \"contraction\" and \"expansion\" fields are valid when type\n * is isl_schedule_node_expansion.\n * \"expansion\" expands the reaching domain elements to one or more\n * domain elements for the subtree.\n * \"contraction\" maps these elements back to the corresponding\n * reaching domain element.  It does not involve any domain constraints.\n *\n * The \"extension\" field is valid when the is isl_schedule_node_extension\n * maps outer schedule dimensions (the flat product of the outer band nodes)\n * to additional iteration domains.\n *\n * The \"filter\" field is valid when type is isl_schedule_node_filter\n * and represents the statement instances selected by the node.\n *\n * The \"guard\" field is valid when type is isl_schedule_node_guard\n * and represents constraints on the flat product of the outer band nodes\n * that need to be enforced by the outer nodes in the generated AST.\n *\n * The \"mark\" field is valid when type is isl_schedule_node_mark and\n * identifies the mark.\n *\n * The \"children\" field is valid for all types except\n * isl_schedule_node_leaf.  
This field is NULL if there are\n * no children (except for the implicit leaves).\n *\n * anchored is set if the node or any of its descendants depends\n * on its position in the schedule tree.\n */\nstruct isl_schedule_tree {\n\tint ref;\n\tisl_ctx *ctx;\n\tint anchored;\n\tenum isl_schedule_node_type type;\n\tunion {\n\t\tisl_schedule_band *band;\n\t\tisl_set *context;\n\t\tisl_union_set *domain;\n\t\tstruct {\n\t\t\tisl_union_pw_multi_aff *contraction;\n\t\t\tisl_union_map *expansion;\n\t\t};\n\t\tisl_union_map *extension;\n\t\tisl_union_set *filter;\n\t\tisl_set *guard;\n\t\tisl_id *mark;\n\t};\n\tisl_schedule_tree_list *children;\n};\n\nisl_ctx *isl_schedule_tree_get_ctx(__isl_keep isl_schedule_tree *tree);\nenum isl_schedule_node_type isl_schedule_tree_get_type(\n\t__isl_keep isl_schedule_tree *tree);\n\n__isl_give isl_schedule_tree *isl_schedule_tree_leaf(isl_ctx *ctx);\nint isl_schedule_tree_is_leaf(__isl_keep isl_schedule_tree *tree);\n\nisl_bool isl_schedule_tree_plain_is_equal(__isl_keep isl_schedule_tree *tree1,\n\t__isl_keep isl_schedule_tree *tree2);\n\n__isl_give isl_schedule_tree *isl_schedule_tree_copy(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_null isl_schedule_tree *isl_schedule_tree_free(\n\t__isl_take isl_schedule_tree *tree);\n\n__isl_give isl_schedule_tree *isl_schedule_tree_from_band(\n\t__isl_take isl_schedule_band *band);\n__isl_give isl_schedule_tree *isl_schedule_tree_from_context(\n\t__isl_take isl_set *context);\n__isl_give isl_schedule_tree *isl_schedule_tree_from_domain(\n\t__isl_take isl_union_set *domain);\n__isl_give isl_schedule_tree *isl_schedule_tree_from_expansion(\n\t__isl_take isl_union_pw_multi_aff *contraction,\n\t__isl_take isl_union_map *expansion);\n__isl_give isl_schedule_tree *isl_schedule_tree_from_extension(\n\t__isl_take isl_union_map *extension);\n__isl_give isl_schedule_tree *isl_schedule_tree_from_filter(\n\t__isl_take isl_union_set *filter);\n__isl_give isl_schedule_tree 
*isl_schedule_tree_from_guard(\n\t__isl_take isl_set *guard);\n__isl_give isl_schedule_tree *isl_schedule_tree_from_children(\n\tenum isl_schedule_node_type type,\n\t__isl_take isl_schedule_tree_list *list);\n__isl_give isl_schedule_tree *isl_schedule_tree_from_pair(\n\tenum isl_schedule_node_type type, __isl_take isl_schedule_tree *tree1,\n\t__isl_take isl_schedule_tree *tree2);\n__isl_give isl_schedule_tree *isl_schedule_tree_sequence_pair(\n\t__isl_take isl_schedule_tree *tree1,\n\t__isl_take isl_schedule_tree *tree2);\n__isl_give isl_schedule_tree *isl_schedule_tree_set_pair(\n\t__isl_take isl_schedule_tree *tree1,\n\t__isl_take isl_schedule_tree *tree2);\n\nisl_bool isl_schedule_tree_is_subtree_anchored(\n\t__isl_keep isl_schedule_tree *tree);\n\n__isl_give isl_space *isl_schedule_tree_band_get_space(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_intersect_domain(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *domain);\n__isl_give isl_multi_union_pw_aff *isl_schedule_tree_band_get_partial_schedule(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_set_partial_schedule(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_multi_union_pw_aff *schedule);\nenum isl_ast_loop_type isl_schedule_tree_band_member_get_ast_loop_type(\n\t__isl_keep isl_schedule_tree *tree, int pos);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_ast_loop_type(\n\t__isl_take isl_schedule_tree *tree, int pos,\n\tenum isl_ast_loop_type type);\nenum isl_ast_loop_type isl_schedule_tree_band_member_get_isolate_ast_loop_type(\n\t__isl_keep isl_schedule_tree *tree, int pos);\n__isl_give isl_schedule_tree *\nisl_schedule_tree_band_member_set_isolate_ast_loop_type(\n\t__isl_take isl_schedule_tree *tree, int pos,\n\tenum isl_ast_loop_type type);\n__isl_give isl_union_set *isl_schedule_tree_band_get_ast_build_options(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_give 
isl_schedule_tree *isl_schedule_tree_band_set_ast_build_options(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *options);\n__isl_give isl_set *isl_schedule_tree_band_get_ast_isolate_option(\n\t__isl_keep isl_schedule_tree *tree, int depth);\n__isl_give isl_set *isl_schedule_tree_context_get_context(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_give isl_union_set *isl_schedule_tree_domain_get_domain(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_give isl_schedule_tree *isl_schedule_tree_domain_set_domain(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *domain);\n__isl_give isl_union_pw_multi_aff *isl_schedule_tree_expansion_get_contraction(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_give isl_union_map *isl_schedule_tree_expansion_get_expansion(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_give isl_schedule_tree *\nisl_schedule_tree_expansion_set_contraction_and_expansion(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_union_pw_multi_aff *contraction,\n\t__isl_take isl_union_map *expansion);\n__isl_give isl_union_map *isl_schedule_tree_extension_get_extension(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_give isl_schedule_tree *isl_schedule_tree_extension_set_extension(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_union_map *extension);\n__isl_give isl_union_set *isl_schedule_tree_filter_get_filter(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_give isl_schedule_tree *isl_schedule_tree_filter_set_filter(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *filter);\n__isl_give isl_set *isl_schedule_tree_guard_get_guard(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_give isl_id *isl_schedule_tree_mark_get_id(\n\t__isl_keep isl_schedule_tree *tree);\n\n__isl_give isl_schedule_tree *isl_schedule_tree_first_schedule_descendant(\n\t__isl_take isl_schedule_tree *tree, __isl_keep isl_schedule_tree *leaf);\n__isl_give isl_union_map 
*isl_schedule_tree_get_subtree_schedule_union_map(\n\t__isl_keep isl_schedule_tree *tree);\n\nisl_size isl_schedule_tree_band_n_member(__isl_keep isl_schedule_tree *tree);\n\nisl_bool isl_schedule_tree_band_member_get_coincident(\n\t__isl_keep isl_schedule_tree *tree, int pos);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_coincident(\n\t__isl_take isl_schedule_tree *tree, int pos, int coincident);\nisl_bool isl_schedule_tree_band_get_permutable(\n\t__isl_keep isl_schedule_tree *tree);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_set_permutable(\n\t__isl_take isl_schedule_tree *tree, int permutable);\n\nint isl_schedule_tree_has_children(__isl_keep isl_schedule_tree *tree);\nisl_size isl_schedule_tree_n_children(__isl_keep isl_schedule_tree *tree);\n__isl_give isl_schedule_tree *isl_schedule_tree_get_child(\n\t__isl_keep isl_schedule_tree *tree, int pos);\n\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_band(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_schedule_band *band);\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_context(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_set *context);\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_domain(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *domain);\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_expansion(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_union_pw_multi_aff *contraction,\n\t__isl_take isl_union_map *expansion);\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_extension(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_union_map *extension);\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_filter(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *filter);\n__isl_give isl_schedule_tree *isl_schedule_tree_children_insert_filter(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *filter);\n__isl_give isl_schedule_tree 
*isl_schedule_tree_insert_guard(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_set *guard);\n__isl_give isl_schedule_tree *isl_schedule_tree_insert_mark(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_id *mark);\n\n__isl_give isl_schedule_tree *isl_schedule_tree_append_to_leaves(\n\t__isl_take isl_schedule_tree *tree1,\n\t__isl_take isl_schedule_tree *tree2);\n\n__isl_give isl_schedule_tree *isl_schedule_tree_band_scale(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_multi_val *mv);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_scale_down(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_multi_val *mv);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_mod(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_multi_val *mv);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_tile(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_multi_val *sizes);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_shift(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_multi_union_pw_aff *shift);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_split(\n\t__isl_take isl_schedule_tree *tree, int pos, int depth);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_gist(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_union_set *context);\n\n__isl_give isl_schedule_tree *isl_schedule_tree_child(\n\t__isl_take isl_schedule_tree *tree, int pos);\n__isl_give isl_schedule_tree *isl_schedule_tree_reset_children(\n\t__isl_take isl_schedule_tree *tree);\n__isl_give isl_schedule_tree *isl_schedule_tree_drop_child(\n\t__isl_take isl_schedule_tree *tree, int pos);\n__isl_give isl_schedule_tree *isl_schedule_tree_replace_child(\n\t__isl_take isl_schedule_tree *tree, int pos,\n\t__isl_take isl_schedule_tree *new_child);\n__isl_give isl_schedule_tree *isl_schedule_tree_sequence_splice(\n\t__isl_take isl_schedule_tree *tree, int pos,\n\t__isl_take isl_schedule_tree *child);\n\n__isl_give isl_schedule_tree 
*isl_schedule_tree_reset_user(\n\t__isl_take isl_schedule_tree *tree);\n__isl_give isl_schedule_tree *isl_schedule_tree_align_params(\n\t__isl_take isl_schedule_tree *tree, __isl_take isl_space *space);\n__isl_give isl_schedule_tree *isl_schedule_tree_pullback_union_pw_multi_aff(\n\t__isl_take isl_schedule_tree *tree,\n\t__isl_take isl_union_pw_multi_aff *upma);\n\n__isl_give isl_printer *isl_printer_print_schedule_tree(\n\t__isl_take isl_printer *p, __isl_keep isl_schedule_tree *tree);\n__isl_give isl_printer *isl_printer_print_schedule_tree_mark(\n\t__isl_take isl_printer *p, __isl_keep isl_schedule_tree *tree,\n\tint n_ancestor, int *child_pos);\n\n/* AutoSA Extended */\n__isl_give isl_schedule_tree *isl_schedule_tree_dup(\n\t__isl_keep isl_schedule_tree *tree);\nenum autosa_loop_type isl_schedule_tree_band_member_get_space_time(\n\t__isl_keep isl_schedule_tree *tree, int pos);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_space_time(\n\t__isl_take isl_schedule_tree *tree, int pos, enum autosa_loop_type loop_type);\nenum autosa_loop_type isl_schedule_tree_band_member_get_pe_opt(\n\t__isl_keep isl_schedule_tree *tree, int pos);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_pe_opt(\n\t__isl_take isl_schedule_tree *tree, int pos, enum autosa_loop_type loop_type);\nint isl_schedule_tree_band_member_get_sched_pos(\n\t__isl_keep isl_schedule_tree *tree, int pos);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_sched_pos(\n\t__isl_take isl_schedule_tree *tree, int pos, int sched_pos);\nvoid *isl_schedule_tree_band_member_get_iter(\n\t__isl_keep isl_schedule_tree *tree, int pos);\n__isl_give isl_schedule_tree *isl_schedule_tree_band_member_set_iter(\n\t__isl_take isl_schedule_tree *tree, int pos, void *iter);\n/* AutoSA Extended */\n\n#endif\n"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/schedule.h",
    "content": "#ifndef ISL_SCHEDULE_H\n#define ISL_SCHEDULE_H\n\n#include <isl/union_set_type.h>\n#include <isl/union_map_type.h>\n#include <isl/schedule_type.h>\n#include <isl/aff_type.h>\n#include <isl/space_type.h>\n#include <isl/set_type.h>\n#include <isl/list.h>\n#include <isl/printer_type.h>\n\n#if defined(__cplusplus)\nextern \"C\" {\n#endif\n\nstruct __isl_export isl_schedule_constraints;\ntypedef struct isl_schedule_constraints isl_schedule_constraints;\n\nisl_stat isl_options_set_schedule_max_coefficient(isl_ctx *ctx, int val);\nint isl_options_get_schedule_max_coefficient(isl_ctx *ctx);\n\nisl_stat isl_options_set_schedule_max_constant_term(isl_ctx *ctx, int val);\nint isl_options_get_schedule_max_constant_term(isl_ctx *ctx);\n\nisl_stat isl_options_set_schedule_maximize_band_depth(isl_ctx *ctx, int val);\nint isl_options_get_schedule_maximize_band_depth(isl_ctx *ctx);\n\nisl_stat isl_options_set_schedule_maximize_coincidence(isl_ctx *ctx, int val);\nint isl_options_get_schedule_maximize_coincidence(isl_ctx *ctx);\n\nisl_stat isl_options_set_schedule_outer_coincidence(isl_ctx *ctx, int val);\nint isl_options_get_schedule_outer_coincidence(isl_ctx *ctx);\n\nisl_stat isl_options_set_schedule_split_scaled(isl_ctx *ctx, int val);\nint isl_options_get_schedule_split_scaled(isl_ctx *ctx);\n\nisl_stat isl_options_set_schedule_treat_coalescing(isl_ctx *ctx, int val);\nint isl_options_get_schedule_treat_coalescing(isl_ctx *ctx);\n\nisl_stat isl_options_set_schedule_separate_components(isl_ctx *ctx, int val);\nint isl_options_get_schedule_separate_components(isl_ctx *ctx);\n\nisl_stat isl_options_set_schedule_serialize_sccs(isl_ctx *ctx, int val);\nint isl_options_get_schedule_serialize_sccs(isl_ctx *ctx);\n\nisl_stat isl_options_set_schedule_whole_component(isl_ctx *ctx, int val);\nint isl_options_get_schedule_whole_component(isl_ctx *ctx);\n\nisl_stat isl_options_set_schedule_carry_self_first(isl_ctx *ctx, int val);\nint 
isl_options_get_schedule_carry_self_first(isl_ctx *ctx);\n\n__isl_give isl_schedule_constraints *isl_schedule_constraints_copy(\n\t__isl_keep isl_schedule_constraints *sc);\n__isl_export\n__isl_give isl_schedule_constraints *isl_schedule_constraints_on_domain(\n\t__isl_take isl_union_set *domain);\n__isl_export\n__isl_give isl_schedule_constraints *isl_schedule_constraints_set_context(\n\t__isl_take isl_schedule_constraints *sc, __isl_take isl_set *context);\n__isl_export\n__isl_give isl_schedule_constraints *isl_schedule_constraints_set_validity(\n\t__isl_take isl_schedule_constraints *sc,\n\t__isl_take isl_union_map *validity);\n__isl_export\n__isl_give isl_schedule_constraints *isl_schedule_constraints_set_coincidence(\n\t__isl_take isl_schedule_constraints *sc,\n\t__isl_take isl_union_map *coincidence);\n__isl_export\n__isl_give isl_schedule_constraints *isl_schedule_constraints_set_proximity(\n\t__isl_take isl_schedule_constraints *sc,\n\t__isl_take isl_union_map *proximity);\n__isl_export\n__isl_give isl_schedule_constraints *\nisl_schedule_constraints_set_conditional_validity(\n\t__isl_take isl_schedule_constraints *sc,\n\t__isl_take isl_union_map *condition,\n\t__isl_take isl_union_map *validity);\n__isl_null isl_schedule_constraints *isl_schedule_constraints_free(\n\t__isl_take isl_schedule_constraints *sc);\n\nisl_ctx *isl_schedule_constraints_get_ctx(\n\t__isl_keep isl_schedule_constraints *sc);\n__isl_export\n__isl_give isl_union_set *isl_schedule_constraints_get_domain(\n\t__isl_keep isl_schedule_constraints *sc);\n__isl_export\n__isl_give isl_set *isl_schedule_constraints_get_context(\n\t__isl_keep isl_schedule_constraints *sc);\n__isl_export\n__isl_give isl_union_map *isl_schedule_constraints_get_validity(\n\t__isl_keep isl_schedule_constraints *sc);\n__isl_export\n__isl_give isl_union_map *isl_schedule_constraints_get_coincidence(\n\t__isl_keep isl_schedule_constraints *sc);\n__isl_export\n__isl_give isl_union_map 
*isl_schedule_constraints_get_proximity(\n\t__isl_keep isl_schedule_constraints *sc);\n__isl_export\n__isl_give isl_union_map *isl_schedule_constraints_get_conditional_validity(\n\t__isl_keep isl_schedule_constraints *sc);\n__isl_export\n__isl_give isl_union_map *\nisl_schedule_constraints_get_conditional_validity_condition(\n\t__isl_keep isl_schedule_constraints *sc);\n\n__isl_give isl_schedule_constraints *isl_schedule_constraints_apply(\n\t__isl_take isl_schedule_constraints *sc,\n\t__isl_take isl_union_map *umap);\n\n__isl_constructor\n__isl_give isl_schedule_constraints *isl_schedule_constraints_read_from_str(\n\tisl_ctx *ctx, const char *str);\n__isl_give isl_schedule_constraints *isl_schedule_constraints_read_from_file(\n\tisl_ctx *ctx, FILE *input);\n__isl_give isl_printer *isl_printer_print_schedule_constraints(\n\t__isl_take isl_printer *p, __isl_keep isl_schedule_constraints *sc);\nvoid isl_schedule_constraints_dump(__isl_keep isl_schedule_constraints *sc);\n__isl_give char *isl_schedule_constraints_to_str(\n\t__isl_keep isl_schedule_constraints *sc);\n\n__isl_export\n__isl_give isl_schedule *isl_schedule_constraints_compute_schedule(\n\t__isl_take isl_schedule_constraints *sc);\n\n__isl_give isl_schedule *isl_union_set_compute_schedule(\n\t__isl_take isl_union_set *domain,\n\t__isl_take isl_union_map *validity,\n\t__isl_take isl_union_map *proximity);\n\n__isl_give isl_schedule *isl_schedule_empty(__isl_take isl_space *space);\n__isl_export\n__isl_give isl_schedule *isl_schedule_from_domain(\n\t__isl_take isl_union_set *domain);\n__isl_give isl_schedule *isl_schedule_copy(__isl_keep isl_schedule *sched);\n__isl_null isl_schedule *isl_schedule_free(__isl_take isl_schedule *sched);\n__isl_export\n__isl_give isl_union_map *isl_schedule_get_map(__isl_keep isl_schedule *sched);\n\nisl_ctx *isl_schedule_get_ctx(__isl_keep isl_schedule *sched);\nisl_bool isl_schedule_plain_is_equal(__isl_keep isl_schedule *schedule1,\n\t__isl_keep isl_schedule 
*schedule2);\n\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_get_root(\n\t__isl_keep isl_schedule *schedule);\n__isl_give isl_union_set *isl_schedule_get_domain(\n\t__isl_keep isl_schedule *schedule);\n\nisl_stat isl_schedule_foreach_schedule_node_top_down(\n\t__isl_keep isl_schedule *sched,\n\tisl_bool (*fn)(__isl_keep isl_schedule_node *node, void *user),\n\tvoid *user);\n__isl_give isl_schedule *isl_schedule_map_schedule_node_bottom_up(\n\t__isl_take isl_schedule *schedule,\n\t__isl_give isl_schedule_node *(*fn)(\n\t\t__isl_take isl_schedule_node *node, void *user), void *user);\n\n__isl_give isl_schedule *isl_schedule_insert_context(\n\t__isl_take isl_schedule *schedule, __isl_take isl_set *context);\n__isl_give isl_schedule *isl_schedule_insert_partial_schedule(\n\t__isl_take isl_schedule *schedule,\n\t__isl_take isl_multi_union_pw_aff *partial);\n__isl_give isl_schedule *isl_schedule_insert_guard(\n\t__isl_take isl_schedule *schedule, __isl_take isl_set *guard);\n__isl_give isl_schedule *isl_schedule_sequence(\n\t__isl_take isl_schedule *schedule1, __isl_take isl_schedule *schedule2);\n__isl_give isl_schedule *isl_schedule_set(\n\t__isl_take isl_schedule *schedule1, __isl_take isl_schedule *schedule2);\n__isl_give isl_schedule *isl_schedule_intersect_domain(\n\t__isl_take isl_schedule *schedule, __isl_take isl_union_set *domain);\n__isl_give isl_schedule *isl_schedule_gist_domain_params(\n\t__isl_take isl_schedule *schedule, __isl_take isl_set *context);\n\n__isl_give isl_schedule *isl_schedule_reset_user(\n\t__isl_take isl_schedule *schedule);\n__isl_give isl_schedule *isl_schedule_align_params(\n\t__isl_take isl_schedule *schedule, __isl_take isl_space *space);\n__isl_overload\n__isl_give isl_schedule *isl_schedule_pullback_union_pw_multi_aff(\n\t__isl_take isl_schedule *schedule,\n\t__isl_take isl_union_pw_multi_aff *upma);\n__isl_give isl_schedule *isl_schedule_expand(__isl_take isl_schedule *schedule,\n\t__isl_take isl_union_pw_multi_aff 
*contraction,\n\t__isl_take isl_schedule *expansion);\n\n__isl_give isl_schedule *isl_schedule_read_from_file(isl_ctx *ctx, FILE *input);\n__isl_constructor\n__isl_give isl_schedule *isl_schedule_read_from_str(isl_ctx *ctx,\n\tconst char *str);\n__isl_give isl_printer *isl_printer_print_schedule(__isl_take isl_printer *p,\n\t__isl_keep isl_schedule *schedule);\nvoid isl_schedule_dump(__isl_keep isl_schedule *schedule);\n__isl_give char *isl_schedule_to_str(__isl_keep isl_schedule *schedule);\n\n/* AutoSA Extended */\n__isl_give isl_schedule *isl_schedule_dup(__isl_keep isl_schedule *sched);\n/* AutoSA Extended */\n\n#if defined(__cplusplus)\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/schedule_node.h",
    "content": "#ifndef ISL_SCHEDULE_NODE_H\n#define ISL_SCHEDULE_NODE_H\n\n#include <isl/schedule_type.h>\n#include <isl/union_set_type.h>\n#include <isl/aff_type.h>\n#include <isl/ast_type.h>\n#include <isl/val_type.h>\n#include <isl/space_type.h>\n#include <isl/id_type.h>\n#include <isl/set.h>\n\n#if defined(__cplusplus)\nextern \"C\" {\n#endif\n\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_from_domain(\n\t__isl_take isl_union_set *domain);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_from_extension(\n\t__isl_take isl_union_map *extension);\n__isl_give isl_schedule_node *isl_schedule_node_copy(\n\t__isl_keep isl_schedule_node *node);\n__isl_null isl_schedule_node *isl_schedule_node_free(\n\t__isl_take isl_schedule_node *node);\n\n__isl_export\nisl_bool isl_schedule_node_is_equal(__isl_keep isl_schedule_node *node1,\n\t__isl_keep isl_schedule_node *node2);\n\nisl_ctx *isl_schedule_node_get_ctx(__isl_keep isl_schedule_node *node);\n__isl_subclass(isl_schedule_node)\nenum isl_schedule_node_type isl_schedule_node_get_type(\n\t__isl_keep isl_schedule_node *node);\nenum isl_schedule_node_type isl_schedule_node_get_parent_type(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_schedule *isl_schedule_node_get_schedule(\n\t__isl_keep isl_schedule_node *node);\n\n__isl_export\nisl_stat isl_schedule_node_foreach_descendant_top_down(\n\t__isl_keep isl_schedule_node *node,\n\tisl_bool (*fn)(__isl_keep isl_schedule_node *node, void *user),\n\tvoid *user);\n__isl_export\nisl_bool isl_schedule_node_every_descendant(__isl_keep isl_schedule_node *node,\n\tisl_bool (*test)(__isl_keep isl_schedule_node *node, void *user),\n\tvoid *user);\n__isl_export\nisl_stat isl_schedule_node_foreach_ancestor_top_down(\n\t__isl_keep isl_schedule_node *node,\n\tisl_stat (*fn)(__isl_keep isl_schedule_node *node, void *user),\n\tvoid *user);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_map_descendant_bottom_up(\n\t__isl_take 
isl_schedule_node *node,\n\t__isl_give isl_schedule_node *(*fn)(__isl_take isl_schedule_node *node,\n\t\tvoid *user), void *user);\n\n__isl_export\nisl_size isl_schedule_node_get_tree_depth(__isl_keep isl_schedule_node *node);\n__isl_export\nisl_bool isl_schedule_node_has_parent(__isl_keep isl_schedule_node *node);\n__isl_export\nisl_bool isl_schedule_node_has_children(__isl_keep isl_schedule_node *node);\n__isl_export\nisl_bool isl_schedule_node_has_previous_sibling(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\nisl_bool isl_schedule_node_has_next_sibling(__isl_keep isl_schedule_node *node);\n__isl_export\nisl_size isl_schedule_node_n_children(__isl_keep isl_schedule_node *node);\n__isl_export\nisl_size isl_schedule_node_get_child_position(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\nisl_size isl_schedule_node_get_ancestor_child_position(\n\t__isl_keep isl_schedule_node *node,\n\t__isl_keep isl_schedule_node *ancestor);\n__isl_give isl_schedule_node *isl_schedule_node_get_child(\n\t__isl_keep isl_schedule_node *node, int pos);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_get_shared_ancestor(\n\t__isl_keep isl_schedule_node *node1,\n\t__isl_keep isl_schedule_node *node2);\n\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_root(\n\t__isl_take isl_schedule_node *node);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_parent(\n\t__isl_take isl_schedule_node *node);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_ancestor(\n\t__isl_take isl_schedule_node *node, int generation);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_child(\n\t__isl_take isl_schedule_node *node, int pos);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_first_child(\n\t__isl_take isl_schedule_node *node);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_previous_sibling(\n\t__isl_take isl_schedule_node *node);\n__isl_export\n__isl_give isl_schedule_node 
*isl_schedule_node_next_sibling(\n\t__isl_take isl_schedule_node *node);\n\n__isl_export\nisl_bool isl_schedule_node_is_subtree_anchored(\n\t__isl_keep isl_schedule_node *node);\n\n__isl_give isl_schedule_node *isl_schedule_node_group(\n\t__isl_take isl_schedule_node *node, __isl_take isl_id *group_id);\n\n__isl_give isl_schedule_node *isl_schedule_node_sequence_splice_child(\n\t__isl_take isl_schedule_node *node, int pos);\n\n__isl_give isl_space *isl_schedule_node_band_get_space(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_multi_union_pw_aff *isl_schedule_node_band_get_partial_schedule(\n\t__isl_keep isl_schedule_node *node);\n__isl_give isl_union_map *isl_schedule_node_band_get_partial_schedule_union_map(\n\t__isl_keep isl_schedule_node *node);\nenum isl_ast_loop_type isl_schedule_node_band_member_get_ast_loop_type(\n\t__isl_keep isl_schedule_node *node, int pos);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_member_set_ast_loop_type(\n\t__isl_take isl_schedule_node *node, int pos,\n\tenum isl_ast_loop_type type);\nenum isl_ast_loop_type isl_schedule_node_band_member_get_isolate_ast_loop_type(\n\t__isl_keep isl_schedule_node *node, int pos);\n__isl_give isl_schedule_node *\nisl_schedule_node_band_member_set_isolate_ast_loop_type(\n\t__isl_take isl_schedule_node *node, int pos,\n\tenum isl_ast_loop_type type);\n__isl_export\n__isl_give isl_union_set *isl_schedule_node_band_get_ast_build_options(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_set_ast_build_options(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *options);\n__isl_export\n__isl_give isl_set *isl_schedule_node_band_get_ast_isolate_option(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\nisl_size isl_schedule_node_band_n_member(__isl_keep isl_schedule_node *node);\n__isl_export\nisl_bool isl_schedule_node_band_member_get_coincident(\n\t__isl_keep isl_schedule_node *node, 
int pos);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_member_set_coincident(\n\t__isl_take isl_schedule_node *node, int pos, int coincident);\n__isl_export\nisl_bool isl_schedule_node_band_get_permutable(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_set_permutable(\n\t__isl_take isl_schedule_node *node, int permutable);\n\nisl_stat isl_options_set_tile_scale_tile_loops(isl_ctx *ctx, int val);\nint isl_options_get_tile_scale_tile_loops(isl_ctx *ctx);\nisl_stat isl_options_set_tile_shift_point_loops(isl_ctx *ctx, int val);\nint isl_options_get_tile_shift_point_loops(isl_ctx *ctx);\n\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_scale(\n\t__isl_take isl_schedule_node *node, __isl_take isl_multi_val *mv);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_scale_down(\n\t__isl_take isl_schedule_node *node, __isl_take isl_multi_val *mv);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_mod(\n\t__isl_take isl_schedule_node *node, __isl_take isl_multi_val *mv);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_shift(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_multi_union_pw_aff *shift);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_tile(\n\t__isl_take isl_schedule_node *node, __isl_take isl_multi_val *sizes);\n__isl_give isl_schedule_node *isl_schedule_node_band_sink(\n\t__isl_take isl_schedule_node *node);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_split(\n\t__isl_take isl_schedule_node *node, int pos);\n\n__isl_export\n__isl_give isl_set *isl_schedule_node_context_get_context(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_union_set *isl_schedule_node_domain_get_domain(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_union_map *isl_schedule_node_expansion_get_expansion(\n\t__isl_keep isl_schedule_node 
*node);\n__isl_export\n__isl_give isl_union_pw_multi_aff *isl_schedule_node_expansion_get_contraction(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_union_map *isl_schedule_node_extension_get_extension(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_union_set *isl_schedule_node_filter_get_filter(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_set *isl_schedule_node_guard_get_guard(\n\t__isl_keep isl_schedule_node *node);\n__isl_give isl_id *isl_schedule_node_mark_get_id(\n\t__isl_keep isl_schedule_node *node);\n\nisl_size isl_schedule_node_get_schedule_depth(\n\t__isl_keep isl_schedule_node *node);\n__isl_give isl_union_set *isl_schedule_node_get_domain(\n\t__isl_keep isl_schedule_node *node);\n__isl_give isl_union_set *isl_schedule_node_get_universe_domain(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_multi_union_pw_aff *\nisl_schedule_node_get_prefix_schedule_multi_union_pw_aff(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_union_pw_multi_aff *\nisl_schedule_node_get_prefix_schedule_union_pw_multi_aff(\n\t__isl_keep isl_schedule_node *node);\n__isl_export\n__isl_give isl_union_map *isl_schedule_node_get_prefix_schedule_union_map(\n\t__isl_keep isl_schedule_node *node);\n__isl_give isl_union_map *isl_schedule_node_get_prefix_schedule_relation(\n\t__isl_keep isl_schedule_node *node);\n__isl_give isl_union_map *isl_schedule_node_get_subtree_schedule_union_map(\n\t__isl_keep isl_schedule_node *node);\n__isl_give isl_union_map *isl_schedule_node_get_subtree_expansion(\n\t__isl_keep isl_schedule_node *node);\n__isl_give isl_union_pw_multi_aff *isl_schedule_node_get_subtree_contraction(\n\t__isl_keep isl_schedule_node *node);\n\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_insert_context(\n\t__isl_take isl_schedule_node *node, __isl_take isl_set *context);\n__isl_export\n__isl_give isl_schedule_node 
*isl_schedule_node_insert_partial_schedule(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_multi_union_pw_aff *schedule);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_insert_filter(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *filter);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_insert_guard(\n\t__isl_take isl_schedule_node *node, __isl_take isl_set *context);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_insert_mark(\n\t__isl_take isl_schedule_node *node, __isl_take isl_id *mark);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_insert_sequence(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_union_set_list *filters);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_insert_set(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_union_set_list *filters);\n\n__isl_give isl_schedule_node *isl_schedule_node_cut(\n\t__isl_take isl_schedule_node *node);\n__isl_give isl_schedule_node *isl_schedule_node_delete(\n\t__isl_take isl_schedule_node *node);\n\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_order_before(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *filter);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_order_after(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *filter);\n\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_graft_before(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_schedule_node *graft);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_graft_after(\n\t__isl_take isl_schedule_node *node,\n\t__isl_take isl_schedule_node *graft);\n\n__isl_give isl_schedule_node *isl_schedule_node_reset_user(\n\t__isl_take isl_schedule_node *node);\n__isl_give isl_schedule_node *isl_schedule_node_align_params(\n\t__isl_take isl_schedule_node *node, __isl_take isl_space *space);\n\n__isl_give isl_printer 
*isl_printer_print_schedule_node(\n\t__isl_take isl_printer *p, __isl_keep isl_schedule_node *node);\nvoid isl_schedule_node_dump(__isl_keep isl_schedule_node *node);\n__isl_give char *isl_schedule_node_to_str(__isl_keep isl_schedule_node *node);\n\n/* AutoSA Extended */\n__isl_export\nenum autosa_loop_type isl_schedule_node_band_member_get_space_time(\n\t__isl_keep isl_schedule_node *node, int pos);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_member_set_space_time(\n\t__isl_take isl_schedule_node *node, int pos, enum autosa_loop_type loop_type);\n__isl_export\nenum autosa_loop_type isl_schedule_node_band_member_get_pe_opt(\n\t__isl_keep isl_schedule_node *node, int pos);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_member_set_pe_opt(\n\t__isl_take isl_schedule_node *node, int pos, enum autosa_loop_type loop_type);\n__isl_export\nint isl_schedule_node_band_member_get_sched_pos(\n\t__isl_keep isl_schedule_node *node, int pos);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_member_set_sched_pos(\n\t__isl_take isl_schedule_node *node, int pos, int sched_pos);\n__isl_export\nvoid *isl_schedule_node_band_member_get_iter(\n\t__isl_keep isl_schedule_node *node, int pos);\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_band_member_set_iter(\n\t__isl_take isl_schedule_node *node, int pos, void *iter);\n\n__isl_export\n__isl_give isl_schedule_node *isl_schedule_node_dup(\n\t__isl_keep isl_schedule_node *node);\n/* AutoSA Extended */\n\n#if defined(__cplusplus)\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "autosa_scripts/ppcg_changes/isl/vec.h",
    "content": "/*\n * Copyright 2008-2009 Katholieke Universiteit Leuven\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege, K.U.Leuven, Departement\n * Computerwetenschappen, Celestijnenlaan 200A, B-3001 Leuven, Belgium\n */\n\n#ifndef ISL_VEC_H\n#define ISL_VEC_H\n\n#include <stdio.h>\n\n#include <isl/ctx.h>\n#include <isl/val_type.h>\n#include <isl/printer.h>\n\n#if defined(__cplusplus)\nextern \"C\" {\n#endif\n\nstruct isl_vec;\ntypedef struct isl_vec isl_vec;\n\n__isl_give isl_vec *isl_vec_alloc(isl_ctx *ctx, unsigned size);\n__isl_give isl_vec *isl_vec_zero(isl_ctx *ctx, unsigned size);\n__isl_give isl_vec *isl_vec_copy(__isl_keep isl_vec *vec);\n__isl_null isl_vec *isl_vec_free(__isl_take isl_vec *vec);\n\nisl_ctx *isl_vec_get_ctx(__isl_keep isl_vec *vec);\n\nisl_size isl_vec_size(__isl_keep isl_vec *vec);\n__isl_give isl_val *isl_vec_get_element_val(__isl_keep isl_vec *vec, int pos);\n__isl_give isl_vec *isl_vec_set_element_si(__isl_take isl_vec *vec,\n\tint pos, int v);\n__isl_give isl_vec *isl_vec_set_element_val(__isl_take isl_vec *vec,\n\tint pos, __isl_take isl_val *v);\n\nisl_bool isl_vec_is_equal(__isl_keep isl_vec *vec1, __isl_keep isl_vec *vec2);\nint isl_vec_cmp_element(__isl_keep isl_vec *vec1, __isl_keep isl_vec *vec2,\n\tint pos);\n\nvoid isl_vec_dump(__isl_keep isl_vec *vec);\n__isl_give isl_printer *isl_printer_print_vec(__isl_take isl_printer *printer,\n\t__isl_keep isl_vec *vec);\n\n__isl_give isl_vec *isl_vec_ceil(__isl_take isl_vec *vec);\nstruct isl_vec *isl_vec_normalize(struct isl_vec *vec);\n__isl_give isl_vec *isl_vec_set_si(__isl_take isl_vec *vec, int v);\n__isl_give isl_vec *isl_vec_set_val(__isl_take isl_vec *vec,\n\t__isl_take isl_val *v);\n__isl_give isl_vec *isl_vec_clr(__isl_take isl_vec *vec);\n__isl_give isl_vec *isl_vec_neg(__isl_take isl_vec *vec);\n__isl_give isl_vec *isl_vec_add(__isl_take isl_vec *vec1,\n\t__isl_take isl_vec *vec2);\n__isl_give isl_vec 
*isl_vec_extend(__isl_take isl_vec *vec, unsigned size);\n__isl_give isl_vec *isl_vec_zero_extend(__isl_take isl_vec *vec, unsigned size);\n__isl_give isl_vec *isl_vec_concat(__isl_take isl_vec *vec1,\n\t__isl_take isl_vec *vec2);\n\n__isl_give isl_vec *isl_vec_sort(__isl_take isl_vec *vec);\n\n__isl_give isl_vec *isl_vec_read_from_file(isl_ctx *ctx, FILE *input);\n\n__isl_give isl_vec *isl_vec_drop_els(__isl_take isl_vec *vec,\n\tunsigned pos, unsigned n);\n__isl_give isl_vec *isl_vec_add_els(__isl_take isl_vec *vec, unsigned n);\n__isl_give isl_vec *isl_vec_insert_els(__isl_take isl_vec *vec,\n\tunsigned pos, unsigned n);\n__isl_give isl_vec *isl_vec_insert_zero_els(__isl_take isl_vec *vec,\n\tunsigned pos, unsigned n);\n__isl_give isl_vec *isl_vec_move_els(__isl_take isl_vec *vec,\n\tunsigned dst_col, unsigned src_col, unsigned n);\n\n/* AutoSA Extended */\n__isl_give isl_vec *isl_vec_dup(__isl_keep isl_vec *vec);\n/* AutoSA Extended */\n\n#if defined(__cplusplus)\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "autosa_scripts/ppcg_changes/ppcg/files.txt",
    "content": "cpu.h\ncuda.h\nopencl.h\nppcg_options.h\nppcg_options.c\nppcg.c\nppcg.h\nutil.h\nprint.h\nschedule.h\ngpu.h\n"
  },
  {
    "path": "autosa_scripts/resource_model.py",
    "content": "import os\nimport json\nimport re\nimport xml.etree.ElementTree as ET\nimport numpy as np\nimport pandas as pd\nimport joblib\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn import metrics\nfrom sklearn.model_selection import train_test_split\nfrom scipy.stats.mstats import gmean\nfrom statistics import mean\nimport shutil\nimport math\nimport pprint\nimport argparse\n\n# Helper functions to predict certain modules\ndef BRAM_predict_HLS(dw, depth, use_18K=0):\n    \"\"\" Predict the resource usage of BRAM on Xilinx platforms.  \n\n    Parameters\n    ----------\n    dw: int\n        BRAM port width\n    depth: int\n        BRAM depth\n    use_18K: int\n        Force the estimator to use the BRAM18K model. (for HLS FIFOs)\n    \"\"\"\n    if dw <= 18 or use_18K:\n        alpha = np.ceil(float(dw) / 18)\n        BRAM = alpha * np.ceil(float(depth) / 1024)   \n    else:\n        alpha = np.ceil(float(dw) / 36)\n        BRAM = alpha * np.ceil(float(depth) / 512)    \n        \n    return BRAM\n\ndef URAM_predict_HLS(dw, depth):\n    \"\"\" Predict the resource usage of URAM on Xilinx platforms.  \n\n    Parameters\n    ----------\n    dw: int\n        URAM port width\n    depth: int\n        URAM depth\n    \"\"\"\n    alpha = np.ceil(float(dw) / 72)\n    URAM = alpha * np.ceil(float(depth) / 4096)\n    return URAM\n\ndef BRAM_array_predict_HLS(dw, depth, n_part):\n    \"\"\" Predict the BRAM resource usage of arrays on Xilinx platform.  
\n\n    Parameters\n    ----------\n    dw: int\n        BRAM port width (in bytes)\n    depth: int\n        BRAM depth\n    n_part: int\n        number of partitions\n    \"\"\"\n    return n_part * BRAM_predict_HLS(dw * 8, np.ceil(float(depth) / n_part))\n\ndef FF_array_predict_HLS(dw, depth):\n    \"\"\" Predict the FF resource usage of arrays on Xilinx platform.\n\n    Parameters\n    ----------\n    dw: int\n        Array port width (in bytes)\n    depth: int\n        Array depth\n    \"\"\"\n    return dw * 8 * depth\n\ndef URAM_array_predict_HLS(dw, depth, n_part):\n    \"\"\" Predict the URAM resource usage of arrays on Xilinx platform.\n\n    Parameters\n    ----------\n    dw: int\n        URAM port width (in bytes)\n    depth: int\n        URAM depth\n    n_part: int\n        number of partitions\n    \"\"\"\n    return n_part * URAM_predict_HLS(dw * 8, np.ceil(float(depth) / n_part))\n\ndef FIFO_predict_xilinx(dw, depth):\n    \"\"\" Predict the resource usage of FIFO modules on Xilinx platforms.\n\n    Parameters\n    ----------\n    dw: int\n        FIFO data width\n    depth: int\n        FIFO depth\n    \"\"\"\n    DSP = 0\n    if dw * depth <= 512:\n        BRAM = 0\n        FF = 5\n        LUT = dw + 12\n    else:\n        BRAM = BRAM_predict_HLS(dw, depth, 1)\n        # In the current codegen, we will use SRL to implement FIFOs\n        #    BRAM = 0\n        FF = dw + 10\n        LUT = int(0.9687 * dw + 13.982)\n\n    return {'BRAM18K': BRAM, 'DSP': DSP, 'FF': FF, 'LUT': LUT}\n\ndef extract_axi_res_from_hls_rpt(rpt_path):\n    \"\"\" Extract the resource usage for AXI modules from the HLS report in text format\n\n    Parameters\n    ----------\n    rpt_path: str\n        The path of the HLS report\n\n    Returns\n    -------\n    BRAM18K, FF, LUT\n    \"\"\"\n    with open(rpt_path) as f:\n        lines = f.readlines()\n    BRAM18K_total = 0\n    FF_total = 0\n    LUT_total = 0\n    for line in lines:\n        if line.find('kernel0_gmem_') != -1:\n            line = line.split('|')\n            BRAM18K_total += float(line[3])\n            FF_total += float(line[5])\n            LUT_total += float(line[6])\n    return BRAM18K_total, FF_total, LUT_total\n\ndef extract_design_info(design_dir, 
synth=0):\n    \"\"\" Extract the design information.\n\n    Load design_info.json and design_info.dat under the directory 'resource_est'.\n    If synth is set to 1, also load the HLS reports.\n    Return a dictionary with the following fields:\n    - FF: int\n    - LUT: int\n    - BRAM18K: int\n    - DSP: int\n    - URAM: int\n    - fifos:\n      - fifo_name:\n        - fifo_cnt: int\n        - fifo_width: int\n        - fifo_depth: int\n    - modules:\n      - module_name:\n        - module_cnt: int\n        - FF, LUT, BRAM18K, URAM, DSP: int\n        - data_pack_inter, data_pack_intra: int\n        - ele_type: str\n        - ele_size: int\n        - local_buffers\n        - unroll: int\n\n    Parameters\n    ----------\n    design_dir: str\n        The design directory.\n    synth: int\n        Whether the design has been synthesized (1) or not (0).\n    \"\"\"\n    # Load the design info\n    f_dir = f'{design_dir}/resource_est/design_info.json'\n    with open(f_dir, 'r') as f:\n        design_info = json.load(f)\n    design_info['fifos'] = {}\n    f_dir = f'{design_dir}/resource_est/design_info.dat'\n    with open(f_dir, 'r') as f:\n        lines = f.readlines()\n    for line in lines:\n        line = line.strip().split(':')\n        if line[0] == 'fifo':\n            fifo_name = line[1]\n            fifo_cnt = int(line[2])\n            fifo_w = int(line[3])\n            fifo_depth = 2 # default\n            design_info['fifos'][fifo_name] = {\n                'fifo_cnt': fifo_cnt,\n                'fifo_width': fifo_w,\n                'fifo_depth': fifo_depth\n            }\n            if fifo_cnt == 0 and fifo_name in design_info['fifos']:\n                design_info['fifos'].pop(fifo_name)\n        elif line[0] == 'module':\n            module_name = line[1]\n            module_cnt = int(line[2])\n            design_info['modules'][module_name]['module_cnt'] = module_cnt\n            if module_cnt == 0 and module_name in design_info['modules']:\n                design_info['modules'].pop(module_name)\n    if synth:\n        # Load the HLS project\n        hls_rpts = {}\n        hls_prj_dir = f'{design_dir}/hls_prj'\n        hls_rpts_dir = f'{hls_prj_dir}/solution1/syn/report'\n        hls_rpt_names = os.listdir(hls_rpts_dir)\n        hls_rpt_names = [r for r in hls_rpt_names if r.endswith('_csynth.xml')]\n        for r in hls_rpt_names:\n            with open(hls_rpts_dir + '/' + r, 'r') as f:\n                tree = ET.parse(f)\n                root = tree.getroot()\n                # Strip the '_csynth.xml' suffix (11 characters)\n                module_name = r[:-11]\n                # For duplicate modules, strip the numeric suffix.\n                while module_name[-1].isdigit():\n                    module_name = module_name[:-1]\n                hls_rpts[module_name] = root\n\n        # Extract the resource info from the HLS reports\n        for module in design_info['modules']:\n            if module in hls_rpts:\n                rpt = hls_rpts[module]\n            elif f'{module}_wrapper' in hls_rpts:\n                # It is possible the module is wrapped. 
\n                # Look for the wrapper module.\n                rpt = hls_rpts[module + '_wrapper']\n            else:\n                # The module is inlined\n                rpt = None\n\n            if rpt:\n                res = extract_resource_info_from_hls_rpt(rpt)\n                design_info['modules'][module]['FF'] = res['FF']\n                # Extract the FF storage if existing\n                if \"local_buffers\" in design_info['modules'][module]:\n                    local_buffers = design_info['modules'][module]['local_buffers']\n                    for local_buffer in local_buffers:\n                        if local_buffer['mem_type'] == 'FF':\n                            design_info['modules'][module]['FF'] -= \\\n                                FF_array_predict_HLS(local_buffer['port_width'], \\\n                                                     local_buffer['buffer_depth'])                            \n                design_info['modules'][module]['LUT'] = res['LUT']\n                design_info['modules'][module]['BRAM18K'] = res['BRAM18K']\n                design_info['modules'][module]['URAM'] = res['URAM']\n                design_info['modules'][module]['DSP'] = res['DSP']\n            else:\n                # For inlined module, its resource usage is included in the parent module.\n                design_info['modules'][module]['FF'] = None\n                design_info['modules'][module]['LUT'] = None\n                design_info['modules'][module]['BRAM18K'] = None\n                design_info['modules'][module]['URAM'] = None\n                design_info['modules'][module]['DSP'] = None                \n        # Top module\n        rpt = hls_rpts['kernel']\n        res = extract_resource_info_from_hls_rpt(rpt) \n        # For the top module, we will also parse the report for BRAM usage of AXI modules\n        top_module_rpt_name = 'kernel0_csynth.rpt'\n        axi_bram, axi_ff, axi_lut = 
extract_axi_res_from_hls_rpt(f'{hls_rpts_dir}/{top_module_rpt_name}')\n        res['BRAM18K'] -= axi_bram\n        res['FF'] -= axi_ff\n        res['LUT'] -= axi_lut\n\n        design_info['FF'] = res['FF']\n        design_info['LUT'] = res['LUT']\n        design_info['BRAM18K'] = res['BRAM18K']\n        design_info['URAM'] = res['URAM']\n        design_info['DSP'] = res['DSP']\n    else:\n        for module in design_info['modules']:\n            design_info['modules'][module]['FF'] = None\n            design_info['modules'][module]['LUT'] = None\n            design_info['modules'][module]['BRAM18K'] = None\n            design_info['modules'][module]['URAM'] = None\n            design_info['modules'][module]['DSP'] = None\n        design_info['FF'] = None\n        design_info['LUT'] = None\n        design_info['BRAM18K'] = None\n        design_info['URAM'] = None\n        design_info['DSP'] = None\n\n    return design_info\n\ndef extract_resource_info_from_hls_rpt(rpt):\n    \"\"\" Extract the resource info from the HLS rpt.\n\n    Parameters\n    ----------\n    rpt: \n        HLS report in XML format\n    \"\"\"\n    res = {\n        'BRAM18K': 0,\n        'DSP': 0,\n        'URAM': 0,\n        'FF': 0,\n        'LUT': 0\n    }\n    root = rpt\n    for est in root.iter('AreaEstimates'):\n        for child in est:\n            if child.tag == 'Resources':\n                for item in child:\n                    if item.tag == 'BRAM_18K':\n                        res['BRAM18K'] = int(item.text)\n                    elif item.tag == 'URAM':\n                        res['URAM'] = int(item.text)\n                    elif item.tag == 'DSP48E':\n                        res['DSP'] = int(item.text)    \n                    elif item.tag == 'FF':\n                        res['FF'] = int(item.text)   \n                    elif item.tag == 'LUT':\n                        res['LUT'] = int(item.text)                        \n\n    return res\n\ndef 
convert_design_infos_to_df(design_infos):\n    \"\"\" Convert the design infos into a dataframe.\n\n    Parameters\n    ----------\n    design_infos: list\n        A list containing all design informations.\n    \"\"\"\n    modules = []\n    fifos = []\n    for design_info in design_infos:\n        fs = design_info['fifos']\n        ms = design_info['modules']\n        for f in fs:\n            if f not in fifos:\n                fifos.append(f)\n        for m in ms:\n            if m not in modules and m.find('wrapper') == -1:\n                modules.append(m)\n\n    # Reorganize the design information to a dictionary\n    info_dict = {}\n    info_dict['FF'] = []\n    info_dict['LUT'] = []\n    info_dict['DSP'] = []\n    info_dict['BRAM18K'] = []\n    info_dict['URAM'] = []\n    for fifo in fifos:\n        info_dict[fifo + '_fifo_cnt'] = []\n        info_dict[fifo + '_fifo_width'] = []\n        info_dict[fifo + '_fifo_depth'] = []\n    for module in modules:\n        # IO_module: \n        #   module_cnt, data_pack_inter, data_pack_intra, ele_type, ele_size\n        #   [local_buffers_local_X]_{port_width, buffer_depth, partition_number}\n        # PE_module: \n        #   module_cnt, unroll\n        if module.find('IO') != -1:\n            # IO module\n            info_dict[module + '_data_pack_inter'] = []\n            info_dict[module + '_data_pack_intra'] = []\n            info_dict[module + '_ele_size'] = []\n        else:\n            # PE module\n            info_dict[module + '_unroll'] = []\n        \n        info_dict[module + '_module_cnt'] = []\n        info_dict[module + '_FF'] = []\n        info_dict[module + '_LUT'] = []\n        info_dict[module + '_BRAM18K'] = []\n        info_dict[module + '_URAM'] = []\n        info_dict[module + '_DSP'] = []\n\n    for design_info in design_infos:\n        # FF, LUT, BRAM, DSP\n        info_dict['FF'].append(design_info['FF'])\n        info_dict['LUT'].append(design_info['LUT'])\n        
info_dict['DSP'].append(design_info['DSP'])\n        info_dict['BRAM18K'].append(design_info['BRAM18K'])\n        info_dict['URAM'].append(design_info['URAM'])\n\n        fs = design_info['fifos']\n        ms = design_info['modules']\n        for fifo in fifos:\n            if fifo in fs:\n                info_dict[fifo + '_fifo_cnt'].append(fs[fifo]['fifo_cnt'])\n                info_dict[fifo + '_fifo_width'].append(fs[fifo]['fifo_width'])\n                info_dict[fifo + '_fifo_depth'].append(fs[fifo]['fifo_depth'])\n            else:\n                info_dict[fifo + '_fifo_cnt'].append(None)\n                info_dict[fifo + '_fifo_width'].append(None)\n                info_dict[fifo + '_fifo_depth'].append(None)\n    \n        for module in modules:\n            if module.find('IO') != -1:\n                # IO module\n                if module in ms:\n                    info_dict[module + '_module_cnt'].append(ms[module]['module_cnt'])\n                    info_dict[module + '_data_pack_inter'].append(ms[module]['data_pack_inter'])\n                    info_dict[module + '_data_pack_intra'].append(ms[module]['data_pack_intra'])\n                    info_dict[module + '_ele_size'].append(ms[module]['ele_size'])\n                else:\n                    info_dict[module + '_module_cnt'].append(None)\n                    info_dict[module + '_data_pack_inter'].append(None)\n                    info_dict[module + '_data_pack_intra'].append(None)\n                    info_dict[module + '_ele_size'].append(None)\n            else:\n                # PE module\n                if module in ms:\n                    info_dict[module + '_module_cnt'].append(ms[module]['module_cnt'])\n                    info_dict[module + '_unroll'].append(ms[module]['unroll'])\n                else:\n                    info_dict[module + '_module_cnt'].append(None)\n                    info_dict[module + '_unroll'].append(None)\n      \n            if module in ms:\n              
  info_dict[module + '_FF'].append(ms[module]['FF'])\n                info_dict[module + '_LUT'].append(ms[module]['LUT'])\n                info_dict[module + '_BRAM18K'].append(ms[module]['BRAM18K'])\n                info_dict[module + '_URAM'].append(ms[module]['URAM'])\n                info_dict[module + '_DSP'].append(ms[module]['DSP'])\n            else:\n                info_dict[module + '_FF'].append(None)\n                info_dict[module + '_LUT'].append(None)\n                info_dict[module + '_BRAM18K'].append(None)\n                info_dict[module + '_URAM'].append(None)\n                info_dict[module + '_DSP'].append(None)\n\n    df = pd.DataFrame(info_dict)\n    return modules, fifos, df \n\ndef df_feature_extract(df, module):\n    \"\"\" Expand the dataframe to include new features for the module.\n\n    Parameters\n    ----------\n    df: dataframe\n    module: str\n    \"\"\"\n    if module.find('IO') != -1:\n        df[module + '_data_pack_inter/' + module + '_data_pack_intra'] = \\\n            df.apply(lambda row: float(row[module + '_data_pack_inter']) / float(row[module + '_data_pack_intra']), axis = 1)\n        #df[module + '_data_pack_inter*' + module + '_ele_size'] = \\\n        #    df.apply(lambda row: float(row[module + '_data_pack_inter']) * float(row[module + '_ele_size']), axis = 1)\n\n    return df\n\ndef get_feature_set(module):\n    \"\"\" Exatract the feature set for the resource models.\n\n    Parameters\n    ----------\n    module: str\n        Module name.\n    \"\"\"\n    feature_set = []\n    if 'IO' in module:\n        feature_set.append(f'{module}_data_pack_inter')\n        feature_set.append(f'{module}_data_pack_inter/{module}_data_pack_intra')\n    else:\n        feature_set.append(f'{module}_unroll')\n    return feature_set\n\ndef train(df, modules, fifos, design_infos, work_dir, logger):\n    \"\"\" Train the resource models for each module.\n\n    Parameters\n    ----------\n    df: dataframe\n        A 
dataframe containing all designs\n    modules: list\n        Module name list.\n    fifos: list\n        FIFO name list.\n    design_infos: list\n        A list containing the information of all designs.\n    work_dir: str\n        Directory to save the trained models.\n    logger:\n        Logger.\n    \"\"\"\n    # Split the data into training and validation sets.\n    feature_set = []\n    pred_set = []\n    for module in modules:\n        # Expand the dataframe if necessary\n        df = df_feature_extract(df, module)\n        feature_set += get_feature_set(module)\n        pred_set.append(module + '_FF')\n        pred_set.append(module + '_LUT')\n        pred_set.append(module + '_BRAM18K')\n        pred_set.append(module + '_URAM')\n        pred_set.append(module + '_DSP')\n\n    X = df.loc[:, feature_set]\n    y = df.loc[:, pred_set]\n    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n    logger.info(f'#Training samples: {X_train.shape[0]}')\n    logger.info(f'#Validation samples: {X_test.shape[0]}')\n\n    # Evaluation metrics\n    FF_mape = []\n    LUT_mape = []\n    DSP_mape = []\n    BRAM18K_mape = []\n    URAM_mape = []\n\n    for module in modules:\n        logger.info('Training resource model for module: ' + module)\n        feature_set = get_feature_set(module)\n\n        # FF\n        pred_set = [module + '_FF']\n        y_train_module = y_train.loc[:, pred_set]\n        y_train_module = y_train_module.dropna()\n        X_train_module = X_train.loc[y_train_module.index, feature_set]\n        if X_train_module.shape[0] > 0:\n            model = LinearRegression()\n            model.fit(X_train_module.to_numpy(), y_train_module.to_numpy())\n            model_name = module + '_FF_model'\n            joblib_file = work_dir + '/' + model_name + '.pkl'\n            joblib.dump(model, joblib_file)\n        # Validate the accuracy\n        y_test_module = y_test.loc[:, 
pred_set]\n        y_test_module = y_test_module.dropna()\n        X_test_module = X_test.loc[y_test_module.index, feature_set]        \n        if X_test_module.shape[0] > 0:\n            y_pred_module = model.predict(X_test_module.to_numpy())        \n            y_test_module = y_test_module.to_numpy()\n            logger.info('======== FF ========')\n            logger.info(f'Mean Absolute Error: {metrics.mean_absolute_error(y_test_module, y_pred_module)}')\n            logger.info(f'Mean Squared Error: {metrics.mean_squared_error(y_test_module, y_pred_module)}')\n            logger.info(f'Mean Absolute Percentage Error: {mean_absolute_percentage_error(y_test_module, y_pred_module)}')\n            FF_mape.append(mean_absolute_percentage_error(y_test_module, y_pred_module))\n\n        # LUT\n        pred_set = [module + '_LUT']\n        y_train_module = y_train.loc[:, pred_set]\n        y_train_module = y_train_module.dropna()\n        X_train_module = X_train.loc[y_train_module.index, feature_set]        \n        if X_train_module.shape[0] > 0:\n            model = LinearRegression()\n            model.fit(X_train_module.to_numpy(), y_train_module.to_numpy())\n            model_name = module + '_LUT_model'\n            joblib_file = work_dir + '/' + model_name + '.pkl'\n            joblib.dump(model, joblib_file)\n        # Validate the accuracy\n        y_test_module = y_test.loc[:, pred_set]\n        y_test_module = y_test_module.dropna()\n        X_test_module = X_test.loc[y_test_module.index, feature_set]        \n        if X_test_module.shape[0] > 0:\n            y_pred_module = model.predict(X_test_module.to_numpy())        \n            y_test_module = y_test_module.to_numpy()\n            logger.info('======== LUT ========')\n            logger.info(f'Mean Absolute Error: {metrics.mean_absolute_error(y_test_module, y_pred_module)}')\n            logger.info(f'Mean Squared Error: {metrics.mean_squared_error(y_test_module, y_pred_module)}')\n            
logger.info(f'Mean Absolute Percentage Error: {mean_absolute_percentage_error(y_test_module, y_pred_module)}')\n            LUT_mape.append(mean_absolute_percentage_error(y_test_module, y_pred_module))\n\n        # DSP\n        pred_set = [module + '_DSP']\n        y_train_module = y_train.loc[:, pred_set]\n        y_train_module = y_train_module.dropna()\n        X_train_module = X_train.loc[y_train_module.index, feature_set]\n        if X_train_module.shape[0] > 0:\n            model = LinearRegression()\n            model.fit(X_train_module.to_numpy(), y_train_module.to_numpy())\n            model_name = module + '_DSP_model'\n            joblib_file = work_dir + '/' + model_name + '.pkl'\n            joblib.dump(model, joblib_file)\n        # Validate the accuracy\n        y_test_module = y_test.loc[:, pred_set]\n        y_test_module = y_test_module.dropna()\n        X_test_module = X_test.loc[y_test_module.index, feature_set]        \n        if X_test_module.shape[0] > 0:\n            y_pred_module = model.predict(X_test_module.to_numpy())        \n            y_test_module = y_test_module.to_numpy()        \n            logger.info('======== DSP ========')\n            logger.info(f'Mean Absolute Error: {metrics.mean_absolute_error(y_test_module, y_pred_module)}')\n            logger.info(f'Mean Squared Error: {metrics.mean_squared_error(y_test_module, y_pred_module)}')\n            logger.info(f'Mean Absolute Percentage Error: {mean_absolute_percentage_error(y_test_module, y_pred_module)}')\n            DSP_mape.append(mean_absolute_percentage_error(y_test_module, y_pred_module))\n\n        # BRAM18K\n        pred_set = [module + '_BRAM18K']\n        y_test_module = y_test.loc[:, pred_set]        \n        y_test_module = y_test_module.dropna()\n        X_test_module = X_test.loc[y_test_module.index, feature_set]        \n        if X_test_module.shape[0] > 0:\n            y_pred_module = np.zeros((y_test_module.shape[0], 1), dtype=float)\n            cnt 
= 0\n            for index, row in y_test_module.iterrows():\n                design_info = design_infos[index]\n                BRAM_usage = 0\n                if \"local_buffers\" in design_info['modules'][module]:\n                    local_buffers = design_info['modules'][module]['local_buffers']\n                    for local_buffer in local_buffers:\n                        if local_buffer['mem_type'] == 'BRAM':\n                            if 'array_map' in local_buffer:\n                                # For horizontal mapping, we will merge two ping/pong buffers into one\n                                BRAM_usage += BRAM_array_predict_HLS(local_buffer['port_width'], \\\n                                    local_buffer['buffer_depth'] * 2, local_buffer['partition_number']) / 2\n                            else:\n                                BRAM_usage += BRAM_array_predict_HLS(local_buffer['port_width'], \\\n                                    local_buffer['buffer_depth'], local_buffer['partition_number'])\n\n                y_pred_module[cnt] = BRAM_usage\n                cnt += 1\n\n            y_test_module = y_test_module.to_numpy()\n            logger.info('======== BRAM18K ========')\n            logger.info(f'Mean Absolute Error: {metrics.mean_absolute_error(y_test_module, y_pred_module)}')\n            logger.info(f'Mean Squared Error: {metrics.mean_squared_error(y_test_module, y_pred_module)}')\n            logger.info(f'Mean Absolute Percentage Error: {mean_absolute_percentage_error(y_test_module, y_pred_module)}')\n            BRAM18K_mape.append(mean_absolute_percentage_error(y_test_module, y_pred_module))\n\n        # URAM\n        pred_set = [module + '_URAM']\n        y_test_module = y_test.loc[:, pred_set]\n        y_test_module = y_test_module.dropna()\n        X_test_module = X_test.loc[y_test_module.index, feature_set]\n        if X_test_module.shape[0] > 0:\n            y_pred_module = np.zeros((y_test_module.shape[0], 1), dtype=float)\n            cnt = 0\n            for index, row in y_test_module.iterrows():\n                design_info = design_infos[index]\n                URAM_usage = 0\n                if \"local_buffers\" in design_info['modules'][module]:\n                    local_buffers = design_info['modules'][module]['local_buffers']\n                    for local_buffer in local_buffers:\n                        if local_buffer['mem_type'] == 'URAM':\n                            # Accumulate into URAM_usage (not BRAM_usage), so the URAM prediction is not always 0\n                            URAM_usage += URAM_array_predict_HLS(local_buffer['port_width'], \\\n                                local_buffer['buffer_depth'], local_buffer['partition_number'])\n                y_pred_module[cnt] = URAM_usage\n                cnt += 1\n\n            y_test_module = y_test_module.to_numpy()\n            logger.info('======== URAM ========')\n            logger.info(f'Mean Absolute Error: {metrics.mean_absolute_error(y_test_module, y_pred_module)}')\n            logger.info(f'Mean Squared Error: {metrics.mean_squared_error(y_test_module, y_pred_module)}')\n            logger.info(f'Mean Absolute Percentage Error: {mean_absolute_percentage_error(y_test_module, y_pred_module)}')\n            URAM_mape.append(mean_absolute_percentage_error(y_test_module, y_pred_module))\n\n    logger.info('======== Module-Level Resource Model Validation Results ========')\n    logger.info('FF Mean Absolute Percentage Error (Arith. Mean): %.2f%%' %(mean(FF_mape)))\n    logger.info('LUT Mean Absolute Percentage Error (Arith. Mean): %.2f%%' %(mean(LUT_mape)))\n    logger.info('DSP Mean Absolute Percentage Error (Arith. Mean): %.2f%%' %(mean(DSP_mape)))\n    logger.info('BRAM18K Mean Absolute Percentage Error (Arith. Mean): %.2f%%' %(mean(BRAM18K_mape)))\n    logger.info('URAM Mean Absolute Percentage Error (Arith. 
Mean): %.2f%%' %(mean(URAM_mape)))\n\n    # Validate on the whole design.\n    df_test = df.loc[y_test.index.values.tolist(), :]\n    FF_design_mape = []\n    LUT_design_mape = []\n    DSP_design_mape = []\n    BRAM18K_design_mape = []\n    URAM_design_mape = []\n\n    for index, row in df_test.iterrows():\n        design_info = design_infos[index]\n        df_design = df_test.loc[[index], :]\n        res = predict_design_resource_usage(df_design, modules, fifos, design_info, work_dir)\n\n        FF_mape = mean_absolute_percentage_error(float(design_info['FF']), res['FF'])\n        LUT_mape = mean_absolute_percentage_error(float(design_info['LUT']), res['LUT'])\n        DSP_mape = mean_absolute_percentage_error(float(design_info['DSP']), res['DSP'])\n        BRAM18K_mape = mean_absolute_percentage_error(float(design_info['BRAM18K']), res['BRAM18K'])\n        URAM_mape = mean_absolute_percentage_error(float(design_info['URAM']), res['URAM'])\n\n        FF_design_mape.append(FF_mape)\n        LUT_design_mape.append(LUT_mape)\n        DSP_design_mape.append(DSP_mape)\n        BRAM18K_design_mape.append(BRAM18K_mape)\n        URAM_design_mape.append(URAM_mape)\n\n    logger.info('======== Design-Level Resource Model Validation Results ========')\n    logger.info('FF Mean Absolute Percentage Error (Arith. Mean): %.2f%%' %(mean(FF_design_mape)))\n    logger.info('LUT Mean Absolute Percentage Error (Arith. Mean): %.2f%%' %(mean(LUT_design_mape)))\n    logger.info('DSP Mean Absolute Percentage Error (Arith. Mean): %.2f%%' %(mean(DSP_design_mape)))\n    logger.info('BRAM18K Mean Absolute Percentage Error (Arith. Mean): %.2f%%' %(mean(BRAM18K_design_mape)))\n    logger.info('URAM Mean Absolute Percentage Error (Arith. 
Mean): %.2f%%' %(mean(URAM_design_mape)))    \n\ndef predict_design_resource_usage(df, modules, fifos, design_info, prj_dir, \\\n    target=['FF', 'LUT', 'DSP', 'BRAM18K', 'URAM']):\n    \"\"\" Predict the resource usage for a single design on Xilinx platforms\n\n    Parameters\n    ----------\n    df: dataframe\n        A dataframe storing the information for the current design.\n    modules: list\n        A list containing all module names.\n    fifos: list\n        A list containing all FIFO names.\n    design_info: dict\n        A dictionary containing the design information.\n    prj_dir: str\n        Directory to the resource models.    \n    target: list\n        Resource types to predict.\n    \"\"\"\n    resource = {'FF': 0, 'LUT': 0, 'DSP': 0, 'BRAM18K': 0, 'URAM': 0}    \n    resource_all = {}\n\n    # Predict FIFOs\n    for fifo in fifos:\n        if fifo in design_info['fifos']:\n            # Query the library to get the data\n            fifo_w = design_info['fifos'][fifo]['fifo_width'] * 8\n            fifo_depth = design_info['fifos'][fifo]['fifo_depth']\n            resource_info = FIFO_predict_xilinx(fifo_w, fifo_depth)\n            FF = resource_info['FF']\n            LUT = resource_info['LUT']\n            BRAM = resource_info['BRAM18K']\n            URAM = 0\n            DSP = resource_info['DSP']\n            resource_all[fifo] = {\n                'FF': FF, 'LUT': LUT, 'BRAM18K': BRAM, 'URAM': URAM, 'DSP': DSP, \\\n                'n': design_info['fifos'][fifo]['fifo_cnt']}\n\n    # Predict modules\n    for module in modules:\n        if module in design_info['modules']:\n            df = df_feature_extract(df, module)\n            module_feature_set = get_feature_set(module)\n\n            FF = 0\n            if 'FF' in target:\n                # FF\n                X = df.loc[:, module_feature_set]\n                model_name = module + '_FF_model'\n                joblib_file = prj_dir + '/' + model_name + '.pkl'\n                if 
os.path.isfile(joblib_file):\n                    model = joblib.load(joblib_file)\n                    FF = model.predict(X.to_numpy()).item()\n                    # Add back the FF arrays if any exist\n                    if \"local_buffers\" in design_info['modules'][module]:\n                        local_buffers = design_info['modules'][module]['local_buffers']\n                        for local_buffer in local_buffers:\n                            if local_buffer['mem_type'] == 'FF':\n                                FF += FF_array_predict_HLS(local_buffer['port_width'], \\\n                                                           local_buffer['buffer_depth'])\n            LUT = 0\n            if 'LUT' in target:\n                # LUT\n                X = df.loc[:, module_feature_set]\n                model_name = module + '_LUT_model'\n                joblib_file = prj_dir + '/' + model_name + '.pkl'\n                if os.path.isfile(joblib_file):\n                    model = joblib.load(joblib_file)\n                    LUT = model.predict(X.to_numpy()).item()\n\n            DSP = 0\n            if 'DSP' in target:\n                # DSP\n                X = df.loc[:, module_feature_set]\n                model_name = module + '_DSP_model'\n                joblib_file = prj_dir + '/' + model_name + '.pkl'\n                if os.path.isfile(joblib_file):\n                    model = joblib.load(joblib_file)\n                    DSP = model.predict(X.to_numpy()).item()\n\n            BRAM = 0\n            if 'BRAM18K' in target:\n                # BRAM                \n                if 'local_buffers' in design_info['modules'][module]:\n                    local_buffers = design_info['modules'][module]['local_buffers']\n                    for local_buffer in local_buffers:\n                        if local_buffer['mem_type'] == 'BRAM':\n                            if 'array_map' in local_buffer:\n                                # For 
horizontal mapping, we will merge two ping/pong buffers to one\n                                BRAM += BRAM_array_predict_HLS(local_buffer['port_width'], \\\n                                    local_buffer['buffer_depth'] * 2, local_buffer['partition_number']) / 2\n                            else:\n                                BRAM += BRAM_array_predict_HLS(local_buffer['port_width'], \\\n                                    local_buffer['buffer_depth'], local_buffer['partition_number'])                            \n\n            #if BRAM > 0:\n            #    print(module, BRAM)\n\n            URAM = 0\n            if 'URAM' in target:\n                # URAM                \n                if 'local_buffers' in design_info['modules'][module]:\n                    local_buffers = design_info['modules'][module]['local_buffers']\n                    for local_buffer in local_buffers:\n                        if local_buffer['mem_type'] == 'URAM':\n                            URAM += URAM_array_predict_HLS(local_buffer['port_width'], \\\n                                local_buffer['buffer_depth'], local_buffer['partition_number'])\n\n            resource_all[module] = {\n                'FF': FF, 'LUT': LUT, 'BRAM18K': BRAM, 'URAM': URAM, 'DSP': DSP, \\\n                'n': design_info['modules'][module]['module_cnt']}        \n\n    #pp = pprint.PrettyPrinter(indent=4)\n    #pp.pprint(resource_all)\n\n    # Aggregate the resource\n    for inst in resource_all:\n        # For FF/LUT/DSP prediction, if the module contains inner module, skip it.\n        #is_outer_module = 0\n        #if inst.find('boundary') != -1:\n        #    if inst[:-9] + '_inter_trans' in resource_all:\n        #        is_outer_module = 1\n        #else:\n        #    if inst + '_inter_trans' in resource_all:\n        #        is_outer_module = 1\n        is_inner_module = 0\n        if inst.find('inter_trans') != -1 or inst.find('intra_trans') != -1:\n            is_inner_module = 1\n 
       #if not is_outer_module:\n        #    resource['FF'] += resource_all[inst]['FF'] * resource_all[inst]['n']\n        #    resource['LUT'] += resource_all[inst]['LUT'] * resource_all[inst]['n']\n        #    resource['DSP'] += resource_all[inst]['DSP'] * resource_all[inst]['n']\n        if is_inner_module:\n            continue\n\n        resource['FF'] += resource_all[inst]['FF'] * resource_all[inst]['n']\n        resource['LUT'] += resource_all[inst]['LUT'] * resource_all[inst]['n']\n        resource['DSP'] += resource_all[inst]['DSP'] * resource_all[inst]['n']\n        resource['BRAM18K'] += resource_all[inst]['BRAM18K'] * resource_all[inst]['n']\n        resource['URAM'] += resource_all[inst]['URAM'] * resource_all[inst]['n']\n\n    ret = {}\n    for r in resource:\n        if r in target:\n            ret[r] = int(resource[r])\n        else:\n            ret[r] = 0\n\n    return ret\n\ndef mean_absolute_percentage_error(y_true, y_pred):    \n    if isinstance(y_true, np.ndarray) and isinstance(y_pred, np.ndarray):\n        error = np.divide((y_true - y_pred), y_true, out=(-y_pred), where=y_true!=0)\n        return np.mean(np.abs(error)) * 100    \n    else:    \n        # scalar\n        if y_true == 0:\n            return abs(y_pred) * 100\n        else:            \n            return abs((y_true - y_pred) / y_true) * 100\n\ndef resource_valid(res, hw_info, range, target):\n    \"\"\" Test if the resource usage is valid.\n\n    Parameters\n    ----------\n    res: dict\n        A dict containing the resource usage of the current design.\n    hw_info: dict\n        A dict containing the hardware platform information.\n    range: dict\n        A dict mapping each resource type to its [lower, upper] utilization ratio bounds.\n    target: list\n        A list containing the hw resource target to predict.\n\n    Returns\n    -------\n    ret: boolean\n    \"\"\"\n    for r in res:\n        if r in target:\n            usage = res[r]\n            if usage > hw_info[r] * 
range[r][1]:\n              
  return False\n            if usage < hw_info[r] * range[r][0]:\n                return False\n    return True\n\ndef compute_res_util_score(res, hw_info):\n    \"\"\" Compute a score for the current design utilization.\n\n    We put different weights for different types of resource.\n    URAM, DSP, BRAM18K: 0.3\n    LUT: 0.2\n    FF: 0.1\n    \"\"\"\n    score = 0\n    if 'FF' in res:\n        score += 0.1 * float(int(res['FF'])) / hw_info['FF']\n    if 'LUT' in res:\n        score += 0.2 * float(int(res['LUT'])) / hw_info['LUT']\n    if 'BRAM18K' in res:\n        score += 0.3 * float(int(res['BRAM18K'])) / hw_info['BRAM18K']\n    if 'DSP' in res:\n        score += 0.3 * float(int(res['DSP'])) / hw_info['DSP']\n    if 'URAM' in res:\n        score += 0.3 * float(int(res['URAM'])) / hw_info['URAM']\n\n    return score\n\ndef unit_test_predict_design_resource(design_dir, hw_info, model_path):\n    design_info = extract_design_info(design_dir, 0)\n    modules, fifos, df = convert_design_infos_to_df([design_info])\n    kernel_id = design_info['kernel_id']        \n    res_model_path = f'{model_path}/kernel{kernel_id}'\n    res = predict_design_resource_usage(\n        df, modules, fifos, design_info,\n        res_model_path)\n    # compute the ratio\n    print(f\"FF: {res['FF']}/{hw_info['FF']} ({res['FF']/hw_info['FF']:.2f})\")\n    print(f\"LUT: {res['LUT']}/{hw_info['LUT']} ({res['LUT']/hw_info['LUT']:.2f})\")\n    print(f\"BRAM18K: {res['BRAM18K']}/{hw_info['BRAM18K']} ({res['BRAM18K']/hw_info['BRAM18K']:.2f})\")\n    print(f\"DSP: {res['DSP']}/{hw_info['DSP']} ({res['DSP']/hw_info['DSP']:.2f})\")\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser(description=\"==== AutoSA Resource Model ====\")\n    parser.add_argument('-d', required=True, help='design directory')\n    parser.add_argument('-i', required=True, help='hardware info')\n    parser.add_argument('-m', required=True, help='resource model path')\n\n    args = parser.parse_args()\n    
with open(args.i, 'r') as f:\n        hw_info = json.load(f)\n    unit_test_predict_design_resource(args.d, hw_info, args.m)"
  },
  {
    "path": "autosa_scripts/tapa_scripts/CMakeLists.txt",
    "content": "cmake_minimum_required(VERSION 3.13)\ncmake_policy(SET CMP0076 NEW)\n\nproject(kernel)\n\nadd_executable(kernel)\ntarget_sources(kernel PRIVATE kernel_host.cpp kernel_kernel.cpp)\ntarget_link_libraries(kernel PUBLIC tapa::tapa)\ntarget_compile_features(kernel PUBLIC cxx_std_11)\ninclude_directories(/opt/tools/xilinx/Vitis_HLS/2020.2/include)\n\nadd_test(NAME kernel COMMAND kernel)\n\nfind_package(gflags REQUIRED)\nfind_package(TAPA REQUIRED)\nfind_package(FRT REQUIRED)\nset(TAPA tapa::tapa)\n\nfind_package(SDx)\nif(SDx_FOUND)\n  add_tapa_target(\n    kernel-hw-xo\n    INPUT kernel_kernel.cpp\n    FRT_INTERFACE ${CMAKE_CURRENT_BINARY_DIR}/kernel.frt.cpp\n    TOP kernel0\n    PLATFORM xilinx_u250_xdma_201830_2)\n\n  add_xocc_hw_link_targets(\n    ${CMAKE_CURRENT_BINARY_DIR}\n    INPUT kernel-hw-xo\n    HW_EMU_XCLBIN\n    hw_emu_xclbin\n    HW_XCLBIN\n    hw_xclbin)\n\n  add_executable(kernel-frt)\n  target_include_directories(kernel-frt PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})\n  target_sources(kernel-frt PRIVATE kernel_host.cpp\n                                  ${CMAKE_CURRENT_BINARY_DIR}/kernel.frt.cpp)\n  target_link_libraries(kernel-frt PRIVATE ${TAPA} frt::frt)\n\n  add_custom_target(\n    kernel-cosim\n    COMMAND TAPAB=$<TARGET_PROPERTY:${hw_emu_xclbin},FILE_NAME>\n            $<TARGET_FILE:kernel-frt>\n    DEPENDS kernel-frt ${hw_emu_xclbin}\n    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})\n  add_custom_target(\n    kernel-hw\n    COMMAND TAPAB=$<TARGET_PROPERTY:${hw_xclbin},FILE_NAME>\n            $<TARGET_FILE:kernel-frt>\n    DEPENDS kernel-frt ${hw_xclbin}\n    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})\n\n  add_test(NAME kernel-cosim COMMAND ${CMAKE_COMMAND} --build ${CMAKE_BINARY_DIR}\n                                   --target kernel-cosim)\nendif()\n"
  },
  {
    "path": "autosa_scripts/tuner/constraint.py",
    "content": "import json\n\nclass Constraint(object):\n    def __init__(self, cst_path):\n        with open(cst_path) as f:\n            data = json.load(f)\n        self.hw_cst = {}\n        for res in data:\n            self.hw_cst[res] = data[res][\"total\"] * data[res][\"ratio\"]        \n            self.hw_cst[f'{res}_total'] = data[res][\"total\"]\n\n    def __repr__(self):\n        ret = \"\"\n        ret += f\"b{int(self.hw_cst['BRAM18K'])}\"\n        ret += f\"d{int(self.hw_cst['DSP'])}\"\n        return ret    "
  },
  {
    "path": "autosa_scripts/tuner/cst/hw_cst.json",
    "content": "{\n  \"BRAM18K\": {\n    \"total\": 5376,\n    \"ratio\": 0.7\n  },\n  \"DSP\": {\n    \"total\": 12288,\n    \"ratio\": 0.7\n  },\n  \"FF\": {\n    \"total\": 3456000,\n    \"ratio\": 0.7\n  },\n  \"LUT\": {\n    \"total\": 1728000,\n    \"ratio\": 0.7\n  },\n  \"URAM\": {\n    \"total\": 1280,\n    \"ratio\": 0.7\n  }\n}\n"
  },
  {
    "path": "autosa_scripts/tuner/design.py",
    "content": "import numpy as np\nimport json\nimport sys\nimport os\nfrom numpy import ceil, floor\n\nclass Design(object):\n    def __init__(self, name):\n        self.name = name # design name        \n        self.est_resource_func = None\n        self.est_latency_func = None\n        self.infer_params_func = None\n        self.random_sampling_func = None\n        self.bound_check_func = None\n        self.params_config = None      \n        self.desp = None  \n\n    def print_resource_est_func(self, f, desp):\n        f.write(\"def est_resource(params):\\n\")\n        # Load parameters\n        f.write(\"\\t\")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(p[\"name\"])\n            is_first = False\n        f.write(\" = \")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(f'params[\\\"{p[\"name\"]}\\\"]')\n            is_first = False\n        f.write(\"\\n\\n\")\n\n        f.write(\"\\t# DSP\\n\")\n        f.write(f\"\\tDSP = {desp['compute']['PE']['num']} * \")\n        f.write(f\"{desp['compute']['PE']['unroll_factor']} * \")\n        if desp[\"compute\"][\"PE\"][\"ele_type\"] == \"float\":\n            f.write(f\"5\\n\")\n        else:\n            raise RuntimeError(f\"Unsupported data type {desp['compute']['PE']['ele_type']} in resource estimation\")        \n        f.write(\"\\n\")\n\n        # Print function est_BRAM18K\n        f.write(\"\\t# BRAM18K\\n\")\n        f.write(\"\\tdef est_BRAM18K(ele_size, ele_num, pack):\\n\")\n        f.write(f\"\\t\\treturn ceil(ele_size*8*pack / 18) * ceil(ele_num/pack/1024)\\n\\n\")\n\n        # Check if drain module can be merged.\n        # Note: It should be supported in the codegen of AutoSA. 
However, currently, \n        # we move it here in the tuner.\n        out_module = {}\n        out_drain_module = {}\n        for module in desp[\"memory\"]:\n            module_mem = desp[\"memory\"][module]\n            if module.endswith('_out'):\n                item = {'buf_size': module_mem['buf_size'], \n                        'num': module_mem['num']}\n                if module.find('drain') != -1:\n                    item['merged'] = 0\n                    out_drain_module[module_mem['array']] = item\n                else:                    \n                    if module_mem['array'] not in out_module:\n                        out_module[module_mem['array']] = [item]\n                    else:\n                        out_module[module_mem['array']].append(item)\n        for array in out_drain_module:\n            if array in out_module:\n                for m in out_module[array]:                \n                    if m['buf_size'] == out_drain_module[array]['buf_size'] and \\\n                       m['num'] == out_drain_module[array]['num']:\n                       out_drain_module[array]['merged'] = 1\n\n        for module in desp[\"memory\"]:\n            module_mem = desp[\"memory\"][module]\n            if module.find('drain') != -1 and out_drain_module[module_mem['array']]['merged'] == 1:\n                continue\n            f.write(f\"\\t{module}_unit_memory = est_BRAM18K({module_mem['ele_size']}, \")\n            f.write(f\"{module_mem['buf_size']}, \")\n            if \"data_pack_factor\" in module_mem:\n                f.write(f\"{module_mem['data_pack_factor']})\\n\")\n            else:\n                f.write(f\"1)\\n\")        \n        #f.write(\"\\tprint(A_IO_L1_in_unit_memory)\\n\")\n        #f.write(\"\\tprint(A_IO_L2_in_unit_memory)\\n\")\n        #f.write(\"\\tprint(B_IO_L2_in_unit_memory)\\n\")        \n        #f.write(\"\\tprint(PE_unit_memory)\\n\")\n        #f.write(\"\\tprint(C_1_IO_L2_out_unit_memory)\\n\")        \n   
     #f.write(\"\\tprint(C_drain_IO_L1_out_unit_memory)\\n\")\n\n        f.write(\"\\tBRAM18K = \")\n        is_first = True\n        for module in desp[\"memory\"]:\n            module_mem = desp[\"memory\"][module]\n            if module.find('drain') != -1 and out_drain_module[module_mem['array']]['merged'] == 1:\n                continue\n            if not is_first:\n                f.write(\" + \")            \n            f.write(f\"{module}_unit_memory\")\n            if module_mem[\"double_buffer\"]:\n                f.write(f\" * 2\")\n            else:\n                f.write(f\" * 1\")\n            f.write(f\" * {module_mem['num']}\")            \n            is_first = False            \n        f.write(\"\\n\\n\")\n\n        #for module in desp[\"memory\"]:\n        #    module_mem = desp[\"memory\"][module]\n        #    f.write(f\"\\tprint({module_mem['num']})\\n\")\n\n        f.write(\"\\treturn {\\\"DSP\\\": DSP, \\\"BRAM18K\\\": BRAM18K}\\n\")\n        f.write(\"\\n\")\n\n    def print_latency_est_func(self, f, desp):\n        f.write(\"def est_latency(params):\\n\")\n        # Load parameters\n        f.write(\"\\t\")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(p[\"name\"])\n            is_first = False\n        f.write(\" = \")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(f'params[\\\"{p[\"name\"]}\\\"]')\n            is_first = False\n        f.write(\"\\n\\n\")\n\n        def extract_latency_expr(lat, info):\n            ret = \"\"\n            if lat[\"type\"] == \"block\":\n                info[\"has_for_child\"] = 0\n                no_for_child = True\n                is_first = True\n                ret += \"(\"\n                for child in lat[\"child\"]:\n                    if not is_first:\n                        ret += \" + \"       
             \n                    ret += extract_latency_expr(child, info)                    \n                    if info[\"has_for_child\"] == 1:\n                        no_for_child = False\n                    is_first = False\n                ret += \")\"\n                if no_for_child:\n                    ret = \"1\"\n            elif lat[\"type\"] == \"for\":                \n                child = lat[\"child\"]\n                expr = extract_latency_expr(child, info)                \n                if info[\"valid\"]:\n                    ret = lat[\"bounds\"][1] + \" * \" + expr\n                else:\n                    ret = expr\n                info[\"has_for_child\"] = 1\n            elif lat[\"type\"] == \"mark\":      \n                if info[\"under_mark\"] and lat[\"content\"] == info[\"under_mark\"]:\n                    info[\"valid\"] = True\n                if lat[\"content\"] == \"simd\":\n                    if info[\"valid\"]:\n                        ret = \"1\"\n                    else:\n                        ret = \"0\"\n                else:\n                    child = lat[\"child\"]\n                    ret = extract_latency_expr(child, info)\n                if info[\"under_mark\"] and lat[\"content\"] == info[\"under_mark\"]:\n                    info[\"valid\"] = False\n            elif lat[\"type\"] == \"user\":\n                user_expr = lat[\"child\"][\"user_expr\"]\n                if 'inter_intra' in user_expr or 'intra_inter' in user_expr:                    \n                    if user_expr[:-2].split(\".\")[-1] == \"1\":\n                        double_buffer = 1\n                    else:\n                        double_buffer = 0                    \n                    # Plug in submodule latency\n                    if f\"{info['name']}_inter\" in info[\"modules\"]:\n                        inter_expr = info[\"modules\"][f\"{info['name']}_inter\"]\n                    else:\n                        
inter_expr = None\n                    if f\"{info['name']}_intra\" in info[\"modules\"]:\n                        intra_expr = info[\"modules\"][f\"{info['name']}_intra\"]\n                    else:\n                        intra_expr = None\n\n                    if inter_expr and intra_expr:\n                        if info[\"in\"] == 1 or info[\"in\"] == 0:\n                            ret = inter_expr\n                        else:\n                            if double_buffer:\n                                ret = f\"max({inter_expr}, {intra_expr})\"\n                            else:\n                                ret = f\"({inter_expr} + {intra_expr})\"\n                        info[\"has_for_child\"] = 1\n                    else:                        \n                        ret = \"1\"                        \n                    if not info[\"valid\"]:\n                        ret = \"0\"\n                elif \"inter_trans\" in user_expr:\n                    # Plug in submodule latency\n                    if f\"{info['name']}_inter\" in info[\"modules\"]:\n                        ret = info[\"modules\"][f\"{info['name']}_inter\"]\n                    else:\n                        ret = \"1\"\n                    if not info[\"valid\"]:\n                        ret = \"0\"\n                elif \"intra_trans\" in user_expr:\n                    # Plug in submodule latency                    \n                    if f\"{info['name']}_intra\" in info[\"modules\"]:\n                        ret = info[\"modules\"][f\"{info['name']}_intra\"]\n                    else:\n                        ret = \"1\"\n                    if not info[\"valid\"]:\n                        ret = \"0\"\n                else:\n                    ret = \"1\"\n            elif lat[\"type\"] == \"if\":\n                # Only examine the first child\n                child = lat[\"child\"][0]\n                ret = extract_latency_expr(child, info)\n            elif 
lat[\"type\"] == \"array_tile\":      \n                if info[\"module_attr\"][\"to_dram\"] == 1 and info[\"module_attr\"][\"serialize\"] == 0:\n                    # Consider the DRAM latency here.\n                    ret = \"(\" + f\"{lat['size']}/{lat['last_dim']}*(20+{lat['last_dim']}/(512/8/{lat['ele_size']}))\" + \")\"\n                else:\n                    ret = \"(\" + lat[\"size\"] + \"/\" + lat[\"data_pack_factor\"] + \")\"\n            else:\n                raise RuntimeError(f\"Unsupported latency node type {lat['type']}\")\n\n            return ret\n\n        # Latency prologue\n        info = {\"has_for_child\": 0, \"name\": None, \"modules\": {}}\n        for i in range(2):\n            for module in desp[\"latency\"]:\n                if desp[\"attr\"][module][\"in\"] != 1:\n                    continue\n                if \"inter\" in module or \"intra\" in module:                    \n                    # Keep all the latency AST under the mark.\n                    info[\"valid\"] = True\n                    info[\"under_mark\"] = None\n                    info[\"in\"] = 1\n                else:\n                    # Only keep the latency AST under the mark.\n                    info[\"valid\"] = False\n                    info[\"under_mark\"] = \"array\"\n                    info[\"in\"] = 1\n                module_lat = desp[\"latency\"][module]  \n                info[\"name\"] = module     \n                info[\"module_attr\"] = desp[\"attr\"][module]\n                info[\"modules\"][module] = extract_latency_expr(module_lat, info)\n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue\n            f.write(f\"\\t{module}_single_latency = \")                        \n            f.write(info[\"modules\"][module])\n            f.write(f\"\\n\")        \n        f.write(\"\\tlatency_prologue = max(\")\n        is_first = True\n        for module in 
info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue            \n            if not is_first:\n                f.write(\", \")\n            f.write(f\"{module}_single_latency\")\n            is_first = False\n        f.write(\")\\n\\n\")\n\n        # Latency epilogue\n        info = {\"has_for_child\": 0, \"name\": None, \"modules\": {}}\n        for i in range(2):\n            for module in desp[\"latency\"]:\n                if desp[\"attr\"][module][\"in\"] != 0:\n                    continue\n                if \"inter\" in module or \"intra\" in module:\n                    info[\"valid\"] = True\n                    info[\"under_mark\"] = None\n                    info[\"in\"] = 0\n                else:\n                    info[\"valid\"] = False\n                    info[\"under_mark\"] = \"array\"\n                    info[\"in\"] = 0\n                module_lat = desp[\"latency\"][module]  \n                info[\"name\"] = module                \n                info[\"module_attr\"] = desp[\"attr\"][module]\n                info[\"modules\"][module] = extract_latency_expr(module_lat, info)\n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue\n            f.write(f\"\\t{module}_single_latency = \")                        \n            f.write(info[\"modules\"][module])\n            f.write(f\"\\n\")        \n        cnt = 0\n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue    \n            cnt += 1\n        if cnt == 1:\n            f.write(\"\\tlatency_epilogue = \")\n        else:\n            f.write(\"\\tlatency_epilogue = max(\")\n        is_first = True\n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue            \n            if not is_first:\n                
f.write(\", \")\n            f.write(f\"{module}_single_latency\")\n            is_first = False\n        if cnt == 1:            \n            f.write(\"\\n\\n\")\n        else:\n            f.write(\")\\n\\n\")\n\n        # Latency main\n        info = {\"has_for_child\": 0, \"name\": None, \"modules\": {}}\n        for i in range(2):\n            # Run second time to fill in the incomplete expression            \n            for module in desp[\"latency\"]:\n                module_lat = desp[\"latency\"][module]  \n                info[\"name\"] = module\n                info[\"valid\"] = True\n                info[\"under_mark\"] = None\n                info[\"in\"] = -1\n                info[\"module_attr\"] = desp[\"attr\"][module]\n                info[\"modules\"][module] = extract_latency_expr(module_lat, info)            \n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue\n            f.write(f\"\\t{module}_latency = \")                        \n            f.write(info[\"modules\"][module])\n            f.write(f\"\\n\")        \n        f.write(\"\\tlatency_main = max(\")\n        is_first = True\n        for module in info[\"modules\"]:\n            if \"inter\" in module or \"intra\" in module:\n                continue            \n            if not is_first:\n                f.write(\", \")\n            f.write(f\"{module}_latency\")\n            is_first = False\n        f.write(\")\\n\\n\")\n\n        #f.write(\"\\tprint(latency_prologue, latency_main, latency_epilogue)\\n\\n\")\n\n        f.write(\"\\tlatency = latency_prologue + latency_main + latency_epilogue\\n\\n\")\n        \n        f.write(\"\\treturn latency\\n\")\n        f.write(\"\\n\")\n\n    def print_infer_params_func(self, f, desp):\n        f.write(\"def infer_params(params):\\n\")\n        # Load parameters\n        f.write(\"\\t\")\n        is_first = True\n        for p in desp[\"params\"]:\n          
  if \"tags\" in p and \"auto_infer\" in p[\"tags\"]:\n                continue\n            if not is_first:\n                f.write(\", \")            \n            f.write(p[\"name\"])\n            is_first = False\n        f.write(\" = \")\n        is_first = True\n        for p in desp[\"params\"]:\n            if \"tags\" in p and \"auto_infer\" in p[\"tags\"]:\n                continue\n            if not is_first:\n                f.write(\", \")            \n            f.write(f'params[\\\"{p[\"name\"]}\\\"]')\n            is_first = False\n        f.write(\"\\n\\n\")\n\n        for p in desp[\"params\"]:\n            if \"tags\" in p and \"auto_infer\" in p[\"tags\"]:\n                f.write(f\"\\t{p['name']}_choices = [n*{p['bounds'][0]} for n in range(1, {p['bounds'][1]}//{p['bounds'][0]}+1) if {p['bounds'][1]}%(n*{p['bounds'][0]})==0]\\n\")\n                f.write(f\"\\tif len({p['name']}_choices) == 0:\\n\")\n                f.write(f\"\\t\\treturn None\\n\")\n                f.write(f\"\\tparams[\\\"{p['name']}\\\"] = max({p['name']}_choices)\\n\")\n        f.write(\"\\n\")                \n        f.write(\"\\treturn params\\n\\n\")\n\n    def print_random_sampling_func(self, f, desp):\n        f.write(\"def random_sampling(params):\\n\")\n        f.write(f\"\\tdef filter_non_power_of_two(x):\\n\")\n        f.write(f\"\\t\\tif np.log2(x) != int(np.log2(x)):\\n\")\n        f.write(f\"\\t\\t\\treturn True\\n\")\n        f.write(f\"\\t\\treturn False\\n\\n\")\n        # Print the task params\n        for p in self.params_config[\"external\"]:\n            f.write(f\"\\t{p} = params[\\\"{p}\\\"]\\n\")\n        f.write(\"\\twhile True:\\n\")\n        params_to_process = []\n        for param in self.params_config[\"tunable\"]:\n            params_to_process.append(self.params_config[\"tunable\"][param])\n        #while len(params_to_process) > 0:            \n        while True:\n            update = False\n            for param in 
params_to_process:\n                if \"divisors\" not in param: \n                    #print(\"first \", param[\"name\"])                   \n                    f.write(f\"\\t\\tsample = random.randint(int({param['bounds'][0]}), int({param['bounds'][1]}))\\n\")\n                    f.write(f\"\\t\\t{param['name']} = sample\\n\")\n                    f.write(f\"\\t\\tparams[\\\"{param['name']}\\\"] = sample\\n\")\n                    params_to_process.remove(param)\n                    update = True\n            if not update:\n                break\n        while len(params_to_process) > 0:            \n            for param in params_to_process:                \n                if \"divisors\" in param and param[\"divisors\"] not in params_to_process:                    \n                    #print(\"second \", param[\"name\"])\n                    if \"tags\" in param and \"power_of_two\" in param[\"tags\"]:\n                        f.write(f\"\\t\\tsample = random.sample(utils.get_divisors(int({param['bounds'][1]}), filter_non_power_of_two), 1)[-1]\\n\")\n                    else:\n                        f.write(f\"\\t\\tsample = random.sample(utils.get_divisors(int({param['bounds'][1]}), None), 1)[-1]\\n\")\n                    f.write(f\"\\t\\t{param['name']} = sample\\n\")\n                    f.write(f\"\\t\\tparams[\\\"{param['name']}\\\"] = sample\\n\")\n                    params_to_process.remove(param)\n        # Latency hiding\n        if \"PE\" not in desp[\"memory\"]:        \n            f.write(f\"\\t\\tbreak\\n\")\n        else:\n            f.write(f\"\\t\\tlatency_factors = 1\\n\")\n            for p, param in self.params_config[\"tunable\"].items():\n                if param[\"attr\"] == \"latency_tiling_factor\":\n                    f.write(f\"\\t\\tlatency_factors *= {param['name']}\\n\")\n                if param[\"attr\"] == \"SIMD_tiling_factor\":\n                    f.write(f\"\\t\\tsimd_factor = {param['name']}\\n\")\n            
data_type = desp[\"memory\"][\"PE\"][\"ele_type\"]\n            if data_type == \"float\":\n                f.write(f\"\\t\\tif latency_factors >= 8 * simd_factor:\\n\")\n                f.write(f\"\\t\\t\\tbreak\\n\")\n            else:\n                raise RuntimeError(f\"Unsupported data type in random sample generation: {data_type}\")\n        f.write(\"\\n\")                \n        f.write(\"\\treturn params\\n\\n\")        \n\n    def print_bound_check_func(self, f, desp):\n        f.write(\"def bound_check(params):\\n\")\n        f.write(f\"\\tdef filter_non_power_of_two(x):\\n\")\n        f.write(f\"\\t\\tif np.log2(x) != int(np.log2(x)):\\n\")\n        f.write(f\"\\t\\t\\treturn True\\n\")\n        f.write(f\"\\t\\treturn False\\n\\n\")\n        # Load parameters\n        f.write(\"\\t\")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(p[\"name\"])\n            is_first = False\n        f.write(\" = \")\n        is_first = True\n        for p in desp[\"params\"]:\n            if not is_first:\n                f.write(\", \")\n            f.write(f'params[\\\"{p[\"name\"]}\\\"]')\n            is_first = False\n        f.write(\"\\n\\n\")\n        for p in desp[\"params\"]:\n            if \"bounds\" in p:\n                f.write(f\"\\tif {p['name']} < {p['bounds'][0]}:\\n\")\n                f.write(f\"\\t\\treturn False\\n\")\n                f.write(f\"\\tif {p['name']} > {p['bounds'][1]}:\\n\")\n                f.write(f\"\\t\\treturn False\\n\")\n            if \"tags\" in p and \"power_of_two\" in p[\"tags\"]:\n                f.write(f\"\\tif filter_non_power_of_two({p['name']}):\\n\")\n                f.write(f\"\\t\\treturn False\\n\")\n        # Latency hiding\n        if \"PE\" in desp[\"memory\"]:\n            f.write(f\"\\tlatency_factors = 1\\n\")\n            for p, param in self.params_config[\"tunable\"].items():\n                if 
param[\"attr\"] == \"latency_tiling_factor\":\n                    f.write(f\"\\tlatency_factors *= {param['name']}\\n\")\n                if param[\"attr\"] == \"SIMD_tiling_factor\":\n                    f.write(f\"\\tsimd_factor = {param['name']}\\n\")\n            data_type = desp[\"memory\"][\"PE\"][\"ele_type\"]\n            if data_type == \"float\":\n                f.write(f\"\\tif latency_factors < 8 * simd_factor:\\n\")\n                f.write(f\"\\t\\treturn False\\n\")\n            else:\n                raise RuntimeError(f\"Unsupported data type in random sample generation: {data_type}\")\n        \n        f.write(\"\\treturn True\\n\\n\")        \n\n    def register(self, desp, py_f):\n        \"\"\" Register the design in the descriptor file\n        Generate all the necessary functions for evaluating the performance of the \n        target design.         \n        \"\"\"        \n        #print(desp[\"compute\"])        \n        with open(py_f, 'w') as f:\n            f.write(\"from math import ceil\\n\")\n            f.write(\"import numpy as np\\n\")\n            f.write(\"import random\\n\")\n            f.write(\"import utils\\n\\n\")\n\n            # Generate resource est func        \n            self.print_resource_est_func(f, desp)\n\n            # Generate latency est func\n            self.print_latency_est_func(f, desp)\n\n            # Tuning parameters\n            #self.params_config = desp[\"params\"]\n            self.params_config = {\"external\": {}, \"tunable\": {}, \"infer\": {}}\n            for param in desp[\"params\"]:\n                if param[\"tunable\"]:\n                    self.params_config[\"tunable\"][param[\"name\"]] = param\n                else:\n                    if \"external\" in param[\"tags\"]:\n                        self.params_config[\"external\"][param[\"name\"]] = param\n                    elif \"auto_infer\" in param[\"tags\"]:\n                        
self.params_config[\"infer\"][param[\"name\"]] = param\n        \n            # Generate infer parameter func\n            self.print_infer_params_func(f, desp)\n\n            # Generate the random sampling func\n            self.print_random_sampling_func(f, desp)\n\n            # Generate the bound check func\n            self.print_bound_check_func(f, desp)\n\n        sys.path.append(os.path.dirname(py_f))\n        basename = os.path.basename(py_f).split(\".\")[0]        \n        module = __import__(basename)\n        self.est_resource_func = module.est_resource\n        self.est_latency_func = module.est_latency\n        self.infer_params_func = module.infer_params\n        self.random_sampling_func = module.random_sampling\n        self.bound_check_func = module.bound_check\n        self.desp = desp\n\n    def est_latency(self, params):\n        if not self.est_latency_func:\n            raise RuntimeError(f\"Latency function for design {self.name} undefined\")\n        else:\n            return self.est_latency_func(params)\n    \n    def est_resource(self, params):\n        if not self.est_latency_func:\n            raise RuntimeError(f\"Resource function for design {self.name} undefined\")\n        else:\n            return self.est_resource_func(params)\n\n    def infer_params(self, params):\n        if not self.infer_params_func:\n            raise RuntimeError(f\"Internal parameter inference function for design {self.name} undefined\")\n        else:\n            return self.infer_params_func(params)\n\n    def random_sampling(self, params):\n        if not self.random_sampling_func:\n            raise RuntimeError(f\"Random sampling function for design {self.name} undefined\")\n        else:\n            return self.random_sampling_func(params)\n\n    def bound_check(self, params):\n        if not self.bound_check_func:\n            raise RuntimeError(f\"Bound check function for design {self.name} undefined\")\n        else:\n            return 
self.bound_check_func(params)            "
  },
  {
    "path": "autosa_scripts/tuner/main.py",
    "content": "import argparse\nfrom datetime import datetime\nimport logging\nimport numpy as np\nimport os\nimport pickle\nimport concurrent.futures\nimport json\nimport pprint\n\nfrom design import Design\nfrom constraint import Constraint\nfrom search_task import SearchTask\nimport utils\nimport tuner\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--outdir', type=str, default=\"outdir\", help=\"output directory\")\n    parser.add_argument('--db', type=str, default=\"db\", help=\"search database\")\n    parser.add_argument('--objective', type=str, default=\"latency\", help=\"optimization target\")\n    parser.add_argument('--cst', type=str, default=\"hw_cst\", help=\"hardware constraint\")\n    parser.add_argument('--stop-after-epochs', type=int, default=-1, help=\"number of epochs of the unit searching task\")\n    parser.add_argument('--stop-after-time', type=int, default=-1, help=\"time limit (in seconds) of the unit searching task\")\n    parser.add_argument('--use-db', type=int, default=1, help=\"use database\")\n    parser.add_argument('--n-thread', type=int, default=16, help=\"number of threads to use for searching\")\n    parser.add_argument('--designs', type=str, default=\"designs\", help=\"systolic array design directory\")\n    parser.add_argument('--task', type=str, default=\"mm\", help=\"search task\")\n\n    args = parser.parse_args()\n    \n    search_obj = args.objective    \n    \n    # Set up the working directory\n    now = datetime.now()\n    outdir = args.outdir\n    os.makedirs(outdir, exist_ok=True)    \n    explore_config = \"\"\n    exp_name = f\"O_{args.objective}-C_{explore_config}-T_{now.date()}-{now.time()}\"\n    outdir = f\"{outdir}/{exp_name}\"\n    os.makedirs(outdir, exist_ok=True)\n    logger = utils.init_logger(outdir)\n\n    # Load the constraints\n    cst = Constraint(f'cst/{args.cst}.json')\n\n    # Set up the stopping criteria of the search algorithm\n    # (an epoch limit takes precedence; otherwise fall back to a time limit, 60 s by default)\n    max_epochs = -1\n    max_time = -1\n    if args.stop_after_epochs > 0:\n        max_epochs = args.stop_after_epochs\n    elif args.stop_after_time > 0:\n        max_time = args.stop_after_time\n    else:\n        max_time = 60\n\n    # Set up the parallel executor    \n    # TODO\n\n    # Register designs    \n    design_dir = args.designs\n    os.makedirs(f\"{design_dir}/register\", exist_ok=True)\n    designs = []\n    for f in os.listdir(design_dir):\n        if f.endswith(\".json\"):\n            with open(f'{design_dir}/{f}', 'r') as json_f:\n                desp = json.load(json_f)\n            design = Design(f.split(\".\")[0])\n            design.register(desp, f\"{design_dir}/register/{design.name}.py\")\n            #print(design.name)\n            designs.append(design)\n    if len(designs) == 0:\n        raise RuntimeError(\"No design found\")        \n    #exit(0)\n\n    # Load task\n    with open(f'task/{args.task}.json') as f:\n        data = json.load(f)\n    tasks = []\n    for task in data[\"tasks\"]:\n        tasks.append(task)\n\n    # Start searching\n    counter = utils.PerfCounter(logger)\n    counter.init_counter(\"Total Search Time\")\n    all_records = []        \n    for task in tasks:\n        search_record = utils.SearchRecord().reset()\n        #for design in [designs[4]]:\n        for design in designs:\n            search_task = SearchTask(design, task)\n            record = tuner.genetic_search(search_task, cst, search_obj, logger, max_epochs, max_time)\n            all_records.append(record)\n            search_record.update(record)\n        task[\"search results\"] = search_record\n\n    counter.update_counter(\"Total Search Time\")\n    counter.print_counter(\"Total Search Time\")\n\n    print(all_records)\n\n    # Display and dump the search history\n    #for task in tasks:\n    #    logger.info(pprint.pformat(task, indent=4))\n    # Dump the results of every task, not only the last one\n    with open(f\"{outdir}/results.log\", 'w') as f:\n        f.write(pprint.pformat(tasks, indent=4))\n    with open(f\"{outdir}/history.log\", 'w') as f:\n        f.write(pprint.pformat(all_records, indent=4))"
  },
  {
    "path": "autosa_scripts/tuner/search_task.py",
    "content": "import json\nimport random\nimport numpy as np\nimport bisect\n#from sympy import *\n\nimport utils\n\nclass SearchTask(object):\n    def __init__(self, design, task):\n        self.design = design\n        self.task = task        \n\n    def adjust_params(self, params):\n        \"\"\" Adjust the parameters so that they satisfy their constraints.\n        \"\"\"\n        def filter_non_power_of_two(x):\n            if np.log2(x) != int(np.log2(x)):\n                return True\n            return False\n        \n        # Round every factor up to an even number so that it has more divisors\n        for p, param in self.design.params_config[\"tunable\"].items():\n            params[p] = int(np.ceil(params[p] / 2) * 2)        \n        \n        # Snap each divisor-constrained factor to the closest divisor of the parameter it depends on\n        for p, param in self.design.params_config[\"tunable\"].items():\n            #print(param)\n            if \"divisors\" in param:\n                if \"tags\" in param and \"power_of_two\" in param[\"tags\"]:\n                    choices = utils.get_divisors(params[param[\"divisors\"][0]], filter_non_power_of_two)\n                else:\n                    choices = utils.get_divisors(params[param[\"divisors\"][0]], None)\n                idx = bisect.bisect(choices, params[p])\n                if idx >= len(choices):\n                    idx -= 1\n                # Compare with the lower neighbor as well (including choices[0]) and keep the closer one\n                if idx > 0:\n                    if abs(choices[idx - 1] - params[p]) < abs(choices[idx] - params[p]):\n                        idx -= 1\n                params[p] = choices[idx]\n\n        return params\n\n    def generate_random_sample(self):\n        \"\"\" Generate a random sample in the design space.\n        \"\"\"\n        task_params = {}\n        for param in self.task[\"params\"]:\n            task_params[param] = self.task[\"params\"][param]\n        return self.design.random_sampling(task_params)        \n\n    def evaluate(self, params, metric=\"latency\"):        \n        if metric == \"latency\":\n            params = self.design.infer_params(params)                        \n            if params:\n                if not self.design.bound_check(params):\n                    return 0, None\n                latency = self.design.est_latency(params)\n                resource = self.design.est_resource(params)\n                if latency:\n                    return 1 / latency, resource\n                else:\n                    return 0, None\n            else:\n                return 0, None\n        else:                        \n            raise RuntimeError(f\"Not supported metric: {metric}\")"
  },
  {
    "path": "autosa_scripts/tuner/task/cnn.json",
    "content": "{\n  \"tasks\": [\n    {\n      \"name\": \"conv\",\n      \"params\": {\n        \"o\": 6,\n        \"i\": 1,\n        \"r\": 5,\n        \"c\": 5,\n        \"p\": 3,\n        \"q\": 3\n      }      \n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/tuner/task/mm.json",
    "content": "{\n  \"tasks\": [\n    {\n      \"name\": \"gemm1\",\n      \"params\": {\n        \"p0\": 1024,\n        \"p1\": 1024,\n        \"p2\": 1024\n      }      \n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/tuner/task/mm2.json",
    "content": "{\n  \"tasks\": [\n    {\n      \"name\": \"gemm1\",\n      \"params\": {\n        \"p0\": 1024,\n        \"p1\": 1024,\n        \"p2\": 1024\n      }      \n    },\n    {\n      \"name\": \"gemm2\",\n      \"params\": {\n        \"p0\": 512,\n        \"p1\": 512,\n        \"p2\": 512\n      }      \n    }\n  ]\n}\n"
  },
  {
    "path": "autosa_scripts/tuner/tuner.py",
    "content": "import numpy as np\n\nimport utils\nimport random\n\nclass Tuner(object):\n    def __init__(self, task, cst, obj, logger, max_epoch, max_time):\n        self.task = task\n        self.cst = cst\n        self.obj = obj\n        self.logger = logger\n        self.max_epoch = max_epoch\n        self.max_time = max_time\n        self.best_reward = 0\n        self.best_task_params = None\n        self.best_search_record = utils.SearchRecord().reset()        \n\n    def overuse_constraint(self, used_cst):\n        if not used_cst:\n            # If constraint doesn't exist, return True to exclude this design\n            return True\n\n        if used_cst['BRAM18K'] > self.cst.hw_cst['BRAM18K']:            \n            return True\n        if used_cst['DSP'] > self.cst.hw_cst['DSP']:            \n            return True\n        return False\n\nclass GeneticTuner(Tuner):\n    def __init__(self, task, cst, obj, logger, max_epoch, max_time, params):\n        super().__init__(task, cst, obj, logger, max_epoch, max_time)        \n        self.params = params\n        self.epoch = 0\n        if max_epoch > 0:\n            self.stop_criteria = \"epoch\"\n            self.max_epoch = max_epoch\n        else:\n            self.stop_criteria = \"time\"\n            self.max_time = max_time\n        self.counter = utils.PerfCounter(self.logger)\n        self.search_time = None\n        self.param_idx_map = {}\n        self.idx_param_map = {}\n\n    def select_parents(self, population, fitness, num_parents):\n        \"\"\" Select \"num_parents\" parents with the highest fitness score.\n        \"\"\"        \n        fitness_idx_sorted = np.argsort(-fitness)        \n        parents = population[fitness_idx_sorted[:num_parents]][:]\n        return parents\n\n    def crossover(self, pool, num_children):\n        \"\"\" Perform single-point crossover.\n        \"\"\"\n        children = np.empty((num_children, len(self.task.design.params_config[\"tunable\"])))\n     
   # Build the parameter dependecy chain\n        param_deps = {}\n        param_cnt = 0\n        for p, param in self.task.design.params_config[\"tunable\"].items():\n            if \"divisors\" in param:\n                param_deps[param[\"name\"]] = param[\"divisors\"][0]\n                param_cnt += 2\n        if param_cnt != len(self.task.design.params_config[\"tunable\"]):\n            raise RuntimeError(\"Not all tuning parameters can be handled by crossover\")\n        #print(param_deps)        \n        for i in range(num_children):\n            parents_idx = [i % pool.shape[0], np.random.randint(0, pool.shape[0])]\n            #print(parents_idx)\n            #print(pool[parents_idx[0]][:])\n            #print(pool[parents_idx[1]][:])\n            for param in param_deps:\n                idx = np.random.randint(0, 2)\n                #print(idx)\n                children[i][self.param_idx_map[param]] = pool[parents_idx[idx]][self.param_idx_map[param]]\n                children[i][self.param_idx_map[param_deps[param]]] = pool[parents_idx[idx]][self.param_idx_map[param_deps[param]]]\n            #print(children[i][:])\n            #exit(0)\n\n        return children\n\n    def mutation(self, pool):\n        \"\"\" Perform mutation\n        \"\"\"\n        for p_idx in range(pool.shape[0]):\n            if random.random() < self.params[\"mutation_probability\"]:\n                if random.random() < self.params[\"epsilon\"]:\n                    task_params = self.task.generate_random_sample()\n                    for i in range(pool.shape[1]):\n                        pool[p_idx][i] = task_params[self.idx_param_map[i]]\n                else:\n                    idv = pool[p_idx][:]\n                    task_params = {}                    \n                    for p, param in self.task.design.params_config[\"tunable\"].items():                \n                        task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]\n                
    for p, param in self.task.design.params_config[\"external\"].items():\n                        task_params[param[\"name\"]] = self.task.task[\"params\"][param[\"name\"]]\n                    # Build the chains\n                    # [{\"params\": [p0, p3, p7], \"factors\": [ceil(p0/p3), p3/p7, p7]}, {}]\n                    split_chains = []\n                    for p, param in self.task.design.params_config[\"external\"].items():\n                        chain = {\"params\": [param[\"name\"]], \"factors\": []}\n                        cur_param = param                                                \n                        while \"split_by\" in cur_param:\n                            #print(self.task.design.params_config[\"tunable\"][cur_param[\"split_by\"]])\n                            if \"divisors\" in self.task.design.params_config[\"tunable\"][cur_param[\"split_by\"]] \\\n                                and cur_param[\"name\"] in self.task.design.params_config[\"tunable\"][cur_param[\"split_by\"]][\"divisors\"]:\n                                div = 1\n                            else:\n                                div = 0\n                            chain[\"params\"].append(cur_param[\"split_by\"])\n                            if div:\n                                factor = np.ceil(task_params[cur_param[\"name\"]] / task_params[cur_param[\"split_by\"]])\n                            else:\n                                factor = task_params[cur_param[\"name\"]] / task_params[cur_param[\"split_by\"]]                            \n                            chain[\"factors\"].append(int(factor))                            \n                            cur_param = self.task.design.params_config[\"tunable\"][cur_param[\"split_by\"]]                        \n                        chain[\"factors\"].append(int(task_params[cur_param[\"name\"]]))\n                        split_chains.append(chain)\n                    \n                    # 
Mutation\n                    for chain in split_chains:\n                        if len(chain[\"factors\"]) <= 1:\n                            continue\n                        src_idx, dst_idx = random.sample(range(0, len(chain[\"factors\"])), 2)\n                        mutation_policy_probs = [0.2, 0, 0.8]\n                        mutation_policy_probs = np.cumsum(mutation_policy_probs)\n                        if random.random() < mutation_policy_probs[0]:\n                            if chain[\"factors\"][dst_idx] == 1:\n                                continue\n                            inc_stride = max(1, int(chain[\"factors\"][src_idx] * random.random() * 1.0))\n                            dec_stride = max(1, int(chain[\"factors\"][dst_idx] - chain[\"factors\"][src_idx] * chain[\"factors\"][dst_idx] / (chain[\"factors\"][src_idx] + inc_stride)))\n                            chain[\"factors\"][src_idx] += inc_stride                        \n                            chain[\"factors\"][dst_idx] -= dec_stride\n                            chain[\"factors\"][dst_idx] = max(1, chain[\"factors\"][dst_idx])          \n                        elif random.random() < mutation_policy_probs[1]:\n                            pass\n                        else:\n                            factor = chain[\"factors\"][src_idx]\n                            if factor == 1:\n                                continue\n                            divs = utils.factorization(factor)\n                            div = random.choice(divs)\n                            chain[\"factors\"][src_idx] /= div\n                            chain[\"factors\"][dst_idx] *= div\n\n                    # Revert to the params\n                    # [{\"params\": [p0, p3, p7], \"factors\": [ceil(p0/p3), p3/p7, p7]}, {}]\n                    for chain in split_chains:\n                        factor = chain[\"factors\"][-1]\n                        param = chain[\"params\"][-1]                     
   \n                        if param in self.param_idx_map:\n                            pool[p_idx][self.param_idx_map[param]] = factor\n                        for idx in range(len(chain[\"factors\"]) - 2, -1, -1):\n                            param = chain[\"params\"][idx]\n                            factor *= chain[\"factors\"][idx]\n                            if param in self.param_idx_map:\n                                pool[p_idx][self.param_idx_map[param]] = factor\n        \n        return pool             \n\n    def search(self):\n        \"\"\" Search the design space using genetic algorithms.\n\n        The algorithm is configured by several parameters.\n        @ population_size: the number of trial solutions in each epoch.\n        @ mutation_probability: the chance of each gene in each individual solution\n        to be replaced by a random value.\n        @ crossover_probability: the chance of an existed solution to pass its genome\n        to new trial solutions.\n        @ parents_ratio: the ratio of population filled by the members of the previous\n        generation.\n        \"\"\"     \n        self.counter.init_counter('Search Time')   \n        if self.stop_criteria == \"time\":\n            self.counter.init_counter('time')\n\n        # Init the stats\n        num_pop = int(self.params[\"population_size\"])\n        num_gen = int(self.max_epoch // num_pop)        \n        num_parents = int(num_pop * self.params[\"parents_ratio\"])\n        self.logger.info(f'Number of generations: {num_gen}')\n        self.logger.info(f'Number of population: {num_pop}')\n        self.logger.info(f'Number of parents: {num_parents}')\n\n        # Init the population\n        population = np.empty((num_pop, len(self.task.design.params_config[\"tunable\"])), dtype=int)\n        if \"ancestor\" in self.params and self.params[\"ancestor\"] != None:\n            pass\n        else:\n            # Initialize the population randomly\n            pop_cnt = 0\n 
           while pop_cnt < num_pop:                \n                task_params = self.task.generate_random_sample()\n                param_arr = []\n                for p, param in self.task.design.params_config[\"tunable\"].items():                    \n                    param_arr.append(task_params[param[\"name\"]])\n                population[pop_cnt] = np.array(param_arr, dtype=int)\n                pop_cnt += 1                \n        idx = 0\n        for p, param in self.task.design.params_config[\"tunable\"].items():\n            self.param_idx_map[param[\"name\"]] = idx\n            self.idx_param_map[idx] = param[\"name\"]\n            idx += 1\n\n        fitness = np.empty(num_pop, dtype=float)\n        for i in range(num_pop):\n            idv = population[i]\n            task_params = {}\n            for p, param in self.task.design.params_config[\"tunable\"].items():\n                task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]                    \n            for p, param in self.task.design.params_config[\"external\"].items():\n                task_params[param[\"name\"]] = self.task.task[\"params\"][param[\"name\"]]\n            reward, used_constraint = self.task.evaluate(task_params, self.obj)\n            if self.overuse_constraint(used_constraint):                \n                reward = 0\n            fitness[i] = reward\n\n        while True:\n            # Select the parents\n            parents = self.select_parents(population, fitness, num_parents)\n            # Crossover\n            children = self.crossover(parents, num_pop - num_parents)\n            # Mutation            \n            children = self.mutation(children) \n            # Compose the new generation\n            population[0:parents.shape[0], :] = parents\n            population[parents.shape[0]:, :] = children      \n            # Update the fitness\n            for i in range(num_pop):\n                idv = population[i]\n               
 task_params = {}                \n                for p, param in self.task.design.params_config[\"tunable\"].items():\n                    task_params[param[\"name\"]] = idv[self.param_idx_map[param[\"name\"]]]                    \n                for p, param in self.task.design.params_config[\"external\"].items():\n                    task_params[param[\"name\"]] = self.task.task[\"params\"][param[\"name\"]]\n                #print(task_params)\n                task_params = self.task.adjust_params(task_params)\n                #if task_params[\"p3\"] % task_params[\"p7\"] != 0:\n                #    print(task_params)\n                #    exit(0)\n                #print(task_params)                \n                reward, used_constraint = self.task.evaluate(task_params, self.obj)\n                if self.overuse_constraint(used_constraint):                \n                    reward = 0\n                fitness[i] = reward\n                # Update the record\n                if reward > self.best_reward:\n                    self.best_reward = reward\n                    self.best_cst = used_constraint\n                    self.best_task_params = task_params\n                    self.logger.info(f'Epoch {self.epoch}: new best reward: {self.best_reward} ({1/self.best_reward:.0f})')\n                    self.best_search_record = utils.SearchRecord().extract_from_tuner(self)\n            #exit(0)\n            # self.epoch counts evaluated individuals, so it grows by num_pop per generation\n            self.epoch += num_pop\n            if self.stop_criteria == \"epoch\" and self.epoch > self.max_epoch:\n                break\n            if self.stop_criteria == \"time\":\n                self.counter.update_counter('time')\n                if self.counter.get_counter('time') > self.max_time:\n                    break\n\n        self.counter.update_counter('Search Time')   \n        self.search_time = self.counter.get_counter('Search Time')\n        return\n\ndef genetic_search(task, cst, obj, logger, max_epochs, max_time):\n    tuner_params = {\n        \"population_size\": 200,\n        \"mutation_probability\": 0.5,\n        \"parents_ratio\": 0.3,\n        \"epsilon\": 0.1,\n        \"ancestor\": None\n    }\n\n    tuner = GeneticTuner(task, cst, obj, logger, max_epochs, max_time, tuner_params)\n    tuner.search()\n    search_record = utils.SearchRecord().extract_from_tuner(tuner)    \n\n    return search_record"
  },
  {
    "path": "autosa_scripts/tuner/unit_test.py",
    "content": "import argparse\nfrom datetime import datetime\nimport logging\nimport numpy as np\nimport os\nimport pickle\nimport concurrent.futures\nimport json\nimport pprint\n\nfrom design import Design\nfrom constraint import Constraint\nfrom search_task import SearchTask\nimport utils\nimport tuner\n\nif __name__ == \"__main__\":\n    cst = Constraint(f'cst/hw_cst.json')\n    max_epochs = -1\n    max_time = 20\n    search_obj = \"latency\"\n\n    # Set up the working directory\n    now = datetime.now()\n    outdir = \"outdir\"\n    os.makedirs(outdir, exist_ok=True)    \n    explore_config = \"\"\n    exp_name = f\"O_{search_obj}-C_{explore_config}-T_{now.date()}-{now.time()}\"\n    outdir = f\"{outdir}/{exp_name}\"\n    os.makedirs(outdir, exist_ok=True)\n    logger = utils.init_logger(outdir)\n\n    design_dir = \"/curr/jaywang/research/autosa/AutoSA/autosa.tmp/output/tuning\"\n    designs = []\n    for f in os.listdir(design_dir):\n        if f.endswith(\".json\"):\n            with open(f'{design_dir}/{f}', 'r') as json_f:\n                desp = json.load(json_f)\n            design = Design(f.split(\".\")[0])\n            design.register(desp, f\"{design_dir}/register/{design.name}.py\")\n            designs.append(design)\n    if len(designs) == 0:\n        raise RuntimeError(\"No design found\")\n\n    # Load task    \n    with open(f'task/mm.json') as f:\n        data = json.load(f)\n    tasks = []\n    for task in data[\"tasks\"]:\n        tasks.append(task)\n\n    # Start searching\n    for task in tasks:\n        search_record = utils.SearchRecord().reset()\n        #for design in designs:\n        for design in [designs[0]]:\n            print(design.name)\n            search_task = SearchTask(design , task)\n            #task_params = {\n            #    \"p0\": 1024, \"p1\": 1024, \"p2\": 1024,\n            #    \"p3\": 206, \"p4\": 172, \"p5\": 8,\n            #    \"p6\": 86, \"p7\": 2, \"p8\": 8\n            #}\n            task_params = 
{\n                \"p0\": 1024, \"p1\": 1024, \"p2\": 1024,\n                \"p3\": 342, \"p4\": 56, \"p5\": 148,\n                \"p6\": 19, \"p7\": 2, \"p8\": 8\n            }\n            # i j k \n            # i k j\n            # i j k \n            reward, resource = search_task.evaluate(task_params)\n            print(1/reward)\n            print(resource)\n            #search_record.update(tuner.genetic_search(search_task, cst, search_obj, logger, max_epochs, max_time))\n        #task[\"search results\"] = search_record\n\n    #for task in tasks:\n    #    logger.info(pprint.pformat(task, indent=4))"
  },
  {
    "path": "autosa_scripts/tuner/utils.py",
    "content": "import time\nimport functools\nimport math\nimport logging\nimport itertools\nfrom datetime import datetime\nfrom subprocess import Popen, PIPE\nimport json\nimport pprint\nimport concurrent.futures\nimport queue\n\ndef factorization(x):\n    if x == 0:\n        raise RuntimeError(f\"Factorization of 0\")\n    prime_factors = []\n    while x % 2 == 0:\n        prime_factors.append(2)\n        x = x / 2\n    \n    for i in range(3, int(math.sqrt(x)) + 1, 2):\n        while x % i == 0:\n            prime_factors.append(int(i))\n            x = x / i\n    \n    if x > 2:\n        prime_factors.append(int(x))\n\n    return prime_factors\n\ndef get_divisors(x, filter=None):\n    \"\"\" Return the divisors of the integer x\n    Call the filter function to filter out the illegal one.\n    \"\"\"\n    divisors = []\n    large_divisors = []\n    for i in range(1, int(math.sqrt(x) + 1)):\n        if x % i == 0:\n            if (filter and not filter(i)) or not filter:\n                divisors.append(int(i))\n            if i * i != x:\n                if (filter and not filter(int(x / i))) or not filter:\n                    large_divisors.append(int(x / i))\n    for d in reversed(large_divisors):\n        divisors.append(d)\n\n    return divisors\n\nclass PerfCounter(object):\n    def __init__(self, logger):\n        self.logger = logger\n        self.counters = {}\n    \n    def init_counter(self, name):        \n        self.counters[name] = {'start': time.perf_counter(), 'elapsed': 0}\n        \n    def update_counter(self, name):\n        if name not in self.counters:\n            raise RuntimeError(f\"Counter {name} is not defined\")\n        now = time.perf_counter()\n        self.counters[name]['elapsed'] += (now - self.counters[name]['start'])\n        self.counters[name]['start'] = now\n\n    def get_counter(self, name):\n        if name not in self.counters:\n            raise RuntimeError(f\"Counter {name} is not defined\")\n        return 
self.counters[name]['elapsed']\n\n    def print_counter(self, name):\n        if name not in self.counters:\n            raise RuntimeError(f\"Counter {name} is not defined\")\n        self.logger.info(f'[Event: {name}] Total elapsed time: {self.counters[name][\"elapsed\"]:.4f} s')\n\n    def print_counters(self):\n        for name in self.counters:\n            self.logger.info(f'[Event: {name}] Total elapsed time: {self.counters[name][\"elapsed\"]:.4f} s')\n\ndef init_logger(outdir):\t\n    logger = logging.getLogger('AutoSA-Tuner')\n    # If there is already any handlers, remove them\t\n    for handler in logger.handlers[:]:\n        handler.close()\n        logger.removeHandler(handler)\n    formatter = logging.Formatter(\n                '[%(name)s %(asctime)s] %(levelname)s: %(message)s',\n                '%Y-%m-%d %H:%M:%S')\n    logger.setLevel(logging.INFO)\n    s_handler = logging.StreamHandler()    \t\n    f_handler = logging.FileHandler(f'{outdir}/tuning.log', 'a')\n    s_handler.setLevel(level=logging.INFO)\n    f_handler.setLevel(level=logging.INFO)    \n    s_handler.setFormatter(formatter)\n    f_handler.setFormatter(formatter)\n    logger.addHandler(s_handler)\n    logger.addHandler(f_handler)\n    \n    return logger       \n\nclass SearchRecord(object):\n    def __init__(self, max=1):\n        self.cst = None\n        self.max = max\n        if self.max == 1:\n            self.reward = 0\n        else:\n            self.reward = float(\"inf\")\n        self.latency = 0\n        self.dsp_eff = 0\n        self.design = -1\n        self.ops = 0\n        self.task_params = {}\n        self.task_name = None\n        self.metric = None\n        self.tuning_params = {}\n\n    def reset(self):\n        self.cst = None        \n        if self.max == 1:\n            self.reward = 0\n        else:\n            self.reward = float(\"inf\")\n        self.latency = 0\n        self.dsp_eff = 0\n        self.design = -1\n        self.ops = 0\n        
self.task_params = {}\n        self.task_name = None\n        self.metric = None        \n\n        return self\n\n    def update(self, new_record):        \n        if self.max != new_record.max:\n            raise RuntimeError(\"Inconsistent search record configuration\")\n        status = False\n        if self.max == 1:\n            if new_record.reward > self.reward:\t\t\t\t\n                status = True\n        else:\n            if new_record.reward < self.reward:\n                status = True\n        if status:\n            self.cst = new_record.cst\n            self.reward = new_record.reward\n            self.latency = new_record.latency\n            self.dsp_eff = new_record.dsp_eff\n            self.design = new_record.design            \n            self.ops = new_record.ops\n            self.task_params = new_record.task_params\n            self.task_name = new_record.task_name            \n\n    def extract_from_tuner(self, tuner):\n        if tuner.best_task_params:\n            self.cst = tuner.best_cst\n            self.reward = tuner.best_reward\n            if tuner.obj == \"latency\":\n                self.latency = 1 / self.reward\n            else:\n                raise RuntimeError(\"Unsupported search objective\")\n            self.design = tuner.task.design.name\n            self.task_params = tuner.best_task_params\n            self.task_name = tuner.task.task[\"name\"]            \n\n        return self\n\n    def __repr__(self):\n        to_print = \"\"\n        to_print += f\"\\nreward: {self.reward}\"\n        to_print += f\"\\ncst: {pprint.pformat(self.cst, indent=4)}\"\n        to_print += f\"\\nlatency: {self.latency}\"\n        to_print += f\"\\ndesign: {self.design}\"\n        to_print += f\"\\ntask_name: {self.task_name}\"\n        to_print += f\"\\ntask_params: \\n{pprint.pformat(self.task_params, indent=4)}\"\n        to_print += \"\\n\"\n\n        return to_print"
  },
  {
    "path": "autosa_scripts/tuning_scripts/cnn.sh",
    "content": "#!/bin/bash\n\ncd ../../\n# <[i,r,c],o>\n# <[o,r,c],i>\n# <[o,i],[r,c]>\n\nfor loop_order in 1\ndo\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[0]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --explore-loop-permute --loop-permute-order=$loop_order\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[1]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --explore-loop-permute --loop-permute-order=$loop_order\n    ./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[2]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --select-rar-dep=\"{kernel[]->__pet_ref_3[1]}\" --explore-loop-permute --loop-permute-order=$loop_order    \n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --local-reduce --reduce-op=\"+\" --simd-touch-space --explore-loop-permute --loop-permute-order=$loop_order\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[4]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 
--param-names=./autosa_tests/cnn/param_names.json --explore-loop-permute --loop-permute-order=$loop_order\n    ./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[5]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --select-rar-dep=\"{kernel[]->__pet_ref_3[1]}\" --explore-loop-permute --loop-permute-order=$loop_order\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[6]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --local-reduce --reduce-op=\"+\" --simd-touch-space --explore-loop-permute --loop-permute-order=$loop_order\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[7]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --explore-loop-permute --loop-permute-order=$loop_order\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[8]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --local-reduce --reduce-op=\"+\" --simd-touch-space --explore-loop-permute --loop-permute-order=$loop_order\n    ./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[9]}\" --simd-info=./autosa_tests/cnn/simd_info.json 
--host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --select-rar-dep=\"{kernel[]->__pet_ref_3[1]}\" --local-reduce --reduce-op=\"+\" --simd-touch-space --explore-loop-permute --loop-permute-order=$loop_order\ndone\n\nfor loop_order in 0 2\ndo\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[0]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --explore-loop-permute --loop-permute-order=$loop_order\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[1]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --explore-loop-permute --loop-permute-order=$loop_order\n    ./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[2]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --select-rar-dep=\"{kernel[]->__pet_ref_3[1]}\" --explore-loop-permute --loop-permute-order=$loop_order\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --simd-touch-space --explore-loop-permute --loop-permute-order=$loop_order\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output 
--sa-sizes=\"{kernel[]->space_time[4]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --explore-loop-permute --loop-permute-order=$loop_order\n    ./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[5]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --select-rar-dep=\"{kernel[]->__pet_ref_3[1]}\" --explore-loop-permute --loop-permute-order=$loop_order\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[6]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --simd-touch-space --explore-loop-permute --loop-permute-order=$loop_order\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[7]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --explore-loop-permute --loop-permute-order=$loop_order\n    #./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[8]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --simd-touch-space --explore-loop-permute --loop-permute-order=$loop_order\n    ./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output 
--sa-sizes=\"{kernel[]->space_time[9]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/cnn/param_names.json --simd-touch-space --explore-loop-permute --select-rar-dep=\"{kernel[]->__pet_ref_3[1]}\" --loop-permute-order=$loop_order\ndone\ncd -\n"
  },
  {
    "path": "autosa_scripts/tuning_scripts/gemm.sh",
    "content": "#!/bin/bash\n\ncd ../../\n# <[i,j],k>\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[0]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --explore-loop-permute --loop-permute-order=2\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[1]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --explore-loop-permute --loop-permute-order=2\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[2]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --local-reduce --reduce-op=\"+\" --simd-touch-space --explore-loop-permute --loop-permute-order=2\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --explore-loop-permute --loop-permute-order=2\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[4]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --local-reduce --reduce-op=\"+\" --simd-touch-space --explore-loop-permute --loop-permute-order=2\n./autosa ./autosa_tests/mm/kernel.c 
--config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[5]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --local-reduce --reduce-op=\"+\" --simd-touch-space --explore-loop-permute --loop-permute-order=2\n\n# <[i,k],j>\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[0]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --explore-loop-permute --loop-permute-order=0\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[1]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --explore-loop-permute --loop-permute-order=0\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[2]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --simd-touch-space --explore-loop-permute --loop-permute-order=0\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --explore-loop-permute --loop-permute-order=0\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output 
--sa-sizes=\"{kernel[]->space_time[4]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --simd-touch-space --explore-loop-permute --loop-permute-order=0\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[5]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --simd-touch-space --explore-loop-permute --loop-permute-order=0\n\n# <[k,j],i>\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[0]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --explore-loop-permute --loop-permute-order=1\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[1]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --explore-loop-permute --loop-permute-order=1\n#./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[2]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --local-reduce --reduce-op=\"+\" --simd-touch-space --explore-loop-permute --loop-permute-order=1\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[2]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize 
--hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --simd-touch-space --explore-loop-permute --loop-permute-order=1\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --explore-loop-permute --loop-permute-order=1\n#./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[4]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --local-reduce --reduce-op=\"+\" --simd-touch-space --explore-loop-permute --loop-permute-order=1\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[4]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --simd-touch-space --explore-loop-permute --loop-permute-order=1\n#./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[5]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 --param-names=./autosa_tests/mm/param_names.json --local-reduce --reduce-op=\"+\" --simd-touch-space --explore-loop-permute --loop-permute-order=1\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[5]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --tuning-method=1 
--param-names=./autosa_tests/mm/param_names.json --simd-touch-space --explore-loop-permute --loop-permute-order=1\ncd -"
  },
  {
    "path": "autosa_scripts/tuning_scripts/model_validate.sh",
    "content": "# Dataflow [i] Permutation <[i,j],k>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[0];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls\n\n# Dataflow [j] Permutation <[i,j],k>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[1];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls\n\n# Dataflow [k] Permutation <[i,j],k>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[2];kernel[]->array_part[4,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--local-reduce \\\n--reduce-op=\"+\" \\\n--simd-touch-space \\\n--array-contraction\n\n# Dataflow [i,j] Permutation <[i,j],k>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls\n\n# Dataflow [i,k] Permutation <[i,j],k>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[32,4,32];kernel[]->latency[16,16];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json 
\\\n--host-serialize \\\n--hls \\\n--local-reduce \\\n--reduce-op=\"+\" \\\n--simd-touch-space \\\n--array-contraction\n\n# Dataflow [j,k] Permutation <[i,j],k>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[5];kernel[]->array_part[32,4,32];kernel[]->latency[16,16];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--local-reduce \\\n--reduce-op=\"+\" \\\n--simd-touch-space \\\n--array-contraction\n\n#####################################################\n\n# Dataflow [i] Permutation <[i,k],j>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[0];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--explore-loop-permute \\\n--loop-permute-order=0\n\n# Dataflow [j] Permutation <[i,k],j>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[1];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--explore-loop-permute \\\n--loop-permute-order=0\n\n# Dataflow [k] Permutation <[i,k],j>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[2];kernel[]->array_part[4,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--local-reduce \\\n--reduce-op=\"+\" \\\n--simd-touch-space 
\\\n--array-contraction \\\n--explore-loop-permute \\\n--loop-permute-order=0\n\n# Dataflow [i,j] Permutation <[i,k],j>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--explore-loop-permute \\\n--loop-permute-order=0\n\n# Dataflow [i,k] Permutation <[i,k],j>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[32,4,32];kernel[]->latency[16,16];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--local-reduce \\\n--reduce-op=\"+\" \\\n--simd-touch-space \\\n--array-contraction \\\n--explore-loop-permute \\\n--loop-permute-order=0\n\n# Dataflow [j,k] Permutation <[i,k],j>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[5];kernel[]->array_part[32,4,32];kernel[]->latency[16,16];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--local-reduce \\\n--reduce-op=\"+\" \\\n--simd-touch-space \\\n--array-contraction \\\n--explore-loop-permute \\\n--loop-permute-order=0\n\n#####################################################\n\n# Dataflow [i] Permutation <[k,j],i>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[0];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json 
\\\n--host-serialize \\\n--hls \\\n--explore-loop-permute \\\n--loop-permute-order=1\n\n# Dataflow [j] Permutation <[k,j],i>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[1];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--explore-loop-permute \\\n--loop-permute-order=1\n\n# Dataflow [k] Permutation <[k,j],i>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[2];kernel[]->array_part[4,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--local-reduce \\\n--reduce-op=\"+\" \\\n--simd-touch-space \\\n--array-contraction \\\n--explore-loop-permute \\\n--loop-permute-order=1\n\n# Dataflow [i,j] Permutation <[k,j],i>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--explore-loop-permute \\\n--loop-permute-order=1\n\n# Dataflow [i,k] Permutation <[k,j],i>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[32,4,32];kernel[]->latency[16,16];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--local-reduce \\\n--reduce-op=\"+\" \\\n--simd-touch-space \\\n--array-contraction \\\n--explore-loop-permute 
\\\n--loop-permute-order=1\n\n# Dataflow [j,k] Permutation <[k,j],i>\n./autosa ./autosa_tests/mm/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[5];kernel[]->array_part[32,4,32];kernel[]->latency[16,16];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm/simd_info.json \\\n--host-serialize \\\n--hls \\\n--local-reduce \\\n--reduce-op=\"+\" \\\n--simd-touch-space \\\n--array-contraction \\\n--explore-loop-permute \\\n--loop-permute-order=1"
  },
  {
    "path": "autosa_scripts/vitis_scripts/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all clean check\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_scripts/vitis_scripts/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1]\nsp=kernel0_1.C:DDR[2]\n"
  },
  {
    "path": "autosa_tests/cnn/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all clean check\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/cnn/README.md",
    "content": "# Convolutional Neural Network (Single Layer, Small)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/cnn/kernel.c\nautosa_tests/cnn/kernel.h\nautosa_tests/cnn/simd_info.json\nautosa_tests/cnn/Makefile\nautosa_tests/cnn/connectivity.cfg\n```\n\n__Command__:\n```\n./autosa ./autosa_tests/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[8,8,4,8];kernel[]->latency[4,2,4];kernel[]->simd[1,1,1,2]}\" --simd-info=./autosa_tests/cnn/simd_info.json --host-serialize\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/cnn/Makefile autosa.tmp/output/\ncp autosa_tests/cnn/connectivity.cfg autosa.tmp/output/\n```\n\nRun the Makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\n```\n"
  },
  {
    "path": "autosa_tests/cnn/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.cin:DDR[0]\nsp=kernel0_1.w:DDR[1]\nsp=kernel0_1.cout:DDR[2]\n"
  },
  {
    "path": "autosa_tests/cnn/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/cnn/kernel.c",
    "content": "#include \"kernel.h\"\n\nint main(int argc, char **argv){\n  data_t cin[R + K - 1][C + K - 1][I];\n  data_t w[O][K][K][I];\n  data_t cout[R][C][O];\n  data_t cout_golden[R][C][O];\n\n  // data initialization\n  for (int i = 0 ; i < I; i++)\n    for (int r = 0; r < R + K - 1; r++)\n      for (int c = 0; c < C + K - 1; c++) {\n        cin[r][c][i] = i;\n      }\n\n  for (int o = 0; o < O; o++)\n    for (int i = 0; i < I; i++) \n      for (int p = 0; p < K; p++)\n        for (int q = 0; q < K; q++) {\n          w[o][p][q][i] = o;\n        }\n \n#pragma scop\n  for (int o = 0; o < O; o++)\n    for (int r = 0; r < R; r++)\n      for (int c = 0; c < C; c++) {\n        //cout[r][c][o] = 0;\n        for (int i = 0; i < I; i++)\n          for (int p = 0; p < K; p++)\n            for (int q = 0; q < K; q++) {\n              cout[r][c][o] = cout[r][c][o] + cin[r + p][c + q][i] * w[o][p][q][i];\n            }\n      }\n#pragma endscop  \n \n  for (int o = 0; o < O; o++)\n    for (int r = 0; r < R; r++)\n      for (int c = 0; c < C; c++) {\n        cout_golden[r][c][o] = 0;\n        for (int i = 0; i < I; i++)\n          for (int p = 0; p < K; p++)\n            for (int q = 0; q < K; q++) {\n              cout_golden[r][c][o] = cout_golden[r][c][o] + cin[r + p][c + q][i] * w[o][p][q][i];\n            }\n      }\n\n  int err = 0;\n  float thres = 0.001;\n  for (int o = 0; o < O; o++)\n    for (int r = 0; r < R; r++)\n      for (int c = 0; c < C; c++) {\n        if (fabs((float)cout_golden[r][c][o] - (float)cout[r][c][o]) > thres) {\n          err++;\n        }\n      }\n\n  //if (err) {\n  //  printf(\"Test failed with %d errors!\\n\", err);\n  //  return -1;\n  //} else {\n  //  printf(\"Test passed!\\n\");\n  //  return 0;\n  //}\n}\n"
  },
  {
    "path": "autosa_tests/cnn/kernel.h",
    "content": "#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"math.h\"\n\ntypedef float data_t;\n#define O 16\n#define I 16\n#define R 16\n#define C 16\n#define K 3\n\n//#define O 6\n//#define I 1\n//#define R 5\n//#define C 5\n//#define K 3\n"
  },
  {
    "path": "autosa_tests/cnn/param_names.json",
    "content": "{\n  \"kernel0\": [\"q\", \"p\", \"o\", \"r\", \"c\", \"i\"],\n  \"kernel1\": [\"q\", \"p\", \"o\", \"r\", \"c\", \"i\"],\n  \"kernel2\": [\"q\", \"p\", \"o\", \"r\", \"c\", \"i\"],\n  \"kernel3\": [\"q\", \"p\", \"o\", \"r\", \"c\", \"i\"],\n  \"kernel4\": [\"q\", \"p\", \"o\", \"r\", \"c\", \"i\"],\n  \"kernel5\": [\"q\", \"p\", \"o\", \"r\", \"c\", \"i\"],\n  \"kernel6\": [\"q\", \"p\", \"o\", \"r\", \"c\", \"i\"],\n  \"kernel7\": [\"q\", \"p\", \"o\", \"r\", \"c\", \"i\"],\n  \"kernel8\": [\"q\", \"p\", \"o\", \"r\", \"c\", \"i\"],\n  \"kernel9\": [\"q\", \"p\", \"o\", \"r\", \"c\", \"i\"]\n}\n"
  },
  {
    "path": "autosa_tests/cnn/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  },\n  \"kernel2\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  },\n  \"kernel3\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  },\n  \"kernel4\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  },\n  \"kernel5\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  },\n  \"kernel6\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  },\n  \"kernel7\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  },\n  \"kernel8\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  },\n  \"kernel9\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/dnn_ops/dc_simd_info.json",
    "content": "{\n  \"kernel4\": {\n    \"reduction\": [\"y\", \"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/dnn_ops/fc_simd_info.json",
    "content": "{\n  \"kernel2\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/dnn_ops/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/dnn_ops/kernel.c",
    "content": "// In this example, we compile three different operators that are found often in \n// DNNs, including: point-wise conv, depth-wise conv, and FC.\n\n#include \"kernel.h\"\n\nint main(int argc, char **argv){\n#ifdef PC\t\n  // Point-wise CONV\n  data_t pc_cin[PC_R + PC_K - 1][PC_C + PC_K - 1][PC_I];\n  data_t pc_w[PC_O][PC_K][PC_K][PC_I];\n  data_t pc_cout[PC_R][PC_C][PC_O];\n  data_t pc_cout_golden[PC_R][PC_C][PC_O];\n\n  for (int i = 0; i < PC_I; i++)\n    for (int r = 0; r < PC_R + PC_K - 1; r++)\n      for (int c = 0; c < PC_C + PC_K - 1; c++) {\n        pc_cin[r][c][i] = i;\n      }\n\n\tfor (int o = 0; o < PC_O; o++)\n\t\tfor (int i = 0; i < PC_I; i++)\n\t\t\tfor (int p = 0; p < PC_K; p++)\n\t\t\t\tfor (int q = 0; q < PC_K; q++) {\n\t\t\t\t\tpc_w[o][p][q][i] = o;\n\t\t\t\t}\n\n#pragma scop\n  for (int o = 0; o < PC_O; o++)\n    for (int r = 0; r < PC_R; r++)\n      for (int c = 0; c < PC_C; c++) {\n        pc_cout[r][c][o] = 0;\n        for (int i = 0; i < PC_I; i++)\n          for (int p = 0; p < PC_K; p++)\n            for (int q = 0; q < PC_K; q++) {\n              pc_cout[r][c][o] = pc_cout[r][c][o] + pc_cin[r + p][c + q][i] * pc_w[o][p][q][i];\n            }\n      }\t\n#pragma endscop\n\n  for (int o = 0; o < PC_O; o++)\n    for (int r = 0; r < PC_R; r++)\n      for (int c = 0; c < PC_C; c++) {\n        pc_cout_golden[r][c][o] = 0;\n        for (int i = 0; i < PC_I; i++)\n          for (int p = 0; p < PC_K; p++)\n            for (int q = 0; q < PC_K; q++) {\n              pc_cout_golden[r][c][o] = pc_cout_golden[r][c][o] + pc_cin[r + p][c + q][i] * pc_w[o][p][q][i];\n            }\n      }\n\n  int err = 0;\n  float thres = 0.001;\n  for (int o = 0; o < PC_O; o++)\n    for (int r = 0; r < PC_R; r++)\n      for (int c = 0; c < PC_C; c++) {\n        if (fabs((float)pc_cout_golden[r][c][o] - (float)pc_cout[r][c][o]) > thres) {\n          err++;\n        }\n      }\n\n  if (err) {\n    printf(\"Test failed with %d errors!\\n\", err);\n    
return -1;\n  } else {\n    printf(\"Test passed!\\n\");\n    return 0;\n  }\n#endif\n\n#ifdef DC\n  // Depth-wise CONV\n  data_t dc_cin[DC_R + DC_K - 1][DC_C + DC_K - 1][DC_I];\n  data_t dc_w[DC_K][DC_K][DC_I];\n  data_t dc_cout[DC_R][DC_C][DC_O];\n  data_t dc_cout_golden[DC_R][DC_C][DC_O];\n\n  for (int i = 0; i < DC_I; i++)\n    for (int r = 0; r < DC_R + DC_K - 1; r++)\n      for (int c = 0; c < DC_C + DC_K - 1; c++) {\n        dc_cin[r][c][i] = i;\n      }\n\t\n\tfor (int i = 0; i < DC_I; i++)\n\t\tfor (int p = 0; p < DC_K; p++)\n\t\t\tfor (int q = 0; q < DC_K; q++) {\n\t\t\t\tdc_w[p][q][i] = i;\n\t\t\t}\n\n#pragma scop\n  for (int o = 0; o < DC_O; o++)\n    for (int r = 0; r < DC_R; r++)\n      for (int c = 0; c < DC_C; c++) {\n        dc_cout[r][c][o] = 0;        \n        for (int p = 0; p < DC_K; p++)\n          for (int q = 0; q < DC_K; q++) {\n            dc_cout[r][c][o] = dc_cout[r][c][o] + dc_cin[r + p][c + q][o] * dc_w[p][q][o];\n          }\n      }\t\n#pragma endscop\n\n  for (int o = 0; o < DC_O; o++)\n    for (int r = 0; r < DC_R; r++)\n      for (int c = 0; c < DC_C; c++) {\n        dc_cout_golden[r][c][o] = 0;        \n        for (int p = 0; p < DC_K; p++)\n          for (int q = 0; q < DC_K; q++) {\n            dc_cout_golden[r][c][o] = dc_cout_golden[r][c][o] + dc_cin[r + p][c + q][o] * dc_w[p][q][o];\n          }\n      }\t\n\n  int err = 0;\n  float thres = 0.001;\n  for (int o = 0; o < DC_O; o++)\n    for (int r = 0; r < DC_R; r++)\n      for (int c = 0; c < DC_C; c++) {\n        if (fabs((float)dc_cout_golden[r][c][o] - (float)dc_cout[r][c][o]) > thres) {\n          err++;\n\t\t\t\t\tprintf(\"(golden, hw)@(%d, %d, %d): (%f, %f)\\n\", o, r, c, (float)dc_cout_golden[r][c][o], (float)dc_cout[r][c][o]);\n        }\n      }\n\n  if (err) {\n    printf(\"Test failed with %d errors!\\n\", err);\n    return -1;\n  } else {\n    printf(\"Test passed!\\n\");\n    return 0;\n  }\n#endif\n\n#ifdef FC\n  // Fully-connected Layers\n  data_t 
fc_cin[FC_I][FC_J];\n  data_t fc_w[FC_J];\n  data_t fc_cout[FC_I];\n  data_t fc_cout_golden[FC_I];\n\n  for (int i = 0; i < FC_I; i++)\n    for (int j = 0; j < FC_J; j++) {\n      fc_cin[i][j] = i;\n    }\n\t\n\tfor (int j = 0; j < FC_J; j++) {\n\t\tfc_w[j] = j;\n\t}\n\n#pragma scop\n  for (int i = 0; i < FC_I; i++) {\n\t\tfc_cout[i] = 0;       \n    for (int j = 0; j < FC_J; j++) {\n      fc_cout[i] = fc_cout[i] + fc_cin[i][j] * fc_w[j];\n    }\n  }\n#pragma endscop\n\n  for (int i = 0; i < FC_I; i++) {\n\t\tfc_cout_golden[i] = 0;       \n    for (int j = 0; j < FC_J; j++) {\n      fc_cout_golden[i] = fc_cout_golden[i] + fc_cin[i][j] * fc_w[j];\n    }\n  }\t\n\n  int err = 0;\n  float thres = 0.001;\n  for (int i = 0; i < FC_I; i++)    \n    if (fabs((float)fc_cout_golden[i] - (float)fc_cout[i]) > thres) {\n      err++;\n\t\t\tprintf(\"(golden, hw)@(%d): (%f, %f)\\n\", i, (float)fc_cout_golden[i], (float)fc_cout[i]);\n    }    \n\n  if (err) {\n    printf(\"Test failed with %d errors!\\n\", err);\n    return -1;\n  } else {\n    printf(\"Test passed!\\n\");\n    return 0;\n  }\n#endif\n}"
  },
  {
    "path": "autosa_tests/dnn_ops/kernel.h",
    "content": "#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"math.h\"\n\n//#define PC\n//#define DC\n#define FC\n\ntypedef float data_t;\n// point-wise conv\n#define PC_O 16\n#define PC_I 16\n#define PC_R 8\n#define PC_C 8\n#define PC_K 3\n\n// depth-wise conv\n#define DC_O 16\n#define DC_I 16\n#define DC_R 8\n#define DC_C 8\n#define DC_K 3\n\n// fc\n#define FC_I 16\n#define FC_J 16\n"
  },
  {
    "path": "autosa_tests/dnn_ops/pc_simd_info.json",
    "content": "{\n  \"kernel4\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  },\n  \"kernel5\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  } \n}\n"
  },
  {
    "path": "autosa_tests/large/cnn/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/large/cnn/README.md",
    "content": "# Convolutional Neural Network (Single Layer, Large)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/large/cnn/kernel.c\nautosa_tests/large/cnn/kernel.h\nautosa_tests/large/cnn/simd_info.json\nautosa_tests/large/cnn/Makefile\nautosa_tests/large/cnn/connectivity.cfg\n```\n\n__Command__:\n```\n./autosa ./autosa_tests/large/cnn/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[64,56,14,64];kernel[]->latency[4,4,7];kernel[]->simd[1,1,8]}\" --simd-info=./autosa_tests/large/cnn/simd_info.json\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/large/cnn/Makefile autosa.tmp/output/\ncp autosa_tests/large/cnn/connectivity.cfg autosa.tmp/output/\n```\n\nRun the Makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\n```"
  },
  {
    "path": "autosa_tests/large/cnn/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.cin:DDR[0]\nsp=kernel0_1.w:DDR[1]\nsp=kernel0_1.cout:DDR[3]\n"
  },
  {
    "path": "autosa_tests/large/cnn/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/large/cnn/kernel.c",
    "content": "#include \"kernel.h\"\n\nint main(int argc, char **argv){\n  // declarations (static, since the arrays are too large for the stack)\n//  data_t cin[I][R + K - 1][C + K - 1];\n//  data_t w[O][I][K][K];\n//  data_t cout[O][R][C];\n//  data_t cout_golden[O][R][C];\n  static data_t cin[R + K - 1][C + K - 1][I];\n  static data_t w[O][K][K][I];\n  static data_t cout[R][C][O];\n  static data_t cout_golden[R][C][O];\n\n  // data initialization\n  for (int i = 0 ; i < I; i++)\n    for (int r = 0; r < R + K - 1; r++)\n      for (int c = 0; c < C + K - 1; c++) {\n        cin[r][c][i] = 1;\n      }\n\n  for (int o = 0; o < O; o++)\n    for (int i = 0; i < I; i++) \n      for (int p = 0; p < K; p++)\n        for (int q = 0; q < K; q++) {\n          w[o][p][q][i] = 1;\n        }\n \n#pragma scop\n  for (int o = 0; o < O; o++)\n    for (int r = 0; r < R; r++)\n      for (int c = 0; c < C; c++) {\n        cout[r][c][o] = 0;\n        for (int i = 0; i < I; i++)\n          for (int p = 0; p < K; p++)\n            for (int q = 0; q < K; q++) {\n              cout[r][c][o] = cout[r][c][o] + cin[r + p][c + q][i] * w[o][p][q][i];\n            }\n      }\n#pragma endscop  \n \n  for (int o = 0; o < O; o++)\n    for (int r = 0; r < R; r++)\n      for (int c = 0; c < C; c++) {\n        cout_golden[r][c][o] = 0;\n        for (int i = 0; i < I; i++)\n          for (int p = 0; p < K; p++)\n            for (int q = 0; q < K; q++) {\n              cout_golden[r][c][o] = cout_golden[r][c][o] + cin[r + p][c + q][i] * w[o][p][q][i];\n            }\n      }\n\n  int err = 0;\n  float thres = 0.001;\n  for (int o = 0; o < O; o++)\n    for (int r = 0; r < R; r++)\n      for (int c = 0; c < C; c++) {\n        if (fabs((float)cout_golden[r][c][o] - (float)cout[r][c][o]) > thres) {\n          err++;\n        }\n      }\n\n  if (err) {\n    printf(\"Test failed with %d errors!\\n\", err);\n    return -1;\n  } else {\n    printf(\"Test passed!\\n\");\n    return 0;\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/cnn/kernel.h",
    "content": "#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"math.h\"\n\ntypedef float data_t;\n//#define O 512\n#define O 640\n#define I 512\n//#define R 60\n#define R 56\n#define C 56\n#define K 3\n\n//#define O 264\n//#define I 256\n//#define R 224\n//#define C 224\n//#define K 5\n"
  },
  {
    "path": "autosa_tests/large/cnn/simd_info.json",
    "content": "{\n  \"kernel4\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  },\n  \"kernel5\": {\n    \"reduction\": [\"y\", \"y\", \"y\"]\n  } \n}\n"
  },
  {
    "path": "autosa_tests/large/cnn/step1-run-hls.tcl",
    "content": "open_project kernel0\nset_top kernel0\nadd_files \"src/kernel_kernel.cpp\"\n#add_files -tb PATH_TO_TESTBENCH_FILE\n\nopen_solution solution\n\n#u250\nset_part xcu250-figd2104-2L-e\n\n# u280\n#set_part xcu280-fsvh2892-2L-e\n\n# 300 MHz\ncreate_clock -period 3.333\n\nconfig_dataflow -strict_mode warning\nset_clock_uncertainty 27.000000%\nconfig_rtl -enable_maxiConservative=1\nconfig_interface -m_axi_addr64\n\n# to enable integration with Vitis\nconfig_sdx -target xocc\n\n#csim_design\ncsynth_design\nclose_project\nexit\n"
  },
  {
    "path": "autosa_tests/large/cnn/step2-autobridge.py",
    "content": "#! /usr/bin/python3.6\n\n# add the path to where you place the autobridge source code\nimport sys\nsys.path.append('../src')\n\nimport graph\nfrom formator import FormatHLS\nimport collections\nimport os\nimport subprocess\n\n\"\"\"\nAutoBridge divides the target device as follows and assigns each HLS function to one slot.\nFor more details, please refer to the paper.\n\n      u250                     u280\n   -----------\n 3 |    |    |\n   |----|----|              |----|----|\n 2 |    |    |            2 |    |    |\n   |----|----|              |----|----|\n 1 |    |    |            1 |    |    |\n   |----|----|              |----|----|\n 0 |    |    |            0 |    |    |\n   -----------              -----------\n     0    1                   0    1\n\"\"\"\n\n################### Modify Accordingly ###############################\n\n# (1) fill basic information\nproject_path = '/home/jaywang/doc_examples/cnn_large_ab/kernel0' # path to your hls project\ntop_name = 'kernel0' # name of the top function in your hls design\nsolution_path = f'{project_path}/solution/'\nproject_name = 'kernel0'\nboard_name = 'u250' # or 'u280'\n# where the results will be saved. 
Your HLS project will be copied there and your top RTL will be replaced.\n# Note that if the directory already exists, it will be removed and recreated\n\n# (2) specify how your design connects to the external memory\n\"\"\" Example:\n\nvoid kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n{\n  #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n  #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n\n  load_p1 (p1, ...);\n  load_p2 (p2, ...);\n}\n\n--------------------------------------\n\nIn this example, the pointers p1 and p2 become M_AXI controllers that connect to the dedicated DDR IPs.\nIf you want p1 to connect to DDR 2 in SLR 2, then you need to specify that the corresponding RTL controller must be floorplanned in SLR 2.\nMeanwhile, your function load_p1() talks to the M_AXI controller through an AXI interface, which cannot be easily pipelined.\nThus the RTL module corresponding to load_p1() must also be placed in SLR 2 in this example.\nSince load_p1() communicates with the rest of your design through FIFO interfaces, you don't need to specify the locations of the other modules.\n\n(transparent)|                        (user visible)\n             |\n   Vitis     |                    what your HLS design becomes\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p1)                       (load_p1)\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p2)                       (load_p2)\n             |\n             | S_AXI\nPCIe    <--- | ----> S_AXI controller\n             |\n\"\"\"\n\n# on the left side or the right side of an SLR\nDDR_loc_2d_x = collections.defaultdict(dict)\n\n# on which SLR\nDDR_loc_2d_y = 
collections.defaultdict(dict)\n\n# use DDR 0, 1, 3\nDDR_loc_2d_y['cin_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_x['cin_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_cin_m_axi_U'] = 0\nDDR_loc_2d_x['kernel0_gmem_cin_m_axi_U'] = 0\n\nDDR_loc_2d_y['w_IO_L3_in_serialize_U0'] = 1\nDDR_loc_2d_x['w_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_w_m_axi_U'] = 1\nDDR_loc_2d_x['kernel0_gmem_w_m_axi_U'] = 0\n\nDDR_loc_2d_y['cout_drain_IO_L3_out_serialize_U0'] = 3\nDDR_loc_2d_x['cout_drain_IO_L3_out_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_cout_m_axi_U'] = 3\nDDR_loc_2d_x['kernel0_gmem_cout_m_axi_U'] = 0\n\nDDR_loc_2d_y['kernel0_control_s_axi_U'] = 1\nDDR_loc_2d_x['kernel0_control_s_axi_U'] = 1\nDDR_loc_2d_y['kernel0_entry12_U0'] = 1\nDDR_loc_2d_x['kernel0_entry12_U0'] = 1\n\n# (3) specify DDR information\n# Each instantiated DDR controller consumes a non-trivial amount of resources,\n# so to improve floorplanning you need to specify which DDRs are enabled.\n# In the example above, p1 connects to DDR 2 in SLR 2 and p2 connects to DDR 1 in SLR 1.\n# To use all DDRs, for example, set it to [1, 1, 1, 1].\nDDR_enable = [1, 1, 0, 1]\n\n# (4) specify how much of the resources can be used in each slot\n# This forces the design to be spread evenly across the device and avoids local congestion\n\"\"\" Example:\n   -----------\n 3 |0.76|0.62|\n   |----|----|\n 2 |0.74|0.61|\n   |----|----|\n 1 |0.75|0.6 |\n   |----|----|\n 0 | 0.7|0.6 |\n   -----------\n     0    1\n\"\"\"\n#max_usage_ratio_2d = [ [0.9, 0.85], [0.9, 0.85], [0.9, 0.85], [0.9, 0.85] ]\nmax_usage_ratio_2d = [ [0.9, 0.82], [0.9, 0.82], [0.9, 0.82], [0.9, 0.82] ]\n\n\n##################### DON'T TOUCH THE SECTION BELOW #################################\ntarget_dir = '/home/jaywang/doc_examples/cnn_large_ab/autobridge_v4'\n\nformator = FormatHLS(\n  rpt_path = f'{solution_path}/syn/report/',\n  hls_sche_path = f'{solution_path}/.autopilot/db/',\n  top_hdl_path = 
f'{solution_path}/syn/verilog/{top_name}_{top_name}.v',\n  top_name = top_name,\n  DDR_loc_2d_x = DDR_loc_2d_x,\n  DDR_loc_2d_y = DDR_loc_2d_y,\n  DDR_enable = DDR_enable,\n  max_usage_ratio_2d = max_usage_ratio_2d,\n  board_name = board_name,\n  target_dir = target_dir,\n  relay_station_count = lambda x : 2 * x, # how many levels of relay stations to add for x-unit of crossing\n  max_search_time = 600,\n  NaiveBalance = True)\n\n# run floorplanning\ng = graph.Graph(formator)\n\n# move results to target dir\nif (os.path.isdir(target_dir)):\n  subprocess.run(['rm', '-rf', f'{target_dir}'])\nsubprocess.run(['mkdir', f'{target_dir}/'])\nsubprocess.run(['cp', '-r', project_path, f'{target_dir}/{project_name}'])\nsubprocess.run(['cp', os.path.realpath(__file__), f'{target_dir}/archived_source.txt'])\nsubprocess.run(['chmod', '+w', '-R', f'{target_dir}'])\nsubprocess.run(['cp', 'constraint.tcl', target_dir])\nsubprocess.run(['cp', 'pack_xo.tcl', target_dir])\nsubprocess.run(['cp', 'autobridge.log', target_dir])\nsubprocess.run(['cp', f'{top_name}_{top_name}.v', f'{target_dir}/{project_name}/solution/syn/verilog/'])\n\n# clean up\nos.system('rm *.lp')\nsubprocess.run(['rm', 'parser.out'])\nsubprocess.run(['rm', 'parsetab.py'])\nsubprocess.run(['rm', '-rf', '__pycache__'])\n\n"
  },
  {
    "path": "autosa_tests/large/cnn/step3-pack-xo.tcl",
    "content": "open_project kernel0\nopen_solution solution\nexport_design -rtl verilog -format ip_catalog -xo kernel0.xo\n\nclose_project\nputs \"Pack XO successfully\"\nexit\n"
  },
  {
    "path": "autosa_tests/large/cnn/step4-run-vitis.sh",
    "content": "OUTPUT_DIR=\"$(pwd)/vitis_run\"\n\n# name of the top function\nTOP=kernel0\n\n# choose the target device\nPLATFORM=xilinx_u250_xdma_201830_2\n#PLATFORM=xilinx_u280_xdma_201920_3\n\nXO=\"$(pwd)/kernel0.xo\"\n\n# For the available placement strategies, see UG904-vivado-implementation\n#STRATEGY=\"Default\"\nSTRATEGY=\"EarlyBlockPlacement\"\n\n# Remove the '--connectivity.sp' option for v++ for every DDR bank that is not used.\n# Example: if we map p1 to DDR 3 and p2 to DDR 0\n#\n# void kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n# {\n#   #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n#   #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n# \n#   load_p1 (p1, ...);\n#   load_p2 (p2, ...);\n# }\n#\n# ARG_FOR_DDR_0=p2\n# ARG_FOR_DDR_3=p1\n# and the '--connectivity.sp' options for DDR 1 and DDR 2 should be removed.\n\n# This design uses DDR 0, 1, and 3 (DDR 2 is unused)\nARG_FOR_DDR_0=cin\nARG_FOR_DDR_1=w\n#ARG_FOR_DDR_2=\"YOUR_HLS_ARGUMENT_NAME_FOR_DDR_2\"\nARG_FOR_DDR_3=cout\n\n# the constraint file containing the floorplan results\n# WARNING: must be an absolute path\nCONSTRAINT=\"$(pwd)/constraint.tcl\"\nif [ ! -f \"$CONSTRAINT\" ]; then\n    echo \"no constraint file found\" >&2\n    exit 1\nfi\n\nv++ \\\n  --link \\\n  --output \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.xclbin\" \\\n  --kernel ${TOP} \\\n  --platform ${PLATFORM} \\\n  --target hw \\\n  --report_level 2 \\\n  --temp_dir \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.temp\" \\\n  --optimize 3 \\\n  --connectivity.nk ${TOP}:1:${TOP}_1 \\\n  --max_memory_ports ${TOP} \\\n  --save-temps \\\n  ${XO} \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_0}:DDR[0] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_1}:DDR[1] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_3}:DDR[3] \\\n  --kernel_frequency 300 \\\n  --vivado.prop run.impl_1.STEPS.PLACE_DESIGN.ARGS.DIRECTIVE=$STRATEGY \\\n  --vivado.prop run.impl_1.STEPS.OPT_DESIGN.TCL.PRE=$CONSTRAINT\n"
  },
  {
    "path": "autosa_tests/large/mm/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 300 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/large/mm/README.md",
    "content": "# Matrix Multiplication (Large)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/large/mm/kernel.c\nautosa_tests/large/mm/kernel.h\nautosa_tests/large/mm/simd_info.json\nautosa_tests/large/mm/Makefile\nautosa_tests/large/mm/connectivity.cfg\n```\n\n__Command__:\n```\n./autosa ./autosa_tests/large/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[260,256,512];kernel[]->latency[20,16];kernel[]->simd[8]}\" --simd-info=./autosa_tests/large/mm/simd_info.json --host-serialize\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/large/mm/Makefile autosa.tmp/output/\ncp autosa_tests/large/mm/connectivity.cfg autosa.tmp/output/\n```\n\nRun the Makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\n```"
  },
  {
    "path": "autosa_tests/large/mm/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1]\nsp=kernel0_1.C:DDR[3]\n"
  },
  {
    "path": "autosa_tests/large/mm/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/large/mm/kernel.c",
    "content": "#include \"kernel.h\"\n\n//#define LAYOUT1\n#define LAYOUT2\n//#define LAYOUT3\n\nint main(int argc, char **argv) {\n//  data_t A[I][K], B[K][J], C[I][J], C_golden[I][J]; \n#ifdef LAYOUT2  \n  static data_t A[I][K], B[J][K], C[I][J], C_golden[I][J]; // gemm0,3\n#endif  \n#ifdef LAYOUT3  \n  static data_t A[K][I], B[K][J], C[I][J], C_golden[I][J]; // gemm4\n#endif  \n\n  for (int i = 0; i < I; i++) \n    for (int k = 0; k < K; k++) {\n#ifdef LAYOUT2      \n      A[i][k] = (data_t)rand() / RAND_MAX;\n#endif\n#ifdef LAYOUT3      \n      A[k][i] = (data_t)rand() / RAND_MAX;\n#endif      \n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n#ifdef LAYOUT2      \n      B[j][k] = (data_t)rand() / RAND_MAX;\n#endif\n#ifdef LAYOUT3      \n      B[k][j] = (data_t)rand() / RAND_MAX;\n#endif      \n    }\n\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n#ifdef LAYOUT2        \n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n#endif\n#ifdef LAYOUT3      \n        C[i][j] = C[i][j] + A[k][i] * B[k][j];\n#endif        \n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n#ifdef LAYOUT2        \n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n#endif\n#ifdef LAYOUT3        \n        C_golden[i][j] = C_golden[i][j] + A[k][i] * B[k][j];\n#endif        \n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/large/mm/kernel.h",
    "content": "#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"math.h\"\n\n//typedef float data_t;\ntypedef int data_t;\n//#define I 1024\n//#define J 1024\n//#define K 1024\n\n//#define I 1040\n//#define J 1024\n//#define K 1024\n\n#define I 208\n#define J 512\n#define K 256\n\n//#define I 1032\n//#define J 1024\n//#define K 1024\n\n//#define I 1024\n//#define J 1032\n//#define K 1024\n\n//#define I 1024\n//#define J 1024\n//#define K 1032\n\n//#define I 1060\n//#define J 1024\n//#define K 1024\n\n//#define I 1040\n//#define J 1024\n//#define K 1024\n\n//#define I 1024\n//#define J 1056\n//#define K 1080\n"
  },
  {
    "path": "autosa_tests/large/mm/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel2\": {\n    \"reduction\": [\"y\"]\n  }, \n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel4\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel5\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/mm/step1-run-hls.tcl",
    "content": "open_project kernel0\nset_top kernel0\nadd_files \"src/kernel_kernel.cpp\"\n#add_files -tb PATH_TO_TESTBENCH_FILE\n\nopen_solution solution\n\n#u250\nset_part xcu250-figd2104-2L-e\n\n# u280\n#set_part xcu280-fsvh2892-2L-e\n\n# 300 MHz\ncreate_clock -period 3.333\n\nconfig_dataflow -strict_mode warning\nset_clock_uncertainty 27.000000%\nconfig_rtl -enable_maxiConservative=1\nconfig_interface -m_axi_addr64\n\n# to enable integration with Vitis\nconfig_sdx -target xocc\n\n#csim_design\ncsynth_design\nclose_project\nexit\n"
  },
  {
    "path": "autosa_tests/large/mm/step2-autobridge.py",
    "content": "#! /usr/bin/python3.6\n\n# add the path to where you place the autobridge source code\nimport sys\nsys.path.append('../src')\n\nimport graph\nfrom formator import FormatHLS\nimport collections\nimport os\nimport subprocess\n\n\"\"\"\nAutoBridge divides the target device as follows and assign each HLS function to one slot\nFor more details pls refer to the paper\n\n      u250                     u280\n   -----------\n 3 |    |    |\n   |----|----|              |----|----|\n 2 |    |    |            2 |    |    |\n   |----|----|              |----|----|\n 1 |    |    |            1 |    |    |\n   |----|----|              |----|----|\n 0 |    |    |            0 |    |    |\n   -----------              -----------\n     0    1                   0    1\n\"\"\"\n\n################### Modify Accordingly ###############################\n\n# (1) fill basic information\nproject_path = '/home/jaywang/doc_ab/use/kernel0' # path to your hls project\ntop_name = 'kernel0' # name of the top function in your hls design\nsolution_path = f'{project_path}/solution/'\nproject_name = 'kernel0'\nboard_name = 'u250' # or 'u280'\n# where the results will be saved. 
Your HLS project will be copied there and your top RTL will be replaced.\n# Note that if the directory already exists, we will try to reset the contents\n\n# (2) specify how your designs connect to the external memory\n\"\"\" Example:\n\nvoid kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n{\n  #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n  #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n\n  load_p1 (p1, ...);\n  load_p2 (p2, ...);\n}\n\n--------------------------------------\n\nIn this example, the pointer p1 and p2 will become M_AXI controllers to connect to the dedicated DDR IP.\nIf you want p1 to connect to DDR 2 in the 2-nd SLR, then you need to specify that the corresponding RTL controller must be floorplanned at the 2-nd SLR\nMeanwhile, your function load_p1() will talk to the M_AXI controller also through AXI interface which cannot be easily pipelined.\nThus the RTL module corresponds to load_p1() must also be in the 2-nd SLR in this example.\nSince load_p1() will communicate with the rest of your design using FIFO interface, you don't need to specify the location of other modules\n\n(transparent)|                        (user visible)\n             |\n   Vitis     |                    what your HLS design becomes\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p1)                       (load_p1)\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p2)                       (load_p2)\n             |\n             | S_AXI\nPCIe    <--- | ----> S_AXI controller\n             |\n\"\"\"\n\n# on the left side or the right side of an SLR\nDDR_loc_2d_x = collections.defaultdict(dict)\n\n# on which SLR\nDDR_loc_2d_y = 
collections.defaultdict(dict)\n\n# use DDR 0, 1, 3\nDDR_loc_2d_y['A_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_x['A_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_A_m_axi_U'] = 0\nDDR_loc_2d_x['kernel0_gmem_A_m_axi_U'] = 0\n\nDDR_loc_2d_y['B_IO_L3_in_serialize_U0'] = 1\nDDR_loc_2d_x['B_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_B_m_axi_U'] = 1\nDDR_loc_2d_x['kernel0_gmem_B_m_axi_U'] = 0\n\nDDR_loc_2d_y['C_drain_IO_L3_out_serialize_U0'] = 3\nDDR_loc_2d_x['C_drain_IO_L3_out_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_C_m_axi_U'] = 3\nDDR_loc_2d_x['kernel0_gmem_C_m_axi_U'] = 0\n\nDDR_loc_2d_y['kernel0_control_s_axi_U'] = 0\n\n# (3) specify DDR information\n# If you instantiate a DDR controller, it will consume non-trivial amount of resource\n# to make the floorplanning better, you need to specify which DDRs have been enabled\n# In this example, you connect p1 to DDR-2 in SLR-2 and p2 to DDR-1 in SLR-1\n# If you want to use all DDRs, for example, you need to set it as [1, 1, 1, 1]\nDDR_enable = [1, 1, 0, 1]\n\n# (4) specify how much resource can be used in each slot\n# In this way you could force the design to be placed evenly across the device and avoid local congestion\n\"\"\" Example:\n   -----------\n 3 |0.76|0.62|\n   |----|----|\n 2 |0.74|0.61|\n   |----|----|\n 1 |0.75|0.6 |\n   |----|----|\n 0 | 0.7|0.6 |\n   -----------\n     0    1\n\"\"\"\nmax_usage_ratio_2d = [ [0.8, 0.7], [0.85, 0.75], [0.85, 0.85], [0.85, 0.7] ]\n\n\n##################### DON'T TOUCH THE SECTION BELOW #################################\ntarget_dir = 'autobridge_prj'\n\nformator = FormatHLS(\n  rpt_path = f'{solution_path}/syn/report/',\n  hls_sche_path = f'{solution_path}/.autopilot/db/',\n  top_hdl_path = f'{solution_path}/syn/verilog/{top_name}_{top_name}.v',\n  top_name = top_name,\n  DDR_loc_2d_x = DDR_loc_2d_x,\n  DDR_loc_2d_y = DDR_loc_2d_y,\n  DDR_enable = DDR_enable,\n  max_usage_ratio_2d = max_usage_ratio_2d,\n  board_name = board_name,\n  target_dir = 
target_dir,\n  relay_station_count = lambda x : 2 * x, # how many levels of relay stations to add for x-unit of crossing\n  max_search_time = 600,\n  NaiveBalance = True)\n\n# run floorplanning\ng = graph.Graph(formator)\n\n# move results to target dir\nif (os.path.isdir(target_dir)):\n  subprocess.run(['rm', '-rf', f'{target_dir}'])\nsubprocess.run(['mkdir', f'{target_dir}/'])\nsubprocess.run(['cp', '-r', project_path, f'{target_dir}/{project_name}'])\nsubprocess.run(['cp', os.path.realpath(__file__), f'{target_dir}/archived_source.txt'])\nsubprocess.run(['chmod', '+w', '-R', f'{target_dir}'])\nsubprocess.run(['cp', 'constraint.tcl', target_dir])\nsubprocess.run(['cp', 'pack_xo.tcl', target_dir])\nsubprocess.run(['cp', 'autobridge.log', target_dir])\nsubprocess.run(['cp', f'{top_name}_{top_name}.v', f'{target_dir}/{project_name}/solution/syn/verilog/'])\n\n# clean up\nos.system('rm *.lp')\nsubprocess.run(['rm', 'parser.out'])\nsubprocess.run(['rm', 'parsetab.py'])\nsubprocess.run(['rm', '-rf', '__pycache__'])\n\n"
  },
  {
    "path": "autosa_tests/large/mm/step3-pack-xo.tcl",
    "content": "open_project kernel0\nopen_solution solution\nexport_design -rtl verilog -format ip_catalog -xo kernel0.xo\n\nclose_project\nputs \"Pack XO successfully\"\nexit\n"
  },
  {
    "path": "autosa_tests/large/mm/step4-run-vitis.sh",
    "content": "OUTPUT_DIR=\"$(pwd)/vitis_run\"\n\n# name of the top function\nTOP=kernel0\n\n# choose the target device\nPLATFORM=xilinx_u250_xdma_201830_2 \n#PLATFORM=xilinx_u280_xdma_201920_3 \n\nXO=\"$(pwd)/kernel0.xo\"\n\n# For different approaches see UG904-vivado-implementation\nSTRATEGY=\"Default\" \n#STRATEGY=\"EarlyBlockPlacement\" \n\n# remove the unused '--connectivity.sp' option for v++ if some DDRs are not used \n# Example: if we map p1 to DDR 3 and p2 to DDR 0\n#\n# void kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n# {\n#   #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n#   #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n# \n#   load_p1 (p1, ...);\n#   load_p2 (p2, ...);\n# }\n#\n# ARG_FOR_DDR_0=p2\n# ARG_FOR_DDR_3=p1\n# Should remove '--connectivity.sp' for DDR1 and DDR2\n\nARG_FOR_DDR_1=A\nARG_FOR_DDR_2=B\n#ARG_FOR_DDR_3=\"YOUR_HLS_ARGUMENT_NAME_FOR_DDR_3\"\nARG_FOR_DDR_4=C\n\n# the constraint file containing the floorplan results\n# WARNING: must use absolute address\nCONSTRAINT=\"$(pwd)/constraint.tcl\"\nif [ ! -f \"$CONSTRAINT\" ]; then\n    echo \"no constraint file found\"\n    exit\nfi\n\nv++ \\\n  --link \\\n  --output \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.xclbin\" \\\n  --kernel ${TOP} \\\n  --platform ${PLATFORM} \\\n  --target hw \\\n  --report_level 2 \\\n  --temp_dir \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.temp\" \\\n  --optimize 3 \\\n  --connectivity.nk ${TOP}:1:${TOP}_1 \\\n  --max_memory_ports ${TOP} \\\n  --save-temps \\\n  ${XO} \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_1}:DDR[0] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_2}:DDR[1] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_4}:DDR[3] \\\n  --kernel_frequency 300 \\\n  --vivado.prop run.impl_1.STEPS.PLACE_DESIGN.ARGS.DIRECTIVE=$STRATEGY \\\n  --vivado.prop run.impl_1.STEPS.OPT_DESIGN.TCL.PRE=$CONSTRAINT\n"
  },
  {
    "path": "autosa_tests/large/mm_block_sparse/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/large/mm_block_sparse/README.md",
    "content": "# Matrix Multiplication with Block Sparsity (Large)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/large/mm_block_sparse/kernel.c\nautosa_tests/large/mm_block_sparse/kernel.h\nautosa_tests/large/mm_block_sparse/simd_info.json\nautosa_tests/large/mm_block_sparse/Makefile\nautosa_tests/large/mm_block_sparse/connectivity.cfg\nautosa_tests/large/mm_block_sparse/hls_script.tcl\n```\n\n__Command__:\nTo run the HLS flow for C/RTL simulation:\n```bash\n./autosa ./autosa_tests/large/mm_block_sparse/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[256,256,512];kernel[]->latency[32,32];kernel[]->simd[8]}\" --simd-info=./autosa_tests/large/mm_block_sparse/simd_info.json --host-serialize --hls --block-sparse --block-sparse-ratio=\"{kernel[]->A[4,8]}\"\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `hls_script.tcl` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/large/mm_block_sparse/hls_script.tcl autosa.tmp/output/\n```\n\nRun the TCL script to build the HLS project.\n\n```\ncd autosa.tmp/output\nvivado_hls -f hls_script.tcl\n```\n\nAlternatively, if you need to generate the bitstream for on-board testing, simply remove the `--hls` flag from the AutoSA command.\n```bash\n./autosa ./autosa_tests/large/mm_block_sparse/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[256,256,512];kernel[]->latency[32,32];kernel[]->simd[8]}\" --simd-info=./autosa_tests/large/mm_block_sparse/simd_info.json --host-serialize --block-sparse --block-sparse-ratio=\"{kernel[]->A[4,8]}\"\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. 
Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/large/mm_block_sparse/Makefile autosa.tmp/output/\ncp autosa_tests/large/mm_block_sparse/connectivity.cfg autosa.tmp/output/\n```\n\nExecute the makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\nmake check\n```"
  },
  {
    "path": "autosa_tests/large/mm_block_sparse/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1] \nsp=kernel0_1.C:DDR[2]\n"
  },
  {
    "path": "autosa_tests/large/mm_block_sparse/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/large/mm_block_sparse/kernel.c",
    "content": "/* This example uses block sparsity to compute a matrix multiplication.\n * C = A * B\n * Matrix A is block-sparse and matrix B is dense.\n * For matrix A, every VEC_LEN elements are grouped into a vector.\n * Inside each vector, there are NON_ZERO_NUM non-zero elements.\n * The sparsity of matrix A is computed as 1 - NON_ZERO_NUM / VEC_LEN.\n * To store the sparse matrix A, we use two data structures:\n * A_d stores the non-zero elements and A_i stores the offsets of the non-zero elements in each vector.\n * As an example, for matrix A of size I * K, where I = K = 8,\n * suppose that we have VEC_LEN = 4 and NON_ZERO_NUM = 2. We denote the compression ratio as\n * COMPRESS_RATIO = VEC_LEN / NON_ZERO_NUM,\n * so we will have A_d[I][K / COMPRESS_RATIO].\n * For A_i, we use a char to store the mask of non-zero elements.\n * For example, if the vector is 0 1 0 2, we will have the mask 0101_0000 (written bit 0 first) to store the\n * offsets of the non-zero elements.\n * Currently, we assume the vector length is a power of two and is no greater than 8.\n * If it is greater than 8, we could use a wider data type to store the offsets accordingly.\n * Based on the analysis above, we will have the index matrix A_i as\n * char A_i[I][K / VEC_LEN].\n * In summary, we use A_d[I][K / COMPRESS_RATIO] and A_i[I][K / VEC_LEN] to represent the sparse matrix.\n */\n#include \"kernel.h\"\n\nint main(int argc, char **argv) {\n  static data_t A[I][K], B[J][K], C[I][J], C_golden[I][J];\n  static data_t A_d[I][K / COMPRESS_RATIO];\n  static unsigned char A_i[I][K / VEC_LEN];\n  static data_t A_s[I][K / EFF_COMPRESS_RATIO];\n\n  for (int i = 0; i < I; i++) \n    for (int k = 0; k < K; k++) {\n      A[i][k] = (data_t)rand() / RAND_MAX;\n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n      B[j][k] = (data_t)rand() / RAND_MAX;\n    }\n\n  for (int i = 0; i < I; i++)\n    for (int k = 0; k < K / VEC_LEN; k++) {\n      unsigned char offset = 0;\n      int n = 
0;\n      while (n < NON_ZERO_NUM) {      \n        int pos = rand() % VEC_LEN;\n        /* Check if this position is already inserted */        \n        unsigned char cur_mask = offset & (1 << pos);\n        if (cur_mask) {\n          continue;\n        }\n        offset = offset | (1 << pos);\n        n++;\n      }\n      A_i[i][k] = offset;\n\n      int pos = 0;\n      int non_zero_pos = 0;\n      while (pos < VEC_LEN) {\n        unsigned char cur_mask = offset & (1 << pos);\n        if (cur_mask) {\n          A_d[i][k * NON_ZERO_NUM + non_zero_pos] = A[i][k * VEC_LEN + pos];\n          non_zero_pos++;\n        }\n        pos++;\n      }      \n    }\n\n  for (int i = 0; i < I; i++)\n    for (int k = 0; k < K / VEC_LEN; k++) {\n      int n;\n      for (n = 0; n < NON_ZERO_NUM; n++) {\n        A_s[i][k * (NON_ZERO_NUM + META_DATA_NUM) + n] = A_d[i][k * NON_ZERO_NUM + n];\n      }\n      unsigned char offset = A_i[i][k];\n      union {data_t d; unsigned char c;} u;\n      u.c = offset;\n      A_s[i][k * (NON_ZERO_NUM + META_DATA_NUM) + n] = u.d;\n    }\n\n  /* For polyhedral analysis */\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n      }\n    }\n#pragma endscop\n\n//  /* The actual computation */\n//  for (int i = 0; i < I; i++)  \n//    for (int j = 0; j < J; j++) {\n//      C[i][j] = 0;\n//      for (int k = 0; k < K / VEC_LEN; k++) {\n//        /* Extract the non zero offset */\n//        int offset[NON_ZERO_NUM];\n//        unsigned char mask = A_i[i][k];\n//        int pos = 0;\n//        int non_zero_pos = 0;\n//        while (pos < VEC_LEN) {\n//          unsigned char cur_mask = mask & (1 << pos);\n//          if (cur_mask) {\n//            offset[non_zero_pos] = pos;\n//            non_zero_pos++;\n//          }\n//          pos++;\n//        }\n//        for (int n = 0; n < NON_ZERO_NUM; n++) {\n//          C[i][j] 
+= A_d[i][k * NON_ZERO_NUM + n] * B[j][k * VEC_LEN + offset[n]];\n//        }\n//      }\n//    }\n\n  for (int i = 0; i < I; i++)  \n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K / VEC_LEN; k++) {\n        /* Extract the non zero offset */\n        int offset[NON_ZERO_NUM];\n        unsigned char mask = A_i[i][k];\n        int pos = 0;\n        int non_zero_pos = 0;\n        while (pos < VEC_LEN) {\n          unsigned char cur_mask = mask & (1 << pos);\n          if (cur_mask) {\n            offset[non_zero_pos] = pos;\n            non_zero_pos++;\n          }\n          pos++;\n        }\n        for (int n = 0; n < NON_ZERO_NUM; n++) {\n          C_golden[i][j] += A_d[i][k * NON_ZERO_NUM + n] * B[j][k * VEC_LEN + offset[n]];\n        }\n      }\n    }  \n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/large/mm_block_sparse/kernel.h",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\ntypedef float data_t;\n#define I 1024\n#define J 1024\n#define K 1024\n\n// Sparsity [3:4]\n//#define VEC_LEN 4\n//#define NON_ZERO_NUM 3\n//#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n//#define META_DATA_NUM 1\n//#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\n\n// Sparsity [2:4]\n//#define VEC_LEN 4\n//#define NON_ZERO_NUM 2\n//#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n//#define META_DATA_NUM 2\n//#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\n\n// Sparsity [1:4]\n//#define VEC_LEN 4\n//#define NON_ZERO_NUM 1\n//#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n//#define META_DATA_NUM 1\n//#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\n\n// Sparsity [4:8]\n#define VEC_LEN 8\n#define NON_ZERO_NUM 4\n#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n#define META_DATA_NUM 4\n#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\n\n// Sparsity [3:8]\n//#define VEC_LEN 8\n//#define NON_ZERO_NUM 3\n//#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n//#define META_DATA_NUM 1\n//#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\n\n// Sparsity [2:8]\n//#define VEC_LEN 8\n//#define NON_ZERO_NUM 2\n//#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n//#define META_DATA_NUM 2\n//#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))"
  },
  {
    "path": "autosa_tests/large/mm_block_sparse/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel2\": {\n    \"reduction\": [\"y\"]\n  }, \n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel4\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/mm_int16/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/large/mm_int16/README.md",
    "content": "# Matrix Multiplication in int16 (Large)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/large/mm_int16/kernel.c\nautosa_tests/large/mm_int16/kernel.h\nautosa_tests/large/mm_int16/simd_info.json\nautosa_tests/large/mm_int16/Makefile\nautosa_tests/large/mm_int16/connectivity.cfg\n```\n\n__Command__:\n```bash\n./autosa ./autosa_tests/large/mm_int16/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[256,256,32];kernel[]->latency[16,16];kernel[]->simd[32]}\" --simd-info=./autosa_tests/large/mm_int16/simd_info.json --host-serialize --data-pack-sizes=\"{kernel[]->A[32,32,64];kernel[]->B[32,32,64];kernel[]->C[32,32,64]}\"\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/large/mm_int16/Makefile autosa.tmp/output/\ncp autosa_tests/large/mm_int16/connectivity.cfg autosa.tmp/output/\n```\n\nExecute the makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\n```"
  },
  {
    "path": "autosa_tests/large/mm_int16/code.c",
    "content": "unsigned short mul_4_0_0 = local_A[0][0] * local_B[0][0];\nunsigned short add_4_0 = mul_4_0_0 + local_A[0][1] * local_B[0][1];\nunsigned short mul_4_1_0 = local_A[0][2] * local_B[0][2];\nunsigned short add_4_1 = mul_4_1_0 + local_A[0][3] * local_B[0][3];\nunsigned short mul_4_2_0 = local_A[0][4] * local_B[0][4];\nunsigned short add_4_2 = mul_4_2_0 + local_A[0][5] * local_B[0][5];\nunsigned short mul_4_3_0 = local_A[0][6] * local_B[0][6];\nunsigned short add_4_3 = mul_4_3_0 + local_A[0][7] * local_B[0][7];\nunsigned short mul_4_4_0 = local_A[0][8] * local_B[0][8];\nunsigned short add_4_4 = mul_4_4_0 + local_A[0][9] * local_B[0][9];\nunsigned short mul_4_5_0 = local_A[0][10] * local_B[0][10];\nunsigned short add_4_5 = mul_4_5_0 + local_A[0][11] * local_B[0][11];\nunsigned short mul_4_6_0 = local_A[0][12] * local_B[0][12];\nunsigned short add_4_6 = mul_4_6_0 + local_A[0][13] * local_B[0][13];\nunsigned short mul_4_7_0 = local_A[0][14] * local_B[0][14];\nunsigned short add_4_7 = mul_4_7_0 + local_A[0][15] * local_B[0][15];\nunsigned short mul_4_8_0 = local_A[0][16] * local_B[0][16];\nunsigned short add_4_8 = mul_4_8_0 + local_A[0][17] * local_B[0][17];\nunsigned short mul_4_9_0 = local_A[0][18] * local_B[0][18];\nunsigned short add_4_9 = mul_4_9_0 + local_A[0][19] * local_B[0][19];\nunsigned short mul_4_10_0 = local_A[0][20] * local_B[0][20];\nunsigned short add_4_10 = mul_4_10_0 + local_A[0][21] * local_B[0][21];\nunsigned short mul_4_11_0 = local_A[0][22] * local_B[0][22];\nunsigned short add_4_11 = mul_4_11_0 + local_A[0][23] * local_B[0][23];\nunsigned short mul_4_12_0 = local_A[0][24] * local_B[0][24];\nunsigned short add_4_12 = mul_4_12_0 + local_A[0][25] * local_B[0][25];\nunsigned short mul_4_13_0 = local_A[0][26] * local_B[0][26];\nunsigned short add_4_13 = mul_4_13_0 + local_A[0][27] * local_B[0][27];\nunsigned short mul_4_14_0 = local_A[0][28] * local_B[0][28];\nunsigned short add_4_14 = mul_4_14_0 + local_A[0][29] * local_B[0][29];\nunsigned 
short mul_4_15_0 = local_A[0][30] * local_B[0][30];\nunsigned short add_4_15 = mul_4_15_0 + local_A[0][31] * local_B[0][31];\nunsigned short add_3_0 = add_4_0 + add_4_1;\nunsigned short add_3_1 = add_4_2 + add_4_3;\nunsigned short add_3_2 = add_4_4 + add_4_5;\nunsigned short add_3_3 = add_4_6 + add_4_7;\nunsigned short add_3_4 = add_4_8 + add_4_9;\nunsigned short add_3_5 = add_4_10 + add_4_11;\nunsigned short add_3_6 = add_4_12 + add_4_13;\nunsigned short add_3_7 = add_4_14 + add_4_15;\nunsigned short add_2_0 = add_3_0 + add_3_1;\nunsigned short add_2_1 = add_3_2 + add_3_3;\nunsigned short add_2_2 = add_3_4 + add_3_5;\nunsigned short add_2_3 = add_3_6 + add_3_7;\nunsigned short add_1_0 = add_2_0 + add_2_1;\nunsigned short add_1_1 = add_2_2 + add_2_3;\nunsigned short add_0_0 = add_1_0 + add_1_1;\nlocal_C[c7][c6] += add_0_0;\n"
  },
  {
    "path": "autosa_tests/large/mm_int16/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1] \nsp=kernel0_1.C:DDR[3]\n"
  },
  {
    "path": "autosa_tests/large/mm_int16/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/large/mm_int16/kernel.c",
    "content": "#include \"kernel.h\"\n\nint main(int argc, char **argv) {\n//  data_t A[I][K], B[K][J], C[I][J], C_golden[I][J]; \n  static data_t A[I][K], B[J][K], C[I][J], C_golden[I][J];\n\n  for (int i = 0; i < I; i++) \n    for (int k = 0; k < K; k++) {\n      A[i][k] = rand() % 100;\n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n      B[j][k] = rand() % 100;\n    }\n\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (abs(C_golden[i][j] - C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/large/mm_int16/kernel.h",
    "content": "#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"math.h\"\n\ntypedef unsigned short data_t;\n#define I 1024\n#define J 1024\n#define K 1024 "
  },
  {
    "path": "autosa_tests/large/mm_int16/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/mm_int16/step1-run-hls.tcl",
    "content": "open_project kernel0\nset_top kernel0\nadd_files \"src/kernel_kernel.cpp\"\n#add_files -tb PATH_TO_TESTBENCH_FILE\n\nopen_solution solution\n\n#u250\nset_part xcu250-figd2104-2L-e\n\n# u280\n#set_part xcu280-fsvh2892-2L-e\n\n# 300 MHz\ncreate_clock -period 3.333\n\nconfig_dataflow -strict_mode warning\nset_clock_uncertainty 27.000000%\nconfig_rtl -enable_maxiConservative=1\nconfig_interface -m_axi_addr64\n\n# to enable integration with Vitis\nconfig_sdx -target xocc\n\n#csim_design\ncsynth_design\nclose_project\nexit\n"
  },
  {
    "path": "autosa_tests/large/mm_int16/step2-autobridge.py",
    "content": "#! /usr/bin/python3.6\n\n# add the path to where you place the autobridge source code\nimport sys\nsys.path.append('../src')\n\nimport graph\nfrom formator import FormatHLS\nimport collections\nimport os\nimport subprocess\n\n\"\"\"\nAutoBridge divides the target device as follows and assigns each HLS function to one slot.\nFor more details, please refer to the AutoBridge paper.\n\n      u250                     u280\n   -----------\n 3 |    |    |\n   |----|----|              -----------\n 2 |    |    |            2 |    |    |\n   |----|----|              |----|----|\n 1 |    |    |            1 |    |    |\n   |----|----|              |----|----|\n 0 |    |    |            0 |    |    |\n   -----------              -----------\n     0    1                   0    1\n\"\"\"\n\n################### Modify Accordingly ###############################\n\n# (1) fill in basic information\nproject_path = '/home/jaywang/doc_examples/mm_int16_ab/kernel0' # path to your hls project\n#project_path = '/home/jaywang/doc_examples/mm_ab/kernel0' # path to your hls project\ntop_name = 'kernel0' # name of the top function in your hls design\nsolution_path = f'{project_path}/solution/'\nproject_name = 'kernel0'\nboard_name = 'u250' # or 'u280'\n# where the results will be saved. 
Your HLS project will be copied there and your top RTL will be replaced.\n# Note that if the directory already exists, it will be deleted and recreated\n\n# (2) specify how your design connects to the external memory\n\"\"\" Example:\n\nvoid kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n{\n  #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n  #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n\n  load_p1 (p1, ...);\n  load_p2 (p2, ...);\n}\n\n--------------------------------------\n\nIn this example, the pointers p1 and p2 become M_AXI controllers that connect to the dedicated DDR IPs.\nIf you want p1 to connect to DDR 2 in the 2nd SLR, you need to specify that the corresponding RTL controller must be floorplanned in the 2nd SLR.\nMeanwhile, your function load_p1() talks to the M_AXI controller through an AXI interface, which cannot be easily pipelined.\nThus the RTL module corresponding to load_p1() must also be in the 2nd SLR in this example.\nSince load_p1() communicates with the rest of your design through FIFO interfaces, you don't need to specify the location of the other modules.\n\n(transparent)|                        (user visible)\n             |\n   Vitis     |                    what your HLS design becomes\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p1)                       (load_p1)\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p2)                       (load_p2)\n             |\n             | S_AXI\nPCIe    <--- | ----> S_AXI controller\n             |\n\"\"\"\n\n# on the left side or the right side of an SLR\nDDR_loc_2d_x = collections.defaultdict(dict)\n\n# on which SLR\nDDR_loc_2d_y = 
collections.defaultdict(dict)\n\n# use DDR 0, 1, 3\nDDR_loc_2d_y['A_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_x['A_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_A_m_axi_U'] = 0\nDDR_loc_2d_x['kernel0_gmem_A_m_axi_U'] = 0\n\nDDR_loc_2d_y['B_IO_L3_in_serialize_U0'] = 1\nDDR_loc_2d_x['B_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_B_m_axi_U'] = 1\nDDR_loc_2d_x['kernel0_gmem_B_m_axi_U'] = 0\n\nDDR_loc_2d_y['C_drain_IO_L3_out_serialize_U0'] = 3\nDDR_loc_2d_x['C_drain_IO_L3_out_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_C_m_axi_U'] = 3\nDDR_loc_2d_x['kernel0_gmem_C_m_axi_U'] = 0\n\nDDR_loc_2d_y['kernel0_control_s_axi_U'] = 0\n\n# (3) specify DDR information\n# Each instantiated DDR controller consumes a non-trivial amount of resources.\n# To improve the floorplan, you need to specify which DDRs have been enabled.\n# In the example above, p1 connects to DDR-2 in SLR-2 and p2 connects to DDR-1 in SLR-1.\n# If you want to use all DDRs, for example, set it to [1, 1, 1, 1]\nDDR_enable = [1, 1, 0, 1]\n\n# (4) specify how much of the resources can be used in each slot\n# This can force the design to be placed evenly across the device and avoid local congestion\n\"\"\" Example:\n   -----------\n 3 |0.76|0.62|\n   |----|----|\n 2 |0.74|0.61|\n   |----|----|\n 1 |0.75|0.6 |\n   |----|----|\n 0 | 0.7|0.6 |\n   -----------\n     0    1\n\"\"\"\nmax_usage_ratio_2d = [ [0.85, 0.7], [0.85, 0.7], [0.85, 0.85], [0.85, 0.7] ]\n\n\n##################### DON'T TOUCH THE SECTION BELOW #################################\ntarget_dir = '/home/jaywang/doc_examples/mm_int16_ab/autobridge'\n\nformator = FormatHLS(\n  rpt_path = f'{solution_path}/syn/report/',\n  hls_sche_path = f'{solution_path}/.autopilot/db/',\n  top_hdl_path = f'{solution_path}/syn/verilog/{top_name}_{top_name}.v',\n  top_name = top_name,\n  DDR_loc_2d_x = DDR_loc_2d_x,\n  DDR_loc_2d_y = DDR_loc_2d_y,\n  DDR_enable = DDR_enable,\n  max_usage_ratio_2d = max_usage_ratio_2d,\n  board_name = 
board_name,\n  target_dir = target_dir,\n  relay_station_count = lambda x : 2 * x, # how many levels of relay stations to add for x-unit of crossing\n  max_search_time = 600,\n  NaiveBalance = True)\n\n# run floorplanning\ng = graph.Graph(formator)\n\n# move results to target dir\nif (os.path.isdir(target_dir)):\n  subprocess.run(['rm', '-rf', f'{target_dir}'])\nsubprocess.run(['mkdir', f'{target_dir}/'])\nsubprocess.run(['cp', '-r', project_path, f'{target_dir}/{project_name}'])\nsubprocess.run(['cp', os.path.realpath(__file__), f'{target_dir}/archived_source.txt'])\nsubprocess.run(['chmod', '+w', '-R', f'{target_dir}'])\nsubprocess.run(['cp', 'constraint.tcl', target_dir])\nsubprocess.run(['cp', 'pack_xo.tcl', target_dir])\nsubprocess.run(['cp', 'autobridge.log', target_dir])\nsubprocess.run(['cp', f'{top_name}_{top_name}.v', f'{target_dir}/{project_name}/solution/syn/verilog/'])\n\n# clean up\nos.system('rm *.lp')\nsubprocess.run(['rm', 'parser.out'])\nsubprocess.run(['rm', 'parsetab.py'])\nsubprocess.run(['rm', '-rf', '__pycache__'])\n\n"
  },
  {
    "path": "autosa_tests/large/mm_int16/step3-pack-xo.tcl",
    "content": "open_project kernel0\nopen_solution solution\nexport_design -rtl verilog -format ip_catalog -xo kernel0.xo\n\nclose_project\nputs \"Pack XO successfully\"\nexit\n"
  },
  {
    "path": "autosa_tests/large/mm_int16/step4-run-vitis.sh",
    "content": "OUTPUT_DIR=\"$(pwd)/vitis_run\"\n\n# name of the top function\nTOP=kernel0\n\n# choose the target device\nPLATFORM=xilinx_u250_xdma_201830_2\n#PLATFORM=xilinx_u280_xdma_201920_3\n\nXO=\"$(pwd)/kernel0.xo\"\n\n# For different approaches see UG904-vivado-implementation\n#STRATEGY=\"Default\"\nSTRATEGY=\"EarlyBlockPlacement\"\n\n# remove the unused '--connectivity.sp' options for v++ if some DDRs are not used\n# Note: ARG_FOR_DDR_<n> below is 1-based, i.e., ARG_FOR_DDR_1 maps to DDR[0]\n# Example: if we map p1 to DDR[3] and p2 to DDR[0]\n#\n# void kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n# {\n#   #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n#   #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n# \n#   load_p1 (p1, ...);\n#   load_p2 (p2, ...);\n# }\n#\n# ARG_FOR_DDR_1=p2\n# ARG_FOR_DDR_4=p1\n# Should remove '--connectivity.sp' for DDR[1] and DDR[2]\n\nARG_FOR_DDR_1=A\nARG_FOR_DDR_2=B\n#ARG_FOR_DDR_3=\"YOUR_HLS_ARGUMENT_NAME_FOR_DDR_3\"\nARG_FOR_DDR_4=C\n\n# the constraint file containing the floorplan results\n# WARNING: must use an absolute path\nCONSTRAINT=\"$(pwd)/constraint.tcl\"\nif [ ! -f \"$CONSTRAINT\" ]; then\n    echo \"no constraint file found\" >&2\n    exit 1\nfi\n\nv++ \\\n  --link \\\n  --output \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.xclbin\" \\\n  --kernel ${TOP} \\\n  --platform ${PLATFORM} \\\n  --target hw \\\n  --report_level 2 \\\n  --temp_dir \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.temp\" \\\n  --optimize 3 \\\n  --connectivity.nk ${TOP}:1:${TOP}_1 \\\n  --max_memory_ports ${TOP} \\\n  --save-temps \\\n  ${XO} \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_1}:DDR[0] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_2}:DDR[1] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_4}:DDR[3] \\\n  --kernel_frequency 300 \\\n  --vivado.prop run.impl_1.STEPS.PLACE_DESIGN.ARGS.DIRECTIVE=$STRATEGY \\\n  --vivado.prop run.impl_1.STEPS.OPT_DESIGN.TCL.PRE=$CONSTRAINT\n"
  },
  {
    "path": "autosa_tests/large/mm_int16/unroll.py",
    "content": "import math\n\n# Modify the parameters here\nUNROLL_FACTOR = 32\nDATA_T = 'unsigned short'\n\n# Generate the code\ndata_type = DATA_T\nlevel = int(math.log2(UNROLL_FACTOR))\nfor layer in range(level - 1, -1, -1):\n    pair = int(math.pow(2, layer))\n    for i in range(pair):\n        # data_t tmp_[layer]_[pair] = tmp_[layer+1]_[pair*2]_[pair*2+1]\n        if layer == level - 1:\n            print(f'{data_type} mul_{layer}_{i}_0 = local_A[0][{i*2}] * local_B[0][{i*2}];')\n            print(f'{data_type} add_{layer}_{i} = mul_{layer}_{i}_0 + local_A[0][{i*2+1}] * local_B[0][{i*2+1}];')\n        else:\n            print(f'{data_type} add_{layer}_{i} = add_{layer+1}_{i*2} + add_{layer+1}_{i*2+1};')\nprint('local_C[c7][c6] += add_0_0;')\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all clean check\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/README.md",
    "content": "# Matrix Multiplication in int8 (Large)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/large/mm_int8/kernel.c\nautosa_tests/large/mm_int8/kernel.h\nautosa_tests/large/mm_int8/simd_info.json\nautosa_tests/large/mm_int8/Makefile\nautosa_tests/large/mm_int8/connectivity.cfg\n```\n\n__Command__:\n```\n./autosa ./autosa_tests/large/mm_int8/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[264,256,64];kernel[]->latency[11,32];kernel[]->simd[64]}\" --simd-info=./autosa_tests/large/mm_int8/simd_info.json --host-serialize --data-pack-sizes=\"{kernel[]->A[32,32,64];kernel[]->B[32,32,64];kernel[]->C[32,32,64]}\"\n```\n\nAfter compilation, all generated files are placed under the directory `autosa.tmp/output/src`. Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/large/mm_int8/Makefile autosa.tmp/output/\ncp autosa_tests/large/mm_int8/connectivity.cfg autosa.tmp/output/\n```\n\nRun `make all` to build both the hardware binary (xclbin) and the host executable.\n\n```\ncd autosa.tmp/output\nmake all\n```"
  },
  {
    "path": "autosa_tests/large/mm_int8/code.c",
    "content": "char mul_5_0_0 = local_A[0][0] * local_B[0][0];\nchar add_5_0 = mul_5_0_0 + local_A[0][1] * local_B[0][1];\nchar mul_5_1_0 = local_A[0][2] * local_B[0][2];\nchar add_5_1 = mul_5_1_0 + local_A[0][3] * local_B[0][3];\nchar mul_5_2_0 = local_A[0][4] * local_B[0][4];\nchar add_5_2 = mul_5_2_0 + local_A[0][5] * local_B[0][5];\nchar mul_5_3_0 = local_A[0][6] * local_B[0][6];\nchar add_5_3 = mul_5_3_0 + local_A[0][7] * local_B[0][7];\nchar mul_5_4_0 = local_A[0][8] * local_B[0][8];\nchar add_5_4 = mul_5_4_0 + local_A[0][9] * local_B[0][9];\nchar mul_5_5_0 = local_A[0][10] * local_B[0][10];\nchar add_5_5 = mul_5_5_0 + local_A[0][11] * local_B[0][11];\nchar mul_5_6_0 = local_A[0][12] * local_B[0][12];\nchar add_5_6 = mul_5_6_0 + local_A[0][13] * local_B[0][13];\nchar mul_5_7_0 = local_A[0][14] * local_B[0][14];\nchar add_5_7 = mul_5_7_0 + local_A[0][15] * local_B[0][15];\nchar mul_5_8_0 = local_A[0][16] * local_B[0][16];\nchar add_5_8 = mul_5_8_0 + local_A[0][17] * local_B[0][17];\nchar mul_5_9_0 = local_A[0][18] * local_B[0][18];\nchar add_5_9 = mul_5_9_0 + local_A[0][19] * local_B[0][19];\nchar mul_5_10_0 = local_A[0][20] * local_B[0][20];\nchar add_5_10 = mul_5_10_0 + local_A[0][21] * local_B[0][21];\nchar mul_5_11_0 = local_A[0][22] * local_B[0][22];\nchar add_5_11 = mul_5_11_0 + local_A[0][23] * local_B[0][23];\nchar mul_5_12_0 = local_A[0][24] * local_B[0][24];\nchar add_5_12 = mul_5_12_0 + local_A[0][25] * local_B[0][25];\nchar mul_5_13_0 = local_A[0][26] * local_B[0][26];\nchar add_5_13 = mul_5_13_0 + local_A[0][27] * local_B[0][27];\nchar mul_5_14_0 = local_A[0][28] * local_B[0][28];\nchar add_5_14 = mul_5_14_0 + local_A[0][29] * local_B[0][29];\nchar mul_5_15_0 = local_A[0][30] * local_B[0][30];\nchar add_5_15 = mul_5_15_0 + local_A[0][31] * local_B[0][31];\nchar mul_5_16_0 = local_A[0][32] * local_B[0][32];\nchar add_5_16 = mul_5_16_0 + local_A[0][33] * local_B[0][33];\nchar mul_5_17_0 = local_A[0][34] * local_B[0][34];\nchar add_5_17 = mul_5_17_0 
+ local_A[0][35] * local_B[0][35];\nchar mul_5_18_0 = local_A[0][36] * local_B[0][36];\nchar add_5_18 = mul_5_18_0 + local_A[0][37] * local_B[0][37];\nchar mul_5_19_0 = local_A[0][38] * local_B[0][38];\nchar add_5_19 = mul_5_19_0 + local_A[0][39] * local_B[0][39];\nchar mul_5_20_0 = local_A[0][40] * local_B[0][40];\nchar add_5_20 = mul_5_20_0 + local_A[0][41] * local_B[0][41];\nchar mul_5_21_0 = local_A[0][42] * local_B[0][42];\nchar add_5_21 = mul_5_21_0 + local_A[0][43] * local_B[0][43];\nchar mul_5_22_0 = local_A[0][44] * local_B[0][44];\nchar add_5_22 = mul_5_22_0 + local_A[0][45] * local_B[0][45];\nchar mul_5_23_0 = local_A[0][46] * local_B[0][46];\nchar add_5_23 = mul_5_23_0 + local_A[0][47] * local_B[0][47];\nchar mul_5_24_0 = local_A[0][48] * local_B[0][48];\nchar add_5_24 = mul_5_24_0 + local_A[0][49] * local_B[0][49];\nchar mul_5_25_0 = local_A[0][50] * local_B[0][50];\nchar add_5_25 = mul_5_25_0 + local_A[0][51] * local_B[0][51];\nchar mul_5_26_0 = local_A[0][52] * local_B[0][52];\nchar add_5_26 = mul_5_26_0 + local_A[0][53] * local_B[0][53];\nchar mul_5_27_0 = local_A[0][54] * local_B[0][54];\nchar add_5_27 = mul_5_27_0 + local_A[0][55] * local_B[0][55];\nchar mul_5_28_0 = local_A[0][56] * local_B[0][56];\nchar add_5_28 = mul_5_28_0 + local_A[0][57] * local_B[0][57];\nchar mul_5_29_0 = local_A[0][58] * local_B[0][58];\nchar add_5_29 = mul_5_29_0 + local_A[0][59] * local_B[0][59];\nchar mul_5_30_0 = local_A[0][60] * local_B[0][60];\nchar add_5_30 = mul_5_30_0 + local_A[0][61] * local_B[0][61];\nchar mul_5_31_0 = local_A[0][62] * local_B[0][62];\nchar add_5_31 = mul_5_31_0 + local_A[0][63] * local_B[0][63];\nchar add_4_0 = add_5_0 + add_5_1;\nchar add_4_1 = add_5_2 + add_5_3;\nchar add_4_2 = add_5_4 + add_5_5;\nchar add_4_3 = add_5_6 + add_5_7;\nchar add_4_4 = add_5_8 + add_5_9;\nchar add_4_5 = add_5_10 + add_5_11;\nchar add_4_6 = add_5_12 + add_5_13;\nchar add_4_7 = add_5_14 + add_5_15;\nchar add_4_8 = add_5_16 + add_5_17;\nchar add_4_9 = add_5_18 + 
add_5_19;\nchar add_4_10 = add_5_20 + add_5_21;\nchar add_4_11 = add_5_22 + add_5_23;\nchar add_4_12 = add_5_24 + add_5_25;\nchar add_4_13 = add_5_26 + add_5_27;\nchar add_4_14 = add_5_28 + add_5_29;\nchar add_4_15 = add_5_30 + add_5_31;\nchar add_3_0 = add_4_0 + add_4_1;\nchar add_3_1 = add_4_2 + add_4_3;\nchar add_3_2 = add_4_4 + add_4_5;\nchar add_3_3 = add_4_6 + add_4_7;\nchar add_3_4 = add_4_8 + add_4_9;\nchar add_3_5 = add_4_10 + add_4_11;\nchar add_3_6 = add_4_12 + add_4_13;\nchar add_3_7 = add_4_14 + add_4_15;\nchar add_2_0 = add_3_0 + add_3_1;\nchar add_2_1 = add_3_2 + add_3_3;\nchar add_2_2 = add_3_4 + add_3_5;\nchar add_2_3 = add_3_6 + add_3_7;\nchar add_1_0 = add_2_0 + add_2_1;\nchar add_1_1 = add_2_2 + add_2_3;\nchar add_0_0 = add_1_0 + add_1_1;\n#pragma HLS RESOURCE variable=mul_5_0_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_1_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_2_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_3_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_4_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_5_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_6_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_7_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_8_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_9_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_10_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_11_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_12_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_13_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_14_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_15_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_16_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_17_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_18_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_19_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_20_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_21_0 core=Mul_LUT\n#pragma HLS RESOURCE 
variable=mul_5_22_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_23_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_24_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_25_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_26_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_27_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_28_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_29_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_30_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_31_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=add_4_0 core=AddSub\n#pragma HLS RESOURCE variable=add_4_1 core=AddSub\n#pragma HLS RESOURCE variable=add_4_2 core=AddSub\n#pragma HLS RESOURCE variable=add_4_3 core=AddSub\n#pragma HLS RESOURCE variable=add_4_4 core=AddSub\n#pragma HLS RESOURCE variable=add_4_5 core=AddSub\n#pragma HLS RESOURCE variable=add_4_6 core=AddSub\n#pragma HLS RESOURCE variable=add_4_7 core=AddSub\n#pragma HLS RESOURCE variable=add_4_8 core=AddSub\n#pragma HLS RESOURCE variable=add_4_9 core=AddSub\n#pragma HLS RESOURCE variable=add_4_10 core=AddSub\n#pragma HLS RESOURCE variable=add_4_11 core=AddSub\n#pragma HLS RESOURCE variable=add_4_12 core=AddSub\n#pragma HLS RESOURCE variable=add_4_13 core=AddSub\n#pragma HLS RESOURCE variable=add_4_14 core=AddSub\n#pragma HLS RESOURCE variable=add_4_15 core=AddSub\n#pragma HLS RESOURCE variable=add_3_0 core=AddSub\n#pragma HLS RESOURCE variable=add_3_1 core=AddSub\n#pragma HLS RESOURCE variable=add_3_2 core=AddSub\n#pragma HLS RESOURCE variable=add_3_3 core=AddSub\n#pragma HLS RESOURCE variable=add_3_4 core=AddSub\n#pragma HLS RESOURCE variable=add_3_5 core=AddSub\n#pragma HLS RESOURCE variable=add_3_6 core=AddSub\n#pragma HLS RESOURCE variable=add_3_7 core=AddSub\n#pragma HLS RESOURCE variable=add_2_0 core=AddSub\n#pragma HLS RESOURCE variable=add_2_1 core=AddSub\n#pragma HLS RESOURCE variable=add_2_2 core=AddSub\n#pragma HLS RESOURCE variable=add_2_3 core=AddSub\n#pragma HLS RESOURCE 
variable=add_1_0 core=AddSub\n#pragma HLS RESOURCE variable=add_1_1 core=AddSub\n#pragma HLS RESOURCE variable=add_0_0 core=AddSub\nlocal_C[c7][c6] += add_0_0;\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1]\nsp=kernel0_1.C:DDR[3]\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/kernel.c",
    "content": "#include \"kernel.h\"\n\nint main(int argc, char **argv) {\n//  data_t A[I][K], B[K][J], C[I][J], C_golden[I][J]; \n  static data_t A[I][K], B[J][K], C[I][J], C_golden[I][J];\n\n  for (int i = 0; i < I; i++) \n    for (int k = 0; k < K; k++) {\n      A[i][k] = 1;\n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n      B[j][k] = 1;\n    }\n\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (abs(C_golden[i][j] - C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/kernel.h",
    "content": "#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"math.h\"\n\ntypedef char data_t;\n//#define I 1024 \n//#define J 1024 \n//#define K 1024 \n\n// Test case 1\n// kernel3 2D IxJ\n#define I 1056\n#define J 1024 \n#define K 1024 "
  },
  {
    "path": "autosa_tests/large/mm_int8/kernel_kernel_opt.cpp",
    "content": "#include <ap_int.h>\n#include <hls_stream.h>\n\n#define min(x,y) ((x < y) ? x : y)\n#define max(x,y) ((x > y) ? x : y)\n\n/* Data Type */\ntypedef char A_t1;\ntypedef char B_t1;\ntypedef char C_t1;\ntypedef ap_uint<512> A_t64;\ntypedef ap_uint<512> B_t64;\ntypedef ap_uint<256> C_t32;\n/* Data Type */\n\nextern \"C\" {\nvoid kernel0(A_t64 *A, B_t64 *B, C_t32 *C);\n}\nvoid A_IO_L2_in_intra_trans(int idx, int c0, int c1, int c2, A_t64 local_A[11][1], hls::stream<A_t64> &fifo_A_local_out, bool intra_trans_en);\nvoid A_IO_L2_in_inter_trans(int idx, int c0, int c1, int c2, A_t64 local_A[11][1], hls::stream<A_t64> &fifo_A_in, hls::stream<A_t64> &fifo_A_out, bool inter_trans_en);\nvoid A_IO_L2_in_inter_trans_boundary(int idx, int c0, int c1, int c2, A_t64 local_A[11][1], hls::stream<A_t64> &fifo_A_in, bool inter_trans_en);\nvoid B_IO_L2_in_intra_trans(int idx, int c0, int c1, int c2, B_t64 local_B[32][1], hls::stream<B_t64> &fifo_B_local_out, bool intra_trans_en);\nvoid B_IO_L2_in_inter_trans(int idx, int c0, int c1, int c2, B_t64 local_B[32][1], hls::stream<B_t64> &fifo_B_in, hls::stream<B_t64> &fifo_B_out, bool inter_trans_en);\nvoid B_IO_L2_in_inter_trans_boundary(int idx, int c0, int c1, int c2, B_t64 local_B[32][1], hls::stream<B_t64> &fifo_B_in, bool inter_trans_en);\nvoid PE_wrapper(int idx, int idy, hls::stream<A_t64> &fifo_A_in, hls::stream<A_t64> &fifo_A_out, hls::stream<B_t64> &fifo_B_in, hls::stream<B_t64> &fifo_B_out, hls::stream<char> &fifo_C_drain_out);\nvoid C_drain_IO_L1_out_intra_trans(int idx, int idy, int c0, int c1, C_t32 local_C[11][1], hls::stream<char> &fifo_C_drain_local_in);\nvoid C_drain_IO_L1_out_inter_trans(int idx, int idy, int c0, int c1, C_t32 local_C[11][1], hls::stream<C_t32> &fifo_C_drain_in, hls::stream<C_t32> &fifo_C_drain_out);\nvoid C_drain_IO_L1_out_inter_trans_boundary(int idx, int idy, int c0, int c1, C_t32 local_C[11][1], hls::stream<C_t32> &fifo_C_drain_out);\nvoid C_drain_IO_L1_out_wrapper(int idx, int idy, 
hls::stream<C_t32> &fifo_C_drain_in, hls::stream<C_t32> &fifo_C_drain_out, hls::stream<char> &fifo_C_drain_local_in);\nvoid C_drain_IO_L1_out_boundary_wrapper(int idx, int idy, hls::stream<C_t32> &fifo_C_drain_out, hls::stream<char> &fifo_C_drain_local_in);\n\n/* Module Definition */\nvoid A_IO_L3_in(hls::stream<A_t64> &fifo_A_serialize, hls::stream<A_t64> &fifo_A_local_out) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  /* Variable Declaration */\n\n  for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n    for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1)\n      for (ap_uint<5> c2 = 0; c2 <= 15; c2 += 1) {\n        // array\n        // io_L3\n        for (ap_uint<6> c3 = 0; c3 <= 23; c3 += 1) {\n          // io_L2\n          for (ap_uint<5> c4 = 0; c4 <= 10; c4 += 1) {\n          #pragma HLS PIPELINE II=1\n            // access_coalesce\n            // access_serialize\n            {\n              A_t64 in_data;\n              A_t64 out_data;\n              in_data = fifo_A_serialize.read();\n              out_data = in_data;\n              fifo_A_local_out.write(out_data);\n            }\n          }\n        }\n      }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid A_IO_L3_in_serialize(A_t64 *A, hls::stream<A_t64> &fifo_A_local_out) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  /* Variable Declaration */\n\n  for (ap_uint<18> i = 0; i < 67584; i++) {\n  #pragma HLS PIPELINE II=1\n    A_t64 fifo_data;\n    fifo_data = A[i];\n    fifo_A_local_out.write(fifo_data);\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid A_IO_L2_in_intra_trans(int idx, int c0, int c1, int c2, A_t64 local_A[11][1], hls::stream<A_t64> &fifo_A_local_out, bool intra_trans_en)\n {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx; // module id\n  /* Variable Declaration */\n\n  if (!intra_trans_en) return;\n\n\n  // io_L2\n  // io_L1\n  // pe\n  // latency\n  for (ap_uint<6> c6 = 0; c6 <= 31; c6 += 1) {\n    // latency\n    for 
(ap_uint<5> c7 = 0; c7 <= 10; c7 += 1) {\n    #pragma HLS PIPELINE II=1\n      // simd\n      {\n        A_t64 in_data;\n        A_t64 out_data;\n        in_data = local_A[c7][0];\n        out_data = in_data;\n        fifo_A_local_out.write(out_data);\n      }\n    }\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid A_IO_L2_in_inter_trans(int idx, int c0, int c1, int c2, A_t64 local_A[11][1], hls::stream<A_t64> &fifo_A_in, hls::stream<A_t64> &fifo_A_out, bool inter_trans_en)\n {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx; // module id\n  /* Variable Declaration */\n\n  if (!inter_trans_en) return;\n\n  for (ap_uint<6> c3 = p0; c3 <= 23; c3 += 1) {\n    // io_L2\n    if (c3 == p0) {\n      for (ap_uint<5> c4 = 0; c4 <= 10; c4 += 1) {\n      #pragma HLS PIPELINE II=1\n        // access_coalesce\n        {\n          A_t64 in_data;\n          A_t64 out_data;\n          in_data = fifo_A_in.read();\n          out_data = in_data;\n          local_A[c4][0] = out_data;\n        }\n      }\n    } else {\n      for (ap_uint<5> c4 = 0; c4 <= 10; c4 += 1) {\n      #pragma HLS PIPELINE II=1\n        // access_coalesce\n        {\n          A_t64 in_data;\n          A_t64 out_data;\n          in_data = fifo_A_in.read();\n          out_data = in_data;\n          fifo_A_out.write(out_data);\n        }\n      }\n    }\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid A_IO_L2_in_inter_trans_boundary(int idx, int c0, int c1, int c2, A_t64 local_A[11][1], hls::stream<A_t64> &fifo_A_in, bool inter_trans_en)\n {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx; // module id\n  /* Variable Declaration */\n\n  if (!inter_trans_en) return;\n\n  for (ap_uint<6> c3 = p0; c3 <= 23; c3 += 1)\n    if (c3 == p0) {\n      // io_L2\n      for (ap_uint<5> c4 = 0; c4 <= 10; c4 += 1) {\n      #pragma HLS PIPELINE II=1\n        // access_coalesce\n        {\n          A_t64 in_data;\n          A_t64 out_data;\n          
in_data = fifo_A_in.read();\n          out_data = in_data;\n          local_A[c4][0] = out_data;\n        }\n      }\n    }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid A_IO_L2_in(int idx, hls::stream<A_t64> &fifo_A_in, hls::stream<A_t64> &fifo_A_out, hls::stream<A_t64> &fifo_A_local_out) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx; // module id\n  A_t64 local_A_ping[11][1];\n  #pragma HLS RESOURCE variable=local_A_ping core=RAM_1P_BRAM\n  A_t64 local_A_pong[11][1];\n  #pragma HLS RESOURCE variable=local_A_pong core=RAM_1P_BRAM\n  bool arb = 0;\n  bool inter_trans_en = 1;\n  bool intra_trans_en = 0;\n  int c0, c0_prev;\n  int c1, c1_prev;\n  int c2, c2_prev;\n  /* Variable Declaration */\n\n  {\n    for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n      for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1)\n        for (ap_uint<5> c2 = 0; c2 <= 15; c2 += 1) {\n          // array\n          // io_L3\n          {\n            if (arb == 0) {\n              A_IO_L2_in_inter_trans(\n                /* module id */ idx, \n                /* host iter */ c0, \n                /* host iter */ c1, \n                /* host iter */ c2, \n                /* array */ local_A_pong, \n                /* fifo */ fifo_A_in, \n                /* fifo */ fifo_A_out, \n                /* enable */ inter_trans_en\n              );\n              A_IO_L2_in_intra_trans(\n                /* module id */ idx, \n                /* host iter */ c0_prev, \n                /* host iter */ c1_prev, \n                /* host iter */ c2_prev, \n                /* array */ local_A_ping, \n                /* fifo */ fifo_A_local_out, \n                /* enable */ intra_trans_en\n              );\n            } else {\n              A_IO_L2_in_inter_trans(\n                /* module id */ idx, \n                /* host iter */ c0, \n                /* host iter */ c1, \n                /* host iter */ c2, \n                /* array */ local_A_ping, \n                /* 
fifo */ fifo_A_in, \n                /* fifo */ fifo_A_out, \n                /* enable */ inter_trans_en\n              );\n              A_IO_L2_in_intra_trans(\n                /* module id */ idx, \n                /* host iter */ c0_prev, \n                /* host iter */ c1_prev, \n                /* host iter */ c2_prev, \n                /* array */ local_A_pong, \n                /* fifo */ fifo_A_local_out, \n                /* enable */ intra_trans_en\n              );\n            }\n            intra_trans_en = 1;\n            arb = !arb;\n            c0_prev = c0;\n            c1_prev = c1;\n            c2_prev = c2;\n          }\n        }\n    if (arb == 0) {\n      A_IO_L2_in_intra_trans(\n        /* module id */ idx, \n        /* host iter */ c0_prev, \n        /* host iter */ c1_prev, \n        /* host iter */ c2_prev, \n        /* array */ local_A_ping, \n        /* fifo */ fifo_A_local_out, \n        /* enable */ intra_trans_en\n      );\n    } else {\n      A_IO_L2_in_intra_trans(\n        /* module id */ idx, \n        /* host iter */ c0_prev, \n        /* host iter */ c1_prev, \n        /* host iter */ c2_prev, \n        /* array */ local_A_pong, \n        /* fifo */ fifo_A_local_out, \n        /* enable */ intra_trans_en\n      );\n    }\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid A_IO_L2_in_boundary(int idx, hls::stream<A_t64> &fifo_A_in, hls::stream<A_t64> &fifo_A_local_out) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx; // module id\n  A_t64 local_A_ping[11][1];\n  #pragma HLS RESOURCE variable=local_A_ping core=RAM_1P_BRAM\n  A_t64 local_A_pong[11][1];\n  #pragma HLS RESOURCE variable=local_A_pong core=RAM_1P_BRAM\n  bool arb = 0;\n  bool inter_trans_en = 1;\n  bool intra_trans_en = 0;\n  int c0, c0_prev;\n  int c1, c1_prev;\n  int c2, c2_prev;\n  /* Variable Declaration */\n\n  {\n    for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n      for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1)\n        for 
(ap_uint<5> c2 = 0; c2 <= 15; c2 += 1) {\n          // array\n          // io_L3\n          {\n            if (arb == 0) {\n              A_IO_L2_in_inter_trans_boundary(\n                /* module id */ idx, \n                /* host iter */ c0, \n                /* host iter */ c1, \n                /* host iter */ c2, \n                /* array */ local_A_pong, \n                /* fifo */ fifo_A_in, \n                /* enable */ inter_trans_en\n              );\n              A_IO_L2_in_intra_trans(\n                /* module id */ idx, \n                /* host iter */ c0_prev, \n                /* host iter */ c1_prev, \n                /* host iter */ c2_prev, \n                /* array */ local_A_ping, \n                /* fifo */ fifo_A_local_out, \n                /* enable */ intra_trans_en\n              );\n            } else {\n              A_IO_L2_in_inter_trans_boundary(\n                /* module id */ idx, \n                /* host iter */ c0, \n                /* host iter */ c1, \n                /* host iter */ c2, \n                /* array */ local_A_ping, \n                /* fifo */ fifo_A_in, \n                /* enable */ inter_trans_en\n              );\n              A_IO_L2_in_intra_trans(\n                /* module id */ idx, \n                /* host iter */ c0_prev, \n                /* host iter */ c1_prev, \n                /* host iter */ c2_prev, \n                /* array */ local_A_pong, \n                /* fifo */ fifo_A_local_out, \n                /* enable */ intra_trans_en\n              );\n            }\n            intra_trans_en = 1;\n            arb = !arb;\n            c0_prev = c0;\n            c1_prev = c1;\n            c2_prev = c2;\n          }\n        }\n    if (arb == 0) {\n      A_IO_L2_in_intra_trans(\n        /* module id */ idx, \n        /* host iter */ c0_prev, \n        /* host iter */ c1_prev, \n        /* host iter */ c2_prev, \n        /* array */ local_A_ping, \n        /* fifo */ 
fifo_A_local_out, \n        /* enable */ intra_trans_en\n      );\n    } else {\n      A_IO_L2_in_intra_trans(\n        /* module id */ idx, \n        /* host iter */ c0_prev, \n        /* host iter */ c1_prev, \n        /* host iter */ c2_prev, \n        /* array */ local_A_pong, \n        /* fifo */ fifo_A_local_out, \n        /* enable */ intra_trans_en\n      );\n    }\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid B_IO_L3_in(hls::stream<B_t64> &fifo_B_serialize, hls::stream<B_t64> &fifo_B_local_out) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  /* Variable Declaration */\n\n  for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n    for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1)\n      for (ap_uint<5> c2 = 0; c2 <= 15; c2 += 1) {\n        // array\n        // io_L3\n        for (ap_uint<4> c3 = 0; c3 <= 7; c3 += 1) {\n          // io_L2\n          for (ap_uint<6> c4 = 0; c4 <= 31; c4 += 1) {\n          #pragma HLS PIPELINE II=1\n            // access_coalesce\n            // access_serialize\n            {\n              B_t64 in_data;\n              B_t64 out_data;\n              in_data = fifo_B_serialize.read();\n              out_data = in_data;\n              fifo_B_local_out.write(out_data);\n            }\n          }\n        }\n      }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid B_IO_L3_in_serialize(B_t64 *B, hls::stream<B_t64> &fifo_B_local_out) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  /* Variable Declaration */\n\n  for (ap_uint<17> i = 0; i < 65536; i++) {\n  #pragma HLS PIPELINE II=1\n    B_t64 fifo_data;\n    fifo_data = B[i];\n    fifo_B_local_out.write(fifo_data);\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid B_IO_L2_in_intra_trans(int idx, int c0, int c1, int c2, B_t64 local_B[32][1], hls::stream<B_t64> &fifo_B_local_out, bool intra_trans_en)\n {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx; // module id\n  /* Variable Declaration */\n\n  if 
(!intra_trans_en) return;\n\n\n  // io_L2\n  // io_L1\n  // pe\n  // latency\n  for (ap_uint<6> c6 = 0; c6 <= 31; c6 += 1) {\n    // latency\n    for (ap_uint<5> c7 = 0; c7 <= 10; c7 += 1) {\n    #pragma HLS PIPELINE II=1\n      // simd\n      {\n        B_t64 in_data;\n        B_t64 out_data;\n        in_data = local_B[c6][0];\n        out_data = in_data;\n        fifo_B_local_out.write(out_data);\n      }\n    }\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid B_IO_L2_in_inter_trans(int idx, int c0, int c1, int c2, B_t64 local_B[32][1], hls::stream<B_t64> &fifo_B_in, hls::stream<B_t64> &fifo_B_out, bool inter_trans_en)\n {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx; // module id\n  /* Variable Declaration */\n\n  if (!inter_trans_en) return;\n\n  for (ap_uint<4> c3 = p0; c3 <= 7; c3 += 1) {\n    // io_L2\n    if (c3 == p0) {\n      for (ap_uint<6> c4 = 0; c4 <= 31; c4 += 1) {\n      #pragma HLS PIPELINE II=1\n        // access_coalesce\n        {\n          B_t64 in_data;\n          B_t64 out_data;\n          in_data = fifo_B_in.read();\n          out_data = in_data;\n          local_B[c4][0] = out_data;\n        }\n      }\n    } else {\n      for (ap_uint<6> c4 = 0; c4 <= 31; c4 += 1) {\n      #pragma HLS PIPELINE II=1\n        // access_coalesce\n        {\n          B_t64 in_data;\n          B_t64 out_data;\n          in_data = fifo_B_in.read();\n          out_data = in_data;\n          fifo_B_out.write(out_data);\n        }\n      }\n    }\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid B_IO_L2_in_inter_trans_boundary(int idx, int c0, int c1, int c2, B_t64 local_B[32][1], hls::stream<B_t64> &fifo_B_in, bool inter_trans_en)\n {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx; // module id\n  /* Variable Declaration */\n\n  if (!inter_trans_en) return;\n\n  for (ap_uint<4> c3 = p0; c3 <= 7; c3 += 1)\n    if (c3 == p0) {\n      // io_L2\n      for (ap_uint<6> c4 = 0; c4 <= 31; 
c4 += 1) {\n      #pragma HLS PIPELINE II=1\n        // access_coalesce\n        {\n          B_t64 in_data;\n          B_t64 out_data;\n          in_data = fifo_B_in.read();\n          out_data = in_data;\n          local_B[c4][0] = out_data;\n        }\n      }\n    }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid B_IO_L2_in(int idx, hls::stream<B_t64> &fifo_B_in, hls::stream<B_t64> &fifo_B_out, hls::stream<B_t64> &fifo_B_local_out) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx; // module id\n  B_t64 local_B_ping[32][1];\n  #pragma HLS RESOURCE variable=local_B_ping core=RAM_1P_BRAM\n  B_t64 local_B_pong[32][1];\n  #pragma HLS RESOURCE variable=local_B_pong core=RAM_1P_BRAM\n  bool arb = 0;\n  bool inter_trans_en = 1;\n  bool intra_trans_en = 0;\n  int c0, c0_prev;\n  int c1, c1_prev;\n  int c2, c2_prev;\n  /* Variable Declaration */\n\n  {\n    for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n      for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1)\n        for (ap_uint<5> c2 = 0; c2 <= 15; c2 += 1) {\n          // array\n          // io_L3\n          {\n            if (arb == 0) {\n              B_IO_L2_in_inter_trans(\n                /* module id */ idx, \n                /* host iter */ c0, \n                /* host iter */ c1, \n                /* host iter */ c2, \n                /* array */ local_B_pong, \n                /* fifo */ fifo_B_in, \n                /* fifo */ fifo_B_out, \n                /* enable */ inter_trans_en\n              );\n              B_IO_L2_in_intra_trans(\n                /* module id */ idx, \n                /* host iter */ c0_prev, \n                /* host iter */ c1_prev, \n                /* host iter */ c2_prev, \n                /* array */ local_B_ping, \n                /* fifo */ fifo_B_local_out, \n                /* enable */ intra_trans_en\n              );\n            } else {\n              B_IO_L2_in_inter_trans(\n                /* module id */ idx, \n                /* host iter 
*/ c0, \n                /* host iter */ c1, \n                /* host iter */ c2, \n                /* array */ local_B_ping, \n                /* fifo */ fifo_B_in, \n                /* fifo */ fifo_B_out, \n                /* enable */ inter_trans_en\n              );\n              B_IO_L2_in_intra_trans(\n                /* module id */ idx, \n                /* host iter */ c0_prev, \n                /* host iter */ c1_prev, \n                /* host iter */ c2_prev, \n                /* array */ local_B_pong, \n                /* fifo */ fifo_B_local_out, \n                /* enable */ intra_trans_en\n              );\n            }\n            intra_trans_en = 1;\n            arb = !arb;\n            c0_prev = c0;\n            c1_prev = c1;\n            c2_prev = c2;\n          }\n        }\n    if (arb == 0) {\n      B_IO_L2_in_intra_trans(\n        /* module id */ idx, \n        /* host iter */ c0_prev, \n        /* host iter */ c1_prev, \n        /* host iter */ c2_prev, \n        /* array */ local_B_ping, \n        /* fifo */ fifo_B_local_out, \n        /* enable */ intra_trans_en\n      );\n    } else {\n      B_IO_L2_in_intra_trans(\n        /* module id */ idx, \n        /* host iter */ c0_prev, \n        /* host iter */ c1_prev, \n        /* host iter */ c2_prev, \n        /* array */ local_B_pong, \n        /* fifo */ fifo_B_local_out, \n        /* enable */ intra_trans_en\n      );\n    }\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid B_IO_L2_in_boundary(int idx, hls::stream<B_t64> &fifo_B_in, hls::stream<B_t64> &fifo_B_local_out) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx; // module id\n  B_t64 local_B_ping[32][1];\n  #pragma HLS RESOURCE variable=local_B_ping core=RAM_1P_BRAM\n  B_t64 local_B_pong[32][1];\n  #pragma HLS RESOURCE variable=local_B_pong core=RAM_1P_BRAM\n  bool arb = 0;\n  bool inter_trans_en = 1;\n  bool intra_trans_en = 0;\n  int c0, c0_prev;\n  int c1, c1_prev;\n  int c2, 
c2_prev;\n  /* Variable Declaration */\n\n  {\n    for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n      for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1)\n        for (ap_uint<5> c2 = 0; c2 <= 15; c2 += 1) {\n          // array\n          // io_L3\n          {\n            if (arb == 0) {\n              B_IO_L2_in_inter_trans_boundary(\n                /* module id */ idx, \n                /* host iter */ c0, \n                /* host iter */ c1, \n                /* host iter */ c2, \n                /* array */ local_B_pong, \n                /* fifo */ fifo_B_in, \n                /* enable */ inter_trans_en\n              );\n              B_IO_L2_in_intra_trans(\n                /* module id */ idx, \n                /* host iter */ c0_prev, \n                /* host iter */ c1_prev, \n                /* host iter */ c2_prev, \n                /* array */ local_B_ping, \n                /* fifo */ fifo_B_local_out, \n                /* enable */ intra_trans_en\n              );\n            } else {\n              B_IO_L2_in_inter_trans_boundary(\n                /* module id */ idx, \n                /* host iter */ c0, \n                /* host iter */ c1, \n                /* host iter */ c2, \n                /* array */ local_B_ping, \n                /* fifo */ fifo_B_in, \n                /* enable */ inter_trans_en\n              );\n              B_IO_L2_in_intra_trans(\n                /* module id */ idx, \n                /* host iter */ c0_prev, \n                /* host iter */ c1_prev, \n                /* host iter */ c2_prev, \n                /* array */ local_B_pong, \n                /* fifo */ fifo_B_local_out, \n                /* enable */ intra_trans_en\n              );\n            }\n            intra_trans_en = 1;\n            arb = !arb;\n            c0_prev = c0;\n            c1_prev = c1;\n            c2_prev = c2;\n          }\n        }\n    if (arb == 0) {\n      B_IO_L2_in_intra_trans(\n        /* module id */ idx, \n        /* host 
iter */ c0_prev, \n        /* host iter */ c1_prev, \n        /* host iter */ c2_prev, \n        /* array */ local_B_ping, \n        /* fifo */ fifo_B_local_out, \n        /* enable */ intra_trans_en\n      );\n    } else {\n      B_IO_L2_in_intra_trans(\n        /* module id */ idx, \n        /* host iter */ c0_prev, \n        /* host iter */ c1_prev, \n        /* host iter */ c2_prev, \n        /* array */ local_B_pong, \n        /* fifo */ fifo_B_local_out, \n        /* enable */ intra_trans_en\n      );\n    }\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid PE(int idx, int idy, hls::stream<A_t64> &fifo_A_in, hls::stream<A_t64> &fifo_A_out, hls::stream<B_t64> &fifo_B_in, hls::stream<B_t64> &fifo_B_out, hls::stream<char> &fifo_C_drain_out) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx, p1 = idy; // module id\n  A_t1 local_A[1][64];\n  #pragma HLS ARRAY_PARTITION variable=local_A dim=0 complete\n  B_t1 local_B[1][64];\n  #pragma HLS ARRAY_PARTITION variable=local_B dim=0 complete\n  C_t1 local_C[11][32];\n  #pragma HLS RESOURCE variable=local_C core=RAM_2P_BRAM\n  /* Variable Declaration */\n\n  for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n    for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1)\n      for (ap_uint<5> c2 = 0; c2 <= 15; c2 += 1) {\n        // array\n        // pe\n        // latency\n        for (ap_uint<6> c6 = 0; c6 <= 31; c6 += 1) {\n          // latency\n          for (ap_uint<5> c7 = 0; c7 <= 10; c7 += 1) {\n          #pragma HLS PIPELINE II=1\n            {\n              {\n                A_t64 fifo_data;\n                fifo_data = fifo_A_in.read();\n                for (ap_uint<7> n = 0; n < 64; n++) {\n                #pragma HLS UNROLL\n                  union {unsigned int ui; char ut;} u;\n                  u.ui = (unsigned int)fifo_data(7, 0);\n                  local_A[0][n] = u.ut;\n                  fifo_data = fifo_data >> 8;\n                }\n              }\n              {\n                
B_t64 fifo_data;\n                fifo_data = fifo_B_in.read();\n                for (ap_uint<7> n = 0; n < 64; n++) {\n                #pragma HLS UNROLL\n                  union {unsigned int ui; char ut;} u;\n                  u.ui = (unsigned int)fifo_data(7, 0);\n                  local_B[0][n] = u.ut;\n                  fifo_data = fifo_data >> 8;\n                }\n              }\n              // simd\n              {\n                if (c2 == 0) {\n                  // hls_unroll\n                  local_C[c7][c6] = 0;\n                }\n                //for (ap_uint<7> c8 = 0; c8 <= 63; c8 += 1) {\n                //#pragma HLS UNROLL\n                //  local_C[c7][c6] = (local_C[c7][c6] + (local_A[0][c8] * local_B[0][c8]));\n                //}\n                char mul_5_0_0 = local_A[0][0] * local_B[0][0];\n                char add_5_0 = mul_5_0_0 + local_A[0][1] * local_B[0][1];\n                char mul_5_1_0 = local_A[0][2] * local_B[0][2];\n                char add_5_1 = mul_5_1_0 + local_A[0][3] * local_B[0][3];\n                char mul_5_2_0 = local_A[0][4] * local_B[0][4];\n                char add_5_2 = mul_5_2_0 + local_A[0][5] * local_B[0][5];\n                char mul_5_3_0 = local_A[0][6] * local_B[0][6];\n                char add_5_3 = mul_5_3_0 + local_A[0][7] * local_B[0][7];\n                char mul_5_4_0 = local_A[0][8] * local_B[0][8];\n                char add_5_4 = mul_5_4_0 + local_A[0][9] * local_B[0][9];\n                char mul_5_5_0 = local_A[0][10] * local_B[0][10];\n                char add_5_5 = mul_5_5_0 + local_A[0][11] * local_B[0][11];\n                char mul_5_6_0 = local_A[0][12] * local_B[0][12];\n                char add_5_6 = mul_5_6_0 + local_A[0][13] * local_B[0][13];\n                char mul_5_7_0 = local_A[0][14] * local_B[0][14];\n                char add_5_7 = mul_5_7_0 + local_A[0][15] * local_B[0][15];\n                char mul_5_8_0 = local_A[0][16] * local_B[0][16];\n                char 
add_5_8 = mul_5_8_0 + local_A[0][17] * local_B[0][17];\n                char mul_5_9_0 = local_A[0][18] * local_B[0][18];\n                char add_5_9 = mul_5_9_0 + local_A[0][19] * local_B[0][19];\n                char mul_5_10_0 = local_A[0][20] * local_B[0][20];\n                char add_5_10 = mul_5_10_0 + local_A[0][21] * local_B[0][21];\n                char mul_5_11_0 = local_A[0][22] * local_B[0][22];\n                char add_5_11 = mul_5_11_0 + local_A[0][23] * local_B[0][23];\n                char mul_5_12_0 = local_A[0][24] * local_B[0][24];\n                char add_5_12 = mul_5_12_0 + local_A[0][25] * local_B[0][25];\n                char mul_5_13_0 = local_A[0][26] * local_B[0][26];\n                char add_5_13 = mul_5_13_0 + local_A[0][27] * local_B[0][27];\n                char mul_5_14_0 = local_A[0][28] * local_B[0][28];\n                char add_5_14 = mul_5_14_0 + local_A[0][29] * local_B[0][29];\n                char mul_5_15_0 = local_A[0][30] * local_B[0][30];\n                char add_5_15 = mul_5_15_0 + local_A[0][31] * local_B[0][31];\n                char mul_5_16_0 = local_A[0][32] * local_B[0][32];\n                char add_5_16 = mul_5_16_0 + local_A[0][33] * local_B[0][33];\n                char mul_5_17_0 = local_A[0][34] * local_B[0][34];\n                char add_5_17 = mul_5_17_0 + local_A[0][35] * local_B[0][35];\n                char mul_5_18_0 = local_A[0][36] * local_B[0][36];\n                char add_5_18 = mul_5_18_0 + local_A[0][37] * local_B[0][37];\n                char mul_5_19_0 = local_A[0][38] * local_B[0][38];\n                char add_5_19 = mul_5_19_0 + local_A[0][39] * local_B[0][39];\n                char mul_5_20_0 = local_A[0][40] * local_B[0][40];\n                char add_5_20 = mul_5_20_0 + local_A[0][41] * local_B[0][41];\n                char mul_5_21_0 = local_A[0][42] * local_B[0][42];\n                char add_5_21 = mul_5_21_0 + local_A[0][43] * local_B[0][43];\n                char mul_5_22_0 = 
local_A[0][44] * local_B[0][44];\n                char add_5_22 = mul_5_22_0 + local_A[0][45] * local_B[0][45];\n                char mul_5_23_0 = local_A[0][46] * local_B[0][46];\n                char add_5_23 = mul_5_23_0 + local_A[0][47] * local_B[0][47];\n                char mul_5_24_0 = local_A[0][48] * local_B[0][48];\n                char add_5_24 = mul_5_24_0 + local_A[0][49] * local_B[0][49];\n                char mul_5_25_0 = local_A[0][50] * local_B[0][50];\n                char add_5_25 = mul_5_25_0 + local_A[0][51] * local_B[0][51];\n                char mul_5_26_0 = local_A[0][52] * local_B[0][52];\n                char add_5_26 = mul_5_26_0 + local_A[0][53] * local_B[0][53];\n                char mul_5_27_0 = local_A[0][54] * local_B[0][54];\n                char add_5_27 = mul_5_27_0 + local_A[0][55] * local_B[0][55];\n                char mul_5_28_0 = local_A[0][56] * local_B[0][56];\n                char add_5_28 = mul_5_28_0 + local_A[0][57] * local_B[0][57];\n                char mul_5_29_0 = local_A[0][58] * local_B[0][58];\n                char add_5_29 = mul_5_29_0 + local_A[0][59] * local_B[0][59];\n                char mul_5_30_0 = local_A[0][60] * local_B[0][60];\n                char add_5_30 = mul_5_30_0 + local_A[0][61] * local_B[0][61];\n                char mul_5_31_0 = local_A[0][62] * local_B[0][62];\n                char add_5_31 = mul_5_31_0 + local_A[0][63] * local_B[0][63];\n                char add_4_0 = add_5_0 + add_5_1;\n                char add_4_1 = add_5_2 + add_5_3;\n                char add_4_2 = add_5_4 + add_5_5;\n                char add_4_3 = add_5_6 + add_5_7;\n                char add_4_4 = add_5_8 + add_5_9;\n                char add_4_5 = add_5_10 + add_5_11;\n                char add_4_6 = add_5_12 + add_5_13;\n                char add_4_7 = add_5_14 + add_5_15;\n                char add_4_8 = add_5_16 + add_5_17;\n                char add_4_9 = add_5_18 + add_5_19;\n                char add_4_10 = add_5_20 + 
add_5_21;\n                char add_4_11 = add_5_22 + add_5_23;\n                char add_4_12 = add_5_24 + add_5_25;\n                char add_4_13 = add_5_26 + add_5_27;\n                char add_4_14 = add_5_28 + add_5_29;\n                char add_4_15 = add_5_30 + add_5_31;\n                char add_3_0 = add_4_0 + add_4_1;\n                char add_3_1 = add_4_2 + add_4_3;\n                char add_3_2 = add_4_4 + add_4_5;\n                char add_3_3 = add_4_6 + add_4_7;\n                char add_3_4 = add_4_8 + add_4_9;\n                char add_3_5 = add_4_10 + add_4_11;\n                char add_3_6 = add_4_12 + add_4_13;\n                char add_3_7 = add_4_14 + add_4_15;\n                char add_2_0 = add_3_0 + add_3_1;\n                char add_2_1 = add_3_2 + add_3_3;\n                char add_2_2 = add_3_4 + add_3_5;\n                char add_2_3 = add_3_6 + add_3_7;\n                char add_1_0 = add_2_0 + add_2_1;\n                char add_1_1 = add_2_2 + add_2_3;\n                char add_0_0 = add_1_0 + add_1_1;\n#pragma HLS RESOURCE variable=mul_5_0_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_1_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_2_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_3_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_4_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_5_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_6_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_7_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_8_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_9_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_10_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_11_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_12_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_13_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_14_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_15_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_16_0 core=Mul_LUT\n#pragma HLS RESOURCE 
variable=mul_5_17_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_18_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_19_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_20_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_21_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_22_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_23_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_24_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_25_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_26_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_27_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_28_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_29_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_30_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=mul_5_31_0 core=Mul_LUT\n#pragma HLS RESOURCE variable=add_4_0 core=AddSub\n#pragma HLS RESOURCE variable=add_4_1 core=AddSub\n#pragma HLS RESOURCE variable=add_4_2 core=AddSub\n#pragma HLS RESOURCE variable=add_4_3 core=AddSub\n#pragma HLS RESOURCE variable=add_4_4 core=AddSub\n#pragma HLS RESOURCE variable=add_4_5 core=AddSub\n#pragma HLS RESOURCE variable=add_4_6 core=AddSub\n#pragma HLS RESOURCE variable=add_4_7 core=AddSub\n#pragma HLS RESOURCE variable=add_4_8 core=AddSub\n#pragma HLS RESOURCE variable=add_4_9 core=AddSub\n#pragma HLS RESOURCE variable=add_4_10 core=AddSub\n#pragma HLS RESOURCE variable=add_4_11 core=AddSub\n#pragma HLS RESOURCE variable=add_4_12 core=AddSub\n#pragma HLS RESOURCE variable=add_4_13 core=AddSub\n#pragma HLS RESOURCE variable=add_4_14 core=AddSub\n#pragma HLS RESOURCE variable=add_4_15 core=AddSub\n#pragma HLS RESOURCE variable=add_3_0 core=AddSub\n#pragma HLS RESOURCE variable=add_3_1 core=AddSub\n#pragma HLS RESOURCE variable=add_3_2 core=AddSub\n#pragma HLS RESOURCE variable=add_3_3 core=AddSub\n#pragma HLS RESOURCE variable=add_3_4 core=AddSub\n#pragma HLS RESOURCE variable=add_3_5 core=AddSub\n#pragma HLS RESOURCE variable=add_3_6 core=AddSub\n#pragma HLS 
RESOURCE variable=add_3_7 core=AddSub\n#pragma HLS RESOURCE variable=add_2_0 core=AddSub\n#pragma HLS RESOURCE variable=add_2_1 core=AddSub\n#pragma HLS RESOURCE variable=add_2_2 core=AddSub\n#pragma HLS RESOURCE variable=add_2_3 core=AddSub\n#pragma HLS RESOURCE variable=add_1_0 core=AddSub\n#pragma HLS RESOURCE variable=add_1_1 core=AddSub\n#pragma HLS RESOURCE variable=add_0_0 core=AddSub\n             \n                local_C[c7][c6] += add_0_0;\n               \n              }\n              if (c2 == 15)\n                fifo_C_drain_out.write(local_C[c7][c6]);\n              {\n                B_t64 fifo_data;\n                union {unsigned int ui; char ut;} u63, u62, u61, u60, u59, u58, u57, u56, u55, u54, u53, u52, u51, u50, u49, u48, u47, u46, u45, u44, u43, u42, u41, u40, u39, u38, u37, u36, u35, u34, u33, u32, u31, u30, u29, u28, u27, u26, u25, u24, u23, u22, u21, u20, u19, u18, u17, u16, u15, u14, u13, u12, u11, u10, u9, u8, u7, u6, u5, u4, u3, u2, u1, u0;\n                u63.ut = local_B[0][63];\n                u62.ut = local_B[0][62];\n                u61.ut = local_B[0][61];\n                u60.ut = local_B[0][60];\n                u59.ut = local_B[0][59];\n                u58.ut = local_B[0][58];\n                u57.ut = local_B[0][57];\n                u56.ut = local_B[0][56];\n                u55.ut = local_B[0][55];\n                u54.ut = local_B[0][54];\n                u53.ut = local_B[0][53];\n                u52.ut = local_B[0][52];\n                u51.ut = local_B[0][51];\n                u50.ut = local_B[0][50];\n                u49.ut = local_B[0][49];\n                u48.ut = local_B[0][48];\n                u47.ut = local_B[0][47];\n                u46.ut = local_B[0][46];\n                u45.ut = local_B[0][45];\n                u44.ut = local_B[0][44];\n                u43.ut = local_B[0][43];\n                u42.ut = local_B[0][42];\n                u41.ut = local_B[0][41];\n                u40.ut = local_B[0][40];\n   
             u39.ut = local_B[0][39];\n                u38.ut = local_B[0][38];\n                u37.ut = local_B[0][37];\n                u36.ut = local_B[0][36];\n                u35.ut = local_B[0][35];\n                u34.ut = local_B[0][34];\n                u33.ut = local_B[0][33];\n                u32.ut = local_B[0][32];\n                u31.ut = local_B[0][31];\n                u30.ut = local_B[0][30];\n                u29.ut = local_B[0][29];\n                u28.ut = local_B[0][28];\n                u27.ut = local_B[0][27];\n                u26.ut = local_B[0][26];\n                u25.ut = local_B[0][25];\n                u24.ut = local_B[0][24];\n                u23.ut = local_B[0][23];\n                u22.ut = local_B[0][22];\n                u21.ut = local_B[0][21];\n                u20.ut = local_B[0][20];\n                u19.ut = local_B[0][19];\n                u18.ut = local_B[0][18];\n                u17.ut = local_B[0][17];\n                u16.ut = local_B[0][16];\n                u15.ut = local_B[0][15];\n                u14.ut = local_B[0][14];\n                u13.ut = local_B[0][13];\n                u12.ut = local_B[0][12];\n                u11.ut = local_B[0][11];\n                u10.ut = local_B[0][10];\n                u9.ut = local_B[0][9];\n                u8.ut = local_B[0][8];\n                u7.ut = local_B[0][7];\n                u6.ut = local_B[0][6];\n                u5.ut = local_B[0][5];\n                u4.ut = local_B[0][4];\n                u3.ut = local_B[0][3];\n                u2.ut = local_B[0][2];\n                u1.ut = local_B[0][1];\n                u0.ut = local_B[0][0];\n                fifo_data = (ap_uint<8>(u63.ui), ap_uint<8>(u62.ui), ap_uint<8>(u61.ui), ap_uint<8>(u60.ui), ap_uint<8>(u59.ui), ap_uint<8>(u58.ui), ap_uint<8>(u57.ui), ap_uint<8>(u56.ui), ap_uint<8>(u55.ui), ap_uint<8>(u54.ui), ap_uint<8>(u53.ui), ap_uint<8>(u52.ui), ap_uint<8>(u51.ui), ap_uint<8>(u50.ui), ap_uint<8>(u49.ui), 
ap_uint<8>(u48.ui), ap_uint<8>(u47.ui), ap_uint<8>(u46.ui), ap_uint<8>(u45.ui), ap_uint<8>(u44.ui), ap_uint<8>(u43.ui), ap_uint<8>(u42.ui), ap_uint<8>(u41.ui), ap_uint<8>(u40.ui), ap_uint<8>(u39.ui), ap_uint<8>(u38.ui), ap_uint<8>(u37.ui), ap_uint<8>(u36.ui), ap_uint<8>(u35.ui), ap_uint<8>(u34.ui), ap_uint<8>(u33.ui), ap_uint<8>(u32.ui), ap_uint<8>(u31.ui), ap_uint<8>(u30.ui), ap_uint<8>(u29.ui), ap_uint<8>(u28.ui), ap_uint<8>(u27.ui), ap_uint<8>(u26.ui), ap_uint<8>(u25.ui), ap_uint<8>(u24.ui), ap_uint<8>(u23.ui), ap_uint<8>(u22.ui), ap_uint<8>(u21.ui), ap_uint<8>(u20.ui), ap_uint<8>(u19.ui), ap_uint<8>(u18.ui), ap_uint<8>(u17.ui), ap_uint<8>(u16.ui), ap_uint<8>(u15.ui), ap_uint<8>(u14.ui), ap_uint<8>(u13.ui), ap_uint<8>(u12.ui), ap_uint<8>(u11.ui), ap_uint<8>(u10.ui), ap_uint<8>(u9.ui), ap_uint<8>(u8.ui), ap_uint<8>(u7.ui), ap_uint<8>(u6.ui), ap_uint<8>(u5.ui), ap_uint<8>(u4.ui), ap_uint<8>(u3.ui), ap_uint<8>(u2.ui), ap_uint<8>(u1.ui), ap_uint<8>(u0.ui));\n                fifo_B_out.write(fifo_data);\n              }\n              {\n                A_t64 fifo_data;\n                union {unsigned int ui; char ut;} u63, u62, u61, u60, u59, u58, u57, u56, u55, u54, u53, u52, u51, u50, u49, u48, u47, u46, u45, u44, u43, u42, u41, u40, u39, u38, u37, u36, u35, u34, u33, u32, u31, u30, u29, u28, u27, u26, u25, u24, u23, u22, u21, u20, u19, u18, u17, u16, u15, u14, u13, u12, u11, u10, u9, u8, u7, u6, u5, u4, u3, u2, u1, u0;\n                u63.ut = local_A[0][63];\n                u62.ut = local_A[0][62];\n                u61.ut = local_A[0][61];\n                u60.ut = local_A[0][60];\n                u59.ut = local_A[0][59];\n                u58.ut = local_A[0][58];\n                u57.ut = local_A[0][57];\n                u56.ut = local_A[0][56];\n                u55.ut = local_A[0][55];\n                u54.ut = local_A[0][54];\n                u53.ut = local_A[0][53];\n                u52.ut = local_A[0][52];\n                u51.ut = local_A[0][51];\n       
         u50.ut = local_A[0][50];\n                u49.ut = local_A[0][49];\n                u48.ut = local_A[0][48];\n                u47.ut = local_A[0][47];\n                u46.ut = local_A[0][46];\n                u45.ut = local_A[0][45];\n                u44.ut = local_A[0][44];\n                u43.ut = local_A[0][43];\n                u42.ut = local_A[0][42];\n                u41.ut = local_A[0][41];\n                u40.ut = local_A[0][40];\n                u39.ut = local_A[0][39];\n                u38.ut = local_A[0][38];\n                u37.ut = local_A[0][37];\n                u36.ut = local_A[0][36];\n                u35.ut = local_A[0][35];\n                u34.ut = local_A[0][34];\n                u33.ut = local_A[0][33];\n                u32.ut = local_A[0][32];\n                u31.ut = local_A[0][31];\n                u30.ut = local_A[0][30];\n                u29.ut = local_A[0][29];\n                u28.ut = local_A[0][28];\n                u27.ut = local_A[0][27];\n                u26.ut = local_A[0][26];\n                u25.ut = local_A[0][25];\n                u24.ut = local_A[0][24];\n                u23.ut = local_A[0][23];\n                u22.ut = local_A[0][22];\n                u21.ut = local_A[0][21];\n                u20.ut = local_A[0][20];\n                u19.ut = local_A[0][19];\n                u18.ut = local_A[0][18];\n                u17.ut = local_A[0][17];\n                u16.ut = local_A[0][16];\n                u15.ut = local_A[0][15];\n                u14.ut = local_A[0][14];\n                u13.ut = local_A[0][13];\n                u12.ut = local_A[0][12];\n                u11.ut = local_A[0][11];\n                u10.ut = local_A[0][10];\n                u9.ut = local_A[0][9];\n                u8.ut = local_A[0][8];\n                u7.ut = local_A[0][7];\n                u6.ut = local_A[0][6];\n                u5.ut = local_A[0][5];\n                u4.ut = local_A[0][4];\n                u3.ut = local_A[0][3];\n     
           u2.ut = local_A[0][2];\n                u1.ut = local_A[0][1];\n                u0.ut = local_A[0][0];\n                fifo_data = (ap_uint<8>(u63.ui), ap_uint<8>(u62.ui), ap_uint<8>(u61.ui), ap_uint<8>(u60.ui), ap_uint<8>(u59.ui), ap_uint<8>(u58.ui), ap_uint<8>(u57.ui), ap_uint<8>(u56.ui), ap_uint<8>(u55.ui), ap_uint<8>(u54.ui), ap_uint<8>(u53.ui), ap_uint<8>(u52.ui), ap_uint<8>(u51.ui), ap_uint<8>(u50.ui), ap_uint<8>(u49.ui), ap_uint<8>(u48.ui), ap_uint<8>(u47.ui), ap_uint<8>(u46.ui), ap_uint<8>(u45.ui), ap_uint<8>(u44.ui), ap_uint<8>(u43.ui), ap_uint<8>(u42.ui), ap_uint<8>(u41.ui), ap_uint<8>(u40.ui), ap_uint<8>(u39.ui), ap_uint<8>(u38.ui), ap_uint<8>(u37.ui), ap_uint<8>(u36.ui), ap_uint<8>(u35.ui), ap_uint<8>(u34.ui), ap_uint<8>(u33.ui), ap_uint<8>(u32.ui), ap_uint<8>(u31.ui), ap_uint<8>(u30.ui), ap_uint<8>(u29.ui), ap_uint<8>(u28.ui), ap_uint<8>(u27.ui), ap_uint<8>(u26.ui), ap_uint<8>(u25.ui), ap_uint<8>(u24.ui), ap_uint<8>(u23.ui), ap_uint<8>(u22.ui), ap_uint<8>(u21.ui), ap_uint<8>(u20.ui), ap_uint<8>(u19.ui), ap_uint<8>(u18.ui), ap_uint<8>(u17.ui), ap_uint<8>(u16.ui), ap_uint<8>(u15.ui), ap_uint<8>(u14.ui), ap_uint<8>(u13.ui), ap_uint<8>(u12.ui), ap_uint<8>(u11.ui), ap_uint<8>(u10.ui), ap_uint<8>(u9.ui), ap_uint<8>(u8.ui), ap_uint<8>(u7.ui), ap_uint<8>(u6.ui), ap_uint<8>(u5.ui), ap_uint<8>(u4.ui), ap_uint<8>(u3.ui), ap_uint<8>(u2.ui), ap_uint<8>(u1.ui), ap_uint<8>(u0.ui));\n                fifo_A_out.write(fifo_data);\n              }\n            }\n          }\n        }\n      }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid PE_wrapper(int idx, int idy, hls::stream<A_t64> &fifo_A_in, hls::stream<A_t64> &fifo_A_out, hls::stream<B_t64> &fifo_B_in, hls::stream<B_t64> &fifo_B_out, hls::stream<char> &fifo_C_drain_out)\n {\n  PE(\n    /* module id */ idx, \n    /* module id */ idy, \n    /* fifo */ fifo_A_in, \n    /* fifo */ fifo_A_out, \n    /* fifo */ fifo_B_in, \n    /* fifo */ fifo_B_out, \n    /* fifo */ fifo_C_drain_out);\n}\n/* 
Module Definition */\n\n/* Module Definition */\nvoid A_PE_dummy_in(int idx, int idy, hls::stream<A_t64> &fifo_A_in) {\n  /* Variable Declaration */\n  int p0 = idx, p1 = idy; // module id\n  /* Variable Declaration */\n\n  for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n    for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1)\n      for (ap_uint<5> c2 = 0; c2 <= 15; c2 += 1) {\n        // array\n        // pe\n        // latency\n        for (ap_uint<6> c6 = 0; c6 <= 31; c6 += 1) {\n          // latency\n          for (ap_uint<5> c7 = 0; c7 <= 10; c7 += 1) {\n          #pragma HLS PIPELINE II=1\n            A_t64 fifo_data;\n            fifo_data = fifo_A_in.read();\n          }\n        }\n      }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid B_PE_dummy_in(int idx, int idy, hls::stream<B_t64> &fifo_B_in) {\n  /* Variable Declaration */\n  int p0 = idx, p1 = idy; // module id\n  /* Variable Declaration */\n\n  for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n    for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1)\n      for (ap_uint<5> c2 = 0; c2 <= 15; c2 += 1) {\n        // array\n        // pe\n        // latency\n        for (ap_uint<6> c6 = 0; c6 <= 31; c6 += 1) {\n          // latency\n          for (ap_uint<5> c7 = 0; c7 <= 10; c7 += 1) {\n          #pragma HLS PIPELINE II=1\n            B_t64 fifo_data;\n            fifo_data = fifo_B_in.read();\n          }\n        }\n      }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid C_drain_IO_L1_out_intra_trans(int idx, int idy, int c0, int c1, C_t32 local_C[11][1], hls::stream<char> &fifo_C_drain_local_in)\n {\n#pragma HLS INLINE\n  /* Variable Declaration */\n  int p0 = idx, p1 = idy; // module id\n  ap_uint<8> data_split[32];\n  #pragma HLS ARRAY_PARTITION variable=data_split complete\n  /* Variable Declaration */\n\n\n  // io_L1\n  // pe\n  // latency\n  for (ap_uint<6> c6 = 0; c6 <= 31; c6 += 1) {\n    // latency\n    for (ap_uint<5> c7 = 0; c7 <= 10; c7 += 1) {\n    #pragma HLS PIPELINE II=1\n      // simd\n    
  {\n        C_t1 in_data;\n        C_t32 out_data;\n        in_data = fifo_C_drain_local_in.read();\n        int split_idx = (c6) % 32;\n        out_data = local_C[c7][c6 / 32];\n        for (ap_uint<6> n = 0; n < 32; n++) {\n        #pragma HLS UNROLL\n          data_split[n] = out_data(7, 0);\n          out_data = out_data >> 8;\n        }\n        union {unsigned int ui; char ut;} u;\n        u.ut = in_data;\n        data_split[split_idx] = ap_uint<8>(u.ui);\n        out_data = (data_split[31], data_split[30], data_split[29], data_split[28], data_split[27], data_split[26], data_split[25], data_split[24], data_split[23], data_split[22], data_split[21], data_split[20], data_split[19], data_split[18], data_split[17], data_split[16], data_split[15], data_split[14], data_split[13], data_split[12], data_split[11], data_split[10], data_split[9], data_split[8], data_split[7], data_split[6], data_split[5], data_split[4], data_split[3], data_split[2], data_split[1], data_split[0]);\n        local_C[c7][c6 / 32] = out_data;\n      }\n    }\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid C_drain_IO_L1_out_inter_trans(int idx, int idy, int c0, int c1, C_t32 local_C[11][1], hls::stream<C_t32> &fifo_C_drain_in, hls::stream<C_t32> &fifo_C_drain_out)\n {\n#pragma HLS INLINE\n  /* Variable Declaration */\n  int p0 = idx, p1 = idy; // module id\n  /* Variable Declaration */\n\n  for (ap_uint<6> c4 = p1; c4 <= 23; c4 += 1) {\n    // io_L1\n    if (c4 == p1) {\n      for (ap_uint<5> c5 = 0; c5 <= 10; c5 += 1) {\n      #pragma HLS PIPELINE II=1\n        // access_coalesce\n        {\n          C_t32 in_data;\n          C_t32 out_data;\n          in_data = local_C[c5][0];\n          out_data = in_data;\n          fifo_C_drain_out.write(out_data);\n        }\n      }\n    } else {\n      for (ap_uint<5> c5 = 0; c5 <= 10; c5 += 1) {\n      #pragma HLS PIPELINE II=1\n        // access_coalesce\n        {\n          C_t32 in_data;\n          C_t32 out_data;\n          
in_data = fifo_C_drain_in.read();\n          out_data = in_data;\n          fifo_C_drain_out.write(out_data);\n        }\n      }\n    }\n  }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid C_drain_IO_L1_out_inter_trans_boundary(int idx, int idy, int c0, int c1, C_t32 local_C[11][1], hls::stream<C_t32> &fifo_C_drain_out)\n {\n#pragma HLS INLINE\n  /* Variable Declaration */\n  int p0 = idx, p1 = idy; // module id\n  /* Variable Declaration */\n\n  for (ap_uint<6> c4 = p1; c4 <= 23; c4 += 1)\n    if (c4 == p1) {\n      // io_L1\n      for (ap_uint<5> c5 = 0; c5 <= 10; c5 += 1) {\n      #pragma HLS PIPELINE II=1\n        // access_coalesce\n        {\n          C_t32 in_data;\n          C_t32 out_data;\n          in_data = local_C[c5][0];\n          out_data = in_data;\n          fifo_C_drain_out.write(out_data);\n        }\n      }\n    }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid C_drain_IO_L1_out(int idx, int idy, hls::stream<C_t32> &fifo_C_drain_in, hls::stream<C_t32> &fifo_C_drain_out, hls::stream<char> &fifo_C_drain_local_in) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx, p1 = idy; // module id\n  C_t32 local_C[11][1];\n  #pragma HLS RESOURCE variable=local_C core=RAM_2P_BRAM\n  /* Variable Declaration */\n\n  for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n    for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1) {\n      // array\n      // io_L3\n      // io_L2\n      C_drain_IO_L1_out_intra_trans(\n        /* module id */ idx, \n        /* module id */ idy, \n        /* host iter */ c0, \n        /* host iter */ c1, \n        /* array */ local_C, \n        /* fifo */ fifo_C_drain_local_in\n      );\n      C_drain_IO_L1_out_inter_trans(\n        /* module id */ idx, \n        /* module id */ idy, \n        /* host iter */ c0, \n        /* host iter */ c1, \n        /* array */ local_C, \n        /* fifo */ fifo_C_drain_in, \n        /* fifo */ fifo_C_drain_out\n      );\n    }\n}\n/* Module Definition */\n\n/* Module 
Definition */\nvoid C_drain_IO_L1_out_wrapper(int idx, int idy, hls::stream<C_t32> &fifo_C_drain_in, hls::stream<C_t32> &fifo_C_drain_out, hls::stream<char> &fifo_C_drain_local_in)\n {\n  C_drain_IO_L1_out(\n    /* module id */ idx, \n    /* module id */ idy, \n    /* fifo */ fifo_C_drain_in, \n    /* fifo */ fifo_C_drain_out, \n    /* fifo */ fifo_C_drain_local_in);\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid C_drain_IO_L1_out_boundary(int idx, int idy, hls::stream<C_t32> &fifo_C_drain_out, hls::stream<char> &fifo_C_drain_local_in) {\n#pragma HLS INLINE\n  /* Variable Declaration */\n  int p0 = idx, p1 = idy; // module id\n  C_t32 local_C[11][1];\n  #pragma HLS RESOURCE variable=local_C core=RAM_2P_BRAM\n  /* Variable Declaration */\n\n  for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n    for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1) {\n      // array\n      // io_L3\n      // io_L2\n      C_drain_IO_L1_out_intra_trans(\n        /* module id */ idx, \n        /* module id */ idy, \n        /* host iter */ c0, \n        /* host iter */ c1, \n        /* array */ local_C, \n        /* fifo */ fifo_C_drain_local_in\n      );\n      C_drain_IO_L1_out_inter_trans_boundary(\n        /* module id */ idx, \n        /* module id */ idy, \n        /* host iter */ c0, \n        /* host iter */ c1, \n        /* array */ local_C, \n        /* fifo */ fifo_C_drain_out\n      );\n    }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid C_drain_IO_L1_out_boundary_wrapper(int idx, int idy, hls::stream<C_t32> &fifo_C_drain_out, hls::stream<char> &fifo_C_drain_local_in)\n {\n  C_drain_IO_L1_out_boundary(\n    /* module id */ idx, \n    /* module id */ idy, \n    /* fifo */ fifo_C_drain_out, \n    /* fifo */ fifo_C_drain_local_in);\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid C_drain_IO_L2_out(int idx, hls::stream<C_t32> &fifo_C_drain_in, hls::stream<C_t32> &fifo_C_drain_out, hls::stream<C_t32> &fifo_C_drain_local_in) {\n#pragma HLS INLINE OFF\n  /* 
Variable Declaration */\n  int p0 = idx; // module id\n  /* Variable Declaration */\n\n  for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n    for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1) {\n      // array\n      // io_L3\n      for (ap_uint<4> c3 = p0; c3 <= 7; c3 += 1) {\n        // io_L2\n        if (c3 == p0) {\n          for (ap_uint<6> c4 = 0; c4 <= 23; c4 += 1) {\n            // io_L1\n            for (ap_uint<5> c5 = 0; c5 <= 10; c5 += 1) {\n            #pragma HLS PIPELINE II=1\n              // access_coalesce\n              {\n                C_t32 in_data;\n                C_t32 out_data;\n                in_data = fifo_C_drain_local_in.read();\n                out_data = in_data;\n                fifo_C_drain_out.write(out_data);\n              }\n            }\n          }\n        } else {\n          for (ap_uint<6> c4 = 0; c4 <= 23; c4 += 1) {\n            // io_L1\n            for (ap_uint<5> c5 = 0; c5 <= 10; c5 += 1) {\n            #pragma HLS PIPELINE II=1\n              // access_coalesce\n              {\n                C_t32 in_data;\n                C_t32 out_data;\n                in_data = fifo_C_drain_in.read();\n                out_data = in_data;\n                fifo_C_drain_out.write(out_data);\n              }\n            }\n          }\n        }\n      }\n    }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid C_drain_IO_L2_out_boundary(int idx, hls::stream<C_t32> &fifo_C_drain_out, hls::stream<C_t32> &fifo_C_drain_local_in) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  int p0 = idx; // module id\n  /* Variable Declaration */\n\n  for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n    for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1) {\n      // array\n      // io_L3\n      for (ap_uint<4> c3 = p0; c3 <= 7; c3 += 1)\n        if (c3 == p0) {\n          // io_L2\n          for (ap_uint<6> c4 = 0; c4 <= 23; c4 += 1) {\n            // io_L1\n            for (ap_uint<5> c5 = 0; c5 <= 10; c5 += 1) {\n            #pragma HLS PIPELINE 
II=1\n              // access_coalesce\n              {\n                C_t32 in_data;\n                C_t32 out_data;\n                in_data = fifo_C_drain_local_in.read();\n                out_data = in_data;\n                fifo_C_drain_out.write(out_data);\n              }\n            }\n          }\n        }\n    }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid C_drain_IO_L3_out(hls::stream<C_t32> &fifo_C_drain_serialize, hls::stream<C_t32> &fifo_C_drain_local_in) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  /* Variable Declaration */\n\n  for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n    for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1) {\n      // array\n      // io_L3\n      for (ap_uint<4> c3 = 0; c3 <= 7; c3 += 1) {\n        // io_L2\n        for (ap_uint<6> c4 = 0; c4 <= 23; c4 += 1) {\n          // io_L1\n          // pe\n          for (ap_uint<5> c5 = 0; c5 <= 10; c5 += 1) {\n          #pragma HLS PIPELINE II=1\n            // access_coalesce\n            // access_serialize\n            {\n              C_t32 in_data;\n              C_t32 out_data;\n              in_data = fifo_C_drain_local_in.read();\n              out_data = in_data;\n              fifo_C_drain_serialize.write(out_data);\n            }\n          }\n        }\n      }\n    }\n}\n/* Module Definition */\n\n/* Module Definition */\nvoid C_drain_IO_L3_out_serialize(C_t32 *C, hls::stream<C_t32> &fifo_C_drain_local_in) {\n#pragma HLS INLINE OFF\n  /* Variable Declaration */\n  /* Variable Declaration */\n\n  for (ap_uint<17> i = 0; i < 33792; i++) {\n  #pragma HLS PIPELINE II=1\n    C_t32 fifo_data;\n    fifo_data = fifo_C_drain_local_in.read();\n    C[i] = fifo_data;\n  }\n}\n/* Module Definition */\n\nextern \"C\" {\nvoid kernel0(A_t64 *A, B_t64 *B, C_t32 *C)\n{\n#pragma HLS INTERFACE m_axi port=A offset=slave bundle=gmem_A\n#pragma HLS INTERFACE m_axi port=B offset=slave bundle=gmem_B\n#pragma HLS INTERFACE m_axi port=C offset=slave bundle=gmem_C\n#pragma HLS 
INTERFACE s_axilite port=A bundle=control\n#pragma HLS INTERFACE s_axilite port=B bundle=control\n#pragma HLS INTERFACE s_axilite port=C bundle=control\n#pragma HLS INTERFACE s_axilite port=return bundle=control\n\n#pragma HLS DATAFLOW\n#pragma HLS dataflow disable_start_propagation\n\n  /* FIFO Declaration */\n  /* A_IO_L3_in_serialize fifo */ hls::stream<A_t64> fifo_A_A_IO_L3_in_serialize;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L3_in_serialize depth=2\n  /* B_IO_L3_in_serialize fifo */ hls::stream<B_t64> fifo_B_B_IO_L3_in_serialize;\n  #pragma HLS STREAM variable=fifo_B_B_IO_L3_in_serialize depth=2\n  /* C_drain_IO_L3_out_serialize fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L3_out_serialize;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L3_out_serialize depth=2\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_0;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_0 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_1;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_1 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_2;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_2 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_3;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_3 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_4;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_4 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_5;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_5 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> 
fifo_A_A_IO_L2_in_6;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_6 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_7;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_7 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_8;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_8 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_9;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_9 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_9 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_10;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_10 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_10 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_11;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_11 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_11 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_12;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_12 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_12 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_13;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_13 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_13 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_14;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_14 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_14 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_15;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_15 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_15 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_16;\n  #pragma HLS STREAM 
variable=fifo_A_A_IO_L2_in_16 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_16 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_17;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_17 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_17 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_18;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_18 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_18 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_19;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_19 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_19 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_20;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_20 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_20 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_21;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_21 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_21 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_22;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_22 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_22 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_23;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_23 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_23 core=FIFO_SRL\n  /* A_IO_L2_in fifo */ hls::stream<A_t64> fifo_A_A_IO_L2_in_24;\n  #pragma HLS STREAM variable=fifo_A_A_IO_L2_in_24 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_A_IO_L2_in_24 core=FIFO_SRL\n  /* B_IO_L2_in fifo */ hls::stream<B_t64> fifo_B_B_IO_L2_in_0;\n  #pragma HLS STREAM variable=fifo_B_B_IO_L2_in_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_B_IO_L2_in_0 core=FIFO_SRL\n  /* B_IO_L2_in fifo */ hls::stream<B_t64> fifo_B_B_IO_L2_in_1;\n  #pragma HLS STREAM variable=fifo_B_B_IO_L2_in_1 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_B_B_IO_L2_in_1 core=FIFO_SRL\n  /* B_IO_L2_in fifo */ hls::stream<B_t64> fifo_B_B_IO_L2_in_2;\n  #pragma HLS STREAM variable=fifo_B_B_IO_L2_in_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_B_IO_L2_in_2 core=FIFO_SRL\n  /* B_IO_L2_in fifo */ hls::stream<B_t64> fifo_B_B_IO_L2_in_3;\n  #pragma HLS STREAM variable=fifo_B_B_IO_L2_in_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_B_IO_L2_in_3 core=FIFO_SRL\n  /* B_IO_L2_in fifo */ hls::stream<B_t64> fifo_B_B_IO_L2_in_4;\n  #pragma HLS STREAM variable=fifo_B_B_IO_L2_in_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_B_IO_L2_in_4 core=FIFO_SRL\n  /* B_IO_L2_in fifo */ hls::stream<B_t64> fifo_B_B_IO_L2_in_5;\n  #pragma HLS STREAM variable=fifo_B_B_IO_L2_in_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_B_IO_L2_in_5 core=FIFO_SRL\n  /* B_IO_L2_in fifo */ hls::stream<B_t64> fifo_B_B_IO_L2_in_6;\n  #pragma HLS STREAM variable=fifo_B_B_IO_L2_in_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_B_IO_L2_in_6 core=FIFO_SRL\n  /* B_IO_L2_in fifo */ hls::stream<B_t64> fifo_B_B_IO_L2_in_7;\n  #pragma HLS STREAM variable=fifo_B_B_IO_L2_in_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_B_IO_L2_in_7 core=FIFO_SRL\n  /* B_IO_L2_in fifo */ hls::stream<B_t64> fifo_B_B_IO_L2_in_8;\n  #pragma HLS STREAM variable=fifo_B_B_IO_L2_in_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_B_IO_L2_in_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_0_0;\n  #pragma HLS STREAM variable=fifo_A_PE_0_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_0_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_0_1;\n  #pragma HLS STREAM variable=fifo_A_PE_0_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_0_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_0_2;\n  #pragma HLS STREAM variable=fifo_A_PE_0_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_0_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_0_3;\n  #pragma HLS STREAM variable=fifo_A_PE_0_3 depth=2\n  #pragma HLS 
RESOURCE variable=fifo_A_PE_0_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_0_4;\n  #pragma HLS STREAM variable=fifo_A_PE_0_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_0_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_0_5;\n  #pragma HLS STREAM variable=fifo_A_PE_0_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_0_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_0_6;\n  #pragma HLS STREAM variable=fifo_A_PE_0_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_0_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_0_7;\n  #pragma HLS STREAM variable=fifo_A_PE_0_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_0_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_0_8;\n  #pragma HLS STREAM variable=fifo_A_PE_0_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_0_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_1_0;\n  #pragma HLS STREAM variable=fifo_A_PE_1_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_1_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_1_1;\n  #pragma HLS STREAM variable=fifo_A_PE_1_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_1_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_1_2;\n  #pragma HLS STREAM variable=fifo_A_PE_1_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_1_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_1_3;\n  #pragma HLS STREAM variable=fifo_A_PE_1_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_1_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_1_4;\n  #pragma HLS STREAM variable=fifo_A_PE_1_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_1_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_1_5;\n  #pragma HLS STREAM variable=fifo_A_PE_1_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_1_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_1_6;\n  #pragma HLS STREAM variable=fifo_A_PE_1_6 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_A_PE_1_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_1_7;\n  #pragma HLS STREAM variable=fifo_A_PE_1_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_1_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_1_8;\n  #pragma HLS STREAM variable=fifo_A_PE_1_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_1_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_2_0;\n  #pragma HLS STREAM variable=fifo_A_PE_2_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_2_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_2_1;\n  #pragma HLS STREAM variable=fifo_A_PE_2_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_2_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_2_2;\n  #pragma HLS STREAM variable=fifo_A_PE_2_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_2_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_2_3;\n  #pragma HLS STREAM variable=fifo_A_PE_2_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_2_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_2_4;\n  #pragma HLS STREAM variable=fifo_A_PE_2_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_2_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_2_5;\n  #pragma HLS STREAM variable=fifo_A_PE_2_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_2_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_2_6;\n  #pragma HLS STREAM variable=fifo_A_PE_2_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_2_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_2_7;\n  #pragma HLS STREAM variable=fifo_A_PE_2_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_2_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_2_8;\n  #pragma HLS STREAM variable=fifo_A_PE_2_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_2_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_3_0;\n  #pragma HLS STREAM variable=fifo_A_PE_3_0 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_A_PE_3_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_3_1;\n  #pragma HLS STREAM variable=fifo_A_PE_3_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_3_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_3_2;\n  #pragma HLS STREAM variable=fifo_A_PE_3_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_3_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_3_3;\n  #pragma HLS STREAM variable=fifo_A_PE_3_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_3_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_3_4;\n  #pragma HLS STREAM variable=fifo_A_PE_3_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_3_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_3_5;\n  #pragma HLS STREAM variable=fifo_A_PE_3_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_3_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_3_6;\n  #pragma HLS STREAM variable=fifo_A_PE_3_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_3_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_3_7;\n  #pragma HLS STREAM variable=fifo_A_PE_3_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_3_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_3_8;\n  #pragma HLS STREAM variable=fifo_A_PE_3_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_3_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_4_0;\n  #pragma HLS STREAM variable=fifo_A_PE_4_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_4_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_4_1;\n  #pragma HLS STREAM variable=fifo_A_PE_4_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_4_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_4_2;\n  #pragma HLS STREAM variable=fifo_A_PE_4_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_4_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_4_3;\n  #pragma HLS STREAM variable=fifo_A_PE_4_3 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_A_PE_4_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_4_4;\n  #pragma HLS STREAM variable=fifo_A_PE_4_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_4_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_4_5;\n  #pragma HLS STREAM variable=fifo_A_PE_4_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_4_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_4_6;\n  #pragma HLS STREAM variable=fifo_A_PE_4_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_4_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_4_7;\n  #pragma HLS STREAM variable=fifo_A_PE_4_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_4_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_4_8;\n  #pragma HLS STREAM variable=fifo_A_PE_4_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_4_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_5_0;\n  #pragma HLS STREAM variable=fifo_A_PE_5_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_5_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_5_1;\n  #pragma HLS STREAM variable=fifo_A_PE_5_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_5_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_5_2;\n  #pragma HLS STREAM variable=fifo_A_PE_5_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_5_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_5_3;\n  #pragma HLS STREAM variable=fifo_A_PE_5_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_5_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_5_4;\n  #pragma HLS STREAM variable=fifo_A_PE_5_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_5_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_5_5;\n  #pragma HLS STREAM variable=fifo_A_PE_5_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_5_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_5_6;\n  #pragma HLS STREAM variable=fifo_A_PE_5_6 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_A_PE_5_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_5_7;\n  #pragma HLS STREAM variable=fifo_A_PE_5_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_5_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_5_8;\n  #pragma HLS STREAM variable=fifo_A_PE_5_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_5_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_6_0;\n  #pragma HLS STREAM variable=fifo_A_PE_6_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_6_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_6_1;\n  #pragma HLS STREAM variable=fifo_A_PE_6_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_6_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_6_2;\n  #pragma HLS STREAM variable=fifo_A_PE_6_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_6_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_6_3;\n  #pragma HLS STREAM variable=fifo_A_PE_6_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_6_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_6_4;\n  #pragma HLS STREAM variable=fifo_A_PE_6_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_6_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_6_5;\n  #pragma HLS STREAM variable=fifo_A_PE_6_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_6_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_6_6;\n  #pragma HLS STREAM variable=fifo_A_PE_6_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_6_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_6_7;\n  #pragma HLS STREAM variable=fifo_A_PE_6_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_6_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_6_8;\n  #pragma HLS STREAM variable=fifo_A_PE_6_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_6_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_7_0;\n  #pragma HLS STREAM variable=fifo_A_PE_7_0 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_A_PE_7_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_7_1;\n  #pragma HLS STREAM variable=fifo_A_PE_7_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_7_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_7_2;\n  #pragma HLS STREAM variable=fifo_A_PE_7_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_7_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_7_3;\n  #pragma HLS STREAM variable=fifo_A_PE_7_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_7_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_7_4;\n  #pragma HLS STREAM variable=fifo_A_PE_7_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_7_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_7_5;\n  #pragma HLS STREAM variable=fifo_A_PE_7_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_7_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_7_6;\n  #pragma HLS STREAM variable=fifo_A_PE_7_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_7_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_7_7;\n  #pragma HLS STREAM variable=fifo_A_PE_7_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_7_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_7_8;\n  #pragma HLS STREAM variable=fifo_A_PE_7_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_7_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_8_0;\n  #pragma HLS STREAM variable=fifo_A_PE_8_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_8_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_8_1;\n  #pragma HLS STREAM variable=fifo_A_PE_8_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_8_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_8_2;\n  #pragma HLS STREAM variable=fifo_A_PE_8_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_8_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_8_3;\n  #pragma HLS STREAM variable=fifo_A_PE_8_3 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_A_PE_8_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_8_4;\n  #pragma HLS STREAM variable=fifo_A_PE_8_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_8_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_8_5;\n  #pragma HLS STREAM variable=fifo_A_PE_8_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_8_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_8_6;\n  #pragma HLS STREAM variable=fifo_A_PE_8_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_8_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_8_7;\n  #pragma HLS STREAM variable=fifo_A_PE_8_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_8_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_8_8;\n  #pragma HLS STREAM variable=fifo_A_PE_8_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_8_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_9_0;\n  #pragma HLS STREAM variable=fifo_A_PE_9_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_9_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_9_1;\n  #pragma HLS STREAM variable=fifo_A_PE_9_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_9_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_9_2;\n  #pragma HLS STREAM variable=fifo_A_PE_9_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_9_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_9_3;\n  #pragma HLS STREAM variable=fifo_A_PE_9_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_9_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_9_4;\n  #pragma HLS STREAM variable=fifo_A_PE_9_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_9_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_9_5;\n  #pragma HLS STREAM variable=fifo_A_PE_9_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_9_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_9_6;\n  #pragma HLS STREAM variable=fifo_A_PE_9_6 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_A_PE_9_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_9_7;\n  #pragma HLS STREAM variable=fifo_A_PE_9_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_9_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_9_8;\n  #pragma HLS STREAM variable=fifo_A_PE_9_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_9_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_10_0;\n  #pragma HLS STREAM variable=fifo_A_PE_10_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_10_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_10_1;\n  #pragma HLS STREAM variable=fifo_A_PE_10_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_10_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_10_2;\n  #pragma HLS STREAM variable=fifo_A_PE_10_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_10_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_10_3;\n  #pragma HLS STREAM variable=fifo_A_PE_10_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_10_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_10_4;\n  #pragma HLS STREAM variable=fifo_A_PE_10_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_10_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_10_5;\n  #pragma HLS STREAM variable=fifo_A_PE_10_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_10_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_10_6;\n  #pragma HLS STREAM variable=fifo_A_PE_10_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_10_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_10_7;\n  #pragma HLS STREAM variable=fifo_A_PE_10_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_10_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_10_8;\n  #pragma HLS STREAM variable=fifo_A_PE_10_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_10_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_11_0;\n  #pragma HLS STREAM variable=fifo_A_PE_11_0 depth=2\n  #pragma HLS 
RESOURCE variable=fifo_A_PE_11_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_11_1;\n  #pragma HLS STREAM variable=fifo_A_PE_11_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_11_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_11_2;\n  #pragma HLS STREAM variable=fifo_A_PE_11_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_11_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_11_3;\n  #pragma HLS STREAM variable=fifo_A_PE_11_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_11_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_11_4;\n  #pragma HLS STREAM variable=fifo_A_PE_11_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_11_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_11_5;\n  #pragma HLS STREAM variable=fifo_A_PE_11_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_11_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_11_6;\n  #pragma HLS STREAM variable=fifo_A_PE_11_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_11_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_11_7;\n  #pragma HLS STREAM variable=fifo_A_PE_11_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_11_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_11_8;\n  #pragma HLS STREAM variable=fifo_A_PE_11_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_11_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_12_0;\n  #pragma HLS STREAM variable=fifo_A_PE_12_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_12_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_12_1;\n  #pragma HLS STREAM variable=fifo_A_PE_12_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_12_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_12_2;\n  #pragma HLS STREAM variable=fifo_A_PE_12_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_12_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_12_3;\n  #pragma HLS STREAM variable=fifo_A_PE_12_3 
depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_12_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_12_4;\n  #pragma HLS STREAM variable=fifo_A_PE_12_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_12_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_12_5;\n  #pragma HLS STREAM variable=fifo_A_PE_12_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_12_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_12_6;\n  #pragma HLS STREAM variable=fifo_A_PE_12_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_12_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_12_7;\n  #pragma HLS STREAM variable=fifo_A_PE_12_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_12_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_12_8;\n  #pragma HLS STREAM variable=fifo_A_PE_12_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_12_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_13_0;\n  #pragma HLS STREAM variable=fifo_A_PE_13_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_13_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_13_1;\n  #pragma HLS STREAM variable=fifo_A_PE_13_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_13_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_13_2;\n  #pragma HLS STREAM variable=fifo_A_PE_13_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_13_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_13_3;\n  #pragma HLS STREAM variable=fifo_A_PE_13_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_13_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_13_4;\n  #pragma HLS STREAM variable=fifo_A_PE_13_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_13_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_13_5;\n  #pragma HLS STREAM variable=fifo_A_PE_13_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_13_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_13_6;\n  #pragma HLS STREAM 
variable=fifo_A_PE_13_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_13_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_13_7;\n  #pragma HLS STREAM variable=fifo_A_PE_13_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_13_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_13_8;\n  #pragma HLS STREAM variable=fifo_A_PE_13_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_13_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_14_0;\n  #pragma HLS STREAM variable=fifo_A_PE_14_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_14_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_14_1;\n  #pragma HLS STREAM variable=fifo_A_PE_14_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_14_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_14_2;\n  #pragma HLS STREAM variable=fifo_A_PE_14_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_14_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_14_3;\n  #pragma HLS STREAM variable=fifo_A_PE_14_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_14_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_14_4;\n  #pragma HLS STREAM variable=fifo_A_PE_14_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_14_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_14_5;\n  #pragma HLS STREAM variable=fifo_A_PE_14_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_14_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_14_6;\n  #pragma HLS STREAM variable=fifo_A_PE_14_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_14_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_14_7;\n  #pragma HLS STREAM variable=fifo_A_PE_14_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_14_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_14_8;\n  #pragma HLS STREAM variable=fifo_A_PE_14_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_14_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_15_0;\n  
#pragma HLS STREAM variable=fifo_A_PE_15_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_15_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_15_1;\n  #pragma HLS STREAM variable=fifo_A_PE_15_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_15_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_15_2;\n  #pragma HLS STREAM variable=fifo_A_PE_15_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_15_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_15_3;\n  #pragma HLS STREAM variable=fifo_A_PE_15_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_15_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_15_4;\n  #pragma HLS STREAM variable=fifo_A_PE_15_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_15_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_15_5;\n  #pragma HLS STREAM variable=fifo_A_PE_15_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_15_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_15_6;\n  #pragma HLS STREAM variable=fifo_A_PE_15_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_15_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_15_7;\n  #pragma HLS STREAM variable=fifo_A_PE_15_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_15_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_15_8;\n  #pragma HLS STREAM variable=fifo_A_PE_15_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_15_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_16_0;\n  #pragma HLS STREAM variable=fifo_A_PE_16_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_16_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_16_1;\n  #pragma HLS STREAM variable=fifo_A_PE_16_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_16_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_16_2;\n  #pragma HLS STREAM variable=fifo_A_PE_16_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_16_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> 
fifo_A_PE_16_3;\n  #pragma HLS STREAM variable=fifo_A_PE_16_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_16_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_16_4;\n  #pragma HLS STREAM variable=fifo_A_PE_16_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_16_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_16_5;\n  #pragma HLS STREAM variable=fifo_A_PE_16_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_16_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_16_6;\n  #pragma HLS STREAM variable=fifo_A_PE_16_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_16_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_16_7;\n  #pragma HLS STREAM variable=fifo_A_PE_16_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_16_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_16_8;\n  #pragma HLS STREAM variable=fifo_A_PE_16_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_16_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_17_0;\n  #pragma HLS STREAM variable=fifo_A_PE_17_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_17_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_17_1;\n  #pragma HLS STREAM variable=fifo_A_PE_17_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_17_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_17_2;\n  #pragma HLS STREAM variable=fifo_A_PE_17_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_17_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_17_3;\n  #pragma HLS STREAM variable=fifo_A_PE_17_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_17_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_17_4;\n  #pragma HLS STREAM variable=fifo_A_PE_17_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_17_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_17_5;\n  #pragma HLS STREAM variable=fifo_A_PE_17_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_17_5 core=FIFO_SRL\n  /* PE fifo */ 
hls::stream<A_t64> fifo_A_PE_17_6;\n  #pragma HLS STREAM variable=fifo_A_PE_17_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_17_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_17_7;\n  #pragma HLS STREAM variable=fifo_A_PE_17_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_17_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_17_8;\n  #pragma HLS STREAM variable=fifo_A_PE_17_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_17_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_18_0;\n  #pragma HLS STREAM variable=fifo_A_PE_18_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_18_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_18_1;\n  #pragma HLS STREAM variable=fifo_A_PE_18_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_18_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_18_2;\n  #pragma HLS STREAM variable=fifo_A_PE_18_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_18_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_18_3;\n  #pragma HLS STREAM variable=fifo_A_PE_18_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_18_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_18_4;\n  #pragma HLS STREAM variable=fifo_A_PE_18_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_18_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_18_5;\n  #pragma HLS STREAM variable=fifo_A_PE_18_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_18_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_18_6;\n  #pragma HLS STREAM variable=fifo_A_PE_18_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_18_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_18_7;\n  #pragma HLS STREAM variable=fifo_A_PE_18_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_18_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_18_8;\n  #pragma HLS STREAM variable=fifo_A_PE_18_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_18_8 
core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_19_0;\n  #pragma HLS STREAM variable=fifo_A_PE_19_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_19_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_19_1;\n  #pragma HLS STREAM variable=fifo_A_PE_19_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_19_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_19_2;\n  #pragma HLS STREAM variable=fifo_A_PE_19_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_19_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_19_3;\n  #pragma HLS STREAM variable=fifo_A_PE_19_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_19_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_19_4;\n  #pragma HLS STREAM variable=fifo_A_PE_19_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_19_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_19_5;\n  #pragma HLS STREAM variable=fifo_A_PE_19_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_19_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_19_6;\n  #pragma HLS STREAM variable=fifo_A_PE_19_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_19_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_19_7;\n  #pragma HLS STREAM variable=fifo_A_PE_19_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_19_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_19_8;\n  #pragma HLS STREAM variable=fifo_A_PE_19_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_19_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_20_0;\n  #pragma HLS STREAM variable=fifo_A_PE_20_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_20_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_20_1;\n  #pragma HLS STREAM variable=fifo_A_PE_20_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_20_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_20_2;\n  #pragma HLS STREAM variable=fifo_A_PE_20_2 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_A_PE_20_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_20_3;\n  #pragma HLS STREAM variable=fifo_A_PE_20_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_20_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_20_4;\n  #pragma HLS STREAM variable=fifo_A_PE_20_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_20_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_20_5;\n  #pragma HLS STREAM variable=fifo_A_PE_20_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_20_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_20_6;\n  #pragma HLS STREAM variable=fifo_A_PE_20_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_20_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_20_7;\n  #pragma HLS STREAM variable=fifo_A_PE_20_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_20_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_20_8;\n  #pragma HLS STREAM variable=fifo_A_PE_20_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_20_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_21_0;\n  #pragma HLS STREAM variable=fifo_A_PE_21_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_21_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_21_1;\n  #pragma HLS STREAM variable=fifo_A_PE_21_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_21_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_21_2;\n  #pragma HLS STREAM variable=fifo_A_PE_21_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_21_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_21_3;\n  #pragma HLS STREAM variable=fifo_A_PE_21_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_21_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_21_4;\n  #pragma HLS STREAM variable=fifo_A_PE_21_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_21_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_21_5;\n  #pragma HLS STREAM variable=fifo_A_PE_21_5 depth=2\n  
#pragma HLS RESOURCE variable=fifo_A_PE_21_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_21_6;\n  #pragma HLS STREAM variable=fifo_A_PE_21_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_21_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_21_7;\n  #pragma HLS STREAM variable=fifo_A_PE_21_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_21_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_21_8;\n  #pragma HLS STREAM variable=fifo_A_PE_21_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_21_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_22_0;\n  #pragma HLS STREAM variable=fifo_A_PE_22_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_22_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_22_1;\n  #pragma HLS STREAM variable=fifo_A_PE_22_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_22_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_22_2;\n  #pragma HLS STREAM variable=fifo_A_PE_22_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_22_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_22_3;\n  #pragma HLS STREAM variable=fifo_A_PE_22_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_22_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_22_4;\n  #pragma HLS STREAM variable=fifo_A_PE_22_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_22_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_22_5;\n  #pragma HLS STREAM variable=fifo_A_PE_22_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_22_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_22_6;\n  #pragma HLS STREAM variable=fifo_A_PE_22_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_22_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_22_7;\n  #pragma HLS STREAM variable=fifo_A_PE_22_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_22_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_22_8;\n  #pragma HLS STREAM 
variable=fifo_A_PE_22_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_22_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_23_0;\n  #pragma HLS STREAM variable=fifo_A_PE_23_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_23_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_23_1;\n  #pragma HLS STREAM variable=fifo_A_PE_23_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_23_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_23_2;\n  #pragma HLS STREAM variable=fifo_A_PE_23_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_23_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_23_3;\n  #pragma HLS STREAM variable=fifo_A_PE_23_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_23_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_23_4;\n  #pragma HLS STREAM variable=fifo_A_PE_23_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_23_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_23_5;\n  #pragma HLS STREAM variable=fifo_A_PE_23_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_23_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_23_6;\n  #pragma HLS STREAM variable=fifo_A_PE_23_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_23_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_23_7;\n  #pragma HLS STREAM variable=fifo_A_PE_23_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_23_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<A_t64> fifo_A_PE_23_8;\n  #pragma HLS STREAM variable=fifo_A_PE_23_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_A_PE_23_8 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_0_0;\n  #pragma HLS STREAM variable=fifo_B_PE_0_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_0_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_1_0;\n  #pragma HLS STREAM variable=fifo_B_PE_1_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_1_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_2_0;\n  #pragma 
HLS STREAM variable=fifo_B_PE_2_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_2_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_3_0;\n  #pragma HLS STREAM variable=fifo_B_PE_3_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_3_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_4_0;\n  #pragma HLS STREAM variable=fifo_B_PE_4_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_4_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_5_0;\n  #pragma HLS STREAM variable=fifo_B_PE_5_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_5_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_6_0;\n  #pragma HLS STREAM variable=fifo_B_PE_6_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_6_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_7_0;\n  #pragma HLS STREAM variable=fifo_B_PE_7_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_7_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_8_0;\n  #pragma HLS STREAM variable=fifo_B_PE_8_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_8_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_9_0;\n  #pragma HLS STREAM variable=fifo_B_PE_9_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_9_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_10_0;\n  #pragma HLS STREAM variable=fifo_B_PE_10_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_10_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_11_0;\n  #pragma HLS STREAM variable=fifo_B_PE_11_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_11_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_12_0;\n  #pragma HLS STREAM variable=fifo_B_PE_12_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_12_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_13_0;\n  #pragma HLS STREAM variable=fifo_B_PE_13_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_13_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_14_0;\n  #pragma HLS 
STREAM variable=fifo_B_PE_14_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_14_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_15_0;\n  #pragma HLS STREAM variable=fifo_B_PE_15_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_15_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_16_0;\n  #pragma HLS STREAM variable=fifo_B_PE_16_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_16_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_17_0;\n  #pragma HLS STREAM variable=fifo_B_PE_17_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_17_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_18_0;\n  #pragma HLS STREAM variable=fifo_B_PE_18_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_18_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_19_0;\n  #pragma HLS STREAM variable=fifo_B_PE_19_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_19_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_20_0;\n  #pragma HLS STREAM variable=fifo_B_PE_20_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_20_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_21_0;\n  #pragma HLS STREAM variable=fifo_B_PE_21_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_21_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_22_0;\n  #pragma HLS STREAM variable=fifo_B_PE_22_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_22_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_23_0;\n  #pragma HLS STREAM variable=fifo_B_PE_23_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_23_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_24_0;\n  #pragma HLS STREAM variable=fifo_B_PE_24_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_24_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_0_1;\n  #pragma HLS STREAM variable=fifo_B_PE_0_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_0_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_1_1;\n  
#pragma HLS STREAM variable=fifo_B_PE_1_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_1_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_2_1;\n  #pragma HLS STREAM variable=fifo_B_PE_2_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_2_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_3_1;\n  #pragma HLS STREAM variable=fifo_B_PE_3_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_3_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_4_1;\n  #pragma HLS STREAM variable=fifo_B_PE_4_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_4_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_5_1;\n  #pragma HLS STREAM variable=fifo_B_PE_5_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_5_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_6_1;\n  #pragma HLS STREAM variable=fifo_B_PE_6_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_6_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_7_1;\n  #pragma HLS STREAM variable=fifo_B_PE_7_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_7_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_8_1;\n  #pragma HLS STREAM variable=fifo_B_PE_8_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_8_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_9_1;\n  #pragma HLS STREAM variable=fifo_B_PE_9_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_9_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_10_1;\n  #pragma HLS STREAM variable=fifo_B_PE_10_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_10_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_11_1;\n  #pragma HLS STREAM variable=fifo_B_PE_11_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_11_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_12_1;\n  #pragma HLS STREAM variable=fifo_B_PE_12_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_12_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_13_1;\n  #pragma 
HLS STREAM variable=fifo_B_PE_13_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_13_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_14_1;\n  #pragma HLS STREAM variable=fifo_B_PE_14_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_14_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_15_1;\n  #pragma HLS STREAM variable=fifo_B_PE_15_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_15_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_16_1;\n  #pragma HLS STREAM variable=fifo_B_PE_16_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_16_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_17_1;\n  #pragma HLS STREAM variable=fifo_B_PE_17_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_17_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_18_1;\n  #pragma HLS STREAM variable=fifo_B_PE_18_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_18_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_19_1;\n  #pragma HLS STREAM variable=fifo_B_PE_19_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_19_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_20_1;\n  #pragma HLS STREAM variable=fifo_B_PE_20_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_20_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_21_1;\n  #pragma HLS STREAM variable=fifo_B_PE_21_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_21_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_22_1;\n  #pragma HLS STREAM variable=fifo_B_PE_22_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_22_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_23_1;\n  #pragma HLS STREAM variable=fifo_B_PE_23_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_23_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_24_1;\n  #pragma HLS STREAM variable=fifo_B_PE_24_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_24_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> 
fifo_B_PE_0_2;\n  #pragma HLS STREAM variable=fifo_B_PE_0_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_0_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_1_2;\n  #pragma HLS STREAM variable=fifo_B_PE_1_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_1_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_2_2;\n  #pragma HLS STREAM variable=fifo_B_PE_2_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_2_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_3_2;\n  #pragma HLS STREAM variable=fifo_B_PE_3_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_3_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_4_2;\n  #pragma HLS STREAM variable=fifo_B_PE_4_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_4_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_5_2;\n  #pragma HLS STREAM variable=fifo_B_PE_5_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_5_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_6_2;\n  #pragma HLS STREAM variable=fifo_B_PE_6_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_6_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_7_2;\n  #pragma HLS STREAM variable=fifo_B_PE_7_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_7_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_8_2;\n  #pragma HLS STREAM variable=fifo_B_PE_8_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_8_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_9_2;\n  #pragma HLS STREAM variable=fifo_B_PE_9_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_9_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_10_2;\n  #pragma HLS STREAM variable=fifo_B_PE_10_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_10_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_11_2;\n  #pragma HLS STREAM variable=fifo_B_PE_11_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_11_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> 
fifo_B_PE_12_2;\n  #pragma HLS STREAM variable=fifo_B_PE_12_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_12_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_13_2;\n  #pragma HLS STREAM variable=fifo_B_PE_13_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_13_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_14_2;\n  #pragma HLS STREAM variable=fifo_B_PE_14_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_14_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_15_2;\n  #pragma HLS STREAM variable=fifo_B_PE_15_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_15_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_16_2;\n  #pragma HLS STREAM variable=fifo_B_PE_16_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_16_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_17_2;\n  #pragma HLS STREAM variable=fifo_B_PE_17_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_17_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_18_2;\n  #pragma HLS STREAM variable=fifo_B_PE_18_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_18_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_19_2;\n  #pragma HLS STREAM variable=fifo_B_PE_19_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_19_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_20_2;\n  #pragma HLS STREAM variable=fifo_B_PE_20_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_20_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_21_2;\n  #pragma HLS STREAM variable=fifo_B_PE_21_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_21_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_22_2;\n  #pragma HLS STREAM variable=fifo_B_PE_22_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_22_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_23_2;\n  #pragma HLS STREAM variable=fifo_B_PE_23_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_23_2 core=FIFO_SRL\n  /* PE fifo */ 
hls::stream<B_t64> fifo_B_PE_24_2;\n  #pragma HLS STREAM variable=fifo_B_PE_24_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_24_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_0_3;\n  #pragma HLS STREAM variable=fifo_B_PE_0_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_0_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_1_3;\n  #pragma HLS STREAM variable=fifo_B_PE_1_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_1_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_2_3;\n  #pragma HLS STREAM variable=fifo_B_PE_2_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_2_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_3_3;\n  #pragma HLS STREAM variable=fifo_B_PE_3_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_3_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_4_3;\n  #pragma HLS STREAM variable=fifo_B_PE_4_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_4_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_5_3;\n  #pragma HLS STREAM variable=fifo_B_PE_5_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_5_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_6_3;\n  #pragma HLS STREAM variable=fifo_B_PE_6_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_6_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_7_3;\n  #pragma HLS STREAM variable=fifo_B_PE_7_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_7_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_8_3;\n  #pragma HLS STREAM variable=fifo_B_PE_8_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_8_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_9_3;\n  #pragma HLS STREAM variable=fifo_B_PE_9_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_9_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_10_3;\n  #pragma HLS STREAM variable=fifo_B_PE_10_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_10_3 core=FIFO_SRL\n  /* PE fifo */ 
hls::stream<B_t64> fifo_B_PE_11_3;\n  #pragma HLS STREAM variable=fifo_B_PE_11_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_11_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_12_3;\n  #pragma HLS STREAM variable=fifo_B_PE_12_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_12_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_13_3;\n  #pragma HLS STREAM variable=fifo_B_PE_13_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_13_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_14_3;\n  #pragma HLS STREAM variable=fifo_B_PE_14_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_14_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_15_3;\n  #pragma HLS STREAM variable=fifo_B_PE_15_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_15_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_16_3;\n  #pragma HLS STREAM variable=fifo_B_PE_16_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_16_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_17_3;\n  #pragma HLS STREAM variable=fifo_B_PE_17_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_17_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_18_3;\n  #pragma HLS STREAM variable=fifo_B_PE_18_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_18_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_19_3;\n  #pragma HLS STREAM variable=fifo_B_PE_19_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_19_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_20_3;\n  #pragma HLS STREAM variable=fifo_B_PE_20_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_20_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_21_3;\n  #pragma HLS STREAM variable=fifo_B_PE_21_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_21_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_22_3;\n  #pragma HLS STREAM variable=fifo_B_PE_22_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_22_3 
core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_23_3;\n  #pragma HLS STREAM variable=fifo_B_PE_23_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_23_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_24_3;\n  #pragma HLS STREAM variable=fifo_B_PE_24_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_24_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_0_4;\n  #pragma HLS STREAM variable=fifo_B_PE_0_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_0_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_1_4;\n  #pragma HLS STREAM variable=fifo_B_PE_1_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_1_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_2_4;\n  #pragma HLS STREAM variable=fifo_B_PE_2_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_2_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_3_4;\n  #pragma HLS STREAM variable=fifo_B_PE_3_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_3_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_4_4;\n  #pragma HLS STREAM variable=fifo_B_PE_4_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_4_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_5_4;\n  #pragma HLS STREAM variable=fifo_B_PE_5_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_5_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_6_4;\n  #pragma HLS STREAM variable=fifo_B_PE_6_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_6_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_7_4;\n  #pragma HLS STREAM variable=fifo_B_PE_7_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_7_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_8_4;\n  #pragma HLS STREAM variable=fifo_B_PE_8_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_8_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_9_4;\n  #pragma HLS STREAM variable=fifo_B_PE_9_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_9_4 
core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_10_4;\n  #pragma HLS STREAM variable=fifo_B_PE_10_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_10_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_11_4;\n  #pragma HLS STREAM variable=fifo_B_PE_11_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_11_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_12_4;\n  #pragma HLS STREAM variable=fifo_B_PE_12_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_12_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_13_4;\n  #pragma HLS STREAM variable=fifo_B_PE_13_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_13_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_14_4;\n  #pragma HLS STREAM variable=fifo_B_PE_14_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_14_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_15_4;\n  #pragma HLS STREAM variable=fifo_B_PE_15_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_15_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_16_4;\n  #pragma HLS STREAM variable=fifo_B_PE_16_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_16_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_17_4;\n  #pragma HLS STREAM variable=fifo_B_PE_17_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_17_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_18_4;\n  #pragma HLS STREAM variable=fifo_B_PE_18_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_18_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_19_4;\n  #pragma HLS STREAM variable=fifo_B_PE_19_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_19_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_20_4;\n  #pragma HLS STREAM variable=fifo_B_PE_20_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_20_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_21_4;\n  #pragma HLS STREAM variable=fifo_B_PE_21_4 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_B_PE_21_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_22_4;\n  #pragma HLS STREAM variable=fifo_B_PE_22_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_22_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_23_4;\n  #pragma HLS STREAM variable=fifo_B_PE_23_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_23_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_24_4;\n  #pragma HLS STREAM variable=fifo_B_PE_24_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_24_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_0_5;\n  #pragma HLS STREAM variable=fifo_B_PE_0_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_0_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_1_5;\n  #pragma HLS STREAM variable=fifo_B_PE_1_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_1_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_2_5;\n  #pragma HLS STREAM variable=fifo_B_PE_2_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_2_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_3_5;\n  #pragma HLS STREAM variable=fifo_B_PE_3_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_3_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_4_5;\n  #pragma HLS STREAM variable=fifo_B_PE_4_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_4_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_5_5;\n  #pragma HLS STREAM variable=fifo_B_PE_5_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_5_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_6_5;\n  #pragma HLS STREAM variable=fifo_B_PE_6_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_6_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_7_5;\n  #pragma HLS STREAM variable=fifo_B_PE_7_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_7_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_8_5;\n  #pragma HLS STREAM variable=fifo_B_PE_8_5 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_B_PE_8_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_9_5;\n  #pragma HLS STREAM variable=fifo_B_PE_9_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_9_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_10_5;\n  #pragma HLS STREAM variable=fifo_B_PE_10_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_10_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_11_5;\n  #pragma HLS STREAM variable=fifo_B_PE_11_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_11_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_12_5;\n  #pragma HLS STREAM variable=fifo_B_PE_12_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_12_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_13_5;\n  #pragma HLS STREAM variable=fifo_B_PE_13_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_13_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_14_5;\n  #pragma HLS STREAM variable=fifo_B_PE_14_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_14_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_15_5;\n  #pragma HLS STREAM variable=fifo_B_PE_15_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_15_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_16_5;\n  #pragma HLS STREAM variable=fifo_B_PE_16_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_16_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_17_5;\n  #pragma HLS STREAM variable=fifo_B_PE_17_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_17_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_18_5;\n  #pragma HLS STREAM variable=fifo_B_PE_18_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_18_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_19_5;\n  #pragma HLS STREAM variable=fifo_B_PE_19_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_19_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_20_5;\n  #pragma HLS STREAM variable=fifo_B_PE_20_5 depth=2\n  #pragma 
HLS RESOURCE variable=fifo_B_PE_20_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_21_5;\n  #pragma HLS STREAM variable=fifo_B_PE_21_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_21_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_22_5;\n  #pragma HLS STREAM variable=fifo_B_PE_22_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_22_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_23_5;\n  #pragma HLS STREAM variable=fifo_B_PE_23_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_23_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_24_5;\n  #pragma HLS STREAM variable=fifo_B_PE_24_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_24_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_0_6;\n  #pragma HLS STREAM variable=fifo_B_PE_0_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_0_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_1_6;\n  #pragma HLS STREAM variable=fifo_B_PE_1_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_1_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_2_6;\n  #pragma HLS STREAM variable=fifo_B_PE_2_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_2_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_3_6;\n  #pragma HLS STREAM variable=fifo_B_PE_3_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_3_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_4_6;\n  #pragma HLS STREAM variable=fifo_B_PE_4_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_4_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_5_6;\n  #pragma HLS STREAM variable=fifo_B_PE_5_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_5_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_6_6;\n  #pragma HLS STREAM variable=fifo_B_PE_6_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_6_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_7_6;\n  #pragma HLS STREAM variable=fifo_B_PE_7_6 depth=2\n  #pragma HLS 
RESOURCE variable=fifo_B_PE_7_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_8_6;\n  #pragma HLS STREAM variable=fifo_B_PE_8_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_8_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_9_6;\n  #pragma HLS STREAM variable=fifo_B_PE_9_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_9_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_10_6;\n  #pragma HLS STREAM variable=fifo_B_PE_10_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_10_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_11_6;\n  #pragma HLS STREAM variable=fifo_B_PE_11_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_11_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_12_6;\n  #pragma HLS STREAM variable=fifo_B_PE_12_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_12_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_13_6;\n  #pragma HLS STREAM variable=fifo_B_PE_13_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_13_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_14_6;\n  #pragma HLS STREAM variable=fifo_B_PE_14_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_14_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_15_6;\n  #pragma HLS STREAM variable=fifo_B_PE_15_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_15_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_16_6;\n  #pragma HLS STREAM variable=fifo_B_PE_16_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_16_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_17_6;\n  #pragma HLS STREAM variable=fifo_B_PE_17_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_17_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_18_6;\n  #pragma HLS STREAM variable=fifo_B_PE_18_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_18_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_19_6;\n  #pragma HLS STREAM variable=fifo_B_PE_19_6 depth=2\n  
#pragma HLS RESOURCE variable=fifo_B_PE_19_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_20_6;\n  #pragma HLS STREAM variable=fifo_B_PE_20_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_20_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_21_6;\n  #pragma HLS STREAM variable=fifo_B_PE_21_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_21_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_22_6;\n  #pragma HLS STREAM variable=fifo_B_PE_22_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_22_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_23_6;\n  #pragma HLS STREAM variable=fifo_B_PE_23_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_23_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_24_6;\n  #pragma HLS STREAM variable=fifo_B_PE_24_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_24_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_0_7;\n  #pragma HLS STREAM variable=fifo_B_PE_0_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_0_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_1_7;\n  #pragma HLS STREAM variable=fifo_B_PE_1_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_1_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_2_7;\n  #pragma HLS STREAM variable=fifo_B_PE_2_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_2_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_3_7;\n  #pragma HLS STREAM variable=fifo_B_PE_3_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_3_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_4_7;\n  #pragma HLS STREAM variable=fifo_B_PE_4_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_4_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_5_7;\n  #pragma HLS STREAM variable=fifo_B_PE_5_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_5_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_6_7;\n  #pragma HLS STREAM variable=fifo_B_PE_6_7 depth=2\n  
#pragma HLS RESOURCE variable=fifo_B_PE_6_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_7_7;\n  #pragma HLS STREAM variable=fifo_B_PE_7_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_7_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_8_7;\n  #pragma HLS STREAM variable=fifo_B_PE_8_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_8_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_9_7;\n  #pragma HLS STREAM variable=fifo_B_PE_9_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_9_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_10_7;\n  #pragma HLS STREAM variable=fifo_B_PE_10_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_10_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_11_7;\n  #pragma HLS STREAM variable=fifo_B_PE_11_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_11_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_12_7;\n  #pragma HLS STREAM variable=fifo_B_PE_12_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_12_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_13_7;\n  #pragma HLS STREAM variable=fifo_B_PE_13_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_13_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_14_7;\n  #pragma HLS STREAM variable=fifo_B_PE_14_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_14_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_15_7;\n  #pragma HLS STREAM variable=fifo_B_PE_15_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_15_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_16_7;\n  #pragma HLS STREAM variable=fifo_B_PE_16_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_16_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_17_7;\n  #pragma HLS STREAM variable=fifo_B_PE_17_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_17_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_18_7;\n  #pragma HLS STREAM variable=fifo_B_PE_18_7 
depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_18_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_19_7;\n  #pragma HLS STREAM variable=fifo_B_PE_19_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_19_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_20_7;\n  #pragma HLS STREAM variable=fifo_B_PE_20_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_20_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_21_7;\n  #pragma HLS STREAM variable=fifo_B_PE_21_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_21_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_22_7;\n  #pragma HLS STREAM variable=fifo_B_PE_22_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_22_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_23_7;\n  #pragma HLS STREAM variable=fifo_B_PE_23_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_23_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<B_t64> fifo_B_PE_24_7;\n  #pragma HLS STREAM variable=fifo_B_PE_24_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_B_PE_24_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_0_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_0_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_0_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_1_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_1_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_1_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_2_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_2_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_2_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_3_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_3_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_3_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_4_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_4_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_4_0 core=FIFO_SRL\n  /* PE 
fifo */ hls::stream<char> fifo_C_drain_PE_5_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_5_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_5_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_6_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_6_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_6_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_7_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_7_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_7_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_8_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_8_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_8_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_9_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_9_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_9_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_10_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_10_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_10_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_11_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_11_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_11_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_12_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_12_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_12_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_13_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_13_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_13_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_14_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_14_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_14_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_15_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_15_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_15_0 
core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_16_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_16_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_16_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_17_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_17_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_17_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_18_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_18_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_18_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_19_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_19_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_19_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_20_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_20_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_20_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_21_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_21_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_21_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_22_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_22_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_22_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_23_0;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_23_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_23_0 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_0_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_0_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_0_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_1_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_1_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_1_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_2_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_2_1 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_C_drain_PE_2_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_3_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_3_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_3_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_4_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_4_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_4_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_5_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_5_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_5_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_6_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_6_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_6_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_7_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_7_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_7_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_8_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_8_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_8_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_9_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_9_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_9_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_10_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_10_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_10_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_11_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_11_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_11_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_12_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_12_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_12_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_13_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_13_1 depth=2\n  #pragma 
HLS RESOURCE variable=fifo_C_drain_PE_13_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_14_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_14_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_14_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_15_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_15_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_15_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_16_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_16_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_16_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_17_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_17_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_17_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_18_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_18_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_18_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_19_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_19_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_19_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_20_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_20_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_20_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_21_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_21_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_21_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_22_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_22_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_22_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_23_1;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_23_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_23_1 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_0_2;\n  #pragma HLS STREAM 
variable=fifo_C_drain_PE_0_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_0_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_1_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_1_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_1_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_2_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_2_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_2_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_3_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_3_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_3_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_4_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_4_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_4_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_5_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_5_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_5_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_6_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_6_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_6_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_7_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_7_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_7_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_8_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_8_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_8_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_9_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_9_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_9_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_10_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_10_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_10_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_11_2;\n  #pragma HLS 
STREAM variable=fifo_C_drain_PE_11_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_11_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_12_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_12_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_12_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_13_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_13_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_13_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_14_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_14_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_14_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_15_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_15_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_15_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_16_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_16_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_16_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_17_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_17_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_17_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_18_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_18_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_18_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_19_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_19_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_19_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_20_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_20_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_20_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_21_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_21_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_21_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> 
fifo_C_drain_PE_22_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_22_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_22_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_23_2;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_23_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_23_2 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_0_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_0_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_0_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_1_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_1_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_1_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_2_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_2_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_2_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_3_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_3_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_3_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_4_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_4_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_4_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_5_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_5_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_5_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_6_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_6_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_6_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_7_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_7_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_7_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_8_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_8_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_8_3 core=FIFO_SRL\n  /* PE fifo */ 
hls::stream<char> fifo_C_drain_PE_9_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_9_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_9_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_10_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_10_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_10_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_11_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_11_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_11_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_12_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_12_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_12_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_13_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_13_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_13_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_14_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_14_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_14_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_15_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_15_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_15_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_16_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_16_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_16_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_17_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_17_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_17_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_18_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_18_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_18_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_19_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_19_3 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_C_drain_PE_19_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_20_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_20_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_20_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_21_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_21_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_21_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_22_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_22_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_22_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_23_3;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_23_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_23_3 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_0_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_0_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_0_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_1_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_1_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_1_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_2_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_2_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_2_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_3_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_3_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_3_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_4_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_4_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_4_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_5_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_5_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_5_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_6_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_6_4 depth=2\n  
#pragma HLS RESOURCE variable=fifo_C_drain_PE_6_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_7_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_7_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_7_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_8_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_8_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_8_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_9_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_9_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_9_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_10_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_10_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_10_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_11_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_11_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_11_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_12_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_12_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_12_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_13_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_13_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_13_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_14_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_14_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_14_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_15_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_15_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_15_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_16_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_16_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_16_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_17_4;\n  #pragma HLS STREAM 
variable=fifo_C_drain_PE_17_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_17_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_18_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_18_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_18_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_19_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_19_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_19_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_20_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_20_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_20_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_21_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_21_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_21_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_22_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_22_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_22_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_23_4;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_23_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_23_4 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_0_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_0_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_0_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_1_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_1_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_1_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_2_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_2_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_2_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_3_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_3_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_3_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> 
fifo_C_drain_PE_4_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_4_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_4_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_5_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_5_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_5_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_6_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_6_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_6_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_7_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_7_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_7_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_8_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_8_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_8_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_9_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_9_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_9_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_10_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_10_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_10_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_11_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_11_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_11_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_12_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_12_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_12_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_13_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_13_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_13_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_14_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_14_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_14_5 core=FIFO_SRL\n  /* PE fifo */ 
hls::stream<char> fifo_C_drain_PE_15_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_15_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_15_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_16_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_16_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_16_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_17_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_17_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_17_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_18_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_18_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_18_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_19_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_19_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_19_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_20_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_20_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_20_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_21_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_21_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_21_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_22_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_22_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_22_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_23_5;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_23_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_23_5 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_0_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_0_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_0_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_1_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_1_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_1_6 
core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_2_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_2_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_2_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_3_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_3_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_3_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_4_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_4_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_4_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_5_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_5_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_5_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_6_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_6_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_6_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_7_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_7_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_7_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_8_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_8_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_8_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_9_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_9_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_9_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_10_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_10_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_10_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_11_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_11_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_11_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_12_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_12_6 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_C_drain_PE_12_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_13_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_13_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_13_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_14_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_14_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_14_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_15_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_15_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_15_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_16_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_16_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_16_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_17_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_17_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_17_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_18_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_18_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_18_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_19_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_19_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_19_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_20_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_20_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_20_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_21_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_21_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_21_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_22_6;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_22_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_22_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_23_6;\n  #pragma HLS STREAM 
variable=fifo_C_drain_PE_23_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_23_6 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_0_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_0_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_0_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_1_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_1_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_1_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_2_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_2_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_2_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_3_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_3_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_3_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_4_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_4_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_4_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_5_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_5_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_5_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_6_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_6_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_6_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_7_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_7_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_7_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_8_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_8_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_8_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_9_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_9_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_9_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_10_7;\n  #pragma HLS 
STREAM variable=fifo_C_drain_PE_10_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_10_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_11_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_11_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_11_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_12_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_12_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_12_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_13_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_13_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_13_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_14_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_14_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_14_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_15_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_15_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_15_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_16_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_16_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_16_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_17_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_17_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_17_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_18_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_18_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_18_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_19_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_19_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_19_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_20_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_20_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_20_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> 
fifo_C_drain_PE_21_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_21_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_21_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_22_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_22_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_22_7 core=FIFO_SRL\n  /* PE fifo */ hls::stream<char> fifo_C_drain_PE_23_7;\n  #pragma HLS STREAM variable=fifo_C_drain_PE_23_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_PE_23_7 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_0;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_0 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_1;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_1 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_2;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_2 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_3;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_3 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_4;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_4 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_5;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_5 core=FIFO_SRL\n  /* 
C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_6;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_6 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_7;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_7 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_8;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_8 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_9;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_9 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_9 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_10;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_10 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_10 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_11;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_11 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_11 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_12;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_12 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_12 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_13;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_13 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_13 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ 
hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_14;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_14 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_14 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_15;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_15 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_15 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_16;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_16 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_16 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_17;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_17 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_17 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_18;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_18 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_18 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_19;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_19 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_19 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_20;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_20 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_20 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_21;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_21 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_21 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> 
fifo_C_drain_C_drain_IO_L1_out_0_22;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_22 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_22 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_23;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_23 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_23 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_0_24;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_0_24 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_0_24 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_0;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_0 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_1;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_1 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_2;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_2 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_3;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_3 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_4;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_4 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_5;\n  #pragma 
HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_5 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_6;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_6 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_7;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_7 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_8;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_8 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_9;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_9 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_9 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_10;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_10 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_10 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_11;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_11 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_11 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_12;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_12 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_12 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_13;\n  #pragma HLS STREAM 
variable=fifo_C_drain_C_drain_IO_L1_out_1_13 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_13 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_14;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_14 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_14 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_15;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_15 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_15 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_16;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_16 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_16 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_17;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_17 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_17 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_18;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_18 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_18 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_19;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_19 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_19 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_20;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_20 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_20 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_21;\n  #pragma HLS STREAM 
variable=fifo_C_drain_C_drain_IO_L1_out_1_21 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_21 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_22;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_22 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_22 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_23;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_23 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_23 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_1_24;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_1_24 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_1_24 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_0;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_0 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_1;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_1 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_2;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_2 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_3;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_3 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_4;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_4 
depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_4 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_5;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_5 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_6;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_6 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_7;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_7 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_8;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_8 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_9;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_9 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_9 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_10;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_10 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_10 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_11;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_11 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_11 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_12;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_12 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_C_drain_C_drain_IO_L1_out_2_12 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_13;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_13 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_13 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_14;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_14 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_14 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_15;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_15 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_15 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_16;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_16 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_16 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_17;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_17 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_17 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_18;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_18 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_18 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_19;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_19 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_19 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_20;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_20 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_C_drain_C_drain_IO_L1_out_2_20 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_21;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_21 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_21 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_22;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_22 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_22 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_23;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_23 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_23 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_2_24;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_2_24 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_2_24 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_0;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_0 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_1;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_1 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_2;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_2 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_3;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_3 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_C_drain_C_drain_IO_L1_out_3_3 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_4;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_4 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_5;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_5 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_6;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_6 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_7;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_7 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_8;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_8 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_9;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_9 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_9 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_10;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_10 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_10 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_11;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_11 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_11 
core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_12;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_12 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_12 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_13;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_13 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_13 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_14;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_14 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_14 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_15;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_15 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_15 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_16;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_16 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_16 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_17;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_17 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_17 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_18;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_18 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_18 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_19;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_19 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_19 core=FIFO_SRL\n  /* 
C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_20;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_20 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_20 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_21;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_21 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_21 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_22;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_22 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_22 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_23;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_23 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_23 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_3_24;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_3_24 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_3_24 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_0;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_0 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_1;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_1 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_2;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_2 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ 
hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_3;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_3 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_4;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_4 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_5;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_5 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_6;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_6 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_7;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_7 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_8;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_8 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_9;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_9 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_9 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_10;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_10 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_10 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> 
fifo_C_drain_C_drain_IO_L1_out_4_11;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_11 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_11 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_12;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_12 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_12 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_13;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_13 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_13 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_14;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_14 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_14 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_15;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_15 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_15 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_16;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_16 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_16 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_17;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_17 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_17 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_18;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_18 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_18 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> 
fifo_C_drain_C_drain_IO_L1_out_4_19;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_19 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_19 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_20;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_20 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_20 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_21;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_21 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_21 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_22;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_22 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_22 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_23;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_23 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_23 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_4_24;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_4_24 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_4_24 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_0;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_0 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_1;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_1 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_2;\n 
 #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_2 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_3;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_3 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_4;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_4 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_5;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_5 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_6;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_6 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_7;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_7 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_8;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_8 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_9;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_9 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_9 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_10;\n  #pragma HLS STREAM 
variable=fifo_C_drain_C_drain_IO_L1_out_5_10 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_10 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_11;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_11 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_11 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_12;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_12 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_12 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_13;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_13 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_13 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_14;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_14 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_14 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_15;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_15 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_15 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_16;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_16 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_16 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_17;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_17 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_17 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_18;\n  #pragma HLS STREAM 
variable=fifo_C_drain_C_drain_IO_L1_out_5_18 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_18 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_19;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_19 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_19 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_20;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_20 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_20 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_21;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_21 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_21 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_22;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_22 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_22 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_23;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_23 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_23 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_5_24;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_5_24 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_5_24 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_0;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_0 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_1;\n  #pragma HLS STREAM 
variable=fifo_C_drain_C_drain_IO_L1_out_6_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_1 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_2;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_2 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_3;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_3 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_4;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_4 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_5;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_5 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_6;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_6 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_7;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_7 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_8;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_8 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_9;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_9 depth=2\n  
#pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_9 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_10;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_10 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_10 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_11;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_11 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_11 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_12;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_12 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_12 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_13;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_13 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_13 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_14;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_14 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_14 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_15;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_15 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_15 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_16;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_16 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_16 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_17;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_17 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_C_drain_C_drain_IO_L1_out_6_17 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_18;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_18 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_18 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_19;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_19 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_19 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_20;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_20 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_20 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_21;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_21 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_21 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_22;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_22 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_22 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_23;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_23 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_23 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_6_24;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_6_24 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_6_24 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_0;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_0 depth=2\n  #pragma HLS RESOURCE 
variable=fifo_C_drain_C_drain_IO_L1_out_7_0 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_1;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_1 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_2;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_2 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_3;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_3 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_4;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_4 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_5;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_5 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_6;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_6 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_7;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_7 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_8;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_8 
core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_9;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_9 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_9 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_10;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_10 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_10 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_11;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_11 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_11 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_12;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_12 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_12 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_13;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_13 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_13 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_14;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_14 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_14 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_15;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_15 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_15 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_16;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_16 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_16 core=FIFO_SRL\n  /* 
C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_17;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_17 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_17 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_18;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_18 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_18 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_19;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_19 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_19 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_20;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_20 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_20 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_21;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_21 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_21 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_22;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_22 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_22 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_23;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_23 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_23 core=FIFO_SRL\n  /* C_drain_IO_L1_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L1_out_7_24;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L1_out_7_24 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L1_out_7_24 core=FIFO_SRL\n  /* C_drain_IO_L2_out fifo */ 
hls::stream<C_t32> fifo_C_drain_C_drain_IO_L2_out_0;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L2_out_0 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L2_out_0 core=FIFO_SRL\n  /* C_drain_IO_L2_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L2_out_1;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L2_out_1 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L2_out_1 core=FIFO_SRL\n  /* C_drain_IO_L2_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L2_out_2;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L2_out_2 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L2_out_2 core=FIFO_SRL\n  /* C_drain_IO_L2_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L2_out_3;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L2_out_3 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L2_out_3 core=FIFO_SRL\n  /* C_drain_IO_L2_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L2_out_4;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L2_out_4 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L2_out_4 core=FIFO_SRL\n  /* C_drain_IO_L2_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L2_out_5;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L2_out_5 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L2_out_5 core=FIFO_SRL\n  /* C_drain_IO_L2_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L2_out_6;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L2_out_6 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L2_out_6 core=FIFO_SRL\n  /* C_drain_IO_L2_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L2_out_7;\n  #pragma HLS STREAM variable=fifo_C_drain_C_drain_IO_L2_out_7 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L2_out_7 core=FIFO_SRL\n  /* C_drain_IO_L2_out fifo */ hls::stream<C_t32> fifo_C_drain_C_drain_IO_L2_out_8;\n  #pragma HLS STREAM 
variable=fifo_C_drain_C_drain_IO_L2_out_8 depth=2\n  #pragma HLS RESOURCE variable=fifo_C_drain_C_drain_IO_L2_out_8 core=FIFO_SRL\n  /* FIFO Declaration */\n\n  /* Module Call */\n  A_IO_L3_in_serialize(\n    /* array */ A,\n    /* fifo */ fifo_A_A_IO_L3_in_serialize\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L3_in(\n    /* fifo */ fifo_A_A_IO_L3_in_serialize,\n    /* fifo */ fifo_A_A_IO_L2_in_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 0,\n    /* fifo */ fifo_A_A_IO_L2_in_0,\n    /* fifo */ fifo_A_A_IO_L2_in_1,\n    /* fifo */ fifo_A_PE_0_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 1,\n    /* fifo */ fifo_A_A_IO_L2_in_1,\n    /* fifo */ fifo_A_A_IO_L2_in_2,\n    /* fifo */ fifo_A_PE_1_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 2,\n    /* fifo */ fifo_A_A_IO_L2_in_2,\n    /* fifo */ fifo_A_A_IO_L2_in_3,\n    /* fifo */ fifo_A_PE_2_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 3,\n    /* fifo */ fifo_A_A_IO_L2_in_3,\n    /* fifo */ fifo_A_A_IO_L2_in_4,\n    /* fifo */ fifo_A_PE_3_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 4,\n    /* fifo */ fifo_A_A_IO_L2_in_4,\n    /* fifo */ fifo_A_A_IO_L2_in_5,\n    /* fifo */ fifo_A_PE_4_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 5,\n    /* fifo */ fifo_A_A_IO_L2_in_5,\n    /* fifo */ fifo_A_A_IO_L2_in_6,\n    /* fifo */ fifo_A_PE_5_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 6,\n    /* fifo */ fifo_A_A_IO_L2_in_6,\n    /* fifo */ fifo_A_A_IO_L2_in_7,\n    /* fifo */ fifo_A_PE_6_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 7,\n    /* fifo */ fifo_A_A_IO_L2_in_7,\n    /* fifo */ fifo_A_A_IO_L2_in_8,\n    /* fifo */ fifo_A_PE_7_0\n  );\n  /* Module Call */\n\n  /* Module 
Call */\n  A_IO_L2_in(\n    /* module id */ 8,\n    /* fifo */ fifo_A_A_IO_L2_in_8,\n    /* fifo */ fifo_A_A_IO_L2_in_9,\n    /* fifo */ fifo_A_PE_8_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 9,\n    /* fifo */ fifo_A_A_IO_L2_in_9,\n    /* fifo */ fifo_A_A_IO_L2_in_10,\n    /* fifo */ fifo_A_PE_9_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 10,\n    /* fifo */ fifo_A_A_IO_L2_in_10,\n    /* fifo */ fifo_A_A_IO_L2_in_11,\n    /* fifo */ fifo_A_PE_10_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 11,\n    /* fifo */ fifo_A_A_IO_L2_in_11,\n    /* fifo */ fifo_A_A_IO_L2_in_12,\n    /* fifo */ fifo_A_PE_11_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 12,\n    /* fifo */ fifo_A_A_IO_L2_in_12,\n    /* fifo */ fifo_A_A_IO_L2_in_13,\n    /* fifo */ fifo_A_PE_12_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 13,\n    /* fifo */ fifo_A_A_IO_L2_in_13,\n    /* fifo */ fifo_A_A_IO_L2_in_14,\n    /* fifo */ fifo_A_PE_13_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 14,\n    /* fifo */ fifo_A_A_IO_L2_in_14,\n    /* fifo */ fifo_A_A_IO_L2_in_15,\n    /* fifo */ fifo_A_PE_14_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 15,\n    /* fifo */ fifo_A_A_IO_L2_in_15,\n    /* fifo */ fifo_A_A_IO_L2_in_16,\n    /* fifo */ fifo_A_PE_15_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 16,\n    /* fifo */ fifo_A_A_IO_L2_in_16,\n    /* fifo */ fifo_A_A_IO_L2_in_17,\n    /* fifo */ fifo_A_PE_16_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 17,\n    /* fifo */ fifo_A_A_IO_L2_in_17,\n    /* fifo */ fifo_A_A_IO_L2_in_18,\n    /* fifo */ fifo_A_PE_17_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* 
module id */ 18,\n    /* fifo */ fifo_A_A_IO_L2_in_18,\n    /* fifo */ fifo_A_A_IO_L2_in_19,\n    /* fifo */ fifo_A_PE_18_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 19,\n    /* fifo */ fifo_A_A_IO_L2_in_19,\n    /* fifo */ fifo_A_A_IO_L2_in_20,\n    /* fifo */ fifo_A_PE_19_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 20,\n    /* fifo */ fifo_A_A_IO_L2_in_20,\n    /* fifo */ fifo_A_A_IO_L2_in_21,\n    /* fifo */ fifo_A_PE_20_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 21,\n    /* fifo */ fifo_A_A_IO_L2_in_21,\n    /* fifo */ fifo_A_A_IO_L2_in_22,\n    /* fifo */ fifo_A_PE_21_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in(\n    /* module id */ 22,\n    /* fifo */ fifo_A_A_IO_L2_in_22,\n    /* fifo */ fifo_A_A_IO_L2_in_23,\n    /* fifo */ fifo_A_PE_22_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_IO_L2_in_boundary(\n    /* module id */ 23,\n    /* fifo */ fifo_A_A_IO_L2_in_23,\n    /* fifo */ fifo_A_PE_23_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_IO_L3_in_serialize(\n    /* array */ B,\n    /* fifo */ fifo_B_B_IO_L3_in_serialize\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_IO_L3_in(\n    /* fifo */ fifo_B_B_IO_L3_in_serialize,\n    /* fifo */ fifo_B_B_IO_L2_in_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_IO_L2_in(\n    /* module id */ 0,\n    /* fifo */ fifo_B_B_IO_L2_in_0,\n    /* fifo */ fifo_B_B_IO_L2_in_1,\n    /* fifo */ fifo_B_PE_0_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_IO_L2_in(\n    /* module id */ 1,\n    /* fifo */ fifo_B_B_IO_L2_in_1,\n    /* fifo */ fifo_B_B_IO_L2_in_2,\n    /* fifo */ fifo_B_PE_0_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_IO_L2_in(\n    /* module id */ 2,\n    /* fifo */ fifo_B_B_IO_L2_in_2,\n    /* fifo */ fifo_B_B_IO_L2_in_3,\n    /* fifo */ fifo_B_PE_0_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  
B_IO_L2_in(\n    /* module id */ 3,\n    /* fifo */ fifo_B_B_IO_L2_in_3,\n    /* fifo */ fifo_B_B_IO_L2_in_4,\n    /* fifo */ fifo_B_PE_0_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_IO_L2_in(\n    /* module id */ 4,\n    /* fifo */ fifo_B_B_IO_L2_in_4,\n    /* fifo */ fifo_B_B_IO_L2_in_5,\n    /* fifo */ fifo_B_PE_0_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_IO_L2_in(\n    /* module id */ 5,\n    /* fifo */ fifo_B_B_IO_L2_in_5,\n    /* fifo */ fifo_B_B_IO_L2_in_6,\n    /* fifo */ fifo_B_PE_0_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_IO_L2_in(\n    /* module id */ 6,\n    /* fifo */ fifo_B_B_IO_L2_in_6,\n    /* fifo */ fifo_B_B_IO_L2_in_7,\n    /* fifo */ fifo_B_PE_0_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_IO_L2_in_boundary(\n    /* module id */ 7,\n    /* fifo */ fifo_B_B_IO_L2_in_7,\n    /* fifo */ fifo_B_PE_0_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 0,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_0_0,\n    /* fifo */ fifo_A_PE_0_1,\n    /* fifo */ fifo_B_PE_0_0,\n    /* fifo */ fifo_B_PE_1_0,\n    /* fifo */ fifo_C_drain_PE_0_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 0,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_0_1,\n    /* fifo */ fifo_A_PE_0_2,\n    /* fifo */ fifo_B_PE_0_1,\n    /* fifo */ fifo_B_PE_1_1,\n    /* fifo */ fifo_C_drain_PE_0_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 0,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_0_2,\n    /* fifo */ fifo_A_PE_0_3,\n    /* fifo */ fifo_B_PE_0_2,\n    /* fifo */ fifo_B_PE_1_2,\n    /* fifo */ fifo_C_drain_PE_0_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 0,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_0_3,\n    /* fifo */ fifo_A_PE_0_4,\n    /* fifo */ fifo_B_PE_0_3,\n    /* fifo */ fifo_B_PE_1_3,\n    /* fifo */ fifo_C_drain_PE_0_3\n  );\n  /* Module Call 
*/\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 0,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_0_4,\n    /* fifo */ fifo_A_PE_0_5,\n    /* fifo */ fifo_B_PE_0_4,\n    /* fifo */ fifo_B_PE_1_4,\n    /* fifo */ fifo_C_drain_PE_0_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 0,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_0_5,\n    /* fifo */ fifo_A_PE_0_6,\n    /* fifo */ fifo_B_PE_0_5,\n    /* fifo */ fifo_B_PE_1_5,\n    /* fifo */ fifo_C_drain_PE_0_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 0,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_0_6,\n    /* fifo */ fifo_A_PE_0_7,\n    /* fifo */ fifo_B_PE_0_6,\n    /* fifo */ fifo_B_PE_1_6,\n    /* fifo */ fifo_C_drain_PE_0_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 0,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_0_7,\n    /* fifo */ fifo_A_PE_0_8,\n    /* fifo */ fifo_B_PE_0_7,\n    /* fifo */ fifo_B_PE_1_7,\n    /* fifo */ fifo_C_drain_PE_0_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 1,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_1_0,\n    /* fifo */ fifo_A_PE_1_1,\n    /* fifo */ fifo_B_PE_1_0,\n    /* fifo */ fifo_B_PE_2_0,\n    /* fifo */ fifo_C_drain_PE_1_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 1,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_1_1,\n    /* fifo */ fifo_A_PE_1_2,\n    /* fifo */ fifo_B_PE_1_1,\n    /* fifo */ fifo_B_PE_2_1,\n    /* fifo */ fifo_C_drain_PE_1_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 1,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_1_2,\n    /* fifo */ fifo_A_PE_1_3,\n    /* fifo */ fifo_B_PE_1_2,\n    /* fifo */ fifo_B_PE_2_2,\n    /* fifo */ fifo_C_drain_PE_1_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 1,\n    /* module id */ 
3,\n    /* fifo */ fifo_A_PE_1_3,\n    /* fifo */ fifo_A_PE_1_4,\n    /* fifo */ fifo_B_PE_1_3,\n    /* fifo */ fifo_B_PE_2_3,\n    /* fifo */ fifo_C_drain_PE_1_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 1,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_1_4,\n    /* fifo */ fifo_A_PE_1_5,\n    /* fifo */ fifo_B_PE_1_4,\n    /* fifo */ fifo_B_PE_2_4,\n    /* fifo */ fifo_C_drain_PE_1_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 1,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_1_5,\n    /* fifo */ fifo_A_PE_1_6,\n    /* fifo */ fifo_B_PE_1_5,\n    /* fifo */ fifo_B_PE_2_5,\n    /* fifo */ fifo_C_drain_PE_1_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 1,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_1_6,\n    /* fifo */ fifo_A_PE_1_7,\n    /* fifo */ fifo_B_PE_1_6,\n    /* fifo */ fifo_B_PE_2_6,\n    /* fifo */ fifo_C_drain_PE_1_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 1,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_1_7,\n    /* fifo */ fifo_A_PE_1_8,\n    /* fifo */ fifo_B_PE_1_7,\n    /* fifo */ fifo_B_PE_2_7,\n    /* fifo */ fifo_C_drain_PE_1_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 2,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_2_0,\n    /* fifo */ fifo_A_PE_2_1,\n    /* fifo */ fifo_B_PE_2_0,\n    /* fifo */ fifo_B_PE_3_0,\n    /* fifo */ fifo_C_drain_PE_2_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 2,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_2_1,\n    /* fifo */ fifo_A_PE_2_2,\n    /* fifo */ fifo_B_PE_2_1,\n    /* fifo */ fifo_B_PE_3_1,\n    /* fifo */ fifo_C_drain_PE_2_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 2,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_2_2,\n    /* fifo */ fifo_A_PE_2_3,\n    /* fifo */ 
fifo_B_PE_2_2,\n    /* fifo */ fifo_B_PE_3_2,\n    /* fifo */ fifo_C_drain_PE_2_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 2,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_2_3,\n    /* fifo */ fifo_A_PE_2_4,\n    /* fifo */ fifo_B_PE_2_3,\n    /* fifo */ fifo_B_PE_3_3,\n    /* fifo */ fifo_C_drain_PE_2_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 2,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_2_4,\n    /* fifo */ fifo_A_PE_2_5,\n    /* fifo */ fifo_B_PE_2_4,\n    /* fifo */ fifo_B_PE_3_4,\n    /* fifo */ fifo_C_drain_PE_2_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 2,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_2_5,\n    /* fifo */ fifo_A_PE_2_6,\n    /* fifo */ fifo_B_PE_2_5,\n    /* fifo */ fifo_B_PE_3_5,\n    /* fifo */ fifo_C_drain_PE_2_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 2,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_2_6,\n    /* fifo */ fifo_A_PE_2_7,\n    /* fifo */ fifo_B_PE_2_6,\n    /* fifo */ fifo_B_PE_3_6,\n    /* fifo */ fifo_C_drain_PE_2_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 2,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_2_7,\n    /* fifo */ fifo_A_PE_2_8,\n    /* fifo */ fifo_B_PE_2_7,\n    /* fifo */ fifo_B_PE_3_7,\n    /* fifo */ fifo_C_drain_PE_2_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 3,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_3_0,\n    /* fifo */ fifo_A_PE_3_1,\n    /* fifo */ fifo_B_PE_3_0,\n    /* fifo */ fifo_B_PE_4_0,\n    /* fifo */ fifo_C_drain_PE_3_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 3,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_3_1,\n    /* fifo */ fifo_A_PE_3_2,\n    /* fifo */ fifo_B_PE_3_1,\n    /* fifo */ fifo_B_PE_4_1,\n    /* fifo */ fifo_C_drain_PE_3_1\n  
);\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 3,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_3_2,\n    /* fifo */ fifo_A_PE_3_3,\n    /* fifo */ fifo_B_PE_3_2,\n    /* fifo */ fifo_B_PE_4_2,\n    /* fifo */ fifo_C_drain_PE_3_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 3,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_3_3,\n    /* fifo */ fifo_A_PE_3_4,\n    /* fifo */ fifo_B_PE_3_3,\n    /* fifo */ fifo_B_PE_4_3,\n    /* fifo */ fifo_C_drain_PE_3_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 3,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_3_4,\n    /* fifo */ fifo_A_PE_3_5,\n    /* fifo */ fifo_B_PE_3_4,\n    /* fifo */ fifo_B_PE_4_4,\n    /* fifo */ fifo_C_drain_PE_3_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 3,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_3_5,\n    /* fifo */ fifo_A_PE_3_6,\n    /* fifo */ fifo_B_PE_3_5,\n    /* fifo */ fifo_B_PE_4_5,\n    /* fifo */ fifo_C_drain_PE_3_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 3,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_3_6,\n    /* fifo */ fifo_A_PE_3_7,\n    /* fifo */ fifo_B_PE_3_6,\n    /* fifo */ fifo_B_PE_4_6,\n    /* fifo */ fifo_C_drain_PE_3_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 3,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_3_7,\n    /* fifo */ fifo_A_PE_3_8,\n    /* fifo */ fifo_B_PE_3_7,\n    /* fifo */ fifo_B_PE_4_7,\n    /* fifo */ fifo_C_drain_PE_3_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 4,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_4_0,\n    /* fifo */ fifo_A_PE_4_1,\n    /* fifo */ fifo_B_PE_4_0,\n    /* fifo */ fifo_B_PE_5_0,\n    /* fifo */ fifo_C_drain_PE_4_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 4,\n  
  /* module id */ 1,\n    /* fifo */ fifo_A_PE_4_1,\n    /* fifo */ fifo_A_PE_4_2,\n    /* fifo */ fifo_B_PE_4_1,\n    /* fifo */ fifo_B_PE_5_1,\n    /* fifo */ fifo_C_drain_PE_4_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 4,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_4_2,\n    /* fifo */ fifo_A_PE_4_3,\n    /* fifo */ fifo_B_PE_4_2,\n    /* fifo */ fifo_B_PE_5_2,\n    /* fifo */ fifo_C_drain_PE_4_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 4,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_4_3,\n    /* fifo */ fifo_A_PE_4_4,\n    /* fifo */ fifo_B_PE_4_3,\n    /* fifo */ fifo_B_PE_5_3,\n    /* fifo */ fifo_C_drain_PE_4_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 4,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_4_4,\n    /* fifo */ fifo_A_PE_4_5,\n    /* fifo */ fifo_B_PE_4_4,\n    /* fifo */ fifo_B_PE_5_4,\n    /* fifo */ fifo_C_drain_PE_4_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 4,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_4_5,\n    /* fifo */ fifo_A_PE_4_6,\n    /* fifo */ fifo_B_PE_4_5,\n    /* fifo */ fifo_B_PE_5_5,\n    /* fifo */ fifo_C_drain_PE_4_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 4,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_4_6,\n    /* fifo */ fifo_A_PE_4_7,\n    /* fifo */ fifo_B_PE_4_6,\n    /* fifo */ fifo_B_PE_5_6,\n    /* fifo */ fifo_C_drain_PE_4_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 4,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_4_7,\n    /* fifo */ fifo_A_PE_4_8,\n    /* fifo */ fifo_B_PE_4_7,\n    /* fifo */ fifo_B_PE_5_7,\n    /* fifo */ fifo_C_drain_PE_4_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 5,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_5_0,\n    /* fifo */ fifo_A_PE_5_1,\n    
/* fifo */ fifo_B_PE_5_0,\n    /* fifo */ fifo_B_PE_6_0,\n    /* fifo */ fifo_C_drain_PE_5_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 5,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_5_1,\n    /* fifo */ fifo_A_PE_5_2,\n    /* fifo */ fifo_B_PE_5_1,\n    /* fifo */ fifo_B_PE_6_1,\n    /* fifo */ fifo_C_drain_PE_5_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 5,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_5_2,\n    /* fifo */ fifo_A_PE_5_3,\n    /* fifo */ fifo_B_PE_5_2,\n    /* fifo */ fifo_B_PE_6_2,\n    /* fifo */ fifo_C_drain_PE_5_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 5,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_5_3,\n    /* fifo */ fifo_A_PE_5_4,\n    /* fifo */ fifo_B_PE_5_3,\n    /* fifo */ fifo_B_PE_6_3,\n    /* fifo */ fifo_C_drain_PE_5_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 5,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_5_4,\n    /* fifo */ fifo_A_PE_5_5,\n    /* fifo */ fifo_B_PE_5_4,\n    /* fifo */ fifo_B_PE_6_4,\n    /* fifo */ fifo_C_drain_PE_5_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 5,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_5_5,\n    /* fifo */ fifo_A_PE_5_6,\n    /* fifo */ fifo_B_PE_5_5,\n    /* fifo */ fifo_B_PE_6_5,\n    /* fifo */ fifo_C_drain_PE_5_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 5,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_5_6,\n    /* fifo */ fifo_A_PE_5_7,\n    /* fifo */ fifo_B_PE_5_6,\n    /* fifo */ fifo_B_PE_6_6,\n    /* fifo */ fifo_C_drain_PE_5_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 5,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_5_7,\n    /* fifo */ fifo_A_PE_5_8,\n    /* fifo */ fifo_B_PE_5_7,\n    /* fifo */ fifo_B_PE_6_7,\n    /* fifo */ 
fifo_C_drain_PE_5_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 6,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_6_0,\n    /* fifo */ fifo_A_PE_6_1,\n    /* fifo */ fifo_B_PE_6_0,\n    /* fifo */ fifo_B_PE_7_0,\n    /* fifo */ fifo_C_drain_PE_6_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 6,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_6_1,\n    /* fifo */ fifo_A_PE_6_2,\n    /* fifo */ fifo_B_PE_6_1,\n    /* fifo */ fifo_B_PE_7_1,\n    /* fifo */ fifo_C_drain_PE_6_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 6,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_6_2,\n    /* fifo */ fifo_A_PE_6_3,\n    /* fifo */ fifo_B_PE_6_2,\n    /* fifo */ fifo_B_PE_7_2,\n    /* fifo */ fifo_C_drain_PE_6_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 6,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_6_3,\n    /* fifo */ fifo_A_PE_6_4,\n    /* fifo */ fifo_B_PE_6_3,\n    /* fifo */ fifo_B_PE_7_3,\n    /* fifo */ fifo_C_drain_PE_6_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 6,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_6_4,\n    /* fifo */ fifo_A_PE_6_5,\n    /* fifo */ fifo_B_PE_6_4,\n    /* fifo */ fifo_B_PE_7_4,\n    /* fifo */ fifo_C_drain_PE_6_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 6,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_6_5,\n    /* fifo */ fifo_A_PE_6_6,\n    /* fifo */ fifo_B_PE_6_5,\n    /* fifo */ fifo_B_PE_7_5,\n    /* fifo */ fifo_C_drain_PE_6_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 6,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_6_6,\n    /* fifo */ fifo_A_PE_6_7,\n    /* fifo */ fifo_B_PE_6_6,\n    /* fifo */ fifo_B_PE_7_6,\n    /* fifo */ fifo_C_drain_PE_6_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n   
 /* module id */ 6,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_6_7,\n    /* fifo */ fifo_A_PE_6_8,\n    /* fifo */ fifo_B_PE_6_7,\n    /* fifo */ fifo_B_PE_7_7,\n    /* fifo */ fifo_C_drain_PE_6_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 7,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_7_0,\n    /* fifo */ fifo_A_PE_7_1,\n    /* fifo */ fifo_B_PE_7_0,\n    /* fifo */ fifo_B_PE_8_0,\n    /* fifo */ fifo_C_drain_PE_7_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 7,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_7_1,\n    /* fifo */ fifo_A_PE_7_2,\n    /* fifo */ fifo_B_PE_7_1,\n    /* fifo */ fifo_B_PE_8_1,\n    /* fifo */ fifo_C_drain_PE_7_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 7,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_7_2,\n    /* fifo */ fifo_A_PE_7_3,\n    /* fifo */ fifo_B_PE_7_2,\n    /* fifo */ fifo_B_PE_8_2,\n    /* fifo */ fifo_C_drain_PE_7_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 7,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_7_3,\n    /* fifo */ fifo_A_PE_7_4,\n    /* fifo */ fifo_B_PE_7_3,\n    /* fifo */ fifo_B_PE_8_3,\n    /* fifo */ fifo_C_drain_PE_7_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 7,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_7_4,\n    /* fifo */ fifo_A_PE_7_5,\n    /* fifo */ fifo_B_PE_7_4,\n    /* fifo */ fifo_B_PE_8_4,\n    /* fifo */ fifo_C_drain_PE_7_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 7,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_7_5,\n    /* fifo */ fifo_A_PE_7_6,\n    /* fifo */ fifo_B_PE_7_5,\n    /* fifo */ fifo_B_PE_8_5,\n    /* fifo */ fifo_C_drain_PE_7_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 7,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_7_6,\n    /* fifo 
*/ fifo_A_PE_7_7,\n    /* fifo */ fifo_B_PE_7_6,\n    /* fifo */ fifo_B_PE_8_6,\n    /* fifo */ fifo_C_drain_PE_7_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 7,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_7_7,\n    /* fifo */ fifo_A_PE_7_8,\n    /* fifo */ fifo_B_PE_7_7,\n    /* fifo */ fifo_B_PE_8_7,\n    /* fifo */ fifo_C_drain_PE_7_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 8,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_8_0,\n    /* fifo */ fifo_A_PE_8_1,\n    /* fifo */ fifo_B_PE_8_0,\n    /* fifo */ fifo_B_PE_9_0,\n    /* fifo */ fifo_C_drain_PE_8_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 8,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_8_1,\n    /* fifo */ fifo_A_PE_8_2,\n    /* fifo */ fifo_B_PE_8_1,\n    /* fifo */ fifo_B_PE_9_1,\n    /* fifo */ fifo_C_drain_PE_8_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 8,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_8_2,\n    /* fifo */ fifo_A_PE_8_3,\n    /* fifo */ fifo_B_PE_8_2,\n    /* fifo */ fifo_B_PE_9_2,\n    /* fifo */ fifo_C_drain_PE_8_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 8,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_8_3,\n    /* fifo */ fifo_A_PE_8_4,\n    /* fifo */ fifo_B_PE_8_3,\n    /* fifo */ fifo_B_PE_9_3,\n    /* fifo */ fifo_C_drain_PE_8_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 8,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_8_4,\n    /* fifo */ fifo_A_PE_8_5,\n    /* fifo */ fifo_B_PE_8_4,\n    /* fifo */ fifo_B_PE_9_4,\n    /* fifo */ fifo_C_drain_PE_8_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 8,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_8_5,\n    /* fifo */ fifo_A_PE_8_6,\n    /* fifo */ fifo_B_PE_8_5,\n    /* fifo */ fifo_B_PE_9_5,\n    /* 
fifo */ fifo_C_drain_PE_8_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 8,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_8_6,\n    /* fifo */ fifo_A_PE_8_7,\n    /* fifo */ fifo_B_PE_8_6,\n    /* fifo */ fifo_B_PE_9_6,\n    /* fifo */ fifo_C_drain_PE_8_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 8,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_8_7,\n    /* fifo */ fifo_A_PE_8_8,\n    /* fifo */ fifo_B_PE_8_7,\n    /* fifo */ fifo_B_PE_9_7,\n    /* fifo */ fifo_C_drain_PE_8_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 9,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_9_0,\n    /* fifo */ fifo_A_PE_9_1,\n    /* fifo */ fifo_B_PE_9_0,\n    /* fifo */ fifo_B_PE_10_0,\n    /* fifo */ fifo_C_drain_PE_9_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 9,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_9_1,\n    /* fifo */ fifo_A_PE_9_2,\n    /* fifo */ fifo_B_PE_9_1,\n    /* fifo */ fifo_B_PE_10_1,\n    /* fifo */ fifo_C_drain_PE_9_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 9,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_9_2,\n    /* fifo */ fifo_A_PE_9_3,\n    /* fifo */ fifo_B_PE_9_2,\n    /* fifo */ fifo_B_PE_10_2,\n    /* fifo */ fifo_C_drain_PE_9_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 9,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_9_3,\n    /* fifo */ fifo_A_PE_9_4,\n    /* fifo */ fifo_B_PE_9_3,\n    /* fifo */ fifo_B_PE_10_3,\n    /* fifo */ fifo_C_drain_PE_9_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 9,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_9_4,\n    /* fifo */ fifo_A_PE_9_5,\n    /* fifo */ fifo_B_PE_9_4,\n    /* fifo */ fifo_B_PE_10_4,\n    /* fifo */ fifo_C_drain_PE_9_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  
PE_wrapper(\n    /* module id */ 9,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_9_5,\n    /* fifo */ fifo_A_PE_9_6,\n    /* fifo */ fifo_B_PE_9_5,\n    /* fifo */ fifo_B_PE_10_5,\n    /* fifo */ fifo_C_drain_PE_9_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 9,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_9_6,\n    /* fifo */ fifo_A_PE_9_7,\n    /* fifo */ fifo_B_PE_9_6,\n    /* fifo */ fifo_B_PE_10_6,\n    /* fifo */ fifo_C_drain_PE_9_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 9,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_9_7,\n    /* fifo */ fifo_A_PE_9_8,\n    /* fifo */ fifo_B_PE_9_7,\n    /* fifo */ fifo_B_PE_10_7,\n    /* fifo */ fifo_C_drain_PE_9_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 10,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_10_0,\n    /* fifo */ fifo_A_PE_10_1,\n    /* fifo */ fifo_B_PE_10_0,\n    /* fifo */ fifo_B_PE_11_0,\n    /* fifo */ fifo_C_drain_PE_10_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 10,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_10_1,\n    /* fifo */ fifo_A_PE_10_2,\n    /* fifo */ fifo_B_PE_10_1,\n    /* fifo */ fifo_B_PE_11_1,\n    /* fifo */ fifo_C_drain_PE_10_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 10,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_10_2,\n    /* fifo */ fifo_A_PE_10_3,\n    /* fifo */ fifo_B_PE_10_2,\n    /* fifo */ fifo_B_PE_11_2,\n    /* fifo */ fifo_C_drain_PE_10_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 10,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_10_3,\n    /* fifo */ fifo_A_PE_10_4,\n    /* fifo */ fifo_B_PE_10_3,\n    /* fifo */ fifo_B_PE_11_3,\n    /* fifo */ fifo_C_drain_PE_10_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 10,\n    /* module id */ 
4,\n    /* fifo */ fifo_A_PE_10_4,\n    /* fifo */ fifo_A_PE_10_5,\n    /* fifo */ fifo_B_PE_10_4,\n    /* fifo */ fifo_B_PE_11_4,\n    /* fifo */ fifo_C_drain_PE_10_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 10,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_10_5,\n    /* fifo */ fifo_A_PE_10_6,\n    /* fifo */ fifo_B_PE_10_5,\n    /* fifo */ fifo_B_PE_11_5,\n    /* fifo */ fifo_C_drain_PE_10_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 10,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_10_6,\n    /* fifo */ fifo_A_PE_10_7,\n    /* fifo */ fifo_B_PE_10_6,\n    /* fifo */ fifo_B_PE_11_6,\n    /* fifo */ fifo_C_drain_PE_10_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 10,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_10_7,\n    /* fifo */ fifo_A_PE_10_8,\n    /* fifo */ fifo_B_PE_10_7,\n    /* fifo */ fifo_B_PE_11_7,\n    /* fifo */ fifo_C_drain_PE_10_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 11,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_11_0,\n    /* fifo */ fifo_A_PE_11_1,\n    /* fifo */ fifo_B_PE_11_0,\n    /* fifo */ fifo_B_PE_12_0,\n    /* fifo */ fifo_C_drain_PE_11_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 11,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_11_1,\n    /* fifo */ fifo_A_PE_11_2,\n    /* fifo */ fifo_B_PE_11_1,\n    /* fifo */ fifo_B_PE_12_1,\n    /* fifo */ fifo_C_drain_PE_11_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 11,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_11_2,\n    /* fifo */ fifo_A_PE_11_3,\n    /* fifo */ fifo_B_PE_11_2,\n    /* fifo */ fifo_B_PE_12_2,\n    /* fifo */ fifo_C_drain_PE_11_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 11,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_11_3,\n    /* 
fifo */ fifo_A_PE_11_4,\n    /* fifo */ fifo_B_PE_11_3,\n    /* fifo */ fifo_B_PE_12_3,\n    /* fifo */ fifo_C_drain_PE_11_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 11,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_11_4,\n    /* fifo */ fifo_A_PE_11_5,\n    /* fifo */ fifo_B_PE_11_4,\n    /* fifo */ fifo_B_PE_12_4,\n    /* fifo */ fifo_C_drain_PE_11_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 11,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_11_5,\n    /* fifo */ fifo_A_PE_11_6,\n    /* fifo */ fifo_B_PE_11_5,\n    /* fifo */ fifo_B_PE_12_5,\n    /* fifo */ fifo_C_drain_PE_11_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 11,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_11_6,\n    /* fifo */ fifo_A_PE_11_7,\n    /* fifo */ fifo_B_PE_11_6,\n    /* fifo */ fifo_B_PE_12_6,\n    /* fifo */ fifo_C_drain_PE_11_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 11,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_11_7,\n    /* fifo */ fifo_A_PE_11_8,\n    /* fifo */ fifo_B_PE_11_7,\n    /* fifo */ fifo_B_PE_12_7,\n    /* fifo */ fifo_C_drain_PE_11_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 12,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_12_0,\n    /* fifo */ fifo_A_PE_12_1,\n    /* fifo */ fifo_B_PE_12_0,\n    /* fifo */ fifo_B_PE_13_0,\n    /* fifo */ fifo_C_drain_PE_12_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 12,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_12_1,\n    /* fifo */ fifo_A_PE_12_2,\n    /* fifo */ fifo_B_PE_12_1,\n    /* fifo */ fifo_B_PE_13_1,\n    /* fifo */ fifo_C_drain_PE_12_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 12,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_12_2,\n    /* fifo */ fifo_A_PE_12_3,\n    /* fifo */ 
fifo_B_PE_12_2,\n    /* fifo */ fifo_B_PE_13_2,\n    /* fifo */ fifo_C_drain_PE_12_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 12,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_12_3,\n    /* fifo */ fifo_A_PE_12_4,\n    /* fifo */ fifo_B_PE_12_3,\n    /* fifo */ fifo_B_PE_13_3,\n    /* fifo */ fifo_C_drain_PE_12_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 12,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_12_4,\n    /* fifo */ fifo_A_PE_12_5,\n    /* fifo */ fifo_B_PE_12_4,\n    /* fifo */ fifo_B_PE_13_4,\n    /* fifo */ fifo_C_drain_PE_12_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 12,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_12_5,\n    /* fifo */ fifo_A_PE_12_6,\n    /* fifo */ fifo_B_PE_12_5,\n    /* fifo */ fifo_B_PE_13_5,\n    /* fifo */ fifo_C_drain_PE_12_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 12,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_12_6,\n    /* fifo */ fifo_A_PE_12_7,\n    /* fifo */ fifo_B_PE_12_6,\n    /* fifo */ fifo_B_PE_13_6,\n    /* fifo */ fifo_C_drain_PE_12_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 12,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_12_7,\n    /* fifo */ fifo_A_PE_12_8,\n    /* fifo */ fifo_B_PE_12_7,\n    /* fifo */ fifo_B_PE_13_7,\n    /* fifo */ fifo_C_drain_PE_12_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 13,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_13_0,\n    /* fifo */ fifo_A_PE_13_1,\n    /* fifo */ fifo_B_PE_13_0,\n    /* fifo */ fifo_B_PE_14_0,\n    /* fifo */ fifo_C_drain_PE_13_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 13,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_13_1,\n    /* fifo */ fifo_A_PE_13_2,\n    /* fifo */ fifo_B_PE_13_1,\n    /* fifo */ 
fifo_B_PE_14_1,\n    /* fifo */ fifo_C_drain_PE_13_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 13,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_13_2,\n    /* fifo */ fifo_A_PE_13_3,\n    /* fifo */ fifo_B_PE_13_2,\n    /* fifo */ fifo_B_PE_14_2,\n    /* fifo */ fifo_C_drain_PE_13_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 13,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_13_3,\n    /* fifo */ fifo_A_PE_13_4,\n    /* fifo */ fifo_B_PE_13_3,\n    /* fifo */ fifo_B_PE_14_3,\n    /* fifo */ fifo_C_drain_PE_13_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 13,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_13_4,\n    /* fifo */ fifo_A_PE_13_5,\n    /* fifo */ fifo_B_PE_13_4,\n    /* fifo */ fifo_B_PE_14_4,\n    /* fifo */ fifo_C_drain_PE_13_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 13,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_13_5,\n    /* fifo */ fifo_A_PE_13_6,\n    /* fifo */ fifo_B_PE_13_5,\n    /* fifo */ fifo_B_PE_14_5,\n    /* fifo */ fifo_C_drain_PE_13_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 13,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_13_6,\n    /* fifo */ fifo_A_PE_13_7,\n    /* fifo */ fifo_B_PE_13_6,\n    /* fifo */ fifo_B_PE_14_6,\n    /* fifo */ fifo_C_drain_PE_13_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 13,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_13_7,\n    /* fifo */ fifo_A_PE_13_8,\n    /* fifo */ fifo_B_PE_13_7,\n    /* fifo */ fifo_B_PE_14_7,\n    /* fifo */ fifo_C_drain_PE_13_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 14,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_14_0,\n    /* fifo */ fifo_A_PE_14_1,\n    /* fifo */ fifo_B_PE_14_0,\n    /* fifo */ fifo_B_PE_15_0,\n    /* fifo */ 
fifo_C_drain_PE_14_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 14,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_14_1,\n    /* fifo */ fifo_A_PE_14_2,\n    /* fifo */ fifo_B_PE_14_1,\n    /* fifo */ fifo_B_PE_15_1,\n    /* fifo */ fifo_C_drain_PE_14_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 14,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_14_2,\n    /* fifo */ fifo_A_PE_14_3,\n    /* fifo */ fifo_B_PE_14_2,\n    /* fifo */ fifo_B_PE_15_2,\n    /* fifo */ fifo_C_drain_PE_14_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 14,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_14_3,\n    /* fifo */ fifo_A_PE_14_4,\n    /* fifo */ fifo_B_PE_14_3,\n    /* fifo */ fifo_B_PE_15_3,\n    /* fifo */ fifo_C_drain_PE_14_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 14,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_14_4,\n    /* fifo */ fifo_A_PE_14_5,\n    /* fifo */ fifo_B_PE_14_4,\n    /* fifo */ fifo_B_PE_15_4,\n    /* fifo */ fifo_C_drain_PE_14_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 14,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_14_5,\n    /* fifo */ fifo_A_PE_14_6,\n    /* fifo */ fifo_B_PE_14_5,\n    /* fifo */ fifo_B_PE_15_5,\n    /* fifo */ fifo_C_drain_PE_14_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 14,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_14_6,\n    /* fifo */ fifo_A_PE_14_7,\n    /* fifo */ fifo_B_PE_14_6,\n    /* fifo */ fifo_B_PE_15_6,\n    /* fifo */ fifo_C_drain_PE_14_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 14,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_14_7,\n    /* fifo */ fifo_A_PE_14_8,\n    /* fifo */ fifo_B_PE_14_7,\n    /* fifo */ fifo_B_PE_15_7,\n    /* fifo */ fifo_C_drain_PE_14_7\n  );\n  /* Module Call 
*/\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 15,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_15_0,\n    /* fifo */ fifo_A_PE_15_1,\n    /* fifo */ fifo_B_PE_15_0,\n    /* fifo */ fifo_B_PE_16_0,\n    /* fifo */ fifo_C_drain_PE_15_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 15,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_15_1,\n    /* fifo */ fifo_A_PE_15_2,\n    /* fifo */ fifo_B_PE_15_1,\n    /* fifo */ fifo_B_PE_16_1,\n    /* fifo */ fifo_C_drain_PE_15_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 15,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_15_2,\n    /* fifo */ fifo_A_PE_15_3,\n    /* fifo */ fifo_B_PE_15_2,\n    /* fifo */ fifo_B_PE_16_2,\n    /* fifo */ fifo_C_drain_PE_15_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 15,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_15_3,\n    /* fifo */ fifo_A_PE_15_4,\n    /* fifo */ fifo_B_PE_15_3,\n    /* fifo */ fifo_B_PE_16_3,\n    /* fifo */ fifo_C_drain_PE_15_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 15,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_15_4,\n    /* fifo */ fifo_A_PE_15_5,\n    /* fifo */ fifo_B_PE_15_4,\n    /* fifo */ fifo_B_PE_16_4,\n    /* fifo */ fifo_C_drain_PE_15_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 15,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_15_5,\n    /* fifo */ fifo_A_PE_15_6,\n    /* fifo */ fifo_B_PE_15_5,\n    /* fifo */ fifo_B_PE_16_5,\n    /* fifo */ fifo_C_drain_PE_15_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 15,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_15_6,\n    /* fifo */ fifo_A_PE_15_7,\n    /* fifo */ fifo_B_PE_15_6,\n    /* fifo */ fifo_B_PE_16_6,\n    /* fifo */ fifo_C_drain_PE_15_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    
/* module id */ 15,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_15_7,\n    /* fifo */ fifo_A_PE_15_8,\n    /* fifo */ fifo_B_PE_15_7,\n    /* fifo */ fifo_B_PE_16_7,\n    /* fifo */ fifo_C_drain_PE_15_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 16,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_16_0,\n    /* fifo */ fifo_A_PE_16_1,\n    /* fifo */ fifo_B_PE_16_0,\n    /* fifo */ fifo_B_PE_17_0,\n    /* fifo */ fifo_C_drain_PE_16_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 16,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_16_1,\n    /* fifo */ fifo_A_PE_16_2,\n    /* fifo */ fifo_B_PE_16_1,\n    /* fifo */ fifo_B_PE_17_1,\n    /* fifo */ fifo_C_drain_PE_16_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 16,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_16_2,\n    /* fifo */ fifo_A_PE_16_3,\n    /* fifo */ fifo_B_PE_16_2,\n    /* fifo */ fifo_B_PE_17_2,\n    /* fifo */ fifo_C_drain_PE_16_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 16,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_16_3,\n    /* fifo */ fifo_A_PE_16_4,\n    /* fifo */ fifo_B_PE_16_3,\n    /* fifo */ fifo_B_PE_17_3,\n    /* fifo */ fifo_C_drain_PE_16_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 16,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_16_4,\n    /* fifo */ fifo_A_PE_16_5,\n    /* fifo */ fifo_B_PE_16_4,\n    /* fifo */ fifo_B_PE_17_4,\n    /* fifo */ fifo_C_drain_PE_16_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 16,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_16_5,\n    /* fifo */ fifo_A_PE_16_6,\n    /* fifo */ fifo_B_PE_16_5,\n    /* fifo */ fifo_B_PE_17_5,\n    /* fifo */ fifo_C_drain_PE_16_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 16,\n    /* module id */ 6,\n  
  /* fifo */ fifo_A_PE_16_6,\n    /* fifo */ fifo_A_PE_16_7,\n    /* fifo */ fifo_B_PE_16_6,\n    /* fifo */ fifo_B_PE_17_6,\n    /* fifo */ fifo_C_drain_PE_16_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 16,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_16_7,\n    /* fifo */ fifo_A_PE_16_8,\n    /* fifo */ fifo_B_PE_16_7,\n    /* fifo */ fifo_B_PE_17_7,\n    /* fifo */ fifo_C_drain_PE_16_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 17,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_17_0,\n    /* fifo */ fifo_A_PE_17_1,\n    /* fifo */ fifo_B_PE_17_0,\n    /* fifo */ fifo_B_PE_18_0,\n    /* fifo */ fifo_C_drain_PE_17_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 17,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_17_1,\n    /* fifo */ fifo_A_PE_17_2,\n    /* fifo */ fifo_B_PE_17_1,\n    /* fifo */ fifo_B_PE_18_1,\n    /* fifo */ fifo_C_drain_PE_17_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 17,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_17_2,\n    /* fifo */ fifo_A_PE_17_3,\n    /* fifo */ fifo_B_PE_17_2,\n    /* fifo */ fifo_B_PE_18_2,\n    /* fifo */ fifo_C_drain_PE_17_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 17,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_17_3,\n    /* fifo */ fifo_A_PE_17_4,\n    /* fifo */ fifo_B_PE_17_3,\n    /* fifo */ fifo_B_PE_18_3,\n    /* fifo */ fifo_C_drain_PE_17_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 17,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_17_4,\n    /* fifo */ fifo_A_PE_17_5,\n    /* fifo */ fifo_B_PE_17_4,\n    /* fifo */ fifo_B_PE_18_4,\n    /* fifo */ fifo_C_drain_PE_17_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 17,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_17_5,\n    /* fifo */ 
fifo_A_PE_17_6,\n    /* fifo */ fifo_B_PE_17_5,\n    /* fifo */ fifo_B_PE_18_5,\n    /* fifo */ fifo_C_drain_PE_17_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 17,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_17_6,\n    /* fifo */ fifo_A_PE_17_7,\n    /* fifo */ fifo_B_PE_17_6,\n    /* fifo */ fifo_B_PE_18_6,\n    /* fifo */ fifo_C_drain_PE_17_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 17,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_17_7,\n    /* fifo */ fifo_A_PE_17_8,\n    /* fifo */ fifo_B_PE_17_7,\n    /* fifo */ fifo_B_PE_18_7,\n    /* fifo */ fifo_C_drain_PE_17_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 18,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_18_0,\n    /* fifo */ fifo_A_PE_18_1,\n    /* fifo */ fifo_B_PE_18_0,\n    /* fifo */ fifo_B_PE_19_0,\n    /* fifo */ fifo_C_drain_PE_18_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 18,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_18_1,\n    /* fifo */ fifo_A_PE_18_2,\n    /* fifo */ fifo_B_PE_18_1,\n    /* fifo */ fifo_B_PE_19_1,\n    /* fifo */ fifo_C_drain_PE_18_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 18,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_18_2,\n    /* fifo */ fifo_A_PE_18_3,\n    /* fifo */ fifo_B_PE_18_2,\n    /* fifo */ fifo_B_PE_19_2,\n    /* fifo */ fifo_C_drain_PE_18_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 18,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_18_3,\n    /* fifo */ fifo_A_PE_18_4,\n    /* fifo */ fifo_B_PE_18_3,\n    /* fifo */ fifo_B_PE_19_3,\n    /* fifo */ fifo_C_drain_PE_18_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 18,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_18_4,\n    /* fifo */ fifo_A_PE_18_5,\n    /* fifo */ 
fifo_B_PE_18_4,\n    /* fifo */ fifo_B_PE_19_4,\n    /* fifo */ fifo_C_drain_PE_18_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 18,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_18_5,\n    /* fifo */ fifo_A_PE_18_6,\n    /* fifo */ fifo_B_PE_18_5,\n    /* fifo */ fifo_B_PE_19_5,\n    /* fifo */ fifo_C_drain_PE_18_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 18,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_18_6,\n    /* fifo */ fifo_A_PE_18_7,\n    /* fifo */ fifo_B_PE_18_6,\n    /* fifo */ fifo_B_PE_19_6,\n    /* fifo */ fifo_C_drain_PE_18_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 18,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_18_7,\n    /* fifo */ fifo_A_PE_18_8,\n    /* fifo */ fifo_B_PE_18_7,\n    /* fifo */ fifo_B_PE_19_7,\n    /* fifo */ fifo_C_drain_PE_18_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 19,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_19_0,\n    /* fifo */ fifo_A_PE_19_1,\n    /* fifo */ fifo_B_PE_19_0,\n    /* fifo */ fifo_B_PE_20_0,\n    /* fifo */ fifo_C_drain_PE_19_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 19,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_19_1,\n    /* fifo */ fifo_A_PE_19_2,\n    /* fifo */ fifo_B_PE_19_1,\n    /* fifo */ fifo_B_PE_20_1,\n    /* fifo */ fifo_C_drain_PE_19_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 19,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_19_2,\n    /* fifo */ fifo_A_PE_19_3,\n    /* fifo */ fifo_B_PE_19_2,\n    /* fifo */ fifo_B_PE_20_2,\n    /* fifo */ fifo_C_drain_PE_19_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 19,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_19_3,\n    /* fifo */ fifo_A_PE_19_4,\n    /* fifo */ fifo_B_PE_19_3,\n    /* fifo */ 
fifo_B_PE_20_3,\n    /* fifo */ fifo_C_drain_PE_19_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 19,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_19_4,\n    /* fifo */ fifo_A_PE_19_5,\n    /* fifo */ fifo_B_PE_19_4,\n    /* fifo */ fifo_B_PE_20_4,\n    /* fifo */ fifo_C_drain_PE_19_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 19,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_19_5,\n    /* fifo */ fifo_A_PE_19_6,\n    /* fifo */ fifo_B_PE_19_5,\n    /* fifo */ fifo_B_PE_20_5,\n    /* fifo */ fifo_C_drain_PE_19_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 19,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_19_6,\n    /* fifo */ fifo_A_PE_19_7,\n    /* fifo */ fifo_B_PE_19_6,\n    /* fifo */ fifo_B_PE_20_6,\n    /* fifo */ fifo_C_drain_PE_19_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 19,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_19_7,\n    /* fifo */ fifo_A_PE_19_8,\n    /* fifo */ fifo_B_PE_19_7,\n    /* fifo */ fifo_B_PE_20_7,\n    /* fifo */ fifo_C_drain_PE_19_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 20,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_20_0,\n    /* fifo */ fifo_A_PE_20_1,\n    /* fifo */ fifo_B_PE_20_0,\n    /* fifo */ fifo_B_PE_21_0,\n    /* fifo */ fifo_C_drain_PE_20_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 20,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_20_1,\n    /* fifo */ fifo_A_PE_20_2,\n    /* fifo */ fifo_B_PE_20_1,\n    /* fifo */ fifo_B_PE_21_1,\n    /* fifo */ fifo_C_drain_PE_20_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 20,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_20_2,\n    /* fifo */ fifo_A_PE_20_3,\n    /* fifo */ fifo_B_PE_20_2,\n    /* fifo */ fifo_B_PE_21_2,\n    /* fifo */ 
fifo_C_drain_PE_20_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 20,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_20_3,\n    /* fifo */ fifo_A_PE_20_4,\n    /* fifo */ fifo_B_PE_20_3,\n    /* fifo */ fifo_B_PE_21_3,\n    /* fifo */ fifo_C_drain_PE_20_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 20,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_20_4,\n    /* fifo */ fifo_A_PE_20_5,\n    /* fifo */ fifo_B_PE_20_4,\n    /* fifo */ fifo_B_PE_21_4,\n    /* fifo */ fifo_C_drain_PE_20_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 20,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_20_5,\n    /* fifo */ fifo_A_PE_20_6,\n    /* fifo */ fifo_B_PE_20_5,\n    /* fifo */ fifo_B_PE_21_5,\n    /* fifo */ fifo_C_drain_PE_20_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 20,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_20_6,\n    /* fifo */ fifo_A_PE_20_7,\n    /* fifo */ fifo_B_PE_20_6,\n    /* fifo */ fifo_B_PE_21_6,\n    /* fifo */ fifo_C_drain_PE_20_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 20,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_20_7,\n    /* fifo */ fifo_A_PE_20_8,\n    /* fifo */ fifo_B_PE_20_7,\n    /* fifo */ fifo_B_PE_21_7,\n    /* fifo */ fifo_C_drain_PE_20_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 21,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_21_0,\n    /* fifo */ fifo_A_PE_21_1,\n    /* fifo */ fifo_B_PE_21_0,\n    /* fifo */ fifo_B_PE_22_0,\n    /* fifo */ fifo_C_drain_PE_21_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 21,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_21_1,\n    /* fifo */ fifo_A_PE_21_2,\n    /* fifo */ fifo_B_PE_21_1,\n    /* fifo */ fifo_B_PE_22_1,\n    /* fifo */ fifo_C_drain_PE_21_1\n  );\n  /* Module Call 
*/\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 21,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_21_2,\n    /* fifo */ fifo_A_PE_21_3,\n    /* fifo */ fifo_B_PE_21_2,\n    /* fifo */ fifo_B_PE_22_2,\n    /* fifo */ fifo_C_drain_PE_21_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 21,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_21_3,\n    /* fifo */ fifo_A_PE_21_4,\n    /* fifo */ fifo_B_PE_21_3,\n    /* fifo */ fifo_B_PE_22_3,\n    /* fifo */ fifo_C_drain_PE_21_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 21,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_21_4,\n    /* fifo */ fifo_A_PE_21_5,\n    /* fifo */ fifo_B_PE_21_4,\n    /* fifo */ fifo_B_PE_22_4,\n    /* fifo */ fifo_C_drain_PE_21_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 21,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_21_5,\n    /* fifo */ fifo_A_PE_21_6,\n    /* fifo */ fifo_B_PE_21_5,\n    /* fifo */ fifo_B_PE_22_5,\n    /* fifo */ fifo_C_drain_PE_21_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 21,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_21_6,\n    /* fifo */ fifo_A_PE_21_7,\n    /* fifo */ fifo_B_PE_21_6,\n    /* fifo */ fifo_B_PE_22_6,\n    /* fifo */ fifo_C_drain_PE_21_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 21,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_21_7,\n    /* fifo */ fifo_A_PE_21_8,\n    /* fifo */ fifo_B_PE_21_7,\n    /* fifo */ fifo_B_PE_22_7,\n    /* fifo */ fifo_C_drain_PE_21_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 22,\n    /* module id */ 0,\n    /* fifo */ fifo_A_PE_22_0,\n    /* fifo */ fifo_A_PE_22_1,\n    /* fifo */ fifo_B_PE_22_0,\n    /* fifo */ fifo_B_PE_23_0,\n    /* fifo */ fifo_C_drain_PE_22_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    
/* module id */ 22,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_22_1,\n    /* fifo */ fifo_A_PE_22_2,\n    /* fifo */ fifo_B_PE_22_1,\n    /* fifo */ fifo_B_PE_23_1,\n    /* fifo */ fifo_C_drain_PE_22_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 22,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_22_2,\n    /* fifo */ fifo_A_PE_22_3,\n    /* fifo */ fifo_B_PE_22_2,\n    /* fifo */ fifo_B_PE_23_2,\n    /* fifo */ fifo_C_drain_PE_22_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 22,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_22_3,\n    /* fifo */ fifo_A_PE_22_4,\n    /* fifo */ fifo_B_PE_22_3,\n    /* fifo */ fifo_B_PE_23_3,\n    /* fifo */ fifo_C_drain_PE_22_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 22,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_22_4,\n    /* fifo */ fifo_A_PE_22_5,\n    /* fifo */ fifo_B_PE_22_4,\n    /* fifo */ fifo_B_PE_23_4,\n    /* fifo */ fifo_C_drain_PE_22_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 22,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_22_5,\n    /* fifo */ fifo_A_PE_22_6,\n    /* fifo */ fifo_B_PE_22_5,\n    /* fifo */ fifo_B_PE_23_5,\n    /* fifo */ fifo_C_drain_PE_22_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 22,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_22_6,\n    /* fifo */ fifo_A_PE_22_7,\n    /* fifo */ fifo_B_PE_22_6,\n    /* fifo */ fifo_B_PE_23_6,\n    /* fifo */ fifo_C_drain_PE_22_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 22,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_22_7,\n    /* fifo */ fifo_A_PE_22_8,\n    /* fifo */ fifo_B_PE_22_7,\n    /* fifo */ fifo_B_PE_23_7,\n    /* fifo */ fifo_C_drain_PE_22_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 23,\n    /* module id */ 0,\n  
  /* fifo */ fifo_A_PE_23_0,\n    /* fifo */ fifo_A_PE_23_1,\n    /* fifo */ fifo_B_PE_23_0,\n    /* fifo */ fifo_B_PE_24_0,\n    /* fifo */ fifo_C_drain_PE_23_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 23,\n    /* module id */ 1,\n    /* fifo */ fifo_A_PE_23_1,\n    /* fifo */ fifo_A_PE_23_2,\n    /* fifo */ fifo_B_PE_23_1,\n    /* fifo */ fifo_B_PE_24_1,\n    /* fifo */ fifo_C_drain_PE_23_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 23,\n    /* module id */ 2,\n    /* fifo */ fifo_A_PE_23_2,\n    /* fifo */ fifo_A_PE_23_3,\n    /* fifo */ fifo_B_PE_23_2,\n    /* fifo */ fifo_B_PE_24_2,\n    /* fifo */ fifo_C_drain_PE_23_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 23,\n    /* module id */ 3,\n    /* fifo */ fifo_A_PE_23_3,\n    /* fifo */ fifo_A_PE_23_4,\n    /* fifo */ fifo_B_PE_23_3,\n    /* fifo */ fifo_B_PE_24_3,\n    /* fifo */ fifo_C_drain_PE_23_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 23,\n    /* module id */ 4,\n    /* fifo */ fifo_A_PE_23_4,\n    /* fifo */ fifo_A_PE_23_5,\n    /* fifo */ fifo_B_PE_23_4,\n    /* fifo */ fifo_B_PE_24_4,\n    /* fifo */ fifo_C_drain_PE_23_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 23,\n    /* module id */ 5,\n    /* fifo */ fifo_A_PE_23_5,\n    /* fifo */ fifo_A_PE_23_6,\n    /* fifo */ fifo_B_PE_23_5,\n    /* fifo */ fifo_B_PE_24_5,\n    /* fifo */ fifo_C_drain_PE_23_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 23,\n    /* module id */ 6,\n    /* fifo */ fifo_A_PE_23_6,\n    /* fifo */ fifo_A_PE_23_7,\n    /* fifo */ fifo_B_PE_23_6,\n    /* fifo */ fifo_B_PE_24_6,\n    /* fifo */ fifo_C_drain_PE_23_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  PE_wrapper(\n    /* module id */ 23,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_23_7,\n    /* fifo */ 
fifo_A_PE_23_8,\n    /* fifo */ fifo_B_PE_23_7,\n    /* fifo */ fifo_B_PE_24_7,\n    /* fifo */ fifo_C_drain_PE_23_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 0,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_0_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 1,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_1_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 2,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_2_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 3,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_3_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 4,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_4_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 5,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_5_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 6,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_6_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 7,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_7_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 8,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_8_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 9,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_9_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 10,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_10_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 11,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_11_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 12,\n    /* module id */ 7,\n    /* 
fifo */ fifo_A_PE_12_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 13,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_13_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 14,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_14_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 15,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_15_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 16,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_16_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 17,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_17_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 18,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_18_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 19,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_19_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 20,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_20_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 21,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_21_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 22,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_22_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  A_PE_dummy_in(\n    /* module id */ 23,\n    /* module id */ 7,\n    /* fifo */ fifo_A_PE_23_8\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_PE_dummy_in(\n    /* module id */ 23,\n    /* module id */ 0,\n    /* fifo */ fifo_B_PE_24_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_PE_dummy_in(\n    /* module id */ 23,\n    /* module id */ 1,\n    /* fifo */ fifo_B_PE_24_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  
B_PE_dummy_in(\n    /* module id */ 23,\n    /* module id */ 2,\n    /* fifo */ fifo_B_PE_24_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_PE_dummy_in(\n    /* module id */ 23,\n    /* module id */ 3,\n    /* fifo */ fifo_B_PE_24_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_PE_dummy_in(\n    /* module id */ 23,\n    /* module id */ 4,\n    /* fifo */ fifo_B_PE_24_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_PE_dummy_in(\n    /* module id */ 23,\n    /* module id */ 5,\n    /* fifo */ fifo_B_PE_24_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_PE_dummy_in(\n    /* module id */ 23,\n    /* module id */ 6,\n    /* fifo */ fifo_B_PE_24_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  B_PE_dummy_in(\n    /* module id */ 23,\n    /* module id */ 7,\n    /* fifo */ fifo_B_PE_24_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_boundary_wrapper(\n    /* module id */ 0,\n    /* module id */ 23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_23,\n    /* fifo */ fifo_C_drain_PE_23_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_22,\n    /* fifo */ fifo_C_drain_PE_22_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_21,\n    /* fifo */ fifo_C_drain_PE_21_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_20,\n    /* fifo */ fifo_C_drain_PE_20_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module 
id */ 19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_19,\n    /* fifo */ fifo_C_drain_PE_19_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_18,\n    /* fifo */ fifo_C_drain_PE_18_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_17,\n    /* fifo */ fifo_C_drain_PE_17_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_16,\n    /* fifo */ fifo_C_drain_PE_16_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_15,\n    /* fifo */ fifo_C_drain_PE_15_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_14,\n    /* fifo */ fifo_C_drain_PE_14_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_13,\n    /* fifo */ fifo_C_drain_PE_13_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_13,\n    /* fifo */ 
fifo_C_drain_C_drain_IO_L1_out_0_12,\n    /* fifo */ fifo_C_drain_PE_12_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_11,\n    /* fifo */ fifo_C_drain_PE_11_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_10,\n    /* fifo */ fifo_C_drain_PE_10_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_9,\n    /* fifo */ fifo_C_drain_PE_9_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_8,\n    /* fifo */ fifo_C_drain_PE_8_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_7,\n    /* fifo */ fifo_C_drain_PE_7_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_6,\n    /* fifo */ fifo_C_drain_PE_6_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_5,\n    /* fifo */ fifo_C_drain_PE_5_0\n  );\n  /* Module Call */\n\n  /* 
Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_4,\n    /* fifo */ fifo_C_drain_PE_4_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_3,\n    /* fifo */ fifo_C_drain_PE_3_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_2,\n    /* fifo */ fifo_C_drain_PE_2_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_1,\n    /* fifo */ fifo_C_drain_PE_1_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 0,\n    /* module id */ 0,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_0,\n    /* fifo */ fifo_C_drain_PE_0_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_boundary_wrapper(\n    /* module id */ 1,\n    /* module id */ 23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_23,\n    /* fifo */ fifo_C_drain_PE_23_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_22,\n    /* fifo */ fifo_C_drain_PE_22_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_22,\n    /* fifo 
*/ fifo_C_drain_C_drain_IO_L1_out_1_21,\n    /* fifo */ fifo_C_drain_PE_21_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_20,\n    /* fifo */ fifo_C_drain_PE_20_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_19,\n    /* fifo */ fifo_C_drain_PE_19_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_18,\n    /* fifo */ fifo_C_drain_PE_18_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_17,\n    /* fifo */ fifo_C_drain_PE_17_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_16,\n    /* fifo */ fifo_C_drain_PE_16_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_15,\n    /* fifo */ fifo_C_drain_PE_15_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_14,\n    /* fifo */ fifo_C_drain_PE_14_1\n  );\n  /* Module 
Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_13,\n    /* fifo */ fifo_C_drain_PE_13_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_12,\n    /* fifo */ fifo_C_drain_PE_12_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_11,\n    /* fifo */ fifo_C_drain_PE_11_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_10,\n    /* fifo */ fifo_C_drain_PE_10_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_9,\n    /* fifo */ fifo_C_drain_PE_9_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_8,\n    /* fifo */ fifo_C_drain_PE_8_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_7,\n    /* fifo */ fifo_C_drain_PE_7_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 
6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_6,\n    /* fifo */ fifo_C_drain_PE_6_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_5,\n    /* fifo */ fifo_C_drain_PE_5_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_4,\n    /* fifo */ fifo_C_drain_PE_4_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_3,\n    /* fifo */ fifo_C_drain_PE_3_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_2,\n    /* fifo */ fifo_C_drain_PE_2_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_1,\n    /* fifo */ fifo_C_drain_PE_1_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 1,\n    /* module id */ 0,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_0,\n    /* fifo */ fifo_C_drain_PE_0_1\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_boundary_wrapper(\n    /* module id */ 2,\n    /* module id */ 23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_23,\n    /* fifo */ fifo_C_drain_PE_23_2\n  );\n  /* Module 
Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_22,\n    /* fifo */ fifo_C_drain_PE_22_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_21,\n    /* fifo */ fifo_C_drain_PE_21_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_20,\n    /* fifo */ fifo_C_drain_PE_20_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_19,\n    /* fifo */ fifo_C_drain_PE_19_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_18,\n    /* fifo */ fifo_C_drain_PE_18_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_17,\n    /* fifo */ fifo_C_drain_PE_17_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_16,\n    /* fifo */ fifo_C_drain_PE_16_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* 
module id */ 15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_15,\n    /* fifo */ fifo_C_drain_PE_15_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_14,\n    /* fifo */ fifo_C_drain_PE_14_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_13,\n    /* fifo */ fifo_C_drain_PE_13_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_12,\n    /* fifo */ fifo_C_drain_PE_12_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_11,\n    /* fifo */ fifo_C_drain_PE_11_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_10,\n    /* fifo */ fifo_C_drain_PE_10_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_9,\n    /* fifo */ fifo_C_drain_PE_9_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_9,\n    /* fifo */ 
fifo_C_drain_C_drain_IO_L1_out_2_8,\n    /* fifo */ fifo_C_drain_PE_8_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_7,\n    /* fifo */ fifo_C_drain_PE_7_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_6,\n    /* fifo */ fifo_C_drain_PE_6_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_5,\n    /* fifo */ fifo_C_drain_PE_5_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_4,\n    /* fifo */ fifo_C_drain_PE_4_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_3,\n    /* fifo */ fifo_C_drain_PE_3_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_2,\n    /* fifo */ fifo_C_drain_PE_2_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_1,\n    /* fifo */ fifo_C_drain_PE_1_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n 
 C_drain_IO_L1_out_wrapper(\n    /* module id */ 2,\n    /* module id */ 0,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_0,\n    /* fifo */ fifo_C_drain_PE_0_2\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_boundary_wrapper(\n    /* module id */ 3,\n    /* module id */ 23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_23,\n    /* fifo */ fifo_C_drain_PE_23_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_22,\n    /* fifo */ fifo_C_drain_PE_22_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_21,\n    /* fifo */ fifo_C_drain_PE_21_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_20,\n    /* fifo */ fifo_C_drain_PE_20_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_19,\n    /* fifo */ fifo_C_drain_PE_19_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_18,\n    /* fifo */ fifo_C_drain_PE_18_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_18,\n    /* fifo 
*/ fifo_C_drain_C_drain_IO_L1_out_3_17,\n    /* fifo */ fifo_C_drain_PE_17_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_16,\n    /* fifo */ fifo_C_drain_PE_16_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_15,\n    /* fifo */ fifo_C_drain_PE_15_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_14,\n    /* fifo */ fifo_C_drain_PE_14_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_13,\n    /* fifo */ fifo_C_drain_PE_13_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_12,\n    /* fifo */ fifo_C_drain_PE_12_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_11,\n    /* fifo */ fifo_C_drain_PE_11_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_10,\n    /* fifo */ fifo_C_drain_PE_10_3\n  );\n  /* Module 
Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_9,\n    /* fifo */ fifo_C_drain_PE_9_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_8,\n    /* fifo */ fifo_C_drain_PE_8_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_7,\n    /* fifo */ fifo_C_drain_PE_7_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_6,\n    /* fifo */ fifo_C_drain_PE_6_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_5,\n    /* fifo */ fifo_C_drain_PE_5_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_4,\n    /* fifo */ fifo_C_drain_PE_4_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_3,\n    /* fifo */ fifo_C_drain_PE_3_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 2,\n    /* fifo 
*/ fifo_C_drain_C_drain_IO_L1_out_3_3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_2,\n    /* fifo */ fifo_C_drain_PE_2_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_1,\n    /* fifo */ fifo_C_drain_PE_1_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 3,\n    /* module id */ 0,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_0,\n    /* fifo */ fifo_C_drain_PE_0_3\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_boundary_wrapper(\n    /* module id */ 4,\n    /* module id */ 23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_23,\n    /* fifo */ fifo_C_drain_PE_23_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_22,\n    /* fifo */ fifo_C_drain_PE_22_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_21,\n    /* fifo */ fifo_C_drain_PE_21_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_20,\n    /* fifo */ fifo_C_drain_PE_20_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_19,\n    /* fifo */ fifo_C_drain_PE_19_4\n  );\n  /* Module 
Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_18,\n    /* fifo */ fifo_C_drain_PE_18_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_17,\n    /* fifo */ fifo_C_drain_PE_17_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_16,\n    /* fifo */ fifo_C_drain_PE_16_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_15,\n    /* fifo */ fifo_C_drain_PE_15_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_14,\n    /* fifo */ fifo_C_drain_PE_14_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_13,\n    /* fifo */ fifo_C_drain_PE_13_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_12,\n    /* fifo */ fifo_C_drain_PE_12_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* 
module id */ 11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_11,\n    /* fifo */ fifo_C_drain_PE_11_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_10,\n    /* fifo */ fifo_C_drain_PE_10_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_9,\n    /* fifo */ fifo_C_drain_PE_9_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_8,\n    /* fifo */ fifo_C_drain_PE_8_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_7,\n    /* fifo */ fifo_C_drain_PE_7_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_6,\n    /* fifo */ fifo_C_drain_PE_6_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_5,\n    /* fifo */ fifo_C_drain_PE_5_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_5,\n    /* fifo */ 
fifo_C_drain_C_drain_IO_L1_out_4_4,\n    /* fifo */ fifo_C_drain_PE_4_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_3,\n    /* fifo */ fifo_C_drain_PE_3_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_2,\n    /* fifo */ fifo_C_drain_PE_2_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_1,\n    /* fifo */ fifo_C_drain_PE_1_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 4,\n    /* module id */ 0,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_0,\n    /* fifo */ fifo_C_drain_PE_0_4\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_boundary_wrapper(\n    /* module id */ 5,\n    /* module id */ 23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_23,\n    /* fifo */ fifo_C_drain_PE_23_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_22,\n    /* fifo */ fifo_C_drain_PE_22_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_21,\n    /* fifo */ fifo_C_drain_PE_21_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n   
 /* module id */ 5,\n    /* module id */ 20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_20,\n    /* fifo */ fifo_C_drain_PE_20_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_19,\n    /* fifo */ fifo_C_drain_PE_19_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_18,\n    /* fifo */ fifo_C_drain_PE_18_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_17,\n    /* fifo */ fifo_C_drain_PE_17_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_16,\n    /* fifo */ fifo_C_drain_PE_16_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_15,\n    /* fifo */ fifo_C_drain_PE_15_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_14,\n    /* fifo */ fifo_C_drain_PE_14_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 13,\n    /* fifo */ 
fifo_C_drain_C_drain_IO_L1_out_5_14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_13,\n    /* fifo */ fifo_C_drain_PE_13_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_12,\n    /* fifo */ fifo_C_drain_PE_12_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_11,\n    /* fifo */ fifo_C_drain_PE_11_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_10,\n    /* fifo */ fifo_C_drain_PE_10_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_9,\n    /* fifo */ fifo_C_drain_PE_9_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_8,\n    /* fifo */ fifo_C_drain_PE_8_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_7,\n    /* fifo */ fifo_C_drain_PE_7_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_6,\n    /* fifo */ 
fifo_C_drain_PE_6_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_5,\n    /* fifo */ fifo_C_drain_PE_5_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_4,\n    /* fifo */ fifo_C_drain_PE_4_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_3,\n    /* fifo */ fifo_C_drain_PE_3_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_2,\n    /* fifo */ fifo_C_drain_PE_2_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_1,\n    /* fifo */ fifo_C_drain_PE_1_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 5,\n    /* module id */ 0,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_0,\n    /* fifo */ fifo_C_drain_PE_0_5\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_boundary_wrapper(\n    /* module id */ 6,\n    /* module id */ 23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_23,\n    /* fifo */ fifo_C_drain_PE_23_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 22,\n    /* fifo 
*/ fifo_C_drain_C_drain_IO_L1_out_6_23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_22,\n    /* fifo */ fifo_C_drain_PE_22_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_21,\n    /* fifo */ fifo_C_drain_PE_21_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_20,\n    /* fifo */ fifo_C_drain_PE_20_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_19,\n    /* fifo */ fifo_C_drain_PE_19_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_18,\n    /* fifo */ fifo_C_drain_PE_18_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_17,\n    /* fifo */ fifo_C_drain_PE_17_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_16,\n    /* fifo */ fifo_C_drain_PE_16_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_15,\n  
  /* fifo */ fifo_C_drain_PE_15_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_14,\n    /* fifo */ fifo_C_drain_PE_14_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_13,\n    /* fifo */ fifo_C_drain_PE_13_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_12,\n    /* fifo */ fifo_C_drain_PE_12_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_11,\n    /* fifo */ fifo_C_drain_PE_11_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_10,\n    /* fifo */ fifo_C_drain_PE_10_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_9,\n    /* fifo */ fifo_C_drain_PE_9_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_8,\n    /* fifo */ fifo_C_drain_PE_8_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  
C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_7,\n    /* fifo */ fifo_C_drain_PE_7_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_6,\n    /* fifo */ fifo_C_drain_PE_6_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_5,\n    /* fifo */ fifo_C_drain_PE_5_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_4,\n    /* fifo */ fifo_C_drain_PE_4_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_3,\n    /* fifo */ fifo_C_drain_PE_3_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_2,\n    /* fifo */ fifo_C_drain_PE_2_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_1,\n    /* fifo */ fifo_C_drain_PE_1_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 6,\n    /* module id */ 0,\n    /* fifo */ 
fifo_C_drain_C_drain_IO_L1_out_6_1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_0,\n    /* fifo */ fifo_C_drain_PE_0_6\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_boundary_wrapper(\n    /* module id */ 7,\n    /* module id */ 23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_23,\n    /* fifo */ fifo_C_drain_PE_23_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_23,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_22,\n    /* fifo */ fifo_C_drain_PE_22_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_22,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_21,\n    /* fifo */ fifo_C_drain_PE_21_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_21,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_20,\n    /* fifo */ fifo_C_drain_PE_20_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_20,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_19,\n    /* fifo */ fifo_C_drain_PE_19_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_19,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_18,\n    /* fifo */ fifo_C_drain_PE_18_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_18,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_17,\n    /* fifo */ fifo_C_drain_PE_17_7\n  );\n  /* 
Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_17,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_16,\n    /* fifo */ fifo_C_drain_PE_16_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_16,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_15,\n    /* fifo */ fifo_C_drain_PE_15_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_15,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_14,\n    /* fifo */ fifo_C_drain_PE_14_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_14,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_13,\n    /* fifo */ fifo_C_drain_PE_13_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_13,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_12,\n    /* fifo */ fifo_C_drain_PE_12_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_12,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_11,\n    /* fifo */ fifo_C_drain_PE_11_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_11,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_10,\n    /* fifo */ fifo_C_drain_PE_10_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n   
 /* module id */ 9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_10,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_9,\n    /* fifo */ fifo_C_drain_PE_9_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_9,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_8,\n    /* fifo */ fifo_C_drain_PE_8_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_8,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_7,\n    /* fifo */ fifo_C_drain_PE_7_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_6,\n    /* fifo */ fifo_C_drain_PE_6_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_5,\n    /* fifo */ fifo_C_drain_PE_5_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_4,\n    /* fifo */ fifo_C_drain_PE_4_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_3,\n    /* fifo */ fifo_C_drain_PE_3_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_3,\n    /* fifo */ 
fifo_C_drain_C_drain_IO_L1_out_7_2,\n    /* fifo */ fifo_C_drain_PE_2_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_1,\n    /* fifo */ fifo_C_drain_PE_1_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L1_out_wrapper(\n    /* module id */ 7,\n    /* module id */ 0,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_0,\n    /* fifo */ fifo_C_drain_PE_0_7\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L2_out_boundary(\n    /* module id */ 7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_7_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L2_out(\n    /* module id */ 6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_7,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_6_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L2_out(\n    /* module id */ 5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_6,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_5_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L2_out(\n    /* module id */ 4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_5,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_4_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L2_out(\n    /* module id */ 3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_4,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_3_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L2_out(\n    /* module id */ 2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_3,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_2,\n 
   /* fifo */ fifo_C_drain_C_drain_IO_L1_out_2_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L2_out(\n    /* module id */ 1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_2,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L2_out(\n    /* module id */ 0,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_1,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_0,\n    /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L3_out(\n    /* fifo */ fifo_C_drain_C_drain_IO_L3_out_serialize,\n    /* fifo */ fifo_C_drain_C_drain_IO_L2_out_0\n  );\n  /* Module Call */\n\n  /* Module Call */\n  C_drain_IO_L3_out_serialize(\n    /* array */ C,\n    /* fifo */ fifo_C_drain_C_drain_IO_L3_out_serialize\n  );\n  /* Module Call */\n\n}\n}\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/step1-run-hls.tcl",
    "content": "open_project kernel0\nset_top kernel0\nadd_files \"src/kernel_kernel.cpp\"\n#add_files -tb PATH_TO_TESTBENCH_FILE\n\nopen_solution solution\n\n#u250\nset_part xcu250-figd2104-2L-e\n\n# u280\n#set_part xcu280-fsvh2892-2L-e\n\n# 300 MHz\ncreate_clock -period 3.333\n\nconfig_dataflow -strict_mode warning\nset_clock_uncertainty 27.000000%\nconfig_rtl -enable_maxiConservative=1\nconfig_interface -m_axi_addr64\n\n# to enable integration with Vitis\nconfig_sdx -target xocc\n\n#csim_design\ncsynth_design\nclose_project\nexit\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/step2-autobridge.py",
    "content": "#! /usr/bin/python3.6\n\n# add the path to where you place the autobridge source code\nimport sys\nsys.path.append('../src')\n\nimport graph\nfrom formator import FormatHLS\nimport collections\nimport os\nimport subprocess\n\n\"\"\"\nAutoBridge divides the target device as follows and assigns each HLS function to one slot\nFor more details please refer to the paper\n\n      u250                     u280\n   -----------\n 3 |    |    |\n   |----|----|              |----|----|\n 2 |    |    |            2 |    |    |\n   |----|----|              |----|----|\n 1 |    |    |            1 |    |    |\n   |----|----|              |----|----|\n 0 |    |    |            0 |    |    |\n   -----------              -----------\n     0    1                   0    1\n\"\"\"\n\n################### Modify Accordingly ###############################\n\n# (1) fill basic information\nproject_path = '/home/jaywang/doc_examples/mm_int8_ab_pe/kernel0' # path to your hls project\n#project_path = '/home/jaywang/doc_examples/mm_ab/kernel0' # path to your hls project\ntop_name = 'kernel0' # name of the top function in your hls design\nsolution_path = f'{project_path}/solution/'\nproject_name = 'kernel0'\nboard_name = 'u250' # or 'u280'\n# where the results will be saved. 
Your HLS project will be copied there and your top RTL will be replaced.\n# Note that if the directory already exists, we will try to reset the contents\n\n# (2) specify how your designs connect to the external memory\n\"\"\" Example:\n\nvoid kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n{\n  #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n  #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n\n  load_p1 (p1, ...);\n  load_p2 (p2, ...);\n}\n\n--------------------------------------\n\nIn this example, the pointers p1 and p2 will become M_AXI controllers that connect to the dedicated DDR IPs.\nIf you want p1 to connect to DDR 2 in the 2nd SLR, then you need to specify that the corresponding RTL controller must be floorplanned in the 2nd SLR.\nMeanwhile, your function load_p1() will also talk to the M_AXI controller through an AXI interface, which cannot be easily pipelined.\nThus the RTL module corresponding to load_p1() must also be in the 2nd SLR in this example.\nSince load_p1() communicates with the rest of your design through FIFO interfaces, you don't need to specify the location of the other modules.\n\n(transparent)|                        (user visible)\n             |\n   Vitis     |                    what your HLS design becomes\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p1)                       (load_p1)\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p2)                       (load_p2)\n             |\n             | S_AXI\nPCIe    <--- | ----> S_AXI controller\n             |\n\"\"\"\n\n# on the left side or the right side of an SLR\nDDR_loc_2d_x = collections.defaultdict(dict)\n\n# on which SLR\nDDR_loc_2d_y = 
collections.defaultdict(dict)\n\n# use DDR 0, 1, 3\nDDR_loc_2d_y['A_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_x['A_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_A_m_axi_U'] = 0\nDDR_loc_2d_x['kernel0_gmem_A_m_axi_U'] = 0\n\nDDR_loc_2d_y['B_IO_L3_in_serialize_U0'] = 1\nDDR_loc_2d_x['B_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_B_m_axi_U'] = 1\nDDR_loc_2d_x['kernel0_gmem_B_m_axi_U'] = 0\n\nDDR_loc_2d_y['C_drain_IO_L3_out_serialize_U0'] = 3\nDDR_loc_2d_x['C_drain_IO_L3_out_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_C_m_axi_U'] = 3\nDDR_loc_2d_x['kernel0_gmem_C_m_axi_U'] = 0\n\nDDR_loc_2d_y['kernel0_control_s_axi_U'] = 1\nDDR_loc_2d_x['kernel0_control_s_axi_U'] = 1\nDDR_loc_2d_y['kernel0_entry12_U0'] = 1\nDDR_loc_2d_x['kernel0_entry12_U0'] = 1\n\n# (3) specify DDR information\n# If you instantiate a DDR controller, it will consume a non-trivial amount of resources.\n# To make the floorplanning better, you need to specify which DDRs have been enabled.\n# In the p1/p2 example above, p1 connects to DDR-2 in SLR-2 and p2 to DDR-1 in SLR-1.\n# This design uses DDR 0, 1 and 3, hence [1, 1, 0, 1]; to use all DDRs, set it to [1, 1, 1, 1]\nDDR_enable = [1, 1, 0, 1]\n\n# (4) specify how much resource can be used in each slot\n# In this way you could force the design to be placed evenly across the device and avoid local congestion\n\"\"\" Example:\n   -----------\n 3 |0.76|0.62|\n   |----|----|\n 2 |0.74|0.61|\n   |----|----|\n 1 |0.75|0.6 |\n   |----|----|\n 0 | 0.7|0.6 |\n   -----------\n     0    1\n\"\"\"\nmax_usage_ratio_2d = [ [0.8, 0.6], [0.8, 0.6], [0.8, 0.8], [0.8, 0.6] ]\n\n\n##################### DON'T TOUCH THE SECTION BELOW #################################\ntarget_dir = '/home/jaywang/doc_examples/mm_int8_ab_pe/autobridge'\n\nformator = FormatHLS(\n  rpt_path = f'{solution_path}/syn/report/',\n  hls_sche_path = f'{solution_path}/.autopilot/db/',\n  top_hdl_path = f'{solution_path}/syn/verilog/{top_name}_{top_name}.v',\n  top_name = top_name,\n  DDR_loc_2d_x = 
DDR_loc_2d_x,\n  DDR_loc_2d_y = DDR_loc_2d_y,\n  DDR_enable = DDR_enable,\n  max_usage_ratio_2d = max_usage_ratio_2d,\n  board_name = board_name,\n  target_dir = target_dir,\n  relay_station_count = lambda x : 2 * x, # how many levels of relay stations to add for x-unit of crossing\n  max_search_time = 600,\n  NaiveBalance = True)\n\n# run floorplanning\ng = graph.Graph(formator)\n\n# move results to target dir\nif (os.path.isdir(target_dir)):\n  subprocess.run(['rm', '-rf', f'{target_dir}'])\nsubprocess.run(['mkdir', f'{target_dir}/'])\nsubprocess.run(['cp', '-r', project_path, f'{target_dir}/{project_name}'])\nsubprocess.run(['cp', os.path.realpath(__file__), f'{target_dir}/archived_source.txt'])\nsubprocess.run(['chmod', '+w', '-R', f'{target_dir}'])\nsubprocess.run(['cp', 'constraint.tcl', target_dir])\nsubprocess.run(['cp', 'pack_xo.tcl', target_dir])\nsubprocess.run(['cp', 'autobridge.log', target_dir])\nsubprocess.run(['cp', f'{top_name}_{top_name}.v', f'{target_dir}/{project_name}/solution/syn/verilog/'])\n\n# clean up\nos.system('rm *.lp')\nsubprocess.run(['rm', 'parser.out'])\nsubprocess.run(['rm', 'parsetab.py'])\nsubprocess.run(['rm', '-rf', '__pycache__'])\n\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/step3-pack-xo.tcl",
    "content": "open_project kernel0\nopen_solution solution\nexport_design -rtl verilog -format ip_catalog -xo kernel0.xo\n\nclose_project\nputs \"Pack XO successfully\"\nexit\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/step4-run-vitis.sh",
    "content": "OUTPUT_DIR=\"$(pwd)/vitis_run\"\n\n# name of the top function\nTOP=kernel0\n\n# choose the target device\nPLATFORM=xilinx_u250_xdma_201830_2 \n#PLATFORM=xilinx_u280_xdma_201920_3 \n\nXO=\"$(pwd)/kernel0.xo\"\n\n# For different approaches see UG904-vivado-implementation\n#STRATEGY=\"Default\" \nSTRATEGY=\"EarlyBlockPlacement\" \n\n# remove the unused '--connectivity.sp' option for v++ if some DDRs are not used \n# Example: if we map p1 to DDR 3 and p2 to DDR 0\n#\n# void kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n# {\n#   #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n#   #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n# \n#   load_p1 (p1, ...);\n#   load_p2 (p2, ...);\n# }\n#\n# ARG_FOR_DDR_0=p2\n# ARG_FOR_DDR_3=p1\n# Should remove '--connectivity.sp' for DDR1 and DDR2\n\nARG_FOR_DDR_1=A\nARG_FOR_DDR_2=B\n#ARG_FOR_DDR_3=\"YOUR_HLS_ARGUMENT_NAME_FOR_DDR_3\"\nARG_FOR_DDR_4=C\n\n# the constraint file containing the floorplan results\n# WARNING: must use an absolute path\nCONSTRAINT=\"$(pwd)/constraint.tcl\"\nif [ ! -f \"$CONSTRAINT\" ]; then\n    echo \"no constraint file found\"\n    exit 1\nfi\n\nv++ \\\n  --link \\\n  --output \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.xclbin\" \\\n  --kernel ${TOP} \\\n  --platform ${PLATFORM} \\\n  --target hw \\\n  --report_level 2 \\\n  --temp_dir \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.temp\" \\\n  --optimize 3 \\\n  --connectivity.nk ${TOP}:1:${TOP}_1 \\\n  --max_memory_ports ${TOP} \\\n  --save-temps \\\n  ${XO} \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_1}:DDR[0] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_2}:DDR[1] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_4}:DDR[3] \\\n  --kernel_frequency 300 \\\n  --vivado.prop run.impl_1.STEPS.PLACE_DESIGN.ARGS.DIRECTIVE=$STRATEGY \\\n  --vivado.prop run.impl_1.STEPS.OPT_DESIGN.TCL.PRE=$CONSTRAINT\n"
  },
  {
    "path": "autosa_tests/large/mm_int8/unroll.py",
    "content": "import math\n\n# Modify the parameters here\nUNROLL_FACTOR = 64\nDATA_T = 'char'\n\n# Generate the code\ndata_type = DATA_T\nlevel = int(math.log2(UNROLL_FACTOR))\nfor layer in range(level - 1, -1, -1):\n    pair = int(math.pow(2, layer))\n    for i in range(pair):\n        # data_t tmp_[layer]_[pair] = tmp_[layer+1]_[pair*2]_[pair*2+1]\n        if layer == level - 1:\n            print(f'{data_type} mul_{layer}_{i}_0 = local_A[0][{i*2}] * local_B[0][{i*2}];')\n            print(f'{data_type} add_{layer}_{i} = mul_{layer}_{i}_0 + local_A[0][{i*2+1}] * local_B[0][{i*2+1}];')\n        else:\n            print(f'{data_type} add_{layer}_{i} = add_{layer+1}_{i*2} + add_{layer+1}_{i*2+1};')\n\n# Add resource\nfor layer in range(level - 1, -1, -1):\n    pair = int(math.pow(2, layer))\n    for i in range(pair):\n        if layer == level - 1:\n            print(f'#pragma HLS RESOURCE variable=mul_{layer}_{i}_0 core=Mul_LUT')\n        else:\n            print(f'#pragma HLS RESOURCE variable=add_{layer}_{i} core=AddSub')\n\nprint('local_C[c7][c6] += add_0_0;')\n"
  },
  {
    "path": "autosa_tests/large/mm_intel/Makefile",
    "content": "APP ?= kernel\nAOCL_BOARD ?= s10mx_hbm_es\nSW_EMU_AOCX ?= $(APP)_sw_emu.aocx\nHW_EMU_AOCX ?= $(APP)_hw_emu.aocx\nHW_AOCX ?= $(APP)_hw.aocx\nAOCO ?= $(APP).aoco\nAOCR ?= $(APP).aocr\n\n# Compiler\nAOC ?= aoc\nCXX ?= g++\nAOC_FLAGS ?= -board=$(AOCL_BOARD) -fp-relaxed -report -hyper-optimized-handshaking=off -I $(INTELFPGAOCLSDKROOT)/include/kernel_headers\n\nTARGET ?= host\nSW_EMU_TARGET ?= host_sw_emu\nTARGET_DIR ?= bin\nAOCL_UTILS ?= $(INTELFPGAOCLSDKROOT)/examples_aoc/common\n\n# Directories\nINC_DIRS := src $(AOCL_UTILS)/inc\nLIB_DIRS := \n\n# Files\nINCS := $(wildcard src/*.h)\nHOST_SRCS := $(wildcard src/$(APP)_host.cpp $(AOCL_UTILS)/src/AOCLUtils/*.cpp)\nKERNEL_SRCS := src/$(APP)_kernel.cl\n\nifeq ($(VERBOSE),1)\nECHO := \nelse\nECHO := @\nendif\n\n# Where is the Intel(R) FPGA SDK for OpenCL(TM) software?\nifeq ($(wildcard $(INTELFPGAOCLSDKROOT)),)\n$(error Set INTELFPGAOCLSDKROOT to the root directory of the Intel(R) FPGA SDK for OpenCL(TM) software installation)\nendif\nifeq ($(wildcard $(INTELFPGAOCLSDKROOT)/host/include/CL/opencl.h),)\n$(error Set INTELFPGAOCLSDKROOT to the root directory of the Intel(R) FPGA SDK for OpenCL(TM) software installation.)\nendif\n\n# OpenCL compile and link flags.\nAOCL_COMPILE_CONFIG := $(shell aocl compile-config )\nAOCL_LINK_LIBS := $(shell aocl ldlibs )\nAOCL_LINK_FLAGS := $(shell aocl ldflags )\n# Linking with defences enabled\nAOCL_LINK_FLAGS += -z noexecstack\nAOCL_LINK_FLAGS += -Wl,-z,relro,-z,now\nAOCL_LINK_FLAGS += -Wl,-Bsymbolic\nAOCL_LINK_FLAGS += -pie\nAOCL_LINK_CONFIG := $(AOCL_LINK_FLAGS) $(AOCL_LINK_LIBS)\n\n# Compilation flags\nifeq ($(DEBUG),1)\nCXXFLAGS += -g\nelse\nCXXFLAGS += -O2\nendif\nCXXFLAGS += -std=gnu++0x\n\n# Compiling with defences enabled\nCXXFLAGS += -fstack-protector\nCXXFLAGS += -D_FORTIFY_SOURCE=2\nCXXFLAGS += -Wformat -Wformat-security\nCXXFLAGS += -fPIE\n\n# We must force GCC to never assume that it can shove in its own\n# sse2/sse3 versions of strlen and strcmp because they 
will CRASH.\n# Very hard to debug!\nCXXFLAGS += -fPIC\n\nLIBS := rt pthread\n\n## Make it all!\n#all : $(TARGET_DIR)/$(TARGET)\n\nsw_emu : $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(SW_EMU_AOCX)\n\nhls: $(TARGET_DIR)/$(AOCR)\n\nhw : $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_AOCX)\n\nhw_emu: $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_EMU_AOCX)\n\nhw_emu_check: $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_EMU_AOCX)\n\tCL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 $(TARGET_DIR)/$(TARGET) $(HW_EMU_AOCX)\n\nsw_emu_check : $(TARGET_DIR)/$(SW_EMU_TARGET) $(TARGET_DIR)/$(SW_EMU_AOCX)\n\tCL_CONTEXT_EMULATOR_DEVICE_INTELFPGA=1 $(TARGET_DIR)/$(TARGET) $(SW_EMU_AOCX)\n\nhw_check : $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_AOCX)\n\t$(TARGET_DIR)/$(TARGET) $(HW_AOCX)\n\n# Host executable target.\n$(TARGET_DIR)/$(TARGET) : Makefile $(HOST_SRCS) $(INCS) $(TARGET_DIR)\n\t$(ECHO)$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(EXTRACXXFLAGS) -fPIC $(foreach D,$(INC_DIRS),-I$D) \\\n\t\t\t$(AOCL_COMPILE_CONFIG) $(HOST_SRCS) $(AOCL_LINK_CONFIG) \\\n\t\t\t$(foreach D,$(LIB_DIRS),-L$D) \\\n\t\t\t$(foreach L,$(LIBS),-l$L) \\\n\t\t\t-o $(TARGET_DIR)/$(TARGET)\n\n$(TARGET_DIR)/$(SW_EMU_TARGET) : Makefile $(HOST_SRCS) $(INCS) $(TARGET_DIR)\n\t$(ECHO)$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(EXTRACXXFLAGS) -fPIC $(foreach D,$(INC_DIRS),-I$D) \\\n\t\t\t$(AOCL_COMPILE_CONFIG) $(HOST_SRCS) $(AOCL_LINK_CONFIG) \\\n\t\t\t$(foreach D,$(LIB_DIRS),-L$D) \\\n\t\t\t$(foreach L,$(LIBS),-l$L) \\\n\t\t\t-o $(TARGET_DIR)/$(TARGET) -DEMULATE\n\n$(TARGET_DIR) :\n\t$(ECHO)mkdir $(TARGET_DIR)\n\n$(TARGET_DIR)/$(SW_EMU_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -march=emulator -legacy-emulator -o $@ $^\n\n$(TARGET_DIR)/$(HW_EMU_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -march=simulator -ghdl -o $@ $^\n\n$(TARGET_DIR)/$(HW_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -o $@ $^\n\n$(TARGET_DIR)/$(AOCO) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -c -o $@ $^\n\n$(TARGET_DIR)/$(AOCR) : $(TARGET_DIR)/$(AOCO)\n\t$(AOC) $(AOC_FLAGS) -rtl -o $@ $^\n\n# 
Standard make targets\nclean :\n\t$(ECHO)rm -rf $(TARGET_DIR)/*\n\n.PHONY : all clean\n"
  },
  {
    "path": "autosa_tests/large/mm_intel/README.md",
    "content": "# Matrix Multiplication (Large)\n\nBoard        | Software Version\n-------------|-----------------\nStratix 10 | Intel FPGA SDK for OpenCL 19.4\n\n__Files__:\n```\nautosa_tests/large/mm_intel/kernel.c\nautosa_tests/large/mm_intel/kernel.h\nautosa_tests/large/mm_intel/simd_info.json\nautosa_tests/large/mm_intel/Makefile\n```\n\n__Command__:\n```c\n./autosa ./autosa_tests/large/mm_intel/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_opencl --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[260,256,512];kernel[]->latency[20,16];kernel[]->simd[8]}\" --simd-info=./autosa_tests/large/mm_intel/simd_info.json --host-serialize --loop-infinitize --double-buffer-style=0 --mem-port-map=\"{kernel[]->A[0];kernel[]->B[1];kernel[]->C[2]}\"\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/large/mm_intel/Makefile autosa.tmp/output/\n```\n\nFrom `autosa.tmp/output`, execute the makefile to perform software emulation:\n```\ncd autosa.tmp/output\nmake sw_emu_check\n```\nor to synthesize the design to RTL:\n```\nmake hls\n```\nor to generate the bitstream:\n```\nmake hw\n```"
  },
  {
    "path": "autosa_tests/large/mm_intel/kernel.c",
    "content": "#include \"kernel.h\"\n\n//#define LAYOUT1\n#define LAYOUT2\n//#define LAYOUT3\n\nint main(int argc, char **argv) {\n//  data_t A[I][K], B[K][J], C[I][J], C_golden[I][J]; \n#ifdef LAYOUT2  \n  static data_t A[I][K], B[J][K], C[I][J], C_golden[I][J]; // gemm0,3\n#endif  \n#ifdef LAYOUT3  \n  static data_t A[K][I], B[K][J], C[I][J], C_golden[I][J]; // gemm4\n#endif  \n\n  for (int i = 0; i < I; i++) \n    for (int k = 0; k < K; k++) {\n#ifdef LAYOUT2      \n      A[i][k] = k;\n#endif\n#ifdef LAYOUT3      \n      A[k][i] = k;\n#endif      \n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n#ifdef LAYOUT2      \n      B[j][k] = k;\n#endif\n#ifdef LAYOUT3      \n      B[k][j] = k;\n#endif      \n    }\n\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n#ifdef LAYOUT2        \n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n#endif\n#ifdef LAYOUT3      \n        C[i][j] = C[i][j] + A[k][i] * B[k][j];\n#endif        \n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n#ifdef LAYOUT2        \n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n#endif\n#ifdef LAYOUT3        \n        C_golden[i][j] = C_golden[i][j] + A[k][i] * B[k][j];\n#endif        \n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/large/mm_intel/kernel.h",
    "content": "#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"math.h\"\n\ntypedef float data_t;\n\n#define I 1040 \n#define J 1024\n#define K 1024\n"
  },
  {
    "path": "autosa_tests/large/mm_intel/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel2\": {\n    \"reduction\": [\"y\"]\n  }, \n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel4\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/mttkrp/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/large/mttkrp/README.md",
    "content": "# Matricized Tensor Times Khatri-Rao Product (MTTKRP)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/large/mttkrp/kernel.c\nautosa_tests/large/mttkrp/kernel.h\nautosa_tests/large/mttkrp/simd_info.json\nautosa_tests/large/mttkrp/Makefile\nautosa_tests/large/mttkrp/connectivity.cfg\n```\n\n__Command__:\n```c\n./autosa ./autosa_tests/large/mttkrp/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[128,128,2];kernel[]->latency[16,8];kernel[]->simd[8,1]}\" --simd-info=./autosa_tests/large/mttkrp/simd_info.json --host-serialize\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/large/mttkrp/Makefile autosa.tmp/output/\ncp autosa_tests/large/mttkrp/connectivity.cfg autosa.tmp/output/\n```\n\nExecute the makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\n```"
  },
  {
    "path": "autosa_tests/large/mttkrp/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1] \nsp=kernel0_1.C:DDR[2]\nsp=kernel0_1.D:DDR[3]"
  },
  {
    "path": "autosa_tests/large/mttkrp/kernel.c",
    "content": "/*\n * This code implements the Matricized Tensor Times Khatri-Rao Product (MTTKRP), which performs:\n * D(i,j) += A(i,k,l) * B(k,j) * C(l,j)\n * Input: A[I][K][L], B[K][J], C[L][J]\n * Output: D[I][J]\n */\n\n#include \"kernel.h\"\n\nint main(int argc, char **argv){\n  // declarations\n  static data_t A[I][K][L];\n  static data_t B[K][J];\n//  static data_t C[L][J];\n  static data_t C[J][L];\n  static data_t D[I][J];\n  static data_t D_golden[I][J];\n\n  // data initialization\n  for (int i = 0; i < I; i++)\n    for (int k = 0; k < K; k++) \n      for (int l = 0; l < L; l++) {\n        A[i][k][l] = 2.5;\n      }\n  for (int k = 0; k < K; k++)\n    for (int j = 0; j < J; j++) {\n      B[k][j] = 2.5;\n    }\n  for (int l = 0; l < L; l++)\n    for (int j = 0; j < J; j++) {\n//      C[l][j] = 2.5;\n      C[j][l] = 2.5;\n    }\n  data_t tmp;\n\n  // computation\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      D[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        for (int l = 0; l < L; l++) {\n//          D[i][j] += A[i][k][l] * B[k][j] * C[l][j];\n          D[i][j] = D[i][j] + A[i][k][l] * B[k][j] * C[j][l];\n        }\n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      D_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n//        for (int l = 0; l < L; l++) {\n//          D_golden[i][j] += A[i][k][l] * B[k][j] * C[l][j];\n//        }\n        data_t tmp = 0;\n        for (int l = 0; l < L; l++) {\n//          tmp += A[i][k][l] * C[l][j];\n          tmp += A[i][k][l] * C[j][l];\n        }\n        D_golden[i][j] += B[k][j] * tmp;\n      }\n    }\n\n  // comparison\n  int err = 0;\n  float thres = 0.01;\n  for (int i = 0; i < I; i++) \n    for (int j = 0; j < J; j++) {\n      if (fabs((float)D_golden[i][j] - (float)D[i][j]) > thres) {\n        err++;\n      }\n    }\n\n  if (err) {\n    printf(\"Test failed with %d errors!\\n\", err);\n    return 
-1;\n  } else {\n    printf(\"Test passed!\\n\");\n    return 0;\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/mttkrp/kernel.h",
    "content": "#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"math.h\"\n\ntypedef float data_t;\n#define I 256 \n//#define J 256 \n#define J 336\n#define K 256 \n#define L 256 \n"
  },
  {
    "path": "autosa_tests/large/mttkrp/simd_info.json",
    "content": "{\n  \"kernel3\": {\n    \"reduction\": [\"y\", \"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/mttkrp/step1-run-hls.tcl",
    "content": "open_project kernel0\nset_top kernel0\nadd_files \"src/kernel_kernel.cpp\"\n#add_files -tb PATH_TO_TESTBENCH_FILE\n\nopen_solution solution\n\n#u250\nset_part xcu250-figd2104-2L-e\n\n# u280\n#set_part xcu280-fsvh2892-2L-e\n\n# 300 MHz\ncreate_clock -period 3.333\n\nconfig_dataflow -strict_mode warning\nset_clock_uncertainty 27.000000%\nconfig_rtl -enable_maxiConservative=1\nconfig_interface -m_axi_addr64\n\n# to enable integration with Vitis\nconfig_sdx -target xocc\n\n#csim_design\ncsynth_design\nclose_project\nexit\n"
  },
  {
    "path": "autosa_tests/large/mttkrp/step2-autobridge.py",
    "content": "#! /usr/bin/python3.6\n\n# add the path to where you place the autobridge source code\nimport sys\nsys.path.append('../src')\n\nimport graph\nfrom formator import FormatHLS\nimport collections\nimport os\nimport subprocess\n\n\"\"\"\nAutoBridge divides the target device as follows and assigns each HLS function to one slot\nFor more details, please refer to the paper\n\n      u250                     u280\n   -----------\n 3 |    |    |\n   |----|----|              |----|----|\n 2 |    |    |            2 |    |    |\n   |----|----|              |----|----|\n 1 |    |    |            1 |    |    |\n   |----|----|              |----|----|\n 0 |    |    |            0 |    |    |\n   -----------              -----------\n     0    1                   0    1\n\"\"\"\n\n################### Modify Accordingly ###############################\n\n# (1) fill basic information\nproject_path = '/home/jaywang/doc_examples/mttkrp_ab/kernel0' # path to your hls project\ntop_name = 'kernel0' # name of the top function in your hls design\nsolution_path = f'{project_path}/solution/'\nproject_name = 'kernel0'\nboard_name = 'u250' # or 'u280'\n# where the results will be saved. 
Your HLS project will be copied there and your top RTL will be replaced.\n# Note that if the directory already exists, we will try to reset the contents\n\n# (2) specify how your designs connect to the external memory\n\"\"\" Example:\n\nvoid kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n{\n  #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n  #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n\n  load_p1 (p1, ...);\n  load_p2 (p2, ...);\n}\n\n--------------------------------------\n\nIn this example, the pointer p1 and p2 will become M_AXI controllers to connect to the dedicated DDR IP.\nIf you want p1 to connect to DDR 2 in the 2-nd SLR, then you need to specify that the corresponding RTL controller must be floorplanned at the 2-nd SLR\nMeanwhile, your function load_p1() will talk to the M_AXI controller also through AXI interface which cannot be easily pipelined.\nThus the RTL module corresponds to load_p1() must also be in the 2-nd SLR in this example.\nSince load_p1() will communicate with the rest of your design using FIFO interface, you don't need to specify the location of other modules\n\n(transparent)|                        (user visible)\n             |\n   Vitis     |                    what your HLS design becomes\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p1)                       (load_p1)\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p2)                       (load_p2)\n             |\n             | S_AXI\nPCIe    <--- | ----> S_AXI controller\n             |\n\"\"\"\n\n# on the left side or the right side of an SLR\nDDR_loc_2d_x = collections.defaultdict(dict)\n\n# on which SLR\nDDR_loc_2d_y = 
collections.defaultdict(dict)\n\n# use DDR 0, 1, 2, 3\nDDR_loc_2d_y['A_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_x['A_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_A_m_axi_U'] = 0\nDDR_loc_2d_x['kernel0_gmem_A_m_axi_U'] = 0\n\nDDR_loc_2d_y['B_IO_L3_in_serialize_U0'] = 1\nDDR_loc_2d_x['B_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_B_m_axi_U'] = 1\nDDR_loc_2d_x['kernel0_gmem_B_m_axi_U'] = 0\n\nDDR_loc_2d_y['C_IO_L3_in_serialize_U0'] = 2\nDDR_loc_2d_x['C_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_C_m_axi_U'] = 2\nDDR_loc_2d_x['kernel0_gmem_C_m_axi_U'] = 0\n\nDDR_loc_2d_y['D_drain_IO_L3_out_serialize_U0'] = 3\nDDR_loc_2d_x['D_drain_IO_L3_out_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_D_m_axi_U'] = 3\nDDR_loc_2d_x['kernel0_gmem_D_m_axi_U'] = 0\n\nDDR_loc_2d_y['kernel0_control_s_axi_U'] = 1\nDDR_loc_2d_y['kernel0_entry16_U0'] = 1\nDDR_loc_2d_x['kernel0_control_s_axi_U'] = 1\nDDR_loc_2d_x['kernel0_entry16_U0'] = 1\n\n# (3) specify DDR information\n# Each instantiated DDR controller consumes a non-trivial amount of resources,\n# so for better floorplanning you need to specify which DDRs are enabled.\n# In this design, A, B, C, and D connect to DDR 0, 1, 2, and 3, respectively.\n# To use all DDRs, set it to [1, 1, 1, 1]\nDDR_enable = [1, 1, 1, 1]\n\n# (4) specify how much resource can be used in each slot\n# This forces the design to be placed evenly across the device and avoids local congestion\n\"\"\" Example:\n   -----------\n 3 |0.76|0.62|\n   |----|----|\n 2 |0.74|0.61|\n   |----|----|\n 1 |0.75|0.6 |\n   |----|----|\n 0 | 0.7|0.6 |\n   -----------\n     0    1\n\"\"\"\nmax_usage_ratio_2d = [ [0.8, 0.75], [0.8, 0.75], [0.8, 0.75], [0.8, 0.75] ]\n\n\n##################### DON'T TOUCH THE SECTION BELOW #################################\ntarget_dir = '/home/jaywang/doc_examples/mttkrp_ab/autobridge_v2'\n\nformator = FormatHLS(\n  rpt_path = 
f'{solution_path}/syn/report/',\n  hls_sche_path = f'{solution_path}/.autopilot/db/',\n  top_hdl_path = f'{solution_path}/syn/verilog/{top_name}_{top_name}.v',\n  top_name = top_name,\n  DDR_loc_2d_x = DDR_loc_2d_x,\n  DDR_loc_2d_y = DDR_loc_2d_y,\n  DDR_enable = DDR_enable,\n  max_usage_ratio_2d = max_usage_ratio_2d,\n  board_name = board_name,\n  target_dir = target_dir,\n  relay_station_count = lambda x : 2 * x, # how many levels of relay stations to add for x-unit of crossing\n  max_search_time = 600,\n  NaiveBalance = True)\n\n# run floorplanning\ng = graph.Graph(formator)\n\n# move results to target dir\nif (os.path.isdir(target_dir)):\n  subprocess.run(['rm', '-rf', f'{target_dir}'])\nsubprocess.run(['mkdir', f'{target_dir}/'])\nsubprocess.run(['cp', '-r', project_path, f'{target_dir}/{project_name}'])\nsubprocess.run(['cp', os.path.realpath(__file__), f'{target_dir}/archived_source.txt'])\nsubprocess.run(['chmod', '+w', '-R', f'{target_dir}'])\nsubprocess.run(['cp', 'constraint.tcl', target_dir])\nsubprocess.run(['cp', 'pack_xo.tcl', target_dir])\nsubprocess.run(['cp', 'autobridge.log', target_dir])\nsubprocess.run(['cp', f'{top_name}_{top_name}.v', f'{target_dir}/{project_name}/solution/syn/verilog/'])\n\n# clean up\nos.system('rm *.lp')\nsubprocess.run(['rm', 'parser.out'])\nsubprocess.run(['rm', 'parsetab.py'])\nsubprocess.run(['rm', '-rf', '__pycache__'])\n\n"
  },
  {
    "path": "autosa_tests/large/mttkrp/step3-pack-xo.tcl",
    "content": "open_project kernel0\nopen_solution solution\nexport_design -rtl verilog -format ip_catalog -xo kernel0.xo\n\nclose_project\nputs \"Pack XO successfully\"\nexit\n"
  },
  {
    "path": "autosa_tests/large/mttkrp/step4-run-vitis.sh",
    "content": "OUTPUT_DIR=\"$(pwd)/vitis_run\"\n\n# name of the top function\nTOP=kernel0\n\n# choose the target device\nPLATFORM=xilinx_u250_xdma_201830_2 \n#PLATFORM=xilinx_u280_xdma_201920_3 \n\nXO=\"$(pwd)/kernel0.xo\"\n\n# For different approaches see UG904-vivado-implementation\n#STRATEGY=\"Default\" \nSTRATEGY=\"EarlyBlockPlacement\" \n\n# remove the unused '--connectivity.sp' option for v++ if some DDRs are not used \n# Example: if we map p1 to DDR 3 and p2 to DDR 0\n#\n# void kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n# {\n#   #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n#   #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n# \n#   load_p1 (p1, ...);\n#   load_p2 (p2, ...);\n# }\n#\n# ARG_FOR_DDR_0=p2\n# ARG_FOR_DDR_3=p1\n# Should remove '--connectivity.sp' for DDR1 and DDR2\n\nARG_FOR_DDR_1=A\nARG_FOR_DDR_2=B\nARG_FOR_DDR_3=C\nARG_FOR_DDR_4=D\n\n# the constraint file containing the floorplan results\n# WARNING: must be an absolute path\nCONSTRAINT=\"$(pwd)/constraint.tcl\"\nif [ ! -f \"$CONSTRAINT\" ]; then\n    echo \"no constraint file found: $CONSTRAINT\" >&2\n    exit 1\nfi\n\nv++ \\\n  --link \\\n  --output \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.xclbin\" \\\n  --kernel ${TOP} \\\n  --platform ${PLATFORM} \\\n  --target hw \\\n  --report_level 2 \\\n  --temp_dir \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.temp\" \\\n  --optimize 3 \\\n  --connectivity.nk ${TOP}:1:${TOP}_1 \\\n  --max_memory_ports ${TOP} \\\n  --save-temps \\\n  ${XO} \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_1}:DDR[0] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_2}:DDR[1] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_3}:DDR[2] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_4}:DDR[3] \\\n  --kernel_frequency 300 \\\n  --vivado.prop run.impl_1.STEPS.PLACE_DESIGN.ARGS.DIRECTIVE=$STRATEGY \\\n  --vivado.prop run.impl_1.STEPS.OPT_DESIGN.TCL.PRE=$CONSTRAINT\n"
  },
  {
    "path": "autosa_tests/large/ttm/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/large/ttm/README.md",
    "content": "# Tensor Times Matrix (TTM)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/large/ttm/kernel.c\nautosa_tests/large/ttm/kernel.h\nautosa_tests/large/ttm/simd_info.json\nautosa_tests/large/ttm/Makefile\nautosa_tests/large/ttm/connectivity.cfg\n```\n\n__Command__:\n```c\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/large/ttm/Makefile autosa.tmp/output/\ncp autosa_tests/large/ttm/connectivity.cfg autosa.tmp/output/\n```\n\nExecute the makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\n```"
  },
  {
    "path": "autosa_tests/large/ttm/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1] \nsp=kernel0_1.C:DDR[2]\n"
  },
  {
    "path": "autosa_tests/large/ttm/kernel.c",
    "content": "/*\n * This code implements the Tensor Times Matrix (TTM), which performs:\n * C(i,j,k) += A(i,j,l) * B(l,k)\n * Input: A[I][J][L], B[L][K]\n * Output: C[I][J][K]\n */\n\n#include \"kernel.h\"\n\nint main(int argc, char **argv){\n  // declarations\n  static data_t A[I][J][L];\n//  static data_t B[L][K];\n  static data_t B[K][L];\n  static data_t C[I][J][K];\n  static data_t C_golden[I][J][K];\n\n  // data initialization\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) \n      for (int l = 0; l < L; l++) {\n        A[i][j][l] = 2.5;\n      }\n  for (int l = 0; l < L; l++)\n    for (int k = 0; k < K; k++) {\n//      B[l][k] = 2.5;\n      B[k][l] = 2.5;\n    }\n\n  // computation\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) \n      for (int k = 0; k < K; k++) {\n//        C[i][j][k] = 0;\n        for (int l = 0; l < L; l++) {\n//          C[i][j][k] = C[i][j][k] + A[i][j][l] * B[l][k];\n          C[i][j][k] = C[i][j][k] + A[i][j][l] * B[k][l];\n        }\n      }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) \n      for (int k = 0; k < K; k++) {\n        C_golden[i][j][k] = 0;\n        for (int l = 0; l < L; l++) {\n//          C_golden[i][j][k] += A[i][j][l] * B[l][k];\n          C_golden[i][j][k] += A[i][j][l] * B[k][l];\n        }\n      }\n\n  // comparison\n  int err = 0;\n  float thres = 0.001;\n  for (int i = 0; i < I; i++) \n    for (int j = 0; j < J; j++) \n      for (int k = 0; k < K; k++) {\n        if (fabs(C_golden[i][j][k] - C[i][j][k]) > thres) {\n          err++;\n        }\n      }\n\n  if (err) {\n    printf(\"Test failed with %d errors!\\n\", err);\n    return -1;\n  } else {\n    printf(\"Test passed!\\n\");\n    return 0;\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/ttm/kernel.h",
    "content": "#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"math.h\"\n\ntypedef float data_t;\n//#define I 256\n//#define J 256\n//#define K 256\n//#define L 256\n\n#define I 264\n#define J 256 \n#define K 256 \n#define L 256 \n"
  },
  {
    "path": "autosa_tests/large/ttm/simd_info.json",
    "content": "{\n  \"kernel4\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel5\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/ttmc/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/large/ttmc/README.md",
    "content": "# Chain of Tensor-matrix multiplications (TTMc)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/large/ttmc/kernel.c\nautosa_tests/large/ttmc/kernel.h\nautosa_tests/large/ttmc/simd_info.json\nautosa_tests/large/ttmc/Makefile\nautosa_tests/large/ttmc/connectivity.cfg\n```\n\n__Command__:\n```c\n./autosa ./autosa_tests/large/ttmc/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[16,64,16,32];kernel[]->latency[1,8,8];kernel[]->simd[8,1]}\" --simd-info=./autosa_tests/large/ttmc/simd_info.json --host-serialize\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/large/ttmc/Makefile autosa.tmp/output/\ncp autosa_tests/large/ttmc/connectivity.cfg autosa.tmp/output/\n```\n\nExecute the makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\n```"
  },
  {
    "path": "autosa_tests/large/ttmc/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1] \nsp=kernel0_1.C:DDR[2]\nsp=kernel0_1.D:DDR[3]\n"
  },
  {
    "path": "autosa_tests/large/ttmc/kernel.c",
    "content": "/*\n * This code implements the Chain of Tensor-matrix multiplications (TTMc), which performs:\n * D(i,j,k) += A(i,l,m) * B(l,j) * C(m,k)\n * Input: A[I][L][M], B[L][J], C[M][K]\n * Output: D[I][J][K]\n */\n\n#include \"kernel.h\"\n\nint main(int argc, char **argv){\n  // declarations\n  static data_t A[I][L][M];\n  static data_t B[L][J];\n//  static data_t C[M][K];\n  static data_t C[K][M];\n  static data_t D[I][J][K];\n  static data_t D_golden[I][J][K];\n\n  // data initialization\n  for (int i = 0; i < I; i++)\n    for (int l = 0; l < L; l++) \n      for (int m = 0; m < M; m++) {\n        A[i][l][m] = 2.5;\n      }\n  for (int l = 0; l < L; l++)\n    for (int j = 0; j < J; j++) {\n      B[l][j] = 2.5;\n    }\n  for (int m = 0; m < M; m++)\n    for (int k = 0; k < K; k++) {\n//      C[m][k] = 2.5;\n      C[k][m] = 2.5;\n    }\n  \n  // computation\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) \n      for (int k = 0; k < K; k++) {\n        D[i][j][k] = 0;        \n        for (int l = 0; l < L; l++) \n          for (int m = 0; m < M; m++) {\n//            D[i][j][k] = D[i][j][k] + A[i][l][m] * B[l][j] * C[m][k];\n            D[i][j][k] = D[i][j][k] + A[i][l][m] * B[l][j] * C[k][m];\n          }\n      }    \n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) \n      for (int k = 0; k < K; k++) {\n        D_golden[i][j][k] = 0;        \n        for (int l = 0; l < L; l++) \n          for (int m = 0; m < M; m++) {\n//            D_golden[i][j][k] += A[i][l][m] * B[l][j] * C[m][k];\n            D_golden[i][j][k] += A[i][l][m] * B[l][j] * C[k][m];\n          }\n      }    \n\n  // comparison\n  int err = 0;\n  float thres = 0.001;\n  for (int i = 0; i < I; i++) \n    for (int j = 0; j < J; j++) \n      for (int k = 0; k < K; k++) {\n        if (fabs(D_golden[i][j][k] - D[i][j][k]) > thres) {\n          err++;\n        }\n      }\n\n  if (err) {\n    printf(\"Test failed with %d 
errors!\\n\", err);\n    return -1;\n  } else {\n    printf(\"Test passed!\\n\");\n    return 0;\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/ttmc/kernel.h",
    "content": "#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"math.h\"\n\ntypedef float data_t;\n#define I 128 \n#define J 128 \n#define K 128 \n#define L 128 \n#define M 128 \n"
  },
  {
    "path": "autosa_tests/large/ttmc/simd_info.json",
    "content": "{\n  \"kernel4\": {\n    \"reduction\": [\"y\", \"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/large/ttmc/step1-run-hls.tcl",
    "content": "open_project kernel0\nset_top kernel0\nadd_files \"src/kernel_kernel.cpp\"\n#add_files -tb PATH_TO_TESTBENCH_FILE\n\nopen_solution solution\n\n#u250\nset_part xcu250-figd2104-2L-e\n\n# u280\n#set_part xcu280-fsvh2892-2L-e\n\n# 300 MHz\ncreate_clock -period 3.333\n\nconfig_dataflow -strict_mode warning\nset_clock_uncertainty 27.000000%\nconfig_rtl -enable_maxiConservative=1\nconfig_interface -m_axi_addr64\n\n# to enable integration with Vitis\nconfig_sdx -target xocc\n\n#csim_design\ncsynth_design\nclose_project\nexit\n"
  },
  {
    "path": "autosa_tests/large/ttmc/step2-autobridge.py",
    "content": "#! /usr/bin/python3.6\n\n# add the path to where you placed the AutoBridge source code\nimport sys\nsys.path.append('../src')\n\nimport graph\nfrom formator import FormatHLS\nimport collections\nimport os\nimport subprocess\n\n\"\"\"\nAutoBridge divides the target device as follows and assigns each HLS function to one slot.\nFor more details, please refer to the paper.\n\n      u250                     u280\n   -----------\n 3 |    |    |\n   |----|----|              |----|----|\n 2 |    |    |            2 |    |    |\n   |----|----|              |----|----|\n 1 |    |    |            1 |    |    |\n   |----|----|              |----|----|\n 0 |    |    |            0 |    |    |\n   -----------              -----------\n     0    1                   0    1\n\"\"\"\n\n################### Modify Accordingly ###############################\n\n# (1) fill in the basic information\nproject_path = '/home/jaywang/doc_examples/ttmc_ab/kernel0' # path to your HLS project\ntop_name = 'kernel0' # name of the top function in your HLS design\nsolution_path = f'{project_path}/solution/'\nproject_name = 'kernel0'\nboard_name = 'u250' # or 'u280'\n# where the results will be saved. 
Your HLS project will be copied there and your top RTL will be replaced.\n# Note that if the directory already exists, it will be removed and recreated\n\n# (2) specify how your design connects to the external memory\n\"\"\" Example:\n\nvoid kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n{\n  #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n  #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n\n  load_p1 (p1, ...);\n  load_p2 (p2, ...);\n}\n\n--------------------------------------\n\nIn this example, the pointers p1 and p2 will become M_AXI controllers that connect to the dedicated DDR IPs.\nIf you want p1 to connect to DDR 2 in the 2nd SLR, you need to specify that the corresponding RTL controller must be floorplanned in the 2nd SLR.\nMeanwhile, your function load_p1() also talks to the M_AXI controller through an AXI interface, which cannot be easily pipelined.\nThus the RTL module corresponding to load_p1() must also be placed in the 2nd SLR in this example.\nSince load_p1() communicates with the rest of your design through FIFO interfaces, you don't need to specify the locations of the other modules.\n\n(transparent)|                        (user visible)\n             |\n   Vitis     |                    what your HLS design becomes\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p1)                       (load_p1)\n             |\n             | M_AXI                     AXI                        FIFO\nDDR IP  <--- | ----> M_AXI controller <-------> your first module <-------> your other modules\n(fixed loc)  |         (p2)                       (load_p2)\n             |\n             | S_AXI\nPCIe    <--- | ----> S_AXI controller\n             |\n\"\"\"\n\n# on the left side or the right side of an SLR\nDDR_loc_2d_x = collections.defaultdict(dict)\n\n# on which SLR\nDDR_loc_2d_y = 
collections.defaultdict(dict)\n\n# use DDR 0, 1, 2, 3\nDDR_loc_2d_y['A_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_x['A_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_A_m_axi_U'] = 0\nDDR_loc_2d_x['kernel0_gmem_A_m_axi_U'] = 0\n\nDDR_loc_2d_y['B_IO_L3_in_serialize_U0'] = 1\nDDR_loc_2d_x['B_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_B_m_axi_U'] = 1\nDDR_loc_2d_x['kernel0_gmem_B_m_axi_U'] = 0\n\nDDR_loc_2d_y['C_IO_L3_in_serialize_U0'] = 2\nDDR_loc_2d_x['C_IO_L3_in_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_C_m_axi_U'] = 2\nDDR_loc_2d_x['kernel0_gmem_C_m_axi_U'] = 0\n\nDDR_loc_2d_y['D_drain_IO_L3_out_serialize_U0'] = 3\nDDR_loc_2d_x['D_drain_IO_L3_out_serialize_U0'] = 0\nDDR_loc_2d_y['kernel0_gmem_D_m_axi_U'] = 3\nDDR_loc_2d_x['kernel0_gmem_D_m_axi_U'] = 0\n\nDDR_loc_2d_y['kernel0_control_s_axi_U'] = 0\n\n# (3) specify DDR information\n# Each instantiated DDR controller consumes a non-trivial amount of resources.\n# To improve the floorplanning, you need to specify which DDRs are enabled.\n# In the p1/p2 example above, p1 connects to DDR-2 in SLR-2 and p2 to DDR-1 in SLR-1.\n# To use all DDRs, as this design does, set it to [1, 1, 1, 1]\nDDR_enable = [1, 1, 1, 1]\n\n# (4) specify how much of the resources can be used in each slot\n# This lets you force the design to be placed evenly across the device and avoid local congestion\n\"\"\" Example:\n   -----------\n 3 |0.76|0.62|\n   |----|----|\n 2 |0.74|0.61|\n   |----|----|\n 1 |0.75|0.6 |\n   |----|----|\n 0 | 0.7|0.6 |\n   -----------\n     0    1\n\"\"\"\nmax_usage_ratio_2d = [ [0.85, 0.7], [0.85, 0.7], [0.85, 0.7], [0.85, 0.7] ]\n\n\n##################### DON'T TOUCH THE SECTION BELOW #################################\ntarget_dir = '/home/jaywang/doc_examples/ttmc_ab/autobridge'\n\nformator = FormatHLS(\n  rpt_path = f'{solution_path}/syn/report/',\n  hls_sche_path = f'{solution_path}/.autopilot/db/',\n  top_hdl_path = 
f'{solution_path}/syn/verilog/{top_name}_{top_name}.v',\n  top_name = top_name,\n  DDR_loc_2d_x = DDR_loc_2d_x,\n  DDR_loc_2d_y = DDR_loc_2d_y,\n  DDR_enable = DDR_enable,\n  max_usage_ratio_2d = max_usage_ratio_2d,\n  board_name = board_name,\n  target_dir = target_dir,\n  relay_station_count = lambda x : 2 * x, # how many levels of relay stations to add for x-unit of crossing\n  max_search_time = 600,\n  NaiveBalance = True)\n\n# run floorplanning\ng = graph.Graph(formator)\n\n# move results to target dir\nif (os.path.isdir(target_dir)):\n  subprocess.run(['rm', '-rf', f'{target_dir}'])\nsubprocess.run(['mkdir', f'{target_dir}/'])\nsubprocess.run(['cp', '-r', project_path, f'{target_dir}/{project_name}'])\nsubprocess.run(['cp', os.path.realpath(__file__), f'{target_dir}/archived_source.txt'])\nsubprocess.run(['chmod', '+w', '-R', f'{target_dir}'])\nsubprocess.run(['cp', 'constraint.tcl', target_dir])\nsubprocess.run(['cp', 'pack_xo.tcl', target_dir])\nsubprocess.run(['cp', 'autobridge.log', target_dir])\nsubprocess.run(['cp', f'{top_name}_{top_name}.v', f'{target_dir}/{project_name}/solution/syn/verilog/'])\n\n# clean up\nos.system('rm *.lp')\nsubprocess.run(['rm', 'parser.out'])\nsubprocess.run(['rm', 'parsetab.py'])\nsubprocess.run(['rm', '-rf', '__pycache__'])\n\n"
  },
  {
    "path": "autosa_tests/large/ttmc/step3-pack-xo.tcl",
    "content": "open_project kernel0\nopen_solution solution\nexport_design -rtl verilog -format ip_catalog -xo kernel0.xo\n\nclose_project\nputs \"Pack XO successfully\"\nexit\n"
  },
  {
    "path": "autosa_tests/large/ttmc/step4-run-vitis.sh",
    "content": "OUTPUT_DIR=\"$(pwd)/vitis_run\"\n\n# name of the top function\nTOP=kernel0\n\n# choose the target device\nPLATFORM=xilinx_u250_xdma_201830_2\n#PLATFORM=xilinx_u280_xdma_201920_3\n\nXO=\"$(pwd)/kernel0.xo\"\n\n# For the available placement directives, see UG904 (Vivado Design Suite User Guide: Implementation)\nSTRATEGY=\"Default\"\n#STRATEGY=\"EarlyBlockPlacement\"\n\n# Remove the '--connectivity.sp' option for v++ for each DDR that is not used.\n# Example: if we map p1 to DDR 3 and p2 to DDR 0\n#\n# void kernel0(ap_uint<512> *p1, ap_uint<512> *p2)\n# {\n#   #pragma HLS INTERFACE m_axi port=p1 offset=slave bundle=gmem_A\n#   #pragma HLS INTERFACE m_axi port=p2 offset=slave bundle=gmem_B\n#\n#   load_p1 (p1, ...);\n#   load_p2 (p2, ...);\n# }\n#\n# ARG_FOR_DDR_0=p2\n# ARG_FOR_DDR_3=p1\n# and remove the '--connectivity.sp' options for DDR 1 and DDR 2\n\n# ARG_FOR_DDR_<n> is the kernel argument mapped to DDR[<n>]\nARG_FOR_DDR_0=A\nARG_FOR_DDR_1=B\nARG_FOR_DDR_2=C\nARG_FOR_DDR_3=D\n\n# the constraint file containing the floorplan results\n# WARNING: must be an absolute path\nCONSTRAINT=\"$(pwd)/constraint.tcl\"\nif [ ! -f \"$CONSTRAINT\" ]; then\n    echo \"no constraint file found at $CONSTRAINT\" >&2\n    exit 1\nfi\n\nv++ \\\n  --link \\\n  --output \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.xclbin\" \\\n  --kernel ${TOP} \\\n  --platform ${PLATFORM} \\\n  --target hw \\\n  --report_level 2 \\\n  --temp_dir \"${OUTPUT_DIR}/${TOP}_${PLATFORM}.temp\" \\\n  --optimize 3 \\\n  --connectivity.nk ${TOP}:1:${TOP}_1 \\\n  --max_memory_ports ${TOP} \\\n  --save-temps \\\n  ${XO} \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_0}:DDR[0] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_1}:DDR[1] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_2}:DDR[2] \\\n  --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_3}:DDR[3] \\\n  --kernel_frequency 300 \\\n  --vivado.prop run.impl_1.STEPS.PLACE_DESIGN.ARGS.DIRECTIVE=$STRATEGY \\\n  --vivado.prop run.impl_1.STEPS.OPT_DESIGN.TCL.PRE=$CONSTRAINT\n"
  },
  {
    "path": "autosa_tests/lu/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/lu/README.md",
    "content": "# LU Decomposition (Small)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/lu/kernel.c\nautosa_tests/lu/kernel.h\nautosa_tests/lu/simd_info.json\nautosa_tests/lu/Makefile\nautosa_tests/lu/connectivity.cfg\n```\n\n__Command__:\n```bash\n./autosa ./autosa_tests/lu/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[-1,-1,-1];kernel[]->latency[]}\" --simd-info=./autosa_tests/lu/simd_info.json --use-cplusplus-template --no-reschedule --live-range-reordering\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/lu/Makefile autosa.tmp/output/\ncp autosa_tests/lu/connectivity.cfg autosa.tmp/output/\n```\n\nExecute the makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\n```"
  },
  {
    "path": "autosa_tests/lu/add_batch.py",
    "content": "import argparse\n\ndef run(input_f, output_f, batch):\n    new_lines = []\n    with open(input_f, 'r') as f:\n        lines = f.readlines()\n    inside_module = False\n    inside_inner_module = False\n    var_decl = 0\n    add_loop = False\n    for line in lines:\n        if line == '}\\n' and add_loop:\n            if inside_module and not inside_inner_module:\n                new_lines.append(f'  }}\\n')\n        new_lines.append(line)\n        if line.find('Module Definition') != -1:\n            inside_module = not inside_module\n            if not inside_module:\n                inside_inner_module = False\n                var_decl = 0\n                add_loop = False\n        if inside_module:\n            if line.find('intra_trans(') != -1 or \\\n               line.find('inter_trans(') != -1 or \\\n               line.find('inter_trans_boundary(') != -1:\n               inside_inner_module = True\n        if inside_module and not inside_inner_module:\n            if line.find('Variable Declaration') != -1:\n                var_decl += 1\n            if var_decl == 2:\n                # Insert the batch loop here\n                new_lines.append(f'  for (int bn = 0; bn < {batch}; bn++) {{\\n')\n                add_loop = True\n                var_decl = 0\n\n    with open(output_f, 'w') as f:\n        f.writelines(new_lines)\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser(description=\"Add batch loops into the code\")\n    parser.add_argument('-i', required=True, help='input kernel file')\n    parser.add_argument('-b', required=True, type=int, help='number of batches')\n    parser.add_argument('-o', required=True, help='output kernel file')\n\n    args = parser.parse_args()\n    run(args.i, args.o, args.b)"
  },
  {
    "path": "autosa_tests/lu/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/lu/kernel.c",
    "content": "#include \"kernel.h\"\n\nvoid init_array(data_t A[N][N])\n{\n  int i, j;\n\n  for (i = 0; i < N; i++)\n  {\n    for (j = 0; j <= i; j++)\n      A[i][j] = (data_t)(-j % N) / N + 1;\n    for (j = i + 1; j < N; j++) {\n      A[i][j] = 0;\n    }\n    A[i][i] = 1;\n  }\n\n  /* Make the matrix positive semi-definite. */\n  /* not necessary for LU, but using same code as cholesky */\n  int r, s, t;\n  data_t B[N][N];\n  for (r = 0; r < N; r++)\n    for (s = 0; s < N; s++) \n      B[r][s] = 0;\n  for (t = 0; t < N; t++)\n    for (r = 0; r < N; r++)\n      for (s = 0; s < N; s++)\n        B[r][s] += A[r][t] * A[s][t];\n  for (r = 0; r < N; r++)        \n    for (s = 0; s < N; s++)\n      A[r][s] = B[r][s];\n}\n\nvoid lu_cpu(data_t A[N][N], data_t L[N][N], data_t U[N][N]) {\n  data_t prev_V[N][N][N];\n  data_t V_tmp[N][N][N];\n  data_t U_tmp[N][N][N];\n  data_t L_tmp[N][N][N];\n\n  for (int k = 0; k < N; k++)\n    for (int j = k; j < N; j++)\n      for (int i = k; i < N; i++) {\n        if (k == 0)\n          prev_V[i][j][k] = A[i][j];\n        else\n          prev_V[i][j][k] = V_tmp[i][j][k - 1];\n        \n        if (j == k) {\n          U_tmp[i][j][k] = prev_V[i][j][k];\n          U[j][i] = U_tmp[i][j][k];\n        } else {\n          U_tmp[i][j][k] = U_tmp[i][j - 1][k];\n\n          if (i == k) {            \n            L_tmp[i][j][k] = prev_V[i][j][k] / U_tmp[i][j - 1][k]; // final\n            L[i][j] = L_tmp[i][j][k];\n          } else {\n            L_tmp[i][j][k] = L_tmp[i - 1][j][k];\n          }\n          V_tmp[i][j][k] = prev_V[i][j][k] - L_tmp[i][j][k] * U_tmp[i][j - 1][k];\n        }\n      }  \n}\n\nvoid lu_device(data_t A[N][N], data_t L[N][N], data_t U[N][N])\n{\n#pragma scop\n  {\n    data_t prev_V[N][N];  \n    data_t V[N][N];\n    data_t U_tmp[N][N];\n    data_t L_tmp[N][N];\n\n    for (int k = 0; k < N; k++) {    \n      for (int j = k; j < N; j++)\n        for (int i = k; i < N; i++) {\n          if (k == 0)\n            prev_V[i][j] 
= A[i][j];\n          else\n            prev_V[i][j] = V[i][j];          \n\n          if (j == k) {          \n            U_tmp[i][j] = prev_V[i][j]; \n            U[j][i] = U_tmp[i][j]; // final\n          } else {          \n            U_tmp[i][j] = U_tmp[i][j - 1];        \n\n            if (i == k) {\n              L_tmp[i][j] = prev_V[i][j] / U_tmp[i][j]; \n              L[i][j] = L_tmp[i][j]; // final\n            } else {            \n              L_tmp[i][j] = L_tmp[i - 1][j];\n            }          \n          \n            V[i][j] = prev_V[i][j] - L_tmp[i][j] * U_tmp[i][j];\n          }\n\n        }\n    }\n  }\n#pragma endscop\n}\n\nint main(int argc, char **argv) {\n  data_t A[N][N], L[N][N], U[N][N], L_golden[N][N], U_golden[N][N];\n\n  init_array(A);\n  for (int i = 0; i < N; i++)\n    for (int j = 0; j < N; j++) {\n      L[i][j] = 0;\n      U[i][j] = 0;\n      L_golden[i][j] = 0;\n      U_golden[i][j] = 0;\n    }\n    \n  lu_device(A, L, U);\n  lu_cpu(A, L_golden, U_golden);\n\n  int err = 0;\n  for (int i = 0; i < N; i++)\n    for (int j = 0; j <= i; j++) {\n      if (fabs((float)L_golden[i][j] - (float)L[i][j]) > 0.001)\n        err++;\n    }\n  for (int i = 0; i < N; i++)\n    for (int j = i; j < N; j++) {\n      if (fabs((float)U_golden[i][j] - (float)U[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  printf(\"A:\\n\");\n  for (int i = 0; i < N; i++) {\n    for (int j = 0; j < N; j++) \n      printf(\"%f \", A[i][j]);\n    printf(\"\\n\");\n  }\n\n  printf(\"L_golden:\\n\");\n  for (int i = 0; i < N; i++) {\n    for (int j = 0; j < N; j++) {      \n      printf(\"%f \", (i == j)? 
1.0 : L_golden[j][i]);      \n    }\n    printf(\"\\n\");\n  }\n\n  printf(\"U_golden:\\n\");\n  for (int i = 0; i < N; i++) {\n    for (int j = 0; j < N; j++) {\n      printf(\"%f \", U_golden[i][j]);\n    }\n    printf(\"\\n\");\n  }\n\n  printf(\"L:\\n\");\n  for (int i = 0; i < N; i++) {\n    for (int j = 0; j < N; j++) {      \n      printf(\"%f \", (i == j)? 1.0 : (j < i)? L[j][i] : 0.0);      \n    }\n    printf(\"\\n\");\n  }\n\n  printf(\"U:\\n\");\n  for (int i = 0; i < N; i++) {\n    for (int j = 0; j < N; j++) {\n      printf(\"%f \", (j < i)? 0.0 : U[i][j]);\n    }\n    printf(\"\\n\");\n  }\n\n  return 0;    \n}\n"
  },
  {
    "path": "autosa_tests/lu/kernel.h",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\ntypedef float data_t;\n//#define N 3\n#define N 32\n"
  },
  {
    "path": "autosa_tests/lu/simd_info.json",
    "content": "{\n  \"kernel3\": {\n    \"reduction\": [\"n\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/mm/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/mm/README.md",
    "content": "# Matrix Multiplication (Small)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/mm/kernel.c\nautosa_tests/mm/kernel.h\nautosa_tests/mm/simd_info.json\nautosa_tests/mm/Makefile\nautosa_tests/mm/connectivity.cfg\nautosa_tests/mm/hls_script.tcl\n```\n\n__Command__:\nTo run the HLS flow for C/RTL simulation\n```bash\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `hls_script.tcl` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/mm/hls_script.tcl autosa.tmp/output/\n```\n\nRun the TCL script to build the HLS project.\n\n```\ncd autosa.tmp/output\nvivado_hls -f hls_script.tcl\n```\n\nAlternatively, if you need to generate the bitstream for on-board testing, simply remove the `--hls` flag from the AutoSA command.\n```bash\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. 
Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/mm/Makefile autosa.tmp/output/\ncp autosa_tests/mm/connectivity.cfg autosa.tmp/output/\n```\n\nExecute the makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\nmake check\n```\n\n__Other Test Cases__:\nBelow we provide some other test cases for you to try out.\n1. 1D systolic array\n```bash\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[0];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls\n```\n\n2. 2D systolic array\n```bash\n./autosa ./autosa_tests/mm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[32,4,32];kernel[]->latency[16,16];kernel[]->simd[2]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --local-reduce --reduce-op=\"+\" --simd-touch-space\n```\n"
  },
  {
    "path": "autosa_tests/mm/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1] \nsp=kernel0_1.C:DDR[2]\n"
  },
  {
    "path": "autosa_tests/mm/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/mm/kernel.c",
    "content": "#include \"kernel.h\"\n\nint main(int argc, char **argv) {\n  data_t A[I][K], B[J][K], C[I][J], C_golden[I][J];\n\n  for (int i = 0; i < I; i++) \n    for (int k = 0; k < K; k++) {\n      A[i][k] = (data_t)rand() / RAND_MAX;\n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n      B[j][k] = (data_t)rand() / RAND_MAX;\n    }\n\n  // C is accumulated in place inside the SCoP, so zero-initialize it first\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n    }\n\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      //C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/mm/kernel.h",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\ntypedef float data_t;\n//#define I 256 \n//#define J 264 \n//#define K 256\n\n//#define I 128 \n//#define J 128 \n//#define K 128\n\n#define I 64\n#define J 64\n#define K 64\n"
  },
  {
    "path": "autosa_tests/mm/param_names.json",
    "content": "{\n  \"kernel0\": [\"i\", \"j\", \"k\"],\n  \"kernel1\": [\"i\", \"j\", \"k\"],\n  \"kernel2\": [\"i\", \"j\", \"k\"],\n  \"kernel3\": [\"i\", \"j\", \"k\"],\n  \"kernel4\": [\"i\", \"j\", \"k\"],\n  \"kernel5\": [\"i\", \"j\", \"k\"]\n}\n"
  },
  {
    "path": "autosa_tests/mm/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel2\": {\n    \"reduction\": [\"y\"]\n  }, \n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel4\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel5\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/mm_block_sparse/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/mm_block_sparse/README.md",
    "content": "# Matrix Multiplication with Block Sparsity (Small)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/mm_block_sparse/kernel.c\nautosa_tests/mm_block_sparse/kernel.h\nautosa_tests/mm_block_sparse/simd_info.json\nautosa_tests/mm_block_sparse/Makefile\nautosa_tests/mm_block_sparse/connectivity.cfg\nautosa_tests/mm_block_sparse/hls_script.tcl\n```\n\n__Command__:\nTo run the HLS flow for C/RTL simulation\n```bash\n./autosa ./autosa_tests/mm_block_sparse/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[8]}\" --simd-info=./autosa_tests/mm_block_sparse/simd_info.json --host-serialize --hls --block-sparse --block-sparse-ratio=\"{kernel[]->A[2,4]}\"\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `hls_script.tcl` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/mm_block_sparse/hls_script.tcl autosa.tmp/output/\n```\n\nRun the TCL script to build the HLS project.\n\n```\ncd autosa.tmp/output\nvivado_hls -f hls_script.tcl\n```\n\nAlternatively, if you need to generate the bitstream for on-board testing, simply remove the `--hls` flag from the AutoSA command.\n```bash\n./autosa ./autosa_tests/mm_block_sparse/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[8]}\" --simd-info=./autosa_tests/mm_block_sparse/simd_info.json --host-serialize --block-sparse --block-sparse-ratio=\"{kernel[]->block_sparse[2,4]}\"\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. 
Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/mm_block_sparse/Makefile autosa.tmp/output/\ncp autosa_tests/mm_block_sparse/connectivity.cfg autosa.tmp/output/\n```\n\nExecute the makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\nmake check\n```\n\n__Tuning__(Alpha):\n\n__Other Test Cases__:\nBelow we provide some other test cases for you to try out.\n1. \n```bash\n./autosa ./autosa_tests/mm_block_sparse/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[8]}\" --simd-info=./autosa_tests/mm_block_sparse/simd_info.json --host-serialize --block-sparse --block-sparse-ratio=\"{kernel[]->block_sparse[3,8]}\"\n```"
  },
  {
    "path": "autosa_tests/mm_block_sparse/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1] \nsp=kernel0_1.C:DDR[2]\n"
  },
  {
    "path": "autosa_tests/mm_block_sparse/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/mm_block_sparse/kernel.c",
    "content": "/* This example uses block sparsity to compute the matrix multiplication C = A * B.\n * Matrix A is block-sparse and matrix B is dense.\n * For matrix A, every VEC_LEN consecutive elements are grouped into a vector.\n * Inside each vector, there are NON_ZERO_NUM non-zero elements.\n * The sparsity of matrix A is computed as 1 - NON_ZERO_NUM / VEC_LEN.\n * We use the matrix A_s to store both the data and the index of the sparse matrix A.\n * \n * For each vector group, we use an unsigned char as a bitmask that records the relative\n * positions of the non-zero elements in the group.\n * At present, we assume that the vector group size is a power of two and no greater than 8.\n * Every NON_ZERO_NUM non-zero elements and their index are grouped together and\n * stored in A_s.\n * To keep the data structure aligned, we also pad each group if necessary.\n * For example, if the group size VEC_LEN is 8 and NON_ZERO_NUM is 4, we concatenate the\n * index right after the first 4 data elements, resulting in 5 elements.\n * We then pad this group to 8 elements.\n * In this case, the effective storage of matrix A is the same as that of the dense matrix.\n * If the group size VEC_LEN is 8 and NON_ZERO_NUM is 3, we concatenate the\n * index right after the first 3 data elements, resulting in 4 elements. No further padding is needed.\n * The effective storage compression ratio for matrix A is 8/4 = 2x in this example.\n * In summary, we denote the number of non-data elements in each group (the index plus any padding)\n * as META_DATA_NUM, which can be computed as:\n * META_DATA_NUM = 2^{ceil(log2(NON_ZERO_NUM + 1))} - NON_ZERO_NUM\n */\n#include \"kernel.h\"\n\nint main(int argc, char **argv) {\n  data_t A[I][K], B[J][K], C[I][J], C_golden[I][J];\n\n  data_t A_d[I][K / COMPRESS_RATIO];\n  unsigned char A_i[I][K / VEC_LEN];\n\n  data_t A_s[I][K / EFF_COMPRESS_RATIO];\n\n  /* Initialize the matrices */\n  for (int i = 0; i < I; i++)\n    for (int k = 0; k < K; k++) {\n      A[i][k] = (data_t)rand() / RAND_MAX;\n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n      B[j][k] = (data_t)rand() / RAND_MAX;\n    }\n\n  /* Generate the random sparse matrix */\n  for (int i = 0; i < I; i++)\n    for (int k = 0; k < K / VEC_LEN; k++) {\n      unsigned char offset = 0;\n      int n = 0;\n      while (n < NON_ZERO_NUM) {\n        int pos = rand() % VEC_LEN;\n        /* Check if this position has already been selected */\n        unsigned char cur_mask = offset & (1 << pos);\n        if (cur_mask) {\n          continue;\n        }\n        offset = offset | (1 << pos);\n        n++;\n      }\n      A_i[i][k] = offset;\n\n      int pos = 0;\n      int non_zero_pos = 0;\n      while (pos < VEC_LEN) {\n        unsigned char cur_mask = offset & (1 << pos);\n        if (cur_mask) {\n          A_d[i][k * NON_ZERO_NUM + non_zero_pos] = A[i][k * VEC_LEN + pos];\n          non_zero_pos++;\n        }\n        pos++;\n      }\n    }\n\n  /* Generate the matrix to store both the sparse data and index */\n  for (int i = 0; i < I; i++)\n    for (int k = 0; k < K / VEC_LEN; k++) {\n      int n;\n      for (n = 0; n < NON_ZERO_NUM; n++) {\n        A_s[i][k * (NON_ZERO_NUM + META_DATA_NUM) + n] = A_d[i][k * NON_ZERO_NUM + n];\n      }\n      unsigned char offset = A_i[i][k];\n      /* Store the index mask in the first byte of a data_t slot; clear the whole slot first */\n      union {data_t d; unsigned char c;} u;\n      u.d = 0;\n      u.c = offset;\n      A_s[i][k * (NON_ZERO_NUM + META_DATA_NUM) + n] = u.d;\n      /* Zero the padding elements, if any */\n      for (n = n + 1; n < NON_ZERO_NUM + META_DATA_NUM; n++) {\n        A_s[i][k * (NON_ZERO_NUM + META_DATA_NUM) + n] = 0;\n      }\n    }\n\n  /* For polyhedral analysis */\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n      }\n    }\n#pragma endscop\n\n  /* The actual computation */\n//  for (int i = 0; i < I; i++)  \n//    for (int j = 0; j < J; j++) {\n//      C[i][j] = 0;\n//      for (int k = 0; k < K / VEC_LEN; k++) {\n//        /* Extract the non zero offset */\n//        int offset[NON_ZERO_NUM];\n//        unsigned char mask = A_i[i][k];\n//        int pos = 0;\n//        int non_zero_pos = 0;\n//        while (pos < VEC_LEN) {\n//          unsigned char cur_mask = mask & (1 << pos);\n//          if (cur_mask) {\n//            offset[non_zero_pos] = pos;\n//            non_zero_pos++;\n//          }\n//          pos++;\n//        }\n//        for (int n = 0; n < NON_ZERO_NUM; n++) {\n//          C[i][j] += A_d[i][k * NON_ZERO_NUM + n] * B[j][k * VEC_LEN + offset[n]];\n//        }\n//      }\n//    }\n\n  /* Compute the golden reference */\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K / VEC_LEN; k++) {\n        /* Extract the non-zero offsets */\n        int offset[NON_ZERO_NUM];\n        unsigned char mask = A_i[i][k];\n        int pos = 0;\n        int non_zero_pos = 0;\n        while (pos < VEC_LEN) {\n          unsigned char cur_mask = mask & (1 << pos);\n          if (cur_mask) {\n            offset[non_zero_pos] = pos;\n            non_zero_pos++;\n          }\n          pos++;\n        }\n        for (int n = 0; n < NON_ZERO_NUM; n++) {\n          C_golden[i][j] += A_d[i][k * NON_ZERO_NUM + n] * B[j][k * VEC_LEN + offset[n]];\n        }\n      }\n    }\n\n  /* Compare the results */\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/mm_block_sparse/kernel.h",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\ntypedef float data_t;\n#define I 64\n#define J 64\n#define K 64\n\n//#define VEC_LEN 4\n//#define NON_ZERO_NUM 3\n//#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n//#define META_DATA_NUM 1\n//#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\n\n#define VEC_LEN 4\n#define NON_ZERO_NUM 2\n#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n#define META_DATA_NUM 2\n#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\n\n//#define VEC_LEN 4\n//#define NON_ZERO_NUM 1\n//#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n//#define META_DATA_NUM 1\n//#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\n\n//#define VEC_LEN 8\n//#define NON_ZERO_NUM 4\n//#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n//#define META_DATA_NUM 4\n//#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\n\n//#define VEC_LEN 8\n//#define NON_ZERO_NUM 3\n//#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n//#define META_DATA_NUM 1\n//#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\n\n//#define VEC_LEN 8\n//#define NON_ZERO_NUM 2\n//#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\n//#define META_DATA_NUM 2\n//#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\n"
  },
  {
    "path": "autosa_tests/mm_block_sparse/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel2\": {\n    \"reduction\": [\"y\"]\n  }, \n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel4\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/mm_catapult/README.md",
    "content": "# Matrix Multiplication (Small)\n\nBoard        | Software Version\n-------------|-----------------\nN/A | Mentor Graphics Catapult Ultra 10.5c\n\n__Files__:\n```\nautosa_tests/mm_catapult/kernel.c\nautosa_tests/mm_catapult/kernel.h\nautosa_tests/mm_catapult/simd_info.json\n```\n\n__Command__:\nThis example shows how to use Catapult HLS to generate FPGA designs.\n\nTo generate the input code for Catapult HLS, use the command below.\n```bash\n./autosa ./autosa_tests/mm_catapult/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_catapult_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" --simd-info=./autosa_tests/mm_catapult/simd_info.json --host-serialize\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`.\nCatapult HLS applies hardware optimizations through its GUI or a TCL script. AutoSA generates an example TCL flow named `kernel_directives.tcl` in the directory `autosa.tmp/output`.\n\nThe current Catapult HLS flow has several limitations:\n1. Floating-point types are not supported. Only `unsigned short` and `unsigned int` are currently supported.\n2. To achieve II=1, programmers need to provide additional dependence information in the TCL file.\n3. To pass C simulation, Catapult HLS requires guards on the input FIFOs. At present, programmers must add these guards manually.\n\nCatapult HLS then generates RTL that can be synthesized for the target FPGA."
  },
  {
    "path": "autosa_tests/mm_catapult/directives.tcl",
    "content": "solution new -state initial\nsolution options defaults\nsolution options set /Input/CppStandard c++11\nsolution options set /Output/GenerateCycleNetlist false\nsolution options set /Flows/SCVerify/USE_CCS_BLOCK true\nsolution file add ../../research/autosa/AutoSA/autosa.tmp/output/src/kernel_kernel_hw.h -type CHEADER\nsolution file add ../../research/autosa/AutoSA/autosa.tmp/output/src/kernel_kernel.h -type CHEADER\nsolution file add ../../research/autosa/AutoSA/autosa.tmp/output/src/kernel.h -type CHEADER\nsolution file add ../../research/autosa/AutoSA/autosa.tmp/output/src/kernel_host.cpp -type C++\ndirective set -PIPELINE_RAMP_UP true\ndirective set -PROTOTYPING_ENGINE oasys\ndirective set -CLUSTER_TYPE combinational\ndirective set -CLUSTER_FAST_MODE false\ndirective set -CLUSTER_RTL_SYN false\ndirective set -CLUSTER_OPT_CONSTANT_INPUTS true\ndirective set -CLUSTER_ADDTREE_IN_COUNT_THRESHOLD 0\ndirective set -CLUSTER_ADDTREE_IN_WIDTH_THRESHOLD 0\ndirective set -ROM_THRESHOLD 64\ndirective set -PROTOTYPE_ROM true\ndirective set -CHARACTERIZE_ROM false\ndirective set -OPT_CONST_MULTS use_library\ndirective set -CLOCK_OVERHEAD 20.000000\ndirective set -RESET_CLEARS_ALL_REGS use_library\ndirective set -START_FLAG {}\ndirective set -READY_FLAG {}\ndirective set -DONE_FLAG {}\ndirective set -TRANSACTION_DONE_SIGNAL true\ndirective set -STALL_FLAG false\ndirective set -IDLE_SIGNAL {}\ndirective set -REGISTER_IDLE_SIGNAL false\ndirective set -ARRAY_SIZE 1024\ndirective set -CHAN_IO_PROTOCOL use_library\ndirective set -IO_MODE super\ndirective set -UNROLL no\ndirective set -REALLOC true\ndirective set -MUXPATH true\ndirective set -TIMING_CHECKS true\ndirective set -ASSIGN_OVERHEAD 0\ndirective set -REGISTER_SHARING_LIMIT 0\ndirective set -REGISTER_SHARING_MAX_WIDTH_DIFFERENCE 8\ndirective set -SAFE_FSM false\ndirective set -NO_X_ASSIGNMENTS true\ndirective set -REG_MAX_FANOUT 0\ndirective set -FSM_BINARY_ENCODING_THRESHOLD 64\ndirective set -FSM_ENCODING 
none\ndirective set -LOGIC_OPT false\ndirective set -MEM_MAP_THRESHOLD 32\ndirective set -REGISTER_THRESHOLD 256\ndirective set -MERGEABLE true\ndirective set -SPECULATE true\ndirective set -DESIGN_GOAL area\ngo new\nsolution library add mgc_Xilinx-VIRTEX-uplus-2LV_beh -- -rtlsyntool Vivado -manufacturer Xilinx -family VIRTEX-uplus -speed -2LV -part xcvu11p-flga2577-2LV-e\nsolution library add Xilinx_RAMS\nsolution library add Xilinx_ROMS\nsolution library add amba\nsolution library add ccs_fpga_hic\nsolution library add Xilinx_FIFO\ngo libraries\ndirective set -CLOCKS {clk {-CLOCK_PERIOD 5.0 -CLOCK_EDGE rising -CLOCK_UNCERTAINTY 0.0 -CLOCK_HIGH_TIME 2.5 -RESET_SYNC_NAME rst -RESET_ASYNC_NAME arst_n -RESET_KIND sync -RESET_SYNC_ACTIVE high -RESET_ASYNC_ACTIVE low -ENABLE_ACTIVE high}}\ngo assembly\ndirective set -FIFO_DEPTH 1\ndirective set /kernel0/A_IO_L2_in_intra_trans/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/A_IO_L2_in_inter_trans/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/A_IO_L2_in_inter_trans_boundary/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/A_IO_L2_in/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/A_IO_L2_in_boundary/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/B_IO_L2_in_intra_trans/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/B_IO_L2_in_inter_trans/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/B_IO_L2_in_inter_trans_boundary/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/B_IO_L2_in/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/B_IO_L2_in_boundary/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/PE/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/PE/idy:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/C_drain_IO_L1_out_intra_trans/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/C_drain_IO_L1_out_intra_trans/idy:rsc -MAP_TO_MODULE 
{[DirectInput]}\ndirective set /kernel0/C_drain_IO_L1_out_inter_trans/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/C_drain_IO_L1_out_inter_trans/idy:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/C_drain_IO_L1_out_inter_trans_boundary/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/C_drain_IO_L1_out_inter_trans_boundary/idy:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/C_drain_IO_L1_out/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/C_drain_IO_L1_out/idy:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/C_drain_IO_L1_out_boundary/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/C_drain_IO_L1_out_boundary/idy:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/C_drain_IO_L2_out/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/C_drain_IO_L2_out_boundary/idx:rsc -MAP_TO_MODULE {[DirectInput]}\ndirective set /kernel0/A_IO_L2_in/A_IO_L2_in_local_A_inst:cns -MAP_TO_MODULE Xilinx_RAMS.BLOCK_1R1W_RBW_DUAL\ndirective set /kernel0/A_IO_L2_in/A_IO_L2_in_local_A_inst:cns -STAGE_REPLICATION 2\ndirective set /kernel0/A_IO_L2_in/A_IO_L2_in_local_A_inst -WORD_WIDTH 256\ndirective set /kernel0/A_IO_L2_in_boundary/A_IO_L2_in_local_A_inst:cns -MAP_TO_MODULE Xilinx_RAMS.BLOCK_1R1W_RBW_DUAL\ndirective set /kernel0/A_IO_L2_in_boundary/A_IO_L2_in_local_A_inst:cns -STAGE_REPLICATION 2\ndirective set /kernel0/A_IO_L2_in_boundary/A_IO_L2_in_local_A_inst -WORD_WIDTH 256\ndirective set /kernel0/B_IO_L2_in/B_IO_L2_in_local_B_inst:cns -MAP_TO_MODULE Xilinx_RAMS.BLOCK_1R1W_RBW_DUAL\ndirective set /kernel0/B_IO_L2_in/B_IO_L2_in_local_B_inst:cns -STAGE_REPLICATION 2\ndirective set /kernel0/B_IO_L2_in/B_IO_L2_in_local_B_inst -WORD_WIDTH 256\ndirective set /kernel0/B_IO_L2_in_boundary/B_IO_L2_in_local_B_inst:cns -MAP_TO_MODULE Xilinx_RAMS.BLOCK_1R1W_RBW_DUAL\ndirective set /kernel0/B_IO_L2_in_boundary/B_IO_L2_in_local_B_inst:cns -STAGE_REPLICATION 2\ndirective set 
/kernel0/B_IO_L2_in_boundary/B_IO_L2_in_local_B_inst -WORD_WIDTH 256\ndirective set /kernel0/C_drain_IO_L1_out/C_drain_IO_L1_out_local_C_inst:cns -MAP_TO_MODULE Xilinx_RAMS.BLOCK_1R1W_RBW_DUAL\ndirective set /kernel0/C_drain_IO_L1_out/C_drain_IO_L1_out_local_C_inst:cns -STAGE_REPLICATION 1\ndirective set /kernel0/C_drain_IO_L1_out/C_drain_IO_L1_out_local_C_inst -WORD_WIDTH 64\ndirective set /kernel0/C_drain_IO_L1_out_boundary/C_drain_IO_L1_out_local_C_inst:cns -MAP_TO_MODULE Xilinx_RAMS.BLOCK_1R1W_RBW_DUAL\ndirective set /kernel0/C_drain_IO_L1_out_boundary/C_drain_IO_L1_out_local_C_inst:cns -STAGE_REPLICATION 1\ndirective set /kernel0/C_drain_IO_L1_out_boundary/C_drain_IO_L1_out_local_C_inst -WORD_WIDTH 64\ngo architect\n# Insert dependence directives here if necessary\n# Example: directive set /kernel0/PE/run/for:read_mem(local_C:rsc.@) -IGNORE_DEPENDENCY_FROM {for:write_mem(local_C:rsc.@) for:write_mem(local_C:rsc.@)}\ndirective set /kernel0/PE/run/for#1:for:for:for:for#2:read_mem(local_C:rsc.@) -IGNORE_DEPENDENCY_FROM {for#1:for:for:for:for#2:write_mem(local_C:rsc.@)}\ngo allocate\ngo extract\n"
  },
  {
    "path": "autosa_tests/mm_catapult/kernel.c",
    "content": "#include \"kernel.h\"\n\nint main(int argc, char **argv) {\n  data_t A[I_P][K_P], B[J_P][K_P], C[I_P][J_P], C_golden[I_P][J_P]; // gemm0,3\n\n  for (int i = 0; i < I_P; i++) \n    for (int k = 0; k < K_P; k++) {\n      //A[i][k] = (data_t)rand() / RAND_MAX;\n      A[i][k] = (data_t)1;\n    }\n\n  for (int j = 0; j < J_P; j++)\n    for (int k = 0; k < K_P; k++) {\n      //B[j][k] = (data_t)rand() / RAND_MAX;\n      B[j][k] = (data_t)1;\n    }\n\n#pragma scop\n  for (int i = 0; i < I_P; i++)\n    for (int j = 0; j < J_P; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K_P; k++) {\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I_P; i++)\n    for (int j = 0; j < J_P; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K_P; k++) {\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I_P; i++)\n    for (int j = 0; j < J_P; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/mm_catapult/kernel.h",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\n//typedef float data_t;\ntypedef unsigned int data_t;\n#define I_P 64\n#define J_P 64\n#define K_P 64\n"
  },
  {
    "path": "autosa_tests/mm_catapult/kernel_kernel_hw.h",
    "content": "#include \"kernel_kernel.h\"\n\nstruct A_IO_L2_in_local_A {\n  A_t8 data[8][2];\n};\n\nstruct B_IO_L2_in_local_B {\n  B_t8 data[8][2];\n};\n\nstruct C_drain_IO_L1_out_local_C {\n  C_t2 data[8][4];\n};\n\n#include <mc_scverify.h>\n\n/* Module Definition */\nclass A_IO_L3_in {\n  public:\n    A_IO_L3_in() {}\n    #pragma hls_design interface\n    #pragma hls_pipeline_init_interval 1\n    void CCS_BLOCK(run)(ac_channel<A_t8> &fifo_A_serialize, ac_channel<A_t8> &fifo_A_local_out) {\n      /* Variable Declaration */\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n          for (ac_int<3, false> c2 = 0; c2 <= 3; c2 += 1)\n            for (ac_int<2, false> c3 = 0; c3 <= 1; c3 += 1)\n              for (ac_int<4, false> c4 = 0; c4 <= 7; c4 += 1)\n                for (ac_int<2, false> c5 = 0; c5 <= 1; c5 += 1)\n#endif\n                {\n                  // hls_pipeline\n                {\n                  A_t8 fifo_data;\n                  fifo_data = fifo_A_serialize.read();\n                  fifo_A_local_out.write(fifo_data);\n                }\n                }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass A_IO_L3_in_serialize {\n  public:\n    A_IO_L3_in_serialize() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(A_t16 A[1024], ac_channel<A_t8> &fifo_A_local_out) {\n      /* Variable Declaration */\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n#endif\n      A_t8 fifo_data;\n      A_t16 mem_data;\n      #pragma hls_pipeline_init_interval 1\n      for (ac_int<11, false> i = 0; i < 1024; i++) {\n        mem_data = A[i];\n        for (ac_int<2, false> p = 0; p < 2; p++) {\n          fifo_data = mem_data.slc<256>(0);\n          mem_data = mem_data >> 256;\n          
fifo_A_local_out.write(fifo_data);\n        }\n      }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass A_IO_L2_in_intra_trans {\n  public:\n    A_IO_L2_in_intra_trans() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, ac_channel<A_IO_L2_in_local_A> &local_A, ac_channel<A_t2> &fifo_A_local_out) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n          for (ac_int<3, false> c2 = 0; c2 <= 3; c2 += 1)\n#endif\n          {\n            A_IO_L2_in_local_A local_A_tmp;\n            local_A_tmp = local_A.read();\n            // synth\n            #pragma hls_pipeline_init_interval 1\n            for (ac_int<4, false> c5 = 0; c5 <= 7; c5 += 1)\n              for (ac_int<4, false> c6 = 0; c6 <= 7; c6 += 1)\n                for (ac_int<4, false> c7 = 0; c7 <= 7; c7 += 1) {\n                  // hls_pipeline\n                  A_t2 fifo_data;\n                  A_t8 buf_data;\n                  A_t2 buf_data_split[4];\n                  buf_data = local_A_tmp.data[c7][2 * c5 / 8];\n                  buf_data_split[0] = buf_data.slc<64>(0);\n                  buf_data_split[1] = buf_data.slc<64>(64);\n                  buf_data_split[2] = buf_data.slc<64>(128);\n                  buf_data_split[3] = buf_data.slc<64>(192);\n                  int split_i = (c5) % 4;\n                  fifo_data = buf_data_split[split_i];\n                  fifo_A_local_out.write(fifo_data);\n                }\n          }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass A_IO_L2_in_inter_trans {\n  public:\n    A_IO_L2_in_inter_trans() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, ac_channel<A_IO_L2_in_local_A> &local_A, ac_channel<A_t8> &fifo_A_in, 
ac_channel<A_t8> &fifo_A_out) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n          for (ac_int<3, false> c2 = 0; c2 <= 3; c2 += 1)\n#endif\n          {\n            A_IO_L2_in_local_A local_A_tmp;\n            // synth\n            #pragma hls_pipeline_init_interval 1\n            for (ac_int<2, false> c3 = p0; c3 <= 1; c3 += 1) {\n              if (c3 == p0) {\n                for (ac_int<4, false> c4 = 0; c4 <= 7; c4 += 1)\n                  for (ac_int<2, false> c5 = 0; c5 <= 1; c5 += 1) {\n                    // hls_pipeline\n                    A_t8 fifo_data;\n                    fifo_data = fifo_A_in.read();\n                    local_A_tmp.data[c4][c5] = fifo_data;\n                  }\n              } else {\n                for (ac_int<4, false> c4 = 0; c4 <= 7; c4 += 1)\n                  for (ac_int<2, false> c5 = 0; c5 <= 1; c5 += 1) {\n                    // hls_pipeline\n                    A_t8 fifo_data;\n                    fifo_data = fifo_A_in.read();\n                    fifo_A_out.write(fifo_data);\n                  }\n              }\n            }\n            local_A.write(local_A_tmp);\n          }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass A_IO_L2_in_inter_trans_boundary {\n  public:\n    A_IO_L2_in_inter_trans_boundary() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, ac_channel<A_IO_L2_in_local_A> &local_A, ac_channel<A_t8> &fifo_A_in) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n   
       for (ac_int<3, false> c2 = 0; c2 <= 3; c2 += 1)\n#endif\n          {\n            A_IO_L2_in_local_A local_A_tmp;\n            // synth\n            #pragma hls_pipeline_init_interval 1\n            for (ac_int<2, false> c3 = p0; c3 <= 1; c3 += 1)\n              if (c3 == p0)\n                for (ac_int<4, false> c4 = 0; c4 <= 7; c4 += 1)\n                  for (ac_int<2, false> c5 = 0; c5 <= 1; c5 += 1) {\n                    // hls_pipeline\n                    A_t8 fifo_data;\n                    fifo_data = fifo_A_in.read();\n                    local_A_tmp.data[c4][c5] = fifo_data;\n                  }\n            local_A.write(local_A_tmp);\n          }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass A_IO_L2_in {\n  public:\n    A_IO_L2_in() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, ac_channel<A_t8> &fifo_A_in, ac_channel<A_t8> &fifo_A_out, ac_channel<A_t2> &fifo_A_local_out) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n      A_IO_L2_in_inter_trans_inst.run(\n        /* module id */ idx, \n        /* array */ A_IO_L2_in_local_A_inst, \n        /* fifo */ fifo_A_in, \n        /* fifo */ fifo_A_out\n      );\n      A_IO_L2_in_intra_trans_inst.run(\n        /* module id */ idx, \n        /* array */ A_IO_L2_in_local_A_inst, \n        /* fifo */ fifo_A_local_out\n      );\n    }\n\n  private:\n    A_IO_L2_in_inter_trans A_IO_L2_in_inter_trans_inst;\n    A_IO_L2_in_intra_trans A_IO_L2_in_intra_trans_inst;\n    ac_channel<A_IO_L2_in_local_A> A_IO_L2_in_local_A_inst;\n};\n/* Module Definition */\n\n/* Module Definition */\nclass A_IO_L2_in_boundary {\n  public:\n    A_IO_L2_in_boundary() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, ac_channel<A_t8> &fifo_A_in, ac_channel<A_t2> &fifo_A_local_out) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n      
A_IO_L2_in_inter_trans_boundary_inst.run(\n        /* module id */ idx, \n        /* array */ A_IO_L2_in_local_A_inst, \n        /* fifo */ fifo_A_in\n      );\n      A_IO_L2_in_intra_trans_inst.run(\n        /* module id */ idx, \n        /* array */ A_IO_L2_in_local_A_inst, \n        /* fifo */ fifo_A_local_out\n      );\n    }\n\n  private:\n    A_IO_L2_in_inter_trans_boundary A_IO_L2_in_inter_trans_boundary_inst;\n    A_IO_L2_in_intra_trans A_IO_L2_in_intra_trans_inst;\n    ac_channel<A_IO_L2_in_local_A> A_IO_L2_in_local_A_inst;\n};\n/* Module Definition */\n\n/* Module Definition */\nclass B_IO_L3_in {\n  public:\n    B_IO_L3_in() {}\n    #pragma hls_design interface\n    #pragma hls_pipeline_init_interval 1\n    void CCS_BLOCK(run)(ac_channel<B_t8> &fifo_B_serialize, ac_channel<B_t8> &fifo_B_local_out) {\n      /* Variable Declaration */\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n          for (ac_int<3, false> c2 = 0; c2 <= 3; c2 += 1)\n            for (ac_int<2, false> c3 = 0; c3 <= 1; c3 += 1)\n              for (ac_int<4, false> c4 = 0; c4 <= 7; c4 += 1)\n                for (ac_int<2, false> c5 = 0; c5 <= 1; c5 += 1)\n#endif\n                {\n                  // hls_pipeline\n                {\n                  B_t8 fifo_data;\n                  fifo_data = fifo_B_serialize.read();\n                  fifo_B_local_out.write(fifo_data);\n                }\n                }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass B_IO_L3_in_serialize {\n  public:\n    B_IO_L3_in_serialize() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(B_t16 B[1024], ac_channel<B_t8> &fifo_B_local_out) {\n      /* Variable Declaration */\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C 
sim.\n#endif\n      B_t8 fifo_data;\n      B_t16 mem_data;\n      #pragma hls_pipeline_init_interval 1\n      for (ac_int<11, false> i = 0; i < 1024; i++) {\n        mem_data = B[i];\n        for (ac_int<2, false> p = 0; p < 2; p++) {\n          fifo_data = mem_data.slc<256>(0);\n          mem_data = mem_data >> 256;\n          fifo_B_local_out.write(fifo_data);\n        }\n      }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass B_IO_L2_in_intra_trans {\n  public:\n    B_IO_L2_in_intra_trans() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, ac_channel<B_IO_L2_in_local_B> &local_B, ac_channel<B_t2> &fifo_B_local_out) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n          for (ac_int<3, false> c2 = 0; c2 <= 3; c2 += 1)\n#endif\n          {\n            B_IO_L2_in_local_B local_B_tmp;\n            local_B_tmp = local_B.read();\n            // synth\n            #pragma hls_pipeline_init_interval 1\n            for (ac_int<4, false> c5 = 0; c5 <= 7; c5 += 1)\n              for (ac_int<4, false> c6 = 0; c6 <= 7; c6 += 1)\n                for (ac_int<4, false> c7 = 0; c7 <= 7; c7 += 1) {\n                  // hls_pipeline\n                  B_t2 fifo_data;\n                  B_t8 buf_data;\n                  B_t2 buf_data_split[4];\n                  buf_data = local_B_tmp.data[c6][2 * c5 / 8];\n                  buf_data_split[0] = buf_data.slc<64>(0);\n                  buf_data_split[1] = buf_data.slc<64>(64);\n                  buf_data_split[2] = buf_data.slc<64>(128);\n                  buf_data_split[3] = buf_data.slc<64>(192);\n                  int split_i = (c5) % 4;\n                  fifo_data = buf_data_split[split_i];\n                  
fifo_B_local_out.write(fifo_data);\n                }\n          }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass B_IO_L2_in_inter_trans {\n  public:\n    B_IO_L2_in_inter_trans() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, ac_channel<B_IO_L2_in_local_B> &local_B, ac_channel<B_t8> &fifo_B_in, ac_channel<B_t8> &fifo_B_out) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n          for (ac_int<3, false> c2 = 0; c2 <= 3; c2 += 1)\n#endif\n          {\n            B_IO_L2_in_local_B local_B_tmp;\n            // synth\n            #pragma hls_pipeline_init_interval 1\n            for (ac_int<2, false> c3 = p0; c3 <= 1; c3 += 1) {\n              if (c3 == p0) {\n                for (ac_int<4, false> c4 = 0; c4 <= 7; c4 += 1)\n                  for (ac_int<2, false> c5 = 0; c5 <= 1; c5 += 1) {\n                    // hls_pipeline\n                    B_t8 fifo_data;\n                    fifo_data = fifo_B_in.read();\n                    local_B_tmp.data[c4][c5] = fifo_data;\n                  }\n              } else {\n                for (ac_int<4, false> c4 = 0; c4 <= 7; c4 += 1)\n                  for (ac_int<2, false> c5 = 0; c5 <= 1; c5 += 1) {\n                    // hls_pipeline\n                    B_t8 fifo_data;\n                    fifo_data = fifo_B_in.read();\n                    fifo_B_out.write(fifo_data);\n                  }\n              }\n            }\n            local_B.write(local_B_tmp);\n          }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass B_IO_L2_in_inter_trans_boundary {\n  public:\n    B_IO_L2_in_inter_trans_boundary() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, 
ac_channel<B_IO_L2_in_local_B> &local_B, ac_channel<B_t8> &fifo_B_in) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n          for (ac_int<3, false> c2 = 0; c2 <= 3; c2 += 1)\n#endif\n          {\n            B_IO_L2_in_local_B local_B_tmp;\n            // synth\n            #pragma hls_pipeline_init_interval 1\n            for (ac_int<2, false> c3 = p0; c3 <= 1; c3 += 1)\n              if (c3 == p0)\n                for (ac_int<4, false> c4 = 0; c4 <= 7; c4 += 1)\n                  for (ac_int<2, false> c5 = 0; c5 <= 1; c5 += 1) {\n                    // hls_pipeline\n                    B_t8 fifo_data;\n                    fifo_data = fifo_B_in.read();\n                    local_B_tmp.data[c4][c5] = fifo_data;\n                  }\n            local_B.write(local_B_tmp);\n          }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass B_IO_L2_in {\n  public:\n    B_IO_L2_in() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, ac_channel<B_t8> &fifo_B_in, ac_channel<B_t8> &fifo_B_out, ac_channel<B_t2> &fifo_B_local_out) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n      B_IO_L2_in_inter_trans_inst.run(\n        /* module id */ idx, \n        /* array */ B_IO_L2_in_local_B_inst, \n        /* fifo */ fifo_B_in, \n        /* fifo */ fifo_B_out\n      );\n      B_IO_L2_in_intra_trans_inst.run(\n        /* module id */ idx, \n        /* array */ B_IO_L2_in_local_B_inst, \n        /* fifo */ fifo_B_local_out\n      );\n    }\n\n  private:\n    B_IO_L2_in_inter_trans B_IO_L2_in_inter_trans_inst;\n    B_IO_L2_in_intra_trans B_IO_L2_in_intra_trans_inst;\n    ac_channel<B_IO_L2_in_local_B> B_IO_L2_in_local_B_inst;\n};\n/* 
Module Definition */\n\n/* Module Definition */\nclass B_IO_L2_in_boundary {\n  public:\n    B_IO_L2_in_boundary() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, ac_channel<B_t8> &fifo_B_in, ac_channel<B_t2> &fifo_B_local_out) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n      B_IO_L2_in_inter_trans_boundary_inst.run(\n        /* module id */ idx, \n        /* array */ B_IO_L2_in_local_B_inst, \n        /* fifo */ fifo_B_in\n      );\n      B_IO_L2_in_intra_trans_inst.run(\n        /* module id */ idx, \n        /* array */ B_IO_L2_in_local_B_inst, \n        /* fifo */ fifo_B_local_out\n      );\n    }\n\n  private:\n    B_IO_L2_in_inter_trans_boundary B_IO_L2_in_inter_trans_boundary_inst;\n    B_IO_L2_in_intra_trans B_IO_L2_in_intra_trans_inst;\n    ac_channel<B_IO_L2_in_local_B> B_IO_L2_in_local_B_inst;\n};\n/* Module Definition */\n\n/* Module Definition */\nclass PE {\n  public:\n    PE() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, int idy, ac_channel<A_t2> &fifo_A_in, ac_channel<A_t2> &fifo_A_out, ac_channel<B_t2> &fifo_B_in, ac_channel<B_t2> &fifo_B_out, ac_channel<C_t1> &fifo_C_drain_out) {\n      /* Variable Declaration */\n      int p0 = idx, p1 = idy; // module id\n      A_t1 local_A[1][2];\n      B_t1 local_B[1][2];\n      C_t1 local_C[8][8];\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n#endif\n        {\n          {\n            #pragma hls_pipeline_init_interval 1\n            for (ac_int<4, false> c6 = 0; c6 <= 7; c6 += 1)\n              for (ac_int<4, false> c7 = 0; c7 <= 7; c7 += 1) {\n                // hls_unroll\n                local_C[c7][c6] = 0;\n              }\n            #pragma hls_pipeline_init_interval 1\n            for (ac_int<3, false> c2 
= 0; c2 <= 3; c2 += 1)\n              for (ac_int<4, false> c5 = 0; c5 <= 7; c5 += 1)\n                for (ac_int<4, false> c6 = 0; c6 <= 7; c6 += 1)\n                  for (ac_int<4, false> c7 = 0; c7 <= 7; c7 += 1) {\n                    {\n                      A_t2 fifo_data;\n                      fifo_data = fifo_A_in.read();\n                      #pragma unroll yes\n                      for (ac_int<2, false> n = 0; n < 2; n++) {\n                        local_A[0][n] = (A_t1)fifo_data.slc<32>(0);\n                        fifo_data = fifo_data >> 32;\n                      }\n                    }\n                    {\n                      B_t2 fifo_data;\n                      fifo_data = fifo_B_in.read();\n                      #pragma unroll yes\n                      for (ac_int<2, false> n = 0; n < 2; n++) {\n                        local_B[0][n] = (B_t1)fifo_data.slc<32>(0);\n                        fifo_data = fifo_data >> 32;\n                      }\n                    }\n                    #pragma unroll yes\n                    for (ac_int<2, false> c8 = 0; c8 <= 1; c8 += 1)\n                      local_C[c7][c6] = (local_C[c7][c6] + (local_A[0][c8] * local_B[0][c8]));\n                    if (c2 == 3 && c5 == 7)\n                      fifo_C_drain_out.write(local_C[c7][c6]);\n                    {\n                      B_t2 fifo_data;\n                      fifo_data.set_slc(32, local_B[0][1]);\n                      fifo_data.set_slc(0, local_B[0][0]);\n                      fifo_B_out.write(fifo_data);\n                    }\n                    {\n                      A_t2 fifo_data;\n                      fifo_data.set_slc(32, local_A[0][1]);\n                      fifo_data.set_slc(0, local_A[0][0]);\n                      fifo_A_out.write(fifo_data);\n                    }\n                  }\n          }\n        }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass C_drain_IO_L1_out_intra_trans {\n  public:\n   
 C_drain_IO_L1_out_intra_trans() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, int idy, ac_channel<C_drain_IO_L1_out_local_C> &local_C, ac_channel<C_t1> &fifo_C_drain_local_in) {\n      /* Variable Declaration */\n      int p0 = idx, p1 = idy; // module id\n      /* Variable Declaration */\n\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n#endif\n        {\n          C_drain_IO_L1_out_local_C local_C_tmp;\n          // synth\n          #pragma hls_pipeline_init_interval 1\n          for (ac_int<4, false> c6 = 0; c6 <= 7; c6 += 1)\n            for (ac_int<4, false> c7 = 0; c7 <= 7; c7 += 1) {\n              // hls_pipeline\n              C_t1 fifo_data;\n              C_t2 buf_data;\n              C_t1 buf_data_split[2];\n              buf_data = local_C_tmp.data[c7][c6 / 2];\n              buf_data_split[0] = buf_data.slc<32>(0);\n              buf_data_split[1] = buf_data.slc<32>(32);\n              int split_i = (c6) % 2;\n              fifo_data = fifo_C_drain_local_in.read();\n              buf_data_split[split_i] = fifo_data;\n                            buf_data.set_slc(0, buf_data_split[0]);\n              buf_data.set_slc(32, buf_data_split[1]);\n\n              local_C_tmp.data[c7][c6 / 2] = buf_data;\n            }\n          local_C.write(local_C_tmp);\n        }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass C_drain_IO_L1_out_inter_trans {\n  public:\n    C_drain_IO_L1_out_inter_trans() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, int idy, ac_channel<C_drain_IO_L1_out_local_C> &local_C, ac_channel<C_t2> &fifo_C_drain_in, ac_channel<C_t2> &fifo_C_drain_out) {\n      /* Variable Declaration */\n      int p0 = idx, p1 = idy; // module id\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo 
check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n#endif\n        {\n          C_drain_IO_L1_out_local_C local_C_tmp;\n          local_C_tmp = local_C.read();\n          // synth\n          #pragma hls_pipeline_init_interval 1\n          for (ac_int<2, false> c4 = p1; c4 <= 1; c4 += 1) {\n            if (c4 == p1) {\n              for (ac_int<4, false> c5 = 0; c5 <= 7; c5 += 1)\n                for (ac_int<3, false> c6 = 0; c6 <= 3; c6 += 1) {\n                  // hls_pipeline\n                  C_t2 fifo_data;\n                  fifo_data = local_C_tmp.data[c5][c6];\n                  fifo_C_drain_out.write(fifo_data);\n                }\n            } else {\n              for (ac_int<4, false> c5 = 0; c5 <= 7; c5 += 1)\n                for (ac_int<3, false> c6 = 0; c6 <= 3; c6 += 1) {\n                  // hls_pipeline\n                  C_t2 fifo_data;\n                  fifo_data = fifo_C_drain_in.read();\n                  fifo_C_drain_out.write(fifo_data);\n                }\n            }\n          }\n        }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass C_drain_IO_L1_out_inter_trans_boundary {\n  public:\n    C_drain_IO_L1_out_inter_trans_boundary() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, int idy, ac_channel<C_drain_IO_L1_out_local_C> &local_C, ac_channel<C_t2> &fifo_C_drain_out) {\n      /* Variable Declaration */\n      int p0 = idx, p1 = idy; // module id\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n#endif\n        {\n          C_drain_IO_L1_out_local_C local_C_tmp;\n          local_C_tmp = local_C.read();\n          // synth\n          #pragma hls_pipeline_init_interval 1\n          for (ac_int<2, false> c4 = p1; c4 <= 1; c4 += 1)\n  
          if (c4 == p1)\n              for (ac_int<4, false> c5 = 0; c5 <= 7; c5 += 1)\n                for (ac_int<3, false> c6 = 0; c6 <= 3; c6 += 1) {\n                  // hls_pipeline\n                  C_t2 fifo_data;\n                  fifo_data = local_C_tmp.data[c5][c6];\n                  fifo_C_drain_out.write(fifo_data);\n                }\n        }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass C_drain_IO_L1_out {\n  public:\n    C_drain_IO_L1_out() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, int idy, ac_channel<C_t2> &fifo_C_drain_in, ac_channel<C_t2> &fifo_C_drain_out, ac_channel<C_t1> &fifo_C_drain_local_in) {\n      /* Variable Declaration */\n      int p0 = idx, p1 = idy; // module id\n      /* Variable Declaration */\n\n      C_drain_IO_L1_out_intra_trans_inst.run(\n        /* module id */ idx, \n        /* module id */ idy, \n        /* array */ C_drain_IO_L1_out_local_C_inst, \n        /* fifo */ fifo_C_drain_local_in\n      );\n      C_drain_IO_L1_out_inter_trans_inst.run(\n        /* module id */ idx, \n        /* module id */ idy, \n        /* array */ C_drain_IO_L1_out_local_C_inst, \n        /* fifo */ fifo_C_drain_in, \n        /* fifo */ fifo_C_drain_out\n      );\n    }\n\n  private:\n    C_drain_IO_L1_out_inter_trans C_drain_IO_L1_out_inter_trans_inst;\n    C_drain_IO_L1_out_intra_trans C_drain_IO_L1_out_intra_trans_inst;\n    ac_channel<C_drain_IO_L1_out_local_C> C_drain_IO_L1_out_local_C_inst;\n};\n/* Module Definition */\n\n/* Module Definition */\nclass C_drain_IO_L1_out_boundary {\n  public:\n    C_drain_IO_L1_out_boundary() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, int idy, ac_channel<C_t2> &fifo_C_drain_out, ac_channel<C_t1> &fifo_C_drain_local_in) {\n      /* Variable Declaration */\n      int p0 = idx, p1 = idy; // module id\n      /* Variable Declaration */\n\n      C_drain_IO_L1_out_intra_trans_inst.run(\n        /* module id */ idx, \n        /* 
module id */ idy, \n        /* array */ C_drain_IO_L1_out_local_C_inst, \n        /* fifo */ fifo_C_drain_local_in\n      );\n      C_drain_IO_L1_out_inter_trans_boundary_inst.run(\n        /* module id */ idx, \n        /* module id */ idy, \n        /* array */ C_drain_IO_L1_out_local_C_inst, \n        /* fifo */ fifo_C_drain_out\n      );\n    }\n\n  private:\n    C_drain_IO_L1_out_inter_trans_boundary C_drain_IO_L1_out_inter_trans_boundary_inst;\n    C_drain_IO_L1_out_intra_trans C_drain_IO_L1_out_intra_trans_inst;\n    ac_channel<C_drain_IO_L1_out_local_C> C_drain_IO_L1_out_local_C_inst;\n};\n/* Module Definition */\n\n/* Module Definition */\nclass C_drain_IO_L2_out {\n  public:\n    C_drain_IO_L2_out() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, ac_channel<C_t2> &fifo_C_drain_in, ac_channel<C_t2> &fifo_C_drain_out, ac_channel<C_t2> &fifo_C_drain_local_in) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n#endif\n        {\n          #pragma hls_pipeline_init_interval 1\n          for (ac_int<2, false> c3 = p0; c3 <= 1; c3 += 1) {\n            if (c3 == p0) {\n              for (ac_int<2, false> c4 = 0; c4 <= 1; c4 += 1)\n                for (ac_int<4, false> c5 = 0; c5 <= 7; c5 += 1)\n                  for (ac_int<3, false> c6 = 0; c6 <= 3; c6 += 1) {\n                    // hls_pipeline\n                    C_t2 fifo_data;\n                    fifo_data = fifo_C_drain_local_in.read();\n                    fifo_C_drain_out.write(fifo_data);\n                  }\n            } else {\n              for (ac_int<2, false> c4 = 0; c4 <= 1; c4 += 1)\n                for (ac_int<4, false> c5 = 0; c5 <= 7; c5 += 1)\n                  for (ac_int<3, false> c6 = 0; c6 <= 3; c6 += 1) {\n         
           // hls_pipeline\n                    C_t2 fifo_data;\n                    fifo_data = fifo_C_drain_in.read();\n                    fifo_C_drain_out.write(fifo_data);\n                  }\n            }\n          }\n        }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass C_drain_IO_L2_out_boundary {\n  public:\n    C_drain_IO_L2_out_boundary() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(int idx, ac_channel<C_t2> &fifo_C_drain_out, ac_channel<C_t2> &fifo_C_drain_local_in) {\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n#endif\n        {\n          #pragma hls_pipeline_init_interval 1\n          for (ac_int<2, false> c3 = p0; c3 <= 1; c3 += 1)\n            if (c3 == p0)\n              for (ac_int<2, false> c4 = 0; c4 <= 1; c4 += 1)\n                for (ac_int<4, false> c5 = 0; c5 <= 7; c5 += 1)\n                  for (ac_int<3, false> c6 = 0; c6 <= 3; c6 += 1) {\n                    // hls_pipeline\n                    C_t2 fifo_data;\n                    fifo_data = fifo_C_drain_local_in.read();\n                    fifo_C_drain_out.write(fifo_data);\n                  }\n        }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass C_drain_IO_L3_out {\n  public:\n    C_drain_IO_L3_out() {}\n    #pragma hls_design interface\n    #pragma hls_pipeline_init_interval 1\n    void CCS_BLOCK(run)(ac_channel<C_t2> &fifo_C_drain_serialize, ac_channel<C_t2> &fifo_C_drain_local_in) {\n      /* Variable Declaration */\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n      for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n        for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n          for 
(ac_int<2, false> c3 = 0; c3 <= 1; c3 += 1)\n            for (ac_int<2, false> c4 = 0; c4 <= 1; c4 += 1)\n              for (ac_int<4, false> c5 = 0; c5 <= 7; c5 += 1)\n                for (ac_int<3, false> c6 = 0; c6 <= 3; c6 += 1)\n#endif\n                {\n                  // hls_pipeline\n                {\n                  C_t2 fifo_data;\n                  fifo_data = fifo_C_drain_local_in.read();\n                  fifo_C_drain_serialize.write(fifo_data);\n                }\n                }\n    }\n};\n/* Module Definition */\n\n/* Module Definition */\nclass C_drain_IO_L3_out_serialize {\n  public:\n    C_drain_IO_L3_out_serialize() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(C_t16 C[256], ac_channel<C_t2> &fifo_C_drain_local_in) {\n      /* Variable Declaration */\n      /* Variable Declaration */\n\n#ifndef __SYNTHESIS__\n      // while () // Please add the fifo check for C sim.\n#endif\n      #pragma hls_pipeline_init_interval 1\n      for (ac_int<9, false> i = 0; i < 256; i++) {\n        C_t2 fifo_data;\n        C_t16 mem_data;\n        C_t2 mem_data_split[8];\n        for (ac_int<4, false> p = 0; p < 8; p++) {\n          fifo_data = fifo_C_drain_local_in.read();\n          mem_data_split[p] = fifo_data;\n        }\n        mem_data.set_slc(0, mem_data_split[0]);\n        mem_data.set_slc(64, mem_data_split[1]);\n        mem_data.set_slc(128, mem_data_split[2]);\n        mem_data.set_slc(192, mem_data_split[3]);\n        mem_data.set_slc(256, mem_data_split[4]);\n        mem_data.set_slc(320, mem_data_split[5]);\n        mem_data.set_slc(384, mem_data_split[6]);\n        mem_data.set_slc(448, mem_data_split[7]);\n        C[i] = mem_data;\n      }\n    }\n};\n/* Module Definition */\n\n#pragma hls_design top\nclass kernel0 {\n  public:\n    kernel0() {}\n    #pragma hls_design interface\n    void CCS_BLOCK(run)(A_t16 A[16384 / 16], B_t16 B[16384 / 16], C_t16 C[4096 / 16])\n    {\n      /* Module Call */\n      
A_IO_L3_in_serialize_inst.run(\n        /* array */ A,\n        /* fifo */ fifo_A_A_IO_L3_in_serialize\n      );\n      /* Module Call */\n\n      /* Module Call */\n      A_IO_L3_in_inst.run(\n        /* fifo */ fifo_A_A_IO_L3_in_serialize,\n        /* fifo */ fifo_A_A_IO_L2_in_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      A_IO_L2_in_inst_0.run(\n        /* module id */ 0,\n        /* fifo */ fifo_A_A_IO_L2_in_0,\n        /* fifo */ fifo_A_A_IO_L2_in_1,\n        /* fifo */ fifo_A_PE_0_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      A_IO_L2_in_boundary_inst_1.run(\n        /* module id */ 1,\n        /* fifo */ fifo_A_A_IO_L2_in_1,\n        /* fifo */ fifo_A_PE_1_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      B_IO_L3_in_serialize_inst.run(\n        /* array */ B,\n        /* fifo */ fifo_B_B_IO_L3_in_serialize\n      );\n      /* Module Call */\n\n      /* Module Call */\n      B_IO_L3_in_inst.run(\n        /* fifo */ fifo_B_B_IO_L3_in_serialize,\n        /* fifo */ fifo_B_B_IO_L2_in_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      B_IO_L2_in_inst_0.run(\n        /* module id */ 0,\n        /* fifo */ fifo_B_B_IO_L2_in_0,\n        /* fifo */ fifo_B_B_IO_L2_in_1,\n        /* fifo */ fifo_B_PE_0_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      B_IO_L2_in_boundary_inst_1.run(\n        /* module id */ 1,\n        /* fifo */ fifo_B_B_IO_L2_in_1,\n        /* fifo */ fifo_B_PE_0_1\n      );\n      /* Module Call */\n\n      /* Module Call */\n      PE_inst_0_0.run(\n        /* module id */ 0,\n        /* module id */ 0,\n        /* fifo */ fifo_A_PE_0_0,\n        /* fifo */ fifo_A_PE_0_1,\n        /* fifo */ fifo_B_PE_0_0,\n        /* fifo */ fifo_B_PE_1_0,\n        /* fifo */ fifo_C_drain_PE_0_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      PE_inst_0_1.run(\n        /* module id */ 0,\n        /* module id */ 1,\n        /* fifo */ 
fifo_A_PE_0_1,\n        /* fifo */ fifo_A_PE_0_2,\n        /* fifo */ fifo_B_PE_0_1,\n        /* fifo */ fifo_B_PE_1_1,\n        /* fifo */ fifo_C_drain_PE_0_1\n      );\n      /* Module Call */\n\n      /* Module Call */\n      PE_inst_1_0.run(\n        /* module id */ 1,\n        /* module id */ 0,\n        /* fifo */ fifo_A_PE_1_0,\n        /* fifo */ fifo_A_PE_1_1,\n        /* fifo */ fifo_B_PE_1_0,\n        /* fifo */ fifo_B_PE_2_0,\n        /* fifo */ fifo_C_drain_PE_1_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      PE_inst_1_1.run(\n        /* module id */ 1,\n        /* module id */ 1,\n        /* fifo */ fifo_A_PE_1_1,\n        /* fifo */ fifo_A_PE_1_2,\n        /* fifo */ fifo_B_PE_1_1,\n        /* fifo */ fifo_B_PE_2_1,\n        /* fifo */ fifo_C_drain_PE_1_1\n      );\n      /* Module Call */\n\n      /* Module Call */\n      C_drain_IO_L1_out_boundary_inst_0_1.run(\n        /* module id */ 0,\n        /* module id */ 1,\n        /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_1,\n        /* fifo */ fifo_C_drain_PE_1_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      C_drain_IO_L1_out_inst_0_0.run(\n        /* module id */ 0,\n        /* module id */ 0,\n        /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_1,\n        /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_0,\n        /* fifo */ fifo_C_drain_PE_0_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      C_drain_IO_L1_out_boundary_inst_1_1.run(\n        /* module id */ 1,\n        /* module id */ 1,\n        /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_1,\n        /* fifo */ fifo_C_drain_PE_1_1\n      );\n      /* Module Call */\n\n      /* Module Call */\n      C_drain_IO_L1_out_inst_1_0.run(\n        /* module id */ 1,\n        /* module id */ 0,\n        /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_1,\n        /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_0,\n        /* fifo */ fifo_C_drain_PE_0_1\n      );\n      /* Module Call */\n\n      /* Module 
Call */\n      C_drain_IO_L2_out_boundary_inst_1.run(\n        /* module id */ 1,\n        /* fifo */ fifo_C_drain_C_drain_IO_L2_out_1,\n        /* fifo */ fifo_C_drain_C_drain_IO_L1_out_1_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      C_drain_IO_L2_out_inst_0.run(\n        /* module id */ 0,\n        /* fifo */ fifo_C_drain_C_drain_IO_L2_out_1,\n        /* fifo */ fifo_C_drain_C_drain_IO_L2_out_0,\n        /* fifo */ fifo_C_drain_C_drain_IO_L1_out_0_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      C_drain_IO_L3_out_inst.run(\n        /* fifo */ fifo_C_drain_C_drain_IO_L3_out_serialize,\n        /* fifo */ fifo_C_drain_C_drain_IO_L2_out_0\n      );\n      /* Module Call */\n\n      /* Module Call */\n      C_drain_IO_L3_out_serialize_inst.run(\n        /* array */ C,\n        /* fifo */ fifo_C_drain_C_drain_IO_L3_out_serialize\n      );\n      /* Module Call */\n\n    }\n\n  private:\n    /* Module Declaration */\n    A_IO_L3_in_serialize A_IO_L3_in_serialize_inst;\n    A_IO_L3_in A_IO_L3_in_inst;\n    A_IO_L2_in A_IO_L2_in_inst_0;\n    A_IO_L2_in_boundary A_IO_L2_in_boundary_inst_1;\n    B_IO_L3_in_serialize B_IO_L3_in_serialize_inst;\n    B_IO_L3_in B_IO_L3_in_inst;\n    B_IO_L2_in B_IO_L2_in_inst_0;\n    B_IO_L2_in_boundary B_IO_L2_in_boundary_inst_1;\n    PE PE_inst_0_0;\n    PE PE_inst_0_1;\n    PE PE_inst_1_0;\n    PE PE_inst_1_1;\n    C_drain_IO_L1_out C_drain_IO_L1_out_inst_0_0;\n    C_drain_IO_L1_out_boundary C_drain_IO_L1_out_boundary_inst_0_1;\n    C_drain_IO_L1_out C_drain_IO_L1_out_inst_1_0;\n    C_drain_IO_L1_out_boundary C_drain_IO_L1_out_boundary_inst_1_1;\n    C_drain_IO_L2_out C_drain_IO_L2_out_inst_0;\n    C_drain_IO_L2_out_boundary C_drain_IO_L2_out_boundary_inst_1;\n    C_drain_IO_L3_out C_drain_IO_L3_out_inst;\n    C_drain_IO_L3_out_serialize C_drain_IO_L3_out_serialize_inst;\n    /* Module Declaration */\n\n    /* FIFO Declaration */\n    /* A_IO_L3_in_serialize fifo */ ac_channel<A_t8> 
fifo_A_A_IO_L3_in_serialize;\n    /* B_IO_L3_in_serialize fifo */ ac_channel<B_t8> fifo_B_B_IO_L3_in_serialize;\n    /* C_drain_IO_L3_out_serialize fifo */ ac_channel<C_t2> fifo_C_drain_C_drain_IO_L3_out_serialize;\n    /* A_IO_L2_in fifo */ ac_channel<A_t8> fifo_A_A_IO_L2_in_0;\n    /* A_IO_L2_in fifo */ ac_channel<A_t8> fifo_A_A_IO_L2_in_1;\n    /* A_IO_L2_in fifo */ ac_channel<A_t8> fifo_A_A_IO_L2_in_2;\n    /* B_IO_L2_in fifo */ ac_channel<B_t8> fifo_B_B_IO_L2_in_0;\n    /* B_IO_L2_in fifo */ ac_channel<B_t8> fifo_B_B_IO_L2_in_1;\n    /* B_IO_L2_in fifo */ ac_channel<B_t8> fifo_B_B_IO_L2_in_2;\n    /* PE fifo */ ac_channel<A_t2> fifo_A_PE_0_0;\n    /* PE fifo */ ac_channel<A_t2> fifo_A_PE_0_1;\n    /* PE fifo */ ac_channel<A_t2> fifo_A_PE_0_2;\n    /* PE fifo */ ac_channel<A_t2> fifo_A_PE_1_0;\n    /* PE fifo */ ac_channel<A_t2> fifo_A_PE_1_1;\n    /* PE fifo */ ac_channel<A_t2> fifo_A_PE_1_2;\n    /* PE fifo */ ac_channel<B_t2> fifo_B_PE_0_0;\n    /* PE fifo */ ac_channel<B_t2> fifo_B_PE_1_0;\n    /* PE fifo */ ac_channel<B_t2> fifo_B_PE_2_0;\n    /* PE fifo */ ac_channel<B_t2> fifo_B_PE_0_1;\n    /* PE fifo */ ac_channel<B_t2> fifo_B_PE_1_1;\n    /* PE fifo */ ac_channel<B_t2> fifo_B_PE_2_1;\n    /* PE fifo */ ac_channel<C_t1> fifo_C_drain_PE_0_0;\n    /* PE fifo */ ac_channel<C_t1> fifo_C_drain_PE_1_0;\n    /* PE fifo */ ac_channel<C_t1> fifo_C_drain_PE_0_1;\n    /* PE fifo */ ac_channel<C_t1> fifo_C_drain_PE_1_1;\n    /* C_drain_IO_L1_out fifo */ ac_channel<C_t2> fifo_C_drain_C_drain_IO_L1_out_0_0;\n    /* C_drain_IO_L1_out fifo */ ac_channel<C_t2> fifo_C_drain_C_drain_IO_L1_out_0_1;\n    /* C_drain_IO_L1_out fifo */ ac_channel<C_t2> fifo_C_drain_C_drain_IO_L1_out_0_2;\n    /* C_drain_IO_L1_out fifo */ ac_channel<C_t2> fifo_C_drain_C_drain_IO_L1_out_1_0;\n    /* C_drain_IO_L1_out fifo */ ac_channel<C_t2> fifo_C_drain_C_drain_IO_L1_out_1_1;\n    /* C_drain_IO_L1_out fifo */ ac_channel<C_t2> fifo_C_drain_C_drain_IO_L1_out_1_2;\n    /* C_drain_IO_L2_out fifo 
*/ ac_channel<C_t2> fifo_C_drain_C_drain_IO_L2_out_0;\n    /* C_drain_IO_L2_out fifo */ ac_channel<C_t2> fifo_C_drain_C_drain_IO_L2_out_1;\n    /* C_drain_IO_L2_out fifo */ ac_channel<C_t2> fifo_C_drain_C_drain_IO_L2_out_2;\n    /* FIFO Declaration */\n};\n"
  },
  {
    "path": "autosa_tests/mm_catapult/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel2\": {\n    \"reduction\": [\"y\"]\n  }, \n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel4\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/mm_getting_started/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all clean check\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/mm_getting_started/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1]\nsp=kernel0_1.C:DDR[2]\n"
  },
  {
    "path": "autosa_tests/mm_getting_started/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/mm_getting_started/kernel.c",
    "content": "// The macro below applies the layout transformation on array B (stored as B[J][K]) to enable SIMD vectorization.\n// Comment it out to keep the original B[K][J] layout.\n#define LAYOUT_TRANSFORM\n\n#include \"kernel.h\"\n\nint main(int argc, char **argv) {\n#ifndef LAYOUT_TRANSFORM\n  data_t A[I][K], B[K][J], C[I][J], C_golden[I][J];\n#else\n  data_t A[I][K], B[J][K], C[I][J], C_golden[I][J];\n#endif\n\n  for (int i = 0; i < I; i++)\n    for (int k = 0; k < K; k++) {\n      A[i][k] = (data_t)rand() / RAND_MAX;\n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n#ifndef LAYOUT_TRANSFORM\n      B[k][j] = (data_t)rand() / RAND_MAX;\n#else\n      B[j][k] = (data_t)rand() / RAND_MAX;\n#endif\n    }\n\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n#ifndef LAYOUT_TRANSFORM\n        C[i][j] = C[i][j] + A[i][k] * B[k][j];\n#else\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n#endif\n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n#ifndef LAYOUT_TRANSFORM\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[k][j];\n#else\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n#endif\n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/mm_getting_started/kernel.h",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\ntypedef float data_t;\n#define I 64\n#define J 64\n#define K 64\n"
  },
  {
    "path": "autosa_tests/mm_getting_started/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel2\": {\n    \"reduction\": [\"y\"]\n  }, \n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel4\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/mm_hbm/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\nPLATFORM := xilinx_u280_xdma_201920_3\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to HBM banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all clean check\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/mm_hbm/README.md",
    "content": "# Matrix Multiplication (HBM)\n\nThis is an example of small-size matrix multiplication using high-bandwidth memory (HBM).\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U280 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/mm_hbm/kernel.c\nautosa_tests/mm_hbm/kernel.h\nautosa_tests/mm_hbm/simd_info.json\nautosa_tests/mm_hbm/Makefile\nautosa_tests/mm_hbm/connectivity.cfg\n```\n\n__Command__:\n```bash\n./autosa ./autosa_tests/mm_hbm/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2];kernel[]->hbm_A[2];kernel[]->hbm_B[2];kernel[]->hbm_C_drain[2]}\" --simd-info=./autosa_tests/mm_hbm/simd_info.json --hbm\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/mm_hbm/Makefile autosa.tmp/output/\ncp autosa_tests/mm_hbm/connectivity.cfg autosa.tmp/output/\n```\n\nExecute the makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\n```"
  },
  {
    "path": "autosa_tests/mm_hbm/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A_0:HBM[0]\nsp=kernel0_1.A_1:HBM[1]\nsp=kernel0_1.B_0:HBM[2] \nsp=kernel0_1.B_1:HBM[3] \nsp=kernel0_1.C_0:HBM[4]\nsp=kernel0_1.C_1:HBM[5]\n"
  },
  {
    "path": "autosa_tests/mm_hbm/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/mm_hbm/kernel.c",
    "content": "#include \"kernel.h\"\n\nint main(int argc, char **argv) {\n//  data_t A[I][K], B[K][J], C[I][J], C_golden[I][J]; \n  data_t A[I][K], B[J][K], C[I][J], C_golden[I][J];\n\n  for (int i = 0; i < I; i++) \n    for (int k = 0; k < K; k++) {\n      A[i][k] = k;\n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n      B[j][k] = k;\n    }\n\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/mm_hbm/kernel.h",
    "content": "#include \"stdio.h\"\n#include \"stdlib.h\"\n#include \"math.h\"\n\ntypedef float data_t;\n#define I 64\n#define J 64\n#define K 64\n"
  },
  {
    "path": "autosa_tests/mm_hbm/simd_info.json",
    "content": "{\n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/mm_hcl/README.md",
    "content": "# Matrix Multiplication (Small)\n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/mm_hcl/kernel.c\nautosa_tests/mm_hcl/kernel.h\nautosa_tests/mm_hcl/simd_info.json\nautosa_tests/mm_hcl/hls_script.tcl\n```\n\n__Command__:\nThis is an internal test example for HeteroCL integration.\n\n## Transposition\n\nFirst, HeteroCL might provide AutoSA with transposed input matrices. We consider four test cases here.\n\n1. A_B: Both input matrices A and B keep the row-major layout.\n\nSet `TRANS` to `A_B` in `kernel.c`.\nUse the following command to compile the program.\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl\n```\n\nThe generated files can be found under `autosa.tmp/output`.\nYou may verify the design using Xilinx HLS.\n\n```bash\ncp ./autosa_tests/mm_hcl/hls_script.tcl ./autosa.tmp/output/\ncd ./autosa.tmp/output\nvivado_hls -f hls_script.tcl\n```\n\nYou may notice that SIMD vectorization is not applied here. The reason is that, by default, AutoSA only examines the time loops (loops that are not mapped to the PE dimensions, as opposed to space loops) as SIMD candidates. In this example, only loop k is available.\nHowever, with the default layout `A[i][k]`, `B[k][j]`, and `C[i][j]`, loop k cannot be used for vectorization, because k is not the innermost (fastest-varying) dimension of matrix B.\n\nTo enable vectorization, we can let AutoSA consider space loops as candidates as well. In this example, loop j can be used for vectorization.\nNote that loop j is invariant to `A[i][k]` and leads to stride-one access for `B[k][j]`. However, before using this loop as the vectorization loop, we have to \nturn off the latency hiding optimization on loop j. 
The reason is that \nloop j would otherwise be tiled for latency hiding before vectorization. After tiling, the remaining tiled loop no longer iterates over consecutive elements, as it is now mapped to hyper tiles. Therefore, the array access `B[k][j]` is no longer coalesced under this loop, and the \nSIMD vectorization opportunity is lost. \n\nTo make use of SIMD vectorization, use the following command.\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,1]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl \\\n--simd-touch-space\n```\n\nWe add `--simd-touch-space` so that space loops are also considered for vectorization. To use loop j for vectorization, we set the latency hiding factors to `[8,1]`, which means that only loop i is tiled for latency hiding. AutoSA then dumps the candidate loops for SIMD vectorization. Take a look at the file `tuning.json` under the directory `autosa.tmp/output`:\n\n```json\n\"simd\": {\n    \"tilable_loops\": [16,16],\n    \"scores\": [13,13],\n    \"legal\": [1,0]\n}\n```\n\nAutoSA identifies two candidate loops. The first is loop j, and the second is loop k.\nHowever, layout transformation is required for loop k.\nTherefore, the `legal` value is set to 0 for the second loop.\n\nNow, to apply SIMD vectorization, use the following command.\n\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,1];kernel[]->simd[8,1]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl \\\n--simd-touch-space\n```\n\nA complete design with loop j vectorized is now generated.\n\n2. 
AT_B: The input matrix A is transposed to column major, and the matrix B keeps the row-major layout.\n\nSet `TRANS` to `AT_B` in `kernel.c`.\nUse the following command to compile the program.\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl\n```\n\nTo enable SIMD vectorization, let's take a look at the array accesses `A[k][i]`, `B[k][j]`, and `C[i][j]`.\nIn this case, loop j can be used for vectorization as long as it is excluded from latency hiding. \nUse the following command to tile only loop i for latency hiding.\n\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,1]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl \\\n--simd-touch-space\n```\n\nSimilarly, you may check `tuning.json` for more detailed information. Finally, use the command below to generate a vectorized design.\n\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,1];kernel[]->simd[8,1]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl \\\n--simd-touch-space\n```\n\n3. 
A_BT: The input matrix A keeps the row-major layout, and the matrix B is transposed to column major.\n\nSet `TRANS` to `A_BT` in `kernel.c`.\nRun the following command first.\n\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl\n```\n\nIn this case, AutoSA already detects the time loop k as a SIMD candidate and stops, waiting for a SIMD factor to be specified.\nArray accesses in the current layout are `A[i][k]`, `B[j][k]`, and `C[i][j]`. Therefore, loop k can be used as the SIMD loop. Use the following command to set the SIMD factor to 8 and generate a complete design.\n\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[8]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl\n```\n\n4. AT_BT: Both matrices A and B are transposed to column major.\n\nSet `TRANS` to `AT_BT` in `kernel.c`.\n\nRun the following command first.\n\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl\n```\n\nAn unvectorized design is generated.\nArray accesses in the current layout are `A[k][i]`, `B[j][k]`, and `C[i][j]`. 
In this case, none of the loops can be used for vectorization.\n\nIn conclusion, when matrices A and B are supplied to AutoSA in different layouts, different rules apply to enable full optimization (specifically, SIMD vectorization). We summarize these rules below.\n\n| Layout |     Latency Hiding     |         SIMD        |  Compilation Flag  |\n|:------:|:----------------------:|:-------------------:|:------------------:|\n|   A_B  | kernel[]->latency[X,1] | kernel[]->simd[X,1] | --simd-touch-space |\n|  AT_B  | kernel[]->latency[X,1] | kernel[]->simd[X,1] | --simd-touch-space |\n|  A_BT  | kernel[]->latency[X,X] |  kernel[]->simd[X]  |                    |\n|  AT_BT | kernel[]->latency[X,X] |         N/A         |                    |\n\n## Data Packing\n\nIn addition to transposition, HeteroCL could also supply AutoSA with pre-packed arrays. \nBy default, AutoSA tries to pack data as much as possible for each array to improve the effective DRAM bandwidth.\nThe data packing factors can be constrained by using the argument `--data-pack-sizes`. \nFor each array, AutoSA allows users to constrain the data packing factors at three levels:\n\n- Innermost level: Data packing factors for L1 I/O modules.\n- Intermediate level: Data packing factors for all other I/O modules, i.e., those that are neither L1 nor outermost I/O modules.\n- Outermost level: Data packing factors for I/O modules accessing the DRAM.\n\nTo constrain the data packing factors of an array, specify them in the following format, listing the upper bounds (in bytes) for the innermost, intermediate, and outermost levels in that order.\n\n```bash\n--data-pack-sizes=\"{kernel[]->A[8,32,64]}\"\n```\n\nWith the argument above, we constrain the innermost-level data packing factors to be no greater than 8 bytes (64 bits), \nthe intermediate level to be no greater than 32 bytes (256 bits), and the outermost level to be no greater than 64 bytes (512 bits).\nDue to a limitation of Xilinx devices, we require the outermost data packing factors to be no greater than 512 bits. 
\nIn addition, as a rule of thumb, we recommend limiting the intermediate level to no greater than 256 bits to reduce the FIFO overhead.\n\nSet `TRANS` to `A_BT` in `kernel.c`.\nUse the following command to compile the design.\n\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[8]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl \\\n--data-pack-sizes=\"{kernel[]->A[8,32,64];kernel[]->B[8,32,64];kernel[]->C[8,32,64]}\"\n```\n\nNow let's take a look at the generated code.\nIn the top-level function `void autosa_func(A_t16 *A, B_t16 *B, C_t8 *C)`, array A is packed with 16 elements (512 bits), array B is packed with 16 elements (512 bits), and array C is packed with 8 elements (256 bits). Although we have set the maximal outermost packing factor to 512 bits for each array, only arrays A and B achieve the maximal packing factor.\n\nFor array `C[I][J]`, as we set the array partitioning factors to `[16,16,16]`, the systolic array computes a tile of `C[16][16]` each time. Furthermore, as this tile is computed by a `2x2` PE array, each PE generates a sub-tile of `C[8][8]`. Therefore, when draining out the results, we transfer the data in sub-tiles of `C[8][8]`, and the maximal data packing factor that we can achieve for array C is 8.\n\nIf programmers want a larger data packing factor for array C as well, there are two options to consider:\n\n- Use host data serialization. 
\n- Partition a larger tile inside each PE.\n\nHost serialization requires layout transformation on the host side, which makes it difficult to integrate with the existing HeteroCL environment.\n\nThe command below shows an example of using a larger latency hiding factor for loop j (16 instead of 8, with the array partitioning factor raised to 32 accordingly) to allocate a larger tile of `C[8][16]` inside each PE.\n\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,32,16];kernel[]->latency[8,16];kernel[]->simd[8]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl \\\n--data-pack-sizes=\"{kernel[]->A[8,32,64];kernel[]->B[8,32,64];kernel[]->C[8,32,64]}\"\n```\n\nYou can check the generated source code and find that all arrays are now packed with 16 elements each.\n\nThe last thing to mention is that the current flow prioritizes SIMD vectorization factors over user-specified data packing factors. In this example, as we set the SIMD factor to 8, arrays A and B will be packed with at least 8 elements. For example, the following command tries to constrain the data packing factors of A and B to 4 elements (16 bytes). AutoSA ignores this constraint and packs A and B with 8 elements; only array C is packed with 4 elements.\n\n```bash\n./autosa ./autosa_tests/mm_hcl/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_hls_c \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,32,16];kernel[]->latency[8,16];kernel[]->simd[8]}\" \\\n--simd-info=./autosa_tests/mm_hcl/simd_info.json \\\n--hls \\\n--hcl \\\n--data-pack-sizes=\"{kernel[]->A[8,16,16];kernel[]->B[8,16,16];kernel[]->C[8,16,16]}\"\n```"
  },
  {
    "path": "autosa_tests/mm_hcl/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/mm_hcl/kernel.c",
    "content": "#include \"kernel.h\"\n\n#define A_B 0\n#define AT_B 1\n#define A_BT 2\n#define AT_BT 3\n#define TRANS A_BT\n\nint main(int argc, char **argv) {\n#if TRANS == A_B \n  data_t A[I][K], B[K][J], C[I][J], C_golden[I][J]; \n#elif TRANS == AT_B\n  data_t A[K][I], B[K][J], C[I][J], C_golden[I][J];\n#elif TRANS == A_BT\n  data_t A[I][K], B[J][K], C[I][J], C_golden[I][J];\n#elif TRANS == AT_BT\n  data_t A[K][I], B[J][K], C[I][J], C_golden[I][J];\n#endif\n\n  for (int i = 0; i < I; i++) \n    for (int k = 0; k < K; k++) {\n#if TRANS == A_B \n      A[i][k] = (data_t)rand() / RAND_MAX;\n#elif TRANS == A_BT\n      A[i][k] = (data_t)rand() / RAND_MAX;\n#elif TRANS == AT_B\n      A[k][i] = (data_t)rand() / RAND_MAX;\n#elif TRANS == AT_BT\n      A[k][i] = (data_t)rand() / RAND_MAX;\n#endif\n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n#if TRANS == A_B\n      B[k][j] = (data_t)rand() / RAND_MAX;\n#elif TRANS == A_BT\n      B[j][k] = (data_t)rand() / RAND_MAX;\n#elif TRANS == AT_B\n      B[k][j] = (data_t)rand() / RAND_MAX;\n#elif TRANS == AT_BT\n      B[j][k] = (data_t)rand() / RAND_MAX;\n#endif     \n    }\n\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n#if TRANS == A_B\n        C[i][j] = C[i][j] + A[i][k] * B[k][j];\n#elif TRANS == A_BT\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n#elif TRANS == AT_B\n        C[i][j] = C[i][j] + A[k][i] * B[k][j];\n#elif TRANS == AT_BT\n        C[i][j] = C[i][j] + A[k][i] * B[j][k];\n#endif          \n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n#if TRANS == A_B\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[k][j];\n#elif TRANS == A_BT\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n#elif TRANS == AT_B\n        C_golden[i][j] = C_golden[i][j] + A[k][i] * 
B[k][j];\n#elif TRANS == AT_BT\n        C_golden[i][j] = C_golden[i][j] + A[k][i] * B[j][k];\n#endif          \n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n\n//#include <stdio.h>\n//int main(int argc, char **argv) {\n//\n//      float L2[1][10];\n//      float FL[1][64];\n//      float w2[64][10];\n//#pragma scop\n//      for (int j1 = 0; j1 < 10; ++j1) {\n//        L2[0][j1] = 0.000000e+00f;\n//        for (int k1 = 0; k1 < 64; ++k1) {\n//          L2[0][j1] = (L2[0][j1] + (FL[0][k1] * w2[k1][j1]));\n//        }\n//      }\n//#pragma endscop\n//      printf(\"%f\", L2[0][0]);\n//      printf(\"%f\", FL[0][0]);\n//      printf(\"%f\", w2[0][0]);\n//}"
  },
  {
    "path": "autosa_tests/mm_hcl/kernel.h",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\ntypedef float data_t;\n#define I 64\n#define J 64\n#define K 64\n"
  },
  {
    "path": "autosa_tests/mm_hcl/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel2\": {\n    \"reduction\": [\"y\"]\n  }, \n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel4\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/mm_hcl_intel/Makefile",
    "content": "APP ?= kernel\nAOCL_BOARD ?= s10mx_hbm_es\nSW_EMU_AOCX ?= $(APP)_sw_emu.aocx\nHW_EMU_AOCX ?= $(APP)_hw_emu.aocx\nHW_AOCX ?= $(APP)_hw.aocx\nAOCO ?= $(APP).aoco\nAOCR ?= $(APP).aocr\n\n# Compiler\nAOC ?= aoc\nCXX ?= g++\nAOC_FLAGS ?= -board=$(AOCL_BOARD) -fp-relaxed -report -hyper-optimized-handshaking=off -I $(INTELFPGAOCLSDKROOT)/include/kernel_headers\n\nTARGET ?= host\nSW_EMU_TARGET ?= host_sw_emu\nTARGET_DIR ?= bin\nAOCL_UTILS ?= $(INTELFPGAOCLSDKROOT)/examples_aoc/common\n\n# Directories\nINC_DIRS := src $(AOCL_UTILS)/inc\nLIB_DIRS := \n\n# Files\nINCS := $(wildcard src/*.h)\nHOST_SRCS := $(wildcard src/$(APP)_host.cpp $(AOCL_UTILS)/src/AOCLUtils/*.cpp)\nKERNEL_SRCS := src/$(APP)_kernel.cl\n\nifeq ($(VERBOSE),1)\nECHO := \nelse\nECHO := @\nendif\n\n# Where is the Intel(R) FPGA SDK for OpenCL(TM) software?\nifeq ($(wildcard $(INTELFPGAOCLSDKROOT)),)\n$(error Set INTELFPGAOCLSDKROOT to the root directory of the Intel(R) FPGA SDK for OpenCL(TM) software installation)\nendif\nifeq ($(wildcard $(INTELFPGAOCLSDKROOT)/host/include/CL/opencl.h),)\n$(error Set INTELFPGAOCLSDKROOT to the root directory of the Intel(R) FPGA SDK for OpenCL(TM) software installation.)\nendif\n\n# OpenCL compile and link flags.\nAOCL_COMPILE_CONFIG := $(shell aocl compile-config )\nAOCL_LINK_LIBS := $(shell aocl ldlibs )\nAOCL_LINK_FLAGS := $(shell aocl ldflags )\n# Linking with defences enabled\nAOCL_LINK_FLAGS += -z noexecstack\nAOCL_LINK_FLAGS += -Wl,-z,relro,-z,now\nAOCL_LINK_FLAGS += -Wl,-Bsymbolic\nAOCL_LINK_FLAGS += -pie\nAOCL_LINK_CONFIG := $(AOCL_LINK_FLAGS) $(AOCL_LINK_LIBS)\n\n# Compilation flags\nifeq ($(DEBUG),1)\nCXXFLAGS += -g\nelse\nCXXFLAGS += -O2\nendif\nCXXFLAGS += -std=gnu++0x\n\n# Compiling with defences enabled\nCXXFLAGS += -fstack-protector\nCXXFLAGS += -D_FORTIFY_SOURCE=2\nCXXFLAGS += -Wformat -Wformat-security\nCXXFLAGS += -fPIE\n\n# We must force GCC to never assume that it can shove in its own\n# sse2/sse3 versions of strlen and strcmp because they 
will CRASH.\n# Very hard to debug!\nCXXFLAGS += -fPIC\n\nLIBS := rt pthread\n\n## Make it all!\n#all : $(TARGET_DIR)/$(TARGET)\n\nsw_emu : $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(SW_EMU_AOCX)\n\nhls: $(TARGET_DIR)/$(AOCR)\n\nhw : $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_AOCX)\n\nhw_emu: $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_EMU_AOCX)\n\nhw_emu_check: $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_EMU_AOCX)\n\tCL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 $(TARGET_DIR)/$(TARGET) $(HW_EMU_AOCX)\n\nsw_emu_check : $(TARGET_DIR)/$(SW_EMU_TARGET) $(TARGET_DIR)/$(SW_EMU_AOCX)\n\tCL_CONTEXT_EMULATOR_DEVICE_INTELFPGA=1 $(TARGET_DIR)/$(TARGET) $(SW_EMU_AOCX)\n\nhw_check : $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_AOCX)\n\t$(TARGET_DIR)/$(TARGET) $(HW_AOCX)\n\n# Host executable target.\n$(TARGET_DIR)/$(TARGET) : Makefile $(HOST_SRCS) $(INCS) $(TARGET_DIR)\n\t$(ECHO)$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(EXTRACXXFLAGS) -fPIC $(foreach D,$(INC_DIRS),-I$D) \\\n\t\t\t$(AOCL_COMPILE_CONFIG) $(HOST_SRCS) $(AOCL_LINK_CONFIG) \\\n\t\t\t$(foreach D,$(LIB_DIRS),-L$D) \\\n\t\t\t$(foreach L,$(LIBS),-l$L) \\\n\t\t\t-o $(TARGET_DIR)/$(TARGET)\n\n$(TARGET_DIR)/$(SW_EMU_TARGET) : Makefile $(HOST_SRCS) $(INCS) $(TARGET_DIR)\n\t$(ECHO)$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(EXTRACXXFLAGS) -fPIC $(foreach D,$(INC_DIRS),-I$D) \\\n\t\t\t$(AOCL_COMPILE_CONFIG) $(HOST_SRCS) $(AOCL_LINK_CONFIG) \\\n\t\t\t$(foreach D,$(LIB_DIRS),-L$D) \\\n\t\t\t$(foreach L,$(LIBS),-l$L) \\\n\t\t\t-o $(TARGET_DIR)/$(TARGET) -DEMULATE\n\n$(TARGET_DIR) :\n\t$(ECHO)mkdir $(TARGET_DIR)\n\n$(TARGET_DIR)/$(SW_EMU_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -march=emulator -legacy-emulator -o $@ $^\n\n$(TARGET_DIR)/$(HW_EMU_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -march=simulator -ghdl -o $@ $^\n\n$(TARGET_DIR)/$(HW_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -o $@ $^\n\n$(TARGET_DIR)/$(AOCO) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -c -o $@ $^\n\n$(TARGET_DIR)/$(AOCR) : $(TARGET_DIR)/$(AOCO)\n\t$(AOC) $(AOC_FLAGS) -rtl -o $@ $^\n\n# 
Standard make targets\nclean :\n\t$(ECHO)rm -rf $(TARGET_DIR)/*\n\n.PHONY : all clean\n"
  },
  {
    "path": "autosa_tests/mm_hcl_intel/README.md",
    "content": "# Matrix Multiplication (Small)\n\nBoard        | Software Version\n-------------|-----------------\nStratix 10 | Intel FPGA SDK for OpenCL 19.4\n\n__Files__:\n```\nautosa_tests/mm_hcl_intel/kernel.c\nautosa_tests/mm_hcl_intel/kernel2.c\nautosa_tests/mm_hcl_intel/kernel.h\nautosa_tests/mm_hcl_intel/simd_info.json\nautosa_tests/mm_hcl_intel/Makefile\n```\n\n__Command__:\nThis is an internal test example for HeteroCL integration.\n\n## Example 1\n\n```bash\n./autosa ./autosa_tests/mm_hcl_intel/kernel.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_opencl \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n--simd-info=./autosa_tests/mm_hcl_intel/simd_info.json \\\n--host-serialize \\\n--loop-infinitize \\\n--double-buffer-style=0 \\\n--mem-port-map=\"{kernel[]->A[0];kernel[]->B[1];kernel[]->C[2]}\" \\\n--hcl\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/mm_hcl_intel/Makefile autosa.tmp/output/\n```\n\nExecute the makefile to perform software emulation:\n```\nmake sw_emu_check\n```\nor synthesize the design to RTL:\n```\nmake hls\n```\nor generate the bitstream:\n```\nmake hw\n```\n\n## Example 2\n\n```bash\n./autosa ./autosa_tests/mm_hcl_intel/kernel2.c \\\n--config=./autosa_config/autosa_config.json \\\n--target=autosa_opencl \\\n--output-dir=./autosa.tmp/output \\\n--sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[32,32,512];kernel[]->latency[8,8];kernel[]->simd[1]}\" \\\n--simd-info=./autosa_tests/mm_hcl_intel/simd_info.json \\\n--host-serialize \\\n--loop-infinitize \\\n--double-buffer-style=0 \\\n--hcl\n```"
  },
  {
    "path": "autosa_tests/mm_hcl_intel/kernel.c",
    "content": "#include \"kernel.h\"\n\nint main(int argc, char **argv) {\n//  data_t A[I][K], B[K][J], C[I][J], C_golden[I][J]; \n  data_t A[I][K], B[J][K], C[I][J], C_golden[I][J];\n\n  for (int i = 0; i < I; i++) \n    for (int k = 0; k < K; k++) {\n      A[i][k] = k;\n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n      B[j][k] = k;\n    }\n\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/mm_hcl_intel/kernel.h",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\ntypedef float data_t;\n#define I 64\n#define J 64\n#define K 64\n"
  },
  {
    "path": "autosa_tests/mm_hcl_intel/kernel2.c",
    "content": "#include <stdio.h>\nint main(int argc, char **argv) {\n  static float Y0[1024][1024];\n  static float A[1024][1024];\n  static float B[1024][1024];\n\n#pragma scop\n  for (int i = 0; i < 1024; ++i) {\n    for (int j = 0; j < 1024; ++j) {\n      Y0[i][j] = 0.000000e+00f;\n      for (int k = 0; k < 1024; ++k) {\n        Y0[i][j] = (Y0[i][j] + (A[i][k] * B[j][k]));\n      }\n    }\n  }\n#pragma endscop\n\n  printf(\"%f\", Y0[0][0]);\n  printf(\"%f\", A[0][0]);\n  printf(\"%f\", B[0][0]);\n}"
  },
  {
    "path": "autosa_tests/mm_hcl_intel/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel2\": {\n    \"reduction\": [\"y\"]\n  }, \n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel4\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel5\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/mm_int16/Makefile",
    "content": "VPP := $(XILINX_VITIS)/bin/v++\nEMCONFIGUTIL := $(XILINX_VITIS)/bin/emconfigutil\nMODE := hw\n#PLATFORM := xilinx_u200_qdma_201920_1\nPLATFORM := xilinx_u250_xdma_201830_2\n\n# sources\nKERNEL_SRC := src/kernel_kernel.cpp\nHOST_SRC := src/kernel_host.cpp\n\n# targets\nHOST_EXE := host.exe\n\nXOS := kernel0.$(MODE).xo\nXCLBIN := kernel0.$(MODE).xclbin\nEMCONFIG_FILE := emconfig.json\n\n# Linker options to map kernel ports to DDR banks\nVPP_LINK_OPTS := --config connectivity.cfg\n\nVPP_COMMON_OPTS := -s -t $(MODE) --platform $(PLATFORM) -R2 -O3 --kernel_frequency 250 --vivado.prop=run.impl_1.STRATEGY=Performance_EarlyBlockPlacement\nCFLAGS := -g -std=c++11 -I$(XILINX_XRT)/include\nLFLAGS := -L$(XILINX_XRT)/lib -lxilinxopencl -lpthread -lrt\nNUMDEVICES := 1\n\n# run time args\nEXE_OPT := kernel0.$(MODE).xclbin\n\n# primary build targets\n.PHONY: xclbin app all\n\nxclbin:  $(XCLBIN)\napp: $(HOST_EXE)\n\nall: xclbin app\n\nclean:\n\t-$(RM) $(EMCONFIG_FILE) $(HOST_EXE) $(XCLBIN) *.xclbin *.xo $(XOS)\n\n# kernel rules\n$(XOS): $(KERNEL_SRC)\n\t$(RM) $@\n\t$(VPP) $(VPP_COMMON_OPTS) -c -k kernel0 -o $@ $+\n\n\n$(XCLBIN): $(XOS)\n\t$(VPP) $(VPP_COMMON_OPTS) -l -o $@ $+ $(VPP_LINK_OPTS)\n\n# host rules\n$(HOST_EXE): $(HOST_SRC)\n\tg++ $(CFLAGS) -o $@ $+ $(LFLAGS)\n\t@echo 'Compiled Host Executable: $(HOST_EXE)'\n\n$(EMCONFIG_FILE):\n\t$(EMCONFIGUTIL) --nd $(NUMDEVICES) --od . --platform $(PLATFORM)\n\ncheck: $(XCLBIN) $(HOST_EXE) $(EMCONFIG_FILE)\n\tXCL_EMULATION_MODE=${MODE} ./$(HOST_EXE) $(EXE_OPT)\n"
  },
  {
    "path": "autosa_tests/mm_int16/README.md",
    "content": "# Matrix Multiplication in int16 (Small) \n\nBoard        | Software Version\n-------------|-----------------\nXilinx Alveo U250 | Xilinx Vitis 2019.2\n\n__Files__:\n```\nautosa_tests/mm_int16/kernel.c\nautosa_tests/mm_int16/kernel.h\nautosa_tests/mm_int16/simd_info.json\nautosa_tests/mm_int16/Makefile\nautosa_tests/mm_int16/connectivity.cfg\nautosa_tests/mm_int16/hls_script.tcl\n```\n\n__Command__:\nTo run the HLS flow for C/RTL simulation\n```bash\n./autosa ./autosa_tests/mm_int16/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" --simd-info=./autosa_tests/mm_int16/simd_info.json --host-serialize --hls\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `hls_script.tcl` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/mm_int16/hls_script.tcl autosa.tmp/output/\n```\n\nRun the TCL script to build the HLS project.\n\n```\ncd autosa.tmp/output\nvivado_hls -f hls_script.tcl\n```\n\nAlternatively, if you need to generate the bitstream for on-board testing, simply remove the `--hls` flag from the AutoSA command.\n```bash\n./autosa ./autosa_tests/mm_int16/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" --simd-info=./autosa_tests/mm_int16/simd_info.json --host-serialize\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. 
Copy the `Makefile` and `connectivity.cfg` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/mm_int16/Makefile autosa.tmp/output/\ncp autosa_tests/mm_int16/connectivity.cfg autosa.tmp/output/\n```\n\nExecute the makefile to build the design.\n\n```\ncd autosa.tmp/output\nmake all\nmake check\n```"
  },
  {
    "path": "autosa_tests/mm_int16/connectivity.cfg",
    "content": "[connectivity]\nsp=kernel0_1.A:DDR[0]\nsp=kernel0_1.B:DDR[1] \nsp=kernel0_1.C:DDR[2]\n"
  },
  {
    "path": "autosa_tests/mm_int16/hls_script.tcl",
    "content": "############################################################\n## This file is generated automatically by Vivado HLS.\n## Please DO NOT edit it.\n## Copyright (C) 1986-2019 Xilinx, Inc. All Rights Reserved.\n############################################################\nopen_project hls_prj\nset_top kernel0\nadd_files src/kernel_kernel.h\nadd_files src/kernel_kernel.cpp\nadd_files -tb src/kernel_host.cpp\nopen_solution \"solution1\"\nset_part {xcu200-fsgd2104-2-e}\ncreate_clock -period 5 -name default\nconfig_compile -name_max_length 50\n#source \"./prj/solution1/directives.tcl\"\ncsim_design\n#csynth_design\n#cosim_design\n#cosim_design -trace_level all\n#cosim_design -setup -trace_level all\n#export_design -format ip_catalog\nexit\n"
  },
  {
    "path": "autosa_tests/mm_int16/kernel.c",
    "content": "#include \"kernel.h\"\n\nint main(int argc, char **argv) {\n//  data_t A[I][K], B[K][J], C[I][J], C_golden[I][J]; \n  data_t A[I][K], B[J][K], C[I][J], C_golden[I][J]; // gemm0,3\n//  data_t A[K][I], B[K][J], C[I][J], C_golden[I][J]; // gemm4\n\n  for (int i = 0; i < I; i++) \n    for (int k = 0; k < K; k++) {\n      A[i][k] = k;\n//      A[k][i] = k;\n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n      B[j][k] = k;\n//      B[k][j] = k;\n    }\n\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n//        C[i][j] = C[i][j] + A[k][i] * B[k][j];\n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n//        C_golden[i][j] = C_golden[i][j] + A[k][i] * B[k][j];\n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/mm_int16/kernel.h",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\ntypedef unsigned short data_t;\n#define I 64\n#define J 64\n#define K 64\n"
  },
  {
    "path": "autosa_tests/mm_int16/simd_info.json",
    "content": "{\n  \"kernel0\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel1\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel2\": {\n    \"reduction\": [\"y\"]\n  }, \n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  },\n  \"kernel4\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "autosa_tests/mm_intel/Makefile",
    "content": "APP ?= kernel\nAOCL_BOARD ?= s10mx_hbm_es\nSW_EMU_AOCX ?= $(APP)_sw_emu.aocx\nHW_EMU_AOCX ?= $(APP)_hw_emu.aocx\nHW_AOCX ?= $(APP)_hw.aocx\nAOCO ?= $(APP).aoco\nAOCR ?= $(APP).aocr\n\n# Compiler\nAOC ?= aoc\nCXX ?= g++\nAOC_FLAGS ?= -board=$(AOCL_BOARD) -fp-relaxed -report -hyper-optimized-handshaking=off -I $(INTELFPGAOCLSDKROOT)/include/kernel_headers\n\nTARGET ?= host\nSW_EMU_TARGET ?= host_sw_emu\nTARGET_DIR ?= bin\nAOCL_UTILS ?= $(INTELFPGAOCLSDKROOT)/examples_aoc/common\n\n# Directories\nINC_DIRS := src $(AOCL_UTILS)/inc\nLIB_DIRS := \n\n# Files\nINCS := $(wildcard src/*.h)\nHOST_SRCS := $(wildcard src/$(APP)_host.cpp $(AOCL_UTILS)/src/AOCLUtils/*.cpp)\nKERNEL_SRCS := src/$(APP)_kernel.cl\n\nifeq ($(VERBOSE),1)\nECHO := \nelse\nECHO := @\nendif\n\n# Where is the Intel(R) FPGA SDK for OpenCL(TM) software?\nifeq ($(wildcard $(INTELFPGAOCLSDKROOT)),)\n$(error Set INTELFPGAOCLSDKROOT to the root directory of the Intel(R) FPGA SDK for OpenCL(TM) software installation)\nendif\nifeq ($(wildcard $(INTELFPGAOCLSDKROOT)/host/include/CL/opencl.h),)\n$(error Set INTELFPGAOCLSDKROOT to the root directory of the Intel(R) FPGA SDK for OpenCL(TM) software installation.)\nendif\n\n# OpenCL compile and link flags.\nAOCL_COMPILE_CONFIG := $(shell aocl compile-config )\nAOCL_LINK_LIBS := $(shell aocl ldlibs )\nAOCL_LINK_FLAGS := $(shell aocl ldflags )\n# Linking with defences enabled\nAOCL_LINK_FLAGS += -z noexecstack\nAOCL_LINK_FLAGS += -Wl,-z,relro,-z,now\nAOCL_LINK_FLAGS += -Wl,-Bsymbolic\nAOCL_LINK_FLAGS += -pie\nAOCL_LINK_CONFIG := $(AOCL_LINK_FLAGS) $(AOCL_LINK_LIBS)\n\n# Compilation flags\nifeq ($(DEBUG),1)\nCXXFLAGS += -g\nelse\nCXXFLAGS += -O2\nendif\nCXXFLAGS += -std=gnu++0x\n\n# Compiling with defences enabled\nCXXFLAGS += -fstack-protector\nCXXFLAGS += -D_FORTIFY_SOURCE=2\nCXXFLAGS += -Wformat -Wformat-security\nCXXFLAGS += -fPIE\n\n# We must force GCC to never assume that it can shove in its own\n# sse2/sse3 versions of strlen and strcmp because they 
will CRASH.\n# Very hard to debug!\nCXXFLAGS += -fPIC\n\nLIBS := rt pthread\n\n## Make it all!\n#all : $(TARGET_DIR)/$(TARGET)\n\nsw_emu : $(TARGET_DIR)/$(SW_EMU_TARGET) $(TARGET_DIR)/$(SW_EMU_AOCX)\n\nhls: $(TARGET_DIR)/$(AOCR)\n\nhw : $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_AOCX)\n\nhw_emu: $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_EMU_AOCX)\n\nhw_emu_check: $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_EMU_AOCX)\n\tCL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 $(TARGET_DIR)/$(TARGET) $(HW_EMU_AOCX)\n\n# The emulation host is a separate executable (built with -DEMULATE), so it\n# never overwrites the hardware host binary.\nsw_emu_check : $(TARGET_DIR)/$(SW_EMU_TARGET) $(TARGET_DIR)/$(SW_EMU_AOCX)\n\tCL_CONTEXT_EMULATOR_DEVICE_INTELFPGA=1 $(TARGET_DIR)/$(SW_EMU_TARGET) $(SW_EMU_AOCX)\n\nhw_check : $(TARGET_DIR)/$(TARGET) $(TARGET_DIR)/$(HW_AOCX)\n\t$(TARGET_DIR)/$(TARGET) $(HW_AOCX)\n\n# Host executable target.\n$(TARGET_DIR)/$(TARGET) : Makefile $(HOST_SRCS) $(INCS) $(TARGET_DIR)\n\t$(ECHO)$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(EXTRACXXFLAGS) -fPIC $(foreach D,$(INC_DIRS),-I$D) \\\n\t\t\t$(AOCL_COMPILE_CONFIG) $(HOST_SRCS) $(AOCL_LINK_CONFIG) \\\n\t\t\t$(foreach D,$(LIB_DIRS),-L$D) \\\n\t\t\t$(foreach L,$(LIBS),-l$L) \\\n\t\t\t-o $(TARGET_DIR)/$(TARGET)\n\n# Emulation host executable target.\n$(TARGET_DIR)/$(SW_EMU_TARGET) : Makefile $(HOST_SRCS) $(INCS) $(TARGET_DIR)\n\t$(ECHO)$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(EXTRACXXFLAGS) -fPIC $(foreach D,$(INC_DIRS),-I$D) \\\n\t\t\t$(AOCL_COMPILE_CONFIG) $(HOST_SRCS) $(AOCL_LINK_CONFIG) \\\n\t\t\t$(foreach D,$(LIB_DIRS),-L$D) \\\n\t\t\t$(foreach L,$(LIBS),-l$L) \\\n\t\t\t-o $(TARGET_DIR)/$(SW_EMU_TARGET) -DEMULATE\n\n$(TARGET_DIR) :\n\t$(ECHO)mkdir $(TARGET_DIR)\n\n$(TARGET_DIR)/$(SW_EMU_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -march=emulator -legacy-emulator -o $@ $^\n\n$(TARGET_DIR)/$(HW_EMU_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -march=simulator -ghdl -o $@ $^\n\n$(TARGET_DIR)/$(HW_AOCX) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -o $@ $^\n\n$(TARGET_DIR)/$(AOCO) : $(KERNEL_SRCS)\n\t$(AOC) $(AOC_FLAGS) -c -o $@ $^\n\n$(TARGET_DIR)/$(AOCR) : $(TARGET_DIR)/$(AOCO)\n\t$(AOC) $(AOC_FLAGS) -rtl -o $@ $^\n\n# 
Standard make targets\nclean :\n\t$(ECHO)rm -rf $(TARGET_DIR)/*\n\n.PHONY : all clean\n"
  },
  {
    "path": "autosa_tests/mm_intel/README.md",
    "content": "# Matrix Multiplication (Small)\n\nBoard        | Software Version\n-------------|-----------------\nStratix 10 | Intel FPGA SDK for OpenCL 19.4\n\n__Files__:\n```\nautosa_tests/mm_intel/kernel.c\nautosa_tests/mm_intel/kernel.h\nautosa_tests/mm_intel/simd_info.json\nautosa_tests/mm_intel/Makefile\n```\n\n__Command__:\n```bash\n./autosa ./autosa_tests/mm_intel/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_opencl --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->array_part_L2[2,2,2];kernel[]->latency[8,8];kernel[]->simd[2]}\" --simd-info=./autosa_tests/mm_intel/simd_info.json --host-serialize --loop-infinitize --double-buffer-style=0 --mem-port-map=\"{kernel[]->A[0];kernel[]->B[1];kernel[]->C[2]}\"\n```\n\nAfter compilation, you will find all generated files under the directory `autosa.tmp/output/src`. Copy the `Makefile` to the directory `autosa.tmp/output`.\n\n```\ncp autosa_tests/mm_intel/Makefile autosa.tmp/output/\n```\n\nExecute the Makefile to perform software emulation:\n```\ncd autosa.tmp/output\nmake sw_emu_check\n```\nor synthesize the design to RTL:\n```\nmake hls\n```\nor generate the bitstream:\n```\nmake hw\n```\n"
  },
  {
    "path": "autosa_tests/mm_intel/kernel.c",
    "content": "#include \"kernel.h\"\n\nint main(int argc, char **argv) {\n//  data_t A[I][K], B[K][J], C[I][J], C_golden[I][J]; \n  data_t A[I][K], B[J][K], C[I][J], C_golden[I][J];\n\n  for (int i = 0; i < I; i++) \n    for (int k = 0; k < K; k++) {\n      A[i][k] = k;\n    }\n\n  for (int j = 0; j < J; j++)\n    for (int k = 0; k < K; k++) {\n      B[j][k] = k;\n    }\n\n#pragma scop\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C[i][j] = C[i][j] + A[i][k] * B[j][k];\n      }\n    }\n#pragma endscop\n\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      C_golden[i][j] = 0;\n      for (int k = 0; k < K; k++) {\n        C_golden[i][j] = C_golden[i][j] + A[i][k] * B[j][k];\n      }\n    }\n\n  int err = 0;\n  for (int i = 0; i < I; i++)\n    for (int j = 0; j < J; j++) {\n      if (fabs((float)C_golden[i][j] - (float)C[i][j]) > 0.001)\n        err++;\n    }\n\n  if (err)\n    printf(\"Failed with %d errors!\\n\", err);\n  else\n    printf(\"Passed!\\n\");\n\n  return 0;\n}\n"
  },
  {
    "path": "autosa_tests/mm_intel/kernel.h",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\ntypedef float data_t;\n#define I 64\n#define J 64\n#define K 64\n"
  },
  {
    "path": "autosa_tests/mm_intel/simd_info.json",
    "content": "{\n  \"kernel3\": {\n    \"reduction\": [\"y\"]\n  }\n}\n"
  },
  {
    "path": "clean.sh",
    "content": "#!/bin/sh\nrm ./autosa\nrm -rf ./autosa.tmp\ncd src\nmake clean\ncd -\n"
  },
  {
    "path": "docs/Makefile",
    "content": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the environment for the first two.\nSPHINXOPTS    ?=\nSPHINXBUILD   ?= sphinx-build\nSOURCEDIR     = .\nBUILDDIR      = _build\n\n# Put it first so that \"make\" without argument is like \"make help\".\nhelp:\n\t@$(SPHINXBUILD) -M help \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n\n.PHONY: help Makefile\n\n# Catch-all target: route all unknown targets to Sphinx using the new\n# \"make mode\" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).\n%: Makefile\n\t@$(SPHINXBUILD) -M $@ \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n"
  },
  {
    "path": "docs/conf.py",
    "content": "# Configuration file for the Sphinx documentation builder.\n#\n# This file only contains a selection of the most common options. For a full\n# list see the documentation:\n# https://www.sphinx-doc.org/en/master/usage/configuration.html\n\n# -- Path setup --------------------------------------------------------------\n\n# If extensions (or modules to document with autodoc) are in another directory,\n# add these directories to sys.path here. If the directory is relative to the\n# documentation root, use os.path.abspath to make it absolute, like shown here.\n#\n# import os\n# import sys\n# sys.path.insert(0, os.path.abspath('.'))\n\nimport sphinx_rtd_theme\n\n# -- Project information -----------------------------------------------------\n\nproject = 'AutoSA'\ncopyright = '2021, Jie Wang'\nauthor = 'Jie Wang'\n\n# The full version, including alpha/beta/rc tags\nrelease = '0.01'\n\n\n# -- General configuration ---------------------------------------------------\n\n# Add any Sphinx extension module names here, as strings. They can be\n# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom\n# ones.\nextensions = [\n        \"sphinx_rtd_theme\"\n]\n\n# Add any paths that contain templates here, relative to this directory.\ntemplates_path = ['_templates']\n\n# List of patterns, relative to source directory, that match files and\n# directories to ignore when looking for source files.\n# This pattern also affects html_static_path and html_extra_path.\nexclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']\n\n\n# -- Options for HTML output -------------------------------------------------\n\n# The theme to use for HTML and HTML Help pages.  See the documentation for\n# a list of builtin themes.\n#\nhtml_theme = 'sphinx_rtd_theme'\n\n# Add any paths that contain custom static files (such as style sheets) here,\n# relative to this directory. 
They are copied after the builtin static files,\n# so a file named \"default.css\" will overwrite the builtin \"default.css\".\nhtml_static_path = ['_static']\n"
  },
  {
    "path": "docs/docker_image.rst",
    "content": ".. _docker-image-label:\n\nDocker Image\n============\n\nWe provide a docker image to quickly try out the features of AutoSA.\n\nPull the Docker image using the following command.\n\n.. code:: bash\n    \n    docker pull whbldhwj/autosa:latest"
  },
  {
    "path": "docs/examples/cnn.rst",
    "content": "Convolutional Neural Network (Single Layer, Small)\n==================================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis is an example of a small-size CNN.\nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/cnn``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nC Simulation\n------------\n\nRun the following example command to generate one design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[8,8,4,8];kernel[]->latency[4,2,4];kernel[]->simd[1,1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --no-reverse-order \\\n    --hls\n\nAfter compilation, you will find all generated files under the directory \n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/cnn/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed in your terminal, indicating that \nC simulation completed successfully.
\n\nBitstream Generation\n--------------------\n\nIf you need to generate the bitstream for on-board testing, simply remove the ``--hls``\nflag from the previous AutoSA command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[8,8,4,8];kernel[]->latency[4,2,4];kernel[]->simd[1,1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --no-reverse-order\n\nThis time, OpenCL host code is generated instead of HLS host code.\n\nWe have prepared a template Makefile for Xilinx Vitis tools.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/cnn/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n    cp ${AUTOSA_ROOT}/autosa_tests/cnn/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\n\nSet ``PLATFORM`` in the Makefile to match your target platform.\nBy default, it is set to ``xilinx_u250_xdma_201830_2``.\nWe also copy the file ``connectivity.cfg``, which assigns the DDR bank mapping for the design.\nBy default, it maps pointers A, B, and C to DDR banks 0, 1, and 2, respectively.\nLastly, set ``MODE`` in the Makefile to select the task:\n\n* ``sw_emu``: C simulation\n* ``hw_emu``: RTL simulation\n* ``hw``: Bitstream generation\n\n.. note:: \n\n    When using the Vitis flow for RTL simulation, no changes to the source code are needed.\n    You may directly set ``MODE`` to ``hw_emu`` and run RTL simulation.\n    However, by default, the generated host code runs the kernel 10 times to measure the average runtime.\n    This may significantly prolong the simulation time. Consider reducing the number of kernel\n    launches to 1 before running RTL simulation.\n\nTo generate the bitstream, set ``MODE`` to ``hw`` and use the command below.\n\n.. code:: bash\n\n    make all\n\nThis takes a few hours to finish. 
After the bitstream is generated,\nuse the following command to run it on-board.\n\n.. code:: bash\n\n    make check\n\nDataflow Exploration\n--------------------\n\nAs in the GEMM example, we discuss in more detail the different \ndataflows that AutoSA generates for this application.\nThe loop indices used in this program are:\n\n* `o`, `i`: output/input channel\n* `r`, `c`: output image row/column\n* `p`, `q`: kernel height/width\n\nArray 1: [o]\n^^^^^^^^^^^^\n\nThis is an output-stationary array that uses loop o as the space loop.\nThe input feature map cin is reused across PEs, while the weights w are sent directly to each PE.\nOutputs are computed locally and drained out from each PE.\n\n.. image:: images/cnn0_array.png\n    :width: 300\n    :align: center\n\nHere is an example command for this design.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[0];kernel[]->array_part[8,4,4,8];kernel[]->latency[4,2,4];kernel[]->simd[1,1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nArray 2: [r]\n^^^^^^^^^^^^\n\nThis is an output-stationary array that uses loop r as the space loop.\nThe weights w are reused across PEs, while the input feature maps cin are sent directly to each PE.\nOutputs are computed locally and drained out from each PE.\n\n.. image:: images/cnn1_array.png\n    :width: 300\n    :align: center\n\nHere is an example command for this design.\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[1];kernel[]->array_part[4,8,4,8];kernel[]->latency[2,4,2];kernel[]->simd[1,1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nArray 3: [c]\n^^^^^^^^^^^^\n\nThis is an output-stationary array that uses loop c as the space loop.\nThe weights and input feature maps are sent directly to each PE.\nOutputs are computed locally and drained out from each PE.\n\n.. image:: images/cnn2_array.png\n    :width: 300\n    :align: center\n\nHere is an example command for this design.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[2];kernel[]->array_part[4,8,4,8];kernel[]->latency[2,4,2];kernel[]->simd[1,1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nIn this design, weights are sent directly to each PE because, by default, \nAutoSA exploits the reuse of the weight access along the r-axis, which is not the space loop of this array.\nAs shown in the compilation information printed on the screen, there are two reuse \nvector candidates for the weight access ``w[o][i][p][q]``.\n\n.. 
image:: images/cnn_w_reuse.png\n    :width: 800\n    :align: center\n\nBy default, AutoSA chooses the first candidate, which reuses the data along the r-axis.\nYou can override this choice by supplying the argument ``--select-rar-dep=\"{kernel[]->__pet_ref_4[1]}\"``.\nHere, we instruct AutoSA to select candidate 1 for the array reference ``__pet_ref_4``.\n``__pet_ref_4`` is the unique ID that the polyhedral front-end assigned to this reference.\nWith the following command, AutoSA generates a different array that reuses the \nweights across PEs.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[2];kernel[]->array_part[4,8,4,8];kernel[]->latency[2,4,2];kernel[]->simd[1,1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --hls \\\n    --select-rar-dep=\"{kernel[]->__pet_ref_4[1]}\"\n\n.. image:: images/cnn2_2_array.png\n    :width: 300\n    :align: center\n\nArray 4: [i]\n^^^^^^^^^^^^\n\nThis is an input-stationary array that uses loop i as the space loop.\nThe weights and input feature maps are sent directly to each PE.\nPartial sums of the outputs are accumulated across PEs.\n\n.. image:: images/cnn3_array.png\n    :width: 300\n    :align: center\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[4,8,4,4];kernel[]->latency[2,2,2];kernel[]->simd[1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --hls \\\n    --local-reduce \\\n    --reduce-op=\"+\" \\\n    --simd-touch-space\n\nArray 5: [o,r]\n^^^^^^^^^^^^^^\n\nThis is an output-stationary array that uses loops o and r as the space loops.\nThe weights are reused horizontally, and the input feature maps are reused vertically.\n\n.. image:: images/cnn4_array.png\n    :width: 300\n    :align: center\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[8,4,4,8];kernel[]->latency[4,2,2];kernel[]->simd[1,1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nArray 6: [o,c]\n^^^^^^^^^^^^^^\n\nThis array is similar to array 5.\nAs with array 3, we can add the ``--select-rar-dep`` argument \nto choose a better reuse vector for the weights and exploit more data reuse.\n\n.. image:: images/cnn5_array.png\n    :width: 300\n    :align: center\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[5];kernel[]->array_part[8,4,4,8];kernel[]->latency[4,2,2];kernel[]->simd[1,1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --hls \\\n    --select-rar-dep=\"{kernel[]->__pet_ref_4[1]}\"\n\nArray 7: [o,i]\n^^^^^^^^^^^^^^\n\nThis is an input-stationary array.\nThe input feature maps are reused vertically. Weights are sent directly to each PE.\n\n.. image:: images/cnn6_array.png\n    :width: 300\n    :align: center\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[6];kernel[]->array_part[8,4,4,4];kernel[]->latency[2,2,4];kernel[]->simd[1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --hls \\\n    --local-reduce \\\n    --reduce-op=\"+\" \\\n    --simd-touch-space\n\nArray 8: [r,c]\n^^^^^^^^^^^^^^\n\nThis is an output-stationary array. Input feature maps are sent directly to each PE.\nWeights are reused vertically.\n\n.. image:: images/cnn7_array.png\n    :width: 300\n    :align: center\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[7];kernel[]->array_part[4,4,8,8];kernel[]->latency[2,2,2];kernel[]->simd[1,1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nArray 9: [r,i]\n^^^^^^^^^^^^^^\n\nThis is an input-stationary array.\nWeights are reused vertically. Input feature maps are sent directly to each PE.\n\n.. 
image:: images/cnn8_array.png\n    :width: 300\n    :align: center\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[8];kernel[]->array_part[4,8,8,4];kernel[]->latency[2,2,2];kernel[]->simd[1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --hls \\\n    --local-reduce \\\n    --reduce-op=\"+\" \\\n    --simd-touch-space\n\nArray 10: [c,i]\n^^^^^^^^^^^^^^^\n\nThis is an input-stationary array.\nWeights are reused vertically. Input feature maps are sent directly to each PE.\n\n.. image:: images/cnn9_array.png\n    :width: 300\n    :align: center\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[9];kernel[]->array_part[4,8,8,4];kernel[]->latency[2,2,2];kernel[]->simd[1,1,2]}\" \\\n    --simd-info=./autosa_tests/cnn/simd_info.json \\\n    --host-serialize \\\n    --hls \\\n    --local-reduce \\\n    --reduce-op=\"+\" \\\n    --simd-touch-space \\\n    --select-rar-dep=\"{kernel[]->__pet_ref_4[1]}\""
  },
  {
    "path": "docs/examples/cnn_large.rst",
    "content": "Convolutional Neural Network (Single Layer, Large)\n==================================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis is an example of a large-size CNN.\nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/large/cnn``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nC Simulation\n------------\n\nRun the following example command to generate one design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[64,56,14,64];kernel[]->latency[4,4,7];kernel[]->simd[1,1,8]}\" \\\n    --simd-info=./autosa_tests/large/cnn/simd_info.json \\\n    --host-serialize \\\n    --no-reverse-order \\\n    --hls\n\nAfter compilation, you will find all generated files under the directory \n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/cnn/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. 
code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed in your terminal, indicating that \nC simulation completed successfully.\n\nBitstream Generation\n--------------------\n\nIf you need to generate the bitstream for on-board testing, simply remove the ``--hls``\nflag from the previous AutoSA command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/cnn/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[64,56,14,64];kernel[]->latency[4,4,7];kernel[]->simd[1,1,8]}\" \\\n    --simd-info=./autosa_tests/large/cnn/simd_info.json \\\n    --no-reverse-order \\\n    --host-serialize\n\nThis time, OpenCL host code is generated instead of HLS host code.\n\nWe have prepared a template Makefile for Xilinx Vitis tools.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/cnn/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n    cp ${AUTOSA_ROOT}/autosa_tests/large/cnn/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\n\nTo generate the bitstream, use the following command.\n\n.. code:: bash\n\n    make all\n\nAfter the bitstream is generated,\nuse the following command to run it on-board.\n\n.. code:: bash\n\n    make check\n\n.. 
note::\n    \n    Because this design is large, Vitis failed to route it for the board\n    in our experiments.\n    We therefore rely on AutoBridge to route this design.\n\nUsing AutoBridge to Boost Frequency\n-----------------------------------\n\nYou can use `AutoBridge <https://github.com/Licheng-Guo/AutoBridge>`_ \nto boost the design frequency.\nWe cover how to use AutoBridge to improve the frequency in :ref:`use-autobridge-label`.\n\nThe reference AutoBridge scripts used for this example can be found at ``${AUTOSA_ROOT}/autosa_tests/large/cnn``.\n\nThe tables below compare the original design \n(unoptimized) with the design optimized by AutoBridge (optimized).\nThe unoptimized entries are N/A because that design could not be routed.\n\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Designs     | MHz | LUT             | REG              | BRAM         | DSP           |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Unoptimized | N/A | N/A             | N/A              | N/A          | N/A           |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Optimized   | 265 | 884520 (57.93%) | 1445020 (46.05%) | 697 (29.84%) | 8960 (72.99%) |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n\n+-------------+-----------------+---------------+---------+\n| Designs     | Kernel Time (s) | Host Time (s) | GFLOP/s |\n+-------------+-----------------+---------------+---------+\n| Unoptimized | N/A             | N/A           | N/A     |\n+-------------+-----------------+---------------+---------+\n| Optimized   | 0.015865        | 0.188105      | 932.714 |\n+-------------+-----------------+---------------+---------+"
  },
  {
    "path": "docs/examples/dnn_ops.rst",
    "content": "DNN Operators (Small)\n=====================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nWe demonstrate three operators commonly used in DNNs:\npoint-wise convolution, depth-wise convolution, and fully-connected layers.\nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/dnn_ops``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nPoint-wise Convolution\n----------------------\n\nIn ``${AUTOSA_ROOT}/autosa_tests/dnn_ops/kernel.h``, uncomment the macro:\n\n.. code:: c\n\n    #define PC\n\nRun the following command to generate a design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/dnn_ops/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[8,8,4,8];kernel[]->latency[4,4,4];kernel[]->simd[1,1,1,2]}\" \\\n    --simd-info=./autosa_tests/dnn_ops/pc_simd_info.json \\\n    --host-serialize \\\n    --no-reverse-order \\\n    --hls\n\nThis leads to a 2x2 systolic array.\nThe figure below shows the array architecture.\n\n.. image:: images/pconv.png\n    :width: 400\n    :align: center\n\nYou will find all generated files under the directory\n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. 
code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/dnn_ops/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.\n\nDepth-wise Convolution\n----------------------\n\nIn ``${AUTOSA_ROOT}/autosa_tests/dnn_ops/kernel.h``, uncomment the macro:\n\n.. code:: c\n\n    #define DC\n\nRun the following command to generate a design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/dnn_ops/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[4,4,4,3];kernel[]->latency[1,2,1];kernel[]->simd[1,2,1,1]}\" \\\n    --simd-info=./autosa_tests/dnn_ops/dc_simd_info.json \\\n    --host-serialize \\\n    --no-reverse-order \\\n    --simd-touch-space \\\n    --hls\n\nThis leads to a 2x2 systolic array.\nThe figure below shows the array architecture.\n\n.. image:: images/dconv.png\n    :width: 400\n    :align: center\n\nYou will find all generated files under the directory\n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/dnn_ops/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.\n\nFully-Connected Layer\n---------------------\n\nIn ``${AUTOSA_ROOT}/autosa_tests/dnn_ops/kernel.h``, uncomment the macro:\n\n.. code:: c\n\n    #define FC\n\nRun the following command to generate a design with HLS host code.\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/dnn_ops/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[2];kernel[]->array_part[8,4];kernel[]->latency[4];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/dnn_ops/fc_simd_info.json \\\n    --host-serialize \\\n    --no-reverse-order \\\n    --simd-touch-space \\\n    --local-reduce \\\n    --reduce-op=\"+\" \\\n    --hls\n\nThis leads to a 2x2 systolic array.\nThe figure below shows the array architecture.\n\n.. image:: images/fc.png\n    :width: 400\n    :align: center\n\nYou will find all generated files under the directory\n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/dnn_ops/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.\n\nDiscussion\n----------\n\nInstead of generating a separate systolic array for each operator, an ideal solution would \nuse a single systolic array to support all three operators.\nOne approach is to manually fuse the designs generated by AutoSA with proper \ncode optimization.\nAnother approach is to fuse the space loops during polyhedral compilation, which is left \nas future work for AutoSA."
  },
  {
    "path": "docs/examples/index.rst",
    "content": "AutoSA Examples\n===============\n\nThis page lists design examples to familiarize you with the AutoSA \ncompilation process. Examples are divided into two categories:\n\n* Small Designs: These designs have limited problem sizes so that you can \n  verify and synthesize them within hours.\n* Large Designs: These designs demonstrate the performance of AutoSA-generated\n  designs; their verification and synthesis flow may take days to finish.\n\nSmall Designs\n-------------\n\n.. toctree::\n    :maxdepth: 1\n\n    mm\n    cnn\n    lu\n    mm_int16\n    mm_hbm\n    dnn_ops\n\nLarge Designs\n-------------\n\n.. toctree::\n    :maxdepth: 1\n\n    mm_large\n    cnn_large\n    mm_int16_large\n    mm_int8_large\n    mttkrp_large\n    ttmc_large"
  },
  {
    "path": "docs/examples/lu.rst",
    "content": "LU Decomposition (Small)\n========================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis is an example of small-size LU decomposition. \nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/lu``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nC Simulation\n------------\n\nRun the following example command to generate one design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/lu/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[-1,-1,-1];kernel[]->latency[]}\" \\\n    --simd-info=./autosa_tests/lu/simd_info.json \\\n    --use-cplusplus-template \\\n    --no-reschedule \\\n    --hls\n\n.. note:: \n\n    Compared to other examples, LU decomposition requires some additional arguments.\n    ``--use-cplusplus-template``: This argument enables AutoSA to generate C code using \n    C++ templates, as different PEs have different functionalities in this array.\n    ``--no-reschedule``: This argument is needed due to a limitation of the current ISL scheduler, which \n    generates a new schedule without any permutable loops, prohibiting the transformation\n    to systolic arrays. 
Therefore, we disable the ISL auto-scheduling in this application.\n\n    In addition, the input source code has been modified to make sure all dependences are uniform.\n    AutoSA lacks the ability to automatically uniformize the program and requires human\n    modification for such cases.\n\nAfter compilation, you will find all generated files under the directory\n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/lu/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.\n\nBitstream Generation\n--------------------\n\nIf you need to generate the bitstream for on-board testing, simply remove the ``--hls``\nflag from the previous AutoSA command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/lu/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[-1,-1,-1];kernel[]->latency[]}\" \\\n    --simd-info=./autosa_tests/lu/simd_info.json \\\n    --use-cplusplus-template \\\n    --no-reschedule\n\nNow, instead of HLS host code, OpenCL host code is generated.\n\nPlease refer to other examples for instructions on using Xilinx Vitis to generate the bitstream."
  },
  {
    "path": "docs/examples/mm.rst",
    "content": "Matrix Multiplication (Small)\n=============================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis is an example of small-size matrix multiplication. \nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/mm``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nC Simulation\n------------\n\nRun the following example command to generate one design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nAfter compilation, you will find all generated files under the directory \n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/mm/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. 
code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.\n\nRTL Simulation\n--------------\n\nIf you need to verify the design using RTL simulation,\nthere are two more steps to take.\n\nModify the Kernel Code\n^^^^^^^^^^^^^^^^^^^^^^\n\nOpen the kernel code ``${AUTOSA_ROOT}/autosa.tmp/output/src/kernel_kernel.cpp``.\nLocate the top function ``void kernel0(A_t16 *A, B_t16 *B, C_t16 *C)``.\nYou should see the following directives for mapping three global pointers to \ndifferent AXI buses.\n\n.. code:: c\n\n    #pragma HLS INTERFACE m_axi port=A offset=slave bundle=gmem_A\n    #pragma HLS INTERFACE m_axi port=B offset=slave bundle=gmem_B\n    #pragma HLS INTERFACE m_axi port=C offset=slave bundle=gmem_C\n\nTo run RTL simulation, we need to assign the *depth* of each AXI bus explicitly.\nRefer to the host code ``kernel_host.cpp`` for the size of each array.\nAs we have applied host serialization, the array sizes might be slightly larger than \nthose of the original arrays. In this example, arrays A, B, and C are allocated with sizes of \n16384, 16384, and 4096. Since each array is packed by 16 elements,\nthe depths of the arrays are 16384/16=1024, 16384/16=1024, and 4096/16=256, respectively.\nModify the directives above to:\n\n.. code:: c\n\n    #pragma HLS INTERFACE m_axi port=A offset=slave bundle=gmem_A depth=1024\n    #pragma HLS INTERFACE m_axi port=B offset=slave bundle=gmem_B depth=1024\n    #pragma HLS INTERFACE m_axi port=C offset=slave bundle=gmem_C depth=256\n\nModify the TCL Script\n^^^^^^^^^^^^^^^^^^^^^\n\nOpen the TCL script ``hls_script.tcl``.\nUncomment the last few steps:\n\n.. 
code:: tcl\n\n    csim_design\n    csynth_design\n    cosim_design\n\n* ``csim_design`` is for C simulation.\n* ``csynth_design`` is for C synthesis that synthesizes C code to RTL.\n* ``cosim_design`` is for RTL simulation.\n\nWe have also provided two more options in the TCL script.\n\n* ``cosim_design -trace_level all`` is for RTL simulation while dumping out all waveforms.\n* ``cosim_design -setup -trace_level all`` is for RTL simulation that only prepares the \n  simulation scripts without actually launching the simulation.\n\nNow run the TCL script again.\n\n.. code:: bash\n\n    vivado_hls -f hls_script.tcl\n\nThis performs C simulation, C synthesis, and RTL simulation in order.\nThe entire flow takes a few minutes to finish.\nYou should see the following message printed in your terminal, indicating \nthat RTL simulation finished successfully.\n\n.. code:: bash\n\n    INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***\n\nBitstream Generation\n--------------------\n\nIf you need to generate the bitstream for on-board testing, simply remove the ``--hls``\nflag from the previous AutoSA command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize\n\nNow, instead of HLS host code, OpenCL host code is generated.\n\nWe have prepared a template Makefile for Xilinx Vitis tools.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/mm/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n    cp ${AUTOSA_ROOT}/autosa_tests/mm/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\n\nSet the proper ``PLATFORM`` in the Makefile. 
\nBy default, we set it to ``xilinx_u250_xdma_201830_2``.\nYou may notice that we also copy a file ``connectivity.cfg`` here.\nThis file assigns the DDR bank mapping for the design. \nBy default, we map pointers A, B, and C to DDR banks 0, 1, and 2.\nLastly, set the ``MODE`` in the Makefile to perform different tasks.\n\n* ``sw_emu``: C simulation\n* ``hw_emu``: RTL simulation\n* ``hw``: Bitstream generation\n\n.. note:: \n\n    When using the Vitis flow to perform RTL simulation, no changes to the source code are needed.\n    You may directly set the ``MODE`` to ``hw_emu`` and perform RTL simulation.\n    However, by default, the host runs the kernel 10 times to collect the average runtime.\n    This may significantly prolong the simulation time. Consider reducing the number of\n    kernel launches to 1 before running RTL simulation.\n\nTo generate the bitstream, set the ``MODE`` to ``hw`` and use the command below.\n\n.. code:: bash\n\n    make all\n\nThis takes a few hours to finish. After the bitstream is generated,\nuse the following command to run it on-board.\n\n.. code:: bash\n\n    make check\n\nAuto-Tuning\n-----------\n\nWe provide an auto-tuner, currently in alpha. \nThe auto-tuner builds analytical models for resource usage and latency. \nBased on these models, the auto-tuner looks for designs with the least latency \nunder the resource constraints.\n\nTraining Resource Models\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo use the auto-tuner, the first step is to train the resource models.\nRun the command below to train the resource models.\n\n.. code:: bash\n\n    export AUTOSA_ROOT=$(pwd)\n    python3 ./autosa_scripts/optimizer.py \\\n    -c './autosa ./autosa_tests/mm/kernel.c --target=autosa_hls_c --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --sa-sizes=\"{kernel[]->space_time[3]}\"' \\\n    --info autosa_config/hw_info.json \\\n    -s autosa_config/optimizer_settings.json \\\n    --train \\\n    -p xilinx\n\n.. 
note:: \n\n    Please don't forget to set the environment variable ``AUTOSA_ROOT`` to your \n    AutoSA root directory before running the auto-tuner.\n\nThe auto-tuner requires a minimal AutoSA compilation command to start.\nWe use the command below.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c --target=autosa_hls_c --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --sa-sizes=\"{kernel[]->space_time[3]}\"\n\nAs you may notice, we need to specify ``space_time`` to select the exact \ndataflow for auto-tuning. This is because compiling different dataflows \nrequires some additional flags, as discussed in the Dataflow Exploration section.\nFor now, we use the output-stationary 2D array with the argument ``--sa-sizes=\"{kernel[]->space_time[3]}\"``.\n\n``hw_info.json`` specifies the hardware resource constraints of the target FPGA board.\n``optimizer_settings.json`` is the auto-tuner configuration file. \nMore details about these options are covered in :ref:`auto-tuning-label`.\n\nAs the training phase generates many temporary files, you may consider \nadding the flag ``--tmp-dir`` to store the intermediate files in another directory.\n\nOnce you launch the auto-tuner in the training phase, it randomly\nsamples the design space and collects a few training samples. These training samples \nare synthesized using HLS. We then build resource models from these training samples\nusing linear regression.\n\nThis script launches multiple processes to synthesize HLS designs. \nBy default, we use 16 processes.\nThe training process takes around 10 minutes to finish on our workstation.\n\nWe also evaluate the resource models on the test set. \nOnce this step finishes, the resource model accuracy results are printed in your terminal, as shown below.\n\n.. 
image:: images/resource_model.png\n    :align: center\n\nDesign Space Exploration\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn the next step, we perform an exhaustive search with pruning to find the design \nwith the least latency under the resource constraints. \nWe plan to improve the DSE with more efficient methods in the future.\n\nThe pruning strategies are set in ``optimizer_settings.json``. \nDetails about this file are covered in :ref:`auto-tuning-label`.\nDepending on the hardware and application, the pruning strategies may need to be adjusted.\nWe provide an example file for this application in ``${AUTOSA_ROOT}/autosa_config/optimizer_settings_libs/mm_small.json``.\n\nNow use the following command to perform DSE.\n\n.. code:: bash\n\n    python3 ./autosa_scripts/optimizer.py \\\n    -c './autosa ./autosa_tests/mm/kernel.c --target=autosa_hls_c --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --sa-sizes=\"{kernel[]->space_time[3]}\"' \\\n    --info autosa_config/hw_info.json \\\n    -s autosa_config/optimizer_settings_libs/mm_small.json \\\n    --search \\\n    -p xilinx\n\nThis script launches multiple processes to search the design space.\nBy default, we use 32 processes.\nThe search takes around 3 minutes on our workstation.\n\nDetailed information about the best design is printed in your terminal, as shown below.\n\n.. image:: images/mm_dse.png\n    :align: center\n\nThe auto-tuner writes the best designs found during the DSE to the file \n``DSE.log``. 
By default, the top-10 designs found by the DSE are recorded.\n\nDataflow Exploration\n--------------------\n\nAutoSA can help you explore different dataflow choices.\nFor matrix multiplication, AutoSA finds six different systolic arrays in total.\nThey use the loops [i], [j], [k], [i,j], [i,k], and [j,k] as space loops, respectively.\nWe show each of them in detail below.\n\nArray 1: [i]\n^^^^^^^^^^^^\n\nThis is a 1D systolic array using the loop i as the space loop.\nThe figure below shows the architecture of this array.\n\n.. image:: images/gemm0_array.png\n    :width: 400\n    :align: center\n\nThis is an output-stationary array. Elements of matrix C are computed locally inside \neach PE. Data of matrix B are reused across PEs. Data of matrix A are sent \ndirectly into each PE.\n\nHere is an example command to compile such a design.\nNote that we use ``kernel[]->space_time[0]`` to select the first design.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[0];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nThis command leads to a 1x4 1D systolic array.\n\nArray 2: [j]\n^^^^^^^^^^^^\n\nAs you may expect, this is also an output-stationary array with loop j as the space loop.\nThis array is symmetric to the first array.\nThe figure below shows the detailed architecture.\n\n.. image:: images/gemm1_array.png\n    :width: 400\n    :align: center\n\nElements of matrix C are computed locally inside each PE. Data of matrix A are reused \nacross PEs. Data of matrix B are sent directly to each PE.\n\nHere is an example command to compile such a design.\nNote that we use ``kernel[]->space_time[1]`` to select the second design.\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[1];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nThis command leads to a 1x4 1D systolic array.\n\nArray 3: [k]\n^^^^^^^^^^^^\n\nThis array uses loop k as the space loop.\nThe figure below depicts the array architecture.\n\n.. image:: images/gemm2_array.png\n    :width: 400\n    :align: center\n\nThis is an input-stationary array. Elements of matrix C are accumulated along \nthe PEs. Data of matrices A and B are sent to the PEs directly.\n\nUse the command below to generate such a design.\nWe use ``kernel[]->space_time[2]`` to select the third design.\nIn addition, as AutoSA cannot analyze reduction loops, we \nalso need to provide additional information about the reduction property. \nWe add the arguments ``--local-reduce --reduce-op=\"+\"`` to let AutoSA know that \nthis design performs the reduction along the PEs and that the reduction operator is ``+``.\n\nBy default, when searching for SIMD loops, AutoSA only considers the time loops.\nAs loop k is used as the space loop, we add the flag ``--simd-touch-space`` to \ninclude space loops in the SIMD loop search.\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[2];kernel[]->array_part[4,32,32];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize \\\n    --hls \\\n    --local-reduce \\\n    --reduce-op=\"+\" \\\n    --simd-touch-space \\\n    --array-contraction\n\nThis leads to a 1x2 1D array.\n\nOne more thing to note is that inside each PE, AutoSA allocates only a single register \n``local_C[1][1]`` to store the local elements of array C. \nThis is possible because all time loops are parallel loops, which means that \nthe PE never works on the same element again. \nSince we add the flag ``--array-contraction``, AutoSA applies array \ncontraction to reduce the local buffer size.\nYou may turn off this optimization by removing the argument ``--array-contraction``.\nWhen array contraction is turned off, a local buffer ``local_C[32][32]``\nis allocated inside each PE.\n\nArray 4: [i,j]\n^^^^^^^^^^^^^^\n\nThis is the 2D output-stationary array used previously.\nThe figure below shows the detailed architecture.\n\n.. image:: images/gemm3_array.png\n    :width: 400\n    :align: center\n\nIn this array, data of matrix C are computed locally inside PEs.\nData of matrix A are reused horizontally.\nData of matrix B are reused vertically.\n\nBelow is an example command to compile such a design.\nNote that we use ``kernel[]->space_time[3]`` to select the fourth design.\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nThis command leads to a 2x2 2D systolic array.\n\nArray 5: [i,k]\n^^^^^^^^^^^^^^\n\nThis array uses loops i and k as the space loops.\nThe figure below depicts the array architecture.\n\n.. image:: images/gemm4_array.png\n    :width: 400\n    :align: center\n\nIn this array, data of matrix C are reduced horizontally. \nData of matrix B are reused vertically. Data of matrix A are sent directly into \neach PE.\n\nUse the command below to generate one example array.\nNote that we use ``kernel[]->space_time[4]`` to select the fifth design.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[32,4,32];kernel[]->latency[16,16];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize \\\n    --hls \\\n    --local-reduce \\\n    --reduce-op=\"+\" \\\n    --simd-touch-space \\\n    --array-contraction\n\nThis command leads to a 2x2 2D array.\nSimilar to array 3, we provide the compiler with additional information about the reduction\nproperties of the application. To let AutoSA explore the space loop as a SIMD loop, we also add the flag \n``--simd-touch-space``. Finally, we add ``--array-contraction`` to reduce the local buffer size.\n\nArray 6: [j,k]\n^^^^^^^^^^^^^^\n\nThis array uses loops j and k as the space loops.\nThe figure below depicts the array architecture.\nThis architecture is symmetric to array 5.\n\n.. 
image:: images/gemm5_array.png\n    :width: 400\n    :align: center\n\nIn this array, data of matrix C are reduced horizontally.\nData of matrix A are reused vertically. Data of matrix B are sent directly into \neach PE.\n\nUse the command below to generate one example array.\nNote that we use ``kernel[]->space_time[5]`` to select the sixth design.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[5];kernel[]->array_part[32,4,32];kernel[]->latency[16,16];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize \\\n    --hls \\\n    --local-reduce \\\n    --reduce-op=\"+\" \\\n    --simd-touch-space \\\n    --array-contraction\n\nThis command leads to a 2x2 2D array."
  },
  {
    "path": "docs/examples/mm_block_sparse.rst",
    "content": ""
  },
  {
    "path": "docs/examples/mm_hbm.rst",
    "content": "Matrix Multiplication with HBM (Small)\n======================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis is an example of a small-size matrix multiplication using high-bandwidth memory (HBM).\nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/mm_hbm``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U280                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nC Simulation\n------------\n\nRun the following example command to generate one design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm_hbm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2];kernel[]->hbm_A[2];kernel[]->hbm_B[2];kernel[]->hbm_C_drain[2]}\" \\\n    --simd-info=./autosa_tests/mm_hbm/simd_info.json \\\n    --hbm \\\n    --hls\n\n.. note::\n\n    Host serialization is not supported for HBM designs.\n\nAfter compilation, you will find all generated files under the directory \n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. 
code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/mm_hbm/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.\n\nTo utilize the HBM, we currently partition the I/O modules of each I/O group \nand assign them to multiple HBM ports. As you may notice, we add the arguments\n``--sa-sizes=\"{kernel[]->hbm_A[2];kernel[]->hbm_B[2];kernel[]->hbm_C_drain[2]}\"``\nto assign the I/O groups ``A``, ``B``, and ``C_drain`` to 2 HBM ports each.\n\nThe figure below shows the array architectures using one or two HBM/DDR banks.\n\n.. image:: images/array_hbm.png\n    :align: center\n\nNote that you should use the I/O group names when assigning HBM ports.\nDuring compilation, AutoSA prints all the I/O groups in the array.\nFor more information about I/O groups, please refer to :ref:`construct-and-optimize-array-label`.\n\nBitstream Generation\n--------------------\n\nIf you need to generate the bitstream for on-board testing, simply remove the ``--hls``\nflag from the previous AutoSA command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm_hbm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[32,32,32];kernel[]->latency[8,8];kernel[]->simd[2];kernel[]->hbm_A[2];kernel[]->hbm_B[2];kernel[]->hbm_C_drain[2]}\" \\\n    --simd-info=./autosa_tests/mm_hbm/simd_info.json \\\n    --hbm\n\nWe have prepared a template Makefile for Xilinx Vitis tools.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/mm_hbm/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n    cp ${AUTOSA_ROOT}/autosa_tests/mm_hbm/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\n\nSet the proper ``PLATFORM`` in the Makefile. 
\nBy default, we set it to ``xilinx_u280_xdma_201920_3``.\nYou may notice that we also copy a file ``connectivity.cfg`` here.\nThis file assigns the HBM bank mapping for the design. \nAs we partition arrays A, B, and C across 2 HBM banks each,\nwe assign the newly generated pointers A_0, A_1, B_0, B_1, C_0, and C_1 to \nHBM banks 0, 1, 2, 3, 4, and 5, respectively.\nLastly, set the ``MODE`` in the Makefile to perform different tasks.\n\n* ``sw_emu``: C simulation\n* ``hw_emu``: RTL simulation\n* ``hw``: Bitstream generation\n\nTo generate the bitstream, set the ``MODE`` to ``hw`` and use the command below.\n\n.. code:: bash\n\n    make all\n\nThis takes a few hours to finish. After the bitstream is generated,\nuse the following command to run it on-board.\n\n.. code:: bash\n\n    make check"
  },
  {
    "path": "docs/examples/mm_int16.rst",
    "content": "Matrix Multiplication in int16 (Small) \n======================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis is an example of a small-size matrix multiplication in int16.\nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/mm_int16``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nC Simulation\n------------\n\nRun the following example command to generate one design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm_int16/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm_int16/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nAfter compilation, you will find all generated files under the directory \n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/mm_int16/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.  
\n\nBitstream Generation\n--------------------\n\nIf you need to generate the bitstream for on-board testing, simply remove the ``--hls``\nflag from the previous AutoSA command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm_int16/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm_int16/simd_info.json \\\n    --host-serialize\n\nNow instead of HLS host code, an OpenCL host code is generated.  \n\nPlease refer to the other examples for instructions on using Xilinx Vitis to generate the bitstream."
  },
  {
    "path": "docs/examples/mm_int16_large.rst",
    "content": "Matrix Multiplication in int16 (Large)\n======================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis is an example of large-size matrix multiplication in int16.\nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/large/mm_int16``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nC Simulation\n------------\n\nRun the following example command to generate one design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/mm_int16/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[256,256,32];kernel[]->latency[16,16];kernel[]->simd[32]}\" \\\n    --simd-info=./autosa_tests/large/mm_int16/simd_info.json \\\n    --host-serialize \\\n    --data-pack-sizes=\"{kernel[]->A[32,32,64];kernel[]->B[32,32,64];kernel[]->C[32,32,64]}\" \\\n    --hls\n\nAfter compilation, you will find all generated files under the directory \n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm_int16/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. 
code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.   \n\nBitstream Generation\n--------------------\n\nIf you need to generate the bitstream for on-board testing, simply remove the ``--hls``\nflag from the previous AutoSA command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/mm_int16/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[256,256,32];kernel[]->latency[16,16];kernel[]->simd[32]}\" \\\n    --simd-info=./autosa_tests/large/mm_int16/simd_info.json \\\n    --host-serialize \\\n    --data-pack-sizes=\"{kernel[]->A[32,32,64];kernel[]->B[32,32,64];kernel[]->C[32,32,64]}\"\n\nNow instead of HLS host code, an OpenCL host code is generated.   \n\nWe have prepared a template Makefile for Xilinx Vitis tools.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm_int16/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm_int16/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\n\nSet the proper ``PLATFORM`` in the Makefile. \nBy default, we set it to ``xilinx_u250_xdma_201830_2``.\nYou may notice that we also copy a file ``connectivity.cfg`` here.\nThis file assigns the DDR bank mapping for the design. \nBy default, pointers A, B, and C are mapped to DDR banks 0, 1, and 3, respectively.\nLastly, modify the ``MODE`` in the Makefile to perform different tasks.\n\n* ``sw_emu``: C simulation\n* ``hw_emu``: RTL simulation\n* ``hw``: Bitstream generation\n\n.. 
note:: \n\n    When using the Vitis flow to perform RTL simulation, nothing needs to change in the source code.\n    You may directly set the ``MODE`` to ``hw_emu`` and perform RTL simulation.\n    However, by default, we run the kernel 10 times to collect the average runtime.\n    This may significantly prolong the simulation time. Consider reducing the number of kernel\n    launches to 1 before running RTL simulation.\n\nTo generate the bitstream, set the ``MODE`` to ``hw`` and use the command below.\n\n.. code:: bash\n\n    make all\n\nAfter the bitstream is generated,\nuse the following command to run it on-board.    \n\n.. code:: bash\n\n    make check\n\n.. note::\n    \n    As this design is rather large, Vitis failed to route it for on-board testing\n    in our experiments.\n    We therefore rely on AutoBridge to route this design. \n\nUsing AutoBridge to Boost Frequency\n-----------------------------------\n\nYou may also try to use `AutoBridge <https://github.com/Licheng-Guo/AutoBridge>`_ \nto boost the design frequency.\nWe cover how to use AutoBridge to improve the frequency in :ref:`use-autobridge-label`.\n\nThe tables below show the detailed comparison results between the original design \n(unoptimized) and the design optimized with AutoBridge (optimized).\n\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Designs     | MHz | LUT             | REG              | BRAM         | DSP           |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Unoptimized | N/A | N/A             | N/A              | N/A          | N/A           |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Optimized   | 261 | 607442 (39.78%) | 836031 (26.53%)  | 1655 (70.85%)| 8192 (66.75%) |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n\n+-------------+-----------------+---------------+---------+\n| 
Designs     | Kernel Time (s) | Host Time (s) | TOPs    |\n+-------------+-----------------+---------------+---------+\n| Unoptimized | N/A             | N/A           | N/A     |\n+-------------+-----------------+---------------+---------+\n| Optimized   | 0.000625233     | 0.0095829     | 3.435   |\n+-------------+-----------------+---------------+---------+"
  },
  {
    "path": "docs/examples/mm_int8_large.rst",
    "content": "Matrix Multiplication in int8 (Large)\n=====================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis is an example of large-size matrix multiplication in int8.\nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/large/mm_int8``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nC Simulation\n------------\n\nRun the following example command to generate one design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/mm_int8/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[264,256,64];kernel[]->latency[11,32];kernel[]->simd[64]}\" \\\n    --simd-info=./autosa_tests/large/mm_int8/simd_info.json \\\n    --host-serialize \\\n    --data-pack-sizes=\"{kernel[]->A[32,32,64];kernel[]->B[32,32,64];kernel[]->C[32,32,64]}\" \\\n    --no-isl-sink \\\n    --hls\n\nAfter compilation, you will find all generated files under the directory \n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm_int8/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. 
code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.   \n\nBitstream Generation\n--------------------\n\nIf you need to generate the bitstream for on-board testing, simply remove the ``--hls``\nflag from the previous AutoSA command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/mm_int8/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[264,256,64];kernel[]->latency[11,32];kernel[]->simd[64]}\" \\\n    --simd-info=./autosa_tests/large/mm_int8/simd_info.json \\\n    --host-serialize \\\n    --data-pack-sizes=\"{kernel[]->A[32,32,64];kernel[]->B[32,32,64];kernel[]->C[32,32,64]}\" \\\n    --no-isl-sink\n\nNow instead of HLS host code, an OpenCL host code is generated.   \n\nFor int8, we notice that the default coding style for reduction trees in Xilinx HLS C \nleads to inferior performance.\nThe default coding style is shown below:\n\n.. code:: c\n\n    for (ap_uint<7> c8 = 0; c8 <= 63; c8 += 1) {\n    #pragma HLS UNROLL\n      local_C[c7][c6] = (local_C[c7][c6] + (local_A[0][c8] * local_B[0][c8]));\n    }\n\nIf we synthesize the default PE using Vitis, each MAC is mapped to one DSP, resulting in 64 DSPs for this \nreduction tree. \n\nAlternatively, if we manually unroll the reduction tree using the following coding style,\nonly 32 DSPs are generated.\n\n.. 
code:: c\n\n    data_t mul_5_0_0 = local_A[0][0] * local_B[0][0];\n    data_t add_5_0 = mul_5_0_0 + local_A[0][1] * local_B[0][1];\n    data_t mul_5_1_0 = local_A[0][2] * local_B[0][2];\n    data_t add_5_1 = mul_5_1_0 + local_A[0][3] * local_B[0][3];\n    ...\n    #pragma HLS RESOURCE variable=mul_5_0_0 core=Mul_LUT\n    #pragma HLS RESOURCE variable=mul_5_1_0 core=Mul_LUT\n    ...\n    local_C[c7][c6] += add_0_0;\n\nAs you may notice, we map half of the multipliers to LUTs instead. \nThis helps to balance the resource usage of this design and enables us to place more \nPEs on-chip.\n\nThis step cannot be done automatically at present. Instead, we provide a simple Python script \nto generate this code, which the user then has to insert into the design code manually.\n\nAn example script can be found at ``${AUTOSA_ROOT}/autosa_tests/large/mm_int8/unroll.py``.\nModify the parameters ``UNROLL_FACTOR`` and ``DATA_T`` according to your design.\nThen, run:\n\n.. code:: bash\n\n    python3 unroll.py | tee code.c\n\nThen copy the code in ``code.c`` to replace the original reduction loop in ``kernel_kernel.c``.\nWe have also provided an example file at ``${AUTOSA_ROOT}/autosa_tests/large/mm_int8/kernel_kernel_opt.cpp``.\n\nNow you may follow the normal flow to compile the design.\nWe have prepared a template Makefile for Xilinx Vitis tools.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm_int8/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm_int8/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\n\nSet the proper ``PLATFORM`` in the Makefile. \nBy default, we set it to ``xilinx_u250_xdma_201830_2``.\nYou may notice that we also copy a file ``connectivity.cfg`` here.\nThis file assigns the DDR bank mapping for the design. 
\nBy default, pointers A, B, and C are mapped to DDR banks 0, 1, and 3, respectively.\nLastly, modify the ``MODE`` in the Makefile to perform different tasks.\n\n* ``sw_emu``: C simulation\n* ``hw_emu``: RTL simulation\n* ``hw``: Bitstream generation\n\n.. note:: \n\n    When using the Vitis flow to perform RTL simulation, nothing needs to change in the source code.\n    You may directly set the ``MODE`` to ``hw_emu`` and perform RTL simulation.\n    However, by default, we run the kernel 10 times to collect the average runtime.\n    This may significantly prolong the simulation time. Consider reducing the number of kernel\n    launches to 1 before running RTL simulation.\n\nTo generate the bitstream, set the ``MODE`` to ``hw`` and use the command below.\n\n.. code:: bash\n\n    make all\n\nAfter the bitstream is generated,\nuse the following command to run it on-board.    \n\n.. code:: bash\n\n    make check\n\nBelow is the resource and frequency information we collected for this design.\n\n+-----+-----------------+------------------+--------------+---------------+\n| MHz | LUT             | REG              | BRAM         | DSP           |\n+-----+-----------------+------------------+--------------+---------------+\n| 136 | 653369 (42.80%) | 704056 (22.34%)  | 1364 (58.39%)| 6144 (50.05%) |\n+-----+-----------------+------------------+--------------+---------------+\n\nYou could also test the generated design on board. 
We have listed the performance of the design \nin the table below.\n\n+-----------------+---------------+---------+\n| Kernel Time (s) | Host Time (s) | TOPs    |\n+-----------------+---------------+---------+\n| 0.000759123     | 0.0103696     | 2.917   |\n+-----------------+---------------+---------+   \n\nUsing AutoBridge to Boost Frequency\n-----------------------------------\n\nYou may also try to use `AutoBridge <https://github.com/Licheng-Guo/AutoBridge>`_ \nto boost the design frequency.\nWe cover how to use AutoBridge to improve the frequency in :ref:`use-autobridge-label`.\n\nThe tables below show the detailed comparison results between the original design \n(unoptimized) and the design optimized with AutoBridge (optimized).\n\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Designs     | MHz | LUT             | REG              | BRAM         | DSP           |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Unoptimized | 136 | 653369 (42.80%) | 704056 (22.34%)  | 1364 (58.39%)| 6144 (50.05%) |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Optimized   | 300 | 730647 (47.87%) | 786680 (24.96%)  | 1364 (58.39%)| 6144 (50.05%) |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n\n+-------------+-----------------+---------------+---------+\n| Designs     | Kernel Time (s) | Host Time (s) | TOPs    |\n+-------------+-----------------+---------------+---------+\n| Unoptimized | 0.000759123     | 0.0103696     | 2.917   |\n+-------------+-----------------+---------------+---------+\n| Optimized   | 0.000302619     | 0.00532768    | 7.318   |\n+-------------+-----------------+---------------+---------+"
  },
  {
    "path": "docs/examples/mm_large.rst",
    "content": "Matrix Multiplication (Large)\n=============================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis is an example of large-size matrix multiplication.\nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/large/mm``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nC Simulation\n------------\n\nRun the following example command to generate one design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[260,256,512];kernel[]->latency[20,16];kernel[]->simd[8]}\" \\\n    --simd-info=./autosa_tests/large/mm/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nAfter compilation, you will find all generated files under the directory \n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.   
\n\nBitstream Generation\n--------------------\n\nIf you need to generate the bitstream for on-board testing, simply remove the ``--hls``\nflag from the previous AutoSA command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[260,256,512];kernel[]->latency[20,16];kernel[]->simd[8]}\" \\\n    --simd-info=./autosa_tests/large/mm/simd_info.json \\\n    --host-serialize\n\nNow instead of HLS host code, an OpenCL host code is generated.   \n\nWe have prepared a template Makefile for Xilinx Vitis tools.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\n\nSet the proper ``PLATFORM`` in the Makefile. \nBy default, we set it to ``xilinx_u250_xdma_201830_2``.\nYou may notice that we also copy a file ``connectivity.cfg`` here.\nThis file assigns the DDR bank mapping for the design. \nBy default, pointers A, B, and C are mapped to DDR banks 0, 1, and 3, respectively.\nLastly, modify the ``MODE`` in the Makefile to perform different tasks.\n\n* ``sw_emu``: C simulation\n* ``hw_emu``: RTL simulation\n* ``hw``: Bitstream generation\n\n.. note:: \n\n    When using the Vitis flow to perform RTL simulation, nothing needs to change in the source code.\n    You may directly set the ``MODE`` to ``hw_emu`` and perform RTL simulation.\n    However, by default, we run the kernel 10 times to collect the average runtime.\n    This may significantly prolong the simulation time. Consider reducing the number of kernel\n    launches to 1 before running RTL simulation.\n\nTo generate the bitstream, set the ``MODE`` to ``hw`` and use the command below.\n\n.. 
code:: bash\n\n    make all\n\nAfter the bitstream is generated,\nuse the following command to run it on-board.    \n\n.. code:: bash\n\n    make check\n\n.. note:: \n\n    As the example design is rather large, it takes approximately 40 hours to finish the synthesis on our workstation.\n\nBelow is the resource and frequency information we collected for this design.\n\n+-----+-----------------+------------------+--------------+---------------+\n| MHz | LUT             | REG              | BRAM         | DSP           |\n+-----+-----------------+------------------+--------------+---------------+\n| 146 | 804517 (52.69%) | 1360681 (43.17%) | 953 (40.80%) | 8320 (67.78%) |\n+-----+-----------------+------------------+--------------+---------------+\n\nYou could also test the generated design on board. We have listed the performance of the design \nin the table below.\n\n+-----------------+---------------+---------+\n| Kernel Time (s) | Host Time (s) | GFLOPs  |\n+-----------------+---------------+---------+\n| 0.00548694      | 0.0113009     | 397.496 |\n+-----------------+---------------+---------+   \n\nUsing AutoBridge to Boost Frequency\n-----------------------------------\n\nYou may also try to use `AutoBridge <https://github.com/Licheng-Guo/AutoBridge>`_ \nto boost the design frequency.\nWe cover how to use AutoBridge to improve the frequency in :ref:`use-autobridge-label`.\n\nThe tables below show the detailed comparison results between the original design \n(unoptimized) and the design optimized with AutoBridge (optimized).\n\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Designs     | MHz | LUT             | REG              | BRAM         | DSP           |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Unoptimized | 146 | 804517 (52.69%) | 1360681 (43.17%) | 953 (40.80%) | 8320 (67.78%) 
|\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Optimized   | 300 | 803752 (52.64%) | 1325480 (42.05%) | 952 (40.75%) | 8320 (67.78%) |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n\n+-------------+-----------------+---------------+---------+\n| Designs     | Kernel Time (s) | Host Time (s) | GFLOPs  |\n+-------------+-----------------+---------------+---------+\n| Unoptimized | 0.00548694      | 0.0113009     | 397.496 |\n+-------------+-----------------+---------------+---------+\n| Optimized   | 0.00232357      | 0.0371066     | 938.658 |\n+-------------+-----------------+---------------+---------+"
  },
  {
    "path": "docs/examples/mttkrp_large.rst",
    "content": "Matricized Tensor Times Khatri-Rao Product (MTTKRP) (Large)\n===========================================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis is an example of large-size Matricized Tensor Times Khatri-Rao Product.\nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/large/mttkrp``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nC Simulation\n------------\n\nRun the following example command to generate one design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/mttkrp/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[128,128,2];kernel[]->latency[16,8];kernel[]->simd[8,1]}\" \\\n    --simd-info=./autosa_tests/large/mttkrp/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nAfter compilation, you will find all generated files under the directory \n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mttkrp/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. 
code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.   \n\nBitstream Generation\n--------------------\n\nIf you need to generate the bitstream for on-board testing, simply remove the ``--hls``\nflag from the previous AutoSA command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/mttkrp/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[128,128,2];kernel[]->latency[16,8];kernel[]->simd[8,1]}\" \\\n    --simd-info=./autosa_tests/large/mttkrp/simd_info.json \\\n    --host-serialize    \n\nNow instead of HLS host code, an OpenCL host code is generated.   \n\nWe have prepared a template Makefile for Xilinx Vitis tools.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mttkrp/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mttkrp/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\n\nTo generate the bitstream, use the command below.\n\n.. code:: bash\n\n    make all\n\nAfter the bitstream is generated,\nuse the following command to run it on-board.    \n\n.. code:: bash\n\n    make check\n\nBelow is the resource and frequency information we collected for this design.\n\n+-----+-----------------+------------------+--------------+---------------+\n| MHz | LUT             | REG              | BRAM         | DSP           |\n+-----+-----------------+------------------+--------------+---------------+\n| 184 | 623061 (41.53%) | 1016803 (32.58%) | 599 (26.26%) | 8192 (66.75%) |\n+-----+-----------------+------------------+--------------+---------------+\n\nYou could also test the generated design on board. 
We have listed the performance of the design \nin the table below.\n\n+-----------------+---------------+---------+\n| Kernel Time (s) | Host Time (s) | GFLOPs  |\n+-----------------+---------------+---------+\n| 0.0237726       | 0.288613      | 542.006 |\n+-----------------+---------------+---------+   \n\nUsing AutoBridge to Boost Frequency\n-----------------------------------\n\nYou may also try to use `AutoBridge <https://github.com/Licheng-Guo/AutoBridge>`_ \nto boost the design frequency.\nWe cover how to use AutoBridge to improve the frequency in :ref:`use-autobridge-label`.\n\nThe tables below show the detailed comparison results between the original design \n(unoptimized) and the design optimized with AutoBridge (optimized).\n\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Designs     | MHz | LUT             | REG              | BRAM         | DSP           |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Unoptimized | 184 | 623061 (41.53%) | 1016803 (32.58%) | 599 (26.26%) | 8192 (66.75%) |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Optimized   | 300 | 625001 (41.67%) | 1000623 (32.08%) | 599 (26.26%) | 8192 (66.75%) |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n\n+-------------+-----------------+---------------+---------+\n| Designs     | Kernel Time (s) | Host Time (s) | GFLOPs  |\n+-------------+-----------------+---------------+---------+\n| Unoptimized | 0.0237726       | 0.288613      | 542.006 |\n+-------------+-----------------+---------------+---------+\n| Optimized   | 0.0141298       | 0.174689      | 911.895 |\n+-------------+-----------------+---------------+---------+"
  },
  {
    "path": "docs/examples/ttmc_large.rst",
    "content": "Chain of Tensor-matrix multiplications (TTMc) (Large)\n=====================================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis is an example of large-size Chain of Tensor-matrix multiplications.\nThe design files can be found at ``${AUTOSA_ROOT}/autosa_tests/large/ttmc``.\nThe testing environment is summarized in the table below.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nC Simulation\n------------\n\nRun the following example command to generate one design with HLS host code.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/ttmc/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[16,64,16,32];kernel[]->latency[1,8,8];kernel[]->simd[8,1]}\" \\\n    --simd-info=./autosa_tests/large/ttmc/simd_info.json \\\n    --host-serialize \\\n    --hls\n\nAfter compilation, you will find all generated files under the directory \n``${AUTOSA_ROOT}/autosa.tmp/output/src``. \nCopy the ``hls_script.tcl`` to the directory ``autosa.tmp/output``.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/ttmc/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to perform C simulation.\n\n.. 
code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output/\n    vivado_hls -f hls_script.tcl\n\nYou should see ``Passed`` printed out in your terminal showing that \nC simulation is performed successfully.   \n\nBitstream Generation\n--------------------\n\nIf you need to generate the bitstream for on-board testing, simply remove the ``--hls``\nflag from the previous AutoSA command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/ttmc/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[4];kernel[]->array_part[16,64,16,32];kernel[]->latency[1,8,8];kernel[]->simd[8,1]}\" \\\n    --simd-info=./autosa_tests/large/ttmc/simd_info.json \\\n    --host-serialize\n\nNow instead of HLS host code, an OpenCL host code is generated.   \n\nWe have prepared a template Makefile for Xilinx Vitis tools.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/ttmc/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n    cp ${AUTOSA_ROOT}/autosa_tests/large/ttmc/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\n\nTo generate the bitstream, use the command below.\n\n.. code:: bash\n\n    make all\n\nAfter the bitstream is generated, use the following command to run it on-board.    \n\n.. code:: bash\n\n    make check\n\nBelow is the resource and frequency information we collected for this design.\n\n+-----+-----------------+------------------+--------------+---------------+\n| MHz | LUT             | REG              | BRAM         | DSP           |\n+-----+-----------------+------------------+--------------+---------------+\n| 201 | 621584 (41.43%) | 1016231 (32.57%) | 479 (21.01%) | 8192 (66.75%) |\n+-----+-----------------+------------------+--------------+---------------+\n\nYou could also test the generated design on board. 
We have listed the performance of the design \nin the table below.\n\n+-----------------+---------------+---------+\n| Kernel Time (s) | Host Time (s) | GFLOPs  |\n+-----------------+---------------+---------+\n| 0.168946        | 1.8771        | 610.131 |\n+-----------------+---------------+---------+   \n\nUsing AutoBridge to Boost Frequency\n-----------------------------------\n\nYou may also try to use `AutoBridge <https://github.com/Licheng-Guo/AutoBridge>`_ \nto boost the design frequency.\nWe cover how to use AutoBridge to improve the frequency in :ref:`use-autobridge-label`.\n\nThe tables below show the detailed comparison results between the original design \n(unoptimized) and the design optimized with AutoBridge (optimized).\n\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Designs     | MHz | LUT             | REG              | BRAM         | DSP           |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Unoptimized | 201 | 621584 (41.43%) | 1016231 (32.57%) | 479 (21.01%) | 8192 (66.75%) |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Optimized   | 300 | 622878 (41.53%) | 1010672 (32.40%) | 479 (21.01%) | 8192 (66.75%) |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n\n+-------------+-----------------+---------------+---------+\n| Designs     | Kernel Time (s) | Host Time (s) | GFLOPs  |\n+-------------+-----------------+---------------+---------+\n| Unoptimized | 0.168946        | 1.8771        | 610.131 |\n+-------------+-----------------+---------------+---------+\n| Optimized   | 0.112436        | 1.25489       | 916.781 |\n+-------------+-----------------+---------------+---------+"
  },
  {
    "path": "docs/index.rst",
    "content": ".. AutoSA documentation master file, created by\n   sphinx-quickstart on Sun Jan 17 15:06:11 2021.\n   You can adapt this file completely to your liking, but it should at least\n   contain the root `toctree` directive.\n\nWelcome to AutoSA's documentation!\n==================================\n\nAutoSA is an end-to-end systolic array compiler for FPGAs based on the polyhedral model. \nIt takes algorithms in high-level programming languages (C) as inputs, \nperforms polyhedral transformation and other architecture optimizations to map algorithms \nto systolic array architecture. \n\n\nGetting Started\n---------------\n\n.. toctree::\n   :maxdepth: 1\n   \n   installation\n   tutorials/index\n   examples/index\n\nResources\n---------\n* `AutoSA Paper <https://vast.cs.ucla.edu/sites/default/files/publications/FPGA2021_AutoSA_camera.pdf>`_\n* `Github Project <https://github.com/UCLA-VAST/AutoSA>`_\n* `Docker Image <https://hub.docker.com/repository/docker/whbldhwj/autosa>`_\n* `FCCM 2021 Tutorial Slides <https://www.dropbox.com/s/pusu5htagdvvuch/autosa_fccm21_final.pdf?dl=0>`_\n\nIndices and tables\n==================\n\n* :ref:`genindex`\n* :ref:`modindex`\n* :ref:`search`\n"
  },
  {
    "path": "docs/install_from_source.rst",
    "content": ".. _install-from-source-label:\n\nInstall from Source\n===================\n\nThis page gives instructions on how to build and install AutoSA from scratch.\nIt consists of two steps.\n\n* `Step 1: Install the Prerequisites`_\n* `Step 2: Compile AutoSA`_\n\nStep 1: Install the Prerequisites\n---------------------------------\nBelow are detailed instructions for installing the prerequisites of AutoSA.\n\nAdditionally, you can refer to the `Dockerfile <https://github.com/UCLA-VAST/AutoSA/blob/master/Dockerfile>`_ we use to build the AutoSA Docker image \nfor reference instructions on building all the prerequisites on Ubuntu.\n\nPPCG\n^^^^\n\nAutoSA is built on PPCG (`link <https://repo.or.cz/ppcg.git>`_).\nBelow are the requirements of PPCG.\n\n* automake, autoconf, libtool (not needed when compiling a release)\n* pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config) (not needed when compiling a release using the included isl and pet)\n* gmp (http://gmplib.org/)\n* libyaml (http://pyyaml.org/wiki/LibYAML) (only needed if you want to compile the pet executable)\n* LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html). Unless you have some other reason for wanting to use the svn version, it is best to install the latest supported release. For more details, including the latest supported release, see pet/README.\n\nIf you are installing on Ubuntu, you can install the following packages:\n\n.. code:: bash\n\n    automake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm\n\nNote that you need at least version 3.2 of libclang-dev (Ubuntu Raring).\nOlder versions of this package did not include the required libraries.\nIf you are using an older version of Ubuntu, you need to compile and\ninstall LLVM/clang from source.\n\n\nBarvinok\n^^^^^^^^\n\nAutoSA also uses the Barvinok library (`link <http://barvinok.gforge.inria.fr/>`_). 
\nBelow are the requirements of Barvinok.\n\n* NTL (https://libntl.org/)\n\nDetailed instructions for installing NTL can be found at `link <https://libntl.org/doc/tour-unix.html>`_.\nNote that NTL needs to be compiled with GMP support; that is, you have to specify\n\n.. code:: bash\n\n    NTL_GMP_LIP=on\n\nNTL also needs to be compiled in ISO mode.\nFor versions older than 5.4, this means you additionally need to specify\n\n.. code:: bash\n\n    NTL_STD_CXX=on\n\nOthers\n^^^^^^\n\n* Python 3.6+ and the corresponding pip.\n\nStep 2: Compile AutoSA\n----------------------\n\nAfter installing the prerequisites, this step builds AutoSA from source.\n\nGet Source from GitHub\n^^^^^^^^^^^^^^^^^^^^^^\n\nClone the source repo from GitHub.\n\n.. code:: bash\n\n    git clone https://github.com/UCLA-VAST/AutoSA.git\n\nRun the Installation Script\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFrom the root directory of the cloned repository, run the installation script to build and install AutoSA.\n\n.. code:: bash\n\n    ./install.sh\n\nAfter the installation has finished, you can verify that AutoSA is installed correctly\nby running the following command, which prints the help information of AutoSA.\n\n.. code:: bash\n\n    ./autosa --help\n\nIf the help information is printed on the screen, you are all set and can start to explore \nthe magic of AutoSA!"
  },
  {
    "path": "docs/installation.rst",
    "content": "Installation\n============\n\nTo install AutoSA, please read :ref:`install-from-source-label`. Alternatively, \nif you would like to quickly try out AutoSA, please check the \n:ref:`docker-image-label`.\n\n.. toctree::\n   :maxdepth: 1\n\n   install_from_source\n   docker_image"
  },
  {
    "path": "docs/make.bat",
    "content": "@ECHO OFF\r\n\r\npushd %~dp0\r\n\r\nREM Command file for Sphinx documentation\r\n\r\nif \"%SPHINXBUILD%\" == \"\" (\r\n\tset SPHINXBUILD=sphinx-build\r\n)\r\nset SOURCEDIR=.\r\nset BUILDDIR=_build\r\n\r\nif \"%1\" == \"\" goto help\r\n\r\n%SPHINXBUILD% >NUL 2>NUL\r\nif errorlevel 9009 (\r\n\techo.\r\n\techo.The 'sphinx-build' command was not found. Make sure you have Sphinx\r\n\techo.installed, then set the SPHINXBUILD environment variable to point\r\n\techo.to the full path of the 'sphinx-build' executable. Alternatively you\r\n\techo.may add the Sphinx directory to PATH.\r\n\techo.\r\n\techo.If you don't have Sphinx installed, grab it from\r\n\techo.http://sphinx-doc.org/\r\n\texit /b 1\r\n)\r\n\r\n%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%\r\ngoto end\r\n\r\n:help\r\n%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%\r\n\r\n:end\r\npopd\r\n"
  },
  {
    "path": "docs/tutorials/auto_bridge.rst",
    "content": ".. _use-autobridge-label:\n\nLeveraging AutoBridge to Boost the Design Frequency\n===================================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nAutoBridge is an automation framework for boosting FPGA design frequency. \nThis page explains how to leverage AutoBridge to further boost the systolic array \nfrequency on Xilinx FPGAs.\n\nThe table below describes the testing environment for all the designs presented in this tutorial.\n\n+--------------------------+-----------------------------------------------+\n| **Target FPGA**          | Xilinx Alveo U250                             |\n+--------------------------+-----------------------------------------------+\n| **FPGA Synthesis Tools** | Xilinx Vivado HLS 2019.2, Xilinx Vitis 2019.2 |\n+--------------------------+-----------------------------------------------+\n| **CPU**                  | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz     |\n+--------------------------+-----------------------------------------------+\n\nIntroduction to AutoBridge\n--------------------------\n\nAutoBridge is a floorplanning tool based on the Vivado HLS design flow. 
It parses \nXilinx HLS designs and generates floorplanning constraints to help boost the design frequency.\nMore details about this tool can be found at:\n\n* `GitHub repo <https://github.com/Licheng-Guo/AutoBridge>`_\n* `Paper <https://vast.cs.ucla.edu/sites/default/files/publications/AutoBridge_FPGA2021.pdf>`_\n\nUsing AutoBridge to Boost the Frequency\n---------------------------------------\n\nPlease follow the instructions in AutoBridge's GitHub repo to install the tool.\n\nThe design example used in this tutorial can be found in the directory ``${AUTOSA_ROOT}/autosa_tests/large/mm``.\n\nStep 0: Generating the Reference Design\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFirst, let's generate a reference design without using AutoBridge.\nUse the following command to generate the systolic array design.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/large/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[260,256,512];kernel[]->latency[20,16];kernel[]->simd[8]}\" \\\n    --simd-info=./autosa_tests/large/mm/simd_info.json \\\n    --host-serialize\n\nThe generated design can be found in ``${AUTOSA_ROOT}/autosa.tmp/output/src``.\n\nCopy the Makefile and the DRAM connectivity configuration file to the project directory.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\n\nSet up your local Xilinx Vitis environment. Note that the Makefile targets the Xilinx Alveo U250.\nChange the Makefile and the connectivity file accordingly if you target a different FPGA board. 
\nYou may also need to change the design parameters specified by ``--sa-sizes`` if your target FPGA board has \nfewer resources than the Xilinx Alveo U250.\n\nRun the following command to synthesize the design into a bitstream.\n\n.. code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output\n    make all\n\n.. note::\n\n    As the example design is rather large, it takes approximately 40 hours to finish the synthesis on our workstation.\n    \nAfter the synthesis is completed, you can check the resource usage and frequency of the design.\nBelow is the resource and frequency information we collected for this design.\n\n+-----+-----------------+------------------+--------------+---------------+\n| MHz | LUT             | REG              | BRAM         | DSP           |\n+-----+-----------------+------------------+--------------+---------------+\n| 146 | 804517 (52.69%) | 1360681 (43.17%) | 953 (40.80%) | 8320 (67.78%) |\n+-----+-----------------+------------------+--------------+---------------+\n\nYou could also test the generated design on board. We have listed the performance of the design \nin the table below.\n\n+-----------------+---------------+---------+\n| Kernel Time (s) | Host Time (s) | GFLOPs  |\n+-----------------+---------------+---------+\n| 0.00548694      | 0.0113009     | 397.496 |\n+-----------------+---------------+---------+\n\nStep 1: Compiling the Design Using Vivado HLS\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNow let's use AutoBridge to generate a design with a higher frequency.\n\nBefore synthesizing the HLS design, add the pragma ``#pragma HLS dataflow disable_start_propagation`` to the top function.\nIn our example, open the file ``${AUTOSA_ROOT}/autosa.tmp/output/src/kernel_kernel.cpp``.\nYou will find the definition of the top function ``kernel0`` starting at line 1204.\n\n.. 
code:: c\n\n    extern \"C\" {\n    void kernel0(A_t16 *A, B_t16 *B, C_t16 *C)\n    {\n    #pragma HLS INTERFACE m_axi port=A offset=slave bundle=gmem_A\n    #pragma HLS INTERFACE m_axi port=B offset=slave bundle=gmem_B\n    #pragma HLS INTERFACE m_axi port=C offset=slave bundle=gmem_C\n    #pragma HLS INTERFACE s_axilite port=A bundle=control\n    #pragma HLS INTERFACE s_axilite port=B bundle=control\n    #pragma HLS INTERFACE s_axilite port=C bundle=control\n    #pragma HLS INTERFACE s_axilite port=return bundle=control\n\n    #pragma HLS DATAFLOW\n    ...\n\nAdd the pragma ``#pragma HLS dataflow disable_start_propagation`` into the top function.\nThe modified code looks like below.\n\n.. code:: c\n\n    extern \"C\" {\n    void kernel0(A_t16 *A, B_t16 *B, C_t16 *C)\n    {\n    #pragma HLS INTERFACE m_axi port=A offset=slave bundle=gmem_A\n    #pragma HLS INTERFACE m_axi port=B offset=slave bundle=gmem_B\n    #pragma HLS INTERFACE m_axi port=C offset=slave bundle=gmem_C\n    #pragma HLS INTERFACE s_axilite port=A bundle=control\n    #pragma HLS INTERFACE s_axilite port=B bundle=control\n    #pragma HLS INTERFACE s_axilite port=C bundle=control\n    #pragma HLS INTERFACE s_axilite port=return bundle=control\n\n    #pragma HLS DATAFLOW\n    #pragma HLS dataflow disable_start_propagation\n    ...\n\nNext, copy the Xilinx HLS TCL file from the AutoBridge repo to the project directory to synthesize the C code \nto RTL using Xilinx HLS.\n\n.. code:: bash\n\n    cp ${AUTOBRIDGE_ROOT}/reference-scripts/step1-run-hls.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nModify the TCL file to add the information for our project. \nSpecifically, modify the first four lines of ``step1-run-hls.tcl`` from\n\n.. code:: tcl\n\n    open_project PROJECT_NAME\n    set_top TOP_FUNCTION_NAME\n    add_files PATH_TO_SRC_FILE\n    add_files -tb PATH_TO_TESTBENCH_FILE\n\nto\n\n.. 
code:: tcl\n\n    open_project kernel0\n    set_top kernel0\n    add_files \"src/kernel_kernel.cpp\"\n    #add_files -tb PATH_TO_TESTBENCH_FILE\n\nModify lines 25-26 of ``step1-run-hls.tcl`` from\n\n.. code:: tcl\n\n    csim_design\n    csynth_design    \n\nto \n\n.. code:: tcl\n\n    #csim_design\n    csynth_design    \n\nNote that line 9 sets the target FPGA board to the Xilinx Alveo U250.\nModify it accordingly for your project.\n\nNow call Xilinx Vivado HLS to synthesize the design.\n\n.. code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output\n    vivado_hls -f step1-run-hls.tcl\n\nStep 2: Invoking AutoBridge to Generate Floorplanning Configuration for the Target Design\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAfter the design is synthesized by HLS, we invoke AutoBridge to analyze the project and generate \nthe floorplanning constraints for it.\n\nAutoBridge provides a Python script for processing the HLS project automatically, which \ncan be found at ``${AUTOBRIDGE_ROOT}/reference-scripts/step2-autobridge.py``.\n\nPlease refer to AutoBridge's `repo <https://github.com/Licheng-Guo/AutoBridge>`_ for more details about this script.\n\nBefore running this script, we need to modify the following fields in it.\n\n``project_path``: Set it to the directory of the HLS project. For our example, we set it as shown below.\nNote that Python does not expand shell variables such as ``${AUTOSA_ROOT}`` inside string literals,\nso replace it with the absolute path on your machine.\n\n.. code:: Python\n\n    project_path = '${AUTOSA_ROOT}/autosa.tmp/output/kernel0'\n\n``top_name``: Set it to the name of the top function of the HLS project.\n\n.. code:: Python\n\n    top_name = 'kernel0'\n\n``board_name``: Set it to the target FPGA board. AutoBridge currently supports Xilinx Alveo U250 and U280.\nWe use the U250 by default.\n\n.. 
code:: Python\n\n    board_name = 'u250'\n\n``DDR_loc_2d_y``, ``DDR_loc_2d_x``: Modify them to assign the locations of the AXI modules.\n\nBy default, the generated HLS code assigns different global pointers to different AXI buses.\nIn lines 1204-1212, we have the following code:\n\n.. code:: c\n\n    void kernel0(A_t16 *A, B_t16 *B, C_t16 *C)\n    {\n    #pragma HLS INTERFACE m_axi port=A offset=slave bundle=gmem_A\n    #pragma HLS INTERFACE m_axi port=B offset=slave bundle=gmem_B\n    #pragma HLS INTERFACE m_axi port=C offset=slave bundle=gmem_C\n    #pragma HLS INTERFACE s_axilite port=A bundle=control\n    #pragma HLS INTERFACE s_axilite port=B bundle=control\n    #pragma HLS INTERFACE s_axilite port=C bundle=control\n    #pragma HLS INTERFACE s_axilite port=return bundle=control\n\nThe three global pointers ``A``, ``B``, and ``C`` are assigned to three different AXI buses: \n``gmem_A``, ``gmem_B``, and ``gmem_C``.\n\nThere are four DDR controllers available on the U250. In this design, we assign \n``gmem_A`` to ``DDR0``, ``gmem_B`` to ``DDR1``, and ``gmem_C`` to ``DDR3``.\nThis DDR mapping is already specified in the connectivity file ``connectivity.cfg`` mentioned previously.\n\nWe need to modify the AutoBridge script to reflect this mapping as well.\n\nModify lines 84-111 of ``step2-autobridge.py`` as follows:\n\n.. 
code:: Python\n\n    DDR_loc_2d_y['A_IO_L3_in_serialize_U0'] = 0\n    DDR_loc_2d_x['A_IO_L3_in_serialize_U0'] = 0\n    DDR_loc_2d_y['kernel0_gmem_A_m_axi_U'] = 0\n    DDR_loc_2d_x['kernel0_gmem_A_m_axi_U'] = 0\n\n    DDR_loc_2d_y['B_IO_L3_in_serialize_U0'] = 1\n    DDR_loc_2d_x['B_IO_L3_in_serialize_U0'] = 0\n    DDR_loc_2d_y['kernel0_gmem_B_m_axi_U'] = 1\n    DDR_loc_2d_x['kernel0_gmem_B_m_axi_U'] = 0\n\n    DDR_loc_2d_y['C_drain_IO_L3_out_serialize_U0'] = 3\n    DDR_loc_2d_x['C_drain_IO_L3_out_serialize_U0'] = 0\n    DDR_loc_2d_y['kernel0_gmem_C_m_axi_U'] = 3\n    DDR_loc_2d_x['kernel0_gmem_C_m_axi_U'] = 0\n\n    DDR_loc_2d_y['kernel0_control_s_axi_U'] = 0\n\n    DDR_enable = [1, 1, 0, 1]\n\nFor each AXI bus, HLS generates two modules that are associated with it.\nThe first is the hardware module in the user code that accesses data via this bus.\nIn our example, the global pointer ``A`` in ``kernel_kernel.cpp`` is used by the function\n``A_IO_L3_in_serialize``, which Xilinx HLS renames to ``A_IO_L3_in_serialize_U0`` after \nsynthesis. AutoBridge requires the RTL module name in the script. \nYou may refer to the HLS report or the generated RTL to find the exact RTL module names for your design.\nThe second is the AXI bus module that connects the user logic to the DDR controller. \nIn our design, it is named ``kernel0_gmem_A_m_axi_U``.\n\nAutoBridge divides the FPGA on-chip area into multiple regions. The figure below shows the \npartitioned regions for both the Xilinx Alveo U250 and U280 boards.\n\n.. image:: images/ab_map.png\n    :align: center\n\nAs the figure shows, the on-chip logic is physically divided by die boundaries, DDR/HBM controllers,\nnon-programmable logic, and other peripheral IPs. AutoBridge partitions the on-chip logic based on \nthese components. \nThe partitioned regions and their indices are shown in the figure on the right.\n\nAs ``gmem_A`` is connected to ``DDR0``, we assign the locations of its two modules as:\n\n.. 
code:: Python\n\n    DDR_loc_2d_y['A_IO_L3_in_serialize_U0'] = 0\n    DDR_loc_2d_x['A_IO_L3_in_serialize_U0'] = 0\n    DDR_loc_2d_y['kernel0_gmem_A_m_axi_U'] = 0\n    DDR_loc_2d_x['kernel0_gmem_A_m_axi_U'] = 0\n\nSimilarly, we add the locations for the other AXI buses, as shown in the full listing of lines 84-111.\n\nEach kernel has a controller with an S_AXI interface.\nFollowing AutoBridge's recommendation, we assign it to the bottom SLR as it \ncommunicates with the PCIe IP.\n\n.. code:: Python\n    \n    DDR_loc_2d_y['kernel0_control_s_axi_U'] = 0\n\nLastly, we also need to update the variable ``DDR_enable`` to reflect the DDR controllers in use.\nIn our example, since we only use the first, second, and fourth DDR channels, we set it as:\n\n.. code:: Python\n\n    DDR_enable = [1, 1, 0, 1]\n\nWe are almost done. The final step is to specify the maximum resource utilization ratio of each region.\nAs an example, we set the variable ``max_usage_ratio_2d`` as:\n\n.. code:: Python\n\n    max_usage_ratio_2d = [ [0.8, 0.7], [0.85, 0.75], [0.85, 0.85], [0.85, 0.7] ]\n\nPlease feel free to adjust these ratios according to the resource usage of your design.\nSetting an upper bound on the resource usage of each region guides AutoBridge to spread \nthe logic across the chip, which helps improve timing. AutoBridge may fail if \nthe upper bounds are set lower than the resources the design requires. In that case, increase the \nratios until AutoBridge can successfully place the design.\nMoreover, AutoBridge uses the resource estimates from the HLS reports, which may \nbe inconsistent with the synthesized resource usage, so you may need to re-adjust these values \nif the design fails routing in later stages.\n\nAt this point, you have a modified AutoBridge script customized for our design.\nWe also provide an example script at ``${AUTOSA_ROOT}/autosa_tests/large/mm/step2-autobridge.py``.\n\nNow, execute the Python script to run AutoBridge.\n\n.. 
code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/large/mm/step2-autobridge.py ${AUTOBRIDGE_ROOT}/reference-scripts/\n    cd ${AUTOBRIDGE_ROOT}/reference-scripts\n    ./step2-autobridge.py | tee autobridge.log\n\nAfter it finishes, you should see a folder named ``autobridge`` in the same directory.\nIt contains the modified RTL code and the floorplanning constraint file ``constraint.tcl``.\nThe output of AutoBridge is saved to ``autobridge.log``.\n\n.. note:: \n\n    If AutoBridge fails, increase the ratios in ``max_usage_ratio_2d`` to make sure \n    enough area is allocated for the design.\n\nStep 3: Packing the Design\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAutoBridge modifies the HLS-generated RTL. \nIn this step, we pack the modified design into an ``xo`` file that can be synthesized by Xilinx Vitis.\nAutoBridge provides a TCL file for packing the design. Run the following command to copy it to the working directory.\n\n.. code:: bash\n\n    cp ${AUTOBRIDGE_ROOT}/reference-scripts/step3-pack-xo.tcl ${AUTOBRIDGE_ROOT}/reference-scripts/autobridge/\n    \nNow modify this TCL file according to your project.\n\nModify line 1 from\n\n.. code:: tcl\n\n    open_project PROJECT_NAME\n\nto \n\n.. code:: tcl\n\n    open_project kernel0\n\nModify line 3 from \n\n.. code:: tcl\n\n    export_design -rtl verilog -format ip_catalog -xo XO_NAME.xo\n\nto \n\n.. code:: tcl\n\n    export_design -rtl verilog -format ip_catalog -xo kernel0.xo\n\n.. note::\n\n    An example of this TCL file is also provided in the design example directory at ``${AUTOSA_ROOT}/autosa_tests/large/mm/pack_xo.tcl``.\n\nBefore running the TCL script, we need to copy the original HLS source files to the working directory.\n\n.. code:: bash\n\n    cp -r ${AUTOSA_ROOT}/autosa.tmp/output/src ${AUTOBRIDGE_ROOT}/reference-scripts/autobridge/\n\nNow, run the TCL script.\n\n.. 
code:: bash\n\n    cd ${AUTOBRIDGE_ROOT}/reference-scripts/autobridge\n    vivado_hls -f step3-pack-xo.tcl\n\nAfter Vivado HLS finishes the packing process, you will find a file named ``kernel0.xo`` in the working directory.\n\nStep 4: Synthesizing the Design\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe last step is to synthesize the design into a bitstream using Xilinx Vitis.\nCopy the script for synthesizing the design to the working directory.\n\n.. code:: bash\n\n    cp ${AUTOBRIDGE_ROOT}/reference-scripts/step4-run-vitis.sh ${AUTOBRIDGE_ROOT}/reference-scripts/autobridge/\n\nModify the file ``step4-run-vitis.sh`` according to the design configuration.\nFor this example, modify line 4 from \n\n.. code:: bash\n    \n    TOP=\"YOUR_TOP_NAME\"\n\nto \n\n.. code:: bash\n    \n    TOP=kernel0\n\nModify line 10 from \n\n.. code:: bash\n    \n    XO=\"$(pwd)/YOUR_XO_NAME\"\n\nto \n\n.. code:: bash\n    \n    XO=\"$(pwd)/kernel0.xo\"\n\nModify lines 32-35 from\n\n.. code:: bash\n\n    ARG_FOR_DDR_1=\"YOUR_HLS_ARGUMENT_NAME_FOR_DDR_1\"\n    ARG_FOR_DDR_2=\"YOUR_HLS_ARGUMENT_NAME_FOR_DDR_2\"\n    ARG_FOR_DDR_3=\"YOUR_HLS_ARGUMENT_NAME_FOR_DDR_3\"\n    ARG_FOR_DDR_4=\"YOUR_HLS_ARGUMENT_NAME_FOR_DDR_4\"\n\nto \n\n.. code:: bash\n\n    ARG_FOR_DDR_1=A\n    ARG_FOR_DDR_2=B\n    #ARG_FOR_DDR_3=\"YOUR_HLS_ARGUMENT_NAME_FOR_DDR_3\"\n    ARG_FOR_DDR_4=C\n\nModify lines 58-61 from \n\n.. code:: bash\n\n    --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_1}:DDR[0] \\\n    --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_2}:DDR[1] \\\n    --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_3}:DDR[2] \\\n    --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_4}:DDR[3] \\\n\nto \n\n.. 
code:: bash\n\n    --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_1}:DDR[0] \\\n    --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_2}:DDR[1] \\\n    --connectivity.sp ${TOP}_1.${ARG_FOR_DDR_4}:DDR[3] \\\n\nAn example script for this project can also be found at ``${AUTOSA_ROOT}/autosa_tests/large/mm/step4-run-vitis.tcl``.\n\nNow set up the Xilinx Vitis environment and run the script.\n\n.. code:: bash\n\n    chmod u+x ./step4-run-vitis.sh\n    ./step4-run-vitis.sh\n\nPlease wait until the synthesis process is finished.\n\nResults Comparison\n^^^^^^^^^^^^^^^^^^\n\nWe can now compare the unoptimized design with the design optimized by AutoBridge.\nThe tables below show the detailed comparison results.\n\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Designs     | MHz | LUT             | REG              | BRAM         | DSP           |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Unoptimized | 146 | 804517 (52.69%) | 1360681 (43.17%) | 953 (40.80%) | 8320 (67.78%) |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n| Optimized   | 300 | 803752 (52.64%) | 1325480 (42.05%) | 952 (40.75%) | 8320 (67.78%) |\n+-------------+-----+-----------------+------------------+--------------+---------------+\n\n+-------------+-----------------+---------------+---------+\n| Designs     | Kernel Time (s) | Host Time (s) | GFLOPs  |\n+-------------+-----------------+---------------+---------+\n| Unoptimized | 0.00548694      | 0.0113009     | 397.496 |\n+-------------+-----------------+---------------+---------+\n| Optimized   | 0.00232357      | 0.0371066     | 938.658 |\n+-------------+-----------------+---------------+---------+\n\n.. image:: images/autobridge.jpg\n    :align: center\n    \nCredit: Young-kyu Choi (ykchoi@cs.ucla.edu)"
  },
  {
    "path": "docs/tutorials/auto_tuning_exhaustive.rst",
    "content": ".. _auto-tuning-label:\n\nAuto-Tuning (Exhaustive Search)\n===============================================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nAutoSA introduces many tuning knobs during the compilation process, which form a large \ndesign space. To search for designs with good performance, we introduce a simple \nauto-tuner. This page introduces the basics of the auto-tuner and shows how to use\nit for tuning arbitrary programs.\n\nHow Auto-Tuning Works\n---------------------\n\nFirst, let's take a look at the AutoSA compilation flow again, as shown in the figure below.\n\n.. image:: images/flow.png\n    :align: center\n\nThere are multiple optimization passes in the computation and communication management stages. \nEach pass can be run in either the manual or the auto mode.\nIn the manual mode, users need to supply AutoSA with specific optimization strategies to apply to the \nprogram. In the auto mode, AutoSA proceeds based on preset policies.\n\nThe AutoSA configuration file ``${AUTOSA_ROOT}/autosa_config/autosa_config.json`` lists the steps \nthat can be manually tuned.\n\n* **space_time**: \n  This step applies the space-time transformation to transform algorithms into systolic arrays. \n  By default, multiple systolic arrays are generated for each algorithm. In the auto mode,\n  AutoSA selects one array based on heuristics. In the manual mode, users select the \n  array to be processed in the following steps.\n* **array_part**: \n  This step partitions the array into smaller sub-arrays. In the auto mode, all tilable loops \n  that can be used as array partitioning loops are tiled with a fixed factor. In the manual mode,\n  users can select the loops to be tiled and provide the compiler with specific tiling factors.\n* **array_part_L2**:\n  AutoSA can generate up to two levels of array partitioning loops. 
This is helpful for architectures\n  with multiple levels of memory hierarchy. Similarly, in the auto mode, AutoSA decides which loops to tile further and \n  selects a fixed tiling factor. Users can make such choices in the manual mode.\n* **latency**:\n  This step performs latency hiding in case the innermost loop in the program carries\n  a dependence that prevents the design from being fully pipelined. Parallel loops in the program can be \n  used as latency hiding candidate loops. In the auto mode, all parallel loops are tiled and \n  the point loops are permuted innermost. In the manual mode, users need to specify which loops \n  to choose and the corresponding tiling factors.\n* **simd**:\n  This step vectorizes the computation inside PEs. In the auto mode, AutoSA analyzes the program\n  and selects the best vectorizable loop using heuristics. In the manual mode, users select the \n  vectorizable loop.\n* **hbm**:\n  AutoSA also supports HBM. The systolic array will be connected to multiple HBM ports.\n  In the auto mode, AutoSA allocates each array to a fixed number of HBM banks. \n  In the manual mode, users select the number of HBM banks to be connected to each array.\n\nThe auto-tuner of AutoSA takes advantage of the manual modes and explores all the possible \ncombinations of optimization strategies to search for designs with good performance.\nAt present, the auto-tuner supports exploring all the stages above except the hbm stage, \nand only the Xilinx HLS C back-end is supported.\n\nThe figure below shows the workflow of the auto-tuner.\n\n.. image:: images/auto_tuner_flow.png\n    :align: center\n\nThere are two phases in the auto-tuner: *training* and *searching*.\n\nIn the training phase, the auto-tuner generates random sample designs from the input program,\nsynthesizes them using Xilinx HLS, and uses them as training samples to train the resource models. 
\n\nIn the searching phase, the auto-tuner explores the design space by enumerating different\noptimization strategies at each stage with pruning. The design space is explored step by step, following the \nsequence of the optimization steps in the compilation flow. After the final design samples are generated, \nthe auto-tuner estimates the latency and resource usage of the design samples and updates the search record.\nEventually, the design with the best performance is selected and output.\n\nIn the next subsection, we show how to use the auto-tuner to perform design space exploration, \nusing matrix multiplication as an example.\n\nAuto-Tuning Example\n-------------------\n\nThe auto-tuner is implemented as a Python script ``${AUTOSA_ROOT}/autosa_scripts/optimizer.py``.\nIt is configured by the file ``${AUTOSA_ROOT}/autosa_config/optimizer_settings.json``.\n\nAuto-Tuner Configuration\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe configuration file ``${AUTOSA_ROOT}/autosa_config/optimizer_settings.json`` looks like the following:\n\n.. 
code:: json\n\n    \"training\": {\n      \"sample\": {\n        \"space_time\": {\n          \"mode\": \"exhaustive\",\n          \"n\": -1\n        },\n        \"array_part\": {\n          \"mode\": \"random\",\n          \"n\": 2,\n          \"loop_limit\": -1\n        },\n        \"latency_hiding\": {\n          \"mode\": \"random\",\n          \"n\": 2,\n          \"loop_limit\": 64\n        }\n        ...\n      },\n      \"pruning\": {\n        \"array_part\": {\n          \"enable\": 1,\n          \"PE_num\": [8, 32]\n        },\n        ...\n        \"latency_hiding\": {\n          \"enable\": 1,\n          \"reg_size\": [16, 256]\n        },\n        \"SIMD_vectorization\": {\n          \"enable\": 1,\n          \"PE_num\": [8, 32],\n          \"PE_ratio\": 2\n        }\n      },\n      \"multiprocess\": {\n        \"n_job\": 1\n      }\n    },    \n    \"synth\": {\n      \"multiprocess\": {\n        \"n_job\": 16\n      },\n      \"sample\": {\n        \"n\": 16\n      }\n    },\n    \"search\": {\n      \"metric\": \"latency\",\n      \"cycle_period\": 5,\n      \"mode\": \"customized\",\n      \"n_random\": 5,\n      \"log\": {\n        \"n_record\": 10\n      },\n      \"resource_target\": [\"BRAM18K\", \"DSP\"],\n      \"time_out\": -1,\n      \"update_time_interval\": 2,\n      \"multiprocess\": {\n        \"n_job\": 32\n      },\n      \"sample\": {\n        \"space_time\": {\n          \"mode\": \"exhaustive\",\n          \"n\": -1\n        },\n        ...\n        \"SIMD_vectorization\": {\n          \"mode\": \"exhaustive\",\n          \"n\": -1,\n          \"loop_limit\": 8\n        }\n      },\n      \"pruning\": {\n        \"random_start\": {\n          \"enable\": 1,\n          \"n_trial\": 3,\n          \"n_random\": 3\n        },\n        \"resource\": {\n          \"range\": {\n            \"FF\": [0.25, 0.7],\n            ...\n            \"URAM\": [0, 0.6]\n          }\n        },\n        \"array_part\": {\n          \"enable\": 1,\n  
        \"PE_num\": [190, 210]\n        },\n        ...\n        \"latency_hiding\": {\n          \"enable\": 1,\n          \"reg_size\": [64, 1280]\n        },\n        \"SIMD_vectorization\": {\n          \"enable\": 1,\n          \"PE_num\": [190, 210],\n          \"PE_ratio\": 3\n        }\n      }\n    }\n\nWe now explain the configuration in detail. At the top level, there are three sections:\n``training``, ``synth``, and ``search``.\n\n* ``training``: configures how the auto-tuner generates the training samples for the resource models.\n* ``synth``: configures how the auto-tuner synthesizes the training samples.\n* ``search``: configures how the auto-tuner searches the design space.\n\nTraining\n\"\"\"\"\"\"\"\"\n\nThe ``training`` subsection contains three fields:\n``sample``, ``pruning``, and ``multiprocess``.\n\n* ``sample``: configures how the auto-tuner samples the design space to generate training samples.\n* ``pruning``: configures how the auto-tuner prunes the design space while generating the training samples.\n* ``multiprocess``: The sampling step can run in multiple processes. 
This field sets the number of processes used for sampling.\n\nThe ``sample`` field configures how the design space is sampled at each optimization step.\nThe table below summarizes the available attributes for each step.\n\n+---------------+---------------------------+----------------------------------------------------------------+\n| Attributes    | Values                    | Explanations                                                   |\n+===============+===========================+================================================================+\n| ``mode``      | ``exhaustive``, ``random``| This attribute specifies how the tiling factors are generated  |\n|               |                           |                                                                |\n|               |                           | for each candidate loop. With ``exhaustive``, all the          |\n|               |                           |                                                                |\n|               |                           | sub-multiples (divisors) of the loop bound are used as the     |\n|               |                           |                                                                |\n|               |                           | tiling factors. With ``random``, ``n`` factors are randomly    |\n|               |                           |                                                                |\n|               |                           | sampled from all the feasible tiling factors.                  
|\n+---------------+---------------------------+----------------------------------------------------------------+\n| ``n``         | ``int``                   | The default value is -1. If the ``mode`` is set to ``random``, |\n|               |                           |                                                                |\n|               |                           | this value sets the number of candidate tiling factors         |\n|               |                           |                                                                |\n|               |                           | generated for each loop.                                       |\n+---------------+---------------------------+----------------------------------------------------------------+\n| ``loop_limit``| ``int``                   | The default value is -1. It sets the upper bound of the tiling |\n|               |                           |                                                                |\n|               |                           | factors.                                                       |\n+---------------+---------------------------+----------------------------------------------------------------+\n\nFor the ``pruning`` field, we implement several pruning options that exploit the characteristics of the systolic array architecture.\nThe table below explains these pruning options.\n\n+--------------------+-------------+--------------------+-------------------------------------------------+\n| Stage              | Attributes  | Values             | Explanations                                    |\n+====================+=============+====================+=================================================+\n| array_part         | ``enable``  | ``0``, ``1``       | Turn off/on the pruning at this step.           
|\n|                    +-------------+--------------------+-------------------------------------------------+\n|                    | ``PE_num``  | [``int``, ``int``] | We prune the design space by restricting the    |\n|                    |             |                    |                                                 |\n|                    |             |                    | number of PEs of the design to this range.      |\n+--------------------+-------------+--------------------+-------------------------------------------------+\n| latency_hiding     | ``enable``  | ``0``, ``1``       | Turn off/on the pruning at this step.           |\n|                    +-------------+--------------------+-------------------------------------------------+\n|                    | ``reg_size``| [``int``, ``int``] | Latency hiding creates local storage to hold    |\n|                    |             |                    |                                                 |\n|                    |             |                    | intermediate results. This attribute limits the |\n|                    |             |                    |                                                 |\n|                    |             |                    | size of the local storage introduced by latency |\n|                    |             |                    |                                                 |\n|                    |             |                    | hiding.                                         |\n+--------------------+-------------+--------------------+-------------------------------------------------+\n| SIMD_vectorization | ``enable``  | ``0``, ``1``       | Turn off/on the pruning at this step.           |\n|                    +-------------+--------------------+-------------------------------------------------+\n|                    | ``PE_num``  | [``int``, ``int``] | We restrict the number of PEs to this range.    
|\n|                    +-------------+--------------------+-------------------------------------------------+\n|                    | ``PE_ratio``| ``int``            | This attribute restricts the width/height ratio |\n|                    |             |                    |                                                 |\n|                    |             |                    | of the generated design. Default value is -1.   |\n+--------------------+-------------+--------------------+-------------------------------------------------+\n\nLastly, for the ``multiprocess`` field, ``n_job`` specifies the number of processes used for the\nsampling process. The default value is 1.\n\nSynth\n\"\"\"\"\"\n\nAfter the sample designs are generated, the auto-tuner synthesizes them with\nXilinx HLS to train the resource models.\nThe ``synth`` subsection configures how the sample designs are synthesized.\nIt contains two fields:\n\n* ``multiprocess``: configures the number of processes used to synthesize the sample designs.\n* ``sample``: configures the number of designs selected for synthesis. The default value is 16.\n\nSearch\n\"\"\"\"\"\"\n\nThe ``search`` subsection contains the following fields:\n\n* ``metric``: The default value is ``latency``. It specifies the metric the auto-tuner\n  uses to evaluate designs. At present, only ``latency`` is supported:\n  the auto-tuner selects the design with the lowest latency.\n* ``cycle_period``: The default value is ``5``, i.e., 5 ns.\n  It specifies the clock period of the designs, which is used to estimate the runtime in seconds.\n* ``log``: During the design space exploration, the auto-tuner keeps the top-k designs\n  found so far. This field (``n_record``) specifies the number of records to keep.\n* ``resource_target``: This is a list of the resource types that the auto-tuner\n  will evaluate for each design point. 
Users may choose among ``BRAM18K``, ``DSP``, ``FF``,\n  ``LUT``, and ``URAM``.\n* ``time_out``: It specifies the number of minutes after which the DSE process times out.\n  When set to -1, there is no timeout and the DSE runs to completion.\n* ``update_time_interval``: The auto-tuner can print out the best search results found so far\n  during the DSE process. This field specifies the interval at which the auto-tuner reports the\n  search progress.\n* ``multiprocess``: When multiprocessing is enabled, the design space is partitioned and\n  searched by multiple processes. This field (``n_job``) specifies the number of processes used\n  for searching.\n* ``mode``: The search can run in three modes: ``random``, ``exhaustive``, and\n  ``customized``. In the exhaustive mode, all the possible tiling factors are explored during the\n  search. In the random mode, a number of random tiling factors are picked for each loop to be tiled.\n  The number of random tiling factors is specified by the ``n_random`` field below; its\n  default value is 2. In the customized mode, the auto-tuner uses the sampling policy\n  specified in the ``sample`` field below.\n* ``n_random``: It specifies the number of random tiling factors to be picked per loop.\n* ``sample``: It specifies the sampling policy during the DSE. The format is similar to the\n  sampling policy used during the training step. Please refer to the training subsection for details.\n* ``pruning``: The auto-tuner applies multi-level pruning to speed up the search. We\n  cover the details of this field below.\n\nThe ``pruning`` field contains the following attributes:\n\n* ``random_start``: Before we start the search process, we can first perform a quick random search.\n  The best design found during this phase is used as a baseline to prune away worse designs during\n  the later search. 
This step can be configured by three attributes:\n\n  * ``enable``: turns this step on (``1``) or off (``0``).\n  * ``n_trial``: The random search can be run multiple times. This attribute sets the number of\n    random-search runs.\n  * ``n_random``: configures the number of random tiling factors to be chosen for each loop.\n\n* ``resource``: We can also prune designs based on their resource usage. This attribute (``range``)\n  restricts the valid range of usage for each resource type.\n* The remaining fields are similar to the pruning fields in the ``training`` subsection.\n\nRun the Auto-Tuner\n^^^^^^^^^^^^^^^^^^\n\nOnce the auto-tuner is configured, we can start using it for DSE.\nThe first step is to train the resource models. For the matrix multiplication example, run the\nfollowing command:\n\n.. code:: bash\n\n    python3 ./autosa_scripts/optimizer.py \\\n    -c './autosa ./autosa_tests/mm/kernel.c --target=autosa_hls_c --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --sa-sizes=\"{kernel[]->space_time[3]}\"' \\\n    --info autosa_config/hw_info.json \\\n    -s autosa_config/optimizer_settings.json \\\n    --training \\\n    -p xilinx\n\nThe table below explains each argument of the command.\n\n+---------------+---------------------------+----------------------------------------------------------------+\n| Arguments     | Values                    | Explanations                                                   |\n+===============+===========================+================================================================+\n| ``-c``        | ``str``                   | This argument is the basic AutoSA compilation command for the  |\n|               |                           |                                                                |\n|               |                           | target kernel. 
Please note that the space_time step should be  |\n|               |                           |                                                                |\n|               |                           | specified explicitly in the current version.                   |\n+---------------+---------------------------+----------------------------------------------------------------+\n| ``-i``        | ``json``                  | A JSON file that states the resource upper bound for the target|\n|               |                           |                                                                |\n|               |                           | FPGA board.                                                    |\n+---------------+---------------------------+----------------------------------------------------------------+\n| ``-s``        | ``json``                  | A JSON file specifying the auto-tuner configuration.           |\n+---------------+---------------------------+----------------------------------------------------------------+\n| ``-p``        | ``xilinx``                | Configures the target hardware. Currently only Xilinx FPGAs are|\n|               |                           |                                                                |\n|               |                           | supported.                                                     |\n+---------------+---------------------------+----------------------------------------------------------------+\n| ``--training``|                           | Run the auto-tuner in the training phase.                      |\n+---------------+---------------------------+----------------------------------------------------------------+\n| ``--search``  |                           | Run the auto-tuner in the search phase.                        
|\n+---------------+---------------------------+----------------------------------------------------------------+\n| ``--tmp-dir`` | ``str``                   | Configures the directory to store the temporary files during   |\n|               |                           |                                                                |\n|               |                           | the DSE.                                                       |\n+---------------+---------------------------+----------------------------------------------------------------+\n\nAfter the resource models are trained, run the following command to search for the best design.\n\n.. code:: bash\n\n    python3 ./autosa_scripts/optimizer.py \\\n    -c './autosa ./autosa_tests/mm/kernel.c --target=autosa_hls_c --simd-info=./autosa_tests/mm/simd_info.json --host-serialize --hls --sa-sizes=\"{kernel[]->space_time[3]}\" --tuning-method=0' \\\n    --info autosa_config/hw_info.json \\\n    -s autosa_config/optimizer_settings.json \\\n    --search \\\n    -p xilinx\n"
  },
  {
    "path": "docs/tutorials/auto_tuning_genetic.rst",
    "content": "Auto-Tuning (Genetic Search)\n===============================================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis page introduces an auto-tuning approach that is an alternative to the exhaustive search.\nThis approach leverages genetic search and converges much faster\nthan the exhaustive search.\n\nAuto-Tuner Overview\n-------------------\n.. image:: images/odyssey_flow.png\n    :width: 500\n    :align: center\n\nOur auto-tuner is named Odyssey (abbreviated from AUtomatic DEsign space exploration for SYstolic arrays). The figure above\ndepicts the tuning flow.\nOdyssey leverages AutoSA to construct the design space automatically.\nAutoSA takes in a C program that describes the target algorithm to map to systolic arrays and generates the systolic array designs in Xilinx HLS C.\nWe extend the AutoSA framework to generate a design description file that covers the full details of the generated hardware.\nOdyssey uses this file to create hardware performance models, expressed as symbolic expressions of the tuning parameters, for the auto-tuner to use.\nInside the auto-tuner, Odyssey implements a two-stage flow. It starts with a mathematical programming (MP)-based optimizer that leverages\noptimization solvers with a simplified objective function to produce an initial high-quality design, followed by an evolutionary search that uses\nthe accurate performance models.\n\nAuto-Tuning Example\n-------------------\nTo tune a design, we first use AutoSA to generate a description file in JSON\nformat. For the matrix multiplication example, use the following command.\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize \\\n    --hls \\\n    --tuning-method=1 \\\n    --param-names=./autosa_tests/mm/param_names.json\n\nNote that we only need to specify the array to be explored with the argument\n``--sa-sizes=\"{kernel[]->space_time[3]}\"``, and we add a new flag, ``--tuning-method=1``,\nto instruct AutoSA to generate the required description file.\n\nYou will find the description file ``kernel3.json`` under the directory ``autosa.tmp/output/tuning``.\nThis file contains all the information about the design that is needed for auto-tuning, including\nthe memory and computation information.\n\nNext, we call the auto-tuner to search for the optimal configuration of this design.\nSwitch to the directory ``autosa_scripts/odyssey``.\n\n.. code:: bash\n\n    cd autosa_scripts/odyssey\n\nCopy the design description file to the tuner directory.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa.tmp/output/tuning/kernel3.json ${AUTOSA_ROOT}/autosa_scripts/odyssey/designs/\n\nThen call the tuner to start the search.\n\n.. code:: bash\n\n    python main.py --workload=mm --stop-after-time=20 --cst=hw_cst\n\nThe flag ``--stop-after-time=20`` tells the tuner to stop searching after 20 seconds.\nThe flag ``--cst=hw_cst`` points to the hardware constraints file ``cst/hw_cst.json``.\nThe flag ``--workload=mm`` points to the task configuration file ``workload/mm.json``, which describes the\nmatrix dimensions of the problem. For this example, we set ``i=j=k=1024``.\n\nThe detailed information of the optimal design found by the auto-tuner\nis printed on the screen."
  },
  {
    "path": "docs/tutorials/catapult_backend.rst",
    "content": "Generating Catapult HLS Design\n==============================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nAutoSA can generate systolic arrays in Catapult HLS C for the Mentor Graphics Catapult HLS tool. This page walks through\nan example of generating a systolic array design in Catapult HLS C.\n\n.. note::\n\n    * The current Catapult HLS C back-end supports only two data types: ``unsigned short`` and ``unsigned int``.\n\nGenerating the Design\n---------------------\n\n`Catapult HLS <https://www.mentor.com/hls-lp/catapult-high-level-synthesis/>`_ is an HLS\ntool from Mentor Graphics that can target both FPGAs and ASICs.\nYou may find more details about Catapult HLS on its website (`link <https://www.mentor.com/hls-lp/catapult-high-level-synthesis/>`_).\n\nGenerating the Source Code\n^^^^^^^^^^^^^^^^^^^^^^^^^^\nThe example design used in this tutorial can be found at ``${AUTOSA_ROOT}/autosa_tests/mm_catapult``.\n\nTo generate the design in Catapult HLS C, use the following command:\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/mm_catapult/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_catapult_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize\n\nThe generated design files can be found at ``${AUTOSA_ROOT}/autosa.tmp/output/src``.\nApart from the C files describing the systolic array, AutoSA also emits a TCL file, ``kernel_directives.tcl``.\nThis file is a template that covers most of the commands used when compiling the design with Catapult HLS.\nUsers still need to adapt it to their own designs to achieve the best performance.\n\nTo generate and optimize a design, programmers can either use the GUI or run a TCL script.\nWe first demonstrate the GUI approach and show a complete TCL file later.\n\n.. note::\n\n    Unlike Xilinx Vivado HLS or the Intel OpenCL SDK, Catapult HLS encourages programmers\n    to develop their designs in the GUI.\n\nUsing Catapult in GUI Mode\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAfter setting up your local Catapult HLS environment, launch the tool.\n\n.. code:: bash\n\n    catapult &\n\nIn the GUI window, open **Flow Manager**, select **SCVerify**, and set **USE_CCS_BLOCK** to ``yes``,\nas shown in the figure below.\n\n.. image:: images/catapult_0.png\n    :align: center\n\nIn the GUI window, add the following design files to the project. 
 \n\n* ``kernel.h``: The original input kernel header file.\n* ``kernel_host.cpp``: The host code for testing and verifying the design.\n* ``kernel_kernel.h``: The header file for the host code.\n* ``kernel_kernel_hw.h``: The design code describing the systolic array kernel.\n\nClick **Input Files** in the **Synthesis Tasks** panel to add the design files in the directory ``${AUTOSA_ROOT}/autosa.tmp/output/src``.\n\n.. image:: images/catapult_1.png\n    :align: center\n\n.. image:: images/catapult_2.png\n    :align: center\n\nNext, click **Libraries** in the **Synthesis Tasks** panel to proceed to the library selection step.\nSelect the FPGA library that matches your target device.\nHere we select the Xilinx FPGA library and the ``Virtex-uplus`` device family.\n\n.. image:: images/catapult_3.png\n    :align: center\n\nAt this stage, you should be able to verify your design using software simulation.\nHowever, due to some limitations of Catapult, the current code cannot be used directly for software simulation.\nOpen the source file ``kernel_kernel_hw.h`` and go to line 28, which contains the following code:\n\n.. code:: c\n\n    // while () // Please add the fifo check for C sim.\n\nThis line is the culprit. All the modules use FIFOs to exchange data with each other.\nTo correctly model the FIFO transactions, Catapult HLS requires us to specify the number of\ninput FIFO transactions so that the function starts executing only when all of its\ninput data are ready. AutoSA currently cannot generate this check automatically, so\nusers need to modify this code manually for each design.\n\nAs an example, the function ``A_IO_L3_in.run`` has the input FIFO ``fifo_A_serialize``.\nThe read transaction of this FIFO is at line 40.\nThis transaction is enclosed in the following loops:\n\n.. 
code:: c\n\n    for (ac_int<3, false> c0 = 0; c0 <= 3; c0 += 1)\n      for (ac_int<3, false> c1 = 0; c1 <= 3; c1 += 1)\n        for (ac_int<3, false> c2 = 0; c2 <= 3; c2 += 1)\n          for (ac_int<2, false> c3 = 0; c3 <= 1; c3 += 1)\n            for (ac_int<4, false> c4 = 0; c4 <= 7; c4 += 1)\n              for (ac_int<2, false> c5 = 0; c5 <= 1; c5 += 1)\n\nMultiplying the trip counts of these loops gives the number of read transactions: :math:`4\\times 4\\times 4\\times 2\\times 8\\times 2 = 2048`.\n\nNow, replace line 28\n\n.. code:: c\n\n    // while () // Please add the fifo check for C sim.\n\nwith\n\n.. code:: c\n\n    while (fifo_A_serialize.available(2048))\n\nEvery function with FIFO read transactions in the source code has to be modified in the same way.\n\nAnother issue is that the default coding style of AutoSA may lead to scheduling failures in later\nCatapult HLS stages. Specifically, the following coding style, which AutoSA generates by default,\nis not friendly to Catapult:\n\n.. code:: c\n\n    for (int c0 = 0; ...)\n      if (c0 == p0) {\n        for (int c1 = 0; ...) {\n          // logic 1\n          ...\n        }\n      } else {\n        for (int c1 = 0; ...) {\n          // logic 2\n          ...\n        }\n      }\n\nIn the code above, each branch of the ``if`` statement contains sub-loops.\nThis coding style can cause scheduling failures due to long feedback paths.\nYou might see the error message below when synthesizing this design in Catapult HLS in the later steps.\n\n.. code:: bash\n\n    Feedback path is too long to schedule design with current pipeline and clock constraints.\n\nTo get around this problem, we need to restructure the code and move the ``if`` branch inside the sub-loops.\n\n.. 
code:: c\n\n    for (int c0 = 0; ...)\n      for (int c1 = 0; ...)\n        if (c0 == p0) {\n          // logic 1\n          ...\n        } else {\n          // logic 2\n          ...\n        }\n\nWe provide a modified example at ``${AUTOSA_ROOT}/autosa_tests/mm_catapult/kernel_kernel_hw.h``.\nThis file fixes both issues above: it adds the FIFO guards and moves the ``if`` branches inside the sub-loops.\nWe plan to automate this process in the future.\n\nTo save time, add this file to the project to replace the original one.\n\nTo perform software emulation, expand the **Verification** folder in the **Project Files** panel and\nclick **Original Design + Testbench**. Catapult HLS will compile and execute the design.\nIf everything works, you should see the message ``Passed`` in the **Message** panel.\n\n.. image:: images/catapult_4.png\n    :align: center\n\nIn the next step, click **Mapping** in the **Synthesis Tasks** panel.\nThis step asks you to specify the target clock frequency of the design.\nLet's set it to 250 MHz for now.\n\n.. image:: images/catapult_5.png\n    :align: center\n\nClick **Apply** in the frequency setting panel to proceed.\nThen click **Architecture** in the **Synthesis Tasks** panel.\nCatapult HLS will infer the hierarchy of the design.\nYou will see a list of warning messages in the **Constraint Editor**.\nLet's fix them now.\n\n.. image:: images/catapult_6.png\n    :align: center\n\nThese warnings are all of the same type. For example, the first warning message reads:\n\n.. code:: text\n\n    Resource '/kernel0/B_IO_L2_in/idx:rsc' with variable connected to multiple sub-blocks not mapped to '[DirectInput]'\n\nSelect the module ``B_IO_L2_in_inst_0`` in the **Instance Hierarchy** and expand the **Interface** folder in\nthe **Module** panel. Select the interface ``idx:rsc`` and set the **Resource Type** on the right to\n``[DirectInput]``. Then click **Apply** to apply the changes.\n\n.. 
image:: images/catapult_7.png\n    :align: center\n\nSpecifically, when a module generated by AutoSA has multiple instances, AutoSA may add an index argument\nso that the instances can be distinguished from each other.\nCatapult HLS requires us to map such scalar arguments to ``[DirectInput]`` explicitly.\n\nApply these modifications one by one until all the warning messages disappear; only then can you\nproceed to the next step. Here is a list of modules that need modifications:\n\n* ``A_IO_L2_in_inst_0``\n* ``A_IO_L2_in_boundary_inst_1``\n* ``B_IO_L2_in_inst_0``\n* ``B_IO_L2_in_boundary_inst_1``\n* ``PE_inst_0_0``, ``PE_inst_0_1``, ``PE_inst_1_0``, ``PE_inst_1_1``\n* ``C_drain_IO_L1_out_inst_0_0``, ``C_drain_IO_L1_out_inst_1_0``\n* ``C_drain_IO_L1_out_boundary_inst_0_1``, ``C_drain_IO_L1_out_boundary_inst_1_1``\n* ``C_drain_IO_L2_out_inst_0``\n* ``C_drain_IO_L2_out_boundary_inst_1``\n\nAnother type of resource must also be specified explicitly: the local buffers.\nI/O modules generated by AutoSA might contain local buffers.\nFor example, click the module ``A_IO_L2_in_inst_0`` and expand the **Interconnect** folder\nin the **Module** panel; you will find the local buffer named ``A_IO_L2_in_local_A_inst:cns``.\nWe need to assign it to FPGA BRAM explicitly. Open the **Resource Type** and select\n``Xilinx_RAMS.BLOCK_1R1W_RBW`` to map it to a dual-port BRAM. By default, Catapult HLS\nsets the **Stage Replication** property to 2, which means that the buffer is duplicated to implement\ndouble buffering. 
Please refer to the Catapult HLS documentation for more details about these configurations.\nTo disable automatic double-buffer inference, set **Stage Replication** to 1.\n\nFor our design, map the local buffers inside the following modules to BRAM with **Stage Replication** set to 2:\n\n* ``A_IO_L2_in_inst_0``\n* ``A_IO_L2_in_boundary_inst_1``\n* ``B_IO_L2_in_inst_0``\n* ``B_IO_L2_in_boundary_inst_1``\n\nMap the local buffers inside the following modules with **Stage Replication** set to 1:\n\n* ``C_drain_IO_L1_out_inst_0_0``, ``C_drain_IO_L1_out_inst_1_0``\n* ``C_drain_IO_L1_out_boundary_inst_0_1``, ``C_drain_IO_L1_out_boundary_inst_1_1``\n\nClick **RTL** in the **Synthesis Tasks** panel to proceed.\n\nCatapult HLS will schedule the design and generate RTL.\nHowever, the Catapult HLS scheduler has limitations, and you might encounter the following scheduling failure:\n\n.. code:: bash\n\n    Feedback path is too long to schedule design with current pipeline and clock constraints.\n    Schedule failed, sequential delay violated. List of sequential operations and dependencies:\n      MEMORYREAD \"for#1:for:for:for:for#2:read_mem(local_C:rsc.@)\" kernel_kernel_hw.h(564,41,15)\n      MEMORYWRITE \"for#1:for:for:for:for#2:write_mem(local_C:rsc.@)\" kernel_kernel_hw.h(564,22,15)\n    Feedback path is too long to schedule design with current pipeline and clock constraints.\n\nCatapult fails to schedule a loop in the design. Let's take a closer look at this loop.\n\n.. 
code-block:: c\n    :linenos:\n\n    class PE {\n      ...\n      for (ac_int<3, false> c2 = 0; c2 <= 3; c2 += 1)\n        for (ac_int<4, false> c5 = 0; c5 <= 7; c5 += 1)\n          for (ac_int<4, false> c6 = 0; c6 <= 7; c6 += 1)\n            for (ac_int<4, false> c7 = 0; c7 <= 7; c7 += 1) {\n              ...\n              #pragma unroll yes\n              for (ac_int<2, false> c8 = 0; c8 <= 1; c8 += 1)\n                local_C[c7][c6] = (local_C[c7][c6] + (local_A[0][c8] * local_B[0][c8]));\n            }\n      ...\n    }\n\nThis loop, inside the PE module, updates the local variable ``local_C[c7][c6]``.\nCatapult HLS fails to pipeline the loop and reports a dependence between the\nwrite access of ``local_C[c7][c6]`` at line 10 and the read access of ``local_C[c7][c6]`` on\nthe same line.\nOn closer inspection, however, there should be no such dependence: latency hiding has tiled and permuted\nthe two parallel loops ``c6`` and ``c7`` inward, and the loop ``c8`` is unrolled, so consecutive iterations\nof the pipelined loop update different elements of ``local_C``. The loop should therefore be fully pipelined, as observed when\nusing Xilinx HLS.\n\nHowever, since the Catapult HLS scheduler is more conservative than Xilinx HLS, we have to\nexplicitly mark this dependence as false to achieve full pipelining.\nTo do this, we modify the TCL script used to compile the design.\n\nCatapult HLS has already generated a TCL file, ``${CATAPULT_PRJ}/kernel0.v1/directives.tcl``, containing all the directives we applied\nin the previous steps. Open the file and edit its last lines from\n\n.. code:: tcl\n\n    go architect\n    go allocate\n\nto\n\n.. 
code:: tcl\n\n    go architect\n    directive set /kernel0/PE/run/for#1:for:for:for:for#2:read_mem(local_C:rsc.@) -IGNORE_DEPENDENCY_FROM {for#1:for:for:for:for#2:write_mem(local_C:rsc.@)}\n    go allocate\n\nThe added ``directive set`` command tells Catapult to ignore this dependence.\nNow let's use this TCL script to recompile the design.\n\nFirst, move this TCL script out of the ``kernel0.v1`` directory:\n\n.. code:: bash\n\n    mv ${CATAPULT_PRJ}/kernel0.v1/directives.tcl ${CATAPULT_PRJ}/\n\nThen, in the Catapult GUI, click **File** -> **Run Script** and select ``directives.tcl``.\nCatapult HLS will recompile the design using this TCL file.\n\nYou should see the design scheduled successfully without any errors.\nNow click **RTL** in the **Task Bar** panel to generate the final RTL.\n\nAn optional last step is to perform RTL simulation with Catapult HLS. This requires a supported \nsimulator installed on your workstation; please refer to the Catapult manuals for the list of \nsupported simulators. Here we use Mentor QuestaSim. To perform RTL simulation, \nclick **Verification** -> **QuestaSIM** -> **Concat RTL Verilog output 'concat_sim_rtl.v' vs Untimed C++**.\n\nCatapult HLS will launch the QuestaSim simulator, as shown in the figure below.\n\n.. image:: images/catapult_sim.png\n    :align: center\n\nType ``run -all`` to start the simulation, as shown in the figure below.\n\n.. image:: images/catapult_sim2.png\n    :align: center\n\nPhew! We have now finished the complete flow in the GUI. 
\nJust a few things to keep in mind when using the Catapult flow:\n\n* Specify the FIFO guards for C simulation.\n* Modify the ``if`` coding style for better scheduling.\n* Explicitly specify false dependences for better scheduling.\n\nUsing Catapult in TCL Mode\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAll the steps presented in the previous subsection can also be executed through a TCL script.\nA complete TCL file for this flow can be found at ``${AUTOSA_ROOT}/autosa_tests/mm_catapult/directives.tcl``.\n\nNote that AutoSA also generates a template TCL file in the source directory, \n``${AUTOSA_ROOT}/autosa.tmp/output/src/kernel_directives.tcl``.\nIt covers most of the boilerplate code. However, you will still need to modify some parts of the file, such as \nthe source code path, and insert the dependence directive to schedule the design successfully.\n\nTo compile with a TCL file, open the Catapult GUI,\nclick **File** -> **Run Script**, and select the TCL file.\nCatapult HLS will compile the design and generate RTL."
  },
  {
    "path": "docs/tutorials/getting_started.rst",
    "content": "Getting Started\n===============\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nIn this tutorial, we give an overview of the compilation process of AutoSA \nand demonstrate it with an example.\n\nThe Compilation Flow of AutoSA\n------------------------------\n\nThe figure below shows the overall compilation flow of AutoSA.\n\n.. image:: images/flow.png\n    :align: center\n\nThe input to AutoSA is C code that describes the algorithm to be mapped to\nthe systolic array. AutoSA is built on the polyhedral framework, which takes SCoP (static control part) \nprograms as input. In addition, AutoSA assumes that all the dependences of the input\nprogram have been rendered uniform before compilation.\n\nThe example code below describes matrix multiplication and serves as the input to AutoSA.\n\n.. code:: c\n\n    #pragma scop\n    for (int i = 0; i < I; i++)\n      for (int j = 0; j < J; j++) {\n        C[i][j] = 0;\n        for (int k = 0; k < K; k++)\n          C[i][j] += A[i][k] * B[k][j];\n      }\n    #pragma endscop\n\nNote that we insert the pragma\n\n.. code:: c\n\n    #pragma scop\n\nbefore the code fragment and the pragma\n\n.. code:: c\n\n    #pragma endscop\n\nafter it to mark the code region to be analyzed and transformed by the compiler.\n\nIn the next step, a polyhedral representation of the input code is extracted. AutoSA \nuses the `integer set library (ISL) <http://isl.gforge.inria.fr/>`_ to manipulate the polyhedral IR.\nAfter extracting the polyhedral IR, AutoSA performs an initial transformation of the program using the \nISL scheduler, which aims to transform the program to maximize locality and parallelism.\nThe program transformed by ISL is the input to the remaining steps of AutoSA.\nFor more details about the ISL scheduler, please refer to the ISL manual. 
Readers are also \nencouraged to read [PLUTO08]_ for more details about the scheduling algorithm used by ISL.\n\nThe next stage, named *legality check*, checks whether the input program can be legally\nmapped to a systolic array. At this stage, we simply check whether all dependences are uniform.\n\nA complete systolic array architecture consists of both the PE array and the on-chip I/O network. \nAutoSA separates the process of building these two components into two stages: \n*computation management* and *communication management*. \nThe computation management stage constructs the PE and optimizes its micro-architecture. \nAfter that, the communication management stage builds the I/O network for transferring data between PEs and the external memory. \n\nAfter the previous stages, AutoSA generates the AST from the optimized program. \nThe AST is then traversed to generate the final design for the target hardware.\nAt present, AutoSA can generate Xilinx HLS C, Intel OpenCL, and Mentor Graphics Catapult C.\n\nThe computation and communication management stages involve multiple optimization techniques, \neach introducing several tuning options. \nAutoSA implements tunable knobs for these techniques, which can be set manually by users or tuned by an auto-tuner.\n\nAn Example\n----------\n\nThe example code above can be found at ``${AUTOSA_ROOT}/autosa_tests/mm_getting_started/kernel.c``.\n\nGenerating Hardware Code\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo compile the code to Xilinx HLS C for the Xilinx Vitis toolkit, run the command below.\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/mm_getting_started/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" --simd-info=./autosa_tests/mm/simd_info.json --host-serialize\n\nThe generated code can be found in the directory ``${AUTOSA_ROOT}/autosa.tmp/output/src/``.\nFor detailed information about the AutoSA compilation options, run\n\n.. code:: bash\n\n    ./autosa --help\n\nor refer to `AutoSA Compilation Options`_.\n\nGenerating FPGA Bitstream\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSet up the Xilinx Vitis development kit by running the following commands.\n\n.. code:: bash\n\n    source /opt/Xilinx/Vitis/2019.2/settings64.sh\n    source /opt/xilinx/xrt/setup.sh\n\nRun the Makefile to build the design.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/mm_getting_started/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n    cp ${AUTOSA_ROOT}/autosa_tests/mm_getting_started/connectivity.cfg ${AUTOSA_ROOT}/autosa.tmp/output/\n    cd ${AUTOSA_ROOT}/autosa.tmp/output\n    make all\n\n.. admonition:: Makefile Options\n\n    * ``MODE := hw_emu``: Set the build configuration mode to hardware emulation; the other modes are ``sw_emu`` and ``hw``\n    * ``PLATFORM := xilinx_u250_xdma_201830_2``: Select the target platform\n    * ``KERNEL_SRC := src/kernel_kernel.cpp``: List the kernel source files\n    * ``HOST_SRC := src/kernel_host.cpp``: List the host source files\n\nThe ``connectivity.cfg`` file describes the DRAM port mapping. \nFor more details about how to change the DRAM port mapping, \nplease refer to the Xilinx tutorial `Using Multiple DDR Banks <https://xilinx.github.io/Vitis-Tutorials/2020-1/docs/bloom/6_using-multiple-ddr.html>`_.\n\nGenerating Xilinx HLS project\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAutoSA also supports generating HLS projects. Add the option\n\n.. 
code:: bash\n\n    --hls\n\nto the command when compiling the program.\n\nAutoSA will then generate an HLS host file ``${AUTOSA_ROOT}/autosa.tmp/output/src/kernel_host.cpp``\ninstead of the OpenCL host file generated in the previous step. \nTo build the HLS project, use the following commands.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_scripts/hls_scripts/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n    cd ${AUTOSA_ROOT}/autosa.tmp/output\n    vivado_hls -f hls_script.tcl\n\nUsing AutoSA in Manual Mode\n---------------------------\n\nAs mentioned previously, AutoSA can be used in both *manual* and *auto* modes. \nIn the auto mode, AutoSA proceeds based on a preset policy. In the manual mode,\nAutoSA dumps the optimization choices to users, who then provide AutoSA with a specific optimization policy \nto be applied.\n\nThe tunable knobs of the compilation flow are included in the configuration file\n``${AUTOSA_ROOT}/autosa_config/autosa_config.json``. Currently, the following optimization \nstages can be configured in AutoSA.\n\n* **space_time**: \n  This step applies the space-time transformation to map algorithms to systolic arrays. \n  By default, multiple systolic arrays are generated for each algorithm. In the auto mode,\n  AutoSA selects one array based on heuristics. In the manual mode, users select the \n  array to be processed in the following steps.\n* **array_part**: \n  This step partitions the array into smaller sub-arrays. In the auto mode, all tilable loops \n  that can be used as array partitioning loops are tiled with a fixed factor. In the manual mode,\n  users can select the loops to be tiled and provide the compiler with specific tiling factors.\n* **array_part_L2**:\n  AutoSA can generate up to two levels of array partitioning loops. This is helpful for architectures\n  with many levels of memory hierarchy. 
Similarly, in the auto mode, AutoSA decides which loops to tile further and \n  selects a fixed tiling factor. Users can make these choices in the manual mode.\n* **latency**:\n  This step performs latency hiding when the innermost loop in the program carries a\n  dependence that prevents the design from being fully pipelined. Parallel loops in the program can be \n  used as latency hiding candidate loops. In the auto mode, all parallel loops are tiled and \n  the point loops are permuted innermost. In the manual mode, users specify which loops \n  to choose and the corresponding tiling factors.\n* **simd**:\n  This step vectorizes the computation inside PEs. In the auto mode, AutoSA analyzes the program\n  and selects the best vectorizable loop using heuristics. In the manual mode, users select the \n  vectorizable loop.\n* **hbm**:\n  AutoSA also supports HBM memory, in which case the systolic array is connected to multiple HBM ports.\n  In the auto mode, AutoSA connects each array to a fixed number of HBM banks. \n  In the manual mode, users select the number of HBM banks to be connected to each array.\n\n.. note:: \n\n    For more details about the optimization steps in AutoSA, please refer to the tutorial :ref:`construct-and-optimize-array-label`.\n\nTo switch between the two modes, modify the ``mode`` fields in ``${AUTOSA_ROOT}/autosa_config/autosa_config.json``.\nFor example, change the content of ``autosa_config.json`` to\n\n.. code:: json\n\n    \"array_part\": {\n        \"enable\": 1,\n        \"mode\": \"auto\"\n    }\n\nto run array partitioning in the auto mode. Change it to \n\n.. code:: json\n\n    \"array_part\": {\n        \"enable\": 1,\n        \"mode\": \"manual\"\n    }\n\nto run it in the manual mode.\n\nBelow we show in detail how to use AutoSA in the manual mode.\n\nSpace-Time Transformation\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this step, multiple systolic arrays are generated from the input program. 
We will \nneed to select one systolic array to proceeed. We set this step to manual mode in the \nconfiguration file.\n\n.. code:: json\n\n    \"space_time\": {\n        \"mode\": \"manual\"\n    }\n\nThen run the command.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm_getting_started/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output\n\nIn the terminal, AutoSA displays a message.\n\n.. code:: bash\n\n    [AutoSA] 6 systolic arrays generated.\n\nAutoSA also generates a file ```${AUTOSA_ROOT}/autosa.tmp/output/tuning.json``,\nwhich includes guidance information for further optimization. In this example,\nwe have the content below.\n\n.. code:: json\n\n    \"space_time\": {\n        \"n_kernel\": 6\n    }\n\nThis tells the user that there are 6 different systolic array candidates generated. \nWe may select one of them to proceed. \nFor example, we could select the fourth candidate which is a 2D systolic array \nwith the data from matrix A transferred horizontally, and data from matrix B \ntransferred vertically. Each PE computes one element of ``C[i][j]`` locally, \nwhich is drained out at last to the external memory. \nThe architecture of this array is depicted below.\n\n.. image:: images/mm_array_opt.png\n    :width: 300\n    :align: center\n\nTo guide AutoSA to select this design, supply AutoSA with an additional argument.\n\n.. code:: bash\n\n    --sa-sizes=\"{kernel[]->space_time[3]}\"\n\nwhich tells AutoSA to select the fourth array (index starting from 0) during the space-time transformation.\n\nArray Partitioning\n^^^^^^^^^^^^^^^^^^\n\nIn this step, we will tile the space loops to partition the original array into smaller ones. The computation is then scheduled onto the sub-arrays in sequence. \nWe first set this step in manual mode. Then run the command:\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/mm_getting_started/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3]}\"\n\nAutoSA prints new information in the terminal.\n\n.. code:: bash\n\n    [AutoSA] Appy PE optimization.\n    [AutoSA] Apply array partitioning.\n\nThe ``tuning.json`` file now contains the content below:\n\n.. code:: json\n\n    \"array_part\": {\n        \"tilable_loops\": [64, 64, 64],\n        \"n_sa_dim\": 2\n    }\n\nThis tells users that there are three candidate loops that can be tiled. \nThe upper bound of each loop is 64, so we may select any tiling factor no greater than 64. \nNote that, for now, AutoSA only supports tiling factors that evenly divide the loop bounds. \nTo find out which three loops are selected as the candidate loops, \nadd the option ``--AutoSA-verbose`` to the command and run it again.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm_getting_started/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3]}\" --AutoSA-verbose\n\nBelow is the message printed by AutoSA.\n\n.. 
code:: text\n\n    domain: \"{ S_0[i, j] : 0 <= i <= 63 and 0 <= j <= 63; S_1[i, j, k] : 0 <= i <= 63 and 0 <= j <= 63 and 0 <= k <= 63 }\"\n    child:\n        context: \"{ [] }\"\n        child:\n            schedule: \"[{ S_0[i, j] -> [(i)]; S_1[i, j, k] -> [(i)] }, { S_0[i, j] -> [(j)]; S_1[i, j, k] -> [(j)] }, { S_0[i, j] -> [(0)]; S_1[i, j, k] -> [(k)] }]\"\n            permutable: 1\n            coincident: [ 1, 1, 0 ]\n            space_time: [ space, space, time ]\n            pe_opt: [ array_part, array_part, array_part ]\n            sched_pos: [ 0, 1, 2 ]\n            child:\n                sequence:\n                - filter: \"{ S_0[i, j] }\"\n                - filter: \"{ S_1[i, j, k] }\"\n\nThis is the schedule tree of the current program. More details about schedule trees can be found\nin [SCHEDTREE14]_.\nThe first *domain* node represents the iteration domain of the input program.\nThe *band* node (the node with the ``schedule`` key) contains the partial schedule of the loops. \nIn the current program, there are three loops :math:`i`, :math:`j`, and :math:`k`.\nAutoSA prints verbose information for each loop. For example, the ``coincident`` attribute indicates \nwhether the loop is parallel. The ``pe_opt`` attribute annotates the candidate loops that can be \nused for array partitioning. In this case, all three loops are tilable and can be used for \narray partitioning.\n\nAs an example, we select the tiling factors ``[16,16,16]``. Run the command below.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm_getting_started/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16]}\"\n\nLatency Hiding\n^^^^^^^^^^^^^^\n\nThis step performs latency hiding. We select parallel loops, tile them, and permute the point \nloops innermost to hide the computation latency. \nAfter the previous step, we will find the content below in ``tuning.json``.\n\n.. 
code:: json\n\n    \"latency\": {\n        \"tilable_loops\": [16,16]\n    }\n\nSimilarly, you may add the argument ``--AutoSA-verbose`` to find out which loops have \nbeen selected as the latency hiding candidate loops.\n\nWe select the tiling factors ``[8,8]`` to proceed. Run the command below.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm_getting_started/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8]}\"\n\nSIMD Vectorization\n^^^^^^^^^^^^^^^^^^\n\nIn this step, we select the vectorizable loop, tile it, and permute the point loop innermost.\nThe point loop is finally unrolled by HLS. At present, a loop is selected as a candidate loop if \nit meets the following criteria:\n\n* It is a parallel loop, or a reduction loop that is annotated by users.\n* All array references within the loop are stride-one or stride-zero with regard to this loop.\n  \n.. note::\n    \n    For reduction loops, AutoSA requires users to annotate the loop manually. This \n    is done by providing a ``simd_info.json`` file to the compiler. \n    For our example, we can provide a ``simd_info.json`` file with the content below.\n    \n    .. code:: json\n\n        \"kernel3\": {\n            \"reduction\": [\"y\"]\n        }\n\n    The key ``kernel[index]`` (here ``kernel3``) indicates the array to be analyzed. As mentioned in the \n    space-time transformation step, we selected the array with index 3 to proceed.\n    The ``reduction`` attribute indicates whether each candidate loop is a reduction loop.\n    When running the last command\n    \n    .. 
code:: bash\n\n        ./autosa ./autosa_tests/mm_getting_started/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8]}\"\n\n    AutoSA will check all the non-parallel loops and prompt the user to confirm whether each loop is a \n    reduction loop. Alternatively, users can prepare this information in ``simd_info.json``, following the loop order \n    shown in the compilation message.\n    \nIn this example, loops :math:`i` and :math:`j` have been selected as the space loops. Only the loop :math:`k`, \nwhich is a non-parallel loop, is left. Therefore, we provide the attribute ``\"reduction\": [\"y\"]`` to the compiler,\nas the loop :math:`k` is a reduction loop.\n\nWith this information, AutoSA further checks whether all array accesses under the loop :math:`k` are \nstride-one or stride-zero. Among the three array accesses ``C[i][j]``, ``A[i][k]``, and ``B[k][j]``,\naccess ``C[i][j]`` is stride-zero with regard to loop :math:`k`, and ``A[i][k]`` is stride-one.\nHowever, ``B[k][j]`` is neither stride-one nor stride-zero, since incrementing :math:`k` moves the access by a whole row of ``B``. \nA layout transformation is required to make this array \naccess stride-one or stride-zero.\nAutoSA examines whether layout transformations can expose more\nvectorization opportunities. In this case, the following information is printed in the terminal.\n\n.. code:: bash\n\n    [AutoSA] Array reference (R): { S_1[i, j, k] -> B[k, j] }\n    [AutoSA] Layout transform: Permute dim (0) to the innermost\n\nThis indicates that AutoSA suggests permuting the first dimension of array B innermost to make the loop vectorizable.\n\n.. note:: \n\n    In the example code, simply uncomment the line below to apply the layout transformation.\n\n    .. code:: c\n\n        #define LAYOUT_TRANSFORM\n\nAfter modifying the input code with this layout transformation, run the following command.\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/mm_getting_started/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8]}\" --simd-info=./autosa_tests/mm_getting_started/simd_info.json\n\nThe updated ``tuning.json`` now contains the following.\n\n.. code:: json\n\n    \"simd\": {\n        \"tilable_loops\": [16],\n        \"scores\": [15],\n        \"legal\": [1],\n        \"sa_dims\": [2, 2]\n    }\n\nThis indicates that the candidate loop has an upper bound of 16. \nAutoSA assigns each candidate loop a score based on heuristics. \nThe higher the score, the more hardware-friendly the loop is when selected as the SIMD loop. \nThe ``legal`` item indicates whether the loop can be used directly for optimization. \nIf it is 0, further layout transformations on the arrays used by the program are needed to expose the SIMD opportunity. \nSince we have already applied the layout transformation, this attribute is set to 1.\n\nWe select the tiling factor ``[2]`` and proceed. Run the command below.\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/mm_getting_started/kernel.c --config=./autosa_config/autosa_config.json --target=autosa_hls_c --output-dir=./autosa.tmp/output --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" --simd-info=./autosa_tests/mm_getting_started/simd_info.json\n\nAfter this step, you should be able to find the files of the generated array in ``${AUTOSA_ROOT}/autosa.tmp/output/src``.\n\nAutoSA Compilation Options\n--------------------------\n\n* ``--autosa-autosa, --autosa``: generate systolic arrays using AutoSA [default: yes]\n* ``--autosa-block-sparse, --block-sparse``: use block sparsity [default: no]\n* ``--autosa-block-sparse-ratio, --block-sparse-ratio``: block sparsity ratio (e.g., kernel[]->A[2,4])\n* ``--autosa-config, --config``: AutoSA configuration file\n* ``--autosa-data-pack, --data-pack``: enable data packing [default: yes]\n* ``--autosa-data-pack-sizes, --data-pack-sizes``: upper bounds of data pack sizes (bytes) at the \n  innermost, intermediate, and outermost I/O levels [default: kernel[]->data_pack[8,32,64]]\n* ``--autosa-double-buffer, 
--double-buffer``: enable double buffering for data transfer [default: yes]\n* ``--autosa-double-buffer-style, --double-buffer-style``: change the double-buffering logic coding style\n  (0: while loop, 1: for loop) [default: 1]\n* ``--autosa-fifo-depth, --fifo-depth``: default FIFO depth [default: 2]\n* ``--autosa-hbm, --hbm``: use multi-port DRAM/HBM [default: no]\n* ``--autosa-hbm-port-num, --hbm-port-num``: default number of HBM ports per array [default: 2]\n* ``--autosa-hls, --hls``: generate Xilinx HLS host [default: no]\n* ``--autosa-host-serialize, --host-serialize``: serialize/deserialize the host data [default: no]\n* ``--autosa-insert-hls-dependence, --insert-hls-dependence``: insert Xilinx HLS dependence pragma (alpha version) [default: no]\n* ``--autosa-int-io-dir, --int-io-dir``: set the default interior I/O direction (0: [1,x], 1: [x,1]) [default: 0]\n* ``--autosa-io-module-embedding, --io-module-embedding``: embed the I/O modules inside PEs if possible [default: no]\n* ``--autosa-loop-infinitize, --loop-infinitize``: apply loop infinitization optimization (Intel OpenCL only) [default: no]\n* ``--autosa-local-reduce, --local-reduce``: generate non-output-stationary array with local reduction [default: no]\n* ``--autosa-reduce-op, --reduce-op``: reduction operator (must be used together with local-reduce)\n* ``--autosa-lower-int-io-L1-buffer, --lower-int-io-L1-buffer``: lower the L1 buffer for interior I/O modules [default: no]\n* ``--autosa-max-sa-dim, --max-sa-dim``: maximal systolic array dimension [default: 2]\n* ``--autosa-output-dir, --output-dir``: AutoSA output directory [default: ./autosa.tmp/output]\n* ``--autosa-sa-sizes, --sa-sizes``: per-kernel PE optimization tile sizes\n* ``--autosa-sa-type=sync|async, --sa-type=sync|async``: systolic array type [default: async]\n* ``--autosa-simd-info, --simd-info``: per-kernel SIMD information\n* ``--autosa-simd-touch-space, --simd-touch-space``: use space loops as SIMD vectorization loops [default: no]\n* 
``--autosa-two-level-buffer, --two-level-buffer``: enable two-level buffering in I/O modules [default: no]\n* ``--autosa-uram, --uram``: use Xilinx FPGA URAM [default: no]\n* ``--autosa-use-cplusplus-template, --use-cplusplus-template``: use C++ templates in codegen (necessary for irregular PEs) [default: no]\n* ``--autosa-verbose, --verbose``: print verbose compilation information [default: no]\n* ``--autosa-hcl, --hcl``: generate code for integrating with HeteroCL [default: yes]\n\nBibliography\n------------\n\n.. [PLUTO08] Bondhugula, Uday, et al. \"A practical automatic polyhedral parallelizer and locality optimizer.\" Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation. 2008.\n.. [SCHEDTREE14] Verdoolaege, Sven, et al. \"Schedule trees.\" International Workshop on Polyhedral Compilation Techniques, Vienna, Austria. 2014."
  },
  {
    "path": "docs/tutorials/hcl_integrate.rst",
    "content": "HeteroCL Integration\n====================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis page summarizes some issues encountered when integrating AutoSA with HeteroCL.\n\nIssue 1: Generating HCL-compatible outputs\n------------------------------------------\n\nTo generate HCL-compatible code, add the flags ``--hcl --hls`` when compiling the program.\nBelow is an example command:\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize \\\n    --hcl \\\n    --hls\n\nIssue 2: Generating kernels with AXI Stream interface\n-----------------------------------------------------\n\nTo generate an AXI Stream interface, enable host serialization and generate\nthe HLS host by adding the flags ``--axi-stream --hls --host-serialize``.\nBelow is an example command:\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --host-serialize \\\n    --hcl \\\n    --axi-stream \\\n    --hls\n\nIssue 3: Hanging kernels (pending)\n----------------------------------\n\nThe 8x8 GEMM kernel without host serialization hangs on board,\nwhile the same kernel with host serialization passes on-board testing.\nWe are still debugging this issue.\nThe command for this design:\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/large/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[256,256,512];kernel[]->latency[32,32];kernel[]->simd[8]}\" \\\n    --simd-info=./autosa_tests/large/mm/simd_info.json \\\n    --hcl \\\n    --hls    "
  },
  {
    "path": "docs/tutorials/host_serialize.rst",
    "content": "Understanding Host Serialization\n================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nAutoSA supports serializing data on the host side to increase the memory burst length.\nThis technique is important for achieving high effective DRAM bandwidth. \nThis page explains the mechanism of host serialization.\n\nHow It Works\n------------\n\nHost serialization is enabled by supplying AutoSA with the flag ``--host-serialize``.\nThe figure below explains the current serialization mechanism.\n\n.. image:: images/serialize_example.png\n    :align: center\n\nThe upper part of the figure shows a piece of code that accesses a tiled matrix block by block.\nInside each block, data are loaded sequentially in row-major order.\nWe pipeline the innermost loop. The array ``A`` is stored in DRAM.\n\nWhen synthesizing such code, Xilinx HLS automatically infers a burst length of :math:`4\\times 4` for the DRAM \naccess based on the inner loops.\nHowever, this burst length is too small to make good use of the DRAM bandwidth.\n\nThe figure below from the paper [CHOI16]_ shows the profiled effective DRAM bandwidth versus burst length on Xilinx FPGAs.\n\n.. image:: images/dram_bw.png\n    :width: 500\n    :align: center\n\nAs can be seen in the figure above, a minimal burst length of 128KB is required to reach the maximal effective bandwidth \non Xilinx devices. The short burst length in the current design leads to a rather \nlow effective DRAM bandwidth, which will eventually limit the performance.\n\nThis phenomenon makes data serialization critical.\nThe code in the middle shows the data serialization method currently implemented in AutoSA.\nSpecifically, we allocate a new array to hold the serialized data. The new array is filled \nfollowing the original data access pattern, using an incrementing counter as the write index.\n\nThis leads to a new matrix as shown in the bottom part of the figure. 
Now we can simply \nreplace the original code that accesses DRAM with this new code.\nHLS will then infer the burst length of :math:`2\\times 2\\times 4\\times 4`, which is the maximal burst length \nwe can achieve for this design.\n\nAs for the systolic array design, after supplying AutoSA with the flag ``--host-serialize``, \nyou will notice a separate serialization module (S) created between the original outermost I/O module and the DRAM.\nThe figure below compares the systolic array architecture w/o and w/ data serialization.\n\n.. image:: images/array_serialize.png\n    :align: center\n\nWe plug in the serialized data access logic into these serialization modules to achieve the maximal burst length.\n\nPitfalls\n--------\n\nThe current serialization appraoch is a temporary solution, as it will create \nredundant data in the serialized matrix which bloats the size of this matrix.\nThe figure below shows one of such examples.\n\n.. image:: images/serialize_example2.png\n    :align: center\n\nIn this example, when accessing the matrix, we introduce one addition level of loop ``r1`` to \nvisit each tile twice before moving to the next tile.\nIn such a case, using the current method, we will generate a serialized matrix which is \ntwo times larger than the original matrix. Things will become worse if such reuse happens more often.\nPlease keep in mind of this shortcoming of serialization when using it in AutoSA.\nWe will improve it in the future.\n\nBibliography\n------------\n\n.. [CHOI16] Choi, Young-kyu, et al. \"A quantitative analysis on microarchitectures of modern CPU-FPGA platforms.\" Proceedings of the 53rd Annual Design Automation Conference. 2016."
  },
  {
    "path": "docs/tutorials/index.rst",
    "content": "AutoSA Tutorials\n================\n\nThis page contains a series of tutorials to get you familiar with the systolic array \narchitectures and the compilation process of AutoSA.\n\n.. toctree::\n    :maxdepth: 1\n\n    theory_background\n    optimize_array\n    getting_started    \n    matrix_multiplication\n    auto_tuning_exhaustive\n    auto_tuning_genetic\n    auto_bridge\n    structural_sparsity    \n    intel_backend\n    catapult_backend\n    host_serialize\n    hcl_integrate"
  },
  {
    "path": "docs/tutorials/intel_backend.rst",
    "content": "Generating Intel OpenCL Design\n==============================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nAutoSA can generate systolic arrays in Intel OpenCL. This page walks through an example \nof generating a systolic array design for Intel FPGAs. \n\n.. note:: \n\n    The Intel OpenCL back-end currently does not perform well due to channel overheads\n    and may hang on board for certain test cases.\n    This back-end is provided for demonstration purposes only. \n    Please consider the Xilinx or Catapult back-end for stable use.\n\nGenerating the Design\n---------------------\n\nThe design example used by this tutorial is at ``${AUTOSA_ROOT}/autosa_tests/mm_intel``.\nRun the following command to generate the systolic array.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm_intel/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_opencl \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->array_part_L2[2,2,2];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm_intel/simd_info.json \\\n    --host-serialize \\\n    --loop-infinitize \\\n    --double-buffer-style=0 \\\n    --mem-port-map=\"{kernel[]->A[0];kernel[]->B[1];kernel[]->C[2]}\"\n\nAfter compilation, you will find the generated designs under the directory\n``${AUTOSA_ROOT}/autosa.tmp/output/src``.\n\nWe also provide an example Makefile for testing the design.\nCopy it to the design directory.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/mm_intel/Makefile ${AUTOSA_ROOT}/autosa.tmp/output/\n\nYou may modify the Makefile based on your target FPGA board or use your own Makefile.\nIn the example Makefile, we target the Intel Stratix 10 board with HBM.\n\n.. code:: make\n\n    AOCL_BOARD ?= s10mx_hbm_es\n\nSet up your local Intel OpenCL SDK environment. Make sure the environment variable \n``INTELFPGAOCLSDKROOT`` is set properly. 
Then, to perform software emulation, run:\n\n.. code:: bash\n\n    make sw_emu_check\n\nThe design will be compiled and simulated on the CPU. You should see output similar to the following on your terminal.\n\n.. code:: text\n\n    AOCX file: kernel_sw_emu.aocx\n\n    FPGA Time: 0.146633 s\n    Host Time: 0.14696 s\n    Passed!\n\nThis shows that the design compiled and the simulation passed.\n\nTo synthesize the design to RTL, run:\n\n.. code:: bash\n\n    make hls\n\nThis process will take some time to finish.\nThe Intel OpenCL SDK generates detailed hardware reports in HTML format, which \ncan be found at ``${AUTOSA_ROOT}/autosa.tmp/output/bin/kernel/reports``.\n\nLastly, to generate the bitstream, run:\n\n.. code:: bash\n\n    make hw\n\nMore Details\n------------\n\nCompared to generating Xilinx HLS designs, generating the Intel OpenCL code requires adding the following \nthree arguments to the compilation command.\n\n``--loop-infinitize``: Xilinx HLS requires loops to be bounded. Intel OpenCL has no such \nrestriction: a kernel function can run indefinitely, so its outer loops can be eliminated where possible. \nLoop infinitization removes these unnecessary outer loops \nin each function to reduce hardware overhead.\n\n``--double-buffer-style=0``: By default, AutoSA generates the ping-pong logic for double buffering \nexplicitly, as in the Xilinx HLS code below.\n\n.. code:: c\n\n    // outer loops\n    for (...)\n      for (...) 
{\n        // double buffer logic\n        if (arb == 0) {\n          func1(ping_array);\n          func2(pong_array);\n        } else if (arb == 1) {\n          func1(pong_array);\n          func2(ping_array);\n        }\n      }\n\nHowever, this coding style does not work for Intel OpenCL designs, because the Intel OpenCL SDK \ncannot identify that ``func1`` and ``func2`` can be executed in parallel.\nAs a temporary workaround, AutoSA inlines the bodies of \n``func1`` and ``func2`` directly. Setting ``--double-buffer-style=0`` generates \nworking double-buffering logic in this style for Intel OpenCL. The generated logic looks like this:\n\n.. code:: c\n\n    while (1) {\n      if (func1_en) {\n        // func1 logic\n        ...\n      }\n      if (func2_en) {\n        // func2 logic\n        ...\n      }\n    }\n\n``--mem-port-map=\"{kernel[]->A[0];kernel[]->B[1];kernel[]->C[2]}\"``: \nAs the target FPGA board is equipped with HBM, we may assign the global pointers to \ndifferent HBM banks. In the Xilinx Vitis flow, this mapping is written in a separate configuration file. \nIn the Intel flow, however, it must be coded \nexplicitly in the OpenCL kernel. This argument is optional. Here it maps the global pointers \n``A``, ``B``, and ``C`` to banks 0, 1, and 2, respectively. You should find code like the following in the generated OpenCL kernel:\n\n.. code:: c\n\n    __kernel void A_IO_L3_in_serialize(__global volatile __attribute__((buffer_location(\"HBM0\"))) A_t16 *restrict A)\n\nHere, ``__attribute__((buffer_location(\"HBM0\")))`` assigns the pointer ``A`` to the bank ``HBM0``."
  },
  {
    "path": "docs/tutorials/matrix_multiplication.rst",
    "content": "How Systolic Array Works: A Case Study on Matrix Multiplication\n===============================================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis page gives a detailed explanation of the AutoSA-generated systolic array architecture\nfor matrix multiplication.\n\nGenerating the Systolic Array\n-----------------------------\n\nWe will use the example code in ``${AUTOSA_ROOT}/autosa_tests/mm/kernel.c``.\n\n.. code:: c\n\n    #pragma scop\n    for (int i = 0; i < 64; i++)\n      for (int j = 0; j < 64; j++) {\n        C[i][j] = 0;\n        for (int k = 0; k < 64; k++)\n          C[i][j] = C[i][j] + A[i][k] * B[j][k];\n      }\n    #pragma endscop\n\nUse the following command to generate the systolic array.\n\n.. code:: bash\n\n    ./autosa ./autosa_tests/mm/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[2]}\" \\\n    --simd-info=./autosa_tests/mm/simd_info.json \\\n    --hls\n\nThis will generate a :math:`2\\times 2` 2D systolic array as shown below.\n\n.. image:: images/mm_array_opt.png\n    :width: 300\n    :align: center\n\nUnderstanding the Systolic Array\n--------------------------------\n\nThe systolic array architecture is composed of two parts: the processing elements (PEs) and the \nI/O network. We explain these two components in turn.\n\nProcessing Elements (PE)\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nBelow is the AutoSA-generated HLS code for the PE.\n\n.. 
code-block:: c\n    :linenos:\n\n    /* Module Definition */\n    void PE(int idx, int idy, hls::stream<A_t2> &fifo_A_in, hls::stream<A_t2> &fifo_A_out, hls::stream<B_t2> &fifo_B_in, hls::stream<B_t2> &fifo_B_out, hls::stream<float> &fifo_C_drain_out) {\n    #pragma HLS INLINE OFF\n      /* Variable Declaration */\n      int p0 = idx, p1 = idy; // module id\n      A_t1 local_A[1][2];\n      #pragma HLS ARRAY_PARTITION variable=local_A dim=0 complete\n      B_t1 local_B[1][2];\n      #pragma HLS ARRAY_PARTITION variable=local_B dim=0 complete\n      C_t1 local_C[8][8];\n      #pragma HLS RESOURCE variable=local_C core=RAM_2P_BRAM\n      /* Variable Declaration */\n\n      for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n        for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1) {\n          // array\n          // pe\n          // latency\n          for (ap_uint<4> c6 = 0; c6 <= 7; c6 += 1) {\n            // latency\n            for (ap_uint<4> c7 = 0; c7 <= 7; c7 += 1) {\n            #pragma HLS PIPELINE II=1\n              // simd\n              // hls_unroll\n              local_C[c7][c6] = 0;\n            }\n          }\n          for (ap_uint<3> c2 = 0; c2 <= 3; c2 += 1) {\n            // array\n            // pe\n            for (ap_uint<4> c5 = 0; c5 <= 7; c5 += 1) {\n              // latency\n              for (ap_uint<4> c6 = 0; c6 <= 7; c6 += 1) {\n                // latency\n                for (ap_uint<4> c7 = 0; c7 <= 7; c7 += 1) {\n                #pragma HLS PIPELINE II=1\n                  {\n                    {\n                      A_t2 fifo_data;\n                      fifo_data = fifo_A_in.read();\n                      for (ap_uint<2> n = 0; n < 2; n++) {\n                      #pragma HLS UNROLL\n                        union {unsigned int ui; float ut;} u;\n                        u.ui = (unsigned int)fifo_data(31, 0);\n                        local_A[0][n] = u.ut;\n                        fifo_data = fifo_data >> 32;\n                      }\n             
       }\n                    {\n                      B_t2 fifo_data;\n                      fifo_data = fifo_B_in.read();\n                      for (ap_uint<2> n = 0; n < 2; n++) {\n                      #pragma HLS UNROLL\n                        union {unsigned int ui; float ut;} u;\n                        u.ui = (unsigned int)fifo_data(31, 0);\n                        local_B[0][n] = u.ut;\n                        fifo_data = fifo_data >> 32;\n                      }\n                    }\n                    // simd\n                    for (ap_uint<2> c8 = 0; c8 <= 1; c8 += 1) {\n                    #pragma HLS UNROLL\n                      local_C[c7][c6] = (local_C[c7][c6] + (local_A[0][c8] * local_B[0][c8]));\n                    }\n                    if (c2 == 3 && c5 == 7)\n                      fifo_C_drain_out.write(local_C[c7][c6]);\n                    {\n                      B_t2 fifo_data;\n                      union {unsigned int ui; float ut;} u1, u0;\n                      u1.ut = local_B[0][1];\n                      u0.ut = local_B[0][0];\n                      fifo_data = (ap_uint<32>(u1.ui), ap_uint<32>(u0.ui));\n                      fifo_B_out.write(fifo_data);\n                    }\n                    {\n                      A_t2 fifo_data;\n                      union {unsigned int ui; float ut;} u1, u0;\n                      u1.ut = local_A[0][1];\n                      u0.ut = local_A[0][0];\n                      fifo_data = (ap_uint<32>(u1.ui), ap_uint<32>(u0.ui));\n                      fifo_A_out.write(fifo_data);\n                    }\n                  }\n                }\n              }\n            }\n          }\n        }\n    }\n    /* Module Definition */\n\nIn this 2D systolic array, data of matrix A are reused horizontally across PEs, and data of matrix B are reused vertically. Each PE computes elements of matrix C locally. 
After the computation is done, the final results of matrix C are drained out to the external memory.\n\nThe PE interface (line 2) contains the following components:\n\n* Module indices (``idx``, ``idy``): indices of the PE module.\n* FIFOs (``fifo_A_in``, ``fifo_A_out``, ``fifo_B_in``, ``fifo_B_out``, ``fifo_C_drain_out``): FIFOs for transferring data.\n\nWhile generating this array, we applied latency hiding on the original loops :math:`i` and :math:`j` with the factors :math:`(8,8)`, and SIMD vectorization on the loop :math:`k` with a factor of 2. With latency hiding, each PE computes an :math:`8\\times 8` tile of matrix C. With SIMD vectorization, two elements of matrix A and two elements of matrix B are required at each cycle to update the local elements of matrix C.\n\nWith this knowledge, we can now look at the local variable declarations in lines 5-11. Line 5 stores the module indices. Lines 6-11 allocate local storage inside the PE for the data of matrices A, B, and C.\n\nThe rest of the code performs the computation. At each cycle, the PE reads data of matrices A and B from neighbor PEs at lines 38-59 and passes the data to neighbor PEs at lines 67-82. The PE performs the computation at lines 61-64. \nWhen the final results of matrix C are ready, the PE writes them out at lines 65-66.\n\nI/O Network\n^^^^^^^^^^^\n\nThe I/O network is composed of a series of I/O modules for transferring data between the external memory and the PEs. We use the I/O modules of array A as an example.\n\nThere are two types of I/O modules for array A: \n\n* Level-3 (L3) I/O modules: modules that read data from the external memory and send them to the array.\n* Level-2 (L2) I/O modules: modules that pass data along a chain. Data belonging to the PEs connected to a module are kept locally, and the rest are passed to the downstream I/O modules.\n\nBelow is the code of the L3 I/O module.\n\n.. 
code-block:: c\n    :linenos:\n\n    /* Module Definition */\n    void A_IO_L3_in(A_t8 *A, hls::stream<A_t8> &fifo_A_local_out) {\n    #pragma HLS INLINE OFF\n      /* Variable Declaration */\n      /* Variable Declaration */\n\n      for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n        for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1)\n          for (ap_uint<3> c2 = 0; c2 <= 3; c2 += 1) {\n            // array\n            // io_L3\n            for (ap_uint<2> c3 = 0; c3 <= 1; c3 += 1) {\n              // io_L2\n              for (ap_uint<4> c4 = 0; c4 <= 7; c4 += 1) {\n                // access_coalesce\n                for (ap_uint<2> c5 = 0; c5 <= 1; c5 += 1) {\n                #pragma HLS PIPELINE II=1\n                {\n                  A_t8 fifo_data;\n                  fifo_data = A[128*c0 + 2*c2 + 64*c3 + 8*c4 + c5];\n                  fifo_A_local_out.write(fifo_data);\n                }\n                }\n              }\n            }\n          }\n    }\n    /* Module Definition */\n\nIn this design, we apply array partitioning on the original loops :math:`i`, :math:`j`, and :math:`k` with the factors :math:`(16,16,16)`. The original loop bounds for these three loops are :math:`(64,64,64)`. \nTherefore, the array partitioning loops at lines 7-9 have loop bounds of :math:`(4,4,4)`. \n\nWhen transferring the data to the PEs, we pass data through the chain of L2 I/O modules. In this design, there are two such modules. The loop for traversing the L2 I/O modules is at line 12. \nEach L2 I/O module needs to load the data tile required by the PEs connected to it. \n\nWith the array partitioning factors :math:`(16,16,16)`, in each array partition, a :math:`16\\times 16` sub-tile of matrix A is loaded from the external memory. 
As this array has dimensions :math:`2\\times 2`, each L2 I/O module stores a tile of size :math:`8\\times 16`.\nThe loops for loading the data tile of each I/O module can be found at lines 14-16. Note that AutoSA packs data together to increase the I/O throughput. In this case, every 8 elements are packed together. Therefore, the size of the local tile is :math:`8\\times 2`, with a data width of 8 data elements.\n\nNext, we look at the L2 I/O module. The figure below shows the micro-architecture of the L2 I/O module.\n\n.. image:: images/io_module_arch.png\n    :width: 500\n    :align: center\n\nAn L2 I/O module loads data from the upstream I/O modules, keeps the data that belong to it, and sends the rest to the downstream modules. \nFor I/O modules with local buffers inside, AutoSA automatically applies double buffering to overlap the data transfer between the I/O modules with the data transfer to/from the PEs. \n\nBelow is the code of the L2 I/O module.\n\n.. code-block:: c\n    :linenos:\n\n    /* Module Definition */\n    void A_IO_L2_in(int idx, hls::stream<A_t8> &fifo_A_in, hls::stream<A_t8> &fifo_A_out, hls::stream<A_t2> &fifo_A_local_out) {\n    #pragma HLS INLINE OFF\n      /* Variable Declaration */\n      int p0 = idx; // module id\n      A_t8 local_A_ping[8][2];\n      #pragma HLS RESOURCE variable=local_A_ping core=RAM_2P_BRAM\n      A_t8 local_A_pong[8][2];\n      #pragma HLS RESOURCE variable=local_A_pong core=RAM_2P_BRAM\n      bool arb = 0;\n      bool inter_trans_en = 1;\n      bool intra_trans_en = 0;\n      int c0, c0_prev;\n      int c1, c1_prev;\n      int c2, c2_prev;\n      /* Variable Declaration */\n\n      {\n        for (ap_uint<3> c0 = 0; c0 <= 3; c0 += 1)\n          for (ap_uint<3> c1 = 0; c1 <= 3; c1 += 1)\n            for (ap_uint<3> c2 = 0; c2 <= 3; c2 += 1) {\n              // array\n              // io_L3\n              {\n                if (arb == 0) {\n                  A_IO_L2_in_inter_trans(\n                   
 /* module id */ idx, \n                    /* host iter */ c0, \n                    /* host iter */ c1, \n                    /* host iter */ c2, \n                    /* array */ local_A_pong, \n                    /* fifo */ fifo_A_in, \n                    /* fifo */ fifo_A_out, \n                    /* enable */ inter_trans_en\n                  );\n                  A_IO_L2_in_intra_trans(\n                    /* module id */ idx, \n                    /* host iter */ c0_prev, \n                    /* host iter */ c1_prev, \n                    /* host iter */ c2_prev, \n                    /* array */ local_A_ping, \n                    /* fifo */ fifo_A_local_out, \n                    /* enable */ intra_trans_en\n                  );\n                } else {\n                  A_IO_L2_in_inter_trans(\n                    /* module id */ idx, \n                    /* host iter */ c0, \n                    /* host iter */ c1, \n                    /* host iter */ c2, \n                    /* array */ local_A_ping, \n                    /* fifo */ fifo_A_in, \n                    /* fifo */ fifo_A_out, \n                    /* enable */ inter_trans_en\n                  );\n                  A_IO_L2_in_intra_trans(\n                    /* module id */ idx, \n                    /* host iter */ c0_prev, \n                    /* host iter */ c1_prev, \n                    /* host iter */ c2_prev, \n                    /* array */ local_A_pong, \n                    /* fifo */ fifo_A_local_out, \n                    /* enable */ intra_trans_en\n                  );\n                }\n                intra_trans_en = 1;\n                arb = !arb;\n                c0_prev = c0;\n                c1_prev = c1;\n                c2_prev = c2;\n              }\n            }\n        if (arb == 0) {\n          A_IO_L2_in_intra_trans(\n            /* module id */ idx, \n            /* host iter */ c0_prev, \n            /* host iter */ c1_prev, \n            /* 
host iter */ c2_prev, \n            /* array */ local_A_ping, \n            /* fifo */ fifo_A_local_out, \n            /* enable */ intra_trans_en\n          );\n        } else {\n          A_IO_L2_in_intra_trans(\n            /* module id */ idx, \n            /* host iter */ c0_prev, \n            /* host iter */ c1_prev, \n            /* host iter */ c2_prev, \n            /* array */ local_A_pong, \n            /* fifo */ fifo_A_local_out, \n            /* enable */ intra_trans_en\n          );\n        }\n      }\n    }\n    /* Module Definition */\n\nLines 6-9 define the double buffers inside the I/O module.\nLines 19-93 perform the double buffering to overlap the data transfer between I/O modules (defined in the function ``A_IO_L2_in_inter_trans``) with the data transfer to/from PEs (defined in the function ``A_IO_L2_in_intra_trans``).\n\nPlease refer to the generated code for more details of the functions ``A_IO_L2_in_inter_trans`` and ``A_IO_L2_in_intra_trans``.\n\nSimilar principles apply to the other I/O modules. With both the I/O modules and the PEs in place, we have a complete, functional systolic array that can be synthesized and executed on FPGAs.\n\n.. note:: \n\n    When adding the argument ``--host-serialize`` to the AutoSA command, the data of each array are serialized on the host and transferred to the systolic array. AutoSA introduces an additional I/O module before the original I/O modules for loading/writing the serialized data from/to the external memory. Feel free to try it out and compare the result with the code without serialization. The major benefit of host serialization is that it increases the width and burst length of DRAM accesses, which improves the effective DRAM bandwidth."
  },
  {
    "path": "docs/tutorials/optimize_array.rst",
    "content": ".. _construct-and-optimize-array-label:\n\nConstructing and Optimizing a Systolic Array\n============================================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis page takes an in-depth look at how AutoSA constructs and optimizes a systolic array to \nachieve high performance on FPGAs. \n\n.. note:: \n    This page will be helpful to readers who are interested in the implementation of AutoSA. \n    More details are covered in the `AutoSA Paper <https://vast.cs.ucla.edu/sites/default/files/publications/FPGA2021_AutoSA_camera.pdf>`_.\n    Feel free to skip it if you only want to use AutoSA to generate systolic arrays.\n\nPrerequisites\n-------------\nPlease finish the tutorial :ref:`theoretical-background-label` first.\n\nA complete systolic array architecture consists of both the PE array and the on-chip \nI/O network. \nAutoSA separates the process of building these two components into two stages: \n*computation* and *communication management*. \nThe stage of computation management constructs the PE and optimizes its micro-architecture. \nAfter that, the stage of communication management builds the I/O network for transferring data between PEs and the external memory. \nThese two stages are covered in the following sections, respectively.\n\nThroughout these sections, we use the example code below to illustrate the different steps.\n\n.. code:: c\n\n  for (int i = 0; i < M; i++)\n    for (int j = 0; j < N; j++)\n      for (int k = 0; k < K; k++) \n  S0:   C[i][j] += A[i][k] * B[k][j];\n\nThis code performs matrix multiplication (the initialization is omitted for brevity).\nWith the help of the `integer set library (ISL) <http://isl.gforge.inria.fr/>`_, we can\nextract the initial schedule tree of the program, as shown below.\n\n.. 
image:: images/mm_tree_param.png\n    :width: 400\n    :align: center\n\nComputation Management\n----------------------\n\nThe stage of computation management consists of four steps: \n*space-time transformation*, *array partitioning*, *latency hiding*, \nand *SIMD vectorization*. We go through each step in the following subsections. \n\nSpace-Time Transformation\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis step performs the space-time transformation to map the input program to a systolic array.\nDetails of the space-time transformation are covered in :ref:`theoretical-background-label`.\nThe algorithm below describes how AutoSA applies the space-time transformation.\n\n.. admonition:: Algorithm 1: Space-time transformation\n\n    | Inputs: A schedule tree :math:`s` \n    | Outputs: A list of schedule trees :math:`S`\n    | Initialize the space loop candidate pool :math:`P\\gets \\emptyset` and the output list :math:`S\\gets \\emptyset`;\n    | Extract the outermost permutable loop band :math:`d` from :math:`s`;\n    | for each loop :math:`l` in the band :math:`d` do\n    |  if all flow/read dependence distances on loop :math:`l` are :math:`\\leq 1` then\n    |    :math:`P \\gets P \\cup \\{l\\}`;\n    | /* Generate 2D systolic array. \\*/\n    | for each pair of loops :math:`(l_1, l_2)` in the pool :math:`P` do\n    |  Duplicate the schedule tree :math:`s' \\gets s`;\n    |  Modify :math:`s'` by permuting the loops :math:`l_1, l_2` to outermost;\n    |  :math:`S\\gets S\\cup \\{s'\\}`;\n    | /* Generate 1D systolic array (omitted), similar to 2D case with only one space loop selected. \\*/\n\nAutoSA searches for the loops in the outermost loop band with flow/read dependence distances no greater than one. \nThese loops are put into a pool as the candidate space loops. \nNext, AutoSA enumerates all space loop combinations from the candidate pool. \nThe selected space loops are permuted outermost. \nAll the loops below the space loops are assigned as time loops. \nAt present, AutoSA generates 1D and 2D systolic arrays. 
\nThis constraint can be relaxed to generate higher-dimensional arrays \nif necessary. \nThis step generates multiple systolic arrays, \neach with a unique schedule. \nUsers can manually choose which array to process, \nor leave the choice to the auto-tuner.\n\nArray Partitioning\n^^^^^^^^^^^^^^^^^^\n\nGiven the limited on-chip resources, array partitioning is mandatory when mapping a large array to an FPGA. \nTo achieve this, AutoSA tiles the outermost permutable loop band in the schedule \ntree that contains the space loops. \nThe tiling factors can be chosen by the users or set by the auto-tuner during \ndesign space exploration. \nThe schedule tree below shows an example in which we tile the outermost loop band \nin the MM example with the tiling factors :math:`(4,4,4)`. \n\n.. image:: images/mm_tree_array_part.png\n    :width: 400\n    :align: center\n\nThe point loops from the original loops :math:`i` and :math:`j` are kept as the space loops. \nThis leads to a 2D systolic array with dimensions :math:`4\\times 4`.\n\nLatency Hiding\n^^^^^^^^^^^^^^\n\nLatency hiding helps hide the pipeline stalls caused by the loop-carried dependence \nof the compute statements. In the MM example, the multiply-accumulate (MAC) operation \nin the statement S0 introduces a loop-carried dependence on the loop :math:`k`, \nresulting in an initiation interval (II) greater than one. \nTo resolve this issue, AutoSA looks for parallel loops in the schedule tree, \nstrip-mines them, and permutes the point loops innermost. \nAs an example, loops :math:`i` and :math:`j` are parallel loops in the MM example. \nWe strip-mine them with the tiling factors :math:`(2,2)` and permute the point \nloops innermost. Since there is no loop-carried dependence on the innermost loops, \nthe PE can now achieve II=1. \nThe newly generated schedule is shown below.\n\n.. 
image:: images/mm_tree_latency.png\n    :width: 400\n    :align: center\n\nAs in the previous step, AutoSA allows users to specify the loops to be tiled \nand the tiling factors. \nAlternatively, these choices can be explored by the auto-tuner to maximize performance.\n\nSIMD Vectorization\n^^^^^^^^^^^^^^^^^^\n\nSIMD vectorization duplicates the compute units inside each PE, \nwhich still share the same control logic. \nThis helps amortize the control overhead and improve the resource efficiency of the \ndesign. At present, AutoSA detects vectorizable loops by \nexamining the following two criteria:\n\n* The loop should be a parallel loop or a reduction loop. \n* All array references within the loop are stride-one or stride-zero with regard to this loop. \n\n.. note:: \n\n    The current polyhedral framework that AutoSA builds on cannot detect \n    reduction loops, so these require user annotation prior to \n    compilation.\n\nIn the MM example, the loop :math:`k` is a reduction loop. Array references ``C[i][j]`` and ``A[i][k]`` \nare stride-zero and stride-one, respectively, with regard to loop :math:`k`. \nThe array reference ``B[k][j]`` requires a layout transformation to ``B[j][k]`` so that \nit becomes a stride-one access that enables vectorization. \nBelow is the updated schedule tree in which we strip-mine the loop :math:`k` \nwith a factor of 2.\n\n.. image:: images/mm_tree_simd.png\n    :width: 400\n    :align: center\n\nThe point loop is permuted innermost and marked ``unroll``; it is finally unrolled by the HLS tools. \n\nCommunication Management\n------------------------\n\nSo far we have finished the PE construction and optimization. \nHowever, the current array is still not functional, as we are missing the other key component: \nthe I/O network. 
\nThe I/O network is a network on chip that supports two types of data communication:\n\n* **Inner-array communication**: the data communication between PEs.\n* **Outer-array communication**: the data communication between PEs and the external memory (e.g., DRAM).\n\nThe stage of communication management in AutoSA analyzes the program and constructs \nthe I/O network described above.\nWe show that the I/O network can be built automatically via data dependence analysis \nin the polyhedral model. \nFurthermore, as the topology of the I/O network plays an important role in the \nfrequency of the design, we extend the algorithm to build an I/O network that \nonly involves local interconnects, \nwhich helps sustain a high frequency. \n\nThe following subsections explain our approach in detail. \n`I/O Analysis`_ describes how we analyze the dependences in the program to extract \nthe information necessary for constructing the I/O network. \n`I/O Construction`_ describes how we build the I/O network using the information extracted in the \nprevious step. \n`I/O Optimization`_ discusses several I/O optimization techniques that further \nimprove the I/O performance.\n\nI/O Analysis\n^^^^^^^^^^^^\n\nData communication is associated with data dependences.\nTo build the I/O network, AutoSA analyzes the following three types of data dependences:\n\n* Read dependence: for transferring read-only data.\n* Flow dependence: for transferring intermediate results.\n* Output dependence: for transferring final results. \n\nThe table below lists the dependences extracted from the MM example. \n\n.. 
csv-table::\n    :header: \"Type\", \"Dependence Relation\", \"Array Access\"\n\n    \"Read\", \":math:`D_1:=\\{S0[i,j,k]\\to S0[i,j+1,k]\\}`\", ``A[i][k]``\n    \"Read\", \":math:`D_2:=\\{S0[i,j,k]\\to S0[i+1,j,k]\\}`\", ``B[k][j]``\n    \"Flow\", \":math:`D_3:=\\{S0[i,j,k]\\to S0[i,j,k+1]\\}`\", ``C[i][j]``\n    \"Output\", \":math:`D_4:=\\{S0[i,j,k]\\to S0[i,j,k+1]\\}`\", ``C[i][j]``\n\nThe I/O analysis step interprets these dependences and extracts a data structure \nnamed *I/O group* that contains the information required to construct the I/O network. \nPlease refer to the `AutoSA Paper <https://vast.cs.ucla.edu/sites/default/files/publications/FPGA2021_AutoSA_camera.pdf>`_ \nfor more details about how we derive the I/O groups.\n\nAn I/O group :math:`g` is defined as a tuple :math:`g=(A,D)`, where :math:`A` is the\nset of array accesses associated with the current group and \n:math:`D` is the set of data dependences associated with the array accesses in :math:`A`. \nFor each I/O group, the following properties are computed:\n\n* **I/O direction**. This is the component of the dependence distance vector on the space loops.\n* **I/O type**. The I/O group is classified as *exterior I/O* if the dependence is carried by \n  the space loops. Otherwise, it is classified as *interior I/O*.\n\nFor example, for the array access ``B[k][j]`` in the MM example, \nwe construct an I/O group :math:`g` from this access and \nits associated dependence :math:`D_2` shown in the table above.\nThe dependence distance of :math:`D_2` on the space loops is :math:`(1,0)`. 
\nTherefore, we assign the I/O direction as :math:`g.dir=(1,0)` and the \nI/O type as :math:`g.type=exterior`.\n\nThe I/O groups are then merged if they share the same properties.\nLater, AutoSA allocates a set of I/O modules for each I/O group.\n\nThe last step is to compute the statement instances that require such data.\nWe divide them into two sets: the copy-in set :math:`W_{in}` and the \ncopy-out set :math:`W_{out}`. \nThese sets contain the statement instances that require the data to be copied in \nor copied out, respectively.\n\nThe table below lists the final I/O groups extracted from the MM example and \ntheir copy-in/copy-out sets. \nThey will be used for I/O network construction in the next subsection.\n\n.. csv-table::\n    :header: \"No.\", \":math:`A`\", \":math:`D`\", \":math:`W_{in}/W_{out}`\"\n\n    \":math:`g_1`\", ``A[i][k]``, \":math:`D_1`\", \":math:`W_{in}:=\\{S0[i,j,k]:0\\leq i< M \\land 0\\leq j< N \\land 0\\leq k<K\\}`\"\n    \":math:`g_2`\", ``B[k][j]``, \":math:`D_2`\", \":math:`W_{in}:=\\{S0[i,j,k]:0\\leq i< M \\land 0\\leq j< N \\land 0\\leq k<K\\}`\"\n    \":math:`g_3`\", ``C[i][j]``, \":math:`D_3`\", \":math:`W_{in}:=\\{S0[i,j,k]:0\\leq i< M \\land 0\\leq j< N \\land 0< k<K\\}`\"\n    , , , \":math:`W_{out}:=\\{S0[i,j,k]:0\\leq i< M \\land 0\\leq j< N \\land 0\\leq k<K-1\\}`\"\n    \":math:`g_4`\", ``C[i][j]``, \":math:`D_4`\", \":math:`W_{out}:=\\{S0[i,j,k]:0\\leq i< M \\land 0\\leq j< N \\land k=K-1\\}`\"\n\nI/O Construction\n^^^^^^^^^^^^^^^^\n\nThis step constructs the I/O modules based on the I/O grouping information extracted \nin the previous step. \nFor each I/O group, AutoSA allocates a set of I/O modules for transferring the \ndata between PEs and the external memory.\n\nWe start with the optimized schedule produced by computation management. 
\nIn the first step, we isolate the statement instances involved in \nthe data communication of the current group by inserting a filter node with the copy-in/copy-out set into \nthe schedule tree. \nThe filter node restricts the iteration domains of its child nodes by intersecting the current iteration domain with the filter set.\n\nAs an example, below is the updated schedule with the filter domain of the I/O group\n:math:`g_2` (loops inside the space loops are omitted for brevity).\n\n.. image:: images/mm_tree_isolate.png\n    :width: 500\n    :align: center\n\nAt this stage, we could already generate a set of I/O modules that load the data from the external memory and send the data directly to each PE.\nThis can be realized by equating the space loops to the PE indices ``idx`` and ``idy`` in the updated schedule and using the resulting schedule to generate the code inside each I/O module.\nThe figure below shows the generated array and the corresponding schedule for each I/O module.\n\n.. image:: images/mm_array_b.png\n    :width: 500\n    :align: center\n\nHowever, this architecture may not scale, because the data are scattered directly from the external memory, which causes high fan-out and could lead to routing failures.\nTo resolve this issue, we *localize* the I/O network by using a daisy-chain architecture.\nIn this architecture, each I/O module fetches data from the upstream I/O modules.\nEach I/O module works as a filter: it keeps the data belonging to the PEs it is associated with and passes the rest of the data to the downstream I/O modules.\nIn the architecture shown in the figure above, \nwe call the I/O modules that are directly connected to PEs level-one (L1) I/O modules.\nWe can first cluster the L1 I/O modules along the :math:`x`-axis, as shown in the figure below.\n\n.. 
image:: images/mm_array_L1.png\n    :width: 400\n    :align: center\n\nEvery two L1 modules along the :math:`x`-axis are connected to an upper-level (L2) I/O module, which reduces the memory fan-out from four to two.\nWe call this process *I/O clustering*.\nI/O clustering can be applied multiple times in a hierarchical way.\nFor example, we could apply I/O clustering again on the L2 I/O modules, generating one L3 I/O module that connects to the DRAM, as shown in the figure below.\nEventually, the memory fan-out is reduced from four to one.\n\n.. image:: images/mm_array_L2.png\n    :width: 250\n    :align: center\n\nThe figure below depicts the final array architecture after I/O clustering is applied to all the I/O groups.\n\n.. image:: images/mm_array_unopt.png\n    :width: 400\n    :align: center\n\nI/O Optimization\n^^^^^^^^^^^^^^^^\n\nIn this step, AutoSA applies multiple passes to further optimize the I/O network. \n\n**I/O module embedding**: L1 I/O modules with exterior I/O are embedded into the PEs to save resources.\n\n**I/O module pruning**: When transferring data between different sub-array tiles, \nAutoSA checks whether the copy-out set of the previous tile equals the copy-in set of the \ncurrent tile at the PE level. If the two sets are equal at the PE level, \nthe data are already located on-chip, and the data transfer from the external \nmemory is unnecessary. In such a case, the I/O modules for this I/O group are pruned \naway to save off-chip communication and on-chip resources. \nFor example, in the MM example, the I/O modules for the group :math:`g_3` \nare pruned away since the data of matrix C are accumulated locally inside each PE. \nThe figure below shows the optimized array after applying the two techniques above.\n\n.. 
image:: images/mm_array_opt.png\n    :width: 300\n    :align: center\n\n**Data packing**: To reduce the data transfer latency between the I/O modules, \nAutoSA performs data packing between I/O modules. \nPacking more data reduces the data transfer latency; \nhowever, it leads to wider FIFOs and higher resource usage. \nTherefore, AutoSA offers options to set the data packing factor at each I/O level, \nwhich can also be set by the auto-tuner during design space exploration.\n\n**Double buffering**: By default, AutoSA allocates a local buffer inside the L1 I/O modules \nfor I/O groups with interior I/O, or inside the L2 I/O modules for I/O groups \nwith exterior I/O. For I/O modules that contain local buffers, \nAutoSA offers options to enable double buffering, which helps overlap the \nmemory transfer with the PE computation.\n\nAfter the above steps, we obtain a complete systolic array consisting of both PEs and an I/O network."
  },
  {
    "path": "docs/tutorials/structural_sparsity.rst",
    "content": "Supporting Structural Sparsity\n==============================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nStructural sparsity can be useful for deep neural networks (DNNs). This page discusses how structural \nsparsity is supported in AutoSA.\n\nWhat is Structural Sparsity?\n----------------------------\n\nAutoSA supports a form of structural sparsity similar to that found in recent Nvidia \nAmpere GPUs (`link <https://developer.nvidia.com/blog/exploiting-ampere-structured-sparsity-with-cusparselt/>`_). \nThe figure below shows the supported sparse matrix-dense matrix multiplication.\n\n.. image:: images/sparse_mm.png\n    :align: center\n\nThe figure above illustrates the computation :math:`C=A\\times B`.\nThe first input matrix A is a structurally sparse matrix, and the second input matrix B is \na dense matrix.\nIn matrix A, adjacent elements are grouped into groups of ``VEC_LEN`` elements. Each group\ncontains at most ``NON_ZERO_NUM`` non-zero elements. Therefore, the sparsity of matrix A is at least\n``1-NON_ZERO_NUM/VEC_LEN``.\n\nThe sparse matrix A is then stored in a compressed format, in which only the non-zero elements \nare stored, along with their relative indices inside each group.\n\nThe benefits of structural sparsity are clear. It allows the hardware to achieve higher \neffective throughput with the same amount of resources. \nIt is also easy to implement on the systolic array architecture. \nWe will show how to modify the systolic array to support structural sparsity in the next section.\n\nHow is Structural Sparsity Implemented in AutoSA?\n-------------------------------------------------\n\nAs a comparison, we first present how the dense matrix multiplication is mapped to the \nsystolic array.\n\n.. 
image:: images/dense_array.png\n    :width: 500\n    :align: center\n\nIn the figure above, we show an example of a 2D :math:`2\\times 2` systolic array.\nEach PE computes a sub-tile of matrix C with the size :math:`4\\times 2`.\nWith SIMD vectorization, two vectors of 4 elements, one from matrix A and one from \nmatrix B, are loaded into the PE each time. The PE computes the inner product of the two vectors \nand updates the elements of matrix C.\n\nThis array can be easily extended to support structural sparsity.\nThe figure below shows an example in which we set the vector size :math:`v` to 4 and \nthe number of non-zero elements ``NON_ZERO_NUM`` to 2.\n\n.. image:: images/sparse_array.png\n    :align: center\n\nAs the new matrix A is sparse, instead of packing 4 elements and sending them to the PE each time, \nwe only pack the 2 non-zero elements, along with their indices in the original group vector, and send them \nto the PEs. When a PE loads the packed data, it uses the indices of the A elements to select \nthe corresponding elements from the vector of matrix B. \nCompared to the dense architecture, we introduce the packed indices of the sparse data and a MUX \nfor selecting the data from matrix B.\n\nGenerating the Design\n---------------------\n\nNow let's use AutoSA to generate a sparse design.\nThe example used here can be found in the directory ``${AUTOSA_ROOT}/autosa_tests/mm_block_sparse``.\n\nUse the following command to generate the design.\n\n.. 
code:: bash\n\n    ./autosa ./autosa_tests/mm_block_sparse/kernel.c \\\n    --config=./autosa_config/autosa_config.json \\\n    --target=autosa_hls_c \\\n    --output-dir=./autosa.tmp/output \\\n    --sa-sizes=\"{kernel[]->space_time[3];kernel[]->array_part[16,16,16];kernel[]->latency[8,8];kernel[]->simd[8]}\" \\\n    --simd-info=./autosa_tests/mm_block_sparse/simd_info.json \\\n    --host-serialize \\\n    --hls \\\n    --block-sparse --block-sparse-ratio=\"{kernel[]->A[2,4]}\"\n\nThe generated designs can be found in the directory ``${AUTOSA_ROOT}/autosa.tmp/output/src``.\n\nThis command generates a design in Xilinx HLS C. You can use Xilinx HLS to verify the correctness of the design.\n\nCopy the TCL script to the output directory.\n\n.. code:: bash\n\n    cp ${AUTOSA_ROOT}/autosa_tests/mm_block_sparse/hls_script.tcl ${AUTOSA_ROOT}/autosa.tmp/output/\n\nRun the TCL script to verify the design.\n\n.. code:: bash\n\n    cd ${AUTOSA_ROOT}/autosa.tmp/output\n    vivado_hls -f hls_script.tcl\n\nYou should see the following output in the terminal if the HLS design is executed successfully.\n\n.. code:: bash\n\n    INFO: [SIM 211-2] *************** CSIM start ***************\n    INFO: [SIM 211-4] CSIM will launch GCC as the compiler.\n    make: 'csim.exe' is up to date.\n    Passed!\n    INFO: [SIM 211-1] CSim done with 0 errors.\n    INFO: [SIM 211-3] *************** CSIM finish ***************\n\nNow let's take a closer look at the design code.\nThe input code can be found at ``${AUTOSA_ROOT}/autosa_tests/mm_block_sparse/kernel.c``.\n\nAt line 28, we define the original matrices used for the matrix multiplication.\n\n.. code:: c\n\n    data_t A[I][K], B[J][K], C[I][J], C_golden[I][J];\n\nIn this example, matrix A will be sparsified. \nThe figure below illustrates how we store the sparse information.\n\n.. 
image:: images/sparse_example1.png\n    :align: center\n\nIn this figure, we set the vector length ``VEC_LEN`` to 4 and \nthe number of non-zero elements ``NON_ZERO_NUM`` to 2.\nArray ``A_d`` stores the non-zero data elements, \nand the relative index of each data element within its group is stored in array ``A_i``.\nThe data and index arrays are concatenated and stored in array ``A_s``.\nFor each group vector, we store the index information in an ``unsigned char`` right \nafter the data elements. Currently, we assume that the group vector length is a \npower of two no greater than 8, and that the data width of the matrices is \nat least 8 bits. All of these limitations can be relaxed in the future. \n\nAfter concatenating the index with the data elements, we also pad empty elements to align the array.\nSpecifically, we compute the number of elements other than the data elements, denoted by \n``META_DATA_NUM``, using the following formula:\n\n.. math::\n\n    \\mathrm{META\\_DATA\\_NUM} = 2^{\\lceil \\log_2(\\mathrm{NON\\_ZERO\\_NUM}+1) \\rceil} - \\mathrm{NON\\_ZERO\\_NUM}\n\nIn this example, ``META_DATA_NUM`` is 2. Two additional elements are inserted after \nthe original data elements, and the index is stored in the third element, as shown in the figure above.\n\nAnother example is shown in the figure below.\n\n.. image:: images/sparse_example2.png\n    :align: center\n\nIn this example, ``VEC_LEN`` is 4, ``NON_ZERO_NUM`` is 1, and ``META_DATA_NUM`` is 1.\n\nFor compilation, we still use the original dense matrix multiplication, as shown in lines 89-97.\nWe provide the sparse information to the compiler through command-line arguments:\n\n* ``--block-sparse``: Enables block sparsity.\n* ``--block-sparse-ratio=\"{kernel[]->A[2,4]}\"``: Specifies ``A`` as the sparse array, along with the \n  number of non-zero elements and the group vector length, in the format ``[NON_ZERO_NUM, VEC_LEN]``."
  },
  {
    "path": "docs/tutorials/theory_background.rst",
    "content": ".. _theoretical-background-label:\n\nTheoretical Background\n======================\n\n**Author**: Jie Wang (jiewang@cs.ucla.edu)\n\nThis page covers the theoretical background of mapping algorithms to systolic arrays. \nWe will start with an example of a systolic array for matrix multiplication to show\nwhat systolic arrays look like and how they work. Then we will cover some basics of the \npolyhedral model and the algorithm (i.e., the space-time transformation) that AutoSA uses to map\nan algorithm to a systolic array.\n\nAn Example of Systolic Array\n----------------------------\n\nThe example code below describes the matrix multiplication :math:`C=A\\times B`.\n\n.. code:: c\n\n  int A[3][3], B[3][3], C[3][3];\n  for (int i = 0; i < 3; i++)\n    for (int j = 0; j < 3; j++) {\n  S0: C[i][j] = 0;\n      for (int k = 0; k < 3; k++)\n  S1:   C[i][j] += A[i][k] * B[k][j];\n    }\n\nThis algorithm can be mapped to the systolic array depicted in the figure below.\n\n.. image:: images/2d_array_mm.png\n    :width: 200\n    :align: center\n\nIn this figure, a 3x3 2D systolic array is generated for this algorithm.\nThe processing elements (PEs) are connected only through local interconnects, the most \nimportant characteristic of the systolic array architecture. \nThe figure below further depicts the detailed computation scheduling of this array.\n\n.. image:: images/2d_array_mm_schedule.png\n    :width: 500\n    :align: center\n\nSpecifically, each PE computes one element of matrix C locally. Data of matrices A and B \nare fed at the boundaries and reused across PEs. The times at which data are fed to different rows and columns of PEs\nare skewed to match the computation scheduling inside the PEs.\nTo explain further, in the first cycle (t = 0), ``A[0][0]`` and ``B[0][0]`` are \nfed to the PE in the top-left corner. In the next cycle (t = 1), ``A[0][0]`` is sent downward and \n``B[0][0]`` is sent rightward. 
In the meantime, new data ``A[0][1]`` and ``B[1][0]`` are sent to the top-left PE,\nand we also start to feed the boundary PEs in the second column and row. \n\nAfter the computation is completed, each PE holds its final element of matrix C. The final results\nare then drained out to the external memory.\n\nAs shown in the figure above, in each cycle, data are pumped into the array and transferred across PEs rhythmically. \nThis is where the name **systolic array** comes from.\n\nThis architecture has two major benefits.\n\n* *Performance*. Systolic arrays exploit parallelism with a large number of PEs to achieve high performance.\n* *Energy efficiency*. The local interconnects maximize data reuse and reduce the energy cost of data transfer, leading to high energy efficiency.\n\nDue to these benefits, systolic arrays have been widely adopted in recent years to accelerate computation in various application domains, e.g., genomics and machine learning.\n\nPolyhedral Model\n----------------\n\nThe polyhedral model is a mathematical framework for loop nest optimization. \nLoop nests that satisfy the requirements of the polyhedral model are called \nstatic control parts (SCoPs). A SCoP is defined as a set of statements with loop bounds\nand conditions that are affine functions of the enclosing loop iterators and of variables that are\nconstant during the SCoP execution.\n\nA program in the polyhedral model is typically represented by three components: \n*iteration domains*, *access relations*, and a *schedule*. We will keep using the running example of \nmatrix multiplication from the previous section to illustrate these concepts.\n\nThe iteration domain contains the statement instances of the program. The iteration domain of the statement\nS1 in the example program has the form\n\n.. 
math::\n\n    \\{S1[i,j,k]:0\\leq i< 3 \\land 0\\leq j< 3 \\land 0\\leq k<3\\}\n\nThroughout this tutorial, we represent the components of the polyhedral model using the same\nnotation as the `integer set library (ISL) <http://isl.gforge.inria.fr/>`_, a library\nfor polyhedral compilation. For brevity, we only show the representation for statement\nS1.\n\nThe access relation maps a statement instance to an array element. For example, \nthe access relations for the read accesses in statement S1 have the form\n\n.. math::\n\n    \\{S1[i,j,k]\\to A[i,k];S1[i,j,k]\\to B[k,j];S1[i,j,k]\\to C[i,j]\\}\n\nFinally, a schedule maps statement instances to multi-dimensional time. \nThe statement instances are executed following the lexicographic \norder of the multi-dimensional time. \nAs an example, the schedule of statement S1 has the form \n\n.. math::\n\n    \\{S1[i,j,k]\\to [i,j,k]\\} \n\nThe schedule of a SCoP program can be represented by \n`schedule trees <http://impact.gforge.inria.fr/impact2014/papers/impact2014-verdoolaege.pdf>`_.\nThe figure below shows the schedule tree of the example program. \n\n.. image:: images/mm_tree.png\n    :width: 400\n    :align: center\n\nThe schedule tree starts with a domain node that defines the iteration domain of \nthe program, followed by band nodes that encode the partial schedules at each \nloop dimension. \nISL manipulates the schedule tree of the program to perform loop transformations. 
To generate the final code, an AST is obtained from the schedule tree, which is then lowered to the target code (e.g., C).\n\nFor readers who are interested in learning more about the polyhedral model, we recommend the resources below.\n\n* `ISL manual <http://isl.gforge.inria.fr//manual.pdf>`_, which contains all the basic concepts and APIs of ISL.\n* `ISCC online demonstrator <https://polyhedral.info/2014/01/21/ISCC-demo-online.html>`_, an interactive interface to most of the ISL functionality. Don't forget to check out `this tutorial <http://barvinok.gforge.inria.fr/tutorial.pdf>`_ before using ISCC.\n* `Pluto framework <http://pluto-compiler.sourceforge.net/>`_, a milestone framework for getting familiar with polyhedral scheduling algorithms.\n* `PPCG <https://github.com/Meinersbur/ppcg>`_, a polyhedral-model-based C-to-CUDA compiler. The original paper is `here <https://dl.acm.org/doi/pdf/10.1145/2400682.2400713>`_.\n* Some recent polyhedral-model-based compilation frameworks:\n\n  * `Tensor Comprehensions <https://research.fb.com/downloads/tensor-comprehensions/>`_ (discontinued)\n  * `Tiramisu <http://tiramisu-compiler.org/#:~:text=Tiramisu%20is%20a%20polyhedral%20compiler,be%20optimized%20by%20the%20compiler.>`_\n\nSpace-Time Transformation\n-------------------------\n\nIn the last section of this tutorial, we touch on another important topic that lays the foundation of AutoSA: \nthe space-time transformation.\nThe space-time transformation applies loop transformations to the target program and assigns new semantics,\n*space* and *time*, to the generated loops. Space loops map loop instances to different PEs that execute concurrently, while time loops describe the computation inside each PE. \n\nTo generate a legal systolic array, the loop transformation must satisfy the following constraints: \n\n* First, the transformation should be semantics-preserving. \n* Second, all dependences should be uniform (with constant dependence distance). 
\n* Third, the dependence distances on space loops should be no greater than one so that data communication only happens between neighboring PEs. \n\nNote that for the first and second constraints, we consider all types of dependences (flow, anti, output, and input/read dependences). \nWe take the read dependences into account because data transfer, including that of read-only data, must be managed explicitly in systolic arrays. \nFor the third constraint, we only examine the flow and read dependences, which are associated with inter-PE communication. \nSince each PE has its own address space, anti and output dependences do not contribute to the data communication between PEs.\n\nFor the matrix multiplication example, we obtain one flow dependence (with domain constraints and statement S0 omitted for brevity) as \n\n.. math::\n\n    D1 := \\{S1[i,j,k]\\to S1[i,j,k+1]\\} \n\nand two read dependences for the array references ``A[i][k]`` and ``B[k][j]`` as \n\n.. math::\n\n    D2 := \\{S1[i,j,k]\\to S1[i,j+1,k]\\} \n\n    D3 := \\{S1[i,j,k]\\to S1[i+1,j,k]\\}\n\nOne possible space-time transformation is \n\n.. math::\n\n    S := \\{S1[i,j,k]\\to[i,j,k]\\}\n\nwhich is an identity mapping that keeps the original loop order. \nUnder the schedule :math:`S`, the dependence distances of the three dependences \n:math:`D1`, :math:`D2`, and :math:`D3` are :math:`(0,0,1)`, :math:`(0,1,0)`, \nand :math:`(1,0,0)`, respectively. \nAll dependences are uniform (we omit the discussion of output and anti dependences for brevity). \nMoreover, the dependence distances on all three loops are no greater than one. \nTherefore, all three loops are eligible to be selected as space loops. \nAs an example, we select the first two loops :math:`i` and :math:`j` as \nspace loops and leave loop :math:`k` as the time loop. \nThe transformed code after the space-time transformation is shown below.\n\n.. 
image::  images/mm_st_code.png\n    :width: 400\n    :align: center\n\nThis transformation leads to the 2D systolic array shown in `An Example of Systolic Array`_.\n"
  },
  {
    "path": "install.sh",
    "content": "#!/bin/sh\n# Initialize ISL and PET\ngit submodule init\ngit submodule update\n(cd src/isl; git submodule init imath; git submodule update imath)\n(cd src/barvinok; ./get_submodules.sh)\n\n# Install python packages\npip3 install -r requirements.txt\n\n# Patch ISL\necho \"Patch ISL\"\n(cd ./autosa_scripts/ppcg_changes/isl; ./isl_patch.sh)\n\n# Compilation\n(cd src; echo \"autogen\"; ./autogen.sh; echo \"configure\"; ./configure; echo \"make\"; make -j4)\n\n# Cleanup \ncp ./autosa_scripts/autosa.py ./autosa\n(mkdir autosa.tmp; cd autosa.tmp; mkdir output optimizer; cd output; mkdir src latency_est resource_est tuning)\n"
  },
  {
    "path": "ltmain.sh",
    "content": "#! /bin/sh\n## DO NOT EDIT - This file generated from ./build-aux/ltmain.in\n##               by inline-source v2014-01-03.01\n\n# libtool (GNU libtool) 2.4.6\n# Provide generalized library-building support services.\n# Written by Gordon Matzigkeit <gord@gnu.ai.mit.edu>, 1996\n\n# Copyright (C) 1996-2015 Free Software Foundation, Inc.\n# This is free software; see the source for copying conditions.  There is NO\n# warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\n# GNU Libtool is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# (at your option) any later version.\n#\n# As a special exception to the GNU General Public License,\n# if you distribute this file as part of a program or library that\n# is built using GNU Libtool, you may include this file under the\n# same distribution terms that you use for the rest of that program.\n#\n# GNU Libtool is distributed in the hope that it will be useful, but\n# WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\n# General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License\n# along with this program.  If not, see <http://www.gnu.org/licenses/>.\n\n\nPROGRAM=libtool\nPACKAGE=libtool\nVERSION=\"2.4.6 Debian-2.4.6-0.1\"\npackage_revision=2.4.6\n\n\n## ------ ##\n## Usage. ##\n## ------ ##\n\n# Run './libtool --help' for help with using this script from the\n# command line.\n\n\n## ------------------------------- ##\n## User overridable command paths. 
##\n## ------------------------------- ##\n\n# After configure completes, it has a better idea of some of the\n# shell tools we need than the defaults used by the functions shared\n# with bootstrap, so set those here where they can still be over-\n# ridden by the user, but otherwise take precedence.\n\n: ${AUTOCONF=\"autoconf\"}\n: ${AUTOMAKE=\"automake\"}\n\n\n## -------------------------- ##\n## Source external libraries. ##\n## -------------------------- ##\n\n# Much of our low-level functionality needs to be sourced from external\n# libraries, which are installed to $pkgauxdir.\n\n# Set a version string for this script.\nscriptversion=2015-01-20.17; # UTC\n\n# General shell script boiler plate, and helper functions.\n# Written by Gary V. Vaughan, 2004\n\n# Copyright (C) 2004-2015 Free Software Foundation, Inc.\n# This is free software; see the source for copying conditions.  There is NO\n# warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 3 of the License, or\n# (at your option) any later version.\n\n# As a special exception to the GNU General Public License, if you distribute\n# this file as part of a program or library that is built using GNU Libtool,\n# you may include this file under the same distribution terms that you use\n# for the rest of that program.\n\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNES FOR A PARTICULAR PURPOSE. See the GNU\n# General Public License for more details.\n\n# You should have received a copy of the GNU General Public License\n# along with this program. If not, see <http://www.gnu.org/licenses/>.\n\n# Please report bugs or propose patches to gary@gnu.org.\n\n\n## ------ ##\n## Usage. 
##\n## ------ ##\n\n# Evaluate this file near the top of your script to gain access to\n# the functions and variables defined here:\n#\n#   . `echo \"$0\" | ${SED-sed} 's|[^/]*$||'`/build-aux/funclib.sh\n#\n# If you need to override any of the default environment variable\n# settings, do that before evaluating this file.\n\n\n## -------------------- ##\n## Shell normalisation. ##\n## -------------------- ##\n\n# Some shells need a little help to be as Bourne compatible as possible.\n# Before doing anything else, make sure all that help has been provided!\n\nDUALCASE=1; export DUALCASE # for MKS sh\nif test -n \"${ZSH_VERSION+set}\" && (emulate sh) >/dev/null 2>&1; then :\n  emulate sh\n  NULLCMD=:\n  # Pre-4.2 versions of Zsh do word splitting on ${1+\"$@\"}, which\n  # is contrary to our usage.  Disable this feature.\n  alias -g '${1+\"$@\"}'='\"$@\"'\n  setopt NO_GLOB_SUBST\nelse\n  case `(set -o) 2>/dev/null` in *posix*) set -o posix ;; esac\nfi\n\n# NLS nuisances: We save the old values in case they are required later.\n_G_user_locale=\n_G_safe_locale=\nfor _G_var in LANG LANGUAGE LC_ALL LC_CTYPE LC_COLLATE LC_MESSAGES\ndo\n  eval \"if test set = \\\"\\${$_G_var+set}\\\"; then\n          save_$_G_var=\\$$_G_var\n          $_G_var=C\n\t  export $_G_var\n\t  _G_user_locale=\\\"$_G_var=\\\\\\$save_\\$_G_var; \\$_G_user_locale\\\"\n\t  _G_safe_locale=\\\"$_G_var=C; \\$_G_safe_locale\\\"\n\tfi\"\ndone\n\n# CDPATH.\n(unset CDPATH) >/dev/null 2>&1 && unset CDPATH\n\n# Make sure IFS has a sensible default\nsp=' '\nnl='\n'\nIFS=\"$sp\t$nl\"\n\n# There are apparently some retarded systems that use ';' as a PATH separator!\nif test \"${PATH_SEPARATOR+set}\" != set; then\n  PATH_SEPARATOR=:\n  (PATH='/bin;/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 && {\n    (PATH='/bin:/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 ||\n      PATH_SEPARATOR=';'\n  }\nfi\n\n\n\n## ------------------------- ##\n## Locate command utilities. 
##\n## ------------------------- ##\n\n\n# func_executable_p FILE\n# ----------------------\n# Check that FILE is an executable regular file.\nfunc_executable_p ()\n{\n    test -f \"$1\" && test -x \"$1\"\n}\n\n\n# func_path_progs PROGS_LIST CHECK_FUNC [PATH]\n# --------------------------------------------\n# Search for either a program that responds to --version with output\n# containing \"GNU\", or else returned by CHECK_FUNC otherwise, by\n# trying all the directories in PATH with each of the elements of\n# PROGS_LIST.\n#\n# CHECK_FUNC should accept the path to a candidate program, and\n# set $func_check_prog_result if it truncates its output less than\n# $_G_path_prog_max characters.\nfunc_path_progs ()\n{\n    _G_progs_list=$1\n    _G_check_func=$2\n    _G_PATH=${3-\"$PATH\"}\n\n    _G_path_prog_max=0\n    _G_path_prog_found=false\n    _G_save_IFS=$IFS; IFS=${PATH_SEPARATOR-:}\n    for _G_dir in $_G_PATH; do\n      IFS=$_G_save_IFS\n      test -z \"$_G_dir\" && _G_dir=.\n      for _G_prog_name in $_G_progs_list; do\n        for _exeext in '' .EXE; do\n          _G_path_prog=$_G_dir/$_G_prog_name$_exeext\n          func_executable_p \"$_G_path_prog\" || continue\n          case `\"$_G_path_prog\" --version 2>&1` in\n            *GNU*) func_path_progs_result=$_G_path_prog _G_path_prog_found=: ;;\n            *)     $_G_check_func $_G_path_prog\n\t\t   func_path_progs_result=$func_check_prog_result\n\t\t   ;;\n          esac\n          $_G_path_prog_found && break 3\n        done\n      done\n    done\n    IFS=$_G_save_IFS\n    test -z \"$func_path_progs_result\" && {\n      echo \"no acceptable sed could be found in \\$PATH\" >&2\n      exit 1\n    }\n}\n\n\n# We want to be able to use the functions in this file before configure\n# has figured out where the best binaries are kept, which means we have\n# to search for them ourselves - except when the results are already set\n# where we skip the searches.\n\n# Unless the user overrides by setting SED, search the 
path for either GNU\n# sed, or the sed that truncates its output the least.\ntest -z \"$SED\" && {\n  _G_sed_script=s/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb/\n  for _G_i in 1 2 3 4 5 6 7; do\n    _G_sed_script=$_G_sed_script$nl$_G_sed_script\n  done\n  echo \"$_G_sed_script\" 2>/dev/null | sed 99q >conftest.sed\n  _G_sed_script=\n\n  func_check_prog_sed ()\n  {\n    _G_path_prog=$1\n\n    _G_count=0\n    printf 0123456789 >conftest.in\n    while :\n    do\n      cat conftest.in conftest.in >conftest.tmp\n      mv conftest.tmp conftest.in\n      cp conftest.in conftest.nl\n      echo '' >> conftest.nl\n      \"$_G_path_prog\" -f conftest.sed <conftest.nl >conftest.out 2>/dev/null || break\n      diff conftest.out conftest.nl >/dev/null 2>&1 || break\n      _G_count=`expr $_G_count + 1`\n      if test \"$_G_count\" -gt \"$_G_path_prog_max\"; then\n        # Best one so far, save it but keep looking for a better one\n        func_check_prog_result=$_G_path_prog\n        _G_path_prog_max=$_G_count\n      fi\n      # 10*(2^10) chars as input seems more than enough\n      test 10 -lt \"$_G_count\" && break\n    done\n    rm -f conftest.in conftest.tmp conftest.nl conftest.out\n  }\n\n  func_path_progs \"sed gsed\" func_check_prog_sed $PATH:/usr/xpg4/bin\n  rm -f conftest.sed\n  SED=$func_path_progs_result\n}\n\n\n# Unless the user overrides by setting GREP, search the path for either GNU\n# grep, or the grep that truncates its output the least.\ntest -z \"$GREP\" && {\n  func_check_prog_grep ()\n  {\n    _G_path_prog=$1\n\n    _G_count=0\n    _G_path_prog_max=0\n    printf 0123456789 >conftest.in\n    while :\n    do\n      cat conftest.in conftest.in >conftest.tmp\n      mv conftest.tmp conftest.in\n      cp conftest.in conftest.nl\n      echo 'GREP' >> conftest.nl\n      \"$_G_path_prog\" -e 'GREP$' -e '-(cannot match)-' <conftest.nl >conftest.out 2>/dev/null || break\n      diff conftest.out conftest.nl >/dev/null 2>&1 || break\n      
_G_count=`expr $_G_count + 1`\n      if test \"$_G_count\" -gt \"$_G_path_prog_max\"; then\n        # Best one so far, save it but keep looking for a better one\n        func_check_prog_result=$_G_path_prog\n        _G_path_prog_max=$_G_count\n      fi\n      # 10*(2^10) chars as input seems more than enough\n      test 10 -lt \"$_G_count\" && break\n    done\n    rm -f conftest.in conftest.tmp conftest.nl conftest.out\n  }\n\n  func_path_progs \"grep ggrep\" func_check_prog_grep $PATH:/usr/xpg4/bin\n  GREP=$func_path_progs_result\n}\n\n\n## ------------------------------- ##\n## User overridable command paths. ##\n## ------------------------------- ##\n\n# All uppercase variable names are used for environment variables.  These\n# variables can be overridden by the user before calling a script that\n# uses them if a suitable command of that name is not already available\n# in the command search PATH.\n\n: ${CP=\"cp -f\"}\n: ${ECHO=\"printf %s\\n\"}\n: ${EGREP=\"$GREP -E\"}\n: ${FGREP=\"$GREP -F\"}\n: ${LN_S=\"ln -s\"}\n: ${MAKE=\"make\"}\n: ${MKDIR=\"mkdir\"}\n: ${MV=\"mv -f\"}\n: ${RM=\"rm -f\"}\n: ${SHELL=\"${CONFIG_SHELL-/bin/sh}\"}\n\n\n## -------------------- ##\n## Useful sed snippets. ##\n## -------------------- ##\n\nsed_dirname='s|/[^/]*$||'\nsed_basename='s|^.*/||'\n\n# Sed substitution that helps us do robust quoting.  It backslashifies\n# metacharacters that are still active within double-quoted strings.\nsed_quote_subst='s|\\([`\"$\\\\]\\)|\\\\\\1|g'\n\n# Same as above, but do not quote variable references.\nsed_double_quote_subst='s/\\([\"`\\\\]\\)/\\\\\\1/g'\n\n# Sed substitution that turns a string into a regex matching for the\n# string literally.\nsed_make_literal_regex='s|[].[^$\\\\*\\/]|\\\\&|g'\n\n# Sed substitution that converts a w32 file name or path\n# that contains forward slashes, into one that contains\n# (escaped) backslashes.  
A very naive implementation.\nsed_naive_backslashify='s|\\\\\\\\*|\\\\|g;s|/|\\\\|g;s|\\\\|\\\\\\\\|g'\n\n# Re-'\\' parameter expansions in output of sed_double_quote_subst that\n# were '\\'-ed in input to the same.  If an odd number of '\\' preceded a\n# '$' in input to sed_double_quote_subst, that '$' was protected from\n# expansion.  Since each input '\\' is now two '\\'s, look for any number\n# of runs of four '\\'s followed by two '\\'s and then a '$'.  '\\' that '$'.\n_G_bs='\\\\'\n_G_bs2='\\\\\\\\'\n_G_bs4='\\\\\\\\\\\\\\\\'\n_G_dollar='\\$'\nsed_double_backslash=\"\\\n  s/$_G_bs4/&\\\\\n/g\n  s/^$_G_bs2$_G_dollar/$_G_bs&/\n  s/\\\\([^$_G_bs]\\\\)$_G_bs2$_G_dollar/\\\\1$_G_bs2$_G_bs$_G_dollar/g\n  s/\\n//g\"\n\n\n## ----------------- ##\n## Global variables. ##\n## ----------------- ##\n\n# Except for the global variables explicitly listed below, the following\n# functions in the '^func_' namespace, and the '^require_' namespace\n# variables initialised in the 'Resource management' section, sourcing\n# this file will not pollute your global namespace with anything\n# else. There's no portable way to scope variables in Bourne shell\n# though, so actually running these functions will sometimes place\n# results into a variable named after the function, and often use\n# temporary variables in the '^_G_' namespace. If you are careful to\n# avoid using those namespaces casually in your sourcing script, things\n# should continue to work as you expect. And, of course, you can freely\n# overwrite any of the functions or variables defined here before\n# calling anything to customize them.\n\nEXIT_SUCCESS=0\nEXIT_FAILURE=1\nEXIT_MISMATCH=63  # $? = 63 is used to indicate version mismatch to missing.\nEXIT_SKIP=77\t  # $? 
= 77 is used to indicate a skipped test to automake.\n\n# Allow overriding, eg assuming that you follow the convention of\n# putting '$debug_cmd' at the start of all your functions, you can get\n# bash to show function call trace with:\n#\n#    debug_cmd='eval echo \"${FUNCNAME[0]} $*\" >&2' bash your-script-name\ndebug_cmd=${debug_cmd-\":\"}\nexit_cmd=:\n\n# By convention, finish your script with:\n#\n#    exit $exit_status\n#\n# so that you can set exit_status to non-zero if you want to indicate\n# something went wrong during execution without actually bailing out at\n# the point of failure.\nexit_status=$EXIT_SUCCESS\n\n# Work around backward compatibility issue on IRIX 6.5. On IRIX 6.4+, sh\n# is ksh but when the shell is invoked as \"sh\" and the current value of\n# the _XPG environment variable is not equal to 1 (one), the special\n# positional parameter $0, within a function call, is the name of the\n# function.\nprogpath=$0\n\n# The name of this program.\nprogname=`$ECHO \"$progpath\" |$SED \"$sed_basename\"`\n\n# Make sure we have an absolute progpath for reexecution:\ncase $progpath in\n  [\\\\/]*|[A-Za-z]:\\\\*) ;;\n  *[\\\\/]*)\n     progdir=`$ECHO \"$progpath\" |$SED \"$sed_dirname\"`\n     progdir=`cd \"$progdir\" && pwd`\n     progpath=$progdir/$progname\n     ;;\n  *)\n     _G_IFS=$IFS\n     IFS=${PATH_SEPARATOR-:}\n     for progdir in $PATH; do\n       IFS=$_G_IFS\n       test -x \"$progdir/$progname\" && break\n     done\n     IFS=$_G_IFS\n     test -n \"$progdir\" || progdir=`pwd`\n     progpath=$progdir/$progname\n     ;;\nesac\n\n\n## ----------------- ##\n## Standard options. ##\n## ----------------- ##\n\n# The following options affect the operation of the functions defined\n# below, and should be set appropriately depending on run-time para-\n# meters passed on the command line.\n\nopt_dry_run=false\nopt_quiet=false\nopt_verbose=false\n\n# Categories 'all' and 'none' are always available.  
Append any others\n# you will pass as the first argument to func_warning from your own\n# code.\nwarning_categories=\n\n# By default, display warnings according to 'opt_warning_types'.  Set\n# 'warning_func'  to ':' to elide all warnings, or func_fatal_error to\n# treat the next displayed warning as a fatal error.\nwarning_func=func_warn_and_continue\n\n# Set to 'all' to display all warnings, 'none' to suppress all\n# warnings, or a space delimited list of some subset of\n# 'warning_categories' to display only the listed warnings.\nopt_warning_types=all\n\n\n## -------------------- ##\n## Resource management. ##\n## -------------------- ##\n\n# This section contains definitions for functions that each ensure a\n# particular resource (a file, or a non-empty configuration variable for\n# example) is available, and if appropriate to extract default values\n# from pertinent package files. Call them using their associated\n# 'require_*' variable to ensure that they are executed, at most, once.\n#\n# It's entirely deliberate that calling these functions can set\n# variables that don't obey the namespace limitations obeyed by the rest\n# of this file, in order that that they be as useful as possible to\n# callers.\n\n\n# require_term_colors\n# -------------------\n# Allow display of bold text on terminals that support it.\nrequire_term_colors=func_require_term_colors\nfunc_require_term_colors ()\n{\n    $debug_cmd\n\n    test -t 1 && {\n      # COLORTERM and USE_ANSI_COLORS environment variables take\n      # precedence, because most terminfo databases neglect to describe\n      # whether color sequences are supported.\n      test -n \"${COLORTERM+set}\" && : ${USE_ANSI_COLORS=\"1\"}\n\n      if test 1 = \"$USE_ANSI_COLORS\"; then\n        # Standard ANSI escape sequences\n        tc_reset='\u001b[0m'\n        tc_bold='\u001b[1m';   tc_standout='\u001b[7m'\n        tc_red='\u001b[31m';   tc_green='\u001b[32m'\n        tc_blue='\u001b[34m';  tc_cyan='\u001b[36m'\n      
else\n        # Otherwise trust the terminfo database after all.\n        test -n \"`tput sgr0 2>/dev/null`\" && {\n          tc_reset=`tput sgr0`\n          test -n \"`tput bold 2>/dev/null`\" && tc_bold=`tput bold`\n          tc_standout=$tc_bold\n          test -n \"`tput smso 2>/dev/null`\" && tc_standout=`tput smso`\n          test -n \"`tput setaf 1 2>/dev/null`\" && tc_red=`tput setaf 1`\n          test -n \"`tput setaf 2 2>/dev/null`\" && tc_green=`tput setaf 2`\n          test -n \"`tput setaf 4 2>/dev/null`\" && tc_blue=`tput setaf 4`\n          test -n \"`tput setaf 5 2>/dev/null`\" && tc_cyan=`tput setaf 5`\n        }\n      fi\n    }\n\n    require_term_colors=:\n}\n\n\n## ----------------- ##\n## Function library. ##\n## ----------------- ##\n\n# This section contains a variety of useful functions to call in your\n# scripts. Take note of the portable wrappers for features provided by\n# some modern shells, which will fall back to slower equivalents on\n# less featureful shells.\n\n\n# func_append VAR VALUE\n# ---------------------\n# Append VALUE onto the existing contents of VAR.\n\n  # We should try to minimise forks, especially on Windows where they are\n  # unreasonably slow, so skip the feature probes when bash or zsh are\n  # being used:\n  if test set = \"${BASH_VERSION+set}${ZSH_VERSION+set}\"; then\n    : ${_G_HAVE_ARITH_OP=\"yes\"}\n    : ${_G_HAVE_XSI_OPS=\"yes\"}\n    # The += operator was introduced in bash 3.1\n    case $BASH_VERSION in\n      [12].* | 3.0 | 3.0*) ;;\n      *)\n        : ${_G_HAVE_PLUSEQ_OP=\"yes\"}\n        ;;\n    esac\n  fi\n\n  # _G_HAVE_PLUSEQ_OP\n  # Can be empty, in which case the shell is probed, \"yes\" if += is\n  # useable or anything else if it does not work.\n  test -z \"$_G_HAVE_PLUSEQ_OP\" \\\n    && (eval 'x=a; x+=\" b\"; test \"a b\" = \"$x\"') 2>/dev/null \\\n    && _G_HAVE_PLUSEQ_OP=yes\n\nif test yes = \"$_G_HAVE_PLUSEQ_OP\"\nthen\n  # This is an XSI compatible shell, allowing a faster 
implementation...\n  eval 'func_append ()\n  {\n    $debug_cmd\n\n    eval \"$1+=\\$2\"\n  }'\nelse\n  # ...otherwise fall back to using expr, which is often a shell builtin.\n  func_append ()\n  {\n    $debug_cmd\n\n    eval \"$1=\\$$1\\$2\"\n  }\nfi\n\n\n# func_append_quoted VAR VALUE\n# ----------------------------\n# Quote VALUE and append to the end of shell variable VAR, separated\n# by a space.\nif test yes = \"$_G_HAVE_PLUSEQ_OP\"; then\n  eval 'func_append_quoted ()\n  {\n    $debug_cmd\n\n    func_quote_for_eval \"$2\"\n    eval \"$1+=\\\\ \\$func_quote_for_eval_result\"\n  }'\nelse\n  func_append_quoted ()\n  {\n    $debug_cmd\n\n    func_quote_for_eval \"$2\"\n    eval \"$1=\\$$1\\\\ \\$func_quote_for_eval_result\"\n  }\nfi\n\n\n# func_append_uniq VAR VALUE\n# --------------------------\n# Append unique VALUE onto the existing contents of VAR, assuming\n# entries are delimited by the first character of VALUE.  For example:\n#\n#   func_append_uniq options \" --another-option option-argument\"\n#\n# will only append to $options if \" --another-option option-argument \"\n# is not already present somewhere in $options already (note spaces at\n# each end implied by leading space in second argument).\nfunc_append_uniq ()\n{\n    $debug_cmd\n\n    eval _G_current_value='`$ECHO $'$1'`'\n    _G_delim=`expr \"$2\" : '\\(.\\)'`\n\n    case $_G_delim$_G_current_value$_G_delim in\n      *\"$2$_G_delim\"*) ;;\n      *) func_append \"$@\" ;;\n    esac\n}\n\n\n# func_arith TERM...\n# ------------------\n# Set func_arith_result to the result of evaluating TERMs.\n  test -z \"$_G_HAVE_ARITH_OP\" \\\n    && (eval 'test 2 = $(( 1 + 1 ))') 2>/dev/null \\\n    && _G_HAVE_ARITH_OP=yes\n\nif test yes = \"$_G_HAVE_ARITH_OP\"; then\n  eval 'func_arith ()\n  {\n    $debug_cmd\n\n    func_arith_result=$(( $* ))\n  }'\nelse\n  func_arith ()\n  {\n    $debug_cmd\n\n    func_arith_result=`expr \"$@\"`\n  }\nfi\n\n\n# func_basename FILE\n# ------------------\n# Set 
func_basename_result to FILE with everything up to and including\n# the last / stripped.\nif test yes = \"$_G_HAVE_XSI_OPS\"; then\n  # If this shell supports suffix pattern removal, then use it to avoid\n  # forking. Hide the definitions single quotes in case the shell chokes\n  # on unsupported syntax...\n  _b='func_basename_result=${1##*/}'\n  _d='case $1 in\n        */*) func_dirname_result=${1%/*}$2 ;;\n        *  ) func_dirname_result=$3        ;;\n      esac'\n\nelse\n  # ...otherwise fall back to using sed.\n  _b='func_basename_result=`$ECHO \"$1\" |$SED \"$sed_basename\"`'\n  _d='func_dirname_result=`$ECHO \"$1\"  |$SED \"$sed_dirname\"`\n      if test \"X$func_dirname_result\" = \"X$1\"; then\n        func_dirname_result=$3\n      else\n        func_append func_dirname_result \"$2\"\n      fi'\nfi\n\neval 'func_basename ()\n{\n    $debug_cmd\n\n    '\"$_b\"'\n}'\n\n\n# func_dirname FILE APPEND NONDIR_REPLACEMENT\n# -------------------------------------------\n# Compute the dirname of FILE.  If nonempty, add APPEND to the result,\n# otherwise set result to NONDIR_REPLACEMENT.\neval 'func_dirname ()\n{\n    $debug_cmd\n\n    '\"$_d\"'\n}'\n\n\n# func_dirname_and_basename FILE APPEND NONDIR_REPLACEMENT\n# --------------------------------------------------------\n# Perform func_basename and func_dirname in a single function\n# call:\n#   dirname:  Compute the dirname of FILE.  
If nonempty,\n#             add APPEND to the result, otherwise set result\n#             to NONDIR_REPLACEMENT.\n#             value returned in \"$func_dirname_result\"\n#   basename: Compute filename of FILE.\n#             value retuned in \"$func_basename_result\"\n# For efficiency, we do not delegate to the functions above but instead\n# duplicate the functionality here.\neval 'func_dirname_and_basename ()\n{\n    $debug_cmd\n\n    '\"$_b\"'\n    '\"$_d\"'\n}'\n\n\n# func_echo ARG...\n# ----------------\n# Echo program name prefixed message.\nfunc_echo ()\n{\n    $debug_cmd\n\n    _G_message=$*\n\n    func_echo_IFS=$IFS\n    IFS=$nl\n    for _G_line in $_G_message; do\n      IFS=$func_echo_IFS\n      $ECHO \"$progname: $_G_line\"\n    done\n    IFS=$func_echo_IFS\n}\n\n\n# func_echo_all ARG...\n# --------------------\n# Invoke $ECHO with all args, space-separated.\nfunc_echo_all ()\n{\n    $ECHO \"$*\"\n}\n\n\n# func_echo_infix_1 INFIX ARG...\n# ------------------------------\n# Echo program name, followed by INFIX on the first line, with any\n# additional lines not showing INFIX.\nfunc_echo_infix_1 ()\n{\n    $debug_cmd\n\n    $require_term_colors\n\n    _G_infix=$1; shift\n    _G_indent=$_G_infix\n    _G_prefix=\"$progname: $_G_infix: \"\n    _G_message=$*\n\n    # Strip color escape sequences before counting printable length\n    for _G_tc in \"$tc_reset\" \"$tc_bold\" \"$tc_standout\" \"$tc_red\" \"$tc_green\" \"$tc_blue\" \"$tc_cyan\"\n    do\n      test -n \"$_G_tc\" && {\n        _G_esc_tc=`$ECHO \"$_G_tc\" | $SED \"$sed_make_literal_regex\"`\n        _G_indent=`$ECHO \"$_G_indent\" | $SED \"s|$_G_esc_tc||g\"`\n      }\n    done\n    _G_indent=\"$progname: \"`echo \"$_G_indent\" | $SED 's|.| |g'`\"  \" ## exclude from sc_prohibit_nested_quotes\n\n    func_echo_infix_1_IFS=$IFS\n    IFS=$nl\n    for _G_line in $_G_message; do\n      IFS=$func_echo_infix_1_IFS\n      $ECHO \"$_G_prefix$tc_bold$_G_line$tc_reset\" >&2\n      _G_prefix=$_G_indent\n    
done\n    IFS=$func_echo_infix_1_IFS\n}\n\n\n# func_error ARG...\n# -----------------\n# Echo program name prefixed message to standard error.\nfunc_error ()\n{\n    $debug_cmd\n\n    $require_term_colors\n\n    func_echo_infix_1 \"  $tc_standout${tc_red}error$tc_reset\" \"$*\" >&2\n}\n\n\n# func_fatal_error ARG...\n# -----------------------\n# Echo program name prefixed message to standard error, and exit.\nfunc_fatal_error ()\n{\n    $debug_cmd\n\n    func_error \"$*\"\n    exit $EXIT_FAILURE\n}\n\n\n# func_grep EXPRESSION FILENAME\n# -----------------------------\n# Check whether EXPRESSION matches any line of FILENAME, without output.\nfunc_grep ()\n{\n    $debug_cmd\n\n    $GREP \"$1\" \"$2\" >/dev/null 2>&1\n}\n\n\n# func_len STRING\n# ---------------\n# Set func_len_result to the length of STRING. STRING may not\n# start with a hyphen.\n  test -z \"$_G_HAVE_XSI_OPS\" \\\n    && (eval 'x=a/b/c;\n      test 5aa/bb/cc = \"${#x}${x%%/*}${x%/*}${x#*/}${x##*/}\"') 2>/dev/null \\\n    && _G_HAVE_XSI_OPS=yes\n\nif test yes = \"$_G_HAVE_XSI_OPS\"; then\n  eval 'func_len ()\n  {\n    $debug_cmd\n\n    func_len_result=${#1}\n  }'\nelse\n  func_len ()\n  {\n    $debug_cmd\n\n    func_len_result=`expr \"$1\" : \".*\" 2>/dev/null || echo $max_cmd_len`\n  }\nfi\n\n\n# func_mkdir_p DIRECTORY-PATH\n# ---------------------------\n# Make sure the entire path to DIRECTORY-PATH is available.\nfunc_mkdir_p ()\n{\n    $debug_cmd\n\n    _G_directory_path=$1\n    _G_dir_list=\n\n    if test -n \"$_G_directory_path\" && test : != \"$opt_dry_run\"; then\n\n      # Protect directory names starting with '-'\n      case $_G_directory_path in\n        -*) _G_directory_path=./$_G_directory_path ;;\n      esac\n\n      # While some portion of DIR does not yet exist...\n      while test ! -d \"$_G_directory_path\"; do\n        # ...make a list in topmost first order.  
Use a colon delimited\n\t# list incase some portion of path contains whitespace.\n        _G_dir_list=$_G_directory_path:$_G_dir_list\n\n        # If the last portion added has no slash in it, the list is done\n        case $_G_directory_path in */*) ;; *) break ;; esac\n\n        # ...otherwise throw away the child directory and loop\n        _G_directory_path=`$ECHO \"$_G_directory_path\" | $SED -e \"$sed_dirname\"`\n      done\n      _G_dir_list=`$ECHO \"$_G_dir_list\" | $SED 's|:*$||'`\n\n      func_mkdir_p_IFS=$IFS; IFS=:\n      for _G_dir in $_G_dir_list; do\n\tIFS=$func_mkdir_p_IFS\n        # mkdir can fail with a 'File exist' error if two processes\n        # try to create one of the directories concurrently.  Don't\n        # stop in that case!\n        $MKDIR \"$_G_dir\" 2>/dev/null || :\n      done\n      IFS=$func_mkdir_p_IFS\n\n      # Bail out if we (or some other process) failed to create a directory.\n      test -d \"$_G_directory_path\" || \\\n        func_fatal_error \"Failed to create '$1'\"\n    fi\n}\n\n\n# func_mktempdir [BASENAME]\n# -------------------------\n# Make a temporary directory that won't clash with other running\n# libtool processes, and avoids race conditions if possible.  If\n# given, BASENAME is the basename for that directory.\nfunc_mktempdir ()\n{\n    $debug_cmd\n\n    _G_template=${TMPDIR-/tmp}/${1-$progname}\n\n    if test : = \"$opt_dry_run\"; then\n      # Return a directory name, but don't create it in dry-run mode\n      _G_tmpdir=$_G_template-$$\n    else\n\n      # If mktemp works, use that first and foremost\n      _G_tmpdir=`mktemp -d \"$_G_template-XXXXXXXX\" 2>/dev/null`\n\n      if test ! 
-d \"$_G_tmpdir\"; then\n        # Failing that, at least try and use $RANDOM to avoid a race\n        _G_tmpdir=$_G_template-${RANDOM-0}$$\n\n        func_mktempdir_umask=`umask`\n        umask 0077\n        $MKDIR \"$_G_tmpdir\"\n        umask $func_mktempdir_umask\n      fi\n\n      # If we're not in dry-run mode, bomb out on failure\n      test -d \"$_G_tmpdir\" || \\\n        func_fatal_error \"cannot create temporary directory '$_G_tmpdir'\"\n    fi\n\n    $ECHO \"$_G_tmpdir\"\n}\n\n\n# func_normal_abspath PATH\n# ------------------------\n# Remove doubled-up and trailing slashes, \".\" path components,\n# and cancel out any \"..\" path components in PATH after making\n# it an absolute path.\nfunc_normal_abspath ()\n{\n    $debug_cmd\n\n    # These SED scripts presuppose an absolute path with a trailing slash.\n    _G_pathcar='s|^/\\([^/]*\\).*$|\\1|'\n    _G_pathcdr='s|^/[^/]*||'\n    _G_removedotparts=':dotsl\n\t\ts|/\\./|/|g\n\t\tt dotsl\n\t\ts|/\\.$|/|'\n    _G_collapseslashes='s|/\\{1,\\}|/|g'\n    _G_finalslash='s|/*$|/|'\n\n    # Start from root dir and reassemble the path.\n    func_normal_abspath_result=\n    func_normal_abspath_tpath=$1\n    func_normal_abspath_altnamespace=\n    case $func_normal_abspath_tpath in\n      \"\")\n        # Empty path, that just means $cwd.\n        func_stripname '' '/' \"`pwd`\"\n        func_normal_abspath_result=$func_stripname_result\n        return\n        ;;\n      # The next three entries are used to spot a run of precisely\n      # two leading slashes without using negated character classes;\n      # we take advantage of case's first-match behaviour.\n      ///*)\n        # Unusual form of absolute path, do nothing.\n        ;;\n      //*)\n        # Not necessarily an ordinary path; POSIX reserves leading '//'\n        # and for example Cygwin uses it to access remote file shares\n        # over CIFS/SMB, so we conserve a leading double slash if found.\n        func_normal_abspath_altnamespace=/\n        
;;\n      /*)\n        # Absolute path, do nothing.\n        ;;\n      *)\n        # Relative path, prepend $cwd.\n        func_normal_abspath_tpath=`pwd`/$func_normal_abspath_tpath\n        ;;\n    esac\n\n    # Cancel out all the simple stuff to save iterations.  We also want\n    # the path to end with a slash for ease of parsing, so make sure\n    # there is one (and only one) here.\n    func_normal_abspath_tpath=`$ECHO \"$func_normal_abspath_tpath\" | $SED \\\n          -e \"$_G_removedotparts\" -e \"$_G_collapseslashes\" -e \"$_G_finalslash\"`\n    while :; do\n      # Processed it all yet?\n      if test / = \"$func_normal_abspath_tpath\"; then\n        # If we ascended to the root using \"..\" the result may be empty now.\n        if test -z \"$func_normal_abspath_result\"; then\n          func_normal_abspath_result=/\n        fi\n        break\n      fi\n      func_normal_abspath_tcomponent=`$ECHO \"$func_normal_abspath_tpath\" | $SED \\\n          -e \"$_G_pathcar\"`\n      func_normal_abspath_tpath=`$ECHO \"$func_normal_abspath_tpath\" | $SED \\\n          -e \"$_G_pathcdr\"`\n      # Figure out what to do with it\n      case $func_normal_abspath_tcomponent in\n        \"\")\n          # Trailing empty path component, ignore it.\n          ;;\n        ..)\n          # Parent dir; strip last assembled component from result.\n          func_dirname \"$func_normal_abspath_result\"\n          func_normal_abspath_result=$func_dirname_result\n          ;;\n        *)\n          # Actual path component, append it.\n          func_append func_normal_abspath_result \"/$func_normal_abspath_tcomponent\"\n          ;;\n      esac\n    done\n    # Restore leading double-slash if one was found on entry.\n    func_normal_abspath_result=$func_normal_abspath_altnamespace$func_normal_abspath_result\n}\n\n\n# func_notquiet ARG...\n# --------------------\n# Echo program name prefixed message only when not in quiet mode.\nfunc_notquiet ()\n{\n    $debug_cmd\n\n    $opt_quiet 
|| func_echo ${1+\"$@\"}\n\n    # A bug in bash halts the script if the last line of a function\n    # fails when set -e is in force, so we need another command to\n    # work around that:\n    :\n}\n\n\n# func_relative_path SRCDIR DSTDIR\n# --------------------------------\n# Set func_relative_path_result to the relative path from SRCDIR to DSTDIR.\nfunc_relative_path ()\n{\n    $debug_cmd\n\n    func_relative_path_result=\n    func_normal_abspath \"$1\"\n    func_relative_path_tlibdir=$func_normal_abspath_result\n    func_normal_abspath \"$2\"\n    func_relative_path_tbindir=$func_normal_abspath_result\n\n    # Ascend the tree starting from libdir\n    while :; do\n      # check if we have found a prefix of bindir\n      case $func_relative_path_tbindir in\n        $func_relative_path_tlibdir)\n          # found an exact match\n          func_relative_path_tcancelled=\n          break\n          ;;\n        $func_relative_path_tlibdir*)\n          # found a matching prefix\n          func_stripname \"$func_relative_path_tlibdir\" '' \"$func_relative_path_tbindir\"\n          func_relative_path_tcancelled=$func_stripname_result\n          if test -z \"$func_relative_path_result\"; then\n            func_relative_path_result=.\n          fi\n          break\n          ;;\n        *)\n          func_dirname $func_relative_path_tlibdir\n          func_relative_path_tlibdir=$func_dirname_result\n          if test -z \"$func_relative_path_tlibdir\"; then\n            # Have to descend all the way to the root!\n            func_relative_path_result=../$func_relative_path_result\n            func_relative_path_tcancelled=$func_relative_path_tbindir\n            break\n          fi\n          func_relative_path_result=../$func_relative_path_result\n          ;;\n      esac\n    done\n\n    # Now calculate path; take care to avoid doubling-up slashes.\n    func_stripname '' '/' \"$func_relative_path_result\"\n    func_relative_path_result=$func_stripname_result\n    
func_stripname '/' '/' \"$func_relative_path_tcancelled\"\n    if test -n \"$func_stripname_result\"; then\n      func_append func_relative_path_result \"/$func_stripname_result\"\n    fi\n\n    # Normalisation. If bindir is libdir, return '.' else relative path.\n    if test -n \"$func_relative_path_result\"; then\n      func_stripname './' '' \"$func_relative_path_result\"\n      func_relative_path_result=$func_stripname_result\n    fi\n\n    test -n \"$func_relative_path_result\" || func_relative_path_result=.\n\n    :\n}\n\n\n# func_quote_for_eval ARG...\n# --------------------------\n# Aesthetically quote ARGs to be evaled later.\n# This function returns two values:\n#   i) func_quote_for_eval_result\n#      double-quoted, suitable for a subsequent eval\n#  ii) func_quote_for_eval_unquoted_result\n#      has all characters that are still active within double\n#      quotes backslashified.\nfunc_quote_for_eval ()\n{\n    $debug_cmd\n\n    func_quote_for_eval_unquoted_result=\n    func_quote_for_eval_result=\n    while test 0 -lt $#; do\n      case $1 in\n        *[\\\\\\`\\\"\\$]*)\n\t  _G_unquoted_arg=`printf '%s\\n' \"$1\" |$SED \"$sed_quote_subst\"` ;;\n        *)\n          _G_unquoted_arg=$1 ;;\n      esac\n      if test -n \"$func_quote_for_eval_unquoted_result\"; then\n\tfunc_append func_quote_for_eval_unquoted_result \" $_G_unquoted_arg\"\n      else\n        func_append func_quote_for_eval_unquoted_result \"$_G_unquoted_arg\"\n      fi\n\n      case $_G_unquoted_arg in\n        # Double-quote args containing shell metacharacters to delay\n        # word splitting, command substitution and variable expansion\n        # for a subsequent eval.\n        # Many Bourne shells cannot handle close brackets correctly\n        # in scan sets, so we specify it separately.\n        *[\\[\\~\\#\\^\\&\\*\\(\\)\\{\\}\\|\\;\\<\\>\\?\\'\\ \\\t]*|*]*|\"\")\n          _G_quoted_arg=\\\"$_G_unquoted_arg\\\"\n          ;;\n        *)\n          
_G_quoted_arg=$_G_unquoted_arg\n\t  ;;\n      esac\n\n      if test -n \"$func_quote_for_eval_result\"; then\n\tfunc_append func_quote_for_eval_result \" $_G_quoted_arg\"\n      else\n        func_append func_quote_for_eval_result \"$_G_quoted_arg\"\n      fi\n      shift\n    done\n}\n\n\n# func_quote_for_expand ARG\n# -------------------------\n# Aesthetically quote ARG to be evaled later; same as above,\n# but do not quote variable references.\nfunc_quote_for_expand ()\n{\n    $debug_cmd\n\n    case $1 in\n      *[\\\\\\`\\\"]*)\n\t_G_arg=`$ECHO \"$1\" | $SED \\\n\t    -e \"$sed_double_quote_subst\" -e \"$sed_double_backslash\"` ;;\n      *)\n        _G_arg=$1 ;;\n    esac\n\n    case $_G_arg in\n      # Double-quote args containing shell metacharacters to delay\n      # word splitting and command substitution for a subsequent eval.\n      # Many Bourne shells cannot handle close brackets correctly\n      # in scan sets, so we specify it separately.\n      *[\\[\\~\\#\\^\\&\\*\\(\\)\\{\\}\\|\\;\\<\\>\\?\\'\\ \\\t]*|*]*|\"\")\n        _G_arg=\\\"$_G_arg\\\"\n        ;;\n    esac\n\n    func_quote_for_expand_result=$_G_arg\n}\n\n\n# func_stripname PREFIX SUFFIX NAME\n# ---------------------------------\n# strip PREFIX and SUFFIX from NAME, and store in func_stripname_result.\n# PREFIX and SUFFIX must not contain globbing or regex special\n# characters, hashes, percent signs, but SUFFIX may contain a leading\n# dot (in which case that matches only a dot).\nif test yes = \"$_G_HAVE_XSI_OPS\"; then\n  eval 'func_stripname ()\n  {\n    $debug_cmd\n\n    # pdksh 5.2.14 does not do ${X%$Y} correctly if both X and Y are\n    # positional parameters, so assign one to ordinary variable first.\n    func_stripname_result=$3\n    func_stripname_result=${func_stripname_result#\"$1\"}\n    func_stripname_result=${func_stripname_result%\"$2\"}\n  }'\nelse\n  func_stripname ()\n  {\n    $debug_cmd\n\n    case $2 in\n      .*) func_stripname_result=`$ECHO \"$3\" | $SED -e 
\"s%^$1%%\" -e \"s%\\\\\\\\$2\\$%%\"`;;\n      *)  func_stripname_result=`$ECHO \"$3\" | $SED -e \"s%^$1%%\" -e \"s%$2\\$%%\"`;;\n    esac\n  }\nfi\n\n\n# func_show_eval CMD [FAIL_EXP]\n# -----------------------------\n# Unless opt_quiet is true, then output CMD.  Then, if opt_dryrun is\n# not true, evaluate CMD.  If the evaluation of CMD fails, and FAIL_EXP\n# is given, then evaluate it.\nfunc_show_eval ()\n{\n    $debug_cmd\n\n    _G_cmd=$1\n    _G_fail_exp=${2-':'}\n\n    func_quote_for_expand \"$_G_cmd\"\n    eval \"func_notquiet $func_quote_for_expand_result\"\n\n    $opt_dry_run || {\n      eval \"$_G_cmd\"\n      _G_status=$?\n      if test 0 -ne \"$_G_status\"; then\n\teval \"(exit $_G_status); $_G_fail_exp\"\n      fi\n    }\n}\n\n\n# func_show_eval_locale CMD [FAIL_EXP]\n# ------------------------------------\n# Unless opt_quiet is true, then output CMD.  Then, if opt_dryrun is\n# not true, evaluate CMD.  If the evaluation of CMD fails, and FAIL_EXP\n# is given, then evaluate it.  Use the saved locale for evaluation.\nfunc_show_eval_locale ()\n{\n    $debug_cmd\n\n    _G_cmd=$1\n    _G_fail_exp=${2-':'}\n\n    $opt_quiet || {\n      func_quote_for_expand \"$_G_cmd\"\n      eval \"func_echo $func_quote_for_expand_result\"\n    }\n\n    $opt_dry_run || {\n      eval \"$_G_user_locale\n\t    $_G_cmd\"\n      _G_status=$?\n      eval \"$_G_safe_locale\"\n      if test 0 -ne \"$_G_status\"; then\n\teval \"(exit $_G_status); $_G_fail_exp\"\n      fi\n    }\n}\n\n\n# func_tr_sh\n# ----------\n# Turn $1 into a string suitable for a shell variable name.\n# Result is stored in $func_tr_sh_result.  All characters\n# not in the set a-zA-Z0-9_ are replaced with '_'. 
Further,\n# if $1 begins with a digit, a '_' is prepended as well.\nfunc_tr_sh ()\n{\n    $debug_cmd\n\n    case $1 in\n    [0-9]* | *[!a-zA-Z0-9_]*)\n      func_tr_sh_result=`$ECHO \"$1\" | $SED -e 's/^\\([0-9]\\)/_\\1/' -e 's/[^a-zA-Z0-9_]/_/g'`\n      ;;\n    * )\n      func_tr_sh_result=$1\n      ;;\n    esac\n}\n\n\n# func_verbose ARG...\n# -------------------\n# Echo program name prefixed message in verbose mode only.\nfunc_verbose ()\n{\n    $debug_cmd\n\n    $opt_verbose && func_echo \"$*\"\n\n    :\n}\n\n\n# func_warn_and_continue ARG...\n# -----------------------------\n# Echo program name prefixed warning message to standard error.\nfunc_warn_and_continue ()\n{\n    $debug_cmd\n\n    $require_term_colors\n\n    func_echo_infix_1 \"${tc_red}warning$tc_reset\" \"$*\" >&2\n}\n\n\n# func_warning CATEGORY ARG...\n# ----------------------------\n# Echo program name prefixed warning message to standard error. Warning\n# messages can be filtered according to CATEGORY, where this function\n# elides messages where CATEGORY is not listed in the global variable\n# 'opt_warning_types'.\nfunc_warning ()\n{\n    $debug_cmd\n\n    # CATEGORY must be in the warning_categories list!\n    case \" $warning_categories \" in\n      *\" $1 \"*) ;;\n      *) func_internal_error \"invalid warning category '$1'\" ;;\n    esac\n\n    _G_category=$1\n    shift\n\n    case \" $opt_warning_types \" in\n      *\" $_G_category \"*) $warning_func ${1+\"$@\"} ;;\n    esac\n}\n\n\n# func_sort_ver VER1 VER2\n# -----------------------\n# 'sort -V' is not generally available.\n# Note this deviates from the version comparison in automake\n# in that it treats 1.5 < 1.5.0, and treats 1.4.4a < 1.4-p3a\n# but this should suffice as we won't be specifying old\n# version formats or redundant trailing .0 in bootstrap.conf.\n# If we did want full compatibility then we should probably\n# use m4_version_compare from autoconf.\nfunc_sort_ver ()\n{\n    $debug_cmd\n\n    printf '%s\\n%s\\n' \"$1\" \"$2\" 
\\\n      | sort -t. -k 1,1n -k 2,2n -k 3,3n -k 4,4n -k 5,5n -k 6,6n -k 7,7n -k 8,8n -k 9,9n\n}\n\n# func_lt_ver PREV CURR\n# ---------------------\n# Return true if PREV and CURR are in the correct order according to\n# func_sort_ver, otherwise false.  Use it like this:\n#\n#  func_lt_ver \"$prev_ver\" \"$proposed_ver\" || func_fatal_error \"...\"\nfunc_lt_ver ()\n{\n    $debug_cmd\n\n    test \"x$1\" = x`func_sort_ver \"$1\" \"$2\" | $SED 1q`\n}\n\n\n# Local variables:\n# mode: shell-script\n# sh-indentation: 2\n# eval: (add-hook 'before-save-hook 'time-stamp)\n# time-stamp-pattern: \"10/scriptversion=%:y-%02m-%02d.%02H; # UTC\"\n# time-stamp-time-zone: \"UTC\"\n# End:\n#! /bin/sh\n\n# Set a version string for this script.\nscriptversion=2014-01-07.03; # UTC\n\n# A portable, pluggable option parser for Bourne shell.\n# Written by Gary V. Vaughan, 2010\n\n# Copyright (C) 2010-2015 Free Software Foundation, Inc.\n# This is free software; see the source for copying conditions.  There is NO\n# warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\n# This program is free software: you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation, either version 3 of the License, or\n# (at your option) any later version.\n\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n\n# You should have received a copy of the GNU General Public License\n# along with this program.  If not, see <http://www.gnu.org/licenses/>.\n\n# Please report bugs or propose patches to gary@gnu.org.\n\n\n## ------ ##\n## Usage. 
##\n## ------ ##\n\n# This file is a library for parsing options in your shell scripts along\n# with assorted other useful supporting features that you can make use\n# of too.\n#\n# For the simplest scripts you might need only:\n#\n#   #!/bin/sh\n#   . relative/path/to/funclib.sh\n#   . relative/path/to/options-parser\n#   scriptversion=1.0\n#   func_options ${1+\"$@\"}\n#   eval set dummy \"$func_options_result\"; shift\n#   ...rest of your script...\n#\n# In order for the '--version' option to work, you will need to have a\n# suitably formatted comment like the one at the top of this file\n# starting with '# Written by ' and ending with '# warranty; '.\n#\n# For '-h' and '--help' to work, you will also need a one line\n# description of your script's purpose in a comment directly above the\n# '# Written by ' line, like the one at the top of this file.\n#\n# The default options also support '--debug', which will turn on shell\n# execution tracing (see the comment above debug_cmd below for another\n# use), and '--verbose' and the func_verbose function to allow your script\n# to display verbose messages only when your user has specified\n# '--verbose'.\n#\n# After sourcing this file, you can plug processing for additional\n# options by amending the variables from the 'Configuration' section\n# below, and following the instructions in the 'Option parsing'\n# section further down.\n\n## -------------- ##\n## Configuration. ##\n## -------------- ##\n\n# You should override these variables in your script after sourcing this\n# file so that they reflect the customisations you have added to the\n# option parser.\n\n# The usage line for option parsing errors and the start of '-h' and\n# '--help' output messages. 
You can embed shell variables for delayed\n# expansion at the time the message is displayed, but you will need to\n# quote other shell meta-characters carefully to prevent them being\n# expanded when the contents are evaled.\nusage='$progpath [OPTION]...'\n\n# Short help message in response to '-h' and '--help'.  Add to this or\n# override it after sourcing this library to reflect the full set of\n# options your script accepts.\nusage_message=\"\\\n       --debug        enable verbose shell tracing\n   -W, --warnings=CATEGORY\n                      report the warnings falling in CATEGORY [all]\n   -v, --verbose      verbosely report processing\n       --version      print version information and exit\n   -h, --help         print short or long help message and exit\n\"\n\n# Additional text appended to 'usage_message' in response to '--help'.\nlong_help_message=\"\nWarning categories include:\n       'all'          show all warnings\n       'none'         turn off all the warnings\n       'error'        warnings are treated as fatal errors\"\n\n# Help message printed before fatal option parsing errors.\nfatal_help=\"Try '\\$progname --help' for more information.\"\n\n\n\n## ------------------------- ##\n## Hook function management. ##\n## ------------------------- ##\n\n# This section contains functions for adding, removing, and running hooks\n# to the main code.  A hook is just a named list of functions that can\n# be run in order later on.\n\n# func_hookable FUNC_NAME\n# -----------------------\n# Declare that FUNC_NAME will run hooks added with\n# 'func_add_hook FUNC_NAME ...'.\nfunc_hookable ()\n{\n    $debug_cmd\n\n    func_append hookable_fns \" $1\"\n}\n\n\n# func_add_hook FUNC_NAME HOOK_FUNC\n# ---------------------------------\n# Request that FUNC_NAME call HOOK_FUNC before it returns.  
FUNC_NAME must\n# first have been declared \"hookable\" by a call to 'func_hookable'.\nfunc_add_hook ()\n{\n    $debug_cmd\n\n    case \" $hookable_fns \" in\n      *\" $1 \"*) ;;\n      *) func_fatal_error \"'$1' does not accept hook functions.\" ;;\n    esac\n\n    eval func_append ${1}_hooks '\" $2\"'\n}\n\n\n# func_remove_hook FUNC_NAME HOOK_FUNC\n# ------------------------------------\n# Remove HOOK_FUNC from the list of functions called by FUNC_NAME.\nfunc_remove_hook ()\n{\n    $debug_cmd\n\n    eval ${1}_hooks='`$ECHO \"\\$'$1'_hooks\" |$SED \"s| '$2'||\"`'\n}\n\n\n# func_run_hooks FUNC_NAME [ARG]...\n# ---------------------------------\n# Run all hook functions registered to FUNC_NAME.\n# It is assumed that the list of hook functions contains nothing more\n# than a whitespace-delimited list of legal shell function names, and\n# no effort is wasted trying to catch shell meta-characters or preserve\n# whitespace.\nfunc_run_hooks ()\n{\n    $debug_cmd\n\n    case \" $hookable_fns \" in\n      *\" $1 \"*) ;;\n      *) func_fatal_error \"'$1' does not support hook functions.\" ;;\n    esac\n\n    eval _G_hook_fns=\\$$1_hooks; shift\n\n    for _G_hook in $_G_hook_fns; do\n      eval $_G_hook '\"$@\"'\n\n      # store returned options list back into positional\n      # parameters for next 'cmd' execution.\n      eval _G_hook_result=\\$${_G_hook}_result\n      eval set dummy \"$_G_hook_result\"; shift\n    done\n\n    func_quote_for_eval ${1+\"$@\"}\n    func_run_hooks_result=$func_quote_for_eval_result\n}\n\n\n\n## --------------- ##\n## Option parsing. ##\n## --------------- ##\n\n# In order to add your own option parsing hooks, you must accept the\n# full positional parameter list in your hook function, remove any\n# options that you action, and then pass back the remaining unprocessed\n# options in '<hooked_function_name>_result', escaped suitably for\n# 'eval'.  
Like this:\n#\n#    my_options_prep ()\n#    {\n#        $debug_cmd\n#\n#        # Extend the existing usage message.\n#        usage_message=$usage_message'\n#      -s, --silent       don'\\''t print informational messages\n#    '\n#\n#        func_quote_for_eval ${1+\"$@\"}\n#        my_options_prep_result=$func_quote_for_eval_result\n#    }\n#    func_add_hook func_options_prep my_options_prep\n#\n#\n#    my_silent_option ()\n#    {\n#        $debug_cmd\n#\n#        # Note that for efficiency, we parse as many options as we can\n#        # recognise in a loop before passing the remainder back to the\n#        # caller on the first unrecognised argument we encounter.\n#        while test $# -gt 0; do\n#          opt=$1; shift\n#          case $opt in\n#            --silent|-s) opt_silent=: ;;\n#            # Separate non-argument short options:\n#            -s*)         func_split_short_opt \"$opt\"\n#                         set dummy \"$func_split_short_opt_name\" \\\n#                             \"-$func_split_short_opt_arg\" ${1+\"$@\"}\n#                         shift\n#                         ;;\n#            *)            set dummy \"$opt\" ${1+\"$@\"}; shift; break ;;\n#          esac\n#        done\n#\n#        func_quote_for_eval ${1+\"$@\"}\n#        my_silent_option_result=$func_quote_for_eval_result\n#    }\n#    func_add_hook func_parse_options my_silent_option\n#\n#\n#    my_option_validation ()\n#    {\n#        $debug_cmd\n#\n#        $opt_silent && $opt_verbose && func_fatal_help \"\\\n#    '--silent' and '--verbose' options are mutually exclusive.\"\n#\n#        func_quote_for_eval ${1+\"$@\"}\n#        my_option_validation_result=$func_quote_for_eval_result\n#    }\n#    func_add_hook func_validate_options my_option_validation\n#\n# You'll also need to manually amend $usage_message to reflect the extra\n# options you parse.  
It's preferable to append if you can, so that\n# multiple option parsing hooks can be added safely.\n\n\n# func_options [ARG]...\n# ---------------------\n# All the functions called inside func_options are hookable. See the\n# individual implementations for details.\nfunc_hookable func_options\nfunc_options ()\n{\n    $debug_cmd\n\n    func_options_prep ${1+\"$@\"}\n    eval func_parse_options \\\n        ${func_options_prep_result+\"$func_options_prep_result\"}\n    eval func_validate_options \\\n        ${func_parse_options_result+\"$func_parse_options_result\"}\n\n    eval func_run_hooks func_options \\\n        ${func_validate_options_result+\"$func_validate_options_result\"}\n\n    # save modified positional parameters for caller\n    func_options_result=$func_run_hooks_result\n}\n\n\n# func_options_prep [ARG]...\n# --------------------------\n# All initialisations required before starting the option parse loop.\n# Note that when calling hook functions, we pass through the list of\n# positional parameters.  
If a hook function modifies that list, and\n# needs to propagate that back to the rest of this script, then the complete\n# modified list must be put in 'func_run_hooks_result' before\n# returning.\nfunc_hookable func_options_prep\nfunc_options_prep ()\n{\n    $debug_cmd\n\n    # Option defaults:\n    opt_verbose=false\n    opt_warning_types=\n\n    func_run_hooks func_options_prep ${1+\"$@\"}\n\n    # save modified positional parameters for caller\n    func_options_prep_result=$func_run_hooks_result\n}\n\n\n# func_parse_options [ARG]...\n# ---------------------------\n# The main option parsing loop.\nfunc_hookable func_parse_options\nfunc_parse_options ()\n{\n    $debug_cmd\n\n    func_parse_options_result=\n\n    # this just eases exit handling\n    while test $# -gt 0; do\n      # Defer to hook functions for initial option parsing, so they\n      # get priority in the event of reusing an option name.\n      func_run_hooks func_parse_options ${1+\"$@\"}\n\n      # Adjust func_parse_options positional parameters to match\n      eval set dummy \"$func_run_hooks_result\"; shift\n\n      # Break out of the loop if we already parsed every option.\n      test $# -gt 0 || break\n\n      _G_opt=$1\n      shift\n      case $_G_opt in\n        --debug|-x)   debug_cmd='set -x'\n                      func_echo \"enabling shell trace mode\"\n                      $debug_cmd\n                      ;;\n\n        --no-warnings|--no-warning|--no-warn)\n                      set dummy --warnings none ${1+\"$@\"}\n                      shift\n\t\t      ;;\n\n        --warnings|--warning|-W)\n                      test $# = 0 && func_missing_arg $_G_opt && break\n                      case \" $warning_categories $1\" in\n                        *\" $1 \"*)\n                          # trailing space prevents matching last $1 above\n                          func_append_uniq opt_warning_types \" $1\"\n                          ;;\n                        *all)\n                          
opt_warning_types=$warning_categories\n                          ;;\n                        *none)\n                          opt_warning_types=none\n                          warning_func=:\n                          ;;\n                        *error)\n                          opt_warning_types=$warning_categories\n                          warning_func=func_fatal_error\n                          ;;\n                        *)\n                          func_fatal_error \\\n                             \"unsupported warning category: '$1'\"\n                          ;;\n                      esac\n                      shift\n                      ;;\n\n        --verbose|-v) opt_verbose=: ;;\n        --version)    func_version ;;\n        -\\?|-h)       func_usage ;;\n        --help)       func_help ;;\n\n\t# Separate optargs to long options (plugins may need this):\n\t--*=*)        func_split_equals \"$_G_opt\"\n\t              set dummy \"$func_split_equals_lhs\" \\\n                          \"$func_split_equals_rhs\" ${1+\"$@\"}\n                      shift\n                      ;;\n\n       # Separate optargs to short options:\n        -W*)\n                      func_split_short_opt \"$_G_opt\"\n                      set dummy \"$func_split_short_opt_name\" \\\n                          \"$func_split_short_opt_arg\" ${1+\"$@\"}\n                      shift\n                      ;;\n\n        # Separate non-argument short options:\n        -\\?*|-h*|-v*|-x*)\n                      func_split_short_opt \"$_G_opt\"\n                      set dummy \"$func_split_short_opt_name\" \\\n                          \"-$func_split_short_opt_arg\" ${1+\"$@\"}\n                      shift\n                      ;;\n\n        --)           break ;;\n        -*)           func_fatal_help \"unrecognised option: '$_G_opt'\" ;;\n        *)            set dummy \"$_G_opt\" ${1+\"$@\"}; shift; break ;;\n      esac\n    done\n\n    # save modified positional parameters for 
caller\n    func_quote_for_eval ${1+\"$@\"}\n    func_parse_options_result=$func_quote_for_eval_result\n}\n\n\n# func_validate_options [ARG]...\n# ------------------------------\n# Perform any sanity checks on option settings and/or unconsumed\n# arguments.\nfunc_hookable func_validate_options\nfunc_validate_options ()\n{\n    $debug_cmd\n\n    # Display all warnings if -W was not given.\n    test -n \"$opt_warning_types\" || opt_warning_types=\" $warning_categories\"\n\n    func_run_hooks func_validate_options ${1+\"$@\"}\n\n    # Bail if the options were screwed!\n    $exit_cmd $EXIT_FAILURE\n\n    # save modified positional parameters for caller\n    func_validate_options_result=$func_run_hooks_result\n}\n\n\n\n## ----------------- ##\n## Helper functions. ##\n## ----------------- ##\n\n# This section contains the helper functions used by the rest of the\n# hookable option parser framework in ascii-betical order.\n\n\n# func_fatal_help ARG...\n# ----------------------\n# Echo program name prefixed message to standard error, followed by\n# a help hint, and exit.\nfunc_fatal_help ()\n{\n    $debug_cmd\n\n    eval \\$ECHO \\\"\"Usage: $usage\"\\\"\n    eval \\$ECHO \\\"\"$fatal_help\"\\\"\n    func_error ${1+\"$@\"}\n    exit $EXIT_FAILURE\n}\n\n\n# func_help\n# ---------\n# Echo long help message to standard output and exit.\nfunc_help ()\n{\n    $debug_cmd\n\n    func_usage_message\n    $ECHO \"$long_help_message\"\n    exit 0\n}\n\n\n# func_missing_arg ARGNAME\n# ------------------------\n# Echo program name prefixed message to standard error and set global\n# exit_cmd.\nfunc_missing_arg ()\n{\n    $debug_cmd\n\n    func_error \"Missing argument for '$1'.\"\n    exit_cmd=exit\n}\n\n\n# func_split_equals STRING\n# ------------------------\n# Set func_split_equals_lhs and func_split_equals_rhs shell variables after\n# splitting STRING at the '=' sign.\ntest -z \"$_G_HAVE_XSI_OPS\" \\\n    && (eval 'x=a/b/c;\n      test 5aa/bb/cc = 
\"${#x}${x%%/*}${x%/*}${x#*/}${x##*/}\"') 2>/dev/null \\\n    && _G_HAVE_XSI_OPS=yes\n\nif test yes = \"$_G_HAVE_XSI_OPS\"\nthen\n  # This is an XSI compatible shell, allowing a faster implementation...\n  eval 'func_split_equals ()\n  {\n      $debug_cmd\n\n      func_split_equals_lhs=${1%%=*}\n      func_split_equals_rhs=${1#*=}\n      test \"x$func_split_equals_lhs\" = \"x$1\" \\\n        && func_split_equals_rhs=\n  }'\nelse\n  # ...otherwise fall back to using expr, which is often a shell builtin.\n  func_split_equals ()\n  {\n      $debug_cmd\n\n      func_split_equals_lhs=`expr \"x$1\" : 'x\\([^=]*\\)'`\n      func_split_equals_rhs=\n      test \"x$func_split_equals_lhs\" = \"x$1\" \\\n        || func_split_equals_rhs=`expr \"x$1\" : 'x[^=]*=\\(.*\\)$'`\n  }\nfi #func_split_equals\n\n\n# func_split_short_opt SHORTOPT\n# -----------------------------\n# Set func_split_short_opt_name and func_split_short_opt_arg shell\n# variables after splitting SHORTOPT after the 2nd character.\nif test yes = \"$_G_HAVE_XSI_OPS\"\nthen\n  # This is an XSI compatible shell, allowing a faster implementation...\n  eval 'func_split_short_opt ()\n  {\n      $debug_cmd\n\n      func_split_short_opt_arg=${1#??}\n      func_split_short_opt_name=${1%\"$func_split_short_opt_arg\"}\n  }'\nelse\n  # ...otherwise fall back to using expr, which is often a shell builtin.\n  func_split_short_opt ()\n  {\n      $debug_cmd\n\n      func_split_short_opt_name=`expr \"x$1\" : 'x-\\(.\\)'`\n      func_split_short_opt_arg=`expr \"x$1\" : 'x-.\\(.*\\)$'`\n  }\nfi #func_split_short_opt\n\n\n# func_usage\n# ----------\n# Echo short help message to standard output and exit.\nfunc_usage ()\n{\n    $debug_cmd\n\n    func_usage_message\n    $ECHO \"Run '$progname --help |${PAGER-more}' for full usage\"\n    exit 0\n}\n\n\n# func_usage_message\n# ------------------\n# Echo short help message to standard output.\nfunc_usage_message ()\n{\n    $debug_cmd\n\n    eval \\$ECHO \\\"\"Usage: $usage\"\\\"\n    
echo\n    $SED -n 's|^# ||\n        /^Written by/{\n          x;p;x\n        }\n\th\n\t/^Written by/q' < \"$progpath\"\n    echo\n    eval \\$ECHO \\\"\"$usage_message\"\\\"\n}\n\n\n# func_version\n# ------------\n# Echo version message to standard output and exit.\nfunc_version ()\n{\n    $debug_cmd\n\n    printf '%s\\n' \"$progname $scriptversion\"\n    $SED -n '\n        /(C)/!b go\n        :more\n        /\\./!{\n          N\n          s|\\n# | |\n          b more\n        }\n        :go\n        /^# Written by /,/# warranty; / {\n          s|^# ||\n          s|^# *$||\n          s|\\((C)\\)[ 0-9,-]*[ ,-]\\([1-9][0-9]* \\)|\\1 \\2|\n          p\n        }\n        /^# Written by / {\n          s|^# ||\n          p\n        }\n        /^warranty; /q' < \"$progpath\"\n\n    exit $?\n}\n\n\n# Local variables:\n# mode: shell-script\n# sh-indentation: 2\n# eval: (add-hook 'before-save-hook 'time-stamp)\n# time-stamp-pattern: \"10/scriptversion=%:y-%02m-%02d.%02H; # UTC\"\n# time-stamp-time-zone: \"UTC\"\n# End:\n\n# Set a version string.\nscriptversion='(GNU libtool) 2.4.6'\n\n\n# func_echo ARG...\n# ----------------\n# Libtool also displays the current mode in messages, so override\n# funclib.sh func_echo with this custom definition.\nfunc_echo ()\n{\n    $debug_cmd\n\n    _G_message=$*\n\n    func_echo_IFS=$IFS\n    IFS=$nl\n    for _G_line in $_G_message; do\n      IFS=$func_echo_IFS\n      $ECHO \"$progname${opt_mode+: $opt_mode}: $_G_line\"\n    done\n    IFS=$func_echo_IFS\n}\n\n\n# func_warning ARG...\n# -------------------\n# Libtool warnings are not categorized, so override funclib.sh\n# func_warning with this simpler definition.\nfunc_warning ()\n{\n    $debug_cmd\n\n    $warning_func ${1+\"$@\"}\n}\n\n\n## ---------------- ##\n## Options parsing. ##\n## ---------------- ##\n\n# Hook in the functions to make sure our own options are parsed during\n# the option parsing loop.\n\nusage='$progpath [OPTION]... 
[MODE-ARG]...'\n\n# Short help message in response to '-h'.\nusage_message=\"Options:\n       --config             show all configuration variables\n       --debug              enable verbose shell tracing\n   -n, --dry-run            display commands without modifying any files\n       --features           display basic configuration information and exit\n       --mode=MODE          use operation mode MODE\n       --no-warnings        equivalent to '-Wnone'\n       --preserve-dup-deps  don't remove duplicate dependency libraries\n       --quiet, --silent    don't print informational messages\n       --tag=TAG            use configuration variables from tag TAG\n   -v, --verbose            print more informational messages than default\n       --version            print version information\n   -W, --warnings=CATEGORY  report the warnings falling in CATEGORY [all]\n   -h, --help, --help-all   print short, long, or detailed help message\n\"\n\n# Additional text appended to 'usage_message' in response to '--help'.\nfunc_help ()\n{\n    $debug_cmd\n\n    func_usage_message\n    $ECHO \"$long_help_message\n\nMODE must be one of the following:\n\n       clean           remove files from the build directory\n       compile         compile a source file into a libtool object\n       execute         automatically set library path, then run a program\n       finish          complete the installation of libtool libraries\n       install         install libraries or executables\n       link            create a library or an executable\n       uninstall       remove libraries from an installed directory\n\nMODE-ARGS vary depending on the MODE.  
When passed as first option,\n'--mode=MODE' may be abbreviated as 'MODE' or a unique abbreviation of that.\nTry '$progname --help --mode=MODE' for a more detailed description of MODE.\n\nWhen reporting a bug, please describe a test case to reproduce it and\ninclude the following information:\n\n       host-triplet:   $host\n       shell:          $SHELL\n       compiler:       $LTCC\n       compiler flags: $LTCFLAGS\n       linker:         $LD (gnu? $with_gnu_ld)\n       version:        $progname (GNU libtool) 2.4.6\n       automake:       `($AUTOMAKE --version) 2>/dev/null |$SED 1q`\n       autoconf:       `($AUTOCONF --version) 2>/dev/null |$SED 1q`\n\nReport bugs to <bug-libtool@gnu.org>.\nGNU libtool home page: <http://www.gnu.org/s/libtool/>.\nGeneral help using GNU software: <http://www.gnu.org/gethelp/>.\"\n    exit 0\n}\n\n\n# func_lo2o OBJECT-NAME\n# ---------------------\n# Transform OBJECT-NAME from a '.lo' suffix to the platform specific\n# object suffix.\n\nlo2o=s/\\\\.lo\\$/.$objext/\no2lo=s/\\\\.$objext\\$/.lo/\n\nif test yes = \"$_G_HAVE_XSI_OPS\"; then\n  eval 'func_lo2o ()\n  {\n    case $1 in\n      *.lo) func_lo2o_result=${1%.lo}.$objext ;;\n      *   ) func_lo2o_result=$1               ;;\n    esac\n  }'\n\n  # func_xform LIBOBJ-OR-SOURCE\n  # ---------------------------\n  # Transform LIBOBJ-OR-SOURCE from a '.o' or '.c' (or otherwise)\n  # suffix to a '.lo' libtool-object suffix.\n  eval 'func_xform ()\n  {\n    func_xform_result=${1%.*}.lo\n  }'\nelse\n  # ...otherwise fall back to using sed.\n  func_lo2o ()\n  {\n    func_lo2o_result=`$ECHO \"$1\" | $SED \"$lo2o\"`\n  }\n\n  func_xform ()\n  {\n    func_xform_result=`$ECHO \"$1\" | $SED 's|\\.[^.]*$|.lo|'`\n  }\nfi\n\n\n# func_fatal_configuration ARG...\n# -------------------------------\n# Echo program name prefixed message to standard error, followed by\n# a configuration failure hint, and exit.\nfunc_fatal_configuration ()\n{\n    func__fatal_error ${1+\"$@\"} \\\n      \"See the 
$PACKAGE documentation for more information.\" \\\n      \"Fatal configuration error.\"\n}\n\n\n# func_config\n# -----------\n# Display the configuration for all the tags in this script.\nfunc_config ()\n{\n    re_begincf='^# ### BEGIN LIBTOOL'\n    re_endcf='^# ### END LIBTOOL'\n\n    # Default configuration.\n    $SED \"1,/$re_begincf CONFIG/d;/$re_endcf CONFIG/,\\$d\" < \"$progpath\"\n\n    # Now print the configurations for the tags.\n    for tagname in $taglist; do\n      $SED -n \"/$re_begincf TAG CONFIG: $tagname\\$/,/$re_endcf TAG CONFIG: $tagname\\$/p\" < \"$progpath\"\n    done\n\n    exit $?\n}\n\n\n# func_features\n# -------------\n# Display the features supported by this script.\nfunc_features ()\n{\n    echo \"host: $host\"\n    if test yes = \"$build_libtool_libs\"; then\n      echo \"enable shared libraries\"\n    else\n      echo \"disable shared libraries\"\n    fi\n    if test yes = \"$build_old_libs\"; then\n      echo \"enable static libraries\"\n    else\n      echo \"disable static libraries\"\n    fi\n\n    exit $?\n}\n\n\n# func_enable_tag TAGNAME\n# -----------------------\n# Verify that TAGNAME is valid, and either flag an error and exit, or\n# enable the TAGNAME tag.  We also add TAGNAME to the global $taglist\n# variable here.\nfunc_enable_tag ()\n{\n    # Global variable:\n    tagname=$1\n\n    re_begincf=\"^# ### BEGIN LIBTOOL TAG CONFIG: $tagname\\$\"\n    re_endcf=\"^# ### END LIBTOOL TAG CONFIG: $tagname\\$\"\n    sed_extractcf=/$re_begincf/,/$re_endcf/p\n\n    # Validate tagname.\n    case $tagname in\n      *[!-_A-Za-z0-9,/]*)\n        func_fatal_error \"invalid tag name: $tagname\"\n        ;;\n    esac\n\n    # Don't test for the \"default\" C tag, as we know it's\n    # there but not specially marked.\n    case $tagname in\n        CC) ;;\n    *)\n        if $GREP \"$re_begincf\" \"$progpath\" >/dev/null 2>&1; then\n\t  taglist=\"$taglist $tagname\"\n\n\t  # Evaluate the configuration.  
Be careful to quote the path\n\t  # and the sed script, to avoid splitting on whitespace, but\n\t  # also don't use non-portable quotes within backquotes within\n\t  # quotes we have to do it in 2 steps:\n\t  extractedcf=`$SED -n -e \"$sed_extractcf\" < \"$progpath\"`\n\t  eval \"$extractedcf\"\n        else\n\t  func_error \"ignoring unknown tag $tagname\"\n        fi\n        ;;\n    esac\n}\n\n\n# func_check_version_match\n# ------------------------\n# Ensure that we are using m4 macros, and libtool script from the same\n# release of libtool.\nfunc_check_version_match ()\n{\n    if test \"$package_revision\" != \"$macro_revision\"; then\n      if test \"$VERSION\" != \"$macro_version\"; then\n        if test -z \"$macro_version\"; then\n          cat >&2 <<_LT_EOF\n$progname: Version mismatch error.  This is $PACKAGE $VERSION, but the\n$progname: definition of this LT_INIT comes from an older release.\n$progname: You should recreate aclocal.m4 with macros from $PACKAGE $VERSION\n$progname: and run autoconf again.\n_LT_EOF\n        else\n          cat >&2 <<_LT_EOF\n$progname: Version mismatch error.  This is $PACKAGE $VERSION, but the\n$progname: definition of this LT_INIT comes from $PACKAGE $macro_version.\n$progname: You should recreate aclocal.m4 with macros from $PACKAGE $VERSION\n$progname: and run autoconf again.\n_LT_EOF\n        fi\n      else\n        cat >&2 <<_LT_EOF\n$progname: Version mismatch error.  
This is $PACKAGE $VERSION, revision $package_revision,\n$progname: but the definition of this LT_INIT comes from revision $macro_revision.\n$progname: You should recreate aclocal.m4 with macros from revision $package_revision\n$progname: of $PACKAGE $VERSION and run autoconf again.\n_LT_EOF\n      fi\n\n      exit $EXIT_MISMATCH\n    fi\n}\n\n\n# libtool_options_prep [ARG]...\n# -----------------------------\n# Preparation for options parsed by libtool.\nlibtool_options_prep ()\n{\n    $debug_cmd\n\n    # Option defaults:\n    opt_config=false\n    opt_dlopen=\n    opt_dry_run=false\n    opt_help=false\n    opt_mode=\n    opt_preserve_dup_deps=false\n    opt_quiet=false\n\n    nonopt=\n    preserve_args=\n\n    # Shorthand for --mode=foo, only valid as the first argument\n    case $1 in\n    clean|clea|cle|cl)\n      shift; set dummy --mode clean ${1+\"$@\"}; shift\n      ;;\n    compile|compil|compi|comp|com|co|c)\n      shift; set dummy --mode compile ${1+\"$@\"}; shift\n      ;;\n    execute|execut|execu|exec|exe|ex|e)\n      shift; set dummy --mode execute ${1+\"$@\"}; shift\n      ;;\n    finish|finis|fini|fin|fi|f)\n      shift; set dummy --mode finish ${1+\"$@\"}; shift\n      ;;\n    install|instal|insta|inst|ins|in|i)\n      shift; set dummy --mode install ${1+\"$@\"}; shift\n      ;;\n    link|lin|li|l)\n      shift; set dummy --mode link ${1+\"$@\"}; shift\n      ;;\n    uninstall|uninstal|uninsta|uninst|unins|unin|uni|un|u)\n      shift; set dummy --mode uninstall ${1+\"$@\"}; shift\n      ;;\n    esac\n\n    # Pass back the list of options.\n    func_quote_for_eval ${1+\"$@\"}\n    libtool_options_prep_result=$func_quote_for_eval_result\n}\nfunc_add_hook func_options_prep libtool_options_prep\n\n\n# libtool_parse_options [ARG]...\n# ------------------------------\n# Provide handling for libtool specific options.\nlibtool_parse_options ()\n{\n    $debug_cmd\n\n    # Perform our own loop to consume as many options as possible in\n    # each 
iteration.\n    while test $# -gt 0; do\n      _G_opt=$1\n      shift\n      case $_G_opt in\n        --dry-run|--dryrun|-n)\n                        opt_dry_run=:\n                        ;;\n\n        --config)       func_config ;;\n\n        --dlopen|-dlopen)\n                        opt_dlopen=\"${opt_dlopen+$opt_dlopen\n}$1\"\n                        shift\n                        ;;\n\n        --preserve-dup-deps)\n                        opt_preserve_dup_deps=: ;;\n\n        --features)     func_features ;;\n\n        --finish)       set dummy --mode finish ${1+\"$@\"}; shift ;;\n\n        --help)         opt_help=: ;;\n\n        --help-all)     opt_help=': help-all' ;;\n\n        --mode)         test $# = 0 && func_missing_arg $_G_opt && break\n                        opt_mode=$1\n                        case $1 in\n                          # Valid mode arguments:\n                          clean|compile|execute|finish|install|link|relink|uninstall) ;;\n\n                          # Catch anything else as an error\n                          *) func_error \"invalid argument for $_G_opt\"\n                             exit_cmd=exit\n                             break\n                             ;;\n                        esac\n                        shift\n                        ;;\n\n        --no-silent|--no-quiet)\n                        opt_quiet=false\n                        func_append preserve_args \" $_G_opt\"\n                        ;;\n\n        --no-warnings|--no-warning|--no-warn)\n                        opt_warning=false\n                        func_append preserve_args \" $_G_opt\"\n                        ;;\n\n        --no-verbose)\n                        opt_verbose=false\n                        func_append preserve_args \" $_G_opt\"\n                        ;;\n\n        --silent|--quiet)\n                        opt_quiet=:\n                        opt_verbose=false\n                        func_append preserve_args \" 
$_G_opt\"\n                        ;;\n\n        --tag)          test $# = 0 && func_missing_arg $_G_opt && break\n                        opt_tag=$1\n                        func_append preserve_args \" $_G_opt $1\"\n                        func_enable_tag \"$1\"\n                        shift\n                        ;;\n\n        --verbose|-v)   opt_quiet=false\n                        opt_verbose=:\n                        func_append preserve_args \" $_G_opt\"\n                        ;;\n\n\t# An option not handled by this hook function:\n        *)\t\tset dummy \"$_G_opt\" ${1+\"$@\"};\tshift; break  ;;\n      esac\n    done\n\n\n    # save modified positional parameters for caller\n    func_quote_for_eval ${1+\"$@\"}\n    libtool_parse_options_result=$func_quote_for_eval_result\n}\nfunc_add_hook func_parse_options libtool_parse_options\n\n\n\n# libtool_validate_options [ARG]...\n# ---------------------------------\n# Perform any sanity checks on option settings and/or unconsumed\n# arguments.\nlibtool_validate_options ()\n{\n    # save first non-option argument\n    if test 0 -lt $#; then\n      nonopt=$1\n      shift\n    fi\n\n    # preserve --debug\n    test : = \"$debug_cmd\" || func_append preserve_args \" --debug\"\n\n    case $host in\n      # Solaris2 added to fix http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16452\n      # see also: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59788\n      *cygwin* | *mingw* | *pw32* | *cegcc* | *solaris2* | *os2*)\n        # don't eliminate duplications in $postdeps and $predeps\n        opt_duplicate_compiler_generated_deps=:\n        ;;\n      *)\n        opt_duplicate_compiler_generated_deps=$opt_preserve_dup_deps\n        ;;\n    esac\n\n    $opt_help || {\n      # Sanity checks first:\n      func_check_version_match\n\n      test yes != \"$build_libtool_libs\" \\\n        && test yes != \"$build_old_libs\" \\\n        && func_fatal_configuration \"not configured to build any kind of library\"\n\n      # Darwin 
sucks\n      eval std_shrext=\\\"$shrext_cmds\\\"\n\n      # Only execute mode is allowed to have -dlopen flags.\n      if test -n \"$opt_dlopen\" && test execute != \"$opt_mode\"; then\n        func_error \"unrecognized option '-dlopen'\"\n        $ECHO \"$help\" 1>&2\n        exit $EXIT_FAILURE\n      fi\n\n      # Change the help message to a mode-specific one.\n      generic_help=$help\n      help=\"Try '$progname --help --mode=$opt_mode' for more information.\"\n    }\n\n    # Pass back the unparsed argument list\n    func_quote_for_eval ${1+\"$@\"}\n    libtool_validate_options_result=$func_quote_for_eval_result\n}\nfunc_add_hook func_validate_options libtool_validate_options\n\n\n# Process options as early as possible so that --help and --version\n# can return quickly.\nfunc_options ${1+\"$@\"}\neval set dummy \"$func_options_result\"; shift\n\n\n\n## ----------- ##\n##    Main.    ##\n## ----------- ##\n\nmagic='%%%MAGIC variable%%%'\nmagic_exe='%%%MAGIC EXE variable%%%'\n\n# Global variables.\nextracted_archives=\nextracted_serial=0\n\n# If this variable is set in any of the actions, the command in it\n# will be execed at the end.  This prevents here-documents from being\n# left over by shells.\nexec_cmd=\n\n\n# A function that is used when there is no print builtin or printf.\nfunc_fallback_echo ()\n{\n  eval 'cat <<_LTECHO_EOF\n$1\n_LTECHO_EOF'\n}\n\n# func_generated_by_libtool\n# True iff stdin has been generated by Libtool. 
This function is only\n# a basic sanity check; it will hardly flush out determined imposters.\nfunc_generated_by_libtool_p ()\n{\n  $GREP \"^# Generated by .*$PACKAGE\" > /dev/null 2>&1\n}\n\n# func_lalib_p file\n# True iff FILE is a libtool '.la' library or '.lo' object file.\n# This function is only a basic sanity check; it will hardly flush out\n# determined imposters.\nfunc_lalib_p ()\n{\n    test -f \"$1\" &&\n      $SED -e 4q \"$1\" 2>/dev/null | func_generated_by_libtool_p\n}\n\n# func_lalib_unsafe_p file\n# True iff FILE is a libtool '.la' library or '.lo' object file.\n# This function implements the same check as func_lalib_p without\n# resorting to external programs.  To this end, it redirects stdin and\n# closes it afterwards, without saving the original file descriptor.\n# As a safety measure, use it only where a negative result would be\n# fatal anyway.  Works if 'file' does not exist.\nfunc_lalib_unsafe_p ()\n{\n    lalib_p=no\n    if test -f \"$1\" && test -r \"$1\" && exec 5<&0 <\"$1\"; then\n\tfor lalib_p_l in 1 2 3 4\n\tdo\n\t    read lalib_p_line\n\t    case $lalib_p_line in\n\t\t\\#\\ Generated\\ by\\ *$PACKAGE* ) lalib_p=yes; break;;\n\t    esac\n\tdone\n\texec 0<&5 5<&-\n    fi\n    test yes = \"$lalib_p\"\n}\n\n# func_ltwrapper_script_p file\n# True iff FILE is a libtool wrapper script\n# This function is only a basic sanity check; it will hardly flush out\n# determined imposters.\nfunc_ltwrapper_script_p ()\n{\n    test -f \"$1\" &&\n      $lt_truncate_bin < \"$1\" 2>/dev/null | func_generated_by_libtool_p\n}\n\n# func_ltwrapper_executable_p file\n# True iff FILE is a libtool wrapper executable\n# This function is only a basic sanity check; it will hardly flush out\n# determined imposters.\nfunc_ltwrapper_executable_p ()\n{\n    func_ltwrapper_exec_suffix=\n    case $1 in\n    *.exe) ;;\n    *) func_ltwrapper_exec_suffix=.exe ;;\n    esac\n    $GREP \"$magic_exe\" \"$1$func_ltwrapper_exec_suffix\" >/dev/null 2>&1\n}\n\n# 
func_ltwrapper_scriptname file\n# Assumes file is an ltwrapper_executable\n# uses $file to determine the appropriate filename for a\n# temporary ltwrapper_script.\nfunc_ltwrapper_scriptname ()\n{\n    func_dirname_and_basename \"$1\" \"\" \".\"\n    func_stripname '' '.exe' \"$func_basename_result\"\n    func_ltwrapper_scriptname_result=$func_dirname_result/$objdir/${func_stripname_result}_ltshwrapper\n}\n\n# func_ltwrapper_p file\n# True iff FILE is a libtool wrapper script or wrapper executable\n# This function is only a basic sanity check; it will hardly flush out\n# determined imposters.\nfunc_ltwrapper_p ()\n{\n    func_ltwrapper_script_p \"$1\" || func_ltwrapper_executable_p \"$1\"\n}\n\n\n# func_execute_cmds commands fail_cmd\n# Execute tilde-delimited COMMANDS.\n# If FAIL_CMD is given, eval that upon failure.\n# FAIL_CMD may read-access the current command in variable CMD!\nfunc_execute_cmds ()\n{\n    $debug_cmd\n\n    save_ifs=$IFS; IFS='~'\n    for cmd in $1; do\n      IFS=$sp$nl\n      eval cmd=\\\"$cmd\\\"\n      IFS=$save_ifs\n      func_show_eval \"$cmd\" \"${2-:}\"\n    done\n    IFS=$save_ifs\n}\n\n\n# func_source file\n# Source FILE, adding directory component if necessary.\n# Note that it is not necessary on cygwin/mingw to append a dot to\n# FILE even if both FILE and FILE.exe exist: automatic-append-.exe\n# behavior happens only for exec(3), not for open(2)!  Also, sourcing\n# 'FILE.' does not work on cygwin managed mounts.\nfunc_source ()\n{\n    $debug_cmd\n\n    case $1 in\n    */* | *\\\\*)\t. \"$1\" ;;\n    *)\t\t. \"./$1\" ;;\n    esac\n}\n\n\n# func_resolve_sysroot PATH\n# Replace a leading = in PATH with a sysroot.  
Store the result into\n# func_resolve_sysroot_result\nfunc_resolve_sysroot ()\n{\n  func_resolve_sysroot_result=$1\n  case $func_resolve_sysroot_result in\n  =*)\n    func_stripname '=' '' \"$func_resolve_sysroot_result\"\n    func_resolve_sysroot_result=$lt_sysroot$func_stripname_result\n    ;;\n  esac\n}\n\n# func_replace_sysroot PATH\n# If PATH begins with the sysroot, replace it with = and\n# store the result into func_replace_sysroot_result.\nfunc_replace_sysroot ()\n{\n  case $lt_sysroot:$1 in\n  ?*:\"$lt_sysroot\"*)\n    func_stripname \"$lt_sysroot\" '' \"$1\"\n    func_replace_sysroot_result='='$func_stripname_result\n    ;;\n  *)\n    # Including no sysroot.\n    func_replace_sysroot_result=$1\n    ;;\n  esac\n}\n\n# func_infer_tag arg\n# Infer tagged configuration to use if any are available and\n# if one wasn't chosen via the \"--tag\" command line option.\n# Only attempt this if the compiler in the base compile\n# command doesn't match the default compiler.\n# arg is usually of the form 'gcc ...'\nfunc_infer_tag ()\n{\n    $debug_cmd\n\n    if test -n \"$available_tags\" && test -z \"$tagname\"; then\n      CC_quoted=\n      for arg in $CC; do\n\tfunc_append_quoted CC_quoted \"$arg\"\n      done\n      CC_expanded=`func_echo_all $CC`\n      CC_quoted_expanded=`func_echo_all $CC_quoted`\n      case $@ in\n      # Blanks in the command may have been stripped by the calling shell,\n      # but not from the CC environment variable when configure was run.\n      \" $CC \"* | \"$CC \"* | \" $CC_expanded \"* | \"$CC_expanded \"* | \\\n      \" $CC_quoted\"* | \"$CC_quoted \"* | \" $CC_quoted_expanded \"* | \"$CC_quoted_expanded \"*) ;;\n      # Blanks at the start of $base_compile will cause this to fail\n      # if we don't check for them as well.\n      *)\n\tfor z in $available_tags; do\n\t  if $GREP \"^# ### BEGIN LIBTOOL TAG CONFIG: $z$\" < \"$progpath\" > /dev/null; then\n\t    # Evaluate the configuration.\n\t    eval \"`$SED -n -e '/^# ### BEGIN 
LIBTOOL TAG CONFIG: '$z'$/,/^# ### END LIBTOOL TAG CONFIG: '$z'$/p' < $progpath`\"\n\t    CC_quoted=\n\t    for arg in $CC; do\n\t      # Double-quote args containing other shell metacharacters.\n\t      func_append_quoted CC_quoted \"$arg\"\n\t    done\n\t    CC_expanded=`func_echo_all $CC`\n\t    CC_quoted_expanded=`func_echo_all $CC_quoted`\n\t    case \"$@ \" in\n\t    \" $CC \"* | \"$CC \"* | \" $CC_expanded \"* | \"$CC_expanded \"* | \\\n\t    \" $CC_quoted\"* | \"$CC_quoted \"* | \" $CC_quoted_expanded \"* | \"$CC_quoted_expanded \"*)\n\t      # The compiler in the base compile command matches\n\t      # the one in the tagged configuration.\n\t      # Assume this is the tagged configuration we want.\n\t      tagname=$z\n\t      break\n\t      ;;\n\t    esac\n\t  fi\n\tdone\n\t# If $tagname still isn't set, then no tagged configuration\n\t# was found and let the user know that the \"--tag\" command\n\t# line option must be used.\n\tif test -z \"$tagname\"; then\n\t  func_echo \"unable to infer tagged configuration\"\n\t  func_fatal_error \"specify a tag with '--tag'\"\n#\telse\n#\t  func_verbose \"using $tagname tagged configuration\"\n\tfi\n\t;;\n      esac\n    fi\n}\n\n\n\n# func_write_libtool_object output_name pic_name nonpic_name\n# Create a libtool object file (analogous to a \".la\" file),\n# but don't create it if we're doing a dry run.\nfunc_write_libtool_object ()\n{\n    write_libobj=$1\n    if test yes = \"$build_libtool_libs\"; then\n      write_lobj=\\'$2\\'\n    else\n      write_lobj=none\n    fi\n\n    if test yes = \"$build_old_libs\"; then\n      write_oldobj=\\'$3\\'\n    else\n      write_oldobj=none\n    fi\n\n    $opt_dry_run || {\n      cat >${write_libobj}T <<EOF\n# $write_libobj - a libtool object file\n# Generated by $PROGRAM (GNU $PACKAGE) $VERSION\n#\n# Please DO NOT delete this file!\n# It is necessary for linking the library.\n\n# Name of the PIC object.\npic_object=$write_lobj\n\n# Name of the non-PIC 
object\nnon_pic_object=$write_oldobj\n\nEOF\n      $MV \"${write_libobj}T\" \"$write_libobj\"\n    }\n}\n\n\n##################################################\n# FILE NAME AND PATH CONVERSION HELPER FUNCTIONS #\n##################################################\n\n# func_convert_core_file_wine_to_w32 ARG\n# Helper function used by file name conversion functions when $build is *nix,\n# and $host is mingw, cygwin, or some other w32 environment. Relies on a\n# correctly configured wine environment available, with the winepath program\n# in $build's $PATH.\n#\n# ARG is the $build file name to be converted to w32 format.\n# Result is available in $func_convert_core_file_wine_to_w32_result, and will\n# be empty on error (or when ARG is empty)\nfunc_convert_core_file_wine_to_w32 ()\n{\n  $debug_cmd\n\n  func_convert_core_file_wine_to_w32_result=$1\n  if test -n \"$1\"; then\n    # Unfortunately, winepath does not exit with a non-zero error code, so we\n    # are forced to check the contents of stdout. On the other hand, if the\n    # command is not found, the shell will set an exit code of 127 and print\n    # *an error message* to stdout. So we must check for both error code of\n    # zero AND non-empty stdout, which explains the odd construction:\n    func_convert_core_file_wine_to_w32_tmp=`winepath -w \"$1\" 2>/dev/null`\n    if test \"$?\" -eq 0 && test -n \"$func_convert_core_file_wine_to_w32_tmp\"; then\n      func_convert_core_file_wine_to_w32_result=`$ECHO \"$func_convert_core_file_wine_to_w32_tmp\" |\n        $SED -e \"$sed_naive_backslashify\"`\n    else\n      func_convert_core_file_wine_to_w32_result=\n    fi\n  fi\n}\n# end: func_convert_core_file_wine_to_w32\n\n\n# func_convert_core_path_wine_to_w32 ARG\n# Helper function used by path conversion functions when $build is *nix, and\n# $host is mingw, cygwin, or some other w32 environment. Relies on a correctly\n# configured wine environment available, with the winepath program in $build's\n# $PATH. 
Assumes ARG has no leading or trailing path separator characters.\n#\n# ARG is path to be converted from $build format to win32.\n# Result is available in $func_convert_core_path_wine_to_w32_result.\n# Unconvertible file (directory) names in ARG are skipped; if no directory names\n# are convertible, then the result may be empty.\nfunc_convert_core_path_wine_to_w32 ()\n{\n  $debug_cmd\n\n  # unfortunately, winepath doesn't convert paths, only file names\n  func_convert_core_path_wine_to_w32_result=\n  if test -n \"$1\"; then\n    oldIFS=$IFS\n    IFS=:\n    for func_convert_core_path_wine_to_w32_f in $1; do\n      IFS=$oldIFS\n      func_convert_core_file_wine_to_w32 \"$func_convert_core_path_wine_to_w32_f\"\n      if test -n \"$func_convert_core_file_wine_to_w32_result\"; then\n        if test -z \"$func_convert_core_path_wine_to_w32_result\"; then\n          func_convert_core_path_wine_to_w32_result=$func_convert_core_file_wine_to_w32_result\n        else\n          func_append func_convert_core_path_wine_to_w32_result \";$func_convert_core_file_wine_to_w32_result\"\n        fi\n      fi\n    done\n    IFS=$oldIFS\n  fi\n}\n# end: func_convert_core_path_wine_to_w32\n\n\n# func_cygpath ARGS...\n# Wrapper around calling the cygpath program via LT_CYGPATH. This is used\n# when (1) $build is *nix and Cygwin is hosted via a wine environment; or (2)\n# $build is MSYS and $host is Cygwin, or (3) $build is Cygwin. In case (1) or\n# (2), returns the Cygwin file name or path in func_cygpath_result (input\n# file name or path is assumed to be in w32 format, as previously converted\n# from $build's *nix or MSYS format). In case (3), returns the w32 file name\n# or path in func_cygpath_result (input file name or path is assumed to be in\n# Cygwin format). 
Returns an empty string on error.\n#\n# ARGS are passed to cygpath, with the last one being the file name or path to\n# be converted.\n#\n# Specify the absolute *nix (or w32) name to cygpath in the LT_CYGPATH\n# environment variable; do not put it in $PATH.\nfunc_cygpath ()\n{\n  $debug_cmd\n\n  if test -n \"$LT_CYGPATH\" && test -f \"$LT_CYGPATH\"; then\n    func_cygpath_result=`$LT_CYGPATH \"$@\" 2>/dev/null`\n    if test \"$?\" -ne 0; then\n      # on failure, ensure result is empty\n      func_cygpath_result=\n    fi\n  else\n    func_cygpath_result=\n    func_error \"LT_CYGPATH is empty or specifies non-existent file: '$LT_CYGPATH'\"\n  fi\n}\n#end: func_cygpath\n\n\n# func_convert_core_msys_to_w32 ARG\n# Convert file name or path ARG from MSYS format to w32 format.  Return\n# result in func_convert_core_msys_to_w32_result.\nfunc_convert_core_msys_to_w32 ()\n{\n  $debug_cmd\n\n  # awkward: cmd appends spaces to result\n  func_convert_core_msys_to_w32_result=`( cmd //c echo \"$1\" ) 2>/dev/null |\n    $SED -e 's/[ ]*$//' -e \"$sed_naive_backslashify\"`\n}\n#end: func_convert_core_msys_to_w32\n\n\n# func_convert_file_check ARG1 ARG2\n# Verify that ARG1 (a file name in $build format) was converted to $host\n# format in ARG2. Otherwise, emit an error message, but continue (resetting\n# func_to_host_file_result to ARG1).\nfunc_convert_file_check ()\n{\n  $debug_cmd\n\n  if test -z \"$2\" && test -n \"$1\"; then\n    func_error \"Could not determine host file name corresponding to\"\n    func_error \"  '$1'\"\n    func_error \"Continuing, but uninstalled executables may not work.\"\n    # Fallback:\n    func_to_host_file_result=$1\n  fi\n}\n# end func_convert_file_check\n\n\n# func_convert_path_check FROM_PATHSEP TO_PATHSEP FROM_PATH TO_PATH\n# Verify that FROM_PATH (a path in $build format) was converted to $host\n# format in TO_PATH. 
Otherwise, emit an error message, but continue, resetting\n# func_to_host_path_result to a simplistic fallback value (see below).\nfunc_convert_path_check ()\n{\n  $debug_cmd\n\n  if test -z \"$4\" && test -n \"$3\"; then\n    func_error \"Could not determine the host path corresponding to\"\n    func_error \"  '$3'\"\n    func_error \"Continuing, but uninstalled executables may not work.\"\n    # Fallback.  This is a deliberately simplistic \"conversion\" and\n    # should not be \"improved\".  See libtool.info.\n    if test \"x$1\" != \"x$2\"; then\n      lt_replace_pathsep_chars=\"s|$1|$2|g\"\n      func_to_host_path_result=`echo \"$3\" |\n        $SED -e \"$lt_replace_pathsep_chars\"`\n    else\n      func_to_host_path_result=$3\n    fi\n  fi\n}\n# end func_convert_path_check\n\n\n# func_convert_path_front_back_pathsep FRONTPAT BACKPAT REPL ORIG\n# Modifies func_to_host_path_result by prepending REPL if ORIG matches FRONTPAT\n# and appending REPL if ORIG matches BACKPAT.\nfunc_convert_path_front_back_pathsep ()\n{\n  $debug_cmd\n\n  case $4 in\n  $1 ) func_to_host_path_result=$3$func_to_host_path_result\n    ;;\n  esac\n  case $4 in\n  $2 ) func_append func_to_host_path_result \"$3\"\n    ;;\n  esac\n}\n# end func_convert_path_front_back_pathsep\n\n\n##################################################\n# $build to $host FILE NAME CONVERSION FUNCTIONS #\n##################################################\n# invoked via '$to_host_file_cmd ARG'\n#\n# In each case, ARG is the path to be converted from $build to $host format.\n# Result will be available in $func_to_host_file_result.\n\n\n# func_to_host_file ARG\n# Converts the file name ARG from $build format to $host format. Return result\n# in func_to_host_file_result.\nfunc_to_host_file ()\n{\n  $debug_cmd\n\n  $to_host_file_cmd \"$1\"\n}\n# end func_to_host_file\n\n\n# func_to_tool_file ARG LAZY\n# converts the file name ARG from $build format to toolchain format. Return\n# result in func_to_tool_file_result.  
If the conversion in use is listed\n# in (the comma separated) LAZY, no conversion takes place.\nfunc_to_tool_file ()\n{\n  $debug_cmd\n\n  case ,$2, in\n    *,\"$to_tool_file_cmd\",*)\n      func_to_tool_file_result=$1\n      ;;\n    *)\n      $to_tool_file_cmd \"$1\"\n      func_to_tool_file_result=$func_to_host_file_result\n      ;;\n  esac\n}\n# end func_to_tool_file\n\n\n# func_convert_file_noop ARG\n# Copy ARG to func_to_host_file_result.\nfunc_convert_file_noop ()\n{\n  func_to_host_file_result=$1\n}\n# end func_convert_file_noop\n\n\n# func_convert_file_msys_to_w32 ARG\n# Convert file name ARG from (mingw) MSYS to (mingw) w32 format; automatic\n# conversion to w32 is not available inside the cwrapper.  Returns result in\n# func_to_host_file_result.\nfunc_convert_file_msys_to_w32 ()\n{\n  $debug_cmd\n\n  func_to_host_file_result=$1\n  if test -n \"$1\"; then\n    func_convert_core_msys_to_w32 \"$1\"\n    func_to_host_file_result=$func_convert_core_msys_to_w32_result\n  fi\n  func_convert_file_check \"$1\" \"$func_to_host_file_result\"\n}\n# end func_convert_file_msys_to_w32\n\n\n# func_convert_file_cygwin_to_w32 ARG\n# Convert file name ARG from Cygwin to w32 format.  Returns result in\n# func_to_host_file_result.\nfunc_convert_file_cygwin_to_w32 ()\n{\n  $debug_cmd\n\n  func_to_host_file_result=$1\n  if test -n \"$1\"; then\n    # because $build is cygwin, we call \"the\" cygpath in $PATH; no need to use\n    # LT_CYGPATH in this case.\n    func_to_host_file_result=`cygpath -m \"$1\"`\n  fi\n  func_convert_file_check \"$1\" \"$func_to_host_file_result\"\n}\n# end func_convert_file_cygwin_to_w32\n\n\n# func_convert_file_nix_to_w32 ARG\n# Convert file name ARG from *nix to w32 format.  Requires a wine environment\n# and a working winepath. 
Returns result in func_to_host_file_result.\nfunc_convert_file_nix_to_w32 ()\n{\n  $debug_cmd\n\n  func_to_host_file_result=$1\n  if test -n \"$1\"; then\n    func_convert_core_file_wine_to_w32 \"$1\"\n    func_to_host_file_result=$func_convert_core_file_wine_to_w32_result\n  fi\n  func_convert_file_check \"$1\" \"$func_to_host_file_result\"\n}\n# end func_convert_file_nix_to_w32\n\n\n# func_convert_file_msys_to_cygwin ARG\n# Convert file name ARG from MSYS to Cygwin format.  Requires LT_CYGPATH set.\n# Returns result in func_to_host_file_result.\nfunc_convert_file_msys_to_cygwin ()\n{\n  $debug_cmd\n\n  func_to_host_file_result=$1\n  if test -n \"$1\"; then\n    func_convert_core_msys_to_w32 \"$1\"\n    func_cygpath -u \"$func_convert_core_msys_to_w32_result\"\n    func_to_host_file_result=$func_cygpath_result\n  fi\n  func_convert_file_check \"$1\" \"$func_to_host_file_result\"\n}\n# end func_convert_file_msys_to_cygwin\n\n\n# func_convert_file_nix_to_cygwin ARG\n# Convert file name ARG from *nix to Cygwin format.  Requires Cygwin installed\n# in a wine environment, working winepath, and LT_CYGPATH set.  
Returns result\n# in func_to_host_file_result.\nfunc_convert_file_nix_to_cygwin ()\n{\n  $debug_cmd\n\n  func_to_host_file_result=$1\n  if test -n \"$1\"; then\n    # convert from *nix to w32, then use cygpath to convert from w32 to cygwin.\n    func_convert_core_file_wine_to_w32 \"$1\"\n    func_cygpath -u \"$func_convert_core_file_wine_to_w32_result\"\n    func_to_host_file_result=$func_cygpath_result\n  fi\n  func_convert_file_check \"$1\" \"$func_to_host_file_result\"\n}\n# end func_convert_file_nix_to_cygwin\n\n\n#############################################\n# $build to $host PATH CONVERSION FUNCTIONS #\n#############################################\n# invoked via '$to_host_path_cmd ARG'\n#\n# In each case, ARG is the path to be converted from $build to $host format.\n# The result will be available in $func_to_host_path_result.\n#\n# Path separators are also converted from $build format to $host format.  If\n# ARG begins or ends with a path separator character, it is preserved (but\n# converted to $host format) on output.\n#\n# All path conversion functions are named using the following convention:\n#   file name conversion function    : func_convert_file_X_to_Y ()\n#   path conversion function         : func_convert_path_X_to_Y ()\n# where, for any given $build/$host combination the 'X_to_Y' value is the\n# same.  
If conversion functions are added for new $build/$host combinations,\n# the two new functions must follow this pattern, or func_init_to_host_path_cmd\n# will break.\n\n\n# func_init_to_host_path_cmd\n# Ensures that function \"pointer\" variable $to_host_path_cmd is set to the\n# appropriate value, based on the value of $to_host_file_cmd.\nto_host_path_cmd=\nfunc_init_to_host_path_cmd ()\n{\n  $debug_cmd\n\n  if test -z \"$to_host_path_cmd\"; then\n    func_stripname 'func_convert_file_' '' \"$to_host_file_cmd\"\n    to_host_path_cmd=func_convert_path_$func_stripname_result\n  fi\n}\n\n\n# func_to_host_path ARG\n# Converts the path ARG from $build format to $host format. Return result\n# in func_to_host_path_result.\nfunc_to_host_path ()\n{\n  $debug_cmd\n\n  func_init_to_host_path_cmd\n  $to_host_path_cmd \"$1\"\n}\n# end func_to_host_path\n\n\n# func_convert_path_noop ARG\n# Copy ARG to func_to_host_path_result.\nfunc_convert_path_noop ()\n{\n  func_to_host_path_result=$1\n}\n# end func_convert_path_noop\n\n\n# func_convert_path_msys_to_w32 ARG\n# Convert path ARG from (mingw) MSYS to (mingw) w32 format; automatic\n# conversion to w32 is not available inside the cwrapper.  Returns result in\n# func_to_host_path_result.\nfunc_convert_path_msys_to_w32 ()\n{\n  $debug_cmd\n\n  func_to_host_path_result=$1\n  if test -n \"$1\"; then\n    # Remove leading and trailing path separator characters from ARG.  
MSYS\n    # behavior is inconsistent here; cygpath turns them into '.;' and ';.';\n    # and winepath ignores them completely.\n    func_stripname : : \"$1\"\n    func_to_host_path_tmp1=$func_stripname_result\n    func_convert_core_msys_to_w32 \"$func_to_host_path_tmp1\"\n    func_to_host_path_result=$func_convert_core_msys_to_w32_result\n    func_convert_path_check : \";\" \\\n      \"$func_to_host_path_tmp1\" \"$func_to_host_path_result\"\n    func_convert_path_front_back_pathsep \":*\" \"*:\" \";\" \"$1\"\n  fi\n}\n# end func_convert_path_msys_to_w32\n\n\n# func_convert_path_cygwin_to_w32 ARG\n# Convert path ARG from Cygwin to w32 format.  Returns result in\n# func_to_host_path_result.\nfunc_convert_path_cygwin_to_w32 ()\n{\n  $debug_cmd\n\n  func_to_host_path_result=$1\n  if test -n \"$1\"; then\n    # See func_convert_path_msys_to_w32:\n    func_stripname : : \"$1\"\n    func_to_host_path_tmp1=$func_stripname_result\n    func_to_host_path_result=`cygpath -m -p \"$func_to_host_path_tmp1\"`\n    func_convert_path_check : \";\" \\\n      \"$func_to_host_path_tmp1\" \"$func_to_host_path_result\"\n    func_convert_path_front_back_pathsep \":*\" \"*:\" \";\" \"$1\"\n  fi\n}\n# end func_convert_path_cygwin_to_w32\n\n\n# func_convert_path_nix_to_w32 ARG\n# Convert path ARG from *nix to w32 format.  Requires a wine environment and\n# a working winepath.  
Returns result in func_to_host_path_result.\nfunc_convert_path_nix_to_w32 ()\n{\n  $debug_cmd\n\n  func_to_host_path_result=$1\n  if test -n \"$1\"; then\n    # See func_convert_path_msys_to_w32:\n    func_stripname : : \"$1\"\n    func_to_host_path_tmp1=$func_stripname_result\n    func_convert_core_path_wine_to_w32 \"$func_to_host_path_tmp1\"\n    func_to_host_path_result=$func_convert_core_path_wine_to_w32_result\n    func_convert_path_check : \";\" \\\n      \"$func_to_host_path_tmp1\" \"$func_to_host_path_result\"\n    func_convert_path_front_back_pathsep \":*\" \"*:\" \";\" \"$1\"\n  fi\n}\n# end func_convert_path_nix_to_w32\n\n\n# func_convert_path_msys_to_cygwin ARG\n# Convert path ARG from MSYS to Cygwin format.  Requires LT_CYGPATH set.\n# Returns result in func_to_host_path_result.\nfunc_convert_path_msys_to_cygwin ()\n{\n  $debug_cmd\n\n  func_to_host_path_result=$1\n  if test -n \"$1\"; then\n    # See func_convert_path_msys_to_w32:\n    func_stripname : : \"$1\"\n    func_to_host_path_tmp1=$func_stripname_result\n    func_convert_core_msys_to_w32 \"$func_to_host_path_tmp1\"\n    func_cygpath -u -p \"$func_convert_core_msys_to_w32_result\"\n    func_to_host_path_result=$func_cygpath_result\n    func_convert_path_check : : \\\n      \"$func_to_host_path_tmp1\" \"$func_to_host_path_result\"\n    func_convert_path_front_back_pathsep \":*\" \"*:\" : \"$1\"\n  fi\n}\n# end func_convert_path_msys_to_cygwin\n\n\n# func_convert_path_nix_to_cygwin ARG\n# Convert path ARG from *nix to Cygwin format.  Requires Cygwin installed in\n# a wine environment, working winepath, and LT_CYGPATH set.  Returns result in\n# func_to_host_path_result.\nfunc_convert_path_nix_to_cygwin ()\n{\n  $debug_cmd\n\n  func_to_host_path_result=$1\n  if test -n \"$1\"; then\n    # Remove leading and trailing path separator characters from\n    # ARG. 
msys behavior is inconsistent here, cygpath turns them\n    # into '.;' and ';.', and winepath ignores them completely.\n    func_stripname : : \"$1\"\n    func_to_host_path_tmp1=$func_stripname_result\n    func_convert_core_path_wine_to_w32 \"$func_to_host_path_tmp1\"\n    func_cygpath -u -p \"$func_convert_core_path_wine_to_w32_result\"\n    func_to_host_path_result=$func_cygpath_result\n    func_convert_path_check : : \\\n      \"$func_to_host_path_tmp1\" \"$func_to_host_path_result\"\n    func_convert_path_front_back_pathsep \":*\" \"*:\" : \"$1\"\n  fi\n}\n# end func_convert_path_nix_to_cygwin\n\n\n# func_dll_def_p FILE\n# True iff FILE is a Windows DLL '.def' file.\n# Keep in sync with _LT_DLL_DEF_P in libtool.m4\nfunc_dll_def_p ()\n{\n  $debug_cmd\n\n  func_dll_def_p_tmp=`$SED -n \\\n    -e 's/^[\t ]*//' \\\n    -e '/^\\(;.*\\)*$/d' \\\n    -e 's/^\\(EXPORTS\\|LIBRARY\\)\\([\t ].*\\)*$/DEF/p' \\\n    -e q \\\n    \"$1\"`\n  test DEF = \"$func_dll_def_p_tmp\"\n}\n\n\n# func_mode_compile arg...\nfunc_mode_compile ()\n{\n    $debug_cmd\n\n    # Get the compilation command and the source file.\n    base_compile=\n    srcfile=$nonopt  #  always keep a non-empty value in \"srcfile\"\n    suppress_opt=yes\n    suppress_output=\n    arg_mode=normal\n    libobj=\n    later=\n    pie_flag=\n\n    for arg\n    do\n      case $arg_mode in\n      arg  )\n\t# do not \"continue\".  
Instead, add this to base_compile\n\tlastarg=$arg\n\targ_mode=normal\n\t;;\n\n      target )\n\tlibobj=$arg\n\targ_mode=normal\n\tcontinue\n\t;;\n\n      normal )\n\t# Accept any command-line options.\n\tcase $arg in\n\t-o)\n\t  test -n \"$libobj\" && \\\n\t    func_fatal_error \"you cannot specify '-o' more than once\"\n\t  arg_mode=target\n\t  continue\n\t  ;;\n\n\t-pie | -fpie | -fPIE)\n          func_append pie_flag \" $arg\"\n\t  continue\n\t  ;;\n\n\t-shared | -static | -prefer-pic | -prefer-non-pic)\n\t  func_append later \" $arg\"\n\t  continue\n\t  ;;\n\n\t-no-suppress)\n\t  suppress_opt=no\n\t  continue\n\t  ;;\n\n\t-Xcompiler)\n\t  arg_mode=arg  #  the next one goes into the \"base_compile\" arg list\n\t  continue      #  The current \"srcfile\" will either be retained or\n\t  ;;            #  replaced later.  I would guess that would be a bug.\n\n\t-Wc,*)\n\t  func_stripname '-Wc,' '' \"$arg\"\n\t  args=$func_stripname_result\n\t  lastarg=\n\t  save_ifs=$IFS; IFS=,\n\t  for arg in $args; do\n\t    IFS=$save_ifs\n\t    func_append_quoted lastarg \"$arg\"\n\t  done\n\t  IFS=$save_ifs\n\t  func_stripname ' ' '' \"$lastarg\"\n\t  lastarg=$func_stripname_result\n\n\t  # Add the arguments to base_compile.\n\t  func_append base_compile \" $lastarg\"\n\t  continue\n\t  ;;\n\n\t*)\n\t  # Accept the current argument as the source file.\n\t  # The previous \"srcfile\" becomes the current argument.\n\t  #\n\t  lastarg=$srcfile\n\t  srcfile=$arg\n\t  ;;\n\tesac  #  case $arg\n\t;;\n      esac    #  case $arg_mode\n\n      # Aesthetically quote the previous argument.\n      func_append_quoted base_compile \"$lastarg\"\n    done # for arg\n\n    case $arg_mode in\n    arg)\n      func_fatal_error \"you must specify an argument for -Xcompile\"\n      ;;\n    target)\n      func_fatal_error \"you must specify a target with '-o'\"\n      ;;\n    *)\n      # Get the name of the library object.\n      test -z \"$libobj\" && {\n\tfunc_basename 
\"$srcfile\"\n\tlibobj=$func_basename_result\n      }\n      ;;\n    esac\n\n    # Recognize several different file suffixes.\n    # If the user specifies -o file.o, it is replaced with file.lo\n    case $libobj in\n    *.[cCFSifmso] | \\\n    *.ada | *.adb | *.ads | *.asm | \\\n    *.c++ | *.cc | *.ii | *.class | *.cpp | *.cxx | \\\n    *.[fF][09]? | *.for | *.java | *.go | *.obj | *.sx | *.cu | *.cup)\n      func_xform \"$libobj\"\n      libobj=$func_xform_result\n      ;;\n    esac\n\n    case $libobj in\n    *.lo) func_lo2o \"$libobj\"; obj=$func_lo2o_result ;;\n    *)\n      func_fatal_error \"cannot determine name of library object from '$libobj'\"\n      ;;\n    esac\n\n    func_infer_tag $base_compile\n\n    for arg in $later; do\n      case $arg in\n      -shared)\n\ttest yes = \"$build_libtool_libs\" \\\n\t  || func_fatal_configuration \"cannot build a shared library\"\n\tbuild_old_libs=no\n\tcontinue\n\t;;\n\n      -static)\n\tbuild_libtool_libs=no\n\tbuild_old_libs=yes\n\tcontinue\n\t;;\n\n      -prefer-pic)\n\tpic_mode=yes\n\tcontinue\n\t;;\n\n      -prefer-non-pic)\n\tpic_mode=no\n\tcontinue\n\t;;\n      esac\n    done\n\n    func_quote_for_eval \"$libobj\"\n    test \"X$libobj\" != \"X$func_quote_for_eval_result\" \\\n      && $ECHO \"X$libobj\" | $GREP '[]~#^*{};<>?\"'\"'\"'\t &()|`$[]' \\\n      && func_warning \"libobj name '$libobj' may not contain shell special characters.\"\n    func_dirname_and_basename \"$obj\" \"/\" \"\"\n    objname=$func_basename_result\n    xdir=$func_dirname_result\n    lobj=$xdir$objdir/$objname\n\n    test -z \"$base_compile\" && \\\n      func_fatal_help \"you must specify a compilation command\"\n\n    # Delete any leftover library objects.\n    if test yes = \"$build_old_libs\"; then\n      removelist=\"$obj $lobj $libobj ${libobj}T\"\n    else\n      removelist=\"$lobj $libobj ${libobj}T\"\n    fi\n\n    # On Cygwin there's no \"real\" PIC flag so we must build both object types\n    case $host_os in\n    cygwin* | 
mingw* | pw32* | os2* | cegcc*)\n      pic_mode=default\n      ;;\n    esac\n    if test no = \"$pic_mode\" && test pass_all != \"$deplibs_check_method\"; then\n      # non-PIC code in shared libraries is not supported\n      pic_mode=default\n    fi\n\n    # Calculate the filename of the output object if compiler does\n    # not support -o with -c\n    if test no = \"$compiler_c_o\"; then\n      output_obj=`$ECHO \"$srcfile\" | $SED 's%^.*/%%; s%\\.[^.]*$%%'`.$objext\n      lockfile=$output_obj.lock\n    else\n      output_obj=\n      need_locks=no\n      lockfile=\n    fi\n\n    # Lock this critical section if it is needed\n    # We use this script file to make the link, it avoids creating a new file\n    if test yes = \"$need_locks\"; then\n      until $opt_dry_run || ln \"$progpath\" \"$lockfile\" 2>/dev/null; do\n\tfunc_echo \"Waiting for $lockfile to be removed\"\n\tsleep 2\n      done\n    elif test warn = \"$need_locks\"; then\n      if test -f \"$lockfile\"; then\n\t$ECHO \"\\\n*** ERROR, $lockfile exists and contains:\n`cat $lockfile 2>/dev/null`\n\nThis indicates that another process is trying to use the same\ntemporary object file, and libtool could not work around it because\nyour compiler does not support '-c' and '-o' together.  
If you\nrepeat this compilation, it may succeed, by chance, but you had better\navoid parallel builds (make -j) in this platform, or get a better\ncompiler.\"\n\n\t$opt_dry_run || $RM $removelist\n\texit $EXIT_FAILURE\n      fi\n      func_append removelist \" $output_obj\"\n      $ECHO \"$srcfile\" > \"$lockfile\"\n    fi\n\n    $opt_dry_run || $RM $removelist\n    func_append removelist \" $lockfile\"\n    trap '$opt_dry_run || $RM $removelist; exit $EXIT_FAILURE' 1 2 15\n\n    func_to_tool_file \"$srcfile\" func_convert_file_msys_to_w32\n    srcfile=$func_to_tool_file_result\n    func_quote_for_eval \"$srcfile\"\n    qsrcfile=$func_quote_for_eval_result\n\n    # Only build a PIC object if we are building libtool libraries.\n    if test yes = \"$build_libtool_libs\"; then\n      # Without this assignment, base_compile gets emptied.\n      fbsd_hideous_sh_bug=$base_compile\n\n      if test no != \"$pic_mode\"; then\n\tcommand=\"$base_compile $qsrcfile $pic_flag\"\n      else\n\t# Don't build PIC code\n\tcommand=\"$base_compile $qsrcfile\"\n      fi\n\n      func_mkdir_p \"$xdir$objdir\"\n\n      if test -z \"$output_obj\"; then\n\t# Place PIC objects in $objdir\n\tfunc_append command \" -o $lobj\"\n      fi\n\n      func_show_eval_locale \"$command\"\t\\\n          'test -n \"$output_obj\" && $RM $removelist; exit $EXIT_FAILURE'\n\n      if test warn = \"$need_locks\" &&\n\t test \"X`cat $lockfile 2>/dev/null`\" != \"X$srcfile\"; then\n\t$ECHO \"\\\n*** ERROR, $lockfile contains:\n`cat $lockfile 2>/dev/null`\n\nbut it should contain:\n$srcfile\n\nThis indicates that another process is trying to use the same\ntemporary object file, and libtool could not work around it because\nyour compiler does not support '-c' and '-o' together.  
If you\nrepeat this compilation, it may succeed, by chance, but you had better\navoid parallel builds (make -j) on this platform, or get a better\ncompiler.\"\n\n\t$opt_dry_run || $RM $removelist\n\texit $EXIT_FAILURE\n      fi\n\n      # Just move the object if needed, then go on to compile the next one\n      if test -n \"$output_obj\" && test \"X$output_obj\" != \"X$lobj\"; then\n\tfunc_show_eval '$MV \"$output_obj\" \"$lobj\"' \\\n\t  'error=$?; $opt_dry_run || $RM $removelist; exit $error'\n      fi\n\n      # Allow error messages only from the first compilation.\n      if test yes = \"$suppress_opt\"; then\n\tsuppress_output=' >/dev/null 2>&1'\n      fi\n    fi\n\n    # Only build a position-dependent object if we build old libraries.\n    if test yes = \"$build_old_libs\"; then\n      if test yes != \"$pic_mode\"; then\n\t# Don't build PIC code\n\tcommand=\"$base_compile $qsrcfile$pie_flag\"\n      else\n\tcommand=\"$base_compile $qsrcfile $pic_flag\"\n      fi\n      if test yes = \"$compiler_c_o\"; then\n\tfunc_append command \" -o $obj\"\n      fi\n\n      # Suppress compiler output if we already did a PIC compilation.\n      func_append command \"$suppress_output\"\n      func_show_eval_locale \"$command\" \\\n        '$opt_dry_run || $RM $removelist; exit $EXIT_FAILURE'\n\n      if test warn = \"$need_locks\" &&\n\t test \"X`cat $lockfile 2>/dev/null`\" != \"X$srcfile\"; then\n\t$ECHO \"\\\n*** ERROR, $lockfile contains:\n`cat $lockfile 2>/dev/null`\n\nbut it should contain:\n$srcfile\n\nThis indicates that another process is trying to use the same\ntemporary object file, and libtool could not work around it because\nyour compiler does not support '-c' and '-o' together.  
If you\nrepeat this compilation, it may succeed, by chance, but you had better\navoid parallel builds (make -j) on this platform, or get a better\ncompiler.\"\n\n\t$opt_dry_run || $RM $removelist\n\texit $EXIT_FAILURE\n      fi\n\n      # Just move the object if needed\n      if test -n \"$output_obj\" && test \"X$output_obj\" != \"X$obj\"; then\n\tfunc_show_eval '$MV \"$output_obj\" \"$obj\"' \\\n\t  'error=$?; $opt_dry_run || $RM $removelist; exit $error'\n      fi\n    fi\n\n    $opt_dry_run || {\n      func_write_libtool_object \"$libobj\" \"$objdir/$objname\" \"$objname\"\n\n      # Unlock the critical section if it was locked\n      if test no != \"$need_locks\"; then\n\tremovelist=$lockfile\n        $RM \"$lockfile\"\n      fi\n    }\n\n    exit $EXIT_SUCCESS\n}\n\n$opt_help || {\n  test compile = \"$opt_mode\" && func_mode_compile ${1+\"$@\"}\n}\n\nfunc_mode_help ()\n{\n    # We need to display help for each of the modes.\n    case $opt_mode in\n      \"\")\n        # Generic help is extracted from the usage comments\n        # at the start of this file.\n        func_help\n        ;;\n\n      clean)\n        $ECHO \\\n\"Usage: $progname [OPTION]... --mode=clean RM [RM-OPTION]... FILE...\n\nRemove files from the build directory.\n\nRM is the name of the program to use to delete files associated with each FILE\n(typically '/bin/rm').  RM-OPTIONS are options (such as '-f') to be passed\nto RM.\n\nIf FILE is a libtool library, object or program, all the files associated\nwith it are deleted. Otherwise, only FILE itself is deleted using RM.\"\n        ;;\n\n      compile)\n      $ECHO \\\n\"Usage: $progname [OPTION]... --mode=compile COMPILE-COMMAND... 
SOURCEFILE\n\nCompile a source file into a libtool library object.\n\nThis mode accepts the following additional options:\n\n  -o OUTPUT-FILE    set the output file name to OUTPUT-FILE\n  -no-suppress      do not suppress compiler output for multiple passes\n  -prefer-pic       try to build PIC objects only\n  -prefer-non-pic   try to build non-PIC objects only\n  -shared           do not build a '.o' file suitable for static linking\n  -static           only build a '.o' file suitable for static linking\n  -Wc,FLAG          pass FLAG directly to the compiler\n\nCOMPILE-COMMAND is a command to be used in creating a 'standard' object file\nfrom the given SOURCEFILE.\n\nThe output file name is determined by removing the directory component from\nSOURCEFILE, then substituting the C source code suffix '.c' with the\nlibrary object suffix, '.lo'.\"\n        ;;\n\n      execute)\n        $ECHO \\\n\"Usage: $progname [OPTION]... --mode=execute COMMAND [ARGS]...\n\nAutomatically set library path, then run a program.\n\nThis mode accepts the following additional options:\n\n  -dlopen FILE      add the directory containing FILE to the library path\n\nThis mode sets the library path environment variable according to '-dlopen'\nflags.\n\nIf any of the ARGS are libtool executable wrappers, then they are translated\ninto their corresponding uninstalled binary, and any of their required library\ndirectories are added to the library path.\n\nThen, COMMAND is executed, with ARGS as arguments.\"\n        ;;\n\n      finish)\n        $ECHO \\\n\"Usage: $progname [OPTION]... --mode=finish [LIBDIR]...\n\nComplete the installation of libtool libraries.\n\nEach LIBDIR is a directory that contains libtool libraries.\n\nThe commands that this mode executes may require superuser privileges.  Use\nthe '--dry-run' option if you just want to see what would be executed.\"\n        ;;\n\n      install)\n        $ECHO \\\n\"Usage: $progname [OPTION]... 
--mode=install INSTALL-COMMAND...\n\nInstall executables or libraries.\n\nINSTALL-COMMAND is the installation command.  The first component should be\neither the 'install' or 'cp' program.\n\nThe following components of INSTALL-COMMAND are treated specially:\n\n  -inst-prefix-dir PREFIX-DIR  Use PREFIX-DIR as a staging area for installation\n\nThe rest of the components are interpreted as arguments to that command (only\nBSD-compatible install options are recognized).\"\n        ;;\n\n      link)\n        $ECHO \\\n\"Usage: $progname [OPTION]... --mode=link LINK-COMMAND...\n\nLink object files or libraries together to form another library, or to\ncreate an executable program.\n\nLINK-COMMAND is a command using the C compiler that you would use to create\na program from several object files.\n\nThe following components of LINK-COMMAND are treated specially:\n\n  -all-static       do not do any dynamic linking at all\n  -avoid-version    do not add a version suffix if possible\n  -bindir BINDIR    specify path to binaries directory (for systems where\n                    libraries must be found in the PATH setting at runtime)\n  -dlopen FILE      '-dlpreopen' FILE if it cannot be dlopened at runtime\n  -dlpreopen FILE   link in FILE and add its symbols to lt_preloaded_symbols\n  -export-dynamic   allow symbols from OUTPUT-FILE to be resolved with dlsym(3)\n  -export-symbols SYMFILE\n                    try to export only the symbols listed in SYMFILE\n  -export-symbols-regex REGEX\n                    try to export only the symbols matching REGEX\n  -LLIBDIR          search LIBDIR for required installed libraries\n  -lNAME            OUTPUT-FILE requires the installed library libNAME\n  -module           build a library that can be dlopened\n  -no-fast-install  disable the fast-install mode\n  -no-install       link a not-installable executable\n  -no-undefined     declare that a library does not refer to external symbols\n  -o OUTPUT-FILE    create OUTPUT-FILE from the 
specified objects\n  -objectlist FILE  use a list of object files found in FILE to specify objects\n  -os2dllname NAME  force a short DLL name on OS/2 (no effect on other OSes)\n  -precious-files-regex REGEX\n                    don't remove output files matching REGEX\n  -release RELEASE  specify package release information\n  -rpath LIBDIR     the created library will eventually be installed in LIBDIR\n  -R[ ]LIBDIR       add LIBDIR to the runtime path of programs and libraries\n  -shared           only do dynamic linking of libtool libraries\n  -shrext SUFFIX    override the standard shared library file extension\n  -static           do not do any dynamic linking of uninstalled libtool libraries\n  -static-libtool-libs\n                    do not do any dynamic linking of libtool libraries\n  -version-info CURRENT[:REVISION[:AGE]]\n                    specify library version info [each variable defaults to 0]\n  -weak LIBNAME     declare that the target provides the LIBNAME interface\n  -Wc,FLAG\n  -Xcompiler FLAG   pass compiler-specific FLAG directly to the compiler\n  -Wl,FLAG\n  -Xlinker FLAG     pass linker-specific FLAG directly to the linker\n  -XCClinker FLAG   pass link-specific FLAG to the compiler driver (CC)\n\nAll other options (arguments beginning with '-') are ignored.\n\nEvery other argument is treated as a filename.  
Files ending in '.la' are\ntreated as uninstalled libtool libraries, other files are standard or library\nobject files.\n\nIf the OUTPUT-FILE ends in '.la', then a libtool library is created,\nonly library objects ('.lo' files) may be specified, and '-rpath' is\nrequired, except when creating a convenience library.\n\nIf OUTPUT-FILE ends in '.a' or '.lib', then a standard library is created\nusing 'ar' and 'ranlib', or on Windows using 'lib'.\n\nIf OUTPUT-FILE ends in '.lo' or '.$objext', then a reloadable object file\nis created, otherwise an executable program is created.\"\n        ;;\n\n      uninstall)\n        $ECHO \\\n\"Usage: $progname [OPTION]... --mode=uninstall RM [RM-OPTION]... FILE...\n\nRemove libraries from an installation directory.\n\nRM is the name of the program to use to delete files associated with each FILE\n(typically '/bin/rm').  RM-OPTIONS are options (such as '-f') to be passed\nto RM.\n\nIf FILE is a libtool library, all the files associated with it are deleted.\nOtherwise, only FILE itself is deleted using RM.\"\n        ;;\n\n      *)\n        func_fatal_help \"invalid operation mode '$opt_mode'\"\n        ;;\n    esac\n\n    echo\n    $ECHO \"Try '$progname --help' for more information about other modes.\"\n}\n\n# Now that we've collected a possible --mode arg, show help if necessary\nif $opt_help; then\n  if test : = \"$opt_help\"; then\n    func_mode_help\n  else\n    {\n      func_help noexit\n      for opt_mode in compile link execute install finish uninstall clean; do\n\tfunc_mode_help\n      done\n    } | $SED -n '1p; 2,$s/^Usage:/  or: /p'\n    {\n      func_help noexit\n      for opt_mode in compile link execute install finish uninstall clean; do\n\techo\n\tfunc_mode_help\n      done\n    } |\n    $SED '1d\n      /^When reporting/,/^Report/{\n\tH\n\td\n      }\n      $x\n      /information about other modes/d\n      /more detailed .*MODE/d\n      s/^Usage:.*--mode=\\([^ ]*\\) .*/Description of \\1 mode:/'\n  fi\n  exit 
$?\nfi\n\n\n# func_mode_execute arg...\nfunc_mode_execute ()\n{\n    $debug_cmd\n\n    # The first argument is the command name.\n    cmd=$nonopt\n    test -z \"$cmd\" && \\\n      func_fatal_help \"you must specify a COMMAND\"\n\n    # Handle -dlopen flags immediately.\n    for file in $opt_dlopen; do\n      test -f \"$file\" \\\n\t|| func_fatal_help \"'$file' is not a file\"\n\n      dir=\n      case $file in\n      *.la)\n\tfunc_resolve_sysroot \"$file\"\n\tfile=$func_resolve_sysroot_result\n\n\t# Check to see that this really is a libtool archive.\n\tfunc_lalib_unsafe_p \"$file\" \\\n\t  || func_fatal_help \"'$file' is not a valid libtool archive\"\n\n\t# Read the libtool library.\n\tdlname=\n\tlibrary_names=\n\tfunc_source \"$file\"\n\n\t# Skip this library if it cannot be dlopened.\n\tif test -z \"$dlname\"; then\n\t  # Warn if it was a shared library.\n\t  test -n \"$library_names\" && \\\n\t    func_warning \"'$file' was not linked with '-export-dynamic'\"\n\t  continue\n\tfi\n\n\tfunc_dirname \"$file\" \"\" \".\"\n\tdir=$func_dirname_result\n\n\tif test -f \"$dir/$objdir/$dlname\"; then\n\t  func_append dir \"/$objdir\"\n\telse\n\t  if test ! 
-f \"$dir/$dlname\"; then\n\t    func_fatal_error \"cannot find '$dlname' in '$dir' or '$dir/$objdir'\"\n\t  fi\n\tfi\n\t;;\n\n      *.lo)\n\t# Just add the directory containing the .lo file.\n\tfunc_dirname \"$file\" \"\" \".\"\n\tdir=$func_dirname_result\n\t;;\n\n      *)\n\tfunc_warning \"'-dlopen' is ignored for non-libtool libraries and objects\"\n\tcontinue\n\t;;\n      esac\n\n      # Get the absolute pathname.\n      absdir=`cd \"$dir\" && pwd`\n      test -n \"$absdir\" && dir=$absdir\n\n      # Now add the directory to shlibpath_var.\n      if eval \"test -z \\\"\\$$shlibpath_var\\\"\"; then\n\teval \"$shlibpath_var=\\\"\\$dir\\\"\"\n      else\n\teval \"$shlibpath_var=\\\"\\$dir:\\$$shlibpath_var\\\"\"\n      fi\n    done\n\n    # This variable tells wrapper scripts just to set shlibpath_var\n    # rather than running their programs.\n    libtool_execute_magic=$magic\n\n    # Check if any of the arguments is a wrapper script.\n    args=\n    for file\n    do\n      case $file in\n      -* | *.la | *.lo ) ;;\n      *)\n\t# Do a test to see if this is really a libtool program.\n\tif func_ltwrapper_script_p \"$file\"; then\n\t  func_source \"$file\"\n\t  # Transform arg to wrapped name.\n\t  file=$progdir/$program\n\telif func_ltwrapper_executable_p \"$file\"; then\n\t  func_ltwrapper_scriptname \"$file\"\n\t  func_source \"$func_ltwrapper_scriptname_result\"\n\t  # Transform arg to wrapped name.\n\t  file=$progdir/$program\n\tfi\n\t;;\n      esac\n      # Quote arguments (to preserve shell metacharacters).\n      func_append_quoted args \"$file\"\n    done\n\n    if $opt_dry_run; then\n      # Display what would be done.\n      if test -n \"$shlibpath_var\"; then\n\teval \"\\$ECHO \\\"\\$shlibpath_var=\\$$shlibpath_var\\\"\"\n\techo \"export $shlibpath_var\"\n      fi\n      $ECHO \"$cmd$args\"\n      exit $EXIT_SUCCESS\n    else\n      if test -n \"$shlibpath_var\"; then\n\t# Export the shlibpath_var.\n\teval \"export $shlibpath_var\"\n      fi\n\n      # 
Restore saved environment variables\n      for lt_var in LANG LANGUAGE LC_ALL LC_CTYPE LC_COLLATE LC_MESSAGES\n      do\n\teval \"if test \\\"\\${save_$lt_var+set}\\\" = set; then\n                $lt_var=\\$save_$lt_var; export $lt_var\n\t      else\n\t\t$lt_unset $lt_var\n\t      fi\"\n      done\n\n      # Now prepare to actually exec the command.\n      exec_cmd=\\$cmd$args\n    fi\n}\n\ntest execute = \"$opt_mode\" && func_mode_execute ${1+\"$@\"}\n\n\n# func_mode_finish arg...\nfunc_mode_finish ()\n{\n    $debug_cmd\n\n    libs=\n    libdirs=\n    admincmds=\n\n    for opt in \"$nonopt\" ${1+\"$@\"}\n    do\n      if test -d \"$opt\"; then\n\tfunc_append libdirs \" $opt\"\n\n      elif test -f \"$opt\"; then\n\tif func_lalib_unsafe_p \"$opt\"; then\n\t  func_append libs \" $opt\"\n\telse\n\t  func_warning \"'$opt' is not a valid libtool archive\"\n\tfi\n\n      else\n\tfunc_fatal_error \"invalid argument '$opt'\"\n      fi\n    done\n\n    if test -n \"$libs\"; then\n      if test -n \"$lt_sysroot\"; then\n        sysroot_regex=`$ECHO \"$lt_sysroot\" | $SED \"$sed_make_literal_regex\"`\n        sysroot_cmd=\"s/\\([ ']\\)$sysroot_regex/\\1/g;\"\n      else\n        sysroot_cmd=\n      fi\n\n      # Remove sysroot references\n      if $opt_dry_run; then\n        for lib in $libs; do\n          echo \"removing references to $lt_sysroot and '=' prefixes from $lib\"\n        done\n      else\n        tmpdir=`func_mktempdir`\n        for lib in $libs; do\n\t  $SED -e \"$sysroot_cmd s/\\([ ']-[LR]\\)=/\\1/g; s/\\([ ']\\)=/\\1/g\" $lib \\\n\t    > $tmpdir/tmp-la\n\t  mv -f $tmpdir/tmp-la $lib\n\tdone\n        ${RM}r \"$tmpdir\"\n      fi\n    fi\n\n    if test -n \"$finish_cmds$finish_eval\" && test -n \"$libdirs\"; then\n      for libdir in $libdirs; do\n\tif test -n \"$finish_cmds\"; then\n\t  # Do each command in the finish commands.\n\t  func_execute_cmds \"$finish_cmds\" 'admincmds=\"$admincmds\n'\"$cmd\"'\"'\n\tfi\n\tif test -n \"$finish_eval\"; then\n\t  # Do 
the single finish_eval.\n\t  eval cmds=\\\"$finish_eval\\\"\n\t  $opt_dry_run || eval \"$cmds\" || func_append admincmds \"\n       $cmds\"\n\tfi\n      done\n    fi\n\n    # Exit here if they wanted silent mode.\n    $opt_quiet && exit $EXIT_SUCCESS\n\n    if test -n \"$finish_cmds$finish_eval\" && test -n \"$libdirs\"; then\n      echo \"----------------------------------------------------------------------\"\n      echo \"Libraries have been installed in:\"\n      for libdir in $libdirs; do\n\t$ECHO \"   $libdir\"\n      done\n      echo\n      echo \"If you ever happen to want to link against installed libraries\"\n      echo \"in a given directory, LIBDIR, you must either use libtool, and\"\n      echo \"specify the full pathname of the library, or use the '-LLIBDIR'\"\n      echo \"flag during linking and do at least one of the following:\"\n      if test -n \"$shlibpath_var\"; then\n\techo \"   - add LIBDIR to the '$shlibpath_var' environment variable\"\n\techo \"     during execution\"\n      fi\n      if test -n \"$runpath_var\"; then\n\techo \"   - add LIBDIR to the '$runpath_var' environment variable\"\n\techo \"     during linking\"\n      fi\n      if test -n \"$hardcode_libdir_flag_spec\"; then\n\tlibdir=LIBDIR\n\teval flag=\\\"$hardcode_libdir_flag_spec\\\"\n\n\t$ECHO \"   - use the '$flag' linker flag\"\n      fi\n      if test -n \"$admincmds\"; then\n\t$ECHO \"   - have your system administrator run these commands:$admincmds\"\n      fi\n      if test -f /etc/ld.so.conf; then\n\techo \"   - have your system administrator add LIBDIR to '/etc/ld.so.conf'\"\n      fi\n      echo\n\n      echo \"See any operating system documentation about shared libraries for\"\n      case $host in\n\tsolaris2.[6789]|solaris2.1[0-9])\n\t  echo \"more information, such as the ld(1), crle(1) and ld.so(8) manual\"\n\t  echo \"pages.\"\n\t  ;;\n\t*)\n\t  echo \"more information, such as the ld(1) and ld.so(8) manual pages.\"\n\t  ;;\n      esac\n      echo 
\"----------------------------------------------------------------------\"\n    fi\n    exit $EXIT_SUCCESS\n}\n\ntest finish = \"$opt_mode\" && func_mode_finish ${1+\"$@\"}\n\n\n# func_mode_install arg...\nfunc_mode_install ()\n{\n    $debug_cmd\n\n    # There may be an optional sh(1) argument at the beginning of\n    # install_prog (especially on Windows NT).\n    if test \"$SHELL\" = \"$nonopt\" || test /bin/sh = \"$nonopt\" ||\n       # Allow the use of GNU shtool's install command.\n       case $nonopt in *shtool*) :;; *) false;; esac\n    then\n      # Aesthetically quote it.\n      func_quote_for_eval \"$nonopt\"\n      install_prog=\"$func_quote_for_eval_result \"\n      arg=$1\n      shift\n    else\n      install_prog=\n      arg=$nonopt\n    fi\n\n    # The real first argument should be the name of the installation program.\n    # Aesthetically quote it.\n    func_quote_for_eval \"$arg\"\n    func_append install_prog \"$func_quote_for_eval_result\"\n    install_shared_prog=$install_prog\n    case \" $install_prog \" in\n      *[\\\\\\ /]cp\\ *) install_cp=: ;;\n      *) install_cp=false ;;\n    esac\n\n    # We need to accept at least all the BSD install flags.\n    dest=\n    files=\n    opts=\n    prev=\n    install_type=\n    isdir=false\n    stripme=\n    no_mode=:\n    for arg\n    do\n      arg2=\n      if test -n \"$dest\"; then\n\tfunc_append files \" $dest\"\n\tdest=$arg\n\tcontinue\n      fi\n\n      case $arg in\n      -d) isdir=: ;;\n      -f)\n\tif $install_cp; then :; else\n\t  prev=$arg\n\tfi\n\t;;\n      -g | -m | -o)\n\tprev=$arg\n\t;;\n      -s)\n\tstripme=\" -s\"\n\tcontinue\n\t;;\n      -*)\n\t;;\n      *)\n\t# If the previous option needed an argument, then skip it.\n\tif test -n \"$prev\"; then\n\t  if test X-m = \"X$prev\" && test -n \"$install_override_mode\"; then\n\t    arg2=$install_override_mode\n\t    no_mode=false\n\t  fi\n\t  prev=\n\telse\n\t  dest=$arg\n\t  continue\n\tfi\n\t;;\n      esac\n\n      # Aesthetically quote 
the argument.\n      func_quote_for_eval \"$arg\"\n      func_append install_prog \" $func_quote_for_eval_result\"\n      if test -n \"$arg2\"; then\n\tfunc_quote_for_eval \"$arg2\"\n      fi\n      func_append install_shared_prog \" $func_quote_for_eval_result\"\n    done\n\n    test -z \"$install_prog\" && \\\n      func_fatal_help \"you must specify an install program\"\n\n    test -n \"$prev\" && \\\n      func_fatal_help \"the '$prev' option requires an argument\"\n\n    if test -n \"$install_override_mode\" && $no_mode; then\n      if $install_cp; then :; else\n\tfunc_quote_for_eval \"$install_override_mode\"\n\tfunc_append install_shared_prog \" -m $func_quote_for_eval_result\"\n      fi\n    fi\n\n    if test -z \"$files\"; then\n      if test -z \"$dest\"; then\n\tfunc_fatal_help \"no file or destination specified\"\n      else\n\tfunc_fatal_help \"you must specify a destination\"\n      fi\n    fi\n\n    # Strip any trailing slash from the destination.\n    func_stripname '' '/' \"$dest\"\n    dest=$func_stripname_result\n\n    # Check to see that the destination is a directory.\n    test -d \"$dest\" && isdir=:\n    if $isdir; then\n      destdir=$dest\n      destname=\n    else\n      func_dirname_and_basename \"$dest\" \"\" \".\"\n      destdir=$func_dirname_result\n      destname=$func_basename_result\n\n      # Not a directory, so check to see that there is only one file specified.\n      set dummy $files; shift\n      test \"$#\" -gt 1 && \\\n\tfunc_fatal_help \"'$dest' is not a directory\"\n    fi\n    case $destdir in\n    [\\\\/]* | [A-Za-z]:[\\\\/]*) ;;\n    *)\n      for file in $files; do\n\tcase $file in\n\t*.lo) ;;\n\t*)\n\t  func_fatal_help \"'$destdir' must be an absolute directory name\"\n\t  ;;\n\tesac\n      done\n      ;;\n    esac\n\n    # This variable tells wrapper scripts just to set variables rather\n    # than running their programs.\n    libtool_install_magic=$magic\n\n    staticlibs=\n    future_libdirs=\n    current_libdirs=\n 
   for file in $files; do\n\n      # Do each installation.\n      case $file in\n      *.$libext)\n\t# Do the static libraries later.\n\tfunc_append staticlibs \" $file\"\n\t;;\n\n      *.la)\n\tfunc_resolve_sysroot \"$file\"\n\tfile=$func_resolve_sysroot_result\n\n\t# Check to see that this really is a libtool archive.\n\tfunc_lalib_unsafe_p \"$file\" \\\n\t  || func_fatal_help \"'$file' is not a valid libtool archive\"\n\n\tlibrary_names=\n\told_library=\n\trelink_command=\n\tfunc_source \"$file\"\n\n\t# Add the libdir to current_libdirs if it is the destination.\n\tif test \"X$destdir\" = \"X$libdir\"; then\n\t  case \"$current_libdirs \" in\n\t  *\" $libdir \"*) ;;\n\t  *) func_append current_libdirs \" $libdir\" ;;\n\t  esac\n\telse\n\t  # Note the libdir as a future libdir.\n\t  case \"$future_libdirs \" in\n\t  *\" $libdir \"*) ;;\n\t  *) func_append future_libdirs \" $libdir\" ;;\n\t  esac\n\tfi\n\n\tfunc_dirname \"$file\" \"/\" \"\"\n\tdir=$func_dirname_result\n\tfunc_append dir \"$objdir\"\n\n\tif test -n \"$relink_command\"; then\n\t  # Determine the prefix the user has applied to our future dir.\n\t  inst_prefix_dir=`$ECHO \"$destdir\" | $SED -e \"s%$libdir\\$%%\"`\n\n\t  # Don't allow the user to place us outside of our expected\n\t  # location b/c this prevents finding dependent libraries that\n\t  # are installed to the same prefix.\n\t  # At present, this check doesn't affect windows .dll's that\n\t  # are installed into $libdir/../bin (currently, that works fine)\n\t  # but it's something to keep an eye on.\n\t  test \"$inst_prefix_dir\" = \"$destdir\" && \\\n\t    func_fatal_error \"error: cannot install '$file' to a directory not ending in $libdir\"\n\n\t  if test -n \"$inst_prefix_dir\"; then\n\t    # Stick the inst_prefix_dir data into the link command.\n\t    relink_command=`$ECHO \"$relink_command\" | $SED \"s%@inst_prefix_dir@%-inst-prefix-dir $inst_prefix_dir%\"`\n\t  else\n\t    relink_command=`$ECHO \"$relink_command\" | $SED 
\"s%@inst_prefix_dir@%%\"`\n\t  fi\n\n\t  func_warning \"relinking '$file'\"\n\t  func_show_eval \"$relink_command\" \\\n\t    'func_fatal_error \"error: relink '\\''$file'\\'' with the above command before installing it\"'\n\tfi\n\n\t# See the names of the shared library.\n\tset dummy $library_names; shift\n\tif test -n \"$1\"; then\n\t  realname=$1\n\t  shift\n\n\t  srcname=$realname\n\t  test -n \"$relink_command\" && srcname=${realname}T\n\n\t  # Install the shared library and build the symlinks.\n\t  func_show_eval \"$install_shared_prog $dir/$srcname $destdir/$realname\" \\\n\t      'exit $?'\n\t  tstripme=$stripme\n\t  case $host_os in\n\t  cygwin* | mingw* | pw32* | cegcc*)\n\t    case $realname in\n\t    *.dll.a)\n\t      tstripme=\n\t      ;;\n\t    esac\n\t    ;;\n\t  os2*)\n\t    case $realname in\n\t    *_dll.a)\n\t      tstripme=\n\t      ;;\n\t    esac\n\t    ;;\n\t  esac\n\t  if test -n \"$tstripme\" && test -n \"$striplib\"; then\n\t    func_show_eval \"$striplib $destdir/$realname\" 'exit $?'\n\t  fi\n\n\t  if test \"$#\" -gt 0; then\n\t    # Delete the old symlinks, and create new ones.\n\t    # Try 'ln -sf' first, because the 'ln' binary might depend on\n\t    # the symlink we replace!  
Solaris /bin/ln does not understand -f,\n\t    # so we also need to try rm && ln -s.\n\t    for linkname\n\t    do\n\t      test \"$linkname\" != \"$realname\" \\\n\t\t&& func_show_eval \"(cd $destdir && { $LN_S -f $realname $linkname || { $RM $linkname && $LN_S $realname $linkname; }; })\"\n\t    done\n\t  fi\n\n\t  # Do each command in the postinstall commands.\n\t  lib=$destdir/$realname\n\t  func_execute_cmds \"$postinstall_cmds\" 'exit $?'\n\tfi\n\n\t# Install the pseudo-library for information purposes.\n\tfunc_basename \"$file\"\n\tname=$func_basename_result\n\tinstname=$dir/${name}i\n\tfunc_show_eval \"$install_prog $instname $destdir/$name\" 'exit $?'\n\n\t# Maybe install the static library, too.\n\ttest -n \"$old_library\" && func_append staticlibs \" $dir/$old_library\"\n\t;;\n\n      *.lo)\n\t# Install (i.e. copy) a libtool object.\n\n\t# Figure out destination file name, if it wasn't already specified.\n\tif test -n \"$destname\"; then\n\t  destfile=$destdir/$destname\n\telse\n\t  func_basename \"$file\"\n\t  destfile=$func_basename_result\n\t  destfile=$destdir/$destfile\n\tfi\n\n\t# Deduce the name of the destination old-style object file.\n\tcase $destfile in\n\t*.lo)\n\t  func_lo2o \"$destfile\"\n\t  staticdest=$func_lo2o_result\n\t  ;;\n\t*.$objext)\n\t  staticdest=$destfile\n\t  destfile=\n\t  ;;\n\t*)\n\t  func_fatal_help \"cannot copy a libtool object to '$destfile'\"\n\t  ;;\n\tesac\n\n\t# Install the libtool object if requested.\n\ttest -n \"$destfile\" && \\\n\t  func_show_eval \"$install_prog $file $destfile\" 'exit $?'\n\n\t# Install the old object if enabled.\n\tif test yes = \"$build_old_libs\"; then\n\t  # Deduce the name of the old-style object file.\n\t  func_lo2o \"$file\"\n\t  staticobj=$func_lo2o_result\n\t  func_show_eval \"$install_prog \\$staticobj \\$staticdest\" 'exit $?'\n\tfi\n\texit $EXIT_SUCCESS\n\t;;\n\n      *)\n\t# Figure out destination file name, if it wasn't already specified.\n\tif test -n \"$destname\"; then\n\t  
destfile=$destdir/$destname\n\telse\n\t  func_basename \"$file\"\n\t  destfile=$func_basename_result\n\t  destfile=$destdir/$destfile\n\tfi\n\n\t# If the file is missing, and there is a .exe on the end, strip it\n\t# because it is most likely a libtool script we actually want to\n\t# install\n\tstripped_ext=\n\tcase $file in\n\t  *.exe)\n\t    if test ! -f \"$file\"; then\n\t      func_stripname '' '.exe' \"$file\"\n\t      file=$func_stripname_result\n\t      stripped_ext=.exe\n\t    fi\n\t    ;;\n\tesac\n\n\t# Do a test to see if this is really a libtool program.\n\tcase $host in\n\t*cygwin* | *mingw*)\n\t    if func_ltwrapper_executable_p \"$file\"; then\n\t      func_ltwrapper_scriptname \"$file\"\n\t      wrapper=$func_ltwrapper_scriptname_result\n\t    else\n\t      func_stripname '' '.exe' \"$file\"\n\t      wrapper=$func_stripname_result\n\t    fi\n\t    ;;\n\t*)\n\t    wrapper=$file\n\t    ;;\n\tesac\n\tif func_ltwrapper_script_p \"$wrapper\"; then\n\t  notinst_deplibs=\n\t  relink_command=\n\n\t  func_source \"$wrapper\"\n\n\t  # Check the variables that should have been set.\n\t  test -z \"$generated_by_libtool_version\" && \\\n\t    func_fatal_error \"invalid libtool wrapper script '$wrapper'\"\n\n\t  finalize=:\n\t  for lib in $notinst_deplibs; do\n\t    # Check to see that each library is installed.\n\t    libdir=\n\t    if test -f \"$lib\"; then\n\t      func_source \"$lib\"\n\t    fi\n\t    libfile=$libdir/`$ECHO \"$lib\" | $SED 's%^.*/%%g'`\n\t    if test -n \"$libdir\" && test ! 
-f \"$libfile\"; then\n\t      func_warning \"'$lib' has not been installed in '$libdir'\"\n\t      finalize=false\n\t    fi\n\t  done\n\n\t  relink_command=\n\t  func_source \"$wrapper\"\n\n\t  outputname=\n\t  if test no = \"$fast_install\" && test -n \"$relink_command\"; then\n\t    $opt_dry_run || {\n\t      if $finalize; then\n\t        tmpdir=`func_mktempdir`\n\t\tfunc_basename \"$file$stripped_ext\"\n\t\tfile=$func_basename_result\n\t        outputname=$tmpdir/$file\n\t        # Replace the output file specification.\n\t        relink_command=`$ECHO \"$relink_command\" | $SED 's%@OUTPUT@%'\"$outputname\"'%g'`\n\n\t        $opt_quiet || {\n\t          func_quote_for_expand \"$relink_command\"\n\t\t  eval \"func_echo $func_quote_for_expand_result\"\n\t        }\n\t        if eval \"$relink_command\"; then :\n\t          else\n\t\t  func_error \"error: relink '$file' with the above command before installing it\"\n\t\t  $opt_dry_run || ${RM}r \"$tmpdir\"\n\t\t  continue\n\t        fi\n\t        file=$outputname\n\t      else\n\t        func_warning \"cannot relink '$file'\"\n\t      fi\n\t    }\n\t  else\n\t    # Install the binary that we compiled earlier.\n\t    file=`$ECHO \"$file$stripped_ext\" | $SED \"s%\\([^/]*\\)$%$objdir/\\1%\"`\n\t  fi\n\tfi\n\n\t# remove .exe since cygwin /usr/bin/install will append another\n\t# one anyway\n\tcase $install_prog,$host in\n\t*/usr/bin/install*,*cygwin*)\n\t  case $file:$destfile in\n\t  *.exe:*.exe)\n\t    # this is ok\n\t    ;;\n\t  *.exe:*)\n\t    destfile=$destfile.exe\n\t    ;;\n\t  *:*.exe)\n\t    func_stripname '' '.exe' \"$destfile\"\n\t    destfile=$func_stripname_result\n\t    ;;\n\t  esac\n\t  ;;\n\tesac\n\tfunc_show_eval \"$install_prog\\$stripme \\$file \\$destfile\" 'exit $?'\n\t$opt_dry_run || if test -n \"$outputname\"; then\n\t  ${RM}r \"$tmpdir\"\n\tfi\n\t;;\n      esac\n    done\n\n    for file in $staticlibs; do\n      func_basename \"$file\"\n      name=$func_basename_result\n\n      # Set up the 
ranlib parameters.\n      oldlib=$destdir/$name\n      func_to_tool_file \"$oldlib\" func_convert_file_msys_to_w32\n      tool_oldlib=$func_to_tool_file_result\n\n      func_show_eval \"$install_prog \\$file \\$oldlib\" 'exit $?'\n\n      if test -n \"$stripme\" && test -n \"$old_striplib\"; then\n\tfunc_show_eval \"$old_striplib $tool_oldlib\" 'exit $?'\n      fi\n\n      # Do each command in the postinstall commands.\n      func_execute_cmds \"$old_postinstall_cmds\" 'exit $?'\n    done\n\n    test -n \"$future_libdirs\" && \\\n      func_warning \"remember to run '$progname --finish$future_libdirs'\"\n\n    if test -n \"$current_libdirs\"; then\n      # Maybe just do a dry run.\n      $opt_dry_run && current_libdirs=\" -n$current_libdirs\"\n      exec_cmd='$SHELL \"$progpath\" $preserve_args --finish$current_libdirs'\n    else\n      exit $EXIT_SUCCESS\n    fi\n}\n\ntest install = \"$opt_mode\" && func_mode_install ${1+\"$@\"}\n\n\n# func_generate_dlsyms outputname originator pic_p\n# Extract symbols from dlprefiles and create ${outputname}S.o with\n# a dlpreopen symbol table.\nfunc_generate_dlsyms ()\n{\n    $debug_cmd\n\n    my_outputname=$1\n    my_originator=$2\n    my_pic_p=${3-false}\n    my_prefix=`$ECHO \"$my_originator\" | $SED 's%[^a-zA-Z0-9]%_%g'`\n    my_dlsyms=\n\n    if test -n \"$dlfiles$dlprefiles\" || test no != \"$dlself\"; then\n      if test -n \"$NM\" && test -n \"$global_symbol_pipe\"; then\n\tmy_dlsyms=${my_outputname}S.c\n      else\n\tfunc_error \"not configured to extract global symbols from dlpreopened files\"\n      fi\n    fi\n\n    if test -n \"$my_dlsyms\"; then\n      case $my_dlsyms in\n      \"\") ;;\n      *.c)\n\t# Discover the nlist of each of the dlfiles.\n\tnlist=$output_objdir/$my_outputname.nm\n\n\tfunc_show_eval \"$RM $nlist ${nlist}S ${nlist}T\"\n\n\t# Parse the name list into a source file.\n\tfunc_verbose \"creating $output_objdir/$my_dlsyms\"\n\n\t$opt_dry_run || $ECHO > \"$output_objdir/$my_dlsyms\" \"\\\n/* 
$my_dlsyms - symbol resolution table for '$my_outputname' dlsym emulation. */\n/* Generated by $PROGRAM (GNU $PACKAGE) $VERSION */\n\n#ifdef __cplusplus\nextern \\\"C\\\" {\n#endif\n\n#if defined __GNUC__ && (((__GNUC__ == 4) && (__GNUC_MINOR__ >= 4)) || (__GNUC__ > 4))\n#pragma GCC diagnostic ignored \\\"-Wstrict-prototypes\\\"\n#endif\n\n/* Keep this code in sync between libtool.m4, ltmain, lt_system.h, and tests.  */\n#if defined _WIN32 || defined __CYGWIN__ || defined _WIN32_WCE\n/* DATA imports from DLLs on WIN32 can't be const, because runtime\n   relocations are performed -- see ld's documentation on pseudo-relocs.  */\n# define LT_DLSYM_CONST\n#elif defined __osf__\n/* This system does not cope well with relocations in const data.  */\n# define LT_DLSYM_CONST\n#else\n# define LT_DLSYM_CONST const\n#endif\n\n#define STREQ(s1, s2) (strcmp ((s1), (s2)) == 0)\n\n/* External symbol declarations for the compiler. */\\\n\"\n\n\tif test yes = \"$dlself\"; then\n\t  func_verbose \"generating symbol list for '$output'\"\n\n\t  $opt_dry_run || echo ': @PROGRAM@ ' > \"$nlist\"\n\n\t  # Add our own program objects to the symbol list.\n\t  progfiles=`$ECHO \"$objs$old_deplibs\" | $SP2NL | $SED \"$lo2o\" | $NL2SP`\n\t  for progfile in $progfiles; do\n\t    func_to_tool_file \"$progfile\" func_convert_file_msys_to_w32\n\t    func_verbose \"extracting global C symbols from '$func_to_tool_file_result'\"\n\t    $opt_dry_run || eval \"$NM $func_to_tool_file_result | $global_symbol_pipe >> '$nlist'\"\n\t  done\n\n\t  if test -n \"$exclude_expsyms\"; then\n\t    $opt_dry_run || {\n\t      eval '$EGREP -v \" ($exclude_expsyms)$\" \"$nlist\" > \"$nlist\"T'\n\t      eval '$MV \"$nlist\"T \"$nlist\"'\n\t    }\n\t  fi\n\n\t  if test -n \"$export_symbols_regex\"; then\n\t    $opt_dry_run || {\n\t      eval '$EGREP -e \"$export_symbols_regex\" \"$nlist\" > \"$nlist\"T'\n\t      eval '$MV \"$nlist\"T \"$nlist\"'\n\t    }\n\t  fi\n\n\t  # Prepare the list of exported symbols\n\t  if test 
-z \"$export_symbols\"; then\n\t    export_symbols=$output_objdir/$outputname.exp\n\t    $opt_dry_run || {\n\t      $RM $export_symbols\n\t      eval \"$SED -n -e '/^: @PROGRAM@ $/d' -e 's/^.* \\(.*\\)$/\\1/p' \"'< \"$nlist\" > \"$export_symbols\"'\n\t      case $host in\n\t      *cygwin* | *mingw* | *cegcc* )\n                eval \"echo EXPORTS \"'> \"$output_objdir/$outputname.def\"'\n                eval 'cat \"$export_symbols\" >> \"$output_objdir/$outputname.def\"'\n\t        ;;\n\t      esac\n\t    }\n\t  else\n\t    $opt_dry_run || {\n\t      eval \"$SED -e 's/\\([].[*^$]\\)/\\\\\\\\\\1/g' -e 's/^/ /' -e 's/$/$/'\"' < \"$export_symbols\" > \"$output_objdir/$outputname.exp\"'\n\t      eval '$GREP -f \"$output_objdir/$outputname.exp\" < \"$nlist\" > \"$nlist\"T'\n\t      eval '$MV \"$nlist\"T \"$nlist\"'\n\t      case $host in\n\t        *cygwin* | *mingw* | *cegcc* )\n\t          eval \"echo EXPORTS \"'> \"$output_objdir/$outputname.def\"'\n\t          eval 'cat \"$nlist\" >> \"$output_objdir/$outputname.def\"'\n\t          ;;\n\t      esac\n\t    }\n\t  fi\n\tfi\n\n\tfor dlprefile in $dlprefiles; do\n\t  func_verbose \"extracting global C symbols from '$dlprefile'\"\n\t  func_basename \"$dlprefile\"\n\t  name=$func_basename_result\n          case $host in\n\t    *cygwin* | *mingw* | *cegcc* )\n\t      # if an import library, we need to obtain dlname\n\t      if func_win32_import_lib_p \"$dlprefile\"; then\n\t        func_tr_sh \"$dlprefile\"\n\t        eval \"curr_lafile=\\$libfile_$func_tr_sh_result\"\n\t        dlprefile_dlbasename=\n\t        if test -n \"$curr_lafile\" && func_lalib_p \"$curr_lafile\"; then\n\t          # Use subshell, to avoid clobbering current variable values\n\t          dlprefile_dlname=`source \"$curr_lafile\" && echo \"$dlname\"`\n\t          if test -n \"$dlprefile_dlname\"; then\n\t            func_basename \"$dlprefile_dlname\"\n\t            dlprefile_dlbasename=$func_basename_result\n\t          else\n\t            # no 
lafile. user explicitly requested -dlpreopen <import library>.\n\t            $sharedlib_from_linklib_cmd \"$dlprefile\"\n\t            dlprefile_dlbasename=$sharedlib_from_linklib_result\n\t          fi\n\t        fi\n\t        $opt_dry_run || {\n\t          if test -n \"$dlprefile_dlbasename\"; then\n\t            eval '$ECHO \": $dlprefile_dlbasename\" >> \"$nlist\"'\n\t          else\n\t            func_warning \"Could not compute DLL name from $name\"\n\t            eval '$ECHO \": $name \" >> \"$nlist\"'\n\t          fi\n\t          func_to_tool_file \"$dlprefile\" func_convert_file_msys_to_w32\n\t          eval \"$NM \\\"$func_to_tool_file_result\\\" 2>/dev/null | $global_symbol_pipe |\n\t            $SED -e '/I __imp/d' -e 's/I __nm_/D /;s/_nm__//' >> '$nlist'\"\n\t        }\n\t      else # not an import lib\n\t        $opt_dry_run || {\n\t          eval '$ECHO \": $name \" >> \"$nlist\"'\n\t          func_to_tool_file \"$dlprefile\" func_convert_file_msys_to_w32\n\t          eval \"$NM \\\"$func_to_tool_file_result\\\" 2>/dev/null | $global_symbol_pipe >> '$nlist'\"\n\t        }\n\t      fi\n\t    ;;\n\t    *)\n\t      $opt_dry_run || {\n\t        eval '$ECHO \": $name \" >> \"$nlist\"'\n\t        func_to_tool_file \"$dlprefile\" func_convert_file_msys_to_w32\n\t        eval \"$NM \\\"$func_to_tool_file_result\\\" 2>/dev/null | $global_symbol_pipe >> '$nlist'\"\n\t      }\n\t    ;;\n          esac\n\tdone\n\n\t$opt_dry_run || {\n\t  # Make sure we have at least an empty file.\n\t  test -f \"$nlist\" || : > \"$nlist\"\n\n\t  if test -n \"$exclude_expsyms\"; then\n\t    $EGREP -v \" ($exclude_expsyms)$\" \"$nlist\" > \"$nlist\"T\n\t    $MV \"$nlist\"T \"$nlist\"\n\t  fi\n\n\t  # Try sorting and uniquifying the output.\n\t  if $GREP -v \"^: \" < \"$nlist\" |\n\t      if sort -k 3 </dev/null >/dev/null 2>&1; then\n\t\tsort -k 3\n\t      else\n\t\tsort +2\n\t      fi |\n\t      uniq > \"$nlist\"S; then\n\t    :\n\t  else\n\t    $GREP -v \"^: \" < \"$nlist\" > 
\"$nlist\"S\n\t  fi\n\n\t  if test -f \"$nlist\"S; then\n\t    eval \"$global_symbol_to_cdecl\"' < \"$nlist\"S >> \"$output_objdir/$my_dlsyms\"'\n\t  else\n\t    echo '/* NONE */' >> \"$output_objdir/$my_dlsyms\"\n\t  fi\n\n\t  func_show_eval '$RM \"${nlist}I\"'\n\t  if test -n \"$global_symbol_to_import\"; then\n\t    eval \"$global_symbol_to_import\"' < \"$nlist\"S > \"$nlist\"I'\n\t  fi\n\n\t  echo >> \"$output_objdir/$my_dlsyms\" \"\\\n\n/* The mapping between symbol names and symbols.  */\ntypedef struct {\n  const char *name;\n  void *address;\n} lt_dlsymlist;\nextern LT_DLSYM_CONST lt_dlsymlist\nlt_${my_prefix}_LTX_preloaded_symbols[];\\\n\"\n\n\t  if test -s \"$nlist\"I; then\n\t    echo >> \"$output_objdir/$my_dlsyms\" \"\\\nstatic void lt_syminit(void)\n{\n  LT_DLSYM_CONST lt_dlsymlist *symbol = lt_${my_prefix}_LTX_preloaded_symbols;\n  for (; symbol->name; ++symbol)\n    {\"\n\t    $SED 's/.*/      if (STREQ (symbol->name, \\\"&\\\")) symbol->address = (void *) \\&&;/' < \"$nlist\"I >> \"$output_objdir/$my_dlsyms\"\n\t    echo >> \"$output_objdir/$my_dlsyms\" \"\\\n    }\n}\"\n\t  fi\n\t  echo >> \"$output_objdir/$my_dlsyms\" \"\\\nLT_DLSYM_CONST lt_dlsymlist\nlt_${my_prefix}_LTX_preloaded_symbols[] =\n{ {\\\"$my_originator\\\", (void *) 0},\"\n\n\t  if test -s \"$nlist\"I; then\n\t    echo >> \"$output_objdir/$my_dlsyms\" \"\\\n  {\\\"@INIT@\\\", (void *) &lt_syminit},\"\n\t  fi\n\n\t  case $need_lib_prefix in\n\t  no)\n\t    eval \"$global_symbol_to_c_name_address\" < \"$nlist\" >> \"$output_objdir/$my_dlsyms\"\n\t    ;;\n\t  *)\n\t    eval \"$global_symbol_to_c_name_address_lib_prefix\" < \"$nlist\" >> \"$output_objdir/$my_dlsyms\"\n\t    ;;\n\t  esac\n\t  echo >> \"$output_objdir/$my_dlsyms\" \"\\\n  {0, (void *) 0}\n};\n\n/* This works around a problem in FreeBSD linker */\n#ifdef FREEBSD_WORKAROUND\nstatic const void *lt_preloaded_setup() {\n  return lt_${my_prefix}_LTX_preloaded_symbols;\n}\n#endif\n\n#ifdef __cplusplus\n}\n#endif\\\n\"\n\t} # 
!$opt_dry_run\n\n\tpic_flag_for_symtable=\n\tcase \"$compile_command \" in\n\t*\" -static \"*) ;;\n\t*)\n\t  case $host in\n\t  # compiling the symbol table file with pic_flag works around\n\t  # a FreeBSD bug that causes programs to crash when -lm is\n\t  # linked before any other PIC object.  But we must not use\n\t  # pic_flag when linking with -static.  The problem exists in\n\t  # FreeBSD 2.2.6 and is fixed in FreeBSD 3.1.\n\t  *-*-freebsd2.*|*-*-freebsd3.0*|*-*-freebsdelf3.0*)\n\t    pic_flag_for_symtable=\" $pic_flag -DFREEBSD_WORKAROUND\" ;;\n\t  *-*-hpux*)\n\t    pic_flag_for_symtable=\" $pic_flag\"  ;;\n\t  *)\n\t    $my_pic_p && pic_flag_for_symtable=\" $pic_flag\"\n\t    ;;\n\t  esac\n\t  ;;\n\tesac\n\tsymtab_cflags=\n\tfor arg in $LTCFLAGS; do\n\t  case $arg in\n\t  -pie | -fpie | -fPIE) ;;\n\t  *) func_append symtab_cflags \" $arg\" ;;\n\t  esac\n\tdone\n\n\t# Now compile the dynamic symbol file.\n\tfunc_show_eval '(cd $output_objdir && $LTCC$symtab_cflags -c$no_builtin_flag$pic_flag_for_symtable \"$my_dlsyms\")' 'exit $?'\n\n\t# Clean up the generated files.\n\tfunc_show_eval '$RM \"$output_objdir/$my_dlsyms\" \"$nlist\" \"${nlist}S\" \"${nlist}T\" \"${nlist}I\"'\n\n\t# Transform the symbol file into the correct name.\n\tsymfileobj=$output_objdir/${my_outputname}S.$objext\n\tcase $host in\n\t*cygwin* | *mingw* | *cegcc* )\n\t  if test -f \"$output_objdir/$my_outputname.def\"; then\n\t    compile_command=`$ECHO \"$compile_command\" | $SED \"s%@SYMFILE@%$output_objdir/$my_outputname.def $symfileobj%\"`\n\t    finalize_command=`$ECHO \"$finalize_command\" | $SED \"s%@SYMFILE@%$output_objdir/$my_outputname.def $symfileobj%\"`\n\t  else\n\t    compile_command=`$ECHO \"$compile_command\" | $SED \"s%@SYMFILE@%$symfileobj%\"`\n\t    finalize_command=`$ECHO \"$finalize_command\" | $SED \"s%@SYMFILE@%$symfileobj%\"`\n\t  fi\n\t  ;;\n\t*)\n\t  compile_command=`$ECHO \"$compile_command\" | $SED \"s%@SYMFILE@%$symfileobj%\"`\n\t  finalize_command=`$ECHO 
\"$finalize_command\" | $SED \"s%@SYMFILE@%$symfileobj%\"`\n\t  ;;\n\tesac\n\t;;\n      *)\n\tfunc_fatal_error \"unknown suffix for '$my_dlsyms'\"\n\t;;\n      esac\n    else\n      # We keep going just in case the user didn't refer to\n      # lt_preloaded_symbols.  The linker will fail if global_symbol_pipe\n      # really was required.\n\n      # Nullify the symbol file.\n      compile_command=`$ECHO \"$compile_command\" | $SED \"s% @SYMFILE@%%\"`\n      finalize_command=`$ECHO \"$finalize_command\" | $SED \"s% @SYMFILE@%%\"`\n    fi\n}\n\n# func_cygming_gnu_implib_p ARG\n# This predicate returns with zero status (TRUE) if\n# ARG is a GNU/binutils-style import library. Returns\n# with nonzero status (FALSE) otherwise.\nfunc_cygming_gnu_implib_p ()\n{\n  $debug_cmd\n\n  func_to_tool_file \"$1\" func_convert_file_msys_to_w32\n  func_cygming_gnu_implib_tmp=`$NM \"$func_to_tool_file_result\" | eval \"$global_symbol_pipe\" | $EGREP ' (_head_[A-Za-z0-9_]+_[ad]l*|[A-Za-z0-9_]+_[ad]l*_iname)$'`\n  test -n \"$func_cygming_gnu_implib_tmp\"\n}\n\n# func_cygming_ms_implib_p ARG\n# This predicate returns with zero status (TRUE) if\n# ARG is an MS-style import library. 
Returns\n# with nonzero status (FALSE) otherwise.\nfunc_cygming_ms_implib_p ()\n{\n  $debug_cmd\n\n  func_to_tool_file \"$1\" func_convert_file_msys_to_w32\n  func_cygming_ms_implib_tmp=`$NM \"$func_to_tool_file_result\" | eval \"$global_symbol_pipe\" | $GREP '_NULL_IMPORT_DESCRIPTOR'`\n  test -n \"$func_cygming_ms_implib_tmp\"\n}\n\n# func_win32_libid arg\n# return the library type of file 'arg'\n#\n# Need a lot of goo to handle *both* DLLs and import libs\n# Has to be a shell function in order to 'eat' the argument\n# that is supplied when $file_magic_command is called.\n# Despite the name, also deal with 64 bit binaries.\nfunc_win32_libid ()\n{\n  $debug_cmd\n\n  win32_libid_type=unknown\n  win32_fileres=`file -L $1 2>/dev/null`\n  case $win32_fileres in\n  *ar\\ archive\\ import\\ library*) # definitely import\n    win32_libid_type=\"x86 archive import\"\n    ;;\n  *ar\\ archive*) # could be an import, or static\n    # Keep the egrep pattern in sync with the one in _LT_CHECK_MAGIC_METHOD.\n    if eval $OBJDUMP -f $1 | $SED -e '10q' 2>/dev/null |\n       $EGREP 'file format (pei*-i386(.*architecture: i386)?|pe-arm-wince|pe-x86-64)' >/dev/null; then\n      case $nm_interface in\n      \"MS dumpbin\")\n\tif func_cygming_ms_implib_p \"$1\" ||\n\t   func_cygming_gnu_implib_p \"$1\"\n\tthen\n\t  win32_nmres=import\n\telse\n\t  win32_nmres=\n\tfi\n\t;;\n      *)\n\tfunc_to_tool_file \"$1\" func_convert_file_msys_to_w32\n\twin32_nmres=`eval $NM -f posix -A \\\"$func_to_tool_file_result\\\" |\n\t  $SED -n -e '\n\t    1,100{\n\t\t/ I /{\n\t\t    s|.*|import|\n\t\t    p\n\t\t    q\n\t\t}\n\t    }'`\n\t;;\n      esac\n      case $win32_nmres in\n      import*)  win32_libid_type=\"x86 archive import\";;\n      *)        win32_libid_type=\"x86 archive static\";;\n      esac\n    fi\n    ;;\n  *DLL*)\n    win32_libid_type=\"x86 DLL\"\n    ;;\n  *executable*) # but shell scripts are \"executable\" too...\n    case $win32_fileres in\n    *MS\\ Windows\\ PE\\ Intel*)\n      
win32_libid_type=\"x86 DLL\"\n      ;;\n    esac\n    ;;\n  esac\n  $ECHO \"$win32_libid_type\"\n}\n\n# func_cygming_dll_for_implib ARG\n#\n# Platform-specific function to extract the\n# name of the DLL associated with the specified\n# import library ARG.\n# Invoked by eval'ing the libtool variable\n#    $sharedlib_from_linklib_cmd\n# Result is available in the variable\n#    $sharedlib_from_linklib_result\nfunc_cygming_dll_for_implib ()\n{\n  $debug_cmd\n\n  sharedlib_from_linklib_result=`$DLLTOOL --identify-strict --identify \"$1\"`\n}\n\n# func_cygming_dll_for_implib_fallback_core SECTION_NAME LIBNAMEs\n#\n# The is the core of a fallback implementation of a\n# platform-specific function to extract the name of the\n# DLL associated with the specified import library LIBNAME.\n#\n# SECTION_NAME is either .idata$6 or .idata$7, depending\n# on the platform and compiler that created the implib.\n#\n# Echos the name of the DLL associated with the\n# specified import library.\nfunc_cygming_dll_for_implib_fallback_core ()\n{\n  $debug_cmd\n\n  match_literal=`$ECHO \"$1\" | $SED \"$sed_make_literal_regex\"`\n  $OBJDUMP -s --section \"$1\" \"$2\" 2>/dev/null |\n    $SED '/^Contents of section '\"$match_literal\"':/{\n      # Place marker at beginning of archive member dllname section\n      s/.*/====MARK====/\n      p\n      d\n    }\n    # These lines can sometimes be longer than 43 characters, but\n    # are always uninteresting\n    /:[\t ]*file format pe[i]\\{,1\\}-/d\n    /^In archive [^:]*:/d\n    # Ensure marker is printed\n    /^====MARK====/p\n    # Remove all lines with less than 43 characters\n    /^.\\{43\\}/!d\n    # From remaining lines, remove first 43 characters\n    s/^.\\{43\\}//' |\n    $SED -n '\n      # Join marker and all lines until next marker into a single line\n      /^====MARK====/ b para\n      H\n      $ b para\n      b\n      :para\n      x\n      s/\\n//g\n      # Remove the marker\n      s/^====MARK====//\n      # Remove trailing dots and 
whitespace\n      s/[\\. \\t]*$//\n      # Print\n      /./p' |\n    # we now have a list, one entry per line, of the stringified\n    # contents of the appropriate section of all members of the\n    # archive that possess that section. Heuristic: eliminate\n    # all those that have a first or second character that is\n    # a '.' (that is, objdump's representation of an unprintable\n    # character.) This should work for all archives with less than\n    # 0x302f exports -- but will fail for DLLs whose name actually\n    # begins with a literal '.' or a single character followed by\n    # a '.'.\n    #\n    # Of those that remain, print the first one.\n    $SED -e '/^\\./d;/^.\\./d;q'\n}\n\n# func_cygming_dll_for_implib_fallback ARG\n# Platform-specific function to extract the\n# name of the DLL associated with the specified\n# import library ARG.\n#\n# This fallback implementation is for use when $DLLTOOL\n# does not support the --identify-strict option.\n# Invoked by eval'ing the libtool variable\n#    $sharedlib_from_linklib_cmd\n# Result is available in the variable\n#    $sharedlib_from_linklib_result\nfunc_cygming_dll_for_implib_fallback ()\n{\n  $debug_cmd\n\n  if func_cygming_gnu_implib_p \"$1\"; then\n    # binutils import library\n    sharedlib_from_linklib_result=`func_cygming_dll_for_implib_fallback_core '.idata$7' \"$1\"`\n  elif func_cygming_ms_implib_p \"$1\"; then\n    # ms-generated import library\n    sharedlib_from_linklib_result=`func_cygming_dll_for_implib_fallback_core '.idata$6' \"$1\"`\n  else\n    # unknown\n    sharedlib_from_linklib_result=\n  fi\n}\n\n\n# func_extract_an_archive dir oldlib\nfunc_extract_an_archive ()\n{\n    $debug_cmd\n\n    f_ex_an_ar_dir=$1; shift\n    f_ex_an_ar_oldlib=$1\n    if test yes = \"$lock_old_archive_extraction\"; then\n      lockfile=$f_ex_an_ar_oldlib.lock\n      until $opt_dry_run || ln \"$progpath\" \"$lockfile\" 2>/dev/null; do\n\tfunc_echo \"Waiting for $lockfile to be removed\"\n\tsleep 2\n      
done\n    fi\n    func_show_eval \"(cd \\$f_ex_an_ar_dir && $AR x \\\"\\$f_ex_an_ar_oldlib\\\")\" \\\n\t\t   'stat=$?; rm -f \"$lockfile\"; exit $stat'\n    if test yes = \"$lock_old_archive_extraction\"; then\n      $opt_dry_run || rm -f \"$lockfile\"\n    fi\n    if ($AR t \"$f_ex_an_ar_oldlib\" | sort | sort -uc >/dev/null 2>&1); then\n     :\n    else\n      func_fatal_error \"object name conflicts in archive: $f_ex_an_ar_dir/$f_ex_an_ar_oldlib\"\n    fi\n}\n\n\n# func_extract_archives gentop oldlib ...\nfunc_extract_archives ()\n{\n    $debug_cmd\n\n    my_gentop=$1; shift\n    my_oldlibs=${1+\"$@\"}\n    my_oldobjs=\n    my_xlib=\n    my_xabs=\n    my_xdir=\n\n    for my_xlib in $my_oldlibs; do\n      # Extract the objects.\n      case $my_xlib in\n\t[\\\\/]* | [A-Za-z]:[\\\\/]*) my_xabs=$my_xlib ;;\n\t*) my_xabs=`pwd`\"/$my_xlib\" ;;\n      esac\n      func_basename \"$my_xlib\"\n      my_xlib=$func_basename_result\n      my_xlib_u=$my_xlib\n      while :; do\n        case \" $extracted_archives \" in\n\t*\" $my_xlib_u \"*)\n\t  func_arith $extracted_serial + 1\n\t  extracted_serial=$func_arith_result\n\t  my_xlib_u=lt$extracted_serial-$my_xlib ;;\n\t*) break ;;\n\tesac\n      done\n      extracted_archives=\"$extracted_archives $my_xlib_u\"\n      my_xdir=$my_gentop/$my_xlib_u\n\n      func_mkdir_p \"$my_xdir\"\n\n      case $host in\n      *-darwin*)\n\tfunc_verbose \"Extracting $my_xabs\"\n\t# Do not bother doing anything if just a dry run\n\t$opt_dry_run || {\n\t  darwin_orig_dir=`pwd`\n\t  cd $my_xdir || exit $?\n\t  darwin_archive=$my_xabs\n\t  darwin_curdir=`pwd`\n\t  func_basename \"$darwin_archive\"\n\t  darwin_base_archive=$func_basename_result\n\t  darwin_arches=`$LIPO -info \"$darwin_archive\" 2>/dev/null | $GREP Architectures 2>/dev/null || true`\n\t  if test -n \"$darwin_arches\"; then\n\t    darwin_arches=`$ECHO \"$darwin_arches\" | $SED -e 's/.*are://'`\n\t    darwin_arch=\n\t    func_verbose \"$darwin_base_archive has multiple architectures 
$darwin_arches\"\n\t    for darwin_arch in  $darwin_arches; do\n\t      func_mkdir_p \"unfat-$$/$darwin_base_archive-$darwin_arch\"\n\t      $LIPO -thin $darwin_arch -output \"unfat-$$/$darwin_base_archive-$darwin_arch/$darwin_base_archive\" \"$darwin_archive\"\n\t      cd \"unfat-$$/$darwin_base_archive-$darwin_arch\"\n\t      func_extract_an_archive \"`pwd`\" \"$darwin_base_archive\"\n\t      cd \"$darwin_curdir\"\n\t      $RM \"unfat-$$/$darwin_base_archive-$darwin_arch/$darwin_base_archive\"\n\t    done # $darwin_arches\n            ## Okay now we've a bunch of thin objects, gotta fatten them up :)\n\t    darwin_filelist=`find unfat-$$ -type f -name \\*.o -print -o -name \\*.lo -print | $SED -e \"$sed_basename\" | sort -u`\n\t    darwin_file=\n\t    darwin_files=\n\t    for darwin_file in $darwin_filelist; do\n\t      darwin_files=`find unfat-$$ -name $darwin_file -print | sort | $NL2SP`\n\t      $LIPO -create -output \"$darwin_file\" $darwin_files\n\t    done # $darwin_filelist\n\t    $RM -rf unfat-$$\n\t    cd \"$darwin_orig_dir\"\n\t  else\n\t    cd $darwin_orig_dir\n\t    func_extract_an_archive \"$my_xdir\" \"$my_xabs\"\n\t  fi # $darwin_arches\n\t} # !$opt_dry_run\n\t;;\n      *)\n        func_extract_an_archive \"$my_xdir\" \"$my_xabs\"\n\t;;\n      esac\n      my_oldobjs=\"$my_oldobjs \"`find $my_xdir -name \\*.$objext -print -o -name \\*.lo -print | sort | $NL2SP`\n    done\n\n    func_extract_archives_result=$my_oldobjs\n}\n\n\n# func_emit_wrapper [arg=no]\n#\n# Emit a libtool wrapper script on stdout.\n# Don't directly open a file because we may want to\n# incorporate the script contents within a cygwin/mingw\n# wrapper executable.  Must ONLY be called from within\n# func_mode_link because it depends on a number of variables\n# set therein.\n#\n# ARG is the value that the WRAPPER_SCRIPT_BELONGS_IN_OBJDIR\n# variable will take.  If 'yes', then the emitted script\n# will assume that the directory where it is stored is\n# the $objdir directory.  
This is a cygwin/mingw-specific\n# behavior.\nfunc_emit_wrapper ()\n{\n\tfunc_emit_wrapper_arg1=${1-no}\n\n\t$ECHO \"\\\n#! $SHELL\n\n# $output - temporary wrapper script for $objdir/$outputname\n# Generated by $PROGRAM (GNU $PACKAGE) $VERSION\n#\n# The $output program cannot be directly executed until all the libtool\n# libraries that it depends on are installed.\n#\n# This wrapper script should never be moved out of the build directory.\n# If it is, it will not operate correctly.\n\n# Sed substitution that helps us do robust quoting.  It backslashifies\n# metacharacters that are still active within double-quoted strings.\nsed_quote_subst='$sed_quote_subst'\n\n# Be Bourne compatible\nif test -n \\\"\\${ZSH_VERSION+set}\\\" && (emulate sh) >/dev/null 2>&1; then\n  emulate sh\n  NULLCMD=:\n  # Zsh 3.x and 4.x performs word splitting on \\${1+\\\"\\$@\\\"}, which\n  # is contrary to our usage.  Disable this feature.\n  alias -g '\\${1+\\\"\\$@\\\"}'='\\\"\\$@\\\"'\n  setopt NO_GLOB_SUBST\nelse\n  case \\`(set -o) 2>/dev/null\\` in *posix*) set -o posix;; esac\nfi\nBIN_SH=xpg4; export BIN_SH # for Tru64\nDUALCASE=1; export DUALCASE # for MKS sh\n\n# The HP-UX ksh and POSIX shell print the target directory to stdout\n# if CDPATH is set.\n(unset CDPATH) >/dev/null 2>&1 && unset CDPATH\n\nrelink_command=\\\"$relink_command\\\"\n\n# This environment variable determines our operation mode.\nif test \\\"\\$libtool_install_magic\\\" = \\\"$magic\\\"; then\n  # install mode needs the following variables:\n  generated_by_libtool_version='$macro_version'\n  notinst_deplibs='$notinst_deplibs'\nelse\n  # When we are sourced in execute mode, \\$file and \\$ECHO are already set.\n  if test \\\"\\$libtool_execute_magic\\\" != \\\"$magic\\\"; then\n    file=\\\"\\$0\\\"\"\n\n    qECHO=`$ECHO \"$ECHO\" | $SED \"$sed_quote_subst\"`\n    $ECHO \"\\\n\n# A function that is used when there is no print builtin or printf.\nfunc_fallback_echo ()\n{\n  eval 'cat 
<<_LTECHO_EOF\n\\$1\n_LTECHO_EOF'\n}\n    ECHO=\\\"$qECHO\\\"\n  fi\n\n# Very basic option parsing. These options are (a) specific to\n# the libtool wrapper, (b) are identical between the wrapper\n# /script/ and the wrapper /executable/ that is used only on\n# windows platforms, and (c) all begin with the string \"--lt-\"\n# (application programs are unlikely to have options that match\n# this pattern).\n#\n# There are only two supported options: --lt-debug and\n# --lt-dump-script. There is, deliberately, no --lt-help.\n#\n# The first argument to this parsing function should be the\n# script's $0 value, followed by \"$@\".\nlt_option_debug=\nfunc_parse_lt_options ()\n{\n  lt_script_arg0=\\$0\n  shift\n  for lt_opt\n  do\n    case \\\"\\$lt_opt\\\" in\n    --lt-debug) lt_option_debug=1 ;;\n    --lt-dump-script)\n        lt_dump_D=\\`\\$ECHO \\\"X\\$lt_script_arg0\\\" | $SED -e 's/^X//' -e 's%/[^/]*$%%'\\`\n        test \\\"X\\$lt_dump_D\\\" = \\\"X\\$lt_script_arg0\\\" && lt_dump_D=.\n        lt_dump_F=\\`\\$ECHO \\\"X\\$lt_script_arg0\\\" | $SED -e 's/^X//' -e 's%^.*/%%'\\`\n        cat \\\"\\$lt_dump_D/\\$lt_dump_F\\\"\n        exit 0\n      ;;\n    --lt-*)\n        \\$ECHO \\\"Unrecognized --lt- option: '\\$lt_opt'\\\" 1>&2\n        exit 1\n      ;;\n    esac\n  done\n\n  # Print the debug banner immediately:\n  if test -n \\\"\\$lt_option_debug\\\"; then\n    echo \\\"$outputname:$output:\\$LINENO: libtool wrapper (GNU $PACKAGE) $VERSION\\\" 1>&2\n  fi\n}\n\n# Used when --lt-debug. 
Prints its arguments to stdout\n# (redirection is the responsibility of the caller)\nfunc_lt_dump_args ()\n{\n  lt_dump_args_N=1;\n  for lt_arg\n  do\n    \\$ECHO \\\"$outputname:$output:\\$LINENO: newargv[\\$lt_dump_args_N]: \\$lt_arg\\\"\n    lt_dump_args_N=\\`expr \\$lt_dump_args_N + 1\\`\n  done\n}\n\n# Core function for launching the target application\nfunc_exec_program_core ()\n{\n\"\n  case $host in\n  # Backslashes separate directories on plain windows\n  *-*-mingw | *-*-os2* | *-cegcc*)\n    $ECHO \"\\\n      if test -n \\\"\\$lt_option_debug\\\"; then\n        \\$ECHO \\\"$outputname:$output:\\$LINENO: newargv[0]: \\$progdir\\\\\\\\\\$program\\\" 1>&2\n        func_lt_dump_args \\${1+\\\"\\$@\\\"} 1>&2\n      fi\n      exec \\\"\\$progdir\\\\\\\\\\$program\\\" \\${1+\\\"\\$@\\\"}\n\"\n    ;;\n\n  *)\n    $ECHO \"\\\n      if test -n \\\"\\$lt_option_debug\\\"; then\n        \\$ECHO \\\"$outputname:$output:\\$LINENO: newargv[0]: \\$progdir/\\$program\\\" 1>&2\n        func_lt_dump_args \\${1+\\\"\\$@\\\"} 1>&2\n      fi\n      exec \\\"\\$progdir/\\$program\\\" \\${1+\\\"\\$@\\\"}\n\"\n    ;;\n  esac\n  $ECHO \"\\\n      \\$ECHO \\\"\\$0: cannot exec \\$program \\$*\\\" 1>&2\n      exit 1\n}\n\n# A function to encapsulate launching the target application\n# Strips options in the --lt-* namespace from \\$@ and\n# launches target application with the remaining arguments.\nfunc_exec_program ()\n{\n  case \\\" \\$* \\\" in\n  *\\\\ --lt-*)\n    for lt_wr_arg\n    do\n      case \\$lt_wr_arg in\n      --lt-*) ;;\n      *) set x \\\"\\$@\\\" \\\"\\$lt_wr_arg\\\"; shift;;\n      esac\n      shift\n    done ;;\n  esac\n  func_exec_program_core \\${1+\\\"\\$@\\\"}\n}\n\n  # Parse options\n  func_parse_lt_options \\\"\\$0\\\" \\${1+\\\"\\$@\\\"}\n\n  # Find the directory that this script lives in.\n  thisdir=\\`\\$ECHO \\\"\\$file\\\" | $SED 's%/[^/]*$%%'\\`\n  test \\\"x\\$thisdir\\\" = \\\"x\\$file\\\" && thisdir=.\n\n  # Follow symbolic links until we get to the 
real thisdir.\n  file=\\`ls -ld \\\"\\$file\\\" | $SED -n 's/.*-> //p'\\`\n  while test -n \\\"\\$file\\\"; do\n    destdir=\\`\\$ECHO \\\"\\$file\\\" | $SED 's%/[^/]*\\$%%'\\`\n\n    # If there was a directory component, then change thisdir.\n    if test \\\"x\\$destdir\\\" != \\\"x\\$file\\\"; then\n      case \\\"\\$destdir\\\" in\n      [\\\\\\\\/]* | [A-Za-z]:[\\\\\\\\/]*) thisdir=\\\"\\$destdir\\\" ;;\n      *) thisdir=\\\"\\$thisdir/\\$destdir\\\" ;;\n      esac\n    fi\n\n    file=\\`\\$ECHO \\\"\\$file\\\" | $SED 's%^.*/%%'\\`\n    file=\\`ls -ld \\\"\\$thisdir/\\$file\\\" | $SED -n 's/.*-> //p'\\`\n  done\n\n  # Usually 'no', except on cygwin/mingw when embedded into\n  # the cwrapper.\n  WRAPPER_SCRIPT_BELONGS_IN_OBJDIR=$func_emit_wrapper_arg1\n  if test \\\"\\$WRAPPER_SCRIPT_BELONGS_IN_OBJDIR\\\" = \\\"yes\\\"; then\n    # special case for '.'\n    if test \\\"\\$thisdir\\\" = \\\".\\\"; then\n      thisdir=\\`pwd\\`\n    fi\n    # remove .libs from thisdir\n    case \\\"\\$thisdir\\\" in\n    *[\\\\\\\\/]$objdir ) thisdir=\\`\\$ECHO \\\"\\$thisdir\\\" | $SED 's%[\\\\\\\\/][^\\\\\\\\/]*$%%'\\` ;;\n    $objdir )   thisdir=. ;;\n    esac\n  fi\n\n  # Try to get the absolute directory name.\n  absdir=\\`cd \\\"\\$thisdir\\\" && pwd\\`\n  test -n \\\"\\$absdir\\\" && thisdir=\\\"\\$absdir\\\"\n\"\n\n\tif test yes = \"$fast_install\"; then\n\t  $ECHO \"\\\n  program=lt-'$outputname'$exeext\n  progdir=\\\"\\$thisdir/$objdir\\\"\n\n  if test ! -f \\\"\\$progdir/\\$program\\\" ||\n     { file=\\`ls -1dt \\\"\\$progdir/\\$program\\\" \\\"\\$progdir/../\\$program\\\" 2>/dev/null | $SED 1q\\`; \\\\\n       test \\\"X\\$file\\\" != \\\"X\\$progdir/\\$program\\\"; }; then\n\n    file=\\\"\\$\\$-\\$program\\\"\n\n    if test ! 
-d \\\"\\$progdir\\\"; then\n      $MKDIR \\\"\\$progdir\\\"\n    else\n      $RM \\\"\\$progdir/\\$file\\\"\n    fi\"\n\n\t  $ECHO \"\\\n\n    # relink executable if necessary\n    if test -n \\\"\\$relink_command\\\"; then\n      if relink_command_output=\\`eval \\$relink_command 2>&1\\`; then :\n      else\n\t\\$ECHO \\\"\\$relink_command_output\\\" >&2\n\t$RM \\\"\\$progdir/\\$file\\\"\n\texit 1\n      fi\n    fi\n\n    $MV \\\"\\$progdir/\\$file\\\" \\\"\\$progdir/\\$program\\\" 2>/dev/null ||\n    { $RM \\\"\\$progdir/\\$program\\\";\n      $MV \\\"\\$progdir/\\$file\\\" \\\"\\$progdir/\\$program\\\"; }\n    $RM \\\"\\$progdir/\\$file\\\"\n  fi\"\n\telse\n\t  $ECHO \"\\\n  program='$outputname'\n  progdir=\\\"\\$thisdir/$objdir\\\"\n\"\n\tfi\n\n\t$ECHO \"\\\n\n  if test -f \\\"\\$progdir/\\$program\\\"; then\"\n\n\t# fixup the dll searchpath if we need to.\n\t#\n\t# Fix the DLL searchpath if we need to.  Do this before prepending\n\t# to shlibpath, because on Windows, both are PATH and uninstalled\n\t# libraries must come first.\n\tif test -n \"$dllsearchpath\"; then\n\t  $ECHO \"\\\n    # Add the dll search path components to the executable PATH\n    PATH=$dllsearchpath:\\$PATH\n\"\n\tfi\n\n\t# Export our shlibpath_var if we have one.\n\tif test yes = \"$shlibpath_overrides_runpath\" && test -n \"$shlibpath_var\" && test -n \"$temp_rpath\"; then\n\t  $ECHO \"\\\n    # Add our own library path to $shlibpath_var\n    $shlibpath_var=\\\"$temp_rpath\\$$shlibpath_var\\\"\n\n    # Some systems cannot cope with colon-terminated $shlibpath_var\n    # The second colon is a workaround for a bug in BeOS R4 sed\n    $shlibpath_var=\\`\\$ECHO \\\"\\$$shlibpath_var\\\" | $SED 's/::*\\$//'\\`\n\n    export $shlibpath_var\n\"\n\tfi\n\n\t$ECHO \"\\\n    if test \\\"\\$libtool_execute_magic\\\" != \\\"$magic\\\"; then\n      # Run the actual program with our arguments.\n      func_exec_program \\${1+\\\"\\$@\\\"}\n    fi\n  else\n    # The program doesn't exist.\n    \\$ECHO 
\\\"\\$0: error: '\\$progdir/\\$program' does not exist\\\" 1>&2\n    \\$ECHO \\\"This script is just a wrapper for \\$program.\\\" 1>&2\n    \\$ECHO \\\"See the $PACKAGE documentation for more information.\\\" 1>&2\n    exit 1\n  fi\nfi\\\n\"\n}\n\n\n# func_emit_cwrapperexe_src\n# emit the source code for a wrapper executable on stdout\n# Must ONLY be called from within func_mode_link because\n# it depends on a number of variable set therein.\nfunc_emit_cwrapperexe_src ()\n{\n\tcat <<EOF\n\n/* $cwrappersource - temporary wrapper executable for $objdir/$outputname\n   Generated by $PROGRAM (GNU $PACKAGE) $VERSION\n\n   The $output program cannot be directly executed until all the libtool\n   libraries that it depends on are installed.\n\n   This wrapper executable should never be moved out of the build directory.\n   If it is, it will not operate correctly.\n*/\nEOF\n\t    cat <<\"EOF\"\n#ifdef _MSC_VER\n# define _CRT_SECURE_NO_DEPRECATE 1\n#endif\n#include <stdio.h>\n#include <stdlib.h>\n#ifdef _MSC_VER\n# include <direct.h>\n# include <process.h>\n# include <io.h>\n#else\n# include <unistd.h>\n# include <stdint.h>\n# ifdef __CYGWIN__\n#  include <io.h>\n# endif\n#endif\n#include <malloc.h>\n#include <stdarg.h>\n#include <assert.h>\n#include <string.h>\n#include <ctype.h>\n#include <errno.h>\n#include <fcntl.h>\n#include <sys/stat.h>\n\n#define STREQ(s1, s2) (strcmp ((s1), (s2)) == 0)\n\n/* declarations of non-ANSI functions */\n#if defined __MINGW32__\n# ifdef __STRICT_ANSI__\nint _putenv (const char *);\n# endif\n#elif defined __CYGWIN__\n# ifdef __STRICT_ANSI__\nchar *realpath (const char *, char *);\nint putenv (char *);\nint setenv (const char *, const char *, int);\n# endif\n/* #elif defined other_platform || defined ... 
*/\n#endif\n\n/* portability defines, excluding path handling macros */\n#if defined _MSC_VER\n# define setmode _setmode\n# define stat    _stat\n# define chmod   _chmod\n# define getcwd  _getcwd\n# define putenv  _putenv\n# define S_IXUSR _S_IEXEC\n#elif defined __MINGW32__\n# define setmode _setmode\n# define stat    _stat\n# define chmod   _chmod\n# define getcwd  _getcwd\n# define putenv  _putenv\n#elif defined __CYGWIN__\n# define HAVE_SETENV\n# define FOPEN_WB \"wb\"\n/* #elif defined other platforms ... */\n#endif\n\n#if defined PATH_MAX\n# define LT_PATHMAX PATH_MAX\n#elif defined MAXPATHLEN\n# define LT_PATHMAX MAXPATHLEN\n#else\n# define LT_PATHMAX 1024\n#endif\n\n#ifndef S_IXOTH\n# define S_IXOTH 0\n#endif\n#ifndef S_IXGRP\n# define S_IXGRP 0\n#endif\n\n/* path handling portability macros */\n#ifndef DIR_SEPARATOR\n# define DIR_SEPARATOR '/'\n# define PATH_SEPARATOR ':'\n#endif\n\n#if defined _WIN32 || defined __MSDOS__ || defined __DJGPP__ || \\\n  defined __OS2__\n# define HAVE_DOS_BASED_FILE_SYSTEM\n# define FOPEN_WB \"wb\"\n# ifndef DIR_SEPARATOR_2\n#  define DIR_SEPARATOR_2 '\\\\'\n# endif\n# ifndef PATH_SEPARATOR_2\n#  define PATH_SEPARATOR_2 ';'\n# endif\n#endif\n\n#ifndef DIR_SEPARATOR_2\n# define IS_DIR_SEPARATOR(ch) ((ch) == DIR_SEPARATOR)\n#else /* DIR_SEPARATOR_2 */\n# define IS_DIR_SEPARATOR(ch) \\\n\t(((ch) == DIR_SEPARATOR) || ((ch) == DIR_SEPARATOR_2))\n#endif /* DIR_SEPARATOR_2 */\n\n#ifndef PATH_SEPARATOR_2\n# define IS_PATH_SEPARATOR(ch) ((ch) == PATH_SEPARATOR)\n#else /* PATH_SEPARATOR_2 */\n# define IS_PATH_SEPARATOR(ch) ((ch) == PATH_SEPARATOR_2)\n#endif /* PATH_SEPARATOR_2 */\n\n#ifndef FOPEN_WB\n# define FOPEN_WB \"w\"\n#endif\n#ifndef _O_BINARY\n# define _O_BINARY 0\n#endif\n\n#define XMALLOC(type, num)      ((type *) xmalloc ((num) * sizeof(type)))\n#define XFREE(stale) do { \\\n  if (stale) { free (stale); stale = 0; } \\\n} while (0)\n\n#if defined LT_DEBUGWRAPPER\nstatic int lt_debug = 1;\n#else\nstatic int lt_debug = 
0;\n#endif\n\nconst char *program_name = \"libtool-wrapper\"; /* in case xstrdup fails */\n\nvoid *xmalloc (size_t num);\nchar *xstrdup (const char *string);\nconst char *base_name (const char *name);\nchar *find_executable (const char *wrapper);\nchar *chase_symlinks (const char *pathspec);\nint make_executable (const char *path);\nint check_executable (const char *path);\nchar *strendzap (char *str, const char *pat);\nvoid lt_debugprintf (const char *file, int line, const char *fmt, ...);\nvoid lt_fatal (const char *file, int line, const char *message, ...);\nstatic const char *nonnull (const char *s);\nstatic const char *nonempty (const char *s);\nvoid lt_setenv (const char *name, const char *value);\nchar *lt_extend_str (const char *orig_value, const char *add, int to_end);\nvoid lt_update_exe_path (const char *name, const char *value);\nvoid lt_update_lib_path (const char *name, const char *value);\nchar **prepare_spawn (char **argv);\nvoid lt_dump_script (FILE *f);\nEOF\n\n\t    cat <<EOF\n#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 5)\n# define externally_visible volatile\n#else\n# define externally_visible __attribute__((externally_visible)) volatile\n#endif\nexternally_visible const char * MAGIC_EXE = \"$magic_exe\";\nconst char * LIB_PATH_VARNAME = \"$shlibpath_var\";\nEOF\n\n\t    if test yes = \"$shlibpath_overrides_runpath\" && test -n \"$shlibpath_var\" && test -n \"$temp_rpath\"; then\n              func_to_host_path \"$temp_rpath\"\n\t      cat <<EOF\nconst char * LIB_PATH_VALUE   = \"$func_to_host_path_result\";\nEOF\n\t    else\n\t      cat <<\"EOF\"\nconst char * LIB_PATH_VALUE   = \"\";\nEOF\n\t    fi\n\n\t    if test -n \"$dllsearchpath\"; then\n              func_to_host_path \"$dllsearchpath:\"\n\t      cat <<EOF\nconst char * EXE_PATH_VARNAME = \"PATH\";\nconst char * EXE_PATH_VALUE   = \"$func_to_host_path_result\";\nEOF\n\t    else\n\t      cat <<\"EOF\"\nconst char * EXE_PATH_VARNAME = \"\";\nconst char * EXE_PATH_VALUE   = 
\"\";\nEOF\n\t    fi\n\n\t    if test yes = \"$fast_install\"; then\n\t      cat <<EOF\nconst char * TARGET_PROGRAM_NAME = \"lt-$outputname\"; /* hopefully, no .exe */\nEOF\n\t    else\n\t      cat <<EOF\nconst char * TARGET_PROGRAM_NAME = \"$outputname\"; /* hopefully, no .exe */\nEOF\n\t    fi\n\n\n\t    cat <<\"EOF\"\n\n#define LTWRAPPER_OPTION_PREFIX         \"--lt-\"\n\nstatic const char *ltwrapper_option_prefix = LTWRAPPER_OPTION_PREFIX;\nstatic const char *dumpscript_opt       = LTWRAPPER_OPTION_PREFIX \"dump-script\";\nstatic const char *debug_opt            = LTWRAPPER_OPTION_PREFIX \"debug\";\n\nint\nmain (int argc, char *argv[])\n{\n  char **newargz;\n  int  newargc;\n  char *tmp_pathspec;\n  char *actual_cwrapper_path;\n  char *actual_cwrapper_name;\n  char *target_name;\n  char *lt_argv_zero;\n  int rval = 127;\n\n  int i;\n\n  program_name = (char *) xstrdup (base_name (argv[0]));\n  newargz = XMALLOC (char *, (size_t) argc + 1);\n\n  /* very simple arg parsing; don't want to rely on getopt\n   * also, copy all non cwrapper options to newargz, except\n   * argz[0], which is handled differently\n   */\n  newargc=0;\n  for (i = 1; i < argc; i++)\n    {\n      if (STREQ (argv[i], dumpscript_opt))\n\t{\nEOF\n\t    case $host in\n\t      *mingw* | *cygwin* )\n\t\t# make stdout use \"unix\" line endings\n\t\techo \"          setmode(1,_O_BINARY);\"\n\t\t;;\n\t      esac\n\n\t    cat <<\"EOF\"\n\t  lt_dump_script (stdout);\n\t  return 0;\n\t}\n      if (STREQ (argv[i], debug_opt))\n\t{\n          lt_debug = 1;\n          continue;\n\t}\n      if (STREQ (argv[i], ltwrapper_option_prefix))\n        {\n          /* however, if there is an option in the LTWRAPPER_OPTION_PREFIX\n             namespace, but it is not one of the ones we know about and\n             have already dealt with, above (inluding dump-script), then\n             report an error. 
Otherwise, targets might begin to believe\n             they are allowed to use options in the LTWRAPPER_OPTION_PREFIX\n             namespace. The first time any user complains about this, we'll\n             need to make LTWRAPPER_OPTION_PREFIX a configure-time option\n             or a configure.ac-settable value.\n           */\n          lt_fatal (__FILE__, __LINE__,\n\t\t    \"unrecognized %s option: '%s'\",\n                    ltwrapper_option_prefix, argv[i]);\n        }\n      /* otherwise ... */\n      newargz[++newargc] = xstrdup (argv[i]);\n    }\n  newargz[++newargc] = NULL;\n\nEOF\n\t    cat <<EOF\n  /* The GNU banner must be the first non-error debug message */\n  lt_debugprintf (__FILE__, __LINE__, \"libtool wrapper (GNU $PACKAGE) $VERSION\\n\");\nEOF\n\t    cat <<\"EOF\"\n  lt_debugprintf (__FILE__, __LINE__, \"(main) argv[0]: %s\\n\", argv[0]);\n  lt_debugprintf (__FILE__, __LINE__, \"(main) program_name: %s\\n\", program_name);\n\n  tmp_pathspec = find_executable (argv[0]);\n  if (tmp_pathspec == NULL)\n    lt_fatal (__FILE__, __LINE__, \"couldn't find %s\", argv[0]);\n  lt_debugprintf (__FILE__, __LINE__,\n                  \"(main) found exe (before symlink chase) at: %s\\n\",\n\t\t  tmp_pathspec);\n\n  actual_cwrapper_path = chase_symlinks (tmp_pathspec);\n  lt_debugprintf (__FILE__, __LINE__,\n                  \"(main) found exe (after symlink chase) at: %s\\n\",\n\t\t  actual_cwrapper_path);\n  XFREE (tmp_pathspec);\n\n  actual_cwrapper_name = xstrdup (base_name (actual_cwrapper_path));\n  strendzap (actual_cwrapper_path, actual_cwrapper_name);\n\n  /* wrapper name transforms */\n  strendzap (actual_cwrapper_name, \".exe\");\n  tmp_pathspec = lt_extend_str (actual_cwrapper_name, \".exe\", 1);\n  XFREE (actual_cwrapper_name);\n  actual_cwrapper_name = tmp_pathspec;\n  tmp_pathspec = 0;\n\n  /* target_name transforms -- use actual target program name; might have lt- prefix */\n  target_name = xstrdup (base_name (TARGET_PROGRAM_NAME));\n  
strendzap (target_name, \".exe\");\n  tmp_pathspec = lt_extend_str (target_name, \".exe\", 1);\n  XFREE (target_name);\n  target_name = tmp_pathspec;\n  tmp_pathspec = 0;\n\n  lt_debugprintf (__FILE__, __LINE__,\n\t\t  \"(main) libtool target name: %s\\n\",\n\t\t  target_name);\nEOF\n\n\t    cat <<EOF\n  newargz[0] =\n    XMALLOC (char, (strlen (actual_cwrapper_path) +\n\t\t    strlen (\"$objdir\") + 1 + strlen (actual_cwrapper_name) + 1));\n  strcpy (newargz[0], actual_cwrapper_path);\n  strcat (newargz[0], \"$objdir\");\n  strcat (newargz[0], \"/\");\nEOF\n\n\t    cat <<\"EOF\"\n  /* stop here, and copy so we don't have to do this twice */\n  tmp_pathspec = xstrdup (newargz[0]);\n\n  /* do NOT want the lt- prefix here, so use actual_cwrapper_name */\n  strcat (newargz[0], actual_cwrapper_name);\n\n  /* DO want the lt- prefix here if it exists, so use target_name */\n  lt_argv_zero = lt_extend_str (tmp_pathspec, target_name, 1);\n  XFREE (tmp_pathspec);\n  tmp_pathspec = NULL;\nEOF\n\n\t    case $host_os in\n\t      mingw*)\n\t    cat <<\"EOF\"\n  {\n    char* p;\n    while ((p = strchr (newargz[0], '\\\\')) != NULL)\n      {\n\t*p = '/';\n      }\n    while ((p = strchr (lt_argv_zero, '\\\\')) != NULL)\n      {\n\t*p = '/';\n      }\n  }\nEOF\n\t    ;;\n\t    esac\n\n\t    cat <<\"EOF\"\n  XFREE (target_name);\n  XFREE (actual_cwrapper_path);\n  XFREE (actual_cwrapper_name);\n\n  lt_setenv (\"BIN_SH\", \"xpg4\"); /* for Tru64 */\n  lt_setenv (\"DUALCASE\", \"1\");  /* for MSK sh */\n  /* Update the DLL searchpath.  EXE_PATH_VALUE ($dllsearchpath) must\n     be prepended before (that is, appear after) LIB_PATH_VALUE ($temp_rpath)\n     because on Windows, both *_VARNAMEs are PATH but uninstalled\n     libraries must come first. 
*/\n  lt_update_exe_path (EXE_PATH_VARNAME, EXE_PATH_VALUE);\n  lt_update_lib_path (LIB_PATH_VARNAME, LIB_PATH_VALUE);\n\n  lt_debugprintf (__FILE__, __LINE__, \"(main) lt_argv_zero: %s\\n\",\n\t\t  nonnull (lt_argv_zero));\n  for (i = 0; i < newargc; i++)\n    {\n      lt_debugprintf (__FILE__, __LINE__, \"(main) newargz[%d]: %s\\n\",\n\t\t      i, nonnull (newargz[i]));\n    }\n\nEOF\n\n\t    case $host_os in\n\t      mingw*)\n\t\tcat <<\"EOF\"\n  /* execv doesn't actually work on mingw as expected on unix */\n  newargz = prepare_spawn (newargz);\n  rval = (int) _spawnv (_P_WAIT, lt_argv_zero, (const char * const *) newargz);\n  if (rval == -1)\n    {\n      /* failed to start process */\n      lt_debugprintf (__FILE__, __LINE__,\n\t\t      \"(main) failed to launch target \\\"%s\\\": %s\\n\",\n\t\t      lt_argv_zero, nonnull (strerror (errno)));\n      return 127;\n    }\n  return rval;\nEOF\n\t\t;;\n\t      *)\n\t\tcat <<\"EOF\"\n  execv (lt_argv_zero, newargz);\n  return rval; /* =127, but avoids unused variable warning */\nEOF\n\t\t;;\n\t    esac\n\n\t    cat <<\"EOF\"\n}\n\nvoid *\nxmalloc (size_t num)\n{\n  void *p = (void *) malloc (num);\n  if (!p)\n    lt_fatal (__FILE__, __LINE__, \"memory exhausted\");\n\n  return p;\n}\n\nchar *\nxstrdup (const char *string)\n{\n  return string ? strcpy ((char *) xmalloc (strlen (string) + 1),\n\t\t\t  string) : NULL;\n}\n\nconst char *\nbase_name (const char *name)\n{\n  const char *base;\n\n#if defined HAVE_DOS_BASED_FILE_SYSTEM\n  /* Skip over the disk name in MSDOS pathnames. 
*/\n  if (isalpha ((unsigned char) name[0]) && name[1] == ':')\n    name += 2;\n#endif\n\n  for (base = name; *name; name++)\n    if (IS_DIR_SEPARATOR (*name))\n      base = name + 1;\n  return base;\n}\n\nint\ncheck_executable (const char *path)\n{\n  struct stat st;\n\n  lt_debugprintf (__FILE__, __LINE__, \"(check_executable): %s\\n\",\n                  nonempty (path));\n  if ((!path) || (!*path))\n    return 0;\n\n  if ((stat (path, &st) >= 0)\n      && (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)))\n    return 1;\n  else\n    return 0;\n}\n\nint\nmake_executable (const char *path)\n{\n  int rval = 0;\n  struct stat st;\n\n  lt_debugprintf (__FILE__, __LINE__, \"(make_executable): %s\\n\",\n                  nonempty (path));\n  if ((!path) || (!*path))\n    return 0;\n\n  if (stat (path, &st) >= 0)\n    {\n      rval = chmod (path, st.st_mode | S_IXOTH | S_IXGRP | S_IXUSR);\n    }\n  return rval;\n}\n\n/* Searches for the full path of the wrapper.  Returns\n   newly allocated full path name if found, NULL otherwise\n   Does not chase symlinks, even on platforms that support them.\n*/\nchar *\nfind_executable (const char *wrapper)\n{\n  int has_slash = 0;\n  const char *p;\n  const char *p_next;\n  /* static buffer for getcwd */\n  char tmp[LT_PATHMAX + 1];\n  size_t tmp_len;\n  char *concat_name;\n\n  lt_debugprintf (__FILE__, __LINE__, \"(find_executable): %s\\n\",\n                  nonempty (wrapper));\n\n  if ((wrapper == NULL) || (*wrapper == '\\0'))\n    return NULL;\n\n  /* Absolute path? 
*/\n#if defined HAVE_DOS_BASED_FILE_SYSTEM\n  if (isalpha ((unsigned char) wrapper[0]) && wrapper[1] == ':')\n    {\n      concat_name = xstrdup (wrapper);\n      if (check_executable (concat_name))\n\treturn concat_name;\n      XFREE (concat_name);\n    }\n  else\n    {\n#endif\n      if (IS_DIR_SEPARATOR (wrapper[0]))\n\t{\n\t  concat_name = xstrdup (wrapper);\n\t  if (check_executable (concat_name))\n\t    return concat_name;\n\t  XFREE (concat_name);\n\t}\n#if defined HAVE_DOS_BASED_FILE_SYSTEM\n    }\n#endif\n\n  for (p = wrapper; *p; p++)\n    if (*p == '/')\n      {\n\thas_slash = 1;\n\tbreak;\n      }\n  if (!has_slash)\n    {\n      /* no slashes; search PATH */\n      const char *path = getenv (\"PATH\");\n      if (path != NULL)\n\t{\n\t  for (p = path; *p; p = p_next)\n\t    {\n\t      const char *q;\n\t      size_t p_len;\n\t      for (q = p; *q; q++)\n\t\tif (IS_PATH_SEPARATOR (*q))\n\t\t  break;\n\t      p_len = (size_t) (q - p);\n\t      p_next = (*q == '\\0' ? q : q + 1);\n\t      if (p_len == 0)\n\t\t{\n\t\t  /* empty path: current directory */\n\t\t  if (getcwd (tmp, LT_PATHMAX) == NULL)\n\t\t    lt_fatal (__FILE__, __LINE__, \"getcwd failed: %s\",\n                              nonnull (strerror (errno)));\n\t\t  tmp_len = strlen (tmp);\n\t\t  concat_name =\n\t\t    XMALLOC (char, tmp_len + 1 + strlen (wrapper) + 1);\n\t\t  memcpy (concat_name, tmp, tmp_len);\n\t\t  concat_name[tmp_len] = '/';\n\t\t  strcpy (concat_name + tmp_len + 1, wrapper);\n\t\t}\n\t      else\n\t\t{\n\t\t  concat_name =\n\t\t    XMALLOC (char, p_len + 1 + strlen (wrapper) + 1);\n\t\t  memcpy (concat_name, p, p_len);\n\t\t  concat_name[p_len] = '/';\n\t\t  strcpy (concat_name + p_len + 1, wrapper);\n\t\t}\n\t      if (check_executable (concat_name))\n\t\treturn concat_name;\n\t      XFREE (concat_name);\n\t    }\n\t}\n      /* not found in PATH; assume curdir */\n    }\n  /* Relative path | not found in path: prepend cwd */\n  if (getcwd (tmp, LT_PATHMAX) == NULL)\n    
lt_fatal (__FILE__, __LINE__, \"getcwd failed: %s\",\n              nonnull (strerror (errno)));\n  tmp_len = strlen (tmp);\n  concat_name = XMALLOC (char, tmp_len + 1 + strlen (wrapper) + 1);\n  memcpy (concat_name, tmp, tmp_len);\n  concat_name[tmp_len] = '/';\n  strcpy (concat_name + tmp_len + 1, wrapper);\n\n  if (check_executable (concat_name))\n    return concat_name;\n  XFREE (concat_name);\n  return NULL;\n}\n\nchar *\nchase_symlinks (const char *pathspec)\n{\n#ifndef S_ISLNK\n  return xstrdup (pathspec);\n#else\n  char buf[LT_PATHMAX];\n  struct stat s;\n  char *tmp_pathspec = xstrdup (pathspec);\n  char *p;\n  int has_symlinks = 0;\n  while (strlen (tmp_pathspec) && !has_symlinks)\n    {\n      lt_debugprintf (__FILE__, __LINE__,\n\t\t      \"checking path component for symlinks: %s\\n\",\n\t\t      tmp_pathspec);\n      if (lstat (tmp_pathspec, &s) == 0)\n\t{\n\t  if (S_ISLNK (s.st_mode) != 0)\n\t    {\n\t      has_symlinks = 1;\n\t      break;\n\t    }\n\n\t  /* search backwards for last DIR_SEPARATOR */\n\t  p = tmp_pathspec + strlen (tmp_pathspec) - 1;\n\t  while ((p > tmp_pathspec) && (!IS_DIR_SEPARATOR (*p)))\n\t    p--;\n\t  if ((p == tmp_pathspec) && (!IS_DIR_SEPARATOR (*p)))\n\t    {\n\t      /* no more DIR_SEPARATORS left */\n\t      break;\n\t    }\n\t  *p = '\\0';\n\t}\n      else\n\t{\n\t  lt_fatal (__FILE__, __LINE__,\n\t\t    \"error accessing file \\\"%s\\\": %s\",\n\t\t    tmp_pathspec, nonnull (strerror (errno)));\n\t}\n    }\n  XFREE (tmp_pathspec);\n\n  if (!has_symlinks)\n    {\n      return xstrdup (pathspec);\n    }\n\n  tmp_pathspec = realpath (pathspec, buf);\n  if (tmp_pathspec == 0)\n    {\n      lt_fatal (__FILE__, __LINE__,\n\t\t\"could not follow symlinks for %s\", pathspec);\n    }\n  return xstrdup (tmp_pathspec);\n#endif\n}\n\nchar *\nstrendzap (char *str, const char *pat)\n{\n  size_t len, patlen;\n\n  assert (str != NULL);\n  assert (pat != NULL);\n\n  len = strlen (str);\n  patlen = strlen (pat);\n\n  if (patlen <= 
len)\n    {\n      str += len - patlen;\n      if (STREQ (str, pat))\n\t*str = '\\0';\n    }\n  return str;\n}\n\nvoid\nlt_debugprintf (const char *file, int line, const char *fmt, ...)\n{\n  va_list args;\n  if (lt_debug)\n    {\n      (void) fprintf (stderr, \"%s:%s:%d: \", program_name, file, line);\n      va_start (args, fmt);\n      (void) vfprintf (stderr, fmt, args);\n      va_end (args);\n    }\n}\n\nstatic void\nlt_error_core (int exit_status, const char *file,\n\t       int line, const char *mode,\n\t       const char *message, va_list ap)\n{\n  fprintf (stderr, \"%s:%s:%d: %s: \", program_name, file, line, mode);\n  vfprintf (stderr, message, ap);\n  fprintf (stderr, \".\\n\");\n\n  if (exit_status >= 0)\n    exit (exit_status);\n}\n\nvoid\nlt_fatal (const char *file, int line, const char *message, ...)\n{\n  va_list ap;\n  va_start (ap, message);\n  lt_error_core (EXIT_FAILURE, file, line, \"FATAL\", message, ap);\n  va_end (ap);\n}\n\nstatic const char *\nnonnull (const char *s)\n{\n  return s ? s : \"(null)\";\n}\n\nstatic const char *\nnonempty (const char *s)\n{\n  return (s && !*s) ? 
\"(empty)\" : nonnull (s);\n}\n\nvoid\nlt_setenv (const char *name, const char *value)\n{\n  lt_debugprintf (__FILE__, __LINE__,\n\t\t  \"(lt_setenv) setting '%s' to '%s'\\n\",\n                  nonnull (name), nonnull (value));\n  {\n#ifdef HAVE_SETENV\n    /* always make a copy, for consistency with !HAVE_SETENV */\n    char *str = xstrdup (value);\n    setenv (name, str, 1);\n#else\n    size_t len = strlen (name) + 1 + strlen (value) + 1;\n    char *str = XMALLOC (char, len);\n    sprintf (str, \"%s=%s\", name, value);\n    if (putenv (str) != EXIT_SUCCESS)\n      {\n        XFREE (str);\n      }\n#endif\n  }\n}\n\nchar *\nlt_extend_str (const char *orig_value, const char *add, int to_end)\n{\n  char *new_value;\n  if (orig_value && *orig_value)\n    {\n      size_t orig_value_len = strlen (orig_value);\n      size_t add_len = strlen (add);\n      new_value = XMALLOC (char, add_len + orig_value_len + 1);\n      if (to_end)\n        {\n          strcpy (new_value, orig_value);\n          strcpy (new_value + orig_value_len, add);\n        }\n      else\n        {\n          strcpy (new_value, add);\n          strcpy (new_value + add_len, orig_value);\n        }\n    }\n  else\n    {\n      new_value = xstrdup (add);\n    }\n  return new_value;\n}\n\nvoid\nlt_update_exe_path (const char *name, const char *value)\n{\n  lt_debugprintf (__FILE__, __LINE__,\n\t\t  \"(lt_update_exe_path) modifying '%s' by prepending '%s'\\n\",\n                  nonnull (name), nonnull (value));\n\n  if (name && *name && value && *value)\n    {\n      char *new_value = lt_extend_str (getenv (name), value, 0);\n      /* some systems can't cope with a ':'-terminated path #' */\n      size_t len = strlen (new_value);\n      while ((len > 0) && IS_PATH_SEPARATOR (new_value[len-1]))\n        {\n          new_value[--len] = '\\0';\n        }\n      lt_setenv (name, new_value);\n      XFREE (new_value);\n    }\n}\n\nvoid\nlt_update_lib_path (const char *name, const char *value)\n{\n  
lt_debugprintf (__FILE__, __LINE__,\n\t\t  \"(lt_update_lib_path) modifying '%s' by prepending '%s'\\n\",\n                  nonnull (name), nonnull (value));\n\n  if (name && *name && value && *value)\n    {\n      char *new_value = lt_extend_str (getenv (name), value, 0);\n      lt_setenv (name, new_value);\n      XFREE (new_value);\n    }\n}\n\nEOF\n\t    case $host_os in\n\t      mingw*)\n\t\tcat <<\"EOF\"\n\n/* Prepares an argument vector before calling spawn().\n   Note that spawn() does not by itself call the command interpreter\n     (getenv (\"COMSPEC\") != NULL ? getenv (\"COMSPEC\") :\n      ({ OSVERSIONINFO v; v.dwOSVersionInfoSize = sizeof(OSVERSIONINFO);\n         GetVersionEx(&v);\n         v.dwPlatformId == VER_PLATFORM_WIN32_NT;\n      }) ? \"cmd.exe\" : \"command.com\").\n   Instead it simply concatenates the arguments, separated by ' ', and calls\n   CreateProcess().  We must quote the arguments since Win32 CreateProcess()\n   interprets characters like ' ', '\\t', '\\\\', '\"' (but not '<' and '>') in a\n   special way:\n   - Space and tab are interpreted as delimiters. They are not treated as\n     delimiters if they are surrounded by double quotes: \"...\".\n   - Unescaped double quotes are removed from the input. 
Their only effect is\n     that within double quotes, space and tab are treated like normal\n     characters.\n   - Backslashes not followed by double quotes are not special.\n   - But 2*n+1 backslashes followed by a double quote become\n     n backslashes followed by a double quote (n >= 0):\n       \\\" -> \"\n       \\\\\\\" -> \\\"\n       \\\\\\\\\\\" -> \\\\\"\n */\n#define SHELL_SPECIAL_CHARS \"\\\"\\\\ \\001\\002\\003\\004\\005\\006\\007\\010\\011\\012\\013\\014\\015\\016\\017\\020\\021\\022\\023\\024\\025\\026\\027\\030\\031\\032\\033\\034\\035\\036\\037\"\n#define SHELL_SPACE_CHARS \" \\001\\002\\003\\004\\005\\006\\007\\010\\011\\012\\013\\014\\015\\016\\017\\020\\021\\022\\023\\024\\025\\026\\027\\030\\031\\032\\033\\034\\035\\036\\037\"\nchar **\nprepare_spawn (char **argv)\n{\n  size_t argc;\n  char **new_argv;\n  size_t i;\n\n  /* Count number of arguments.  */\n  for (argc = 0; argv[argc] != NULL; argc++)\n    ;\n\n  /* Allocate new argument vector.  */\n  new_argv = XMALLOC (char *, argc + 1);\n\n  /* Put quoted arguments into the new argument vector.  
*/\n  for (i = 0; i < argc; i++)\n    {\n      const char *string = argv[i];\n\n      if (string[0] == '\\0')\n\tnew_argv[i] = xstrdup (\"\\\"\\\"\");\n      else if (strpbrk (string, SHELL_SPECIAL_CHARS) != NULL)\n\t{\n\t  int quote_around = (strpbrk (string, SHELL_SPACE_CHARS) != NULL);\n\t  size_t length;\n\t  unsigned int backslashes;\n\t  const char *s;\n\t  char *quoted_string;\n\t  char *p;\n\n\t  length = 0;\n\t  backslashes = 0;\n\t  if (quote_around)\n\t    length++;\n\t  for (s = string; *s != '\\0'; s++)\n\t    {\n\t      char c = *s;\n\t      if (c == '\"')\n\t\tlength += backslashes + 1;\n\t      length++;\n\t      if (c == '\\\\')\n\t\tbackslashes++;\n\t      else\n\t\tbackslashes = 0;\n\t    }\n\t  if (quote_around)\n\t    length += backslashes + 1;\n\n\t  quoted_string = XMALLOC (char, length + 1);\n\n\t  p = quoted_string;\n\t  backslashes = 0;\n\t  if (quote_around)\n\t    *p++ = '\"';\n\t  for (s = string; *s != '\\0'; s++)\n\t    {\n\t      char c = *s;\n\t      if (c == '\"')\n\t\t{\n\t\t  unsigned int j;\n\t\t  for (j = backslashes + 1; j > 0; j--)\n\t\t    *p++ = '\\\\';\n\t\t}\n\t      *p++ = c;\n\t      if (c == '\\\\')\n\t\tbackslashes++;\n\t      else\n\t\tbackslashes = 0;\n\t    }\n\t  if (quote_around)\n\t    {\n\t      unsigned int j;\n\t      for (j = backslashes; j > 0; j--)\n\t\t*p++ = '\\\\';\n\t      *p++ = '\"';\n\t    }\n\t  *p = '\\0';\n\n\t  new_argv[i] = quoted_string;\n\t}\n      else\n\tnew_argv[i] = (char *) string;\n    }\n  new_argv[argc] = NULL;\n\n  return new_argv;\n}\nEOF\n\t\t;;\n\t    esac\n\n            cat <<\"EOF\"\nvoid lt_dump_script (FILE* f)\n{\nEOF\n\t    func_emit_wrapper yes |\n\t      $SED -n -e '\ns/^\\(.\\{79\\}\\)\\(..*\\)/\\1\\\n\\2/\nh\ns/\\([\\\\\"]\\)/\\\\\\1/g\ns/$/\\\\n/\ns/\\([^\\n]*\\).*/  fputs (\"\\1\", f);/p\ng\nD'\n            cat <<\"EOF\"\n}\nEOF\n}\n# end: func_emit_cwrapperexe_src\n\n# func_win32_import_lib_p ARG\n# True if ARG is an import lib, as indicated by 
$file_magic_cmd\nfunc_win32_import_lib_p ()\n{\n    $debug_cmd\n\n    case `eval $file_magic_cmd \\\"\\$1\\\" 2>/dev/null | $SED -e 10q` in\n    *import*) : ;;\n    *) false ;;\n    esac\n}\n\n# func_suncc_cstd_abi\n# !!ONLY CALL THIS FOR SUN CC AFTER $compile_command IS FULLY EXPANDED!!\n# Several compiler flags select an ABI that is incompatible with the\n# Cstd library. Avoid specifying it if any are in CXXFLAGS.\nfunc_suncc_cstd_abi ()\n{\n    $debug_cmd\n\n    case \" $compile_command \" in\n    *\" -compat=g \"*|*\\ -std=c++[0-9][0-9]\\ *|*\" -library=stdcxx4 \"*|*\" -library=stlport4 \"*)\n      suncc_use_cstd_abi=no\n      ;;\n    *)\n      suncc_use_cstd_abi=yes\n      ;;\n    esac\n}\n\n# func_mode_link arg...\nfunc_mode_link ()\n{\n    $debug_cmd\n\n    case $host in\n    *-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2* | *-cegcc*)\n      # It is impossible to link a dll without this setting, and\n      # we shouldn't force the makefile maintainer to figure out\n      # what system we are compiling for in order to pass an extra\n      # flag for every libtool invocation.\n      # allow_undefined=no\n\n      # FIXME: Unfortunately, there are problems with the above when trying\n      # to make a dll that has undefined symbols, in which case not\n      # even a static library is built.  
For now, we need to specify\n      # -no-undefined on the libtool link line when we can be certain\n      # that all symbols are satisfied, otherwise we get a static library.\n      allow_undefined=yes\n      ;;\n    *)\n      allow_undefined=yes\n      ;;\n    esac\n    libtool_args=$nonopt\n    base_compile=\"$nonopt $@\"\n    compile_command=$nonopt\n    finalize_command=$nonopt\n\n    compile_rpath=\n    finalize_rpath=\n    compile_shlibpath=\n    finalize_shlibpath=\n    convenience=\n    old_convenience=\n    deplibs=\n    old_deplibs=\n    compiler_flags=\n    linker_flags=\n    dllsearchpath=\n    lib_search_path=`pwd`\n    inst_prefix_dir=\n    new_inherited_linker_flags=\n\n    avoid_version=no\n    bindir=\n    dlfiles=\n    dlprefiles=\n    dlself=no\n    export_dynamic=no\n    export_symbols=\n    export_symbols_regex=\n    generated=\n    libobjs=\n    ltlibs=\n    module=no\n    no_install=no\n    objs=\n    os2dllname=\n    non_pic_objects=\n    precious_files_regex=\n    prefer_static_libs=no\n    preload=false\n    prev=\n    prevarg=\n    release=\n    rpath=\n    xrpath=\n    perm_rpath=\n    temp_rpath=\n    thread_safe=no\n    vinfo=\n    vinfo_number=no\n    weak_libs=\n    single_module=$wl-single_module\n    func_infer_tag $base_compile\n\n    # We need to know -static, to get the right output filenames.\n    for arg\n    do\n      case $arg in\n      -shared)\n\ttest yes != \"$build_libtool_libs\" \\\n\t  && func_fatal_configuration \"cannot build a shared library\"\n\tbuild_old_libs=no\n\tbreak\n\t;;\n      -all-static | -static | -static-libtool-libs)\n\tcase $arg in\n\t-all-static)\n\t  if test yes = \"$build_libtool_libs\" && test -z \"$link_static_flag\"; then\n\t    func_warning \"complete static linking is impossible in this configuration\"\n\t  fi\n\t  if test -n \"$link_static_flag\"; then\n\t    dlopen_self=$dlopen_self_static\n\t  fi\n\t  prefer_static_libs=yes\n\t  ;;\n\t-static)\n\t  if test -z \"$pic_flag\" && test -n 
\"$link_static_flag\"; then\n\t    dlopen_self=$dlopen_self_static\n\t  fi\n\t  prefer_static_libs=built\n\t  ;;\n\t-static-libtool-libs)\n\t  if test -z \"$pic_flag\" && test -n \"$link_static_flag\"; then\n\t    dlopen_self=$dlopen_self_static\n\t  fi\n\t  prefer_static_libs=yes\n\t  ;;\n\tesac\n\tbuild_libtool_libs=no\n\tbuild_old_libs=yes\n\tbreak\n\t;;\n      esac\n    done\n\n    # See if our shared archives depend on static archives.\n    test -n \"$old_archive_from_new_cmds\" && build_old_libs=yes\n\n    # Go through the arguments, transforming them on the way.\n    while test \"$#\" -gt 0; do\n      arg=$1\n      shift\n      func_quote_for_eval \"$arg\"\n      qarg=$func_quote_for_eval_unquoted_result\n      func_append libtool_args \" $func_quote_for_eval_result\"\n\n      # If the previous option needs an argument, assign it.\n      if test -n \"$prev\"; then\n\tcase $prev in\n\toutput)\n\t  func_append compile_command \" @OUTPUT@\"\n\t  func_append finalize_command \" @OUTPUT@\"\n\t  ;;\n\tesac\n\n\tcase $prev in\n\tbindir)\n\t  bindir=$arg\n\t  prev=\n\t  continue\n\t  ;;\n\tdlfiles|dlprefiles)\n\t  $preload || {\n\t    # Add the symbol object into the linking commands.\n\t    func_append compile_command \" @SYMFILE@\"\n\t    func_append finalize_command \" @SYMFILE@\"\n\t    preload=:\n\t  }\n\t  case $arg in\n\t  *.la | *.lo) ;;  # We handle these cases below.\n\t  force)\n\t    if test no = \"$dlself\"; then\n\t      dlself=needless\n\t      export_dynamic=yes\n\t    fi\n\t    prev=\n\t    continue\n\t    ;;\n\t  self)\n\t    if test dlprefiles = \"$prev\"; then\n\t      dlself=yes\n\t    elif test dlfiles = \"$prev\" && test yes != \"$dlopen_self\"; then\n\t      dlself=yes\n\t    else\n\t      dlself=needless\n\t      export_dynamic=yes\n\t    fi\n\t    prev=\n\t    continue\n\t    ;;\n\t  *)\n\t    if test dlfiles = \"$prev\"; then\n\t      func_append dlfiles \" $arg\"\n\t    else\n\t      func_append dlprefiles \" $arg\"\n\t    fi\n\t    
prev=\n\t    continue\n\t    ;;\n\t  esac\n\t  ;;\n\texpsyms)\n\t  export_symbols=$arg\n\t  test -f \"$arg\" \\\n\t    || func_fatal_error \"symbol file '$arg' does not exist\"\n\t  prev=\n\t  continue\n\t  ;;\n\texpsyms_regex)\n\t  export_symbols_regex=$arg\n\t  prev=\n\t  continue\n\t  ;;\n\tframework)\n\t  case $host in\n\t    *-*-darwin*)\n\t      case \"$deplibs \" in\n\t\t*\" $qarg.ltframework \"*) ;;\n\t\t*) func_append deplibs \" $qarg.ltframework\" # this is fixed later\n\t\t   ;;\n\t      esac\n\t      ;;\n\t  esac\n\t  prev=\n\t  continue\n\t  ;;\n\tinst_prefix)\n\t  inst_prefix_dir=$arg\n\t  prev=\n\t  continue\n\t  ;;\n\tmllvm)\n\t  # Clang does not use LLVM to link, so we can simply discard any\n\t  # '-mllvm $arg' options when doing the link step.\n\t  prev=\n\t  continue\n\t  ;;\n\tobjectlist)\n\t  if test -f \"$arg\"; then\n\t    save_arg=$arg\n\t    moreargs=\n\t    for fil in `cat \"$save_arg\"`\n\t    do\n#\t      func_append moreargs \" $fil\"\n\t      arg=$fil\n\t      # A libtool-controlled object.\n\n\t      # Check to see that this really is a libtool object.\n\t      if func_lalib_unsafe_p \"$arg\"; then\n\t\tpic_object=\n\t\tnon_pic_object=\n\n\t\t# Read the .lo file\n\t\tfunc_source \"$arg\"\n\n\t\tif test -z \"$pic_object\" ||\n\t\t   test -z \"$non_pic_object\" ||\n\t\t   test none = \"$pic_object\" &&\n\t\t   test none = \"$non_pic_object\"; then\n\t\t  func_fatal_error \"cannot find name of object for '$arg'\"\n\t\tfi\n\n\t\t# Extract subdirectory from the argument.\n\t\tfunc_dirname \"$arg\" \"/\" \"\"\n\t\txdir=$func_dirname_result\n\n\t\tif test none != \"$pic_object\"; then\n\t\t  # Prepend the subdirectory the object is found in.\n\t\t  pic_object=$xdir$pic_object\n\n\t\t  if test dlfiles = \"$prev\"; then\n\t\t    if test yes = \"$build_libtool_libs\" && test yes = \"$dlopen_support\"; then\n\t\t      func_append dlfiles \" $pic_object\"\n\t\t      prev=\n\t\t      continue\n\t\t    else\n\t\t      # If libtool objects are 
unsupported, then we need to preload.\n\t\t      prev=dlprefiles\n\t\t    fi\n\t\t  fi\n\n\t\t  # CHECK ME:  I think I busted this.  -Ossama\n\t\t  if test dlprefiles = \"$prev\"; then\n\t\t    # Preload the old-style object.\n\t\t    func_append dlprefiles \" $pic_object\"\n\t\t    prev=\n\t\t  fi\n\n\t\t  # A PIC object.\n\t\t  func_append libobjs \" $pic_object\"\n\t\t  arg=$pic_object\n\t\tfi\n\n\t\t# Non-PIC object.\n\t\tif test none != \"$non_pic_object\"; then\n\t\t  # Prepend the subdirectory the object is found in.\n\t\t  non_pic_object=$xdir$non_pic_object\n\n\t\t  # A standard non-PIC object\n\t\t  func_append non_pic_objects \" $non_pic_object\"\n\t\t  if test -z \"$pic_object\" || test none = \"$pic_object\"; then\n\t\t    arg=$non_pic_object\n\t\t  fi\n\t\telse\n\t\t  # If the PIC object exists, use it instead.\n\t\t  # $xdir was prepended to $pic_object above.\n\t\t  non_pic_object=$pic_object\n\t\t  func_append non_pic_objects \" $non_pic_object\"\n\t\tfi\n\t      else\n\t\t# Only an error if not doing a dry-run.\n\t\tif $opt_dry_run; then\n\t\t  # Extract subdirectory from the argument.\n\t\t  func_dirname \"$arg\" \"/\" \"\"\n\t\t  xdir=$func_dirname_result\n\n\t\t  func_lo2o \"$arg\"\n\t\t  pic_object=$xdir$objdir/$func_lo2o_result\n\t\t  non_pic_object=$xdir$func_lo2o_result\n\t\t  func_append libobjs \" $pic_object\"\n\t\t  func_append non_pic_objects \" $non_pic_object\"\n\t        else\n\t\t  func_fatal_error \"'$arg' is not a valid libtool object\"\n\t\tfi\n\t      fi\n\t    done\n\t  else\n\t    func_fatal_error \"link input file '$arg' does not exist\"\n\t  fi\n\t  arg=$save_arg\n\t  prev=\n\t  continue\n\t  ;;\n\tos2dllname)\n\t  os2dllname=$arg\n\t  prev=\n\t  continue\n\t  ;;\n\tprecious_regex)\n\t  precious_files_regex=$arg\n\t  prev=\n\t  continue\n\t  ;;\n\trelease)\n\t  release=-$arg\n\t  prev=\n\t  continue\n\t  ;;\n\trpath | xrpath)\n\t  # We need an absolute path.\n\t  case $arg in\n\t  [\\\\/]* | [A-Za-z]:[\\\\/]*) ;;\n\t  
*)\n\t    func_fatal_error \"only absolute run-paths are allowed\"\n\t    ;;\n\t  esac\n\t  if test rpath = \"$prev\"; then\n\t    case \"$rpath \" in\n\t    *\" $arg \"*) ;;\n\t    *) func_append rpath \" $arg\" ;;\n\t    esac\n\t  else\n\t    case \"$xrpath \" in\n\t    *\" $arg \"*) ;;\n\t    *) func_append xrpath \" $arg\" ;;\n\t    esac\n\t  fi\n\t  prev=\n\t  continue\n\t  ;;\n\tshrext)\n\t  shrext_cmds=$arg\n\t  prev=\n\t  continue\n\t  ;;\n\tweak)\n\t  func_append weak_libs \" $arg\"\n\t  prev=\n\t  continue\n\t  ;;\n\txcclinker)\n\t  func_append linker_flags \" $qarg\"\n\t  func_append compiler_flags \" $qarg\"\n\t  prev=\n\t  func_append compile_command \" $qarg\"\n\t  func_append finalize_command \" $qarg\"\n\t  continue\n\t  ;;\n\txcompiler)\n\t  func_append compiler_flags \" $qarg\"\n\t  prev=\n\t  func_append compile_command \" $qarg\"\n\t  func_append finalize_command \" $qarg\"\n\t  continue\n\t  ;;\n\txlinker)\n\t  func_append linker_flags \" $qarg\"\n\t  func_append compiler_flags \" $wl$qarg\"\n\t  prev=\n\t  func_append compile_command \" $wl$qarg\"\n\t  func_append finalize_command \" $wl$qarg\"\n\t  continue\n\t  ;;\n\t*)\n\t  eval \"$prev=\\\"\\$arg\\\"\"\n\t  prev=\n\t  continue\n\t  ;;\n\tesac\n      fi # test -n \"$prev\"\n\n      prevarg=$arg\n\n      case $arg in\n      -all-static)\n\tif test -n \"$link_static_flag\"; then\n\t  # See comment for -static flag below, for more details.\n\t  func_append compile_command \" $link_static_flag\"\n\t  func_append finalize_command \" $link_static_flag\"\n\tfi\n\tcontinue\n\t;;\n\n      -allow-undefined)\n\t# FIXME: remove this flag sometime in the future.\n\tfunc_fatal_error \"'-allow-undefined' must not be used because it is the default\"\n\t;;\n\n      -avoid-version)\n\tavoid_version=yes\n\tcontinue\n\t;;\n\n      -bindir)\n\tprev=bindir\n\tcontinue\n\t;;\n\n      -dlopen)\n\tprev=dlfiles\n\tcontinue\n\t;;\n\n      -dlpreopen)\n\tprev=dlprefiles\n\tcontinue\n\t;;\n\n      
-export-dynamic)\n\texport_dynamic=yes\n\tcontinue\n\t;;\n\n      -export-symbols | -export-symbols-regex)\n\tif test -n \"$export_symbols\" || test -n \"$export_symbols_regex\"; then\n\t  func_fatal_error \"more than one -exported-symbols argument is not allowed\"\n\tfi\n\tif test X-export-symbols = \"X$arg\"; then\n\t  prev=expsyms\n\telse\n\t  prev=expsyms_regex\n\tfi\n\tcontinue\n\t;;\n\n      -framework)\n\tprev=framework\n\tcontinue\n\t;;\n\n      -inst-prefix-dir)\n\tprev=inst_prefix\n\tcontinue\n\t;;\n\n      # The native IRIX linker understands -LANG:*, -LIST:* and -LNO:*\n      # so, if we see these flags be careful not to treat them like -L\n      -L[A-Z][A-Z]*:*)\n\tcase $with_gcc/$host in\n\tno/*-*-irix* | /*-*-irix*)\n\t  func_append compile_command \" $arg\"\n\t  func_append finalize_command \" $arg\"\n\t  ;;\n\tesac\n\tcontinue\n\t;;\n\n      -L*)\n\tfunc_stripname \"-L\" '' \"$arg\"\n\tif test -z \"$func_stripname_result\"; then\n\t  if test \"$#\" -gt 0; then\n\t    func_fatal_error \"require no space between '-L' and '$1'\"\n\t  else\n\t    func_fatal_error \"need path for '-L' option\"\n\t  fi\n\tfi\n\tfunc_resolve_sysroot \"$func_stripname_result\"\n\tdir=$func_resolve_sysroot_result\n\t# We need an absolute path.\n\tcase $dir in\n\t[\\\\/]* | [A-Za-z]:[\\\\/]*) ;;\n\t*)\n\t  absdir=`cd \"$dir\" && pwd`\n\t  test -z \"$absdir\" && \\\n\t    func_fatal_error \"cannot determine absolute directory name of '$dir'\"\n\t  dir=$absdir\n\t  ;;\n\tesac\n\tcase \"$deplibs \" in\n\t*\" -L$dir \"* | *\" $arg \"*)\n\t  # Will only happen for absolute or sysroot arguments\n\t  ;;\n\t*)\n\t  # Preserve sysroot, but never include relative directories\n\t  case $dir in\n\t    [\\\\/]* | [A-Za-z]:[\\\\/]* | =*) func_append deplibs \" $arg\" ;;\n\t    *) func_append deplibs \" -L$dir\" ;;\n\t  esac\n\t  func_append lib_search_path \" $dir\"\n\t  ;;\n\tesac\n\tcase $host in\n\t*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2* | *-cegcc*)\n\t  testbindir=`$ECHO 
\"$dir\" | $SED 's*/lib$*/bin*'`\n\t  case :$dllsearchpath: in\n\t  *\":$dir:\"*) ;;\n\t  ::) dllsearchpath=$dir;;\n\t  *) func_append dllsearchpath \":$dir\";;\n\t  esac\n\t  case :$dllsearchpath: in\n\t  *\":$testbindir:\"*) ;;\n\t  ::) dllsearchpath=$testbindir;;\n\t  *) func_append dllsearchpath \":$testbindir\";;\n\t  esac\n\t  ;;\n\tesac\n\tcontinue\n\t;;\n\n      -l*)\n\tif test X-lc = \"X$arg\" || test X-lm = \"X$arg\"; then\n\t  case $host in\n\t  *-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-beos* | *-cegcc* | *-*-haiku*)\n\t    # These systems don't actually have a C or math library (as such)\n\t    continue\n\t    ;;\n\t  *-*-os2*)\n\t    # These systems don't actually have a C library (as such)\n\t    test X-lc = \"X$arg\" && continue\n\t    ;;\n\t  *-*-openbsd* | *-*-freebsd* | *-*-dragonfly* | *-*-bitrig*)\n\t    # Do not include libc due to us having libc/libc_r.\n\t    test X-lc = \"X$arg\" && continue\n\t    ;;\n\t  *-*-rhapsody* | *-*-darwin1.[012])\n\t    # Rhapsody C and math libraries are in the System framework\n\t    func_append deplibs \" System.ltframework\"\n\t    continue\n\t    ;;\n\t  *-*-sco3.2v5* | *-*-sco5v6*)\n\t    # Causes problems with __ctype\n\t    test X-lc = \"X$arg\" && continue\n\t    ;;\n\t  *-*-sysv4.2uw2* | *-*-sysv5* | *-*-unixware* | *-*-OpenUNIX*)\n\t    # Compiler inserts libc in the correct place for threads to work\n\t    test X-lc = \"X$arg\" && continue\n\t    ;;\n\t  esac\n\telif test X-lc_r = \"X$arg\"; then\n\t case $host in\n\t *-*-openbsd* | *-*-freebsd* | *-*-dragonfly* | *-*-bitrig*)\n\t   # Do not include libc_r directly, use -pthread flag.\n\t   continue\n\t   ;;\n\t esac\n\tfi\n\tfunc_append deplibs \" $arg\"\n\tcontinue\n\t;;\n\n      -mllvm)\n\tprev=mllvm\n\tcontinue\n\t;;\n\n      -module)\n\tmodule=yes\n\tcontinue\n\t;;\n\n      # Tru64 UNIX uses -model [arg] to determine the layout of C++\n      # classes, name mangling, and exception handling.\n      # Darwin uses the -arch flag to determine output 
architecture.\n      -model|-arch|-isysroot|--sysroot)\n\tfunc_append compiler_flags \" $arg\"\n\tfunc_append compile_command \" $arg\"\n\tfunc_append finalize_command \" $arg\"\n\tprev=xcompiler\n\tcontinue\n\t;;\n\n      -mt|-mthreads|-kthread|-Kthread|-pthread|-pthreads|--thread-safe \\\n      |-threads|-fopenmp|-openmp|-mp|-xopenmp|-omp|-qsmp=*)\n\tfunc_append compiler_flags \" $arg\"\n\tfunc_append compile_command \" $arg\"\n\tfunc_append finalize_command \" $arg\"\n\tcase \"$new_inherited_linker_flags \" in\n\t    *\" $arg \"*) ;;\n\t    * ) func_append new_inherited_linker_flags \" $arg\" ;;\n\tesac\n\tcontinue\n\t;;\n\n      -multi_module)\n\tsingle_module=$wl-multi_module\n\tcontinue\n\t;;\n\n      -no-fast-install)\n\tfast_install=no\n\tcontinue\n\t;;\n\n      -no-install)\n\tcase $host in\n\t*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2* | *-*-darwin* | *-cegcc*)\n\t  # The PATH hackery in wrapper scripts is required on Windows\n\t  # and Darwin in order for the loader to find any dlls it needs.\n\t  func_warning \"'-no-install' is ignored for $host\"\n\t  func_warning \"assuming '-no-fast-install' instead\"\n\t  fast_install=no\n\t  ;;\n\t*) no_install=yes ;;\n\tesac\n\tcontinue\n\t;;\n\n      -no-undefined)\n\tallow_undefined=no\n\tcontinue\n\t;;\n\n      -objectlist)\n\tprev=objectlist\n\tcontinue\n\t;;\n\n      -os2dllname)\n\tprev=os2dllname\n\tcontinue\n\t;;\n\n      -o) prev=output ;;\n\n      -precious-files-regex)\n\tprev=precious_regex\n\tcontinue\n\t;;\n\n      -release)\n\tprev=release\n\tcontinue\n\t;;\n\n      -rpath)\n\tprev=rpath\n\tcontinue\n\t;;\n\n      -R)\n\tprev=xrpath\n\tcontinue\n\t;;\n\n      -R*)\n\tfunc_stripname '-R' '' \"$arg\"\n\tdir=$func_stripname_result\n\t# We need an absolute path.\n\tcase $dir in\n\t[\\\\/]* | [A-Za-z]:[\\\\/]*) ;;\n\t=*)\n\t  func_stripname '=' '' \"$dir\"\n\t  dir=$lt_sysroot$func_stripname_result\n\t  ;;\n\t*)\n\t  func_fatal_error \"only absolute run-paths are allowed\"\n\t  ;;\n\tesac\n\tcase 
\"$xrpath \" in\n\t*\" $dir \"*) ;;\n\t*) func_append xrpath \" $dir\" ;;\n\tesac\n\tcontinue\n\t;;\n\n      -shared)\n\t# The effects of -shared are defined in a previous loop.\n\tcontinue\n\t;;\n\n      -shrext)\n\tprev=shrext\n\tcontinue\n\t;;\n\n      -static | -static-libtool-libs)\n\t# The effects of -static are defined in a previous loop.\n\t# We used to do the same as -all-static on platforms that\n\t# didn't have a PIC flag, but the assumption that the effects\n\t# would be equivalent was wrong.  It would break on at least\n\t# Digital Unix and AIX.\n\tcontinue\n\t;;\n\n      -thread-safe)\n\tthread_safe=yes\n\tcontinue\n\t;;\n\n      -version-info)\n\tprev=vinfo\n\tcontinue\n\t;;\n\n      -version-number)\n\tprev=vinfo\n\tvinfo_number=yes\n\tcontinue\n\t;;\n\n      -weak)\n        prev=weak\n\tcontinue\n\t;;\n\n      -Wc,*)\n\tfunc_stripname '-Wc,' '' \"$arg\"\n\targs=$func_stripname_result\n\targ=\n\tsave_ifs=$IFS; IFS=,\n\tfor flag in $args; do\n\t  IFS=$save_ifs\n          func_quote_for_eval \"$flag\"\n\t  func_append arg \" $func_quote_for_eval_result\"\n\t  func_append compiler_flags \" $func_quote_for_eval_result\"\n\tdone\n\tIFS=$save_ifs\n\tfunc_stripname ' ' '' \"$arg\"\n\targ=$func_stripname_result\n\t;;\n\n      -Wl,*)\n\tfunc_stripname '-Wl,' '' \"$arg\"\n\targs=$func_stripname_result\n\targ=\n\tsave_ifs=$IFS; IFS=,\n\tfor flag in $args; do\n\t  IFS=$save_ifs\n          func_quote_for_eval \"$flag\"\n\t  func_append arg \" $wl$func_quote_for_eval_result\"\n\t  func_append compiler_flags \" $wl$func_quote_for_eval_result\"\n\t  func_append linker_flags \" $func_quote_for_eval_result\"\n\tdone\n\tIFS=$save_ifs\n\tfunc_stripname ' ' '' \"$arg\"\n\targ=$func_stripname_result\n\t;;\n\n      -Xcompiler)\n\tprev=xcompiler\n\tcontinue\n\t;;\n\n      -Xlinker)\n\tprev=xlinker\n\tcontinue\n\t;;\n\n      -XCClinker)\n\tprev=xcclinker\n\tcontinue\n\t;;\n\n      # -msg_* for osf cc\n      -msg_*)\n\tfunc_quote_for_eval 
\"$arg\"\n\targ=$func_quote_for_eval_result\n\t;;\n\n      # Flags to be passed through unchanged, with rationale:\n      # -64, -mips[0-9]      enable 64-bit mode for the SGI compiler\n      # -r[0-9][0-9]*        specify processor for the SGI compiler\n      # -xarch=*, -xtarget=* enable 64-bit mode for the Sun compiler\n      # +DA*, +DD*           enable 64-bit mode for the HP compiler\n      # -q*                  compiler args for the IBM compiler\n      # -m*, -t[45]*, -txscale* architecture-specific flags for GCC\n      # -F/path              path to uninstalled frameworks, gcc on darwin\n      # -p, -pg, --coverage, -fprofile-*  profiling flags for GCC\n      # -fstack-protector*   stack protector flags for GCC\n      # @file                GCC response files\n      # -tp=*                Portland pgcc target processor selection\n      # --sysroot=*          for sysroot support\n      # -O*, -g*, -flto*, -fwhopr*, -fuse-linker-plugin GCC link-time optimization\n      # -specs=*             GCC specs files\n      # -stdlib=*            select c++ std lib with clang\n      # -fsanitize=*         Clang/GCC memory and address sanitizer\n      -64|-mips[0-9]|-r[0-9][0-9]*|-xarch=*|-xtarget=*|+DA*|+DD*|-q*|-m*| \\\n      -t[45]*|-txscale*|-p|-pg|--coverage|-fprofile-*|-F*|@*|-tp=*|--sysroot=*| \\\n      -O*|-g*|-flto*|-fwhopr*|-fuse-linker-plugin|-fstack-protector*|-stdlib=*| \\\n      -specs=*|-fsanitize=*)\n        func_quote_for_eval \"$arg\"\n\targ=$func_quote_for_eval_result\n        func_append compile_command \" $arg\"\n        func_append finalize_command \" $arg\"\n        func_append compiler_flags \" $arg\"\n        continue\n        ;;\n\n      -Z*)\n        if test os2 = \"`expr $host : '.*\\(os2\\)'`\"; then\n          # OS/2 uses -Zxxx to specify OS/2-specific options\n\t  compiler_flags=\"$compiler_flags $arg\"\n\t  func_append compile_command \" $arg\"\n\t  func_append finalize_command \" $arg\"\n\t  case $arg in\n\t  -Zlinker | -Zstack)\n\t    
prev=xcompiler\n\t    ;;\n\t  esac\n\t  continue\n        else\n\t  # Otherwise treat like 'Some other compiler flag' below\n\t  func_quote_for_eval \"$arg\"\n\t  arg=$func_quote_for_eval_result\n        fi\n\t;;\n\n      # Some other compiler flag.\n      -* | +*)\n        func_quote_for_eval \"$arg\"\n\targ=$func_quote_for_eval_result\n\t;;\n\n      *.$objext)\n\t# A standard object.\n\tfunc_append objs \" $arg\"\n\t;;\n\n      *.lo)\n\t# A libtool-controlled object.\n\n\t# Check to see that this really is a libtool object.\n\tif func_lalib_unsafe_p \"$arg\"; then\n\t  pic_object=\n\t  non_pic_object=\n\n\t  # Read the .lo file\n\t  func_source \"$arg\"\n\n\t  if test -z \"$pic_object\" ||\n\t     test -z \"$non_pic_object\" ||\n\t     test none = \"$pic_object\" &&\n\t     test none = \"$non_pic_object\"; then\n\t    func_fatal_error \"cannot find name of object for '$arg'\"\n\t  fi\n\n\t  # Extract subdirectory from the argument.\n\t  func_dirname \"$arg\" \"/\" \"\"\n\t  xdir=$func_dirname_result\n\n\t  test none = \"$pic_object\" || {\n\t    # Prepend the subdirectory the object is found in.\n\t    pic_object=$xdir$pic_object\n\n\t    if test dlfiles = \"$prev\"; then\n\t      if test yes = \"$build_libtool_libs\" && test yes = \"$dlopen_support\"; then\n\t\tfunc_append dlfiles \" $pic_object\"\n\t\tprev=\n\t\tcontinue\n\t      else\n\t\t# If libtool objects are unsupported, then we need to preload.\n\t\tprev=dlprefiles\n\t      fi\n\t    fi\n\n\t    # CHECK ME:  I think I busted this.  
-Ossama\n\t    if test dlprefiles = \"$prev\"; then\n\t      # Preload the old-style object.\n\t      func_append dlprefiles \" $pic_object\"\n\t      prev=\n\t    fi\n\n\t    # A PIC object.\n\t    func_append libobjs \" $pic_object\"\n\t    arg=$pic_object\n\t  }\n\n\t  # Non-PIC object.\n\t  if test none != \"$non_pic_object\"; then\n\t    # Prepend the subdirectory the object is found in.\n\t    non_pic_object=$xdir$non_pic_object\n\n\t    # A standard non-PIC object\n\t    func_append non_pic_objects \" $non_pic_object\"\n\t    if test -z \"$pic_object\" || test none = \"$pic_object\"; then\n\t      arg=$non_pic_object\n\t    fi\n\t  else\n\t    # If the PIC object exists, use it instead.\n\t    # $xdir was prepended to $pic_object above.\n\t    non_pic_object=$pic_object\n\t    func_append non_pic_objects \" $non_pic_object\"\n\t  fi\n\telse\n\t  # Only an error if not doing a dry-run.\n\t  if $opt_dry_run; then\n\t    # Extract subdirectory from the argument.\n\t    func_dirname \"$arg\" \"/\" \"\"\n\t    xdir=$func_dirname_result\n\n\t    func_lo2o \"$arg\"\n\t    pic_object=$xdir$objdir/$func_lo2o_result\n\t    non_pic_object=$xdir$func_lo2o_result\n\t    func_append libobjs \" $pic_object\"\n\t    func_append non_pic_objects \" $non_pic_object\"\n\t  else\n\t    func_fatal_error \"'$arg' is not a valid libtool object\"\n\t  fi\n\tfi\n\t;;\n\n      *.$libext)\n\t# An archive.\n\tfunc_append deplibs \" $arg\"\n\tfunc_append old_deplibs \" $arg\"\n\tcontinue\n\t;;\n\n      *.la)\n\t# A libtool-controlled library.\n\n\tfunc_resolve_sysroot \"$arg\"\n\tif test dlfiles = \"$prev\"; then\n\t  # This library was specified with -dlopen.\n\t  func_append dlfiles \" $func_resolve_sysroot_result\"\n\t  prev=\n\telif test dlprefiles = \"$prev\"; then\n\t  # The library was specified with -dlpreopen.\n\t  func_append dlprefiles \" $func_resolve_sysroot_result\"\n\t  prev=\n\telse\n\t  func_append deplibs \" $func_resolve_sysroot_result\"\n\tfi\n\tcontinue\n\t;;\n\n     
 # Some other compiler argument.\n      *)\n\t# Unknown arguments in both finalize_command and compile_command need\n\t# to be aesthetically quoted because they are evaled later.\n\tfunc_quote_for_eval \"$arg\"\n\targ=$func_quote_for_eval_result\n\t;;\n      esac # arg\n\n      # Now actually substitute the argument into the commands.\n      if test -n \"$arg\"; then\n\tfunc_append compile_command \" $arg\"\n\tfunc_append finalize_command \" $arg\"\n      fi\n    done # argument parsing loop\n\n    test -n \"$prev\" && \\\n      func_fatal_help \"the '$prevarg' option requires an argument\"\n\n    if test yes = \"$export_dynamic\" && test -n \"$export_dynamic_flag_spec\"; then\n      eval arg=\\\"$export_dynamic_flag_spec\\\"\n      func_append compile_command \" $arg\"\n      func_append finalize_command \" $arg\"\n    fi\n\n    oldlibs=\n    # calculate the name of the file, without its directory\n    func_basename \"$output\"\n    outputname=$func_basename_result\n    libobjs_save=$libobjs\n\n    if test -n \"$shlibpath_var\"; then\n      # get the directories listed in $shlibpath_var\n      eval shlib_search_path=\\`\\$ECHO \\\"\\$$shlibpath_var\\\" \\| \\$SED \\'s/:/ /g\\'\\`\n    else\n      shlib_search_path=\n    fi\n    eval sys_lib_search_path=\\\"$sys_lib_search_path_spec\\\"\n    eval sys_lib_dlsearch_path=\\\"$sys_lib_dlsearch_path_spec\\\"\n\n    # Definition is injected by LT_CONFIG during libtool generation.\n    func_munge_path_list sys_lib_dlsearch_path \"$LT_SYS_LIBRARY_PATH\"\n\n    func_dirname \"$output\" \"/\" \"\"\n    output_objdir=$func_dirname_result$objdir\n    func_to_tool_file \"$output_objdir/\"\n    tool_output_objdir=$func_to_tool_file_result\n    # Create the object directory.\n    func_mkdir_p \"$output_objdir\"\n\n    # Determine the type of output\n    case $output in\n    \"\")\n      func_fatal_help \"you must specify an output file\"\n      ;;\n    *.$libext) linkmode=oldlib ;;\n    *.lo | *.$objext) linkmode=obj ;;\n    
*.la) linkmode=lib ;;\n    *) linkmode=prog ;; # Anything else should be a program.\n    esac\n\n    specialdeplibs=\n\n    libs=\n    # Find all interdependent deplibs by searching for libraries\n    # that are linked more than once (e.g. -la -lb -la)\n    for deplib in $deplibs; do\n      if $opt_preserve_dup_deps; then\n\tcase \"$libs \" in\n\t*\" $deplib \"*) func_append specialdeplibs \" $deplib\" ;;\n\tesac\n      fi\n      func_append libs \" $deplib\"\n    done\n\n    if test lib = \"$linkmode\"; then\n      libs=\"$predeps $libs $compiler_lib_search_path $postdeps\"\n\n      # Compute libraries that are listed more than once in $predeps\n      # $postdeps and mark them as special (i.e., whose duplicates are\n      # not to be eliminated).\n      pre_post_deps=\n      if $opt_duplicate_compiler_generated_deps; then\n\tfor pre_post_dep in $predeps $postdeps; do\n\t  case \"$pre_post_deps \" in\n\t  *\" $pre_post_dep \"*) func_append specialdeplibs \" $pre_post_deps\" ;;\n\t  esac\n\t  func_append pre_post_deps \" $pre_post_dep\"\n\tdone\n      fi\n      pre_post_deps=\n    fi\n\n    deplibs=\n    newdependency_libs=\n    newlib_search_path=\n    need_relink=no # whether we're linking any uninstalled libtool libraries\n    notinst_deplibs= # not-installed libtool libraries\n    notinst_path= # paths that contain not-installed libtool libraries\n\n    case $linkmode in\n    lib)\n\tpasses=\"conv dlpreopen link\"\n\tfor file in $dlfiles $dlprefiles; do\n\t  case $file in\n\t  *.la) ;;\n\t  *)\n\t    func_fatal_help \"libraries can '-dlopen' only libtool libraries: $file\"\n\t    ;;\n\t  esac\n\tdone\n\t;;\n    prog)\n\tcompile_deplibs=\n\tfinalize_deplibs=\n\talldeplibs=false\n\tnewdlfiles=\n\tnewdlprefiles=\n\tpasses=\"conv scan dlopen dlpreopen link\"\n\t;;\n    *)  passes=\"conv\"\n\t;;\n    esac\n\n    for pass in $passes; do\n      # The preopen pass in lib mode reverses $deplibs; put it back here\n      # so that -L comes before libs that need it for 
instance...\n      if test lib,link = \"$linkmode,$pass\"; then\n\t## FIXME: Find the place where the list is rebuilt in the wrong\n\t##        order, and fix it there properly\n        tmp_deplibs=\n\tfor deplib in $deplibs; do\n\t  tmp_deplibs=\"$deplib $tmp_deplibs\"\n\tdone\n\tdeplibs=$tmp_deplibs\n      fi\n\n      if test lib,link = \"$linkmode,$pass\" ||\n\t test prog,scan = \"$linkmode,$pass\"; then\n\tlibs=$deplibs\n\tdeplibs=\n      fi\n      if test prog = \"$linkmode\"; then\n\tcase $pass in\n\tdlopen) libs=$dlfiles ;;\n\tdlpreopen) libs=$dlprefiles ;;\n\tlink)\n\t  libs=\"$deplibs %DEPLIBS%\"\n\t  test \"X$link_all_deplibs\" != Xno && libs=\"$libs $dependency_libs\"\n\t  ;;\n\tesac\n      fi\n      if test lib,dlpreopen = \"$linkmode,$pass\"; then\n\t# Collect and forward deplibs of preopened libtool libs\n\tfor lib in $dlprefiles; do\n\t  # Ignore non-libtool-libs\n\t  dependency_libs=\n\t  func_resolve_sysroot \"$lib\"\n\t  case $lib in\n\t  *.la)\tfunc_source \"$func_resolve_sysroot_result\" ;;\n\t  esac\n\n\t  # Collect preopened libtool deplibs, except any this library\n\t  # has declared as weak libs\n\t  for deplib in $dependency_libs; do\n\t    func_basename \"$deplib\"\n            deplib_base=$func_basename_result\n\t    case \" $weak_libs \" in\n\t    *\" $deplib_base \"*) ;;\n\t    *) func_append deplibs \" $deplib\" ;;\n\t    esac\n\t  done\n\tdone\n\tlibs=$dlprefiles\n      fi\n      if test dlopen = \"$pass\"; then\n\t# Collect dlpreopened libraries\n\tsave_deplibs=$deplibs\n\tdeplibs=\n      fi\n\n      for deplib in $libs; do\n\tlib=\n\tfound=false\n\tcase $deplib in\n\t-mt|-mthreads|-kthread|-Kthread|-pthread|-pthreads|--thread-safe \\\n        |-threads|-fopenmp|-openmp|-mp|-xopenmp|-omp|-qsmp=*)\n\t  if test prog,link = \"$linkmode,$pass\"; then\n\t    compile_deplibs=\"$deplib $compile_deplibs\"\n\t    finalize_deplibs=\"$deplib $finalize_deplibs\"\n\t  else\n\t    func_append compiler_flags \" $deplib\"\n\t    if test lib = 
\"$linkmode\"; then\n\t\tcase \"$new_inherited_linker_flags \" in\n\t\t    *\" $deplib \"*) ;;\n\t\t    * ) func_append new_inherited_linker_flags \" $deplib\" ;;\n\t\tesac\n\t    fi\n\t  fi\n\t  continue\n\t  ;;\n\t-l*)\n\t  if test lib != \"$linkmode\" && test prog != \"$linkmode\"; then\n\t    func_warning \"'-l' is ignored for archives/objects\"\n\t    continue\n\t  fi\n\t  func_stripname '-l' '' \"$deplib\"\n\t  name=$func_stripname_result\n\t  if test lib = \"$linkmode\"; then\n\t    searchdirs=\"$newlib_search_path $lib_search_path $compiler_lib_search_dirs $sys_lib_search_path $shlib_search_path\"\n\t  else\n\t    searchdirs=\"$newlib_search_path $lib_search_path $sys_lib_search_path $shlib_search_path\"\n\t  fi\n\t  for searchdir in $searchdirs; do\n\t    for search_ext in .la $std_shrext .so .a; do\n\t      # Search the libtool library\n\t      lib=$searchdir/lib$name$search_ext\n\t      if test -f \"$lib\"; then\n\t\tif test .la = \"$search_ext\"; then\n\t\t  found=:\n\t\telse\n\t\t  found=false\n\t\tfi\n\t\tbreak 2\n\t      fi\n\t    done\n\t  done\n\t  if $found; then\n\t    # deplib is a libtool library\n\t    # If $allow_libtool_libs_with_static_runtimes && $deplib is a stdlib,\n\t    # We need to do some special things here, and not later.\n\t    if test yes = \"$allow_libtool_libs_with_static_runtimes\"; then\n\t      case \" $predeps $postdeps \" in\n\t      *\" $deplib \"*)\n\t\tif func_lalib_p \"$lib\"; then\n\t\t  library_names=\n\t\t  old_library=\n\t\t  func_source \"$lib\"\n\t\t  for l in $old_library $library_names; do\n\t\t    ll=$l\n\t\t  done\n\t\t  if test \"X$ll\" = \"X$old_library\"; then # only static version available\n\t\t    found=false\n\t\t    func_dirname \"$lib\" \"\" \".\"\n\t\t    ladir=$func_dirname_result\n\t\t    lib=$ladir/$old_library\n\t\t    if test prog,link = \"$linkmode,$pass\"; then\n\t\t      compile_deplibs=\"$deplib $compile_deplibs\"\n\t\t      finalize_deplibs=\"$deplib $finalize_deplibs\"\n\t\t    else\n\t\t 
     deplibs=\"$deplib $deplibs\"\n\t\t      test lib = \"$linkmode\" && newdependency_libs=\"$deplib $newdependency_libs\"\n\t\t    fi\n\t\t    continue\n\t\t  fi\n\t\tfi\n\t\t;;\n\t      *) ;;\n\t      esac\n\t    fi\n\t  else\n\t    # deplib doesn't seem to be a libtool library\n\t    if test prog,link = \"$linkmode,$pass\"; then\n\t      compile_deplibs=\"$deplib $compile_deplibs\"\n\t      finalize_deplibs=\"$deplib $finalize_deplibs\"\n\t    else\n\t      deplibs=\"$deplib $deplibs\"\n\t      test lib = \"$linkmode\" && newdependency_libs=\"$deplib $newdependency_libs\"\n\t    fi\n\t    continue\n\t  fi\n\t  ;; # -l\n\t*.ltframework)\n\t  if test prog,link = \"$linkmode,$pass\"; then\n\t    compile_deplibs=\"$deplib $compile_deplibs\"\n\t    finalize_deplibs=\"$deplib $finalize_deplibs\"\n\t  else\n\t    deplibs=\"$deplib $deplibs\"\n\t    if test lib = \"$linkmode\"; then\n\t\tcase \"$new_inherited_linker_flags \" in\n\t\t    *\" $deplib \"*) ;;\n\t\t    * ) func_append new_inherited_linker_flags \" $deplib\" ;;\n\t\tesac\n\t    fi\n\t  fi\n\t  continue\n\t  ;;\n\t-L*)\n\t  case $linkmode in\n\t  lib)\n\t    deplibs=\"$deplib $deplibs\"\n\t    test conv = \"$pass\" && continue\n\t    newdependency_libs=\"$deplib $newdependency_libs\"\n\t    func_stripname '-L' '' \"$deplib\"\n\t    func_resolve_sysroot \"$func_stripname_result\"\n\t    func_append newlib_search_path \" $func_resolve_sysroot_result\"\n\t    ;;\n\t  prog)\n\t    if test conv = \"$pass\"; then\n\t      deplibs=\"$deplib $deplibs\"\n\t      continue\n\t    fi\n\t    if test scan = \"$pass\"; then\n\t      deplibs=\"$deplib $deplibs\"\n\t    else\n\t      compile_deplibs=\"$deplib $compile_deplibs\"\n\t      finalize_deplibs=\"$deplib $finalize_deplibs\"\n\t    fi\n\t    func_stripname '-L' '' \"$deplib\"\n\t    func_resolve_sysroot \"$func_stripname_result\"\n\t    func_append newlib_search_path \" $func_resolve_sysroot_result\"\n\t    ;;\n\t  *)\n\t    func_warning \"'-L' is ignored for 
archives/objects\"\n\t    ;;\n\t  esac # linkmode\n\t  continue\n\t  ;; # -L\n\t-R*)\n\t  if test link = \"$pass\"; then\n\t    func_stripname '-R' '' \"$deplib\"\n\t    func_resolve_sysroot \"$func_stripname_result\"\n\t    dir=$func_resolve_sysroot_result\n\t    # Make sure the xrpath contains only unique directories.\n\t    case \"$xrpath \" in\n\t    *\" $dir \"*) ;;\n\t    *) func_append xrpath \" $dir\" ;;\n\t    esac\n\t  fi\n\t  deplibs=\"$deplib $deplibs\"\n\t  continue\n\t  ;;\n\t*.la)\n\t  func_resolve_sysroot \"$deplib\"\n\t  lib=$func_resolve_sysroot_result\n\t  ;;\n\t*.$libext)\n\t  if test conv = \"$pass\"; then\n\t    deplibs=\"$deplib $deplibs\"\n\t    continue\n\t  fi\n\t  case $linkmode in\n\t  lib)\n\t    # Linking convenience modules into shared libraries is allowed,\n\t    # but linking other static libraries is non-portable.\n\t    case \" $dlpreconveniencelibs \" in\n\t    *\" $deplib \"*) ;;\n\t    *)\n\t      valid_a_lib=false\n\t      case $deplibs_check_method in\n\t\tmatch_pattern*)\n\t\t  set dummy $deplibs_check_method; shift\n\t\t  match_pattern_regex=`expr \"$deplibs_check_method\" : \"$1 \\(.*\\)\"`\n\t\t  if eval \"\\$ECHO \\\"$deplib\\\"\" 2>/dev/null | $SED 10q \\\n\t\t    | $EGREP \"$match_pattern_regex\" > /dev/null; then\n\t\t    valid_a_lib=:\n\t\t  fi\n\t\t;;\n\t\tpass_all)\n\t\t  valid_a_lib=:\n\t\t;;\n\t      esac\n\t      if $valid_a_lib; then\n\t\techo\n\t\t$ECHO \"*** Warning: Linking the shared library $output against the\"\n\t\t$ECHO \"*** static library $deplib is not portable!\"\n\t\tdeplibs=\"$deplib $deplibs\"\n\t      else\n\t\techo\n\t\t$ECHO \"*** Warning: Trying to link with static lib archive $deplib.\"\n\t\techo \"*** I have the capability to make that library automatically link in when\"\n\t\techo \"*** you link to this library.  
But I can only do this if you have a\"\n\t\techo \"*** shared version of the library, which you do not appear to have\"\n\t\techo \"*** because the file extensions .$libext of this argument makes me believe\"\n\t\techo \"*** that it is just a static archive that I should not use here.\"\n\t      fi\n\t      ;;\n\t    esac\n\t    continue\n\t    ;;\n\t  prog)\n\t    if test link != \"$pass\"; then\n\t      deplibs=\"$deplib $deplibs\"\n\t    else\n\t      compile_deplibs=\"$deplib $compile_deplibs\"\n\t      finalize_deplibs=\"$deplib $finalize_deplibs\"\n\t    fi\n\t    continue\n\t    ;;\n\t  esac # linkmode\n\t  ;; # *.$libext\n\t*.lo | *.$objext)\n\t  if test conv = \"$pass\"; then\n\t    deplibs=\"$deplib $deplibs\"\n\t  elif test prog = \"$linkmode\"; then\n\t    if test dlpreopen = \"$pass\" || test yes != \"$dlopen_support\" || test no = \"$build_libtool_libs\"; then\n\t      # If there is no dlopen support or we're linking statically,\n\t      # we need to preload.\n\t      func_append newdlprefiles \" $deplib\"\n\t      compile_deplibs=\"$deplib $compile_deplibs\"\n\t      finalize_deplibs=\"$deplib $finalize_deplibs\"\n\t    else\n\t      func_append newdlfiles \" $deplib\"\n\t    fi\n\t  fi\n\t  continue\n\t  ;;\n\t%DEPLIBS%)\n\t  alldeplibs=:\n\t  continue\n\t  ;;\n\tesac # case $deplib\n\n\t$found || test -f \"$lib\" \\\n\t  || func_fatal_error \"cannot find the library '$lib' or unhandled argument '$deplib'\"\n\n\t# Check to see that this really is a libtool archive.\n\tfunc_lalib_unsafe_p \"$lib\" \\\n\t  || func_fatal_error \"'$lib' is not a valid libtool archive\"\n\n\tfunc_dirname \"$lib\" \"\" \".\"\n\tladir=$func_dirname_result\n\n\tdlname=\n\tdlopen=\n\tdlpreopen=\n\tlibdir=\n\tlibrary_names=\n\told_library=\n\tinherited_linker_flags=\n\t# If the library was installed with an old release of libtool,\n\t# it will not redefine variables installed, or shouldnotlink\n\tinstalled=yes\n\tshouldnotlink=no\n\tavoidtemprpath=\n\n\n\t# Read the .la 
file\n\tfunc_source \"$lib\"\n\n\t# Convert \"-framework foo\" to \"foo.ltframework\"\n\tif test -n \"$inherited_linker_flags\"; then\n\t  tmp_inherited_linker_flags=`$ECHO \"$inherited_linker_flags\" | $SED 's/-framework \\([^ $]*\\)/\\1.ltframework/g'`\n\t  for tmp_inherited_linker_flag in $tmp_inherited_linker_flags; do\n\t    case \" $new_inherited_linker_flags \" in\n\t      *\" $tmp_inherited_linker_flag \"*) ;;\n\t      *) func_append new_inherited_linker_flags \" $tmp_inherited_linker_flag\";;\n\t    esac\n\t  done\n\tfi\n\tdependency_libs=`$ECHO \" $dependency_libs\" | $SED 's% \\([^ $]*\\).ltframework% -framework \\1%g'`\n\tif test lib,link = \"$linkmode,$pass\" ||\n\t   test prog,scan = \"$linkmode,$pass\" ||\n\t   { test prog != \"$linkmode\" && test lib != \"$linkmode\"; }; then\n\t  test -n \"$dlopen\" && func_append dlfiles \" $dlopen\"\n\t  test -n \"$dlpreopen\" && func_append dlprefiles \" $dlpreopen\"\n\tfi\n\n\tif test conv = \"$pass\"; then\n\t  # Only check for convenience libraries\n\t  deplibs=\"$lib $deplibs\"\n\t  if test -z \"$libdir\"; then\n\t    if test -z \"$old_library\"; then\n\t      func_fatal_error \"cannot find name of link library for '$lib'\"\n\t    fi\n\t    # It is a libtool convenience library, so add in its objects.\n\t    func_append convenience \" $ladir/$objdir/$old_library\"\n\t    func_append old_convenience \" $ladir/$objdir/$old_library\"\n\t    tmp_libs=\n\t    for deplib in $dependency_libs; do\n\t      deplibs=\"$deplib $deplibs\"\n\t      if $opt_preserve_dup_deps; then\n\t\tcase \"$tmp_libs \" in\n\t\t*\" $deplib \"*) func_append specialdeplibs \" $deplib\" ;;\n\t\tesac\n\t      fi\n\t      func_append tmp_libs \" $deplib\"\n\t    done\n\t  elif test prog != \"$linkmode\" && test lib != \"$linkmode\"; then\n\t    func_fatal_error \"'$lib' is not a convenience library\"\n\t  fi\n\t  continue\n\tfi # $pass = conv\n\n\n\t# Get the name of the library we link against.\n\tlinklib=\n\tif test -n \"$old_library\" 
&&\n\t   { test yes = \"$prefer_static_libs\" ||\n\t     test built,no = \"$prefer_static_libs,$installed\"; }; then\n\t  linklib=$old_library\n\telse\n\t  for l in $old_library $library_names; do\n\t    linklib=$l\n\t  done\n\tfi\n\tif test -z \"$linklib\"; then\n\t  func_fatal_error \"cannot find name of link library for '$lib'\"\n\tfi\n\n\t# This library was specified with -dlopen.\n\tif test dlopen = \"$pass\"; then\n\t  test -z \"$libdir\" \\\n\t    && func_fatal_error \"cannot -dlopen a convenience library: '$lib'\"\n\t  if test -z \"$dlname\" ||\n\t     test yes != \"$dlopen_support\" ||\n\t     test no = \"$build_libtool_libs\"\n\t  then\n\t    # If there is no dlname, no dlopen support or we're linking\n\t    # statically, we need to preload.  We also need to preload any\n\t    # dependent libraries so libltdl's deplib preloader doesn't\n\t    # bomb out in the load deplibs phase.\n\t    func_append dlprefiles \" $lib $dependency_libs\"\n\t  else\n\t    func_append newdlfiles \" $lib\"\n\t  fi\n\t  continue\n\tfi # $pass = dlopen\n\n\t# We need an absolute path.\n\tcase $ladir in\n\t[\\\\/]* | [A-Za-z]:[\\\\/]*) abs_ladir=$ladir ;;\n\t*)\n\t  abs_ladir=`cd \"$ladir\" && pwd`\n\t  if test -z \"$abs_ladir\"; then\n\t    func_warning \"cannot determine absolute directory name of '$ladir'\"\n\t    func_warning \"passing it literally to the linker, although it might fail\"\n\t    abs_ladir=$ladir\n\t  fi\n\t  ;;\n\tesac\n\tfunc_basename \"$lib\"\n\tlaname=$func_basename_result\n\n\t# Find the relevant object directory and library name.\n\tif test yes = \"$installed\"; then\n\t  if test ! -f \"$lt_sysroot$libdir/$linklib\" && test -f \"$abs_ladir/$linklib\"; then\n\t    func_warning \"library '$lib' was moved.\"\n\t    dir=$ladir\n\t    absdir=$abs_ladir\n\t    libdir=$abs_ladir\n\t  else\n\t    dir=$lt_sysroot$libdir\n\t    absdir=$lt_sysroot$libdir\n\t  fi\n\t  test yes = \"$hardcode_automatic\" && avoidtemprpath=yes\n\telse\n\t  if test ! 
-f \"$ladir/$objdir/$linklib\" && test -f \"$abs_ladir/$linklib\"; then\n\t    dir=$ladir\n\t    absdir=$abs_ladir\n\t    # Remove this search path later\n\t    func_append notinst_path \" $abs_ladir\"\n\t  else\n\t    dir=$ladir/$objdir\n\t    absdir=$abs_ladir/$objdir\n\t    # Remove this search path later\n\t    func_append notinst_path \" $abs_ladir\"\n\t  fi\n\tfi # $installed = yes\n\tfunc_stripname 'lib' '.la' \"$laname\"\n\tname=$func_stripname_result\n\n\t# This library was specified with -dlpreopen.\n\tif test dlpreopen = \"$pass\"; then\n\t  if test -z \"$libdir\" && test prog = \"$linkmode\"; then\n\t    func_fatal_error \"only libraries may -dlpreopen a convenience library: '$lib'\"\n\t  fi\n\t  case $host in\n\t    # special handling for platforms with PE-DLLs.\n\t    *cygwin* | *mingw* | *cegcc* )\n\t      # Linker will automatically link against shared library if both\n\t      # static and shared are present.  Therefore, ensure we extract\n\t      # symbols from the import library if a shared library is present\n\t      # (otherwise, the dlopen module name will be incorrect).  
We do\n\t      # this by putting the import library name into $newdlprefiles.\n\t      # We recover the dlopen module name by 'saving' the la file\n\t      # name in a special purpose variable, and (later) extracting the\n\t      # dlname from the la file.\n\t      if test -n \"$dlname\"; then\n\t        func_tr_sh \"$dir/$linklib\"\n\t        eval \"libfile_$func_tr_sh_result=\\$abs_ladir/\\$laname\"\n\t        func_append newdlprefiles \" $dir/$linklib\"\n\t      else\n\t        func_append newdlprefiles \" $dir/$old_library\"\n\t        # Keep a list of preopened convenience libraries to check\n\t        # that they are being used correctly in the link pass.\n\t        test -z \"$libdir\" && \\\n\t          func_append dlpreconveniencelibs \" $dir/$old_library\"\n\t      fi\n\t    ;;\n\t    * )\n\t      # Prefer using a static library (so that no silly _DYNAMIC symbols\n\t      # are required to link).\n\t      if test -n \"$old_library\"; then\n\t        func_append newdlprefiles \" $dir/$old_library\"\n\t        # Keep a list of preopened convenience libraries to check\n\t        # that they are being used correctly in the link pass.\n\t        test -z \"$libdir\" && \\\n\t          func_append dlpreconveniencelibs \" $dir/$old_library\"\n\t      # Otherwise, use the dlname, so that lt_dlopen finds it.\n\t      elif test -n \"$dlname\"; then\n\t        func_append newdlprefiles \" $dir/$dlname\"\n\t      else\n\t        func_append newdlprefiles \" $dir/$linklib\"\n\t      fi\n\t    ;;\n\t  esac\n\tfi # $pass = dlpreopen\n\n\tif test -z \"$libdir\"; then\n\t  # Link the convenience library\n\t  if test lib = \"$linkmode\"; then\n\t    deplibs=\"$dir/$old_library $deplibs\"\n\t  elif test prog,link = \"$linkmode,$pass\"; then\n\t    compile_deplibs=\"$dir/$old_library $compile_deplibs\"\n\t    finalize_deplibs=\"$dir/$old_library $finalize_deplibs\"\n\t  else\n\t    deplibs=\"$lib $deplibs\" # used for prog,scan pass\n\t  fi\n\t  continue\n\tfi\n\n\n\tif test 
prog = \"$linkmode\" && test link != \"$pass\"; then\n\t  func_append newlib_search_path \" $ladir\"\n\t  deplibs=\"$lib $deplibs\"\n\n\t  linkalldeplibs=false\n\t  if test no != \"$link_all_deplibs\" || test -z \"$library_names\" ||\n\t     test no = \"$build_libtool_libs\"; then\n\t    linkalldeplibs=:\n\t  fi\n\n\t  tmp_libs=\n\t  for deplib in $dependency_libs; do\n\t    case $deplib in\n\t    -L*) func_stripname '-L' '' \"$deplib\"\n\t         func_resolve_sysroot \"$func_stripname_result\"\n\t         func_append newlib_search_path \" $func_resolve_sysroot_result\"\n\t\t ;;\n\t    esac\n\t    # Need to link against all dependency_libs?\n\t    if $linkalldeplibs; then\n\t      deplibs=\"$deplib $deplibs\"\n\t    else\n\t      # Need to hardcode shared library paths\n\t      # or/and link against static libraries\n\t      newdependency_libs=\"$deplib $newdependency_libs\"\n\t    fi\n\t    if $opt_preserve_dup_deps; then\n\t      case \"$tmp_libs \" in\n\t      *\" $deplib \"*) func_append specialdeplibs \" $deplib\" ;;\n\t      esac\n\t    fi\n\t    func_append tmp_libs \" $deplib\"\n\t  done # for deplib\n\t  continue\n\tfi # $linkmode = prog...\n\n\tif test prog,link = \"$linkmode,$pass\"; then\n\t  if test -n \"$library_names\" &&\n\t     { { test no = \"$prefer_static_libs\" ||\n\t         test built,yes = \"$prefer_static_libs,$installed\"; } ||\n\t       test -z \"$old_library\"; }; then\n\t    # We need to hardcode the library path\n\t    if test -n \"$shlibpath_var\" && test -z \"$avoidtemprpath\"; then\n\t      # Make sure the rpath contains only unique directories.\n\t      case $temp_rpath: in\n\t      *\"$absdir:\"*) ;;\n\t      *) func_append temp_rpath \"$absdir:\" ;;\n\t      esac\n\t    fi\n\n\t    # Hardcode the library path.\n\t    # Skip directories that are in the system default run-time\n\t    # search path.\n\t    case \" $sys_lib_dlsearch_path \" in\n\t    *\" $absdir \"*) ;;\n\t    *)\n\t      case \"$compile_rpath \" in\n\t      *\" 
$absdir \"*) ;;\n\t      *) func_append compile_rpath \" $absdir\" ;;\n\t      esac\n\t      ;;\n\t    esac\n\t    case \" $sys_lib_dlsearch_path \" in\n\t    *\" $libdir \"*) ;;\n\t    *)\n\t      case \"$finalize_rpath \" in\n\t      *\" $libdir \"*) ;;\n\t      *) func_append finalize_rpath \" $libdir\" ;;\n\t      esac\n\t      ;;\n\t    esac\n\t  fi # $linkmode,$pass = prog,link...\n\n\t  if $alldeplibs &&\n\t     { test pass_all = \"$deplibs_check_method\" ||\n\t       { test yes = \"$build_libtool_libs\" &&\n\t\t test -n \"$library_names\"; }; }; then\n\t    # We only need to search for static libraries\n\t    continue\n\t  fi\n\tfi\n\n\tlink_static=no # Whether the deplib will be linked statically\n\tuse_static_libs=$prefer_static_libs\n\tif test built = \"$use_static_libs\" && test yes = \"$installed\"; then\n\t  use_static_libs=no\n\tfi\n\tif test -n \"$library_names\" &&\n\t   { test no = \"$use_static_libs\" || test -z \"$old_library\"; }; then\n\t  case $host in\n\t  *cygwin* | *mingw* | *cegcc* | *os2*)\n\t      # No point in relinking DLLs because paths are not encoded\n\t      func_append notinst_deplibs \" $lib\"\n\t      need_relink=no\n\t    ;;\n\t  *)\n\t    if test no = \"$installed\"; then\n\t      func_append notinst_deplibs \" $lib\"\n\t      need_relink=yes\n\t    fi\n\t    ;;\n\t  esac\n\t  # This is a shared library\n\n\t  # Warn about portability, can't link against -module's on some\n\t  # systems (darwin).  
Don't bleat about dlopened modules though!\n\t  dlopenmodule=\n\t  for dlpremoduletest in $dlprefiles; do\n\t    if test \"X$dlpremoduletest\" = \"X$lib\"; then\n\t      dlopenmodule=$dlpremoduletest\n\t      break\n\t    fi\n\t  done\n\t  if test -z \"$dlopenmodule\" && test yes = \"$shouldnotlink\" && test link = \"$pass\"; then\n\t    echo\n\t    if test prog = \"$linkmode\"; then\n\t      $ECHO \"*** Warning: Linking the executable $output against the loadable module\"\n\t    else\n\t      $ECHO \"*** Warning: Linking the shared library $output against the loadable module\"\n\t    fi\n\t    $ECHO \"*** $linklib is not portable!\"\n\t  fi\n\t  if test lib = \"$linkmode\" &&\n\t     test yes = \"$hardcode_into_libs\"; then\n\t    # Hardcode the library path.\n\t    # Skip directories that are in the system default run-time\n\t    # search path.\n\t    case \" $sys_lib_dlsearch_path \" in\n\t    *\" $absdir \"*) ;;\n\t    *)\n\t      case \"$compile_rpath \" in\n\t      *\" $absdir \"*) ;;\n\t      *) func_append compile_rpath \" $absdir\" ;;\n\t      esac\n\t      ;;\n\t    esac\n\t    case \" $sys_lib_dlsearch_path \" in\n\t    *\" $libdir \"*) ;;\n\t    *)\n\t      case \"$finalize_rpath \" in\n\t      *\" $libdir \"*) ;;\n\t      *) func_append finalize_rpath \" $libdir\" ;;\n\t      esac\n\t      ;;\n\t    esac\n\t  fi\n\n\t  if test -n \"$old_archive_from_expsyms_cmds\"; then\n\t    # figure out the soname\n\t    set dummy $library_names\n\t    shift\n\t    realname=$1\n\t    shift\n\t    libname=`eval \"\\\\$ECHO \\\"$libname_spec\\\"\"`\n\t    # use dlname if we got it. 
it's perfectly good, no?\n\t    if test -n \"$dlname\"; then\n\t      soname=$dlname\n\t    elif test -n \"$soname_spec\"; then\n\t      # bleh windows\n\t      case $host in\n\t      *cygwin* | mingw* | *cegcc* | *os2*)\n\t        func_arith $current - $age\n\t\tmajor=$func_arith_result\n\t\tversuffix=-$major\n\t\t;;\n\t      esac\n\t      eval soname=\\\"$soname_spec\\\"\n\t    else\n\t      soname=$realname\n\t    fi\n\n\t    # Make a new name for the extract_expsyms_cmds to use\n\t    soroot=$soname\n\t    func_basename \"$soroot\"\n\t    soname=$func_basename_result\n\t    func_stripname 'lib' '.dll' \"$soname\"\n\t    newlib=libimp-$func_stripname_result.a\n\n\t    # If the library has no export list, then create one now\n\t    if test -f \"$output_objdir/$soname-def\"; then :\n\t    else\n\t      func_verbose \"extracting exported symbol list from '$soname'\"\n\t      func_execute_cmds \"$extract_expsyms_cmds\" 'exit $?'\n\t    fi\n\n\t    # Create $newlib\n\t    if test -f \"$output_objdir/$newlib\"; then :; else\n\t      func_verbose \"generating import library for '$soname'\"\n\t      func_execute_cmds \"$old_archive_from_expsyms_cmds\" 'exit $?'\n\t    fi\n\t    # make sure the library variables are pointing to the new library\n\t    dir=$output_objdir\n\t    linklib=$newlib\n\t  fi # test -n \"$old_archive_from_expsyms_cmds\"\n\n\t  if test prog = \"$linkmode\" || test relink != \"$opt_mode\"; then\n\t    add_shlibpath=\n\t    add_dir=\n\t    add=\n\t    lib_linked=yes\n\t    case $hardcode_action in\n\t    immediate | unsupported)\n\t      if test no = \"$hardcode_direct\"; then\n\t\tadd=$dir/$linklib\n\t\tcase $host in\n\t\t  *-*-sco3.2v5.0.[024]*) add_dir=-L$dir ;;\n\t\t  *-*-sysv4*uw2*) add_dir=-L$dir ;;\n\t\t  *-*-sysv5OpenUNIX* | *-*-sysv5UnixWare7.[01].[10]* | \\\n\t\t    *-*-unixware7*) add_dir=-L$dir ;;\n\t\t  *-*-darwin* )\n\t\t    # if the lib is a (non-dlopened) module then we cannot\n\t\t    # link against it, someone is ignoring the 
earlier warnings\n\t\t    if /usr/bin/file -L $add 2> /dev/null |\n\t\t\t $GREP \": [^:]* bundle\" >/dev/null; then\n\t\t      if test \"X$dlopenmodule\" != \"X$lib\"; then\n\t\t\t$ECHO \"*** Warning: lib $linklib is a module, not a shared library\"\n\t\t\tif test -z \"$old_library\"; then\n\t\t\t  echo\n\t\t\t  echo \"*** And there doesn't seem to be a static archive available\"\n\t\t\t  echo \"*** The link will probably fail, sorry\"\n\t\t\telse\n\t\t\t  add=$dir/$old_library\n\t\t\tfi\n\t\t      elif test -n \"$old_library\"; then\n\t\t\tadd=$dir/$old_library\n\t\t      fi\n\t\t    fi\n\t\tesac\n\t      elif test no = \"$hardcode_minus_L\"; then\n\t\tcase $host in\n\t\t*-*-sunos*) add_shlibpath=$dir ;;\n\t\tesac\n\t\tadd_dir=-L$dir\n\t\tadd=-l$name\n\t      elif test no = \"$hardcode_shlibpath_var\"; then\n\t\tadd_shlibpath=$dir\n\t\tadd=-l$name\n\t      else\n\t\tlib_linked=no\n\t      fi\n\t      ;;\n\t    relink)\n\t      if test yes = \"$hardcode_direct\" &&\n\t         test no = \"$hardcode_direct_absolute\"; then\n\t\tadd=$dir/$linklib\n\t      elif test yes = \"$hardcode_minus_L\"; then\n\t\tadd_dir=-L$absdir\n\t\t# Try looking first in the location we're being installed to.\n\t\tif test -n \"$inst_prefix_dir\"; then\n\t\t  case $libdir in\n\t\t    [\\\\/]*)\n\t\t      func_append add_dir \" -L$inst_prefix_dir$libdir\"\n\t\t      ;;\n\t\t  esac\n\t\tfi\n\t\tadd=-l$name\n\t      elif test yes = \"$hardcode_shlibpath_var\"; then\n\t\tadd_shlibpath=$dir\n\t\tadd=-l$name\n\t      else\n\t\tlib_linked=no\n\t      fi\n\t      ;;\n\t    *) lib_linked=no ;;\n\t    esac\n\n\t    if test yes != \"$lib_linked\"; then\n\t      func_fatal_configuration \"unsupported hardcode properties\"\n\t    fi\n\n\t    if test -n \"$add_shlibpath\"; then\n\t      case :$compile_shlibpath: in\n\t      *\":$add_shlibpath:\"*) ;;\n\t      *) func_append compile_shlibpath \"$add_shlibpath:\" ;;\n\t      esac\n\t    fi\n\t    if test prog = \"$linkmode\"; then\n\t      test -n 
\"$add_dir\" && compile_deplibs=\"$add_dir $compile_deplibs\"\n\t      test -n \"$add\" && compile_deplibs=\"$add $compile_deplibs\"\n\t    else\n\t      test -n \"$add_dir\" && deplibs=\"$add_dir $deplibs\"\n\t      test -n \"$add\" && deplibs=\"$add $deplibs\"\n\t      if test yes != \"$hardcode_direct\" &&\n\t\t test yes != \"$hardcode_minus_L\" &&\n\t\t test yes = \"$hardcode_shlibpath_var\"; then\n\t\tcase :$finalize_shlibpath: in\n\t\t*\":$libdir:\"*) ;;\n\t\t*) func_append finalize_shlibpath \"$libdir:\" ;;\n\t\tesac\n\t      fi\n\t    fi\n\t  fi\n\n\t  if test prog = \"$linkmode\" || test relink = \"$opt_mode\"; then\n\t    add_shlibpath=\n\t    add_dir=\n\t    add=\n\t    # Finalize command for both is simple: just hardcode it.\n\t    if test yes = \"$hardcode_direct\" &&\n\t       test no = \"$hardcode_direct_absolute\"; then\n\t      add=$libdir/$linklib\n\t    elif test yes = \"$hardcode_minus_L\"; then\n\t      add_dir=-L$libdir\n\t      add=-l$name\n\t    elif test yes = \"$hardcode_shlibpath_var\"; then\n\t      case :$finalize_shlibpath: in\n\t      *\":$libdir:\"*) ;;\n\t      *) func_append finalize_shlibpath \"$libdir:\" ;;\n\t      esac\n\t      add=-l$name\n\t    elif test yes = \"$hardcode_automatic\"; then\n\t      if test -n \"$inst_prefix_dir\" &&\n\t\t test -f \"$inst_prefix_dir$libdir/$linklib\"; then\n\t\tadd=$inst_prefix_dir$libdir/$linklib\n\t      else\n\t\tadd=$libdir/$linklib\n\t      fi\n\t    else\n\t      # We cannot seem to hardcode it, guess we'll fake it.\n\t      add_dir=-L$libdir\n\t      # Try looking first in the location we're being installed to.\n\t      if test -n \"$inst_prefix_dir\"; then\n\t\tcase $libdir in\n\t\t  [\\\\/]*)\n\t\t    func_append add_dir \" -L$inst_prefix_dir$libdir\"\n\t\t    ;;\n\t\tesac\n\t      fi\n\t      add=-l$name\n\t    fi\n\n\t    if test prog = \"$linkmode\"; then\n\t      test -n \"$add_dir\" && finalize_deplibs=\"$add_dir $finalize_deplibs\"\n\t      test -n \"$add\" && 
finalize_deplibs=\"$add $finalize_deplibs\"\n\t    else\n\t      test -n \"$add_dir\" && deplibs=\"$add_dir $deplibs\"\n\t      test -n \"$add\" && deplibs=\"$add $deplibs\"\n\t    fi\n\t  fi\n\telif test prog = \"$linkmode\"; then\n\t  # Here we assume that one of hardcode_direct or hardcode_minus_L\n\t  # is not unsupported.  This is valid on all known static and\n\t  # shared platforms.\n\t  if test unsupported != \"$hardcode_direct\"; then\n\t    test -n \"$old_library\" && linklib=$old_library\n\t    compile_deplibs=\"$dir/$linklib $compile_deplibs\"\n\t    finalize_deplibs=\"$dir/$linklib $finalize_deplibs\"\n\t  else\n\t    compile_deplibs=\"-l$name -L$dir $compile_deplibs\"\n\t    finalize_deplibs=\"-l$name -L$dir $finalize_deplibs\"\n\t  fi\n\telif test yes = \"$build_libtool_libs\"; then\n\t  # Not a shared library\n\t  if test pass_all != \"$deplibs_check_method\"; then\n\t    # We're trying link a shared library against a static one\n\t    # but the system doesn't support it.\n\n\t    # Just print a warning and add the library to dependency_libs so\n\t    # that the program can be linked against the static library.\n\t    echo\n\t    $ECHO \"*** Warning: This system cannot link to static lib archive $lib.\"\n\t    echo \"*** I have the capability to make that library automatically link in when\"\n\t    echo \"*** you link to this library.  
But I can only do this if you have a\"\n\t    echo \"*** shared version of the library, which you do not appear to have.\"\n\t    if test yes = \"$module\"; then\n\t      echo \"*** But as you try to build a module library, libtool will still create \"\n\t      echo \"*** a static module, that should work as long as the dlopening application\"\n\t      echo \"*** is linked with the -dlopen flag to resolve symbols at runtime.\"\n\t      if test -z \"$global_symbol_pipe\"; then\n\t\techo\n\t\techo \"*** However, this would only work if libtool was able to extract symbol\"\n\t\techo \"*** lists from a program, using 'nm' or equivalent, but libtool could\"\n\t\techo \"*** not find such a program.  So, this module is probably useless.\"\n\t\techo \"*** 'nm' from GNU binutils and a full rebuild may help.\"\n\t      fi\n\t      if test no = \"$build_old_libs\"; then\n\t\tbuild_libtool_libs=module\n\t\tbuild_old_libs=yes\n\t      else\n\t\tbuild_libtool_libs=no\n\t      fi\n\t    fi\n\t  else\n\t    deplibs=\"$dir/$old_library $deplibs\"\n\t    link_static=yes\n\t  fi\n\tfi # link shared/static library?\n\n\tif test lib = \"$linkmode\"; then\n\t  if test -n \"$dependency_libs\" &&\n\t     { test yes != \"$hardcode_into_libs\" ||\n\t       test yes = \"$build_old_libs\" ||\n\t       test yes = \"$link_static\"; }; then\n\t    # Extract -R from dependency_libs\n\t    temp_deplibs=\n\t    for libdir in $dependency_libs; do\n\t      case $libdir in\n\t      -R*) func_stripname '-R' '' \"$libdir\"\n\t           temp_xrpath=$func_stripname_result\n\t\t   case \" $xrpath \" in\n\t\t   *\" $temp_xrpath \"*) ;;\n\t\t   *) func_append xrpath \" $temp_xrpath\";;\n\t\t   esac;;\n\t      *) func_append temp_deplibs \" $libdir\";;\n\t      esac\n\t    done\n\t    dependency_libs=$temp_deplibs\n\t  fi\n\n\t  func_append newlib_search_path \" $absdir\"\n\t  # Link against this library\n\t  test no = \"$link_static\" && newdependency_libs=\"$abs_ladir/$laname $newdependency_libs\"\n\t  # 
... and its dependency_libs\n\t  tmp_libs=\n\t  for deplib in $dependency_libs; do\n\t    newdependency_libs=\"$deplib $newdependency_libs\"\n\t    case $deplib in\n              -L*) func_stripname '-L' '' \"$deplib\"\n                   func_resolve_sysroot \"$func_stripname_result\";;\n              *) func_resolve_sysroot \"$deplib\" ;;\n            esac\n\t    if $opt_preserve_dup_deps; then\n\t      case \"$tmp_libs \" in\n\t      *\" $func_resolve_sysroot_result \"*)\n                func_append specialdeplibs \" $func_resolve_sysroot_result\" ;;\n\t      esac\n\t    fi\n\t    func_append tmp_libs \" $func_resolve_sysroot_result\"\n\t  done\n\n\t  if test no != \"$link_all_deplibs\"; then\n\t    # Add the search paths of all dependency libraries\n\t    for deplib in $dependency_libs; do\n\t      path=\n\t      case $deplib in\n\t      -L*) path=$deplib ;;\n\t      *.la)\n\t        func_resolve_sysroot \"$deplib\"\n\t        deplib=$func_resolve_sysroot_result\n\t        func_dirname \"$deplib\" \"\" \".\"\n\t\tdir=$func_dirname_result\n\t\t# We need an absolute path.\n\t\tcase $dir in\n\t\t[\\\\/]* | [A-Za-z]:[\\\\/]*) absdir=$dir ;;\n\t\t*)\n\t\t  absdir=`cd \"$dir\" && pwd`\n\t\t  if test -z \"$absdir\"; then\n\t\t    func_warning \"cannot determine absolute directory name of '$dir'\"\n\t\t    absdir=$dir\n\t\t  fi\n\t\t  ;;\n\t\tesac\n\t\tif $GREP \"^installed=no\" $deplib > /dev/null; then\n\t\tcase $host in\n\t\t*-*-darwin*)\n\t\t  depdepl=\n\t\t  eval deplibrary_names=`$SED -n -e 's/^library_names=\\(.*\\)$/\\1/p' $deplib`\n\t\t  if test -n \"$deplibrary_names\"; then\n\t\t    for tmp in $deplibrary_names; do\n\t\t      depdepl=$tmp\n\t\t    done\n\t\t    if test -f \"$absdir/$objdir/$depdepl\"; then\n\t\t      depdepl=$absdir/$objdir/$depdepl\n\t\t      darwin_install_name=`$OTOOL -L $depdepl | awk '{if (NR == 2) {print $1;exit}}'`\n                      if test -z \"$darwin_install_name\"; then\n                          darwin_install_name=`$OTOOL64 
-L $depdepl  | awk '{if (NR == 2) {print $1;exit}}'`\n                      fi\n\t\t      func_append compiler_flags \" $wl-dylib_file $wl$darwin_install_name:$depdepl\"\n\t\t      func_append linker_flags \" -dylib_file $darwin_install_name:$depdepl\"\n\t\t      path=\n\t\t    fi\n\t\t  fi\n\t\t  ;;\n\t\t*)\n\t\t  path=-L$absdir/$objdir\n\t\t  ;;\n\t\tesac\n\t\telse\n\t\t  eval libdir=`$SED -n -e 's/^libdir=\\(.*\\)$/\\1/p' $deplib`\n\t\t  test -z \"$libdir\" && \\\n\t\t    func_fatal_error \"'$deplib' is not a valid libtool archive\"\n\t\t  test \"$absdir\" != \"$libdir\" && \\\n\t\t    func_warning \"'$deplib' seems to be moved\"\n\n\t\t  path=-L$absdir\n\t\tfi\n\t\t;;\n\t      esac\n\t      case \" $deplibs \" in\n\t      *\" $path \"*) ;;\n\t      *) deplibs=\"$path $deplibs\" ;;\n\t      esac\n\t    done\n\t  fi # link_all_deplibs != no\n\tfi # linkmode = lib\n      done # for deplib in $libs\n      if test link = \"$pass\"; then\n\tif test prog = \"$linkmode\"; then\n\t  compile_deplibs=\"$new_inherited_linker_flags $compile_deplibs\"\n\t  finalize_deplibs=\"$new_inherited_linker_flags $finalize_deplibs\"\n\telse\n\t  compiler_flags=\"$compiler_flags \"`$ECHO \" $new_inherited_linker_flags\" | $SED 's% \\([^ $]*\\).ltframework% -framework \\1%g'`\n\tfi\n      fi\n      dependency_libs=$newdependency_libs\n      if test dlpreopen = \"$pass\"; then\n\t# Link the dlpreopened libraries before other libraries\n\tfor deplib in $save_deplibs; do\n\t  deplibs=\"$deplib $deplibs\"\n\tdone\n      fi\n      if test dlopen != \"$pass\"; then\n\ttest conv = \"$pass\" || {\n\t  # Make sure lib_search_path contains only unique directories.\n\t  lib_search_path=\n\t  for dir in $newlib_search_path; do\n\t    case \"$lib_search_path \" in\n\t    *\" $dir \"*) ;;\n\t    *) func_append lib_search_path \" $dir\" ;;\n\t    esac\n\t  done\n\t  newlib_search_path=\n\t}\n\n\tif test prog,link = \"$linkmode,$pass\"; then\n\t  vars=\"compile_deplibs finalize_deplibs\"\n\telse\n\t  
vars=deplibs\n\tfi\n\tfor var in $vars dependency_libs; do\n\t  # Add libraries to $var in reverse order\n\t  eval tmp_libs=\\\"\\$$var\\\"\n\t  new_libs=\n\t  for deplib in $tmp_libs; do\n\t    # FIXME: Pedantically, this is the right thing to do, so\n\t    #        that some nasty dependency loop isn't accidentally\n\t    #        broken:\n\t    #new_libs=\"$deplib $new_libs\"\n\t    # Pragmatically, this seems to cause very few problems in\n\t    # practice:\n\t    case $deplib in\n\t    -L*) new_libs=\"$deplib $new_libs\" ;;\n\t    -R*) ;;\n\t    *)\n\t      # And here is the reason: when a library appears more\n\t      # than once as an explicit dependence of a library, or\n\t      # is implicitly linked in more than once by the\n\t      # compiler, it is considered special, and multiple\n\t      # occurrences thereof are not removed.  Compare this\n\t      # with having the same library being listed as a\n\t      # dependency of multiple other libraries: in this case,\n\t      # we know (pedantically, we assume) the library does not\n\t      # need to be listed more than once, so we keep only the\n\t      # last copy.  
This is not always right, but it is rare\n\t      # enough that we require users that really mean to play\n\t      # such unportable linking tricks to link the library\n\t      # using -Wl,-lname, so that libtool does not consider it\n\t      # for duplicate removal.\n\t      case \" $specialdeplibs \" in\n\t      *\" $deplib \"*) new_libs=\"$deplib $new_libs\" ;;\n\t      *)\n\t\tcase \" $new_libs \" in\n\t\t*\" $deplib \"*) ;;\n\t\t*) new_libs=\"$deplib $new_libs\" ;;\n\t\tesac\n\t\t;;\n\t      esac\n\t      ;;\n\t    esac\n\t  done\n\t  tmp_libs=\n\t  for deplib in $new_libs; do\n\t    case $deplib in\n\t    -L*)\n\t      case \" $tmp_libs \" in\n\t      *\" $deplib \"*) ;;\n\t      *) func_append tmp_libs \" $deplib\" ;;\n\t      esac\n\t      ;;\n\t    *) func_append tmp_libs \" $deplib\" ;;\n\t    esac\n\t  done\n\t  eval $var=\\\"$tmp_libs\\\"\n\tdone # for var\n      fi\n\n      # Add Sun CC postdeps if required:\n      test CXX = \"$tagname\" && {\n        case $host_os in\n        linux*)\n          case `$CC -V 2>&1 | sed 5q` in\n          *Sun\\ C*) # Sun C++ 5.9\n            func_suncc_cstd_abi\n\n            if test no != \"$suncc_use_cstd_abi\"; then\n              func_append postdeps ' -library=Cstd -library=Crun'\n            fi\n            ;;\n          esac\n          ;;\n\n        solaris*)\n          func_cc_basename \"$CC\"\n          case $func_cc_basename_result in\n          CC* | sunCC*)\n            func_suncc_cstd_abi\n\n            if test no != \"$suncc_use_cstd_abi\"; then\n              func_append postdeps ' -library=Cstd -library=Crun'\n            fi\n            ;;\n          esac\n          ;;\n        esac\n      }\n\n      # Last step: remove runtime libs from dependency_libs\n      # (they stay in deplibs)\n      tmp_libs=\n      for i in $dependency_libs; do\n\tcase \" $predeps $postdeps $compiler_lib_search_path \" in\n\t*\" $i \"*)\n\t  i=\n\t  ;;\n\tesac\n\tif test -n \"$i\"; then\n\t  func_append tmp_libs \" 
$i\"\n\tfi\n      done\n      dependency_libs=$tmp_libs\n    done # for pass\n    if test prog = \"$linkmode\"; then\n      dlfiles=$newdlfiles\n    fi\n    if test prog = \"$linkmode\" || test lib = \"$linkmode\"; then\n      dlprefiles=$newdlprefiles\n    fi\n\n    case $linkmode in\n    oldlib)\n      if test -n \"$dlfiles$dlprefiles\" || test no != \"$dlself\"; then\n\tfunc_warning \"'-dlopen' is ignored for archives\"\n      fi\n\n      case \" $deplibs\" in\n      *\\ -l* | *\\ -L*)\n\tfunc_warning \"'-l' and '-L' are ignored for archives\" ;;\n      esac\n\n      test -n \"$rpath\" && \\\n\tfunc_warning \"'-rpath' is ignored for archives\"\n\n      test -n \"$xrpath\" && \\\n\tfunc_warning \"'-R' is ignored for archives\"\n\n      test -n \"$vinfo\" && \\\n\tfunc_warning \"'-version-info/-version-number' is ignored for archives\"\n\n      test -n \"$release\" && \\\n\tfunc_warning \"'-release' is ignored for archives\"\n\n      test -n \"$export_symbols$export_symbols_regex\" && \\\n\tfunc_warning \"'-export-symbols' is ignored for archives\"\n\n      # Now set the variables for building old libraries.\n      build_libtool_libs=no\n      oldlibs=$output\n      func_append objs \"$old_deplibs\"\n      ;;\n\n    lib)\n      # Make sure we only generate libraries of the form 'libNAME.la'.\n      case $outputname in\n      lib*)\n\tfunc_stripname 'lib' '.la' \"$outputname\"\n\tname=$func_stripname_result\n\teval shared_ext=\\\"$shrext_cmds\\\"\n\teval libname=\\\"$libname_spec\\\"\n\t;;\n      *)\n\ttest no = \"$module\" \\\n\t  && func_fatal_help \"libtool library '$output' must begin with 'lib'\"\n\n\tif test no != \"$need_lib_prefix\"; then\n\t  # Add the \"lib\" prefix for modules if required\n\t  func_stripname '' '.la' \"$outputname\"\n\t  name=$func_stripname_result\n\t  eval shared_ext=\\\"$shrext_cmds\\\"\n\t  eval libname=\\\"$libname_spec\\\"\n\telse\n\t  func_stripname '' '.la' \"$outputname\"\n\t  libname=$func_stripname_result\n\tfi\n\t;;\n      
esac\n\n      if test -n \"$objs\"; then\n\tif test pass_all != \"$deplibs_check_method\"; then\n\t  func_fatal_error \"cannot build libtool library '$output' from non-libtool objects on this host:$objs\"\n\telse\n\t  echo\n\t  $ECHO \"*** Warning: Linking the shared library $output against the non-libtool\"\n\t  $ECHO \"*** objects $objs is not portable!\"\n\t  func_append libobjs \" $objs\"\n\tfi\n      fi\n\n      test no = \"$dlself\" \\\n\t|| func_warning \"'-dlopen self' is ignored for libtool libraries\"\n\n      set dummy $rpath\n      shift\n      test 1 -lt \"$#\" \\\n\t&& func_warning \"ignoring multiple '-rpath's for a libtool library\"\n\n      install_libdir=$1\n\n      oldlibs=\n      if test -z \"$rpath\"; then\n\tif test yes = \"$build_libtool_libs\"; then\n\t  # Building a libtool convenience library.\n\t  # Some compilers have problems with a '.al' extension so\n\t  # convenience libraries should have the same extension an\n\t  # archive normally would.\n\t  oldlibs=\"$output_objdir/$libname.$libext $oldlibs\"\n\t  build_libtool_libs=convenience\n\t  build_old_libs=yes\n\tfi\n\n\ttest -n \"$vinfo\" && \\\n\t  func_warning \"'-version-info/-version-number' is ignored for convenience libraries\"\n\n\ttest -n \"$release\" && \\\n\t  func_warning \"'-release' is ignored for convenience libraries\"\n      else\n\n\t# Parse the version information argument.\n\tsave_ifs=$IFS; IFS=:\n\tset dummy $vinfo 0 0 0\n\tshift\n\tIFS=$save_ifs\n\n\ttest -n \"$7\" && \\\n\t  func_fatal_help \"too many parameters to '-version-info'\"\n\n\t# convert absolute version numbers to libtool ages\n\t# this retains compatibility with .la files and attempts\n\t# to make the code below a bit more comprehensible\n\n\tcase $vinfo_number in\n\tyes)\n\t  number_major=$1\n\t  number_minor=$2\n\t  number_revision=$3\n\t  #\n\t  # There are really only two kinds -- those that\n\t  # use the current revision as the major version\n\t  # and those that subtract age and use age as\n\t  # 
a minor version.  But, then there is irix\n\t  # that has an extra 1 added just for fun\n\t  #\n\t  case $version_type in\n\t  # correct linux to gnu/linux during the next big refactor\n\t  darwin|freebsd-elf|linux|osf|windows|none)\n\t    func_arith $number_major + $number_minor\n\t    current=$func_arith_result\n\t    age=$number_minor\n\t    revision=$number_revision\n\t    ;;\n\t  freebsd-aout|qnx|sunos)\n\t    current=$number_major\n\t    revision=$number_minor\n\t    age=0\n\t    ;;\n\t  irix|nonstopux)\n\t    func_arith $number_major + $number_minor\n\t    current=$func_arith_result\n\t    age=$number_minor\n\t    revision=$number_minor\n\t    lt_irix_increment=no\n\t    ;;\n\t  *)\n\t    func_fatal_configuration \"$modename: unknown library version type '$version_type'\"\n\t    ;;\n\t  esac\n\t  ;;\n\tno)\n\t  current=$1\n\t  revision=$2\n\t  age=$3\n\t  ;;\n\tesac\n\n\t# Check that each of the things are valid numbers.\n\tcase $current in\n\t0|[1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9][0-9]) ;;\n\t*)\n\t  func_error \"CURRENT '$current' must be a nonnegative integer\"\n\t  func_fatal_error \"'$vinfo' is not valid version information\"\n\t  ;;\n\tesac\n\n\tcase $revision in\n\t0|[1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9][0-9]) ;;\n\t*)\n\t  func_error \"REVISION '$revision' must be a nonnegative integer\"\n\t  func_fatal_error \"'$vinfo' is not valid version information\"\n\t  ;;\n\tesac\n\n\tcase $age in\n\t0|[1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9][0-9]) ;;\n\t*)\n\t  func_error \"AGE '$age' must be a nonnegative integer\"\n\t  func_fatal_error \"'$vinfo' is not valid version information\"\n\t  ;;\n\tesac\n\n\tif test \"$age\" -gt \"$current\"; then\n\t  func_error \"AGE '$age' is greater than the current interface number '$current'\"\n\t  func_fatal_error \"'$vinfo' is not valid version information\"\n\tfi\n\n\t# Calculate the version 
variables.\n\tmajor=\n\tversuffix=\n\tverstring=\n\tcase $version_type in\n\tnone) ;;\n\n\tdarwin)\n\t  # Like Linux, but with the current version available in\n\t  # verstring for coding it into the library header\n\t  func_arith $current - $age\n\t  major=.$func_arith_result\n\t  versuffix=$major.$age.$revision\n\t  # Darwin ld doesn't like 0 for these options...\n\t  func_arith $current + 1\n\t  minor_current=$func_arith_result\n\t  xlcverstring=\"$wl-compatibility_version $wl$minor_current $wl-current_version $wl$minor_current.$revision\"\n\t  verstring=\"-compatibility_version $minor_current -current_version $minor_current.$revision\"\n          # On Darwin other compilers\n          case $CC in\n              nagfor*)\n                  verstring=\"$wl-compatibility_version $wl$minor_current $wl-current_version $wl$minor_current.$revision\"\n                  ;;\n              *)\n                  verstring=\"-compatibility_version $minor_current -current_version $minor_current.$revision\"\n                  ;;\n          esac\n\t  ;;\n\n\tfreebsd-aout)\n\t  major=.$current\n\t  versuffix=.$current.$revision\n\t  ;;\n\n\tfreebsd-elf)\n\t  func_arith $current - $age\n\t  major=.$func_arith_result\n\t  versuffix=$major.$age.$revision\n\t  ;;\n\n\tirix | nonstopux)\n\t  if test no = \"$lt_irix_increment\"; then\n\t    func_arith $current - $age\n\t  else\n\t    func_arith $current - $age + 1\n\t  fi\n\t  major=$func_arith_result\n\n\t  case $version_type in\n\t    nonstopux) verstring_prefix=nonstopux ;;\n\t    *)         verstring_prefix=sgi ;;\n\t  esac\n\t  verstring=$verstring_prefix$major.$revision\n\n\t  # Add in all the interfaces that we are compatible with.\n\t  loop=$revision\n\t  while test 0 -ne \"$loop\"; do\n\t    func_arith $revision - $loop\n\t    iface=$func_arith_result\n\t    func_arith $loop - 1\n\t    loop=$func_arith_result\n\t    verstring=$verstring_prefix$major.$iface:$verstring\n\t  done\n\n\t  # Before this point, $major must not 
contain '.'.\n\t  major=.$major\n\t  versuffix=$major.$revision\n\t  ;;\n\n\tlinux) # correct to gnu/linux during the next big refactor\n\t  func_arith $current - $age\n\t  major=.$func_arith_result\n\t  versuffix=$major.$age.$revision\n\t  ;;\n\n\tosf)\n\t  func_arith $current - $age\n\t  major=.$func_arith_result\n\t  versuffix=.$current.$age.$revision\n\t  verstring=$current.$age.$revision\n\n\t  # Add in all the interfaces that we are compatible with.\n\t  loop=$age\n\t  while test 0 -ne \"$loop\"; do\n\t    func_arith $current - $loop\n\t    iface=$func_arith_result\n\t    func_arith $loop - 1\n\t    loop=$func_arith_result\n\t    verstring=$verstring:$iface.0\n\t  done\n\n\t  # Make executables depend on our current version.\n\t  func_append verstring \":$current.0\"\n\t  ;;\n\n\tqnx)\n\t  major=.$current\n\t  versuffix=.$current\n\t  ;;\n\n\tsco)\n\t  major=.$current\n\t  versuffix=.$current\n\t  ;;\n\n\tsunos)\n\t  major=.$current\n\t  versuffix=.$current.$revision\n\t  ;;\n\n\twindows)\n\t  # Use '-' rather than '.', since we only want one\n\t  # extension on DOS 8.3 file systems.\n\t  func_arith $current - $age\n\t  major=$func_arith_result\n\t  versuffix=-$major\n\t  ;;\n\n\t*)\n\t  func_fatal_configuration \"unknown library version type '$version_type'\"\n\t  ;;\n\tesac\n\n\t# Clear the version info if we defaulted, and they specified a release.\n\tif test -z \"$vinfo\" && test -n \"$release\"; then\n\t  major=\n\t  case $version_type in\n\t  darwin)\n\t    # we can't check for \"0.0\" in archive_cmds due to quoting\n\t    # problems, so we reset it completely\n\t    verstring=\n\t    ;;\n\t  *)\n\t    verstring=0.0\n\t    ;;\n\t  esac\n\t  if test no = \"$need_version\"; then\n\t    versuffix=\n\t  else\n\t    versuffix=.0.0\n\t  fi\n\tfi\n\n\t# Remove version info from name if versioning should be avoided\n\tif test yes,no = \"$avoid_version,$need_version\"; then\n\t  major=\n\t  versuffix=\n\t  verstring=\n\tfi\n\n\t# Check to see if the archive will 
have undefined symbols.\n\tif test yes = \"$allow_undefined\"; then\n\t  if test unsupported = \"$allow_undefined_flag\"; then\n\t    if test yes = \"$build_old_libs\"; then\n\t      func_warning \"undefined symbols not allowed in $host shared libraries; building static only\"\n\t      build_libtool_libs=no\n\t    else\n\t      func_fatal_error \"can't build $host shared library unless -no-undefined is specified\"\n\t    fi\n\t  fi\n\telse\n\t  # Don't allow undefined symbols.\n\t  allow_undefined_flag=$no_undefined_flag\n\tfi\n\n      fi\n\n      func_generate_dlsyms \"$libname\" \"$libname\" :\n      func_append libobjs \" $symfileobj\"\n      test \" \" = \"$libobjs\" && libobjs=\n\n      if test relink != \"$opt_mode\"; then\n\t# Remove our outputs, but don't remove object files since they\n\t# may have been created when compiling PIC objects.\n\tremovelist=\n\ttempremovelist=`$ECHO \"$output_objdir/*\"`\n\tfor p in $tempremovelist; do\n\t  case $p in\n\t    *.$objext | *.gcno)\n\t       ;;\n\t    $output_objdir/$outputname | $output_objdir/$libname.* | $output_objdir/$libname$release.*)\n\t       if test -n \"$precious_files_regex\"; then\n\t\t if $ECHO \"$p\" | $EGREP -e \"$precious_files_regex\" >/dev/null 2>&1\n\t\t then\n\t\t   continue\n\t\t fi\n\t       fi\n\t       func_append removelist \" $p\"\n\t       ;;\n\t    *) ;;\n\t  esac\n\tdone\n\ttest -n \"$removelist\" && \\\n\t  func_show_eval \"${RM}r \\$removelist\"\n      fi\n\n      # Now set the variables for building old libraries.\n      if test yes = \"$build_old_libs\" && test convenience != \"$build_libtool_libs\"; then\n\tfunc_append oldlibs \" $output_objdir/$libname.$libext\"\n\n\t# Transform .lo files to .o files.\n\toldobjs=\"$objs \"`$ECHO \"$libobjs\" | $SP2NL | $SED \"/\\.$libext$/d; $lo2o\" | $NL2SP`\n      fi\n\n      # Eliminate all temporary directories.\n      #for path in $notinst_path; do\n      #\tlib_search_path=`$ECHO \"$lib_search_path \" | $SED \"s% $path % %g\"`\n      
#\tdeplibs=`$ECHO \"$deplibs \" | $SED \"s% -L$path % %g\"`\n      #\tdependency_libs=`$ECHO \"$dependency_libs \" | $SED \"s% -L$path % %g\"`\n      #done\n\n      if test -n \"$xrpath\"; then\n\t# If the user specified any rpath flags, then add them.\n\ttemp_xrpath=\n\tfor libdir in $xrpath; do\n\t  func_replace_sysroot \"$libdir\"\n\t  func_append temp_xrpath \" -R$func_replace_sysroot_result\"\n\t  case \"$finalize_rpath \" in\n\t  *\" $libdir \"*) ;;\n\t  *) func_append finalize_rpath \" $libdir\" ;;\n\t  esac\n\tdone\n\tif test yes != \"$hardcode_into_libs\" || test yes = \"$build_old_libs\"; then\n\t  dependency_libs=\"$temp_xrpath $dependency_libs\"\n\tfi\n      fi\n\n      # Make sure dlfiles contains only unique files that won't be dlpreopened\n      old_dlfiles=$dlfiles\n      dlfiles=\n      for lib in $old_dlfiles; do\n\tcase \" $dlprefiles $dlfiles \" in\n\t*\" $lib \"*) ;;\n\t*) func_append dlfiles \" $lib\" ;;\n\tesac\n      done\n\n      # Make sure dlprefiles contains only unique files\n      old_dlprefiles=$dlprefiles\n      dlprefiles=\n      for lib in $old_dlprefiles; do\n\tcase \"$dlprefiles \" in\n\t*\" $lib \"*) ;;\n\t*) func_append dlprefiles \" $lib\" ;;\n\tesac\n      done\n\n      if test yes = \"$build_libtool_libs\"; then\n\tif test -n \"$rpath\"; then\n\t  case $host in\n\t  *-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2* | *-*-beos* | *-cegcc* | *-*-haiku*)\n\t    # these systems don't actually have a c library (as such)!\n\t    ;;\n\t  *-*-rhapsody* | *-*-darwin1.[012])\n\t    # Rhapsody C library is in the System framework\n\t    func_append deplibs \" System.ltframework\"\n\t    ;;\n\t  *-*-netbsd*)\n\t    # Don't link with libc until the a.out ld.so is fixed.\n\t    ;;\n\t  *-*-openbsd* | *-*-freebsd* | *-*-dragonfly*)\n\t    # Do not include libc due to us having libc/libc_r.\n\t    ;;\n\t  *-*-sco3.2v5* | *-*-sco5v6*)\n\t    # Causes problems with __ctype\n\t    ;;\n\t  *-*-sysv4.2uw2* | *-*-sysv5* | *-*-unixware* | 
*-*-OpenUNIX*)\n\t    # Compiler inserts libc in the correct place for threads to work\n\t    ;;\n\t  *)\n\t    # Add libc to deplibs on all other systems if necessary.\n\t    if test yes = \"$build_libtool_need_lc\"; then\n\t      func_append deplibs \" -lc\"\n\t    fi\n\t    ;;\n\t  esac\n\tfi\n\n\t# Transform deplibs into only deplibs that can be linked in shared.\n\tname_save=$name\n\tlibname_save=$libname\n\trelease_save=$release\n\tversuffix_save=$versuffix\n\tmajor_save=$major\n\t# I'm not sure if I'm treating the release correctly.  I think\n\t# release should show up in the -l (ie -lgmp5) so we don't want to\n\t# add it in twice.  Is that correct?\n\trelease=\n\tversuffix=\n\tmajor=\n\tnewdeplibs=\n\tdroppeddeps=no\n\tcase $deplibs_check_method in\n\tpass_all)\n\t  # Don't check for shared/static.  Everything works.\n\t  # This might be a little naive.  We might want to check\n\t  # whether the library exists or not.  But this is on\n\t  # osf3 & osf4 and I'm not really sure... Just\n\t  # implementing what was already the behavior.\n\t  newdeplibs=$deplibs\n\t  ;;\n\ttest_compile)\n\t  # This code stresses the \"libraries are programs\" paradigm to its\n\t  # limits. Maybe even breaks it.  We compile a program, linking it\n\t  # against the deplibs as a proxy for the library.  
Then we can check\n\t  # whether they linked in statically or dynamically with ldd.\n\t  $opt_dry_run || $RM conftest.c\n\t  cat > conftest.c <<EOF\n\t  int main() { return 0; }\nEOF\n\t  $opt_dry_run || $RM conftest\n\t  if $LTCC $LTCFLAGS -o conftest conftest.c $deplibs; then\n\t    ldd_output=`ldd conftest`\n\t    for i in $deplibs; do\n\t      case $i in\n\t      -l*)\n\t\tfunc_stripname -l '' \"$i\"\n\t\tname=$func_stripname_result\n\t\tif test yes = \"$allow_libtool_libs_with_static_runtimes\"; then\n\t\t  case \" $predeps $postdeps \" in\n\t\t  *\" $i \"*)\n\t\t    func_append newdeplibs \" $i\"\n\t\t    i=\n\t\t    ;;\n\t\t  esac\n\t\tfi\n\t\tif test -n \"$i\"; then\n\t\t  libname=`eval \"\\\\$ECHO \\\"$libname_spec\\\"\"`\n\t\t  deplib_matches=`eval \"\\\\$ECHO \\\"$library_names_spec\\\"\"`\n\t\t  set dummy $deplib_matches; shift\n\t\t  deplib_match=$1\n\t\t  if test `expr \"$ldd_output\" : \".*$deplib_match\"` -ne 0; then\n\t\t    func_append newdeplibs \" $i\"\n\t\t  else\n\t\t    droppeddeps=yes\n\t\t    echo\n\t\t    $ECHO \"*** Warning: dynamic linker does not accept needed library $i.\"\n\t\t    echo \"*** I have the capability to make that library automatically link in when\"\n\t\t    echo \"*** you link to this library.  But I can only do this if you have a\"\n\t\t    echo \"*** shared version of the library, which I believe you do not have\"\n\t\t    echo \"*** because a test_compile did reveal that the linker did not use it for\"\n\t\t    echo \"*** its dynamic dependency list that programs get resolved with at runtime.\"\n\t\t  fi\n\t\tfi\n\t\t;;\n\t      *)\n\t\tfunc_append newdeplibs \" $i\"\n\t\t;;\n\t      esac\n\t    done\n\t  else\n\t    # Error occurred in the first compile.  
Let's try to salvage\n\t    # the situation: Compile a separate program for each library.\n\t    for i in $deplibs; do\n\t      case $i in\n\t      -l*)\n\t\tfunc_stripname -l '' \"$i\"\n\t\tname=$func_stripname_result\n\t\t$opt_dry_run || $RM conftest\n\t\tif $LTCC $LTCFLAGS -o conftest conftest.c $i; then\n\t\t  ldd_output=`ldd conftest`\n\t\t  if test yes = \"$allow_libtool_libs_with_static_runtimes\"; then\n\t\t    case \" $predeps $postdeps \" in\n\t\t    *\" $i \"*)\n\t\t      func_append newdeplibs \" $i\"\n\t\t      i=\n\t\t      ;;\n\t\t    esac\n\t\t  fi\n\t\t  if test -n \"$i\"; then\n\t\t    libname=`eval \"\\\\$ECHO \\\"$libname_spec\\\"\"`\n\t\t    deplib_matches=`eval \"\\\\$ECHO \\\"$library_names_spec\\\"\"`\n\t\t    set dummy $deplib_matches; shift\n\t\t    deplib_match=$1\n\t\t    if test `expr \"$ldd_output\" : \".*$deplib_match\"` -ne 0; then\n\t\t      func_append newdeplibs \" $i\"\n\t\t    else\n\t\t      droppeddeps=yes\n\t\t      echo\n\t\t      $ECHO \"*** Warning: dynamic linker does not accept needed library $i.\"\n\t\t      echo \"*** I have the capability to make that library automatically link in when\"\n\t\t      echo \"*** you link to this library.  But I can only do this if you have a\"\n\t\t      echo \"*** shared version of the library, which you do not appear to have\"\n\t\t      echo \"*** because a test_compile did reveal that the linker did not use this one\"\n\t\t      echo \"*** as a dynamic dependency that programs can get resolved with at runtime.\"\n\t\t    fi\n\t\t  fi\n\t\telse\n\t\t  droppeddeps=yes\n\t\t  echo\n\t\t  $ECHO \"*** Warning!  Library $i is needed by this library but I was not able to\"\n\t\t  echo \"*** make it link in!  You will probably need to install it or some\"\n\t\t  echo \"*** library that it depends on before this library will be fully\"\n\t\t  echo \"*** functional.  
Installing it before continuing would be even better.\"\n\t\tfi\n\t\t;;\n\t      *)\n\t\tfunc_append newdeplibs \" $i\"\n\t\t;;\n\t      esac\n\t    done\n\t  fi\n\t  ;;\n\tfile_magic*)\n\t  set dummy $deplibs_check_method; shift\n\t  file_magic_regex=`expr \"$deplibs_check_method\" : \"$1 \\(.*\\)\"`\n\t  for a_deplib in $deplibs; do\n\t    case $a_deplib in\n\t    -l*)\n\t      func_stripname -l '' \"$a_deplib\"\n\t      name=$func_stripname_result\n\t      if test yes = \"$allow_libtool_libs_with_static_runtimes\"; then\n\t\tcase \" $predeps $postdeps \" in\n\t\t*\" $a_deplib \"*)\n\t\t  func_append newdeplibs \" $a_deplib\"\n\t\t  a_deplib=\n\t\t  ;;\n\t\tesac\n\t      fi\n\t      if test -n \"$a_deplib\"; then\n\t\tlibname=`eval \"\\\\$ECHO \\\"$libname_spec\\\"\"`\n\t\tif test -n \"$file_magic_glob\"; then\n\t\t  libnameglob=`func_echo_all \"$libname\" | $SED -e $file_magic_glob`\n\t\telse\n\t\t  libnameglob=$libname\n\t\tfi\n\t\ttest yes = \"$want_nocaseglob\" && nocaseglob=`shopt -p nocaseglob`\n\t\tfor i in $lib_search_path $sys_lib_search_path $shlib_search_path; do\n\t\t  if test yes = \"$want_nocaseglob\"; then\n\t\t    shopt -s nocaseglob\n\t\t    potential_libs=`ls $i/$libnameglob[.-]* 2>/dev/null`\n\t\t    $nocaseglob\n\t\t  else\n\t\t    potential_libs=`ls $i/$libnameglob[.-]* 2>/dev/null`\n\t\t  fi\n\t\t  for potent_lib in $potential_libs; do\n\t\t      # Follow soft links.\n\t\t      if ls -lLd \"$potent_lib\" 2>/dev/null |\n\t\t\t $GREP \" -> \" >/dev/null; then\n\t\t\tcontinue\n\t\t      fi\n\t\t      # The statement above tries to avoid entering an\n\t\t      # endless loop below, in case of cyclic links.\n\t\t      # We might still enter an endless loop, since a link\n\t\t      # loop can be closed while we follow links,\n\t\t      # but so what?\n\t\t      potlib=$potent_lib\n\t\t      while test -h \"$potlib\" 2>/dev/null; do\n\t\t\tpotliblink=`ls -ld $potlib | $SED 's/.* -> //'`\n\t\t\tcase $potliblink in\n\t\t\t[\\\\/]* | 
[A-Za-z]:[\\\\/]*) potlib=$potliblink;;\n\t\t\t*) potlib=`$ECHO \"$potlib\" | $SED 's|[^/]*$||'`\"$potliblink\";;\n\t\t\tesac\n\t\t      done\n\t\t      if eval $file_magic_cmd \\\"\\$potlib\\\" 2>/dev/null |\n\t\t\t $SED -e 10q |\n\t\t\t $EGREP \"$file_magic_regex\" > /dev/null; then\n\t\t\tfunc_append newdeplibs \" $a_deplib\"\n\t\t\ta_deplib=\n\t\t\tbreak 2\n\t\t      fi\n\t\t  done\n\t\tdone\n\t      fi\n\t      if test -n \"$a_deplib\"; then\n\t\tdroppeddeps=yes\n\t\techo\n\t\t$ECHO \"*** Warning: linker path does not have real file for library $a_deplib.\"\n\t\techo \"*** I have the capability to make that library automatically link in when\"\n\t\techo \"*** you link to this library.  But I can only do this if you have a\"\n\t\techo \"*** shared version of the library, which you do not appear to have\"\n\t\techo \"*** because I did check the linker path looking for a file starting\"\n\t\tif test -z \"$potlib\"; then\n\t\t  $ECHO \"*** with $libname but no candidates were found. (...for file magic test)\"\n\t\telse\n\t\t  $ECHO \"*** with $libname and none of the candidates passed a file format test\"\n\t\t  $ECHO \"*** using a file magic. 
Last file checked: $potlib\"\n\t\tfi\n\t      fi\n\t      ;;\n\t    *)\n\t      # Add a -L argument.\n\t      func_append newdeplibs \" $a_deplib\"\n\t      ;;\n\t    esac\n\t  done # Gone through all deplibs.\n\t  ;;\n\tmatch_pattern*)\n\t  set dummy $deplibs_check_method; shift\n\t  match_pattern_regex=`expr \"$deplibs_check_method\" : \"$1 \\(.*\\)\"`\n\t  for a_deplib in $deplibs; do\n\t    case $a_deplib in\n\t    -l*)\n\t      func_stripname -l '' \"$a_deplib\"\n\t      name=$func_stripname_result\n\t      if test yes = \"$allow_libtool_libs_with_static_runtimes\"; then\n\t\tcase \" $predeps $postdeps \" in\n\t\t*\" $a_deplib \"*)\n\t\t  func_append newdeplibs \" $a_deplib\"\n\t\t  a_deplib=\n\t\t  ;;\n\t\tesac\n\t      fi\n\t      if test -n \"$a_deplib\"; then\n\t\tlibname=`eval \"\\\\$ECHO \\\"$libname_spec\\\"\"`\n\t\tfor i in $lib_search_path $sys_lib_search_path $shlib_search_path; do\n\t\t  potential_libs=`ls $i/$libname[.-]* 2>/dev/null`\n\t\t  for potent_lib in $potential_libs; do\n\t\t    potlib=$potent_lib # see symlink-check above in file_magic test\n\t\t    if eval \"\\$ECHO \\\"$potent_lib\\\"\" 2>/dev/null | $SED 10q | \\\n\t\t       $EGREP \"$match_pattern_regex\" > /dev/null; then\n\t\t      func_append newdeplibs \" $a_deplib\"\n\t\t      a_deplib=\n\t\t      break 2\n\t\t    fi\n\t\t  done\n\t\tdone\n\t      fi\n\t      if test -n \"$a_deplib\"; then\n\t\tdroppeddeps=yes\n\t\techo\n\t\t$ECHO \"*** Warning: linker path does not have real file for library $a_deplib.\"\n\t\techo \"*** I have the capability to make that library automatically link in when\"\n\t\techo \"*** you link to this library.  But I can only do this if you have a\"\n\t\techo \"*** shared version of the library, which you do not appear to have\"\n\t\techo \"*** because I did check the linker path looking for a file starting\"\n\t\tif test -z \"$potlib\"; then\n\t\t  $ECHO \"*** with $libname but no candidates were found. 
(...for regex pattern test)\"\n\t\telse\n\t\t  $ECHO \"*** with $libname and none of the candidates passed a file format test\"\n\t\t  $ECHO \"*** using a regex pattern. Last file checked: $potlib\"\n\t\tfi\n\t      fi\n\t      ;;\n\t    *)\n\t      # Add a -L argument.\n\t      func_append newdeplibs \" $a_deplib\"\n\t      ;;\n\t    esac\n\t  done # Gone through all deplibs.\n\t  ;;\n\tnone | unknown | *)\n\t  newdeplibs=\n\t  tmp_deplibs=`$ECHO \" $deplibs\" | $SED 's/ -lc$//; s/ -[LR][^ ]*//g'`\n\t  if test yes = \"$allow_libtool_libs_with_static_runtimes\"; then\n\t    for i in $predeps $postdeps; do\n\t      # can't use Xsed below, because $i might contain '/'\n\t      tmp_deplibs=`$ECHO \" $tmp_deplibs\" | $SED \"s|$i||\"`\n\t    done\n\t  fi\n\t  case $tmp_deplibs in\n\t  *[!\\\t\\ ]*)\n\t    echo\n\t    if test none = \"$deplibs_check_method\"; then\n\t      echo \"*** Warning: inter-library dependencies are not supported in this platform.\"\n\t    else\n\t      echo \"*** Warning: inter-library dependencies are not known to be supported.\"\n\t    fi\n\t    echo \"*** All declared inter-library dependencies are being dropped.\"\n\t    droppeddeps=yes\n\t    ;;\n\t  esac\n\t  ;;\n\tesac\n\tversuffix=$versuffix_save\n\tmajor=$major_save\n\trelease=$release_save\n\tlibname=$libname_save\n\tname=$name_save\n\n\tcase $host in\n\t*-*-rhapsody* | *-*-darwin1.[012])\n\t  # On Rhapsody replace the C library with the System framework\n\t  newdeplibs=`$ECHO \" $newdeplibs\" | $SED 's/ -lc / System.ltframework /'`\n\t  ;;\n\tesac\n\n\tif test yes = \"$droppeddeps\"; then\n\t  if test yes = \"$module\"; then\n\t    echo\n\t    echo \"*** Warning: libtool could not satisfy all declared inter-library\"\n\t    $ECHO \"*** dependencies of module $libname.  
Therefore, libtool will create\"\n\t    echo \"*** a static module, that should work as long as the dlopening\"\n\t    echo \"*** application is linked with the -dlopen flag.\"\n\t    if test -z \"$global_symbol_pipe\"; then\n\t      echo\n\t      echo \"*** However, this would only work if libtool was able to extract symbol\"\n\t      echo \"*** lists from a program, using 'nm' or equivalent, but libtool could\"\n\t      echo \"*** not find such a program.  So, this module is probably useless.\"\n\t      echo \"*** 'nm' from GNU binutils and a full rebuild may help.\"\n\t    fi\n\t    if test no = \"$build_old_libs\"; then\n\t      oldlibs=$output_objdir/$libname.$libext\n\t      build_libtool_libs=module\n\t      build_old_libs=yes\n\t    else\n\t      build_libtool_libs=no\n\t    fi\n\t  else\n\t    echo \"*** The inter-library dependencies that have been dropped here will be\"\n\t    echo \"*** automatically added whenever a program is linked with this library\"\n\t    echo \"*** or is declared to -dlopen it.\"\n\n\t    if test no = \"$allow_undefined\"; then\n\t      echo\n\t      echo \"*** Since this library must not contain undefined symbols,\"\n\t      echo \"*** because either the platform does not support them or\"\n\t      echo \"*** it was explicitly requested with -no-undefined,\"\n\t      echo \"*** libtool will only create a static version of it.\"\n\t      if test no = \"$build_old_libs\"; then\n\t\toldlibs=$output_objdir/$libname.$libext\n\t\tbuild_libtool_libs=module\n\t\tbuild_old_libs=yes\n\t      else\n\t\tbuild_libtool_libs=no\n\t      fi\n\t    fi\n\t  fi\n\tfi\n\t# Done checking deplibs!\n\tdeplibs=$newdeplibs\n      fi\n      # Time to change all our \"foo.ltframework\" stuff back to \"-framework foo\"\n      case $host in\n\t*-*-darwin*)\n\t  newdeplibs=`$ECHO \" $newdeplibs\" | $SED 's% \\([^ $]*\\).ltframework% -framework \\1%g'`\n\t  new_inherited_linker_flags=`$ECHO \" $new_inherited_linker_flags\" | $SED 's% \\([^ $]*\\).ltframework% 
-framework \\1%g'`\n\t  deplibs=`$ECHO \" $deplibs\" | $SED 's% \\([^ $]*\\).ltframework% -framework \\1%g'`\n\t  ;;\n      esac\n\n      # move library search paths that coincide with paths to not yet\n      # installed libraries to the beginning of the library search list\n      new_libs=\n      for path in $notinst_path; do\n\tcase \" $new_libs \" in\n\t*\" -L$path/$objdir \"*) ;;\n\t*)\n\t  case \" $deplibs \" in\n\t  *\" -L$path/$objdir \"*)\n\t    func_append new_libs \" -L$path/$objdir\" ;;\n\t  esac\n\t  ;;\n\tesac\n      done\n      for deplib in $deplibs; do\n\tcase $deplib in\n\t-L*)\n\t  case \" $new_libs \" in\n\t  *\" $deplib \"*) ;;\n\t  *) func_append new_libs \" $deplib\" ;;\n\t  esac\n\t  ;;\n\t*) func_append new_libs \" $deplib\" ;;\n\tesac\n      done\n      deplibs=$new_libs\n\n      # All the library-specific variables (install_libdir is set above).\n      library_names=\n      old_library=\n      dlname=\n\n      # Test again, we may have decided not to build it any more\n      if test yes = \"$build_libtool_libs\"; then\n\t# Remove $wl instances when linking with ld.\n\t# FIXME: should test the right _cmds variable.\n\tcase $archive_cmds in\n\t  *\\$LD\\ *) wl= ;;\n        esac\n\tif test yes = \"$hardcode_into_libs\"; then\n\t  # Hardcode the library paths\n\t  hardcode_libdirs=\n\t  dep_rpath=\n\t  rpath=$finalize_rpath\n\t  test relink = \"$opt_mode\" || rpath=$compile_rpath$rpath\n\t  for libdir in $rpath; do\n\t    if test -n \"$hardcode_libdir_flag_spec\"; then\n\t      if test -n \"$hardcode_libdir_separator\"; then\n\t\tfunc_replace_sysroot \"$libdir\"\n\t\tlibdir=$func_replace_sysroot_result\n\t\tif test -z \"$hardcode_libdirs\"; then\n\t\t  hardcode_libdirs=$libdir\n\t\telse\n\t\t  # Just accumulate the unique libdirs.\n\t\t  case $hardcode_libdir_separator$hardcode_libdirs$hardcode_libdir_separator in\n\t\t  *\"$hardcode_libdir_separator$libdir$hardcode_libdir_separator\"*)\n\t\t    ;;\n\t\t  *)\n\t\t    func_append 
hardcode_libdirs \"$hardcode_libdir_separator$libdir\"\n\t\t    ;;\n\t\t  esac\n\t\tfi\n\t      else\n\t\teval flag=\\\"$hardcode_libdir_flag_spec\\\"\n\t\tfunc_append dep_rpath \" $flag\"\n\t      fi\n\t    elif test -n \"$runpath_var\"; then\n\t      case \"$perm_rpath \" in\n\t      *\" $libdir \"*) ;;\n\t      *) func_append perm_rpath \" $libdir\" ;;\n\t      esac\n\t    fi\n\t  done\n\t  # Substitute the hardcoded libdirs into the rpath.\n\t  if test -n \"$hardcode_libdir_separator\" &&\n\t     test -n \"$hardcode_libdirs\"; then\n\t    libdir=$hardcode_libdirs\n\t    eval \"dep_rpath=\\\"$hardcode_libdir_flag_spec\\\"\"\n\t  fi\n\t  if test -n \"$runpath_var\" && test -n \"$perm_rpath\"; then\n\t    # We should set the runpath_var.\n\t    rpath=\n\t    for dir in $perm_rpath; do\n\t      func_append rpath \"$dir:\"\n\t    done\n\t    eval \"$runpath_var='$rpath\\$$runpath_var'; export $runpath_var\"\n\t  fi\n\t  test -n \"$dep_rpath\" && deplibs=\"$dep_rpath $deplibs\"\n\tfi\n\n\tshlibpath=$finalize_shlibpath\n\ttest relink = \"$opt_mode\" || shlibpath=$compile_shlibpath$shlibpath\n\tif test -n \"$shlibpath\"; then\n\t  eval \"$shlibpath_var='$shlibpath\\$$shlibpath_var'; export $shlibpath_var\"\n\tfi\n\n\t# Get the real and link names of the library.\n\teval shared_ext=\\\"$shrext_cmds\\\"\n\teval library_names=\\\"$library_names_spec\\\"\n\tset dummy $library_names\n\tshift\n\trealname=$1\n\tshift\n\n\tif test -n \"$soname_spec\"; then\n\t  eval soname=\\\"$soname_spec\\\"\n\telse\n\t  soname=$realname\n\tfi\n\tif test -z \"$dlname\"; then\n\t  dlname=$soname\n\tfi\n\n\tlib=$output_objdir/$realname\n\tlinknames=\n\tfor link\n\tdo\n\t  func_append linknames \" $link\"\n\tdone\n\n\t# Use standard objects if they are pic\n\ttest -z \"$pic_flag\" && libobjs=`$ECHO \"$libobjs\" | $SP2NL | $SED \"$lo2o\" | $NL2SP`\n\ttest \"X$libobjs\" = \"X \" && libobjs=\n\n\tdelfiles=\n\tif test -n \"$export_symbols\" && test -n \"$include_expsyms\"; then\n\t  $opt_dry_run || 
cp \"$export_symbols\" \"$output_objdir/$libname.uexp\"\n\t  export_symbols=$output_objdir/$libname.uexp\n\t  func_append delfiles \" $export_symbols\"\n\tfi\n\n\torig_export_symbols=\n\tcase $host_os in\n\tcygwin* | mingw* | cegcc*)\n\t  if test -n \"$export_symbols\" && test -z \"$export_symbols_regex\"; then\n\t    # exporting using user supplied symfile\n\t    func_dll_def_p \"$export_symbols\" || {\n\t      # and it's NOT already a .def file. Must figure out\n\t      # which of the given symbols are data symbols and tag\n\t      # them as such. So, trigger use of export_symbols_cmds.\n\t      # export_symbols gets reassigned inside the \"prepare\n\t      # the list of exported symbols\" if statement, so the\n\t      # include_expsyms logic still works.\n\t      orig_export_symbols=$export_symbols\n\t      export_symbols=\n\t      always_export_symbols=yes\n\t    }\n\t  fi\n\t  ;;\n\tesac\n\n\t# Prepare the list of exported symbols\n\tif test -z \"$export_symbols\"; then\n\t  if test yes = \"$always_export_symbols\" || test -n \"$export_symbols_regex\"; then\n\t    func_verbose \"generating symbol list for '$libname.la'\"\n\t    export_symbols=$output_objdir/$libname.exp\n\t    $opt_dry_run || $RM $export_symbols\n\t    cmds=$export_symbols_cmds\n\t    save_ifs=$IFS; IFS='~'\n\t    for cmd1 in $cmds; do\n\t      IFS=$save_ifs\n\t      # Take the normal branch if the nm_file_list_spec branch\n\t      # doesn't work or if tool conversion is not needed.\n\t      case $nm_file_list_spec~$to_tool_file_cmd in\n\t\t*~func_convert_file_noop | *~func_convert_file_msys_to_w32 | ~*)\n\t\t  try_normal_branch=yes\n\t\t  eval cmd=\\\"$cmd1\\\"\n\t\t  func_len \" $cmd\"\n\t\t  len=$func_len_result\n\t\t  ;;\n\t\t*)\n\t\t  try_normal_branch=no\n\t\t  ;;\n\t      esac\n\t      if test yes = \"$try_normal_branch\" \\\n\t\t && { test \"$len\" -lt \"$max_cmd_len\" \\\n\t\t      || test \"$max_cmd_len\" -le -1; }\n\t      then\n\t\tfunc_show_eval \"$cmd\" 'exit 
$?'\n\t\tskipped_export=false\n\t      elif test -n \"$nm_file_list_spec\"; then\n\t\tfunc_basename \"$output\"\n\t\toutput_la=$func_basename_result\n\t\tsave_libobjs=$libobjs\n\t\tsave_output=$output\n\t\toutput=$output_objdir/$output_la.nm\n\t\tfunc_to_tool_file \"$output\"\n\t\tlibobjs=$nm_file_list_spec$func_to_tool_file_result\n\t\tfunc_append delfiles \" $output\"\n\t\tfunc_verbose \"creating $NM input file list: $output\"\n\t\tfor obj in $save_libobjs; do\n\t\t  func_to_tool_file \"$obj\"\n\t\t  $ECHO \"$func_to_tool_file_result\"\n\t\tdone > \"$output\"\n\t\teval cmd=\\\"$cmd1\\\"\n\t\tfunc_show_eval \"$cmd\" 'exit $?'\n\t\toutput=$save_output\n\t\tlibobjs=$save_libobjs\n\t\tskipped_export=false\n\t      else\n\t\t# The command line is too long to execute in one step.\n\t\tfunc_verbose \"using reloadable object file for export list...\"\n\t\tskipped_export=:\n\t\t# Break out early, otherwise skipped_export may be\n\t\t# set to false by a later but shorter cmd.\n\t\tbreak\n\t      fi\n\t    done\n\t    IFS=$save_ifs\n\t    if test -n \"$export_symbols_regex\" && test : != \"$skipped_export\"; then\n\t      func_show_eval '$EGREP -e \"$export_symbols_regex\" \"$export_symbols\" > \"${export_symbols}T\"'\n\t      func_show_eval '$MV \"${export_symbols}T\" \"$export_symbols\"'\n\t    fi\n\t  fi\n\tfi\n\n\tif test -n \"$export_symbols\" && test -n \"$include_expsyms\"; then\n\t  tmp_export_symbols=$export_symbols\n\t  test -n \"$orig_export_symbols\" && tmp_export_symbols=$orig_export_symbols\n\t  $opt_dry_run || eval '$ECHO \"$include_expsyms\" | $SP2NL >> \"$tmp_export_symbols\"'\n\tfi\n\n\tif test : != \"$skipped_export\" && test -n \"$orig_export_symbols\"; then\n\t  # The given exports_symbols file has to be filtered, so filter it.\n\t  func_verbose \"filter symbol list for '$libname.la' to tag DATA exports\"\n\t  # FIXME: $output_objdir/$libname.filter potentially contains lots of\n\t  # 's' commands, which not all seds can handle. 
GNU sed should be fine\n\t  # though. Also, the filter scales superlinearly with the number of\n\t  # global variables. join(1) would be nice here, but unfortunately\n\t  # isn't a blessed tool.\n\t  $opt_dry_run || $SED -e '/[ ,]DATA/!d;s,\\(.*\\)\\([ \\,].*\\),s|^\\1$|\\1\\2|,' < $export_symbols > $output_objdir/$libname.filter\n\t  func_append delfiles \" $export_symbols $output_objdir/$libname.filter\"\n\t  export_symbols=$output_objdir/$libname.def\n\t  $opt_dry_run || $SED -f $output_objdir/$libname.filter < $orig_export_symbols > $export_symbols\n\tfi\n\n\ttmp_deplibs=\n\tfor test_deplib in $deplibs; do\n\t  case \" $convenience \" in\n\t  *\" $test_deplib \"*) ;;\n\t  *)\n\t    func_append tmp_deplibs \" $test_deplib\"\n\t    ;;\n\t  esac\n\tdone\n\tdeplibs=$tmp_deplibs\n\n\tif test -n \"$convenience\"; then\n\t  if test -n \"$whole_archive_flag_spec\" &&\n\t    test yes = \"$compiler_needs_object\" &&\n\t    test -z \"$libobjs\"; then\n\t    # extract the archives, so we have objects to list.\n\t    # TODO: could optimize this to just extract one archive.\n\t    whole_archive_flag_spec=\n\t  fi\n\t  if test -n \"$whole_archive_flag_spec\"; then\n\t    save_libobjs=$libobjs\n\t    eval libobjs=\\\"\\$libobjs $whole_archive_flag_spec\\\"\n\t    test \"X$libobjs\" = \"X \" && libobjs=\n\t  else\n\t    gentop=$output_objdir/${outputname}x\n\t    func_append generated \" $gentop\"\n\n\t    func_extract_archives $gentop $convenience\n\t    func_append libobjs \" $func_extract_archives_result\"\n\t    test \"X$libobjs\" = \"X \" && libobjs=\n\t  fi\n\tfi\n\n\tif test yes = \"$thread_safe\" && test -n \"$thread_safe_flag_spec\"; then\n\t  eval flag=\\\"$thread_safe_flag_spec\\\"\n\t  func_append linker_flags \" $flag\"\n\tfi\n\n\t# Make a backup of the uninstalled library when relinking\n\tif test relink = \"$opt_mode\"; then\n\t  $opt_dry_run || eval '(cd $output_objdir && $RM ${realname}U && $MV $realname ${realname}U)' || exit $?\n\tfi\n\n\t# Do each of the 
archive commands.\n\tif test yes = \"$module\" && test -n \"$module_cmds\"; then\n\t  if test -n \"$export_symbols\" && test -n \"$module_expsym_cmds\"; then\n\t    eval test_cmds=\\\"$module_expsym_cmds\\\"\n\t    cmds=$module_expsym_cmds\n\t  else\n\t    eval test_cmds=\\\"$module_cmds\\\"\n\t    cmds=$module_cmds\n\t  fi\n\telse\n\t  if test -n \"$export_symbols\" && test -n \"$archive_expsym_cmds\"; then\n\t    eval test_cmds=\\\"$archive_expsym_cmds\\\"\n\t    cmds=$archive_expsym_cmds\n\t  else\n\t    eval test_cmds=\\\"$archive_cmds\\\"\n\t    cmds=$archive_cmds\n\t  fi\n\tfi\n\n\tif test : != \"$skipped_export\" &&\n\t   func_len \" $test_cmds\" &&\n\t   len=$func_len_result &&\n\t   test \"$len\" -lt \"$max_cmd_len\" || test \"$max_cmd_len\" -le -1; then\n\t  :\n\telse\n\t  # The command line is too long to link in one step, link piecewise\n\t  # or, if using GNU ld and skipped_export is not :, use a linker\n\t  # script.\n\n\t  # Save the value of $output and $libobjs because we want to\n\t  # use them later.  
If we have whole_archive_flag_spec, we\n\t  # want to use save_libobjs as it was before\n\t  # whole_archive_flag_spec was expanded, because we can't\n\t  # assume the linker understands whole_archive_flag_spec.\n\t  # This may have to be revisited, in case too many\n\t  # convenience libraries get linked in and end up exceeding\n\t  # the spec.\n\t  if test -z \"$convenience\" || test -z \"$whole_archive_flag_spec\"; then\n\t    save_libobjs=$libobjs\n\t  fi\n\t  save_output=$output\n\t  func_basename \"$output\"\n\t  output_la=$func_basename_result\n\n\t  # Clear the reloadable object creation command queue and\n\t  # initialize k to one.\n\t  test_cmds=\n\t  concat_cmds=\n\t  objlist=\n\t  last_robj=\n\t  k=1\n\n\t  if test -n \"$save_libobjs\" && test : != \"$skipped_export\" && test yes = \"$with_gnu_ld\"; then\n\t    output=$output_objdir/$output_la.lnkscript\n\t    func_verbose \"creating GNU ld script: $output\"\n\t    echo 'INPUT (' > $output\n\t    for obj in $save_libobjs\n\t    do\n\t      func_to_tool_file \"$obj\"\n\t      $ECHO \"$func_to_tool_file_result\" >> $output\n\t    done\n\t    echo ')' >> $output\n\t    func_append delfiles \" $output\"\n\t    func_to_tool_file \"$output\"\n\t    output=$func_to_tool_file_result\n\t  elif test -n \"$save_libobjs\" && test : != \"$skipped_export\" && test -n \"$file_list_spec\"; then\n\t    output=$output_objdir/$output_la.lnk\n\t    func_verbose \"creating linker input file list: $output\"\n\t    : > $output\n\t    set x $save_libobjs\n\t    shift\n\t    firstobj=\n\t    if test yes = \"$compiler_needs_object\"; then\n\t      firstobj=\"$1 \"\n\t      shift\n\t    fi\n\t    for obj\n\t    do\n\t      func_to_tool_file \"$obj\"\n\t      $ECHO \"$func_to_tool_file_result\" >> $output\n\t    done\n\t    func_append delfiles \" $output\"\n\t    func_to_tool_file \"$output\"\n\t    output=$firstobj\\\"$file_list_spec$func_to_tool_file_result\\\"\n\t  else\n\t    if test -n \"$save_libobjs\"; then\n\t      
func_verbose \"creating reloadable object files...\"\n\t      output=$output_objdir/$output_la-$k.$objext\n\t      eval test_cmds=\\\"$reload_cmds\\\"\n\t      func_len \" $test_cmds\"\n\t      len0=$func_len_result\n\t      len=$len0\n\n\t      # Loop over the list of objects to be linked.\n\t      for obj in $save_libobjs\n\t      do\n\t\tfunc_len \" $obj\"\n\t\tfunc_arith $len + $func_len_result\n\t\tlen=$func_arith_result\n\t\tif test -z \"$objlist\" ||\n\t\t   test \"$len\" -lt \"$max_cmd_len\"; then\n\t\t  func_append objlist \" $obj\"\n\t\telse\n\t\t  # The command $test_cmds is almost too long, add a\n\t\t  # command to the queue.\n\t\t  if test 1 -eq \"$k\"; then\n\t\t    # The first file doesn't have a previous command to add.\n\t\t    reload_objs=$objlist\n\t\t    eval concat_cmds=\\\"$reload_cmds\\\"\n\t\t  else\n\t\t    # All subsequent reloadable object files will link in\n\t\t    # the last one created.\n\t\t    reload_objs=\"$objlist $last_robj\"\n\t\t    eval concat_cmds=\\\"\\$concat_cmds~$reload_cmds~\\$RM $last_robj\\\"\n\t\t  fi\n\t\t  last_robj=$output_objdir/$output_la-$k.$objext\n\t\t  func_arith $k + 1\n\t\t  k=$func_arith_result\n\t\t  output=$output_objdir/$output_la-$k.$objext\n\t\t  objlist=\" $obj\"\n\t\t  func_len \" $last_robj\"\n\t\t  func_arith $len0 + $func_len_result\n\t\t  len=$func_arith_result\n\t\tfi\n\t      done\n\t      # Handle the remaining objects by creating one last\n\t      # reloadable object file.  
All subsequent reloadable object\n\t      # files will link in the last one created.\n\t      test -z \"$concat_cmds\" || concat_cmds=$concat_cmds~\n\t      reload_objs=\"$objlist $last_robj\"\n\t      eval concat_cmds=\\\"\\$concat_cmds$reload_cmds\\\"\n\t      if test -n \"$last_robj\"; then\n\t        eval concat_cmds=\\\"\\$concat_cmds~\\$RM $last_robj\\\"\n\t      fi\n\t      func_append delfiles \" $output\"\n\n\t    else\n\t      output=\n\t    fi\n\n\t    ${skipped_export-false} && {\n\t      func_verbose \"generating symbol list for '$libname.la'\"\n\t      export_symbols=$output_objdir/$libname.exp\n\t      $opt_dry_run || $RM $export_symbols\n\t      libobjs=$output\n\t      # Append the command to create the export file.\n\t      test -z \"$concat_cmds\" || concat_cmds=$concat_cmds~\n\t      eval concat_cmds=\\\"\\$concat_cmds$export_symbols_cmds\\\"\n\t      if test -n \"$last_robj\"; then\n\t\teval concat_cmds=\\\"\\$concat_cmds~\\$RM $last_robj\\\"\n\t      fi\n\t    }\n\n\t    test -n \"$save_libobjs\" &&\n\t      func_verbose \"creating a temporary reloadable object file: $output\"\n\n\t    # Loop through the commands generated above and execute them.\n\t    save_ifs=$IFS; IFS='~'\n\t    for cmd in $concat_cmds; do\n\t      IFS=$save_ifs\n\t      $opt_quiet || {\n\t\t  func_quote_for_expand \"$cmd\"\n\t\t  eval \"func_echo $func_quote_for_expand_result\"\n\t      }\n\t      $opt_dry_run || eval \"$cmd\" || {\n\t\tlt_exit=$?\n\n\t\t# Restore the uninstalled library and exit\n\t\tif test relink = \"$opt_mode\"; then\n\t\t  ( cd \"$output_objdir\" && \\\n\t\t    $RM \"${realname}T\" && \\\n\t\t    $MV \"${realname}U\" \"$realname\" )\n\t\tfi\n\n\t\texit $lt_exit\n\t      }\n\t    done\n\t    IFS=$save_ifs\n\n\t    if test -n \"$export_symbols_regex\" && ${skipped_export-false}; then\n\t      func_show_eval '$EGREP -e \"$export_symbols_regex\" \"$export_symbols\" > \"${export_symbols}T\"'\n\t      func_show_eval '$MV \"${export_symbols}T\" 
\"$export_symbols\"'\n\t    fi\n\t  fi\n\n          ${skipped_export-false} && {\n\t    if test -n \"$export_symbols\" && test -n \"$include_expsyms\"; then\n\t      tmp_export_symbols=$export_symbols\n\t      test -n \"$orig_export_symbols\" && tmp_export_symbols=$orig_export_symbols\n\t      $opt_dry_run || eval '$ECHO \"$include_expsyms\" | $SP2NL >> \"$tmp_export_symbols\"'\n\t    fi\n\n\t    if test -n \"$orig_export_symbols\"; then\n\t      # The given exports_symbols file has to be filtered, so filter it.\n\t      func_verbose \"filter symbol list for '$libname.la' to tag DATA exports\"\n\t      # FIXME: $output_objdir/$libname.filter potentially contains lots of\n\t      # 's' commands, which not all seds can handle. GNU sed should be fine\n\t      # though. Also, the filter scales superlinearly with the number of\n\t      # global variables. join(1) would be nice here, but unfortunately\n\t      # isn't a blessed tool.\n\t      $opt_dry_run || $SED -e '/[ ,]DATA/!d;s,\\(.*\\)\\([ \\,].*\\),s|^\\1$|\\1\\2|,' < $export_symbols > $output_objdir/$libname.filter\n\t      func_append delfiles \" $export_symbols $output_objdir/$libname.filter\"\n\t      export_symbols=$output_objdir/$libname.def\n\t      $opt_dry_run || $SED -f $output_objdir/$libname.filter < $orig_export_symbols > $export_symbols\n\t    fi\n\t  }\n\n\t  libobjs=$output\n\t  # Restore the value of output.\n\t  output=$save_output\n\n\t  if test -n \"$convenience\" && test -n \"$whole_archive_flag_spec\"; then\n\t    eval libobjs=\\\"\\$libobjs $whole_archive_flag_spec\\\"\n\t    test \"X$libobjs\" = \"X \" && libobjs=\n\t  fi\n\t  # Expand the library linking commands again to reset the\n\t  # value of $libobjs for piecewise linking.\n\n\t  # Do each of the archive commands.\n\t  if test yes = \"$module\" && test -n \"$module_cmds\"; then\n\t    if test -n \"$export_symbols\" && test -n \"$module_expsym_cmds\"; then\n\t      cmds=$module_expsym_cmds\n\t    else\n\t      cmds=$module_cmds\n\t    
fi\n\t  else\n\t    if test -n \"$export_symbols\" && test -n \"$archive_expsym_cmds\"; then\n\t      cmds=$archive_expsym_cmds\n\t    else\n\t      cmds=$archive_cmds\n\t    fi\n\t  fi\n\tfi\n\n\tif test -n \"$delfiles\"; then\n\t  # Append the command to remove temporary files to $cmds.\n\t  eval cmds=\\\"\\$cmds~\\$RM $delfiles\\\"\n\tfi\n\n\t# Add any objects from preloaded convenience libraries\n\tif test -n \"$dlprefiles\"; then\n\t  gentop=$output_objdir/${outputname}x\n\t  func_append generated \" $gentop\"\n\n\t  func_extract_archives $gentop $dlprefiles\n\t  func_append libobjs \" $func_extract_archives_result\"\n\t  test \"X$libobjs\" = \"X \" && libobjs=\n\tfi\n\n\tsave_ifs=$IFS; IFS='~'\n\tfor cmd in $cmds; do\n\t  IFS=$sp$nl\n\t  eval cmd=\\\"$cmd\\\"\n\t  IFS=$save_ifs\n\t  $opt_quiet || {\n\t    func_quote_for_expand \"$cmd\"\n\t    eval \"func_echo $func_quote_for_expand_result\"\n\t  }\n\t  $opt_dry_run || eval \"$cmd\" || {\n\t    lt_exit=$?\n\n\t    # Restore the uninstalled library and exit\n\t    if test relink = \"$opt_mode\"; then\n\t      ( cd \"$output_objdir\" && \\\n\t        $RM \"${realname}T\" && \\\n\t\t$MV \"${realname}U\" \"$realname\" )\n\t    fi\n\n\t    exit $lt_exit\n\t  }\n\tdone\n\tIFS=$save_ifs\n\n\t# Restore the uninstalled library and exit\n\tif test relink = \"$opt_mode\"; then\n\t  $opt_dry_run || eval '(cd $output_objdir && $RM ${realname}T && $MV $realname ${realname}T && $MV ${realname}U $realname)' || exit $?\n\n\t  if test -n \"$convenience\"; then\n\t    if test -z \"$whole_archive_flag_spec\"; then\n\t      func_show_eval '${RM}r \"$gentop\"'\n\t    fi\n\t  fi\n\n\t  exit $EXIT_SUCCESS\n\tfi\n\n\t# Create links to the real library.\n\tfor linkname in $linknames; do\n\t  if test \"$realname\" != \"$linkname\"; then\n\t    func_show_eval '(cd \"$output_objdir\" && $RM \"$linkname\" && $LN_S \"$realname\" \"$linkname\")' 'exit $?'\n\t  fi\n\tdone\n\n\t# If -module or -export-dynamic was specified, set the 
dlname.\n\tif test yes = \"$module\" || test yes = \"$export_dynamic\"; then\n\t  # On all known operating systems, these are identical.\n\t  dlname=$soname\n\tfi\n      fi\n      ;;\n\n    obj)\n      if test -n \"$dlfiles$dlprefiles\" || test no != \"$dlself\"; then\n\tfunc_warning \"'-dlopen' is ignored for objects\"\n      fi\n\n      case \" $deplibs\" in\n      *\\ -l* | *\\ -L*)\n\tfunc_warning \"'-l' and '-L' are ignored for objects\" ;;\n      esac\n\n      test -n \"$rpath\" && \\\n\tfunc_warning \"'-rpath' is ignored for objects\"\n\n      test -n \"$xrpath\" && \\\n\tfunc_warning \"'-R' is ignored for objects\"\n\n      test -n \"$vinfo\" && \\\n\tfunc_warning \"'-version-info' is ignored for objects\"\n\n      test -n \"$release\" && \\\n\tfunc_warning \"'-release' is ignored for objects\"\n\n      case $output in\n      *.lo)\n\ttest -n \"$objs$old_deplibs\" && \\\n\t  func_fatal_error \"cannot build library object '$output' from non-libtool objects\"\n\n\tlibobj=$output\n\tfunc_lo2o \"$libobj\"\n\tobj=$func_lo2o_result\n\t;;\n      *)\n\tlibobj=\n\tobj=$output\n\t;;\n      esac\n\n      # Delete the old objects.\n      $opt_dry_run || $RM $obj $libobj\n\n      # Objects from convenience libraries.  This assumes\n      # single-version convenience libraries.  
Whenever we create\n      # different ones for PIC/non-PIC, this we'll have to duplicate\n      # the extraction.\n      reload_conv_objs=\n      gentop=\n      # if reload_cmds runs $LD directly, get rid of -Wl from\n      # whole_archive_flag_spec and hope we can get by with turning comma\n      # into space.\n      case $reload_cmds in\n        *\\$LD[\\ \\$]*) wl= ;;\n      esac\n      if test -n \"$convenience\"; then\n\tif test -n \"$whole_archive_flag_spec\"; then\n\t  eval tmp_whole_archive_flags=\\\"$whole_archive_flag_spec\\\"\n\t  test -n \"$wl\" || tmp_whole_archive_flags=`$ECHO \"$tmp_whole_archive_flags\" | $SED 's|,| |g'`\n\t  reload_conv_objs=$reload_objs\\ $tmp_whole_archive_flags\n\telse\n\t  gentop=$output_objdir/${obj}x\n\t  func_append generated \" $gentop\"\n\n\t  func_extract_archives $gentop $convenience\n\t  reload_conv_objs=\"$reload_objs $func_extract_archives_result\"\n\tfi\n      fi\n\n      # If we're not building shared, we need to use non_pic_objs\n      test yes = \"$build_libtool_libs\" || libobjs=$non_pic_objects\n\n      # Create the old-style object.\n      reload_objs=$objs$old_deplibs' '`$ECHO \"$libobjs\" | $SP2NL | $SED \"/\\.$libext$/d; /\\.lib$/d; $lo2o\" | $NL2SP`' '$reload_conv_objs\n\n      output=$obj\n      func_execute_cmds \"$reload_cmds\" 'exit $?'\n\n      # Exit if we aren't doing a library object file.\n      if test -z \"$libobj\"; then\n\tif test -n \"$gentop\"; then\n\t  func_show_eval '${RM}r \"$gentop\"'\n\tfi\n\n\texit $EXIT_SUCCESS\n      fi\n\n      test yes = \"$build_libtool_libs\" || {\n\tif test -n \"$gentop\"; then\n\t  func_show_eval '${RM}r \"$gentop\"'\n\tfi\n\n\t# Create an invalid libtool object if no PIC, so that we don't\n\t# accidentally link it into a program.\n\t# $show \"echo timestamp > $libobj\"\n\t# $opt_dry_run || eval \"echo timestamp > $libobj\" || exit $?\n\texit $EXIT_SUCCESS\n      }\n\n      if test -n \"$pic_flag\" || test default != \"$pic_mode\"; then\n\t# Only do commands if 
we really have different PIC objects.\n\treload_objs=\"$libobjs $reload_conv_objs\"\n\toutput=$libobj\n\tfunc_execute_cmds \"$reload_cmds\" 'exit $?'\n      fi\n\n      if test -n \"$gentop\"; then\n\tfunc_show_eval '${RM}r \"$gentop\"'\n      fi\n\n      exit $EXIT_SUCCESS\n      ;;\n\n    prog)\n      case $host in\n\t*cygwin*) func_stripname '' '.exe' \"$output\"\n\t          output=$func_stripname_result.exe;;\n      esac\n      test -n \"$vinfo\" && \\\n\tfunc_warning \"'-version-info' is ignored for programs\"\n\n      test -n \"$release\" && \\\n\tfunc_warning \"'-release' is ignored for programs\"\n\n      $preload \\\n\t&& test unknown,unknown,unknown = \"$dlopen_support,$dlopen_self,$dlopen_self_static\" \\\n\t&& func_warning \"'LT_INIT([dlopen])' not used. Assuming no dlopen support.\"\n\n      case $host in\n      *-*-rhapsody* | *-*-darwin1.[012])\n\t# On Rhapsody replace the C library is the System framework\n\tcompile_deplibs=`$ECHO \" $compile_deplibs\" | $SED 's/ -lc / System.ltframework /'`\n\tfinalize_deplibs=`$ECHO \" $finalize_deplibs\" | $SED 's/ -lc / System.ltframework /'`\n\t;;\n      esac\n\n      case $host in\n      *-*-darwin*)\n\t# Don't allow lazy linking, it breaks C++ global constructors\n\t# But is supposedly fixed on 10.4 or later (yay!).\n\tif test CXX = \"$tagname\"; then\n\t  case ${MACOSX_DEPLOYMENT_TARGET-10.0} in\n\t    10.[0123])\n\t      func_append compile_command \" $wl-bind_at_load\"\n\t      func_append finalize_command \" $wl-bind_at_load\"\n\t    ;;\n\t  esac\n\tfi\n\t# Time to change all our \"foo.ltframework\" stuff back to \"-framework foo\"\n\tcompile_deplibs=`$ECHO \" $compile_deplibs\" | $SED 's% \\([^ $]*\\).ltframework% -framework \\1%g'`\n\tfinalize_deplibs=`$ECHO \" $finalize_deplibs\" | $SED 's% \\([^ $]*\\).ltframework% -framework \\1%g'`\n\t;;\n      esac\n\n\n      # move library search paths that coincide with paths to not yet\n      # installed libraries to the beginning of the library search list\n   
   new_libs=\n      for path in $notinst_path; do\n\tcase \" $new_libs \" in\n\t*\" -L$path/$objdir \"*) ;;\n\t*)\n\t  case \" $compile_deplibs \" in\n\t  *\" -L$path/$objdir \"*)\n\t    func_append new_libs \" -L$path/$objdir\" ;;\n\t  esac\n\t  ;;\n\tesac\n      done\n      for deplib in $compile_deplibs; do\n\tcase $deplib in\n\t-L*)\n\t  case \" $new_libs \" in\n\t  *\" $deplib \"*) ;;\n\t  *) func_append new_libs \" $deplib\" ;;\n\t  esac\n\t  ;;\n\t*) func_append new_libs \" $deplib\" ;;\n\tesac\n      done\n      compile_deplibs=$new_libs\n\n\n      func_append compile_command \" $compile_deplibs\"\n      func_append finalize_command \" $finalize_deplibs\"\n\n      if test -n \"$rpath$xrpath\"; then\n\t# If the user specified any rpath flags, then add them.\n\tfor libdir in $rpath $xrpath; do\n\t  # This is the magic to use -rpath.\n\t  case \"$finalize_rpath \" in\n\t  *\" $libdir \"*) ;;\n\t  *) func_append finalize_rpath \" $libdir\" ;;\n\t  esac\n\tdone\n      fi\n\n      # Now hardcode the library paths\n      rpath=\n      hardcode_libdirs=\n      for libdir in $compile_rpath $finalize_rpath; do\n\tif test -n \"$hardcode_libdir_flag_spec\"; then\n\t  if test -n \"$hardcode_libdir_separator\"; then\n\t    if test -z \"$hardcode_libdirs\"; then\n\t      hardcode_libdirs=$libdir\n\t    else\n\t      # Just accumulate the unique libdirs.\n\t      case $hardcode_libdir_separator$hardcode_libdirs$hardcode_libdir_separator in\n\t      *\"$hardcode_libdir_separator$libdir$hardcode_libdir_separator\"*)\n\t\t;;\n\t      *)\n\t\tfunc_append hardcode_libdirs \"$hardcode_libdir_separator$libdir\"\n\t\t;;\n\t      esac\n\t    fi\n\t  else\n\t    eval flag=\\\"$hardcode_libdir_flag_spec\\\"\n\t    func_append rpath \" $flag\"\n\t  fi\n\telif test -n \"$runpath_var\"; then\n\t  case \"$perm_rpath \" in\n\t  *\" $libdir \"*) ;;\n\t  *) func_append perm_rpath \" $libdir\" ;;\n\t  esac\n\tfi\n\tcase $host in\n\t*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-os2* | 
*-cegcc*)\n\t  testbindir=`$ECHO \"$libdir\" | $SED -e 's*/lib$*/bin*'`\n\t  case :$dllsearchpath: in\n\t  *\":$libdir:\"*) ;;\n\t  ::) dllsearchpath=$libdir;;\n\t  *) func_append dllsearchpath \":$libdir\";;\n\t  esac\n\t  case :$dllsearchpath: in\n\t  *\":$testbindir:\"*) ;;\n\t  ::) dllsearchpath=$testbindir;;\n\t  *) func_append dllsearchpath \":$testbindir\";;\n\t  esac\n\t  ;;\n\tesac\n      done\n      # Substitute the hardcoded libdirs into the rpath.\n      if test -n \"$hardcode_libdir_separator\" &&\n\t test -n \"$hardcode_libdirs\"; then\n\tlibdir=$hardcode_libdirs\n\teval rpath=\\\" $hardcode_libdir_flag_spec\\\"\n      fi\n      compile_rpath=$rpath\n\n      rpath=\n      hardcode_libdirs=\n      for libdir in $finalize_rpath; do\n\tif test -n \"$hardcode_libdir_flag_spec\"; then\n\t  if test -n \"$hardcode_libdir_separator\"; then\n\t    if test -z \"$hardcode_libdirs\"; then\n\t      hardcode_libdirs=$libdir\n\t    else\n\t      # Just accumulate the unique libdirs.\n\t      case $hardcode_libdir_separator$hardcode_libdirs$hardcode_libdir_separator in\n\t      *\"$hardcode_libdir_separator$libdir$hardcode_libdir_separator\"*)\n\t\t;;\n\t      *)\n\t\tfunc_append hardcode_libdirs \"$hardcode_libdir_separator$libdir\"\n\t\t;;\n\t      esac\n\t    fi\n\t  else\n\t    eval flag=\\\"$hardcode_libdir_flag_spec\\\"\n\t    func_append rpath \" $flag\"\n\t  fi\n\telif test -n \"$runpath_var\"; then\n\t  case \"$finalize_perm_rpath \" in\n\t  *\" $libdir \"*) ;;\n\t  *) func_append finalize_perm_rpath \" $libdir\" ;;\n\t  esac\n\tfi\n      done\n      # Substitute the hardcoded libdirs into the rpath.\n      if test -n \"$hardcode_libdir_separator\" &&\n\t test -n \"$hardcode_libdirs\"; then\n\tlibdir=$hardcode_libdirs\n\teval rpath=\\\" $hardcode_libdir_flag_spec\\\"\n      fi\n      finalize_rpath=$rpath\n\n      if test -n \"$libobjs\" && test yes = \"$build_old_libs\"; then\n\t# Transform all the library objects into standard 
objects.\n\tcompile_command=`$ECHO \"$compile_command\" | $SP2NL | $SED \"$lo2o\" | $NL2SP`\n\tfinalize_command=`$ECHO \"$finalize_command\" | $SP2NL | $SED \"$lo2o\" | $NL2SP`\n      fi\n\n      func_generate_dlsyms \"$outputname\" \"@PROGRAM@\" false\n\n      # template prelinking step\n      if test -n \"$prelink_cmds\"; then\n\tfunc_execute_cmds \"$prelink_cmds\" 'exit $?'\n      fi\n\n      wrappers_required=:\n      case $host in\n      *cegcc* | *mingw32ce*)\n        # Disable wrappers for cegcc and mingw32ce hosts, we are cross compiling anyway.\n        wrappers_required=false\n        ;;\n      *cygwin* | *mingw* )\n        test yes = \"$build_libtool_libs\" || wrappers_required=false\n        ;;\n      *)\n        if test no = \"$need_relink\" || test yes != \"$build_libtool_libs\"; then\n          wrappers_required=false\n        fi\n        ;;\n      esac\n      $wrappers_required || {\n\t# Replace the output file specification.\n\tcompile_command=`$ECHO \"$compile_command\" | $SED 's%@OUTPUT@%'\"$output\"'%g'`\n\tlink_command=$compile_command$compile_rpath\n\n\t# We have no uninstalled library dependencies, so finalize right now.\n\texit_status=0\n\tfunc_show_eval \"$link_command\" 'exit_status=$?'\n\n\tif test -n \"$postlink_cmds\"; then\n\t  func_to_tool_file \"$output\"\n\t  postlink_cmds=`func_echo_all \"$postlink_cmds\" | $SED -e 's%@OUTPUT@%'\"$output\"'%g' -e 's%@TOOL_OUTPUT@%'\"$func_to_tool_file_result\"'%g'`\n\t  func_execute_cmds \"$postlink_cmds\" 'exit $?'\n\tfi\n\n\t# Delete the generated files.\n\tif test -f \"$output_objdir/${outputname}S.$objext\"; then\n\t  func_show_eval '$RM \"$output_objdir/${outputname}S.$objext\"'\n\tfi\n\n\texit $exit_status\n      }\n\n      if test -n \"$compile_shlibpath$finalize_shlibpath\"; then\n\tcompile_command=\"$shlibpath_var=\\\"$compile_shlibpath$finalize_shlibpath\\$$shlibpath_var\\\" $compile_command\"\n      fi\n      if test -n \"$finalize_shlibpath\"; 
then\n\tfinalize_command=\"$shlibpath_var=\\\"$finalize_shlibpath\\$$shlibpath_var\\\" $finalize_command\"\n      fi\n\n      compile_var=\n      finalize_var=\n      if test -n \"$runpath_var\"; then\n\tif test -n \"$perm_rpath\"; then\n\t  # We should set the runpath_var.\n\t  rpath=\n\t  for dir in $perm_rpath; do\n\t    func_append rpath \"$dir:\"\n\t  done\n\t  compile_var=\"$runpath_var=\\\"$rpath\\$$runpath_var\\\" \"\n\tfi\n\tif test -n \"$finalize_perm_rpath\"; then\n\t  # We should set the runpath_var.\n\t  rpath=\n\t  for dir in $finalize_perm_rpath; do\n\t    func_append rpath \"$dir:\"\n\t  done\n\t  finalize_var=\"$runpath_var=\\\"$rpath\\$$runpath_var\\\" \"\n\tfi\n      fi\n\n      if test yes = \"$no_install\"; then\n\t# We don't need to create a wrapper script.\n\tlink_command=$compile_var$compile_command$compile_rpath\n\t# Replace the output file specification.\n\tlink_command=`$ECHO \"$link_command\" | $SED 's%@OUTPUT@%'\"$output\"'%g'`\n\t# Delete the old output file.\n\t$opt_dry_run || $RM $output\n\t# Link the executable and exit\n\tfunc_show_eval \"$link_command\" 'exit $?'\n\n\tif test -n \"$postlink_cmds\"; then\n\t  func_to_tool_file \"$output\"\n\t  postlink_cmds=`func_echo_all \"$postlink_cmds\" | $SED -e 's%@OUTPUT@%'\"$output\"'%g' -e 's%@TOOL_OUTPUT@%'\"$func_to_tool_file_result\"'%g'`\n\t  func_execute_cmds \"$postlink_cmds\" 'exit $?'\n\tfi\n\n\texit $EXIT_SUCCESS\n      fi\n\n      case $hardcode_action,$fast_install in\n        relink,*)\n\t  # Fast installation is not supported\n\t  link_command=$compile_var$compile_command$compile_rpath\n\t  relink_command=$finalize_var$finalize_command$finalize_rpath\n\n\t  func_warning \"this platform does not like uninstalled shared libraries\"\n\t  func_warning \"'$output' will be relinked during installation\"\n\t  ;;\n        *,yes)\n\t  link_command=$finalize_var$compile_command$finalize_rpath\n\t  relink_command=`$ECHO \"$compile_var$compile_command$compile_rpath\" | $SED 
's%@OUTPUT@%\\$progdir/\\$file%g'`\n          ;;\n\t*,no)\n\t  link_command=$compile_var$compile_command$compile_rpath\n\t  relink_command=$finalize_var$finalize_command$finalize_rpath\n          ;;\n\t*,needless)\n\t  link_command=$finalize_var$compile_command$finalize_rpath\n\t  relink_command=\n          ;;\n      esac\n\n      # Replace the output file specification.\n      link_command=`$ECHO \"$link_command\" | $SED 's%@OUTPUT@%'\"$output_objdir/$outputname\"'%g'`\n\n      # Delete the old output files.\n      $opt_dry_run || $RM $output $output_objdir/$outputname $output_objdir/lt-$outputname\n\n      func_show_eval \"$link_command\" 'exit $?'\n\n      if test -n \"$postlink_cmds\"; then\n\tfunc_to_tool_file \"$output_objdir/$outputname\"\n\tpostlink_cmds=`func_echo_all \"$postlink_cmds\" | $SED -e 's%@OUTPUT@%'\"$output_objdir/$outputname\"'%g' -e 's%@TOOL_OUTPUT@%'\"$func_to_tool_file_result\"'%g'`\n\tfunc_execute_cmds \"$postlink_cmds\" 'exit $?'\n      fi\n\n      # Now create the wrapper script.\n      func_verbose \"creating $output\"\n\n      # Quote the relink command for shipping.\n      if test -n \"$relink_command\"; then\n\t# Preserve any variables that may affect compiler behavior\n\tfor var in $variables_saved_for_relink; do\n\t  if eval test -z \\\"\\${$var+set}\\\"; then\n\t    relink_command=\"{ test -z \\\"\\${$var+set}\\\" || $lt_unset $var || { $var=; export $var; }; }; $relink_command\"\n\t  elif eval var_value=\\$$var; test -z \"$var_value\"; then\n\t    relink_command=\"$var=; export $var; $relink_command\"\n\t  else\n\t    func_quote_for_eval \"$var_value\"\n\t    relink_command=\"$var=$func_quote_for_eval_result; export $var; $relink_command\"\n\t  fi\n\tdone\n\trelink_command=\"(cd `pwd`; $relink_command)\"\n\trelink_command=`$ECHO \"$relink_command\" | $SED \"$sed_quote_subst\"`\n      fi\n\n      # Only actually do things if not in dry run mode.\n      $opt_dry_run || {\n\t# win32 will think the script is a binary if it has\n\t# a 
.exe suffix, so we strip it off here.\n\tcase $output in\n\t  *.exe) func_stripname '' '.exe' \"$output\"\n\t         output=$func_stripname_result ;;\n\tesac\n\t# test for cygwin because mv fails w/o .exe extensions\n\tcase $host in\n\t  *cygwin*)\n\t    exeext=.exe\n\t    func_stripname '' '.exe' \"$outputname\"\n\t    outputname=$func_stripname_result ;;\n\t  *) exeext= ;;\n\tesac\n\tcase $host in\n\t  *cygwin* | *mingw* )\n\t    func_dirname_and_basename \"$output\" \"\" \".\"\n\t    output_name=$func_basename_result\n\t    output_path=$func_dirname_result\n\t    cwrappersource=$output_path/$objdir/lt-$output_name.c\n\t    cwrapper=$output_path/$output_name.exe\n\t    $RM $cwrappersource $cwrapper\n\t    trap \"$RM $cwrappersource $cwrapper; exit $EXIT_FAILURE\" 1 2 15\n\n\t    func_emit_cwrapperexe_src > $cwrappersource\n\n\t    # The wrapper executable is built using the $host compiler,\n\t    # because it contains $host paths and files. If cross-\n\t    # compiling, it, like the target executable, must be\n\t    # executed on the $host or under an emulation environment.\n\t    $opt_dry_run || {\n\t      $LTCC $LTCFLAGS -o $cwrapper $cwrappersource\n\t      $STRIP $cwrapper\n\t    }\n\n\t    # Now, create the wrapper script for func_source use:\n\t    func_ltwrapper_scriptname $cwrapper\n\t    $RM $func_ltwrapper_scriptname_result\n\t    trap \"$RM $func_ltwrapper_scriptname_result; exit $EXIT_FAILURE\" 1 2 15\n\t    $opt_dry_run || {\n\t      # note: this script will not be executed, so do not chmod.\n\t      if test \"x$build\" = \"x$host\"; then\n\t\t$cwrapper --lt-dump-script > $func_ltwrapper_scriptname_result\n\t      else\n\t\tfunc_emit_wrapper no > $func_ltwrapper_scriptname_result\n\t      fi\n\t    }\n\t  ;;\n\t  * )\n\t    $RM $output\n\t    trap \"$RM $output; exit $EXIT_FAILURE\" 1 2 15\n\n\t    func_emit_wrapper no > $output\n\t    chmod +x $output\n\t  ;;\n\tesac\n      }\n      exit $EXIT_SUCCESS\n      ;;\n    esac\n\n    # See if we need to 
build an old-fashioned archive.\n    for oldlib in $oldlibs; do\n\n      case $build_libtool_libs in\n        convenience)\n\t  oldobjs=\"$libobjs_save $symfileobj\"\n\t  addlibs=$convenience\n\t  build_libtool_libs=no\n\t  ;;\n\tmodule)\n\t  oldobjs=$libobjs_save\n\t  addlibs=$old_convenience\n\t  build_libtool_libs=no\n          ;;\n\t*)\n\t  oldobjs=\"$old_deplibs $non_pic_objects\"\n\t  $preload && test -f \"$symfileobj\" \\\n\t    && func_append oldobjs \" $symfileobj\"\n\t  addlibs=$old_convenience\n\t  ;;\n      esac\n\n      if test -n \"$addlibs\"; then\n\tgentop=$output_objdir/${outputname}x\n\tfunc_append generated \" $gentop\"\n\n\tfunc_extract_archives $gentop $addlibs\n\tfunc_append oldobjs \" $func_extract_archives_result\"\n      fi\n\n      # Do each command in the archive commands.\n      if test -n \"$old_archive_from_new_cmds\" && test yes = \"$build_libtool_libs\"; then\n\tcmds=$old_archive_from_new_cmds\n      else\n\n\t# Add any objects from preloaded convenience libraries\n\tif test -n \"$dlprefiles\"; then\n\t  gentop=$output_objdir/${outputname}x\n\t  func_append generated \" $gentop\"\n\n\t  func_extract_archives $gentop $dlprefiles\n\t  func_append oldobjs \" $func_extract_archives_result\"\n\tfi\n\n\t# POSIX demands no paths to be encoded in archives.  
We have\n\t# to avoid creating archives with duplicate basenames if we\n\t# might have to extract them afterwards, e.g., when creating a\n\t# static archive out of a convenience library, or when linking\n\t# the entirety of a libtool archive into another (currently\n\t# not supported by libtool).\n\tif (for obj in $oldobjs\n\t    do\n\t      func_basename \"$obj\"\n\t      $ECHO \"$func_basename_result\"\n\t    done | sort | sort -uc >/dev/null 2>&1); then\n\t  :\n\telse\n\t  echo \"copying selected object files to avoid basename conflicts...\"\n\t  gentop=$output_objdir/${outputname}x\n\t  func_append generated \" $gentop\"\n\t  func_mkdir_p \"$gentop\"\n\t  save_oldobjs=$oldobjs\n\t  oldobjs=\n\t  counter=1\n\t  for obj in $save_oldobjs\n\t  do\n\t    func_basename \"$obj\"\n\t    objbase=$func_basename_result\n\t    case \" $oldobjs \" in\n\t    \" \") oldobjs=$obj ;;\n\t    *[\\ /]\"$objbase \"*)\n\t      while :; do\n\t\t# Make sure we don't pick an alternate name that also\n\t\t# overlaps.\n\t\tnewobj=lt$counter-$objbase\n\t\tfunc_arith $counter + 1\n\t\tcounter=$func_arith_result\n\t\tcase \" $oldobjs \" in\n\t\t*[\\ /]\"$newobj \"*) ;;\n\t\t*) if test ! 
-f \"$gentop/$newobj\"; then break; fi ;;\n\t\tesac\n\t      done\n\t      func_show_eval \"ln $obj $gentop/$newobj || cp $obj $gentop/$newobj\"\n\t      func_append oldobjs \" $gentop/$newobj\"\n\t      ;;\n\t    *) func_append oldobjs \" $obj\" ;;\n\t    esac\n\t  done\n\tfi\n\tfunc_to_tool_file \"$oldlib\" func_convert_file_msys_to_w32\n\ttool_oldlib=$func_to_tool_file_result\n\teval cmds=\\\"$old_archive_cmds\\\"\n\n\tfunc_len \" $cmds\"\n\tlen=$func_len_result\n\tif test \"$len\" -lt \"$max_cmd_len\" || test \"$max_cmd_len\" -le -1; then\n\t  cmds=$old_archive_cmds\n\telif test -n \"$archiver_list_spec\"; then\n\t  func_verbose \"using command file archive linking...\"\n\t  for obj in $oldobjs\n\t  do\n\t    func_to_tool_file \"$obj\"\n\t    $ECHO \"$func_to_tool_file_result\"\n\t  done > $output_objdir/$libname.libcmd\n\t  func_to_tool_file \"$output_objdir/$libname.libcmd\"\n\t  oldobjs=\" $archiver_list_spec$func_to_tool_file_result\"\n\t  cmds=$old_archive_cmds\n\telse\n\t  # the command line is too long to link in one step, link in parts\n\t  func_verbose \"using piecewise archive linking...\"\n\t  save_RANLIB=$RANLIB\n\t  RANLIB=:\n\t  objlist=\n\t  concat_cmds=\n\t  save_oldobjs=$oldobjs\n\t  oldobjs=\n\t  # Is there a better way of finding the last object in the list?\n\t  for obj in $save_oldobjs\n\t  do\n\t    last_oldobj=$obj\n\t  done\n\t  eval test_cmds=\\\"$old_archive_cmds\\\"\n\t  func_len \" $test_cmds\"\n\t  len0=$func_len_result\n\t  len=$len0\n\t  for obj in $save_oldobjs\n\t  do\n\t    func_len \" $obj\"\n\t    func_arith $len + $func_len_result\n\t    len=$func_arith_result\n\t    func_append objlist \" $obj\"\n\t    if test \"$len\" -lt \"$max_cmd_len\"; then\n\t      :\n\t    else\n\t      # the above command should be used before it gets too long\n\t      oldobjs=$objlist\n\t      if test \"$obj\" = \"$last_oldobj\"; then\n\t\tRANLIB=$save_RANLIB\n\t      fi\n\t      test -z \"$concat_cmds\" || concat_cmds=$concat_cmds~\n\t      eval 
concat_cmds=\\\"\\$concat_cmds$old_archive_cmds\\\"\n\t      objlist=\n\t      len=$len0\n\t    fi\n\t  done\n\t  RANLIB=$save_RANLIB\n\t  oldobjs=$objlist\n\t  if test -z \"$oldobjs\"; then\n\t    eval cmds=\\\"\\$concat_cmds\\\"\n\t  else\n\t    eval cmds=\\\"\\$concat_cmds~\\$old_archive_cmds\\\"\n\t  fi\n\tfi\n      fi\n      func_execute_cmds \"$cmds\" 'exit $?'\n    done\n\n    test -n \"$generated\" && \\\n      func_show_eval \"${RM}r$generated\"\n\n    # Now create the libtool archive.\n    case $output in\n    *.la)\n      old_library=\n      test yes = \"$build_old_libs\" && old_library=$libname.$libext\n      func_verbose \"creating $output\"\n\n      # Preserve any variables that may affect compiler behavior\n      for var in $variables_saved_for_relink; do\n\tif eval test -z \\\"\\${$var+set}\\\"; then\n\t  relink_command=\"{ test -z \\\"\\${$var+set}\\\" || $lt_unset $var || { $var=; export $var; }; }; $relink_command\"\n\telif eval var_value=\\$$var; test -z \"$var_value\"; then\n\t  relink_command=\"$var=; export $var; $relink_command\"\n\telse\n\t  func_quote_for_eval \"$var_value\"\n\t  relink_command=\"$var=$func_quote_for_eval_result; export $var; $relink_command\"\n\tfi\n      done\n      # Quote the link command for shipping.\n      relink_command=\"(cd `pwd`; $SHELL \\\"$progpath\\\" $preserve_args --mode=relink $libtool_args @inst_prefix_dir@)\"\n      relink_command=`$ECHO \"$relink_command\" | $SED \"$sed_quote_subst\"`\n      if test yes = \"$hardcode_automatic\"; then\n\trelink_command=\n      fi\n\n      # Only create the output if not a dry run.\n      $opt_dry_run || {\n\tfor installed in no yes; do\n\t  if test yes = \"$installed\"; then\n\t    if test -z \"$install_libdir\"; then\n\t      break\n\t    fi\n\t    output=$output_objdir/${outputname}i\n\t    # Replace all uninstalled libtool libraries with the installed ones\n\t    newdependency_libs=\n\t    for deplib in $dependency_libs; do\n\t      case $deplib in\n\t      
*.la)\n\t\tfunc_basename \"$deplib\"\n\t\tname=$func_basename_result\n\t\tfunc_resolve_sysroot \"$deplib\"\n\t\teval libdir=`$SED -n -e 's/^libdir=\\(.*\\)$/\\1/p' $func_resolve_sysroot_result`\n\t\ttest -z \"$libdir\" && \\\n\t\t  func_fatal_error \"'$deplib' is not a valid libtool archive\"\n\t\tfunc_append newdependency_libs \" ${lt_sysroot:+=}$libdir/$name\"\n\t\t;;\n\t      -L*)\n\t\tfunc_stripname -L '' \"$deplib\"\n\t\tfunc_replace_sysroot \"$func_stripname_result\"\n\t\tfunc_append newdependency_libs \" -L$func_replace_sysroot_result\"\n\t\t;;\n\t      -R*)\n\t\tfunc_stripname -R '' \"$deplib\"\n\t\tfunc_replace_sysroot \"$func_stripname_result\"\n\t\tfunc_append newdependency_libs \" -R$func_replace_sysroot_result\"\n\t\t;;\n\t      *) func_append newdependency_libs \" $deplib\" ;;\n\t      esac\n\t    done\n\t    dependency_libs=$newdependency_libs\n\t    newdlfiles=\n\n\t    for lib in $dlfiles; do\n\t      case $lib in\n\t      *.la)\n\t        func_basename \"$lib\"\n\t\tname=$func_basename_result\n\t\teval libdir=`$SED -n -e 's/^libdir=\\(.*\\)$/\\1/p' $lib`\n\t\ttest -z \"$libdir\" && \\\n\t\t  func_fatal_error \"'$lib' is not a valid libtool archive\"\n\t\tfunc_append newdlfiles \" ${lt_sysroot:+=}$libdir/$name\"\n\t\t;;\n\t      *) func_append newdlfiles \" $lib\" ;;\n\t      esac\n\t    done\n\t    dlfiles=$newdlfiles\n\t    newdlprefiles=\n\t    for lib in $dlprefiles; do\n\t      case $lib in\n\t      *.la)\n\t\t# Only pass preopened files to the pseudo-archive (for\n\t\t# eventual linking with the app. 
that links it) if we\n\t\t# didn't already link the preopened objects directly into\n\t\t# the library:\n\t\tfunc_basename \"$lib\"\n\t\tname=$func_basename_result\n\t\teval libdir=`$SED -n -e 's/^libdir=\\(.*\\)$/\\1/p' $lib`\n\t\ttest -z \"$libdir\" && \\\n\t\t  func_fatal_error \"'$lib' is not a valid libtool archive\"\n\t\tfunc_append newdlprefiles \" ${lt_sysroot:+=}$libdir/$name\"\n\t\t;;\n\t      esac\n\t    done\n\t    dlprefiles=$newdlprefiles\n\t  else\n\t    newdlfiles=\n\t    for lib in $dlfiles; do\n\t      case $lib in\n\t\t[\\\\/]* | [A-Za-z]:[\\\\/]*) abs=$lib ;;\n\t\t*) abs=`pwd`\"/$lib\" ;;\n\t      esac\n\t      func_append newdlfiles \" $abs\"\n\t    done\n\t    dlfiles=$newdlfiles\n\t    newdlprefiles=\n\t    for lib in $dlprefiles; do\n\t      case $lib in\n\t\t[\\\\/]* | [A-Za-z]:[\\\\/]*) abs=$lib ;;\n\t\t*) abs=`pwd`\"/$lib\" ;;\n\t      esac\n\t      func_append newdlprefiles \" $abs\"\n\t    done\n\t    dlprefiles=$newdlprefiles\n\t  fi\n\t  $RM $output\n\t  # place dlname in correct position for cygwin\n\t  # In fact, it would be nice if we could use this code for all target\n\t  # systems that can't hard-code library paths into their executables\n\t  # and that have no shared library path variable independent of PATH,\n\t  # but it turns out we can't easily determine that from inspecting\n\t  # libtool variables, so we have to hard-code the OSs to which it\n\t  # applies here; at the moment, that means platforms that use the PE\n\t  # object format with DLL files.  
See the long comment at the top of\n\t  # tests/bindir.at for full details.\n\t  tdlname=$dlname\n\t  case $host,$output,$installed,$module,$dlname in\n\t    *cygwin*,*lai,yes,no,*.dll | *mingw*,*lai,yes,no,*.dll | *cegcc*,*lai,yes,no,*.dll)\n\t      # If a -bindir argument was supplied, place the dll there.\n\t      if test -n \"$bindir\"; then\n\t\tfunc_relative_path \"$install_libdir\" \"$bindir\"\n\t\ttdlname=$func_relative_path_result/$dlname\n\t      else\n\t\t# Otherwise fall back on heuristic.\n\t\ttdlname=../bin/$dlname\n\t      fi\n\t      ;;\n\t  esac\n\t  $ECHO > $output \"\\\n# $outputname - a libtool library file\n# Generated by $PROGRAM (GNU $PACKAGE) $VERSION\n#\n# Please DO NOT delete this file!\n# It is necessary for linking the library.\n\n# The name that we can dlopen(3).\ndlname='$tdlname'\n\n# Names of this library.\nlibrary_names='$library_names'\n\n# The name of the static archive.\nold_library='$old_library'\n\n# Linker flags that cannot go in dependency_libs.\ninherited_linker_flags='$new_inherited_linker_flags'\n\n# Libraries that this one depends upon.\ndependency_libs='$dependency_libs'\n\n# Names of additional weak libraries provided by this library\nweak_library_names='$weak_libs'\n\n# Version information for $libname.\ncurrent=$current\nage=$age\nrevision=$revision\n\n# Is this an already installed library?\ninstalled=$installed\n\n# Should we warn about portability when linking against -modules?\nshouldnotlink=$module\n\n# Files to dlopen/dlpreopen\ndlopen='$dlfiles'\ndlpreopen='$dlprefiles'\n\n# Directory that this library needs to be installed in:\nlibdir='$install_libdir'\"\n\t  if test no,yes = \"$installed,$need_relink\"; then\n\t    $ECHO >> $output \"\\\nrelink_command=\\\"$relink_command\\\"\"\n\t  fi\n\tdone\n      }\n\n      # Do a symbolic link so that the libtool archive can be found in\n      # LD_LIBRARY_PATH before the program is installed.\n      func_show_eval '( cd \"$output_objdir\" && $RM \"$outputname\" && $LN_S 
\"../$outputname\" \"$outputname\" )' 'exit $?'\n      ;;\n    esac\n    exit $EXIT_SUCCESS\n}\n\nif test link = \"$opt_mode\" || test relink = \"$opt_mode\"; then\n  func_mode_link ${1+\"$@\"}\nfi\n\n\n# func_mode_uninstall arg...\nfunc_mode_uninstall ()\n{\n    $debug_cmd\n\n    RM=$nonopt\n    files=\n    rmforce=false\n    exit_status=0\n\n    # This variable tells wrapper scripts just to set variables rather\n    # than running their programs.\n    libtool_install_magic=$magic\n\n    for arg\n    do\n      case $arg in\n      -f) func_append RM \" $arg\"; rmforce=: ;;\n      -*) func_append RM \" $arg\" ;;\n      *) func_append files \" $arg\" ;;\n      esac\n    done\n\n    test -z \"$RM\" && \\\n      func_fatal_help \"you must specify an RM program\"\n\n    rmdirs=\n\n    for file in $files; do\n      func_dirname \"$file\" \"\" \".\"\n      dir=$func_dirname_result\n      if test . = \"$dir\"; then\n\todir=$objdir\n      else\n\todir=$dir/$objdir\n      fi\n      func_basename \"$file\"\n      name=$func_basename_result\n      test uninstall = \"$opt_mode\" && odir=$dir\n\n      # Remember odir for removal later, being careful to avoid duplicates\n      if test clean = \"$opt_mode\"; then\n\tcase \" $rmdirs \" in\n\t  *\" $odir \"*) ;;\n\t  *) func_append rmdirs \" $odir\" ;;\n\tesac\n      fi\n\n      # Don't error if the file doesn't exist and rm -f was used.\n      if { test -L \"$file\"; } >/dev/null 2>&1 ||\n\t { test -h \"$file\"; } >/dev/null 2>&1 ||\n\t test -f \"$file\"; then\n\t:\n      elif test -d \"$file\"; then\n\texit_status=1\n\tcontinue\n      elif $rmforce; then\n\tcontinue\n      fi\n\n      rmfiles=$file\n\n      case $name in\n      *.la)\n\t# Possibly a libtool archive, so verify it.\n\tif func_lalib_p \"$file\"; then\n\t  func_source $dir/$name\n\n\t  # Delete the libtool libraries and symlinks.\n\t  for n in $library_names; do\n\t    func_append rmfiles \" $odir/$n\"\n\t  done\n\t  test -n \"$old_library\" && func_append rmfiles \" 
$odir/$old_library\"\n\n\t  case $opt_mode in\n\t  clean)\n\t    case \" $library_names \" in\n\t    *\" $dlname \"*) ;;\n\t    *) test -n \"$dlname\" && func_append rmfiles \" $odir/$dlname\" ;;\n\t    esac\n\t    test -n \"$libdir\" && func_append rmfiles \" $odir/$name $odir/${name}i\"\n\t    ;;\n\t  uninstall)\n\t    if test -n \"$library_names\"; then\n\t      # Do each command in the postuninstall commands.\n\t      func_execute_cmds \"$postuninstall_cmds\" '$rmforce || exit_status=1'\n\t    fi\n\n\t    if test -n \"$old_library\"; then\n\t      # Do each command in the old_postuninstall commands.\n\t      func_execute_cmds \"$old_postuninstall_cmds\" '$rmforce || exit_status=1'\n\t    fi\n\t    # FIXME: should reinstall the best remaining shared library.\n\t    ;;\n\t  esac\n\tfi\n\t;;\n\n      *.lo)\n\t# Possibly a libtool object, so verify it.\n\tif func_lalib_p \"$file\"; then\n\n\t  # Read the .lo file\n\t  func_source $dir/$name\n\n\t  # Add PIC object to the list of files to remove.\n\t  if test -n \"$pic_object\" && test none != \"$pic_object\"; then\n\t    func_append rmfiles \" $dir/$pic_object\"\n\t  fi\n\n\t  # Add non-PIC object to the list of files to remove.\n\t  if test -n \"$non_pic_object\" && test none != \"$non_pic_object\"; then\n\t    func_append rmfiles \" $dir/$non_pic_object\"\n\t  fi\n\tfi\n\t;;\n\n      *)\n\tif test clean = \"$opt_mode\"; then\n\t  noexename=$name\n\t  case $file in\n\t  *.exe)\n\t    func_stripname '' '.exe' \"$file\"\n\t    file=$func_stripname_result\n\t    func_stripname '' '.exe' \"$name\"\n\t    noexename=$func_stripname_result\n\t    # $file with .exe has already been added to rmfiles,\n\t    # add $file without .exe\n\t    func_append rmfiles \" $file\"\n\t    ;;\n\t  esac\n\t  # Do a test to see if this is a libtool program.\n\t  if func_ltwrapper_p \"$file\"; then\n\t    if func_ltwrapper_executable_p \"$file\"; then\n\t      func_ltwrapper_scriptname \"$file\"\n\t      relink_command=\n\t      
func_source $func_ltwrapper_scriptname_result\n\t      func_append rmfiles \" $func_ltwrapper_scriptname_result\"\n\t    else\n\t      relink_command=\n\t      func_source $dir/$noexename\n\t    fi\n\n\t    # note $name still contains .exe if it was in $file originally\n\t    # as does the version of $file that was added into $rmfiles\n\t    func_append rmfiles \" $odir/$name $odir/${name}S.$objext\"\n\t    if test yes = \"$fast_install\" && test -n \"$relink_command\"; then\n\t      func_append rmfiles \" $odir/lt-$name\"\n\t    fi\n\t    if test \"X$noexename\" != \"X$name\"; then\n\t      func_append rmfiles \" $odir/lt-$noexename.c\"\n\t    fi\n\t  fi\n\tfi\n\t;;\n      esac\n      func_show_eval \"$RM $rmfiles\" 'exit_status=1'\n    done\n\n    # Try to remove the $objdir's in the directories where we deleted files\n    for dir in $rmdirs; do\n      if test -d \"$dir\"; then\n\tfunc_show_eval \"rmdir $dir >/dev/null 2>&1\"\n      fi\n    done\n\n    exit $exit_status\n}\n\nif test uninstall = \"$opt_mode\" || test clean = \"$opt_mode\"; then\n  func_mode_uninstall ${1+\"$@\"}\nfi\n\ntest -z \"$opt_mode\" && {\n  help=$generic_help\n  func_fatal_help \"you must specify a MODE\"\n}\n\ntest -z \"$exec_cmd\" && \\\n  func_fatal_help \"invalid operation mode '$opt_mode'\"\n\nif test -n \"$exec_cmd\"; then\n  eval exec \"$exec_cmd\"\n  exit $EXIT_FAILURE\nfi\n\nexit $exit_status\n\n\n# The TAGs below are defined such that we never get into a situation\n# where we disable both kinds of libraries.  Given conflicting\n# choices, we go for a static library, that is the most portable,\n# since we can't tell whether shared libraries were disabled because\n# the user asked for that or because the platform doesn't support\n# them.  
This is particularly important on AIX, because we don't\n# support having both static and shared libraries enabled at the same\n# time on that platform, so we default to a shared-only configuration.\n# If a disable-shared tag is given, we'll fall back to a static-only\n# configuration.  But we'll never go from static-only to shared-only.\n\n# ### BEGIN LIBTOOL TAG CONFIG: disable-shared\nbuild_libtool_libs=no\nbuild_old_libs=yes\n# ### END LIBTOOL TAG CONFIG: disable-shared\n\n# ### BEGIN LIBTOOL TAG CONFIG: disable-static\nbuild_old_libs=`case $build_libtool_libs in yes) echo no;; *) echo yes;; esac`\n# ### END LIBTOOL TAG CONFIG: disable-static\n\n# Local Variables:\n# mode:shell-script\n# sh-indentation:2\n# End:\n"
  },
  {
    "path": "requirements.txt",
    "content": "decorator>=4.3.0\njoblib>=0.14.1\nnumpy>=1.18.2\npandas>=1.0.3\nscipy>=1.4.1\nscikit-learn\nsympy>=1.4\nxgboost>=0.81\n"
  },
  {
    "path": "src/ChangeLog",
    "content": "version: 0.08.3\ndate: Wed Nov 13 11:39:01 CET 2019\nchanges:\n\t- support recent versions of clang\n\t- fix OpenMP support when contraction is enabled\n---\nversion: 0.08.2\ndate: Thu Mar 28 18:36:52 CET 2019\nchanges:\n\t- support recent versions of clang\n---\nversion: 0.08.1\ndate: Mon Jul 30 23:05:04 CEST 2018\nchanges:\n\t- move some functionality to isl\n---\nversion: 0.08\ndate: Sat Mar  3 15:31:38 CET 2018\nchanges:\n\t- minor fixes\n---\nversion: 0.07\ndate: Tue Feb  7 17:23:22 CET 2017\nchanges:\n\t- support hybrid tiling\n---\nversion: 0.06\ndate: Fri May  6 12:08:50 CEST 2016\nchanges:\n\t- use PPCG specific macro names in generated code\n\t- complete transition to schedule trees\n\t- maximize coincidence by default\n\t- map arrays with constant index expressions to private memory\n\t- optionally group chains of statements\n---\nversion: 0.05\ndate: Fri Jan 15 09:30:23 CET 2016\nchanges:\n\t- fix live-out computation\n\t- optionally compute schedule for C target\n\t- optionally perform tiling for C target\n\t- create single kernel for non-permutable subtree\n---\nversion: 0.04\ndate: Wed Jun 17 10:52:58 CEST 2015\nchanges:\n\t- use schedule trees\n\t- fix live-range reordering\n\t- improve generation of synchronization\n\t- exploit independences during dependence analysis\n"
  },
  {
    "path": "src/LICENSE",
    "content": "MIT License (MIT)\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of\nthis software and associated documentation files (the \"Software\"), to deal in\nthe Software without restriction, including without limitation the rights to\nuse, copy, modify, merge, publish, distribute, sublicense, and/or sell copies\nof the Software, and to permit persons to whom the Software is furnished to do\nso, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "src/Makefile.am",
    "content": "if BUNDLED_ISL\n    MAYBE_ISL = isl\n    ISL_LA = $(top_builddir)/isl/libisl.la\n    LOCAL_ISL_LA = isl/libisl.la\nendif\nif BUNDLED_BARVINOK\n    MAYBE_BARVINOK = barvinok\n    BARVINOK_LA = $(top_builddir)/barvinok/libbarvinok.la\nendif \nif BUNDLED_PET\n    MAYBE_PET = pet\n    PET_LA = $(top_builddir)/pet/libpet.la\nendif\n\nSUBDIRS = $(MAYBE_ISL) $(MAYBE_BARVINOK) $(MAYBE_PET) .\n\nFORCE:\nisl/libisl.la: FORCE\n\tcd isl; $(MAKE) $(AM_MAKEFLAGS) libisl.la\nbarvinok/libbarvinok.la: FORCE\n\tcd barvinok; $(MAKE) $(AM_MAKEFLAGS) libbarvinok.la\npet/libpet.la: FORCE\n\tcd pet; $(MAKE) $(AM_MAKEFLAGS) libpet.la\n\nACLOCAL_AMFLAGS = -I m4\n\nLIB_ISL = $(ISL_LA) @ISL_LIBS@\nLIB_BARVINOK = $(BARVINOK_LA) @BARVINOK_LIBS@\nLIB_PET = $(PET_LA) @PET_LIBS@\n\nAM_CPPFLAGS = @ISL_CFLAGS@ @BARVINOK_CFLAGS@ @PET_CFLAGS@\nLDADD = $(LIB_PET) $(LIB_ISL) $(LIB_BARVINOK)\nAM_CXXFLAGS = -std=c++11\nbin_PROGRAMS = autosa\nautosa_SOURCES = \\\n\tcpu.c \\\n\tcpu.h \\\n\tgrouping.c \\\n\tgrouping.h \\\n\thybrid.c \\\n\thybrid.h \\\n\tschedule.c \\\n\tschedule.h \\\n\tppcg_options.c \\\n\tppcg_options.h \\\n\tppcg.c \\\n\tppcg.h \\\n\tprint.c \\\n\tprint.h \\\n\tutil.c \\\n\tutil.h \\\n\tmain.cpp \\\n\tcJSON/cJSON.c \\\n\tautosa_codegen.cpp \\\n\tautosa_comm.cpp \\\n\tautosa_common.cpp \\\n\tautosa_cpu.cpp \\\n\tautosa_intel_opencl.cpp \\\n\tautosa_print.cpp \\\n\tautosa_schedule_tree.cpp \\\n\tautosa_t2s.cpp \\\n\tautosa_trans.cpp \\\n\tautosa_utils.cpp \\\n\tautosa_xilinx_hls_c.cpp  \\\n\tautosa_catapult_hls_c.cpp \\\n\tautosa_tapa_cpp.cpp \\\n\tautosa_tuning.cpp \\\n\tjson.hpp\n\n#TESTS = @extra_tests@\n#EXTRA_TESTS = opencl_test.sh polybench_test.sh\n#TEST_EXTENSIONS = .sh\n\n#BUILT_SOURCES = gitversion.h\n\n#CLEANFILES = gitversion.h\n\n#EXTRA_DIST = \\\n#\texamples \\\n#\tocl_utilities.c \\\n#\tocl_utilities.h \\\n#\ttests\n\n#dist-hook:\n#\techo @GIT_HEAD_VERSION@ > $(distdir)/GIT_HEAD_ID\n#\n#gitversion.h: @GIT_HEAD@\n#\t$(AM_V_GEN)echo '#define GIT_HEAD_ID 
\"'@GIT_HEAD_VERSION@'\"' > $@\n#\n#cpu.c \\\n#cpu.h \\\n#cuda.c \\\n#cuda.h \\\n#opencl.c \\\n#opencl.h \\\n#cuda_common.h \\\n#cuda_common.c \\\n#gpu.c \\\n#gpu.h \\\n#gpu_array_tile.c \\\n#gpu_array_tile.h \\\n#gpu_group.c \\\n#gpu_group.h \\\n#gpu_hybrid.c \\\n#gpu_hybrid.h \\\n#gpu_print.c \\\n#gpu_print.h \\\n#gpu_tree.c \\\n#gpu_tree.h\n"
  },
  {
    "path": "src/README",
    "content": "Requirements:\n\n- automake, autoconf, libtool\n\t(not needed when compiling a release)\n- pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)\n\t(not needed when compiling a release using the included isl and pet)\n- gmp (http://gmplib.org/)\n- libyaml (http://pyyaml.org/wiki/LibYAML)\n\t(only needed if you want to compile the pet executable)\n- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)\n\tUnless you have some other reasons for wanting to use the svn version,\n\tit is best to install the latest release (3.9).\n\tFor more details, see pet/README.\n\nIf you are installing on Ubuntu, then you can install the following packages:\n\nautomake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm\n\nNote that you need at least version 3.2 of libclang-dev (Ubuntu raring).\nOlder versions of this package did not include the required libraries.\nIf you are using an older version of Ubuntu, then you need to compile and\ninstall LLVM/clang from source.\n\n\nPreparing:\n\nGrab the latest release and extract it or get the source from\nthe git repository as follows.  This process requires autoconf,\nautomake, libtool and pkg-config.\n\n\tgit clone git://repo.or.cz/ppcg.git\n\tcd ppcg\n\t./get_submodules.sh\n\t./autogen.sh\n\n\nCompilation:\n\n\t./configure\n\tmake\n\tmake check\n\nIf you have installed any of the required libraries in a non-standard\nlocation, then you may need to use the --with-gmp-prefix,\n--with-libyaml-prefix and/or --with-clang-prefix options\nwhen calling \"./configure\".\n\n\nUsing PPCG to generate CUDA or OpenCL code\n\nTo convert a fragment of a C program to CUDA, insert a line containing\n\n\t#pragma scop\n\nbefore the fragment and add a line containing\n\n\t#pragma endscop\n\nafter the fragment.  To generate CUDA code run\n\t\n\tppcg --target=cuda file.c\n\nwhere file.c is the file containing the fragment.  
The generated\ncode is stored in file_host.cu and file_kernel.cu.\n\nTo generate OpenCL code run\n\n\tppcg --target=opencl file.c\n\nwhere file.c is the file containing the fragment.  The generated code\nis stored in file_host.c and file_kernel.cl.\n\n\nSpecifying tile, grid and block sizes\n\nThe iteration space tile size, grid size and block size can\nbe specified using the --sizes option.  The argument is a union map\nin isl notation mapping kernels identified by their sequence number\nin a \"kernel\" space to singleton sets in the \"tile\", \"grid\" and \"block\"\nspaces.  The sizes are specified outermost to innermost.\n\nThe dimension of the \"tile\" space indicates the (maximal) number of loop\ndimensions to tile.  The elements of the single integer tuple\nspecify the tile sizes in each dimension.\nIn the case of hybrid tiling, the first element is half the size of\nthe tile in the time (sequential) dimension.  The second element\nspecifies the number of elements in the base of the hexagon.\nThe remaining elements specify the tile sizes in the remaining space\ndimensions.\n\nThe dimension of the \"grid\" space indicates the (maximal) number of block\ndimensions in the grid.  The elements of the single integer tuple\nspecify the number of blocks in each dimension.\n\nThe dimension of the \"block\" space indicates the (maximal) number of thread\ndimensions in a block.  The elements of the single integer tuple\nspecify the number of threads in each dimension.\n\nFor example,\n\n    { kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }\n\nspecifies that in kernel 0, two loops should be tiled with a tile\nsize of 64 in both dimensions and that all kernels except kernel 4\nshould be run using a block of 16 threads.\n\nSince PPCG performs some scheduling, it can be difficult to predict\nwhat exactly will end up in a kernel.  
If you want to specify\ntile, grid or block sizes, you may want to run PPCG first with the defaults,\nexamine the kernels and then run PPCG again with the desired sizes.\nInstead of examining the kernels, you can also specify the option\n--dump-sizes on the first run to obtain the effectively used default sizes.\n\n\nCompiling the generated CUDA code with nvcc\n\nTo get optimal performance from nvcc, it is important to choose --arch\naccording to your target GPU.  Specifically, use the flag \"--arch sm_20\"\nfor Fermi, \"--arch sm_30\" for GK10x Kepler and \"--arch sm_35\" for\nGK110 Kepler.  We discourage the use of older cards as we have seen\ncorrectness issues with compilation for older architectures.\nNote that in the absence of any --arch flag, nvcc defaults to\n\"--arch sm_13\". This will not only be slower, but can also cause\ncorrectness issues.\nIf you want to obtain results that are identical to those obtained\nby the original code, then you may need to disable some optimizations\nby passing the \"--fmad=false\" option.\n\n\nCompiling the generated OpenCL code with gcc\n\nTo compile the host code you need to link against the file\nocl_utilities.c which contains utility functions used by the generated\nOpenCL host code.  To compile the host code with gcc, run\n\n  gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL\n\nNote that we have experienced the generated OpenCL code freezing\non some inputs (e.g., the PolyBench symm benchmark) when using\nat least some versions of the Nvidia OpenCL library, while the\ncorresponding CUDA code runs fine.\nWe have experienced no such freezes when using AMD, ARM or Intel\nOpenCL libraries.\n\nBy default, the compiled executable will need the _kernel.cl file at\nrun time.  Alternatively, the option --opencl-embed-kernel-code may be\ngiven to place the kernel code in a string literal.  The kernel code is\nthen compiled into the host binary, such that the _kernel.cl file is no\nlonger needed at run time.  
Any kernel include files, in particular\nthose supplied using --opencl-include-file, will still be required at\nrun time.\n\n\nFunction calls\n\nFunction calls inside the analyzed fragment are reproduced\nin the CUDA or OpenCL code, but for now it is left to the user\nto make sure that the functions that are being called are\navailable from the generated kernels.\n\nIn the case of OpenCL code, the --opencl-include-file option\nmay be used to specify one or more files to be #include'd\nfrom the generated code.  These files may then contain\nthe definitions of the functions being called from the\nprogram fragment.  If the pathnames of the included files\nare relative to the current directory, then you may need\nto additionally specify the option --opencl-compiler-options=-I.\nto make sure that the files can be found by the OpenCL compiler.\nThe included files may contain definitions of types used by the\ngenerated kernels.  By default, PPCG generates definitions for\ntypes as needed, but these definitions may collide with those in\nthe included files, as PPCG does not consider the contents of the\nincluded files.  The --no-opencl-print-kernel-types option will prevent\nPPCG from generating type definitions.\n\n\nGNU extensions\n\nBy default, PPCG may print out macro definitions that involve\nGNU extensions such as __typeof__ and statement expressions.\nSome compilers may not support these extensions.\nIn particular, OpenCL 1.2 beignet 1.1.1 (git-6de6918)\nhas been reported not to support __typeof__.\nThe use of these extensions can be turned off with the\n--no-allow-gnu-extensions option.\n\n\nProcessing PolyBench\n\nWhen processing a PolyBench/C 3.2 benchmark, you should always specify\n-DPOLYBENCH_USE_C99_PROTO on the ppcg command line.  
Otherwise, the source\nfiles are inconsistent, having fixed-size arrays but parametrically\nbounded loops iterating over them.\nHowever, you should not specify this define when compiling\nthe PPCG-generated code using nvcc since CUDA does not support VLAs.\n\n\nCUDA and function overloading\n\nWhile CUDA supports function overloading based on the argument types,\nno such function overloading exists in the input language C.  Since PPCG\nsimply prints out the same function name as in the original code, this\nmay result in a different function being called based on the types\nof the arguments.  For example, if the original code contains a call\nto the function sqrt() with a float argument, then the argument will\nbe promoted to a double and the sqrt() function will be called.\nIn the transformed (CUDA) code, however, overloading will cause the\nfunction sqrtf() to be called.  Until this issue has been resolved in PPCG,\nwe recommend that users either explicitly call the function sqrtf() or\nexplicitly cast the argument to double in the input code.\n\n\nContact\n\nFor bug reports, feature requests and questions,\ncontact http://groups.google.com/group/isl-development\n\nWhenever you report a bug, please mention the exact version of PPCG\nthat you are using (output of \"./ppcg --version\").  If you are unable\nto compile PPCG, then report the git version (output of \"git describe\")\nor the version number included in the name of the tarball.\n\n\nCiting PPCG\n\nIf you use PPCG for your research, you are invited to cite\nthe following paper.\n\n@article{Verdoolaege2013PPCG,\n    author = {Verdoolaege, Sven and Juega, Juan Carlos and Cohen, Albert and\n\t\tG\\'{o}mez, Jos{\\'e} Ignacio and Tenllado, Christian and\n\t\tCatthoor, Francky},\n    title = {Polyhedral parallel code generation for CUDA},\n    journal = {ACM Trans. Archit. 
Code Optim.},\n    issue_date = {January 2013},\n    volume = {9},\n    number = {4},\n    month = jan,\n    year = {2013},\n    issn = {1544-3566},\n    pages = {54:1--54:23},\n    doi = {10.1145/2400682.2400713},\n    acmid = {2400713},\n    publisher = {ACM},\n    address = {New York, NY, USA},\n}\n"
  },
  {
    "path": "src/autogen.sh",
    "content": "#!/bin/sh\nautoreconf -i\nif test -f isl/autogen.sh; then\n\t(cd isl; ./autogen.sh)\nfi\nif test -f barvinok/autogen.sh; then\n  (cd barvinok; ./autogen.sh)\nfi\nif test -f pet/autogen.sh; then\n\t(cd pet; ./autogen.sh)\nfi\n"
  },
  {
    "path": "src/autosa_catapult_hls_c.cpp",
    "content": "#include <isl/ctx.h>\n\n#include \"autosa_catapult_hls_c.h\"\n#include \"autosa_common.h\"\n#include \"autosa_comm.h\"\n#include \"autosa_print.h\"\n#include \"autosa_trans.h\"\n#include \"autosa_codegen.h\"\n#include \"autosa_utils.h\"\n\n#include <set>\n\nstruct print_host_user_data\n{\n  struct hls_info *hls;\n  struct autosa_prog *prog;\n  struct autosa_hw_top_module *top;\n};\n\nstruct print_hw_module_data\n{\n  struct hls_info *hls;\n  struct autosa_prog *prog;\n  struct autosa_hw_module *module;\n  /* Used for double buffer codegen. Modify the printed iterator prefix. */\n  const char *iterator_prefix;\n};\n\n/* Open the host .cpp file and the kernel .h and .cpp files for writing.\n * Add the necessary includes.\n */\nstatic void hls_open_files(struct hls_info *info, const char *input)\n{\n  char name[PATH_MAX];\n  char dir[PATH_MAX];\n  int len, len_dir;\n  isl_printer *p_str;\n  char *file_path;\n\n  p_str = isl_printer_to_str(info->ctx);\n  p_str = isl_printer_print_str(p_str, info->output_dir);\n  p_str = isl_printer_print_str(p_str, \"/src/\");\n  file_path = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  len = ppcg_extract_base_name(name, input);\n\n  /* Store the prefix */\n  strncpy(dir, name, len);\n  dir[len] = '\\0';\n  p_str = isl_printer_to_str(info->ctx);\n  p_str = isl_printer_print_str(p_str, dir);\n  info->kernel_prefix = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  /* Add the prefix */\n  sprintf(dir, \"%s\", file_path);\n  len_dir = strlen(file_path);\n\n  strcpy(name + len, \"_host.cpp\");\n  strcpy(dir + len_dir, name);\n  info->host_c = fopen(dir, \"w\");\n  if (!info->host_c)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  //if (!info->hls)\n  //{\n  //  /* OpenCL host */\n  //  strcpy(name + len, \"_host.hpp\");\n  //  strcpy(dir + len_dir, name);\n  //  info->host_h = fopen(dir, \"w\");\n  //  print_xilinx_host_header(info->host_h);\n  //  
fprintf(info->host_c, \"#include \\\"%s\\\"\\n\", name);\n  //}\n\n  strcpy(name + len, \"_directives.tcl\");\n  strcpy(dir + len_dir, name);\n  info->tcl = fopen(dir, \"w\");\n  if (!info->tcl) \n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  strcpy(name + len, \"_kernel_modules.cpp\");\n  strcpy(dir + len_dir, name);\n  info->kernel_c = fopen(dir, \"w\");\n  if (!info->kernel_c)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  strcpy(name + len, \"_kernel.h\");\n  strcpy(dir + len_dir, name);\n  info->kernel_h = fopen(dir, \"w\");\n  if (!info->kernel_h)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  //fprintf(info->host_c, \"#include <assert.h>\\n\");\n  //fprintf(info->host_c, \"#include <stdio.h>\\n\");\n  fprintf(info->host_c, \"#include <vector>\\n\");\n  fprintf(info->host_c, \"#include <cstdlib>\\n\");\n  if (info->hls)\n    fprintf(info->host_c, \"#include \\\"%s\\\"\\n\", name);\n\n  if (info->hls) {\n    fprintf(info->kernel_c, \"#include \\\"%s\\\"\\n\", name);\n    //fprintf(info->kernel_c, \"#include <mc_scverify.h>\\n\");\n  }\n\n  if (info->hls) {\n    strcpy(name + len, \"_kernel_hw.h\");\n    fprintf(info->host_c, \"#include \\\"%s\\\"\\n\", name);\n    fprintf(info->host_c, \"#include <mc_scverify.h>\\n\\n\");\n  }    \n\n  strcpy(name + len, \"_top_gen.cpp\");\n  strcpy(dir + len_dir, name);\n  info->top_gen_c = fopen(dir, \"w\");\n\n  strcpy(name + len, \"_top_gen.h\");\n  strcpy(dir + len_dir, name);\n  info->top_gen_h = fopen(dir, \"w\");\n\n  fprintf(info->top_gen_c, \"#include <isl/printer.h>\\n\");\n  fprintf(info->top_gen_c, \"#include \\\"%s\\\"\\n\", name);\n  \n  fprintf(info->kernel_h, \"#ifndef _KERNEL_H_\\n\");\n  fprintf(info->kernel_h, \"#define _KERNEL_H_\\n\");\n  fprintf(info->kernel_h, \"#include <ac_int.h>\\n\");\n  fprintf(info->kernel_h, \"#include <ac_channel.h>\\n\");\n  
fprintf(info->kernel_h, \"#include <ac_float.h>\\n\");\n  fprintf(info->kernel_h, \"#include <ac_std_float.h>\\n\");\n  fprintf(info->kernel_h, \"#include <ac_math.h>\\n\");\n  fprintf(info->kernel_h, \"\\n\");\n\n  fprintf(info->kernel_h, \"#define min(x,y) ((x < y) ? x : y)\\n\");\n  fprintf(info->kernel_h, \"#define max(x,y) ((x > y) ? x : y)\\n\");\n  fprintf(info->kernel_h, \"\\n\");\n\n  free(file_path);\n}\n\n/* Close all output files.\n */\nstatic void hls_close_files(struct hls_info *info)\n{\n  isl_printer *p_str;\n  char *complete;\n  FILE *f;\n\n  fprintf(info->kernel_h, \"#endif\\n\\n\");\n\n  fclose(info->kernel_c);\n  fclose(info->kernel_h);\n  fclose(info->host_c);\n  if (!info->hls)\n  {\n    fclose(info->host_h);\n  }\n  fclose(info->top_gen_c);\n  fclose(info->top_gen_h);\n  fclose(info->tcl);\n  free(info->kernel_prefix);\n\n  p_str = isl_printer_to_str(info->ctx);\n  p_str = isl_printer_print_str(p_str, info->output_dir);\n  p_str = isl_printer_print_str(p_str, \"/src/completed\");\n  complete = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  f = fopen(complete, \"w\");\n  fclose(f);\n  free(complete);\n}\n\n/* Extract the data pack factors for each I/O buffer allocated for the current\n * I/O group.\n * Only insert the data pack factor that is not found in the current list\n * \"data_pack_factors\".\n * The list is in ascending order.\n */\nstatic int *extract_data_pack_factors(int *data_pack_factors,\n                                      int *n_factor, struct autosa_array_ref_group *group)\n{\n  /* Test if the group default packing factor needs to be inserted */\n  if (group->n_lane > 1)\n  {    \n    int n_lane = group->n_lane;\n    bool insert = true;\n    int pos = 0;\n    for (pos = 0; pos < *n_factor; pos++)\n    {\n      if (n_lane > data_pack_factors[pos])\n      {\n        if (pos < *n_factor - 1)\n        {\n          if (n_lane < data_pack_factors[pos + 1])\n          {\n            // insert @pos+1\n            pos++;\n 
           break;\n          }\n        }\n      }\n      else if (n_lane == data_pack_factors[pos])\n      {\n        insert = false;\n        break;\n      }\n    }\n\n    if (insert) {\n      *n_factor = *n_factor + 1;\n      data_pack_factors = (int *)realloc(data_pack_factors,\n                                         sizeof(int) * (*n_factor));\n      for (int j = *n_factor - 1; j > pos; j--)\n      {\n        data_pack_factors[j] = data_pack_factors[j - 1];\n      }\n      data_pack_factors[pos] = n_lane;\n    }\n  }\n\n  for (int i = 0; i < group->n_io_buffer; i++)\n  {\n    struct autosa_io_buffer *buf = group->io_buffers[i];\n    bool insert = true;\n    int pos = 0;\n    for (pos = 0; pos < *n_factor; pos++)\n    {\n      if (buf->n_lane > data_pack_factors[pos])\n      {\n        if (pos < *n_factor - 1)\n        {\n          if (buf->n_lane < data_pack_factors[pos + 1])\n          {\n            // insert @pos+1\n            pos++;\n            break;\n          }\n        }\n      }\n      else if (buf->n_lane == data_pack_factors[pos])\n      {\n        insert = false;\n        break;\n      }\n    }\n\n    if (!insert)\n      continue;\n\n    *n_factor = *n_factor + 1;\n    data_pack_factors = (int *)realloc(data_pack_factors,\n                                       sizeof(int) * (*n_factor));\n    for (int j = *n_factor - 1; j > pos; j--)\n    {\n      data_pack_factors[j] = data_pack_factors[j - 1];\n    }\n    data_pack_factors[pos] = buf->n_lane;\n  }\n\n  return data_pack_factors;\n}\n\n/* Examine the local buffers of each array group. \n * Extract the data pack factors and build the data types \n * required by the program. 
\n */\nstatic isl_stat print_data_types_catapult(\n  struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  isl_printer *p;\n  struct autosa_kernel *kernel;\n\n  kernel = top->kernel;\n  p = isl_printer_to_file(kernel->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_str_new_line(p, \"/* Data Type */\");\n  \n  /* Print the primitive data type. */\n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    if (!strcmp(local->array->type, \"float\")) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"typedef ac_ieee_float<binary32> \");\n      p = isl_printer_print_str(p, local->array->name);\n      p = isl_printer_print_str(p, \"_t1;\");\n      p = isl_printer_end_line(p);\n    } else if (!strcmp(local->array->type, \"unsigned short\")) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"typedef ac_int<\");\n      p = isl_printer_print_int(p, local->array->size * 8);\n      p = isl_printer_print_str(p, \",false> \");\n      p = isl_printer_print_str(p, local->array->name);\n      p = isl_printer_print_str(p, \"_t1;\");\n      p = isl_printer_end_line(p);      \n    } else if (!strcmp(local->array->type, \"unsigned int\")) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"typedef ac_int<\");\n      p = isl_printer_print_int(p, local->array->size * 8);\n      p = isl_printer_print_str(p, \",false> \");\n      p = isl_printer_print_str(p, local->array->name);\n      p = isl_printer_print_str(p, \"_t1;\");\n      p = isl_printer_end_line(p);      \n    } else {\n      printf(\"[AutoSA] Warning: The primitive data type is not converted to Catapult data type.\\n\");\n      continue;\n    }\n  }\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    int *data_pack_factors = (int *)malloc(sizeof(int));\n    int n_factor = 1;\n  
  /* First insert the default data pack factor for the array. */\n    data_pack_factors[0] = local->n_lane;    \n\n    /* IO group */\n    for (int n = 0; n < local->n_io_group; n++)\n    {\n      struct autosa_array_ref_group *group = local->io_groups[n];\n      data_pack_factors = extract_data_pack_factors(data_pack_factors, &n_factor, group);\n    }\n    /* Drain group */\n    if (local->drain_group)\n      data_pack_factors = extract_data_pack_factors(data_pack_factors, &n_factor, local->drain_group);\n\n    if (local->is_sparse) {\n      std::set<int> tmp_lanes;\n      for (int n = 0; n < n_factor; n++) {\n        tmp_lanes.insert(data_pack_factors[n] * kernel->n_nzero);\n        tmp_lanes.insert(data_pack_factors[n]);\n      }\n      for (auto it = tmp_lanes.begin(); it != tmp_lanes.end(); ++it) {\n        int f = *it;\n        if (local->array->size * 8 * f > 1024) {\n          printf(\"[AutoSA] Warning: The data width %d is greater than 1024-bit. The type definition is not generated.\\n\", local->array->size * 8 * f);\n          continue;\n        }\n        if (f > 1) {\n          p = isl_printer_start_line(p);\n          //p = isl_printer_print_str(p, \"typedef ap_uint<\");\n          p = isl_printer_print_str(p, \"typedef ac_int<\");\n          p = isl_printer_print_int(p, local->array->size * 8 * f);\n          p = isl_printer_print_str(p, \",false\");\n          p = isl_printer_print_str(p, \"> \");\n          p = isl_printer_print_str(p, local->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, f);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n        }\n      }\n\n      for (int n = 0; n < n_factor; n++) {\n        if (data_pack_factors[n] * kernel->n_nzero * local->array->size * 8 > 1024)\n          continue;\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"typedef struct \");\n        p = isl_printer_print_str(p, 
local->array->name);\n        p = isl_printer_print_str(p, \"_s_t\");\n        p = isl_printer_print_int(p, data_pack_factors[n]);\n        p = isl_printer_print_str(p, \" {\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, 2);\n        \n        p = isl_printer_start_line(p);\n        if (data_pack_factors[n] == 1 && kernel->n_nzero == 1) {\n          p = isl_printer_print_str(p, local->array->type);\n        } else {\n          p = isl_printer_print_str(p, local->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, data_pack_factors[n] * kernel->n_nzero);\n        }\n        p = isl_printer_print_str(p, \" d;\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        if (data_pack_factors[n] == 1 && kernel->n_nzero == 1) {\n          p = isl_printer_print_str(p, \"unsigned char\");  \n        } else {\n          //p = isl_printer_print_str(p, \"ap_uint<\");\n          p = isl_printer_print_str(p, \"ac_int<\");\n          p = isl_printer_print_int(p, 8 * data_pack_factors[n]);\n          p = isl_printer_print_str(p, \",false\");\n          p = isl_printer_print_str(p, \">\");\n        }\n        p = isl_printer_print_str(p, \" i;\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, -2);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"} \");\n        p = isl_printer_print_str(p, local->array->name);\n        p = isl_printer_print_str(p, \"_s_t\");\n        p = isl_printer_print_int(p, data_pack_factors[n]);\n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n      }\n    } else {\n      for (int n = 0; n < n_factor; n++)\n      {\n        if (data_pack_factors[n] != 1)\n        {\n          int width;\n          width = local->array->size * 8 * data_pack_factors[n];\n          p = isl_printer_start_line(p);\n          //p = isl_printer_print_str(p, \"typedef 
ap_uint<\");\n          p = isl_printer_print_str(p, \"typedef ac_int<\");\n          p = isl_printer_print_int(p, width);\n          p = isl_printer_print_str(p, \",false\");\n          p = isl_printer_print_str(p, \"> \");\n          p = isl_printer_print_str(p, local->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, data_pack_factors[n]);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n        }\n      }\n    }\n    free(data_pack_factors);    \n  }\n  p = print_str_new_line(p, \"/* Data Type */\");\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\nstatic __isl_give isl_printer *declare_and_allocate_cpu_arrays_catapult(\n  __isl_take isl_printer *p, struct autosa_prog *prog, \n  struct autosa_kernel *kernel, struct autosa_hw_top_module *top)\n{\n  p = print_str_new_line(p, \"// Allocate memory in host memory\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    if (local_array->n_mem_ports > 1 && local_array->array->copy_out)\n    {\n      /* Create multiple host buffers. 
*/\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *> \");\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize) {\n        p = isl_printer_print_str(p, \"_unserialized\");\n      }\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"_tmp\");\n      p = isl_printer_print_str(p, \" = (\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *)malloc(\");\n      p = autosa_array_info_print_data_size(p, local_array->array);      \n      p = isl_printer_print_str(p, \" * sizeof(\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \"));\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \".push_back(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"_tmp);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = 
print_str_new_line(p, \"}\");\n\n      if (local_array->host_serialize) {\n        /* Allocate additional serialize buffer. */\n        /* Create multiple host buffers. */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::vector<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \" *> \");\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);      \n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        p = isl_printer_print_int(p, local_array->n_mem_ports);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \" *dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_tmp\");\n        p = isl_printer_print_str(p, \" = (\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \" *)malloc(\");\n        //p = autosa_array_info_print_data_size(p, local_array->array);\n        p = isl_printer_print_pw_qpolynomial(p, local_array->serialize_bound);\n        if (local_array->is_sparse) {\n          p = isl_printer_print_str(p, \" / \");\n          p = isl_printer_print_double(p, (double)local_array->eff_compress_ratio);\n        }\n        p = isl_printer_print_str(p, \" * sizeof(\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \"));\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"dev_\");\n        p = 
isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \".push_back(dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_tmp);\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n      }\n    }\n    else\n    {\n      /* Create a single host buffer. */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \" = (\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *)malloc(\");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \" * sizeof(\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \"));\");\n      p = isl_printer_end_line(p);\n\n      if (local_array->host_serialize) {\n        /* Create a single host buffer. 
*/\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \" *dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);       \n        p = isl_printer_print_str(p, \" = (\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \" *)malloc(\");        \n        p = isl_printer_print_pw_qpolynomial(p, local_array->serialize_bound);\n        if (local_array->is_sparse) {\n          p = isl_printer_print_str(p, \" / \");\n          p = isl_printer_print_double(p, (double)local_array->eff_compress_ratio);\n        }\n        p = isl_printer_print_str(p, \" * sizeof(\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \"));\");\n        p = isl_printer_end_line(p);\n      }\n    }    \n  }\n  p = isl_printer_end_line(p);\n\n  /* Initialize buffer. */\n  p = print_str_new_line(p, \"// Initialize host buffers\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    if (local_array->n_mem_ports > 1 && local_array->array->copy_out)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"memcpy(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \"[i]\");      \n      p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, 
local_array->array->name);\n      if (local_array->is_sparse)\n        p = isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \", \");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \" * sizeof(\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \"));\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n    }\n    else\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"memcpy(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->is_sparse)\n        p = isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \", \");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \" * sizeof(\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \"));\");\n      p = isl_printer_end_line(p);\n    }\n  }\n  \n  /* Perform data serialization if needed. 
*/\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->serialize_tree && module->in) {\n      struct autosa_array_ref_group *group = module->io_groups[0];\n      struct autosa_local_array_info *local_array = group->local_array;\n      if (local_array->n_mem_ports > 1 && local_array->array->copy_out)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        p = isl_printer_print_int(p, local_array->n_mem_ports);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n  \n        p = isl_printer_start_line(p);        \n        p = isl_printer_print_str(p, module->in? \"host_serialize_\" : \"host_deserialize_\");\n        p = isl_printer_print_str(p, local_array->array->name);            \n        p = isl_printer_print_str(p, \"(\");\n        p = print_host_serialize_arguments(p, kernel, group, module, 0, 0);  // TODO: add hbm support later.\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n  \n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n      } else \n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, module->in? 
\"host_serialize_\" : \"host_deserialize_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"(\");\n        p = print_host_serialize_arguments(p, kernel, group, module, 0, 0);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }  \n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"// Allocate buffers in device memory\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"std::vector<\");\n    p = autosa_print_array_type(p, local_array->array);\n    p = isl_printer_print_str(p, \" *> buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  }\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    int indent1, indent2;\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, local_array->n_mem_ports);\n    p = isl_printer_print_str(p, \"; i++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    p = isl_printer_start_line(p);\n    p = autosa_print_array_type(p, local_array->array);\n    p = isl_printer_print_str(p, \" *buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_tmp = (\");\n    p = autosa_print_array_type(p, local_array->array);\n    p = isl_printer_print_str(p, \" *)malloc(\");\n    if (local_array->host_serialize) {\n      p = autosa_array_info_print_serialize_data_size(p, local_array->array);\n    
} else {\n      p = autosa_array_info_print_data_size(p, local_array->array);\n    }\n    p = isl_printer_print_str(p, \" / \");\n    p = isl_printer_print_int(p, local_array->array->n_lane);\n    p = isl_printer_print_str(p, \" * sizeof(\");\n    p = autosa_print_array_type(p, local_array->array);\n    p = isl_printer_print_str(p, \"));\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \".push_back(buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_tmp);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print code for initializing the device for execution of the transformed\n * code. This includes declaring locally defined variables as well as\n * declaring and allocating the required copies of arrays on the device.\n */\nstatic __isl_give isl_printer *init_device_catapult(__isl_take isl_printer *p,\n                                                    struct autosa_prog *prog, \n                                                    struct autosa_kernel *kernel, \n                                                    int hls,\n                                                    struct autosa_hw_top_module *top)\n{\n  p = autosa_print_local_declarations(p, prog);\n  //if (!hls)\n  //{\n  //  p = find_device_catapult(p);\n  //  p = declare_and_allocate_device_arrays_catapult(p, prog, kernel, top);\n  //}\n  //else\n  //{\n  p = declare_and_allocate_cpu_arrays_catapult(p, prog, kernel, top);\n  //}\n\n  return p;\n}\n\nstatic __isl_give isl_printer *autosa_free_cpu_arrays_catapult(\n  __isl_take isl_printer *p, struct autosa_prog *prog, struct autosa_kernel *kernel)\n{\n  p = print_str_new_line(p, \"// Clean up 
resources\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, local_array->n_mem_ports);\n    p = isl_printer_print_str(p, \"; i++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"free(buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"[i]);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    if (local_array->n_mem_ports > 1 && local_array->array->copy_out)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"free(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"[i]);\");\n      p = isl_printer_end_line(p);\n\n      if (local_array->host_serialize) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"free(dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_unserialized\");\n        p = isl_printer_print_str(p, \"[i]);\");\n        p = isl_printer_end_line(p);\n      }\n\n      p 
= isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n    }\n    else\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"free(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \");\");\n      p = isl_printer_end_line(p);\n\n      if (local_array->host_serialize) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"free(dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_unserialized\");\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n\n  return p;\n}\n\n/* Print code for clearing the device after execution of the transformed code.\n * In particular, free the memory that was allocated on the device.\n */\nstatic __isl_give isl_printer *clear_device_catapult(__isl_take isl_printer *p,\n                                                   struct autosa_prog *prog, \n                                                   struct autosa_kernel *kernel, \n                                                   int hls,\n                                                   struct autosa_hw_top_module *top)\n{  \n  /* Deserialize the buffer data if necessary. 
*/\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->serialize_tree && !module->in) {\n      struct autosa_array_ref_group *group = module->io_groups[0];\n      struct autosa_local_array_info *local_array = group->local_array;\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"host_deserialize_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"(\");      \n      p = print_host_serialize_arguments(p, top->kernel, group, module, 0, 0);  // TODO: add hbm support later.\n      p = isl_printer_print_str(p, \");\");      \n      p = isl_printer_end_line(p);\n    }\n  }\n\n  if (hls)\n  {\n    /* Restore buffer */\n    p = print_str_new_line(p, \"// Restore data from host buffers\");\n    for (int i = 0; i < prog->n_array; i++)\n    {\n      struct autosa_array_info *array = &prog->array[i];\n      if (!autosa_array_requires_device_allocation(array))\n        continue;\n\n      if (array->copy_out)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"memcpy(\");\n        p = isl_printer_print_str(p, array->name);\n        p = isl_printer_print_str(p, \", dev_\");\n        p = isl_printer_print_str(p, array->name);\n        if (array->local_array->host_serialize) {\n          p = isl_printer_print_str(p, \"_unserialized\");\n        }\n        if (array->local_array->n_mem_ports > 1)\n        {\n          p = isl_printer_print_str(p, \"[0]\");\n        }\n        p = isl_printer_print_str(p, \", \");\n        p = autosa_array_info_print_size(p, array);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n    p = isl_printer_end_line(p);\n    p = autosa_free_cpu_arrays_catapult(p, prog, kernel);\n  }\n  //else\n  //{\n  //  /* Restore buffer */\n  //  p = print_str_new_line(p, \"// Restore data from host buffers\");\n  //  for (int i = 0; 
i < prog->n_array; i++)\n  //  {\n  //    struct autosa_array_info *array = &prog->array[i];\n  //    if (!autosa_array_requires_device_allocation(array))\n  //      continue;\n//\n  //    if (array->copy_out)\n  //    {\n  //      p = isl_printer_start_line(p);\n  //      p = isl_printer_print_str(p, \"std::copy(dev_\");\n  //      p = isl_printer_print_str(p, array->name);\n  //      if (array->local_array->host_serialize) {\n  //        p = isl_printer_print_str(p, \"_unserialized\");\n  //      }\n  //      if (array->local_array->n_mem_ports > 1)\n  //      {\n  //        p = isl_printer_print_str(p, \"[0]\");\n  //      }\n  //      p = isl_printer_print_str(p, \".begin(), dev_\");\n  //      p = isl_printer_print_str(p, array->name);\n  //      if (array->local_array->host_serialize) {\n  //        p = isl_printer_print_str(p, \"_unserialized\");\n  //      }\n  //      if (array->local_array->n_mem_ports > 1)\n  //      {\n  //        p = isl_printer_print_str(p, \"[0]\");\n  //      }\n  //      p = isl_printer_print_str(p, \".end(), reinterpret_cast<\");\n  //      p = isl_printer_print_str(p, array->type);\n  //      p = isl_printer_print_str(p, \" *>(\");\n  //      p = isl_printer_print_str(p, array->name);\n  //      p = isl_printer_print_str(p, \"));\");\n  //      p = isl_printer_end_line(p);\n  //    }\n  //  }\n  //}\n\n  return p;\n}\n\nstatic __isl_give isl_printer *drain_merge_catapult(\n  __isl_take isl_printer *p, struct autosa_prog *prog,\n  struct autosa_drain_merge_func *func,\n  int hls)\n{\n  struct autosa_array_ref_group *group = func->group;\n  p = print_str_new_line(p, \"// Merge results\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"for (int idx = \");\n  p = isl_printer_print_int(p, group->mem_port_id);\n  p = isl_printer_print_str(p, \"; idx < \");\n  p = isl_printer_print_int(p, group->mem_port_id + group->n_mem_ports);\n  p = isl_printer_print_str(p, \"; idx++) {\");\n  p = isl_printer_end_line(p);\n\n  p 
= isl_printer_indent(p, 2);\n  p = isl_printer_start_line(p);\n  p = autosa_array_ref_group_print_prefix(group, p);\n  p = isl_printer_print_str(p, \"_drain_merge(\");\n  p = print_drain_merge_arguments(p, func->kernel, group, func, 0, hls);\n  p = isl_printer_print_str(p, \");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n  return p;\n}\n\n/* Print code to \"p\" for copying \"array\" from the host to the device\n * in its entirety.  The bounds on the extent of \"array\" have\n * been precomputed in extract_array_info and are used in\n * autosa_array_info_print_size.\n */\nstatic __isl_give isl_printer *copy_array_to_device_catapult(\n  __isl_take isl_printer *p,\n  struct autosa_array_info *array, int hls)\n{\n  int indent;\n\n  struct autosa_local_array_info *local_array = array->local_array;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n  p = isl_printer_print_int(p, local_array->n_mem_ports);\n  p = isl_printer_print_str(p, \"; i++) {\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, 2);\n\n  //p = isl_printer_start_line(p);\n  //p = isl_printer_print_str(p, \"memcpy(buffer_\");\n  //p = isl_printer_print_str(p, array->name);\n  //p = isl_printer_print_str(p, \"[i], dev_\");\n  //p = isl_printer_print_str(p, array->name);\n  //if (local_array->n_mem_ports > 1 && local_array->array->copy_out)\n  //{\n  //  p = isl_printer_print_str(p, \"[i]\");\n  //}\n  //p = isl_printer_print_str(p, \", \");\n  //if (local_array->host_serialize) {\n  //  p = autosa_array_info_print_serialize_size(p, array);\n  //} else {\n  //  p = autosa_array_info_print_size(p, array);\n  //}\n  //p = isl_printer_print_str(p, \");\");\n  //p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"for (int c0 = 0; c0 < \");\n  if (local_array->host_serialize) {\n    p = 
autosa_array_info_print_serialize_data_size(p, array);\n  } else {\n    p = autosa_array_info_print_data_size(p, array);\n  }\n  p = isl_printer_print_str(p, \" / \");\n  p = isl_printer_print_int(p, array->n_lane);\n  p = isl_printer_print_str(p, \"; c0++) {\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, 2);\n  p = isl_printer_start_line(p);\n  p = autosa_print_array_type(p, array);\n  p = isl_printer_print_str(p, \" tmp;\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"for (int c1 = 0; c1 < \");\n  p = isl_printer_print_int(p, array->n_lane);\n  p = isl_printer_print_str(p, \"; c1++) {\");\n  p = isl_printer_end_line(p);\n  \n  p = isl_printer_indent(p, 2);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"tmp.set_slc(c1 * \");\n  p = isl_printer_print_int(p, array->size * 8);\n  p = isl_printer_print_str(p, \", (\");\n  p = isl_printer_print_str(p, array->name);\n  p = isl_printer_print_str(p, \"_t1)dev_\");\n  p = isl_printer_print_str(p, array->name);\n  if (local_array->n_mem_ports > 1 && local_array->array->copy_out)\n  {\n    p = isl_printer_print_str(p, \"[i]\");\n  }\n  p = isl_printer_print_str(p, \"[c0 * \");\n  p = isl_printer_print_int(p, array->n_lane);\n  p = isl_printer_print_str(p, \" + c1]);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"buffer_\");\n  p = isl_printer_print_str(p, array->name);\n  p = isl_printer_print_str(p, \"[i][c0] = tmp;\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);  \n\n  return p;\n}\n\n/* Print code to \"p\" for copying \"array\" back from the device to the host\n * in its entirety.  
The bounds on the extent of \"array\" have\n * been precomputed in extract_array_info and are used in\n * autosa_array_info_print_data_size (or in\n * autosa_array_info_print_serialize_data_size when the host data\n * is serialized).\n */\nstatic __isl_give isl_printer *copy_array_from_device_catapult(\n  __isl_take isl_printer *p, struct autosa_array_info *array, int hls)\n{\n  struct autosa_local_array_info *local_array;\n  int indent;\n\n  local_array = array->local_array;\n  //if (!hls)\n  //{\n  //  p = isl_printer_start_line(p);\n  //  p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n  //  p = isl_printer_print_int(p, local_array->n_io_group_refs);\n  //  p = isl_printer_print_str(p, \"; i++) {\");\n  //  p = isl_printer_end_line(p);\n  //  p = isl_printer_indent(p, 2);\n//\n  //  p = print_str_new_line(p, \"OCL_CHECK(err,\");\n  //  indent = strlen(\"OCL_CHECK(\");\n  //  p = isl_printer_indent(p, indent);\n  //  p = isl_printer_start_line(p);\n  //  p = isl_printer_print_str(p, \"err = q.enqueueMigrateMemObjects({buffer_\");\n  //  p = isl_printer_print_str(p, array->name);\n  //  p = isl_printer_print_str(p, \"[i]\");\n  //  p = isl_printer_print_str(p, \"}, CL_MIGRATE_MEM_OBJECT_HOST));\");\n  //  p = isl_printer_end_line(p);\n  //  p = isl_printer_indent(p, -indent);\n//\n  //  p = isl_printer_indent(p, -2);\n  //  p = print_str_new_line(p, \"}\");\n  //}\n  //else\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, local_array->n_mem_ports);\n    p = isl_printer_print_str(p, \"; i++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    //p = isl_printer_start_line(p);\n    //p = isl_printer_print_str(p, \"memcpy(dev_\");\n    //p = isl_printer_print_str(p, array->name);\n    //if (local_array->n_mem_ports > 1 && local_array->array->copy_out)\n    //{\n    //  p = isl_printer_print_str(p, \"[i]\");\n    //}\n    //p = isl_printer_print_str(p, \", buffer_\");\n    //p = isl_printer_print_str(p, array->name);\n    //p = 
isl_printer_print_str(p, \"[i], \");\n    //if (local_array->host_serialize) {\n    //  p = autosa_array_info_print_serialize_size(p, array);\n    //} else {\n    //  p = autosa_array_info_print_size(p, array);\n    //}\n    //p = isl_printer_print_str(p, \");\");\n    //p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int c0 = 0; c0 < \");\n    if (local_array->host_serialize) {\n      p = autosa_array_info_print_serialize_data_size(p, array);\n    } else {\n      p = autosa_array_info_print_data_size(p, array);\n    }\n    p = isl_printer_print_str(p, \" / \");\n    p = isl_printer_print_int(p, array->n_lane);\n    p = isl_printer_print_str(p, \"; c0++) {\");\n    p = isl_printer_end_line(p);   \n\n    p = isl_printer_indent(p, 2);\n    p = isl_printer_start_line(p);\n    p = autosa_print_array_type(p, array);\n    p = isl_printer_print_str(p, \" tmp = buffer_\");\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"[i][c0];\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int c1 = 0; c1 < \");\n    p = isl_printer_print_int(p, array->n_lane);\n    p = isl_printer_print_str(p, \"; c1++) {\");\n    p = isl_printer_end_line(p); \n\n    p = isl_printer_indent(p, 2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"dev_\");\n    p = isl_printer_print_str(p, array->name);\n    if (local_array->n_mem_ports > 1 && local_array->array->copy_out)\n    {\n      p = isl_printer_print_str(p, \"[i]\");\n    }\n    p = isl_printer_print_str(p, \"[c0 * \");\n    p = isl_printer_print_int(p, array->n_lane);\n    p = isl_printer_print_str(p, \" + c1] = (\");\n    p = isl_printer_print_str(p, array->type);\n    p = isl_printer_print_str(p, \")tmp.slc<\");\n    p = isl_printer_print_int(p, array->size * 8);\n    p = isl_printer_print_str(p, \">(\");\n    p = isl_printer_print_int(p, array->size * 8);\n    
p = isl_printer_print_str(p, \" * c1);\");\n    p = isl_printer_end_line(p); \n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");    \n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");    \n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n    p = isl_printer_end_line(p);    \n  }\n\n  return p;\n}\n\n/* Print a statement for copying an array to or from the device,\n * for initializing or clearing the device, or for merging drained data.\n * The statement identifier of a copying node is called\n * \"to_device_<array name>\" or \"from_device_<array name>\" and\n * its user pointer points to the autosa_array_info of the array\n * that needs to be copied.\n * The node for initializing the device is called \"init_device\".\n * The node for clearing the device is called \"clear_device\".\n * The node for merging drained data is called \"drain_merge\" and\n * its user pointer points to an autosa_drain_merge_func.\n *\n * Extract the array (if any) from the identifier and call\n * init_device_catapult, clear_device_catapult, drain_merge_catapult,\n * copy_array_to_device_catapult or copy_array_from_device_catapult.\n */\nstatic __isl_give isl_printer *print_device_node_catapult(__isl_take isl_printer *p,\n                                                          __isl_keep isl_ast_node *node, \n                                                          struct autosa_prog *prog, \n                                                          int hls,\n                                                          struct autosa_hw_top_module *top)\n{\n  isl_ast_expr *expr, *arg;\n  isl_id *id;\n  const char *name;\n  struct autosa_array_info *array;\n  struct autosa_kernel *kernel;\n  struct autosa_drain_merge_func *func;\n\n  expr = isl_ast_node_user_get_expr(node);\n  arg = isl_ast_expr_get_op_arg(expr, 0);\n  id = isl_ast_expr_get_id(arg);\n  name = isl_id_get_name(id);\n  if (!strcmp(name, \"init_device\") || !strcmp(name, \"clear_device\"))\n    kernel = (struct autosa_kernel *)isl_id_get_user(id);\n  else if (!strcmp(name, \"drain_merge\"))\n    func = (struct autosa_drain_merge_func *)isl_id_get_user(id);\n  
else\n    array = (struct autosa_array_info *)isl_id_get_user(id);\n  isl_id_free(id);\n  isl_ast_expr_free(arg);\n  isl_ast_expr_free(expr);\n\n  if (!name)\n    return isl_printer_free(p);\n  if (!strcmp(name, \"init_device\"))\n    return init_device_catapult(p, prog, kernel, hls, top);\n  if (!strcmp(name, \"clear_device\"))\n    return clear_device_catapult(p, prog, kernel, hls, top);\n  if (!strcmp(name, \"drain_merge\"))\n    return drain_merge_catapult(p, prog, func, hls);\n  if (!array)\n    return isl_printer_free(p);\n\n  if (!prefixcmp(name, \"to_device\"))\n    return copy_array_to_device_catapult(p, array, hls);\n  else\n    return copy_array_from_device_catapult(p, array, hls);\n\n  return p;\n}\n\n/* Print the user statement of the host code to \"p\".\n *\n * The host code may contain original user statements, kernel launches,\n * statements that copy data to/from the device and statements\n * that initialize or clear the device.\n * The original user statements and the kernel launches have\n * an associated annotation, while the other statements do not.\n * The latter are handled by print_device_node_catapult.\n * The annotation on the user statements is called \"user\".\n *\n * In case of a kernel launch, print a block of statements that\n * instantiates the top-level kernel class and calls its run() method.\n */\nstatic __isl_give isl_printer *print_host_user_catapult(__isl_take isl_printer *p,\n                                                        __isl_take isl_ast_print_options *print_options,\n                                                        __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  int is_user;\n  struct autosa_kernel *kernel;\n  struct autosa_kernel_stmt *stmt;\n  struct print_host_user_data *data;\n  struct hls_info *hls;\n  struct autosa_hw_top_module *top;\n\n  isl_ast_print_options_free(print_options);\n\n  data = (struct print_host_user_data *)user;\n  hls = data->hls;\n  top = data->top;\n\n  id = 
isl_ast_node_get_annotation(node);\n  if (!id)\n  {\n    return print_device_node_catapult(p, node, data->prog, hls->hls, top);\n  }\n\n  is_user = !strcmp(isl_id_get_name(id), \"user\");\n  kernel = is_user ? NULL : (struct autosa_kernel *)isl_id_get_user(id);\n  stmt = is_user ? (struct autosa_kernel_stmt *)isl_id_get_user(id) : NULL;\n  isl_id_free(id);\n\n  if (is_user)\n    return autosa_kernel_print_domain(p, stmt);\n\n  //if (!hls->hls)\n  //{\n  //  /* Print OpenCL host. */\n  //  p = ppcg_start_block(p);\n//\n  //  p = print_set_kernel_arguments_xilinx(p, data->prog, kernel);\n  //  p = print_str_new_line(p, \"q.finish();\");\n  //  p = print_str_new_line(p, \"fpga_begin = std::chrono::high_resolution_clock::now();\");\n  //  p = isl_printer_end_line(p);\n  //  p = print_str_new_line(p, \"// Launch the kernel\");\n  //  p = print_str_new_line(p, \"OCL_CHECK(err, err = q.enqueueTask(krnl));\");\n  //  p = isl_printer_end_line(p);\n  //  p = print_str_new_line(p, \"q.finish();\");\n  //  p = print_str_new_line(p, \"fpga_end = std::chrono::high_resolution_clock::now();\");\n//\n  //  p = ppcg_end_block(p);\n  //  p = isl_printer_end_line(p);\n  //}\n  //else\n  //{\n    /* Print HLS host. */\n    p = ppcg_start_block(p);\n\n    p = print_str_new_line(p, \"// Launch the kernel\");\n    p = print_str_new_line(p, \"kernel0 kernel0_inst;\");\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"kernel\");    \n    p = isl_printer_print_int(p, 0);\n    p = isl_printer_print_str(p, \"_inst.run(\");\n    p = print_kernel_arguments(p, data->prog, kernel, 0, hls);\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n\n    p = ppcg_end_block(p);\n  //}\n  /* Print the top kernel header. 
*/\n  //print_kernel_headers_catapult(data->prog, kernel, data->hls);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_module_core_header_catapult(\n  __isl_take isl_printer *p,\n  struct autosa_prog *prog, struct autosa_hw_module *module,\n  int inter, int boundary, int serialize, int types)\n{\n  int n = isl_id_list_n_id(module->inst_ids);\n\n  p = isl_printer_start_line(p);  \n  if (types)\n    p = isl_printer_print_str(p, \"void \");\n  p = isl_printer_print_str(p, \"CCS_BLOCK(run)\");\n  p = isl_printer_print_str(p, \"(\");\n  if (!types) {\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n    p = isl_printer_start_line(p);  \n  }\n  p = print_module_arguments(p, prog, module->kernel, module, types,\n                             CATAPULT_HW, inter, -1, boundary, serialize);\n  p = isl_printer_print_str(p, \")\");\n  if (!types) {\n    p = isl_printer_indent(p, -2);\n  }\n\n  return p;\n}\n\n/* Print out variable declarations on Xilinx platforms.\n * The local variable can be mapped to different memory resources:\n * FF, LUTRAM, BRAM, URAM.\n */\nstatic __isl_give isl_printer *print_module_var_catapult(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_var *var, int double_buffer,\n    struct autosa_hw_module *module)\n{\n  int j;\n  int use_memory = 0; // 0: FF 1: LUTRAM 2: BRAM 3: URAM\n  use_memory = extract_memory_type(module, var, module->options->autosa->uram);\n\n  p = isl_printer_start_line(p);\n  if (var->array->local_array->is_sparse && module->type != PE_MODULE) {\n    p = isl_printer_print_str(p, var->array->name);\n    p = isl_printer_print_str(p, \"_s_t\");\n    p = isl_printer_print_int(p, var->n_lane);\n  } else {\n    //if (var->n_lane == 1)\n    //  p = isl_printer_print_str(p, var->array->type);\n    //else {\n      p = isl_printer_print_str(p, var->array->name);    \n      p = isl_printer_print_str(p, \"_t\");\n      p = isl_printer_print_int(p, var->n_lane);\n    //}\n  }\n  p = isl_printer_print_str(p, 
\" \");\n  p = isl_printer_print_str(p, var->name);\n  if (double_buffer)\n    p = isl_printer_print_str(p, \"_ping\");\n  for (j = 0; j < isl_vec_size(var->size); ++j)\n  {\n    isl_val *v;\n\n    p = isl_printer_print_str(p, \"[\");\n    v = isl_vec_get_element_val(var->size, j);\n    p = isl_printer_print_val(p, v);\n    isl_val_free(v);\n    p = isl_printer_print_str(p, \"]\");\n  }\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n\n  /* Print pong buffer */\n  if (double_buffer)\n  {\n    p = isl_printer_start_line(p);\n    if (var->array->local_array->is_sparse) {\n      p = isl_printer_print_str(p, var->array->name);\n      p = isl_printer_print_str(p, \"_s_t\");      \n      p = isl_printer_print_int(p, var->n_lane);      \n    } else {\n      if (var->n_lane == 1)\n        p = isl_printer_print_str(p, var->array->type);\n      else {\n        p = isl_printer_print_str(p, var->array->name);        \n        p = isl_printer_print_str(p, \"_t\");\n        p = isl_printer_print_int(p, var->n_lane);\n      }\n    }\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, var->name);\n    if (double_buffer)\n      p = isl_printer_print_str(p, \"_pong\");\n    for (j = 0; j < isl_vec_size(var->size); ++j)\n    {\n      isl_val *v;\n\n      p = isl_printer_print_str(p, \"[\");\n      v = isl_vec_get_element_val(var->size, j);\n      p = isl_printer_print_val(p, v);\n      isl_val_free(v);\n      p = isl_printer_print_str(p, \"]\");\n    }\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_module_vars_catapult(\n  __isl_take isl_printer *p, struct autosa_hw_module *module, int inter)\n{\n  int i, n;\n  isl_space *space;\n  const char *type;\n\n  if (inter == -1)\n  {\n    for (i = 0; i < module->n_var; ++i)\n      p = print_module_var_catapult(p, &module->var[i], module->double_buffer, module);\n  }  \n\n  return p;\n}\n\nstatic 
__isl_give isl_printer *print_for_with_pipeline(\n  __isl_keep isl_ast_node *node, __isl_take isl_printer *p,\n  __isl_take isl_ast_print_options *print_options)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"#pragma hls_pipeline_init_interval 1\");\n  p = isl_printer_end_line(p);\n\n  p = isl_ast_node_for_print(node, p, print_options);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_for_with_unroll(\n  __isl_keep isl_ast_node *node, __isl_take isl_printer *p,\n  __isl_take isl_ast_print_options *print_options)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"#pragma unroll yes\");\n  p = isl_printer_end_line(p);\n\n  p = isl_ast_node_for_print(node, p, print_options);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_for_with_guard(\n  __isl_take isl_ast_node *node, __isl_take isl_printer *p,\n  __isl_take isl_ast_print_options *print_options,\n  int pipeline, int unroll,\n  int guard_start, int guard_end,\n  char **fifo_names, isl_pw_qpolynomial **bounds, int n_fifo,\n  int double_buffer, int inter, int read,\n  char *module_name, char *buf_name\n  )\n{  \n  if (guard_start) {\n    p = isl_printer_print_str(p, \"#ifndef __SYNTHESIS__\");\n    p = isl_printer_end_line(p);    \n\n    p = print_str_new_line(p, \"// while () // Please add the fifo check for C sim.\");\n    //if (n_fifo > 0) {\n    //  p = isl_printer_start_line(p);\n    //  p = isl_printer_print_str(p, \"while (\");\n    //  //for (int i = 0; i < n_fifo; i++) {\n    //  //  if (i > 0)\n    //  //    p = isl_printer_print_str(p, \" && \");\n    //  //  p = isl_printer_print_str(p, fifo_names[i]);\n    //  //  p = isl_printer_print_str(p, \".available(\");\n    //  //  p = isl_printer_print_pw_qpolynomial(p, bounds[i]);\n    //  //  p = isl_printer_print_str(p, \")\");\n    //  //}\n    //  p = isl_printer_print_str(p, \")\");\n    //  p = isl_printer_end_line(p);\n    //}\n  }\n\n  //p = isl_printer_indent(p, 2);\n  if (pipeline) {\n    p 
= isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma hls_pipeline_init_interval 1\");\n    p = isl_printer_end_line(p);\n  }\n  if (unroll) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma unroll yes\");\n    p = isl_printer_end_line(p);\n  }\n\n  if (!guard_end) {\n    p = isl_ast_node_for_print(node, p, print_options);   \n    //p = isl_printer_indent(p, -2); \n  } else {\n    isl_ast_expr *iterator, *init, *cond, *inc;\n    isl_ast_node *body;\n    const char *iter_type;\n    iterator = isl_ast_node_for_get_iterator(node);\n    init = isl_ast_node_for_get_init(node);\n    cond = isl_ast_node_for_get_cond(node);\n    inc = isl_ast_node_for_get_inc(node);\n    body = isl_ast_node_for_get_body(node);\n    iter_type = isl_options_get_ast_iterator_type(isl_ast_node_get_ctx(node));\n    \n    //p = isl_printer_indent(p, -2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (\");\n    p = isl_printer_print_str(p, iter_type);\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_ast_expr(p, iterator);\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_ast_expr(p, init);\n    p = isl_printer_print_str(p, \"; \");\n    p = isl_printer_print_ast_expr(p, cond);\n    p = isl_printer_print_str(p, \"; \");\n    p = isl_printer_print_ast_expr(p, iterator);\n    p = isl_printer_print_str(p, \" += \");\n    p = isl_printer_print_ast_expr(p, inc);\n    p = isl_printer_print_str(p, \")\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_print_str(p, \"#endif\");\n    p = isl_printer_end_line(p);\n\n    p = ppcg_start_block(p);\n\n    /* Add the double buffer logic if needed. 
*/    \n    if (inter == 0 || inter == 1) {      \n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, module_name);\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_str(p, buf_name);\n      p = isl_printer_print_str(p, \" \");\n      p = isl_printer_print_str(p, buf_name);\n      p = isl_printer_print_str(p, \"_tmp;\");\n      p = isl_printer_end_line(p);\n\n      if (read) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, buf_name);\n        p = isl_printer_print_str(p, \"_tmp = \");\n        p = isl_printer_print_str(p, buf_name);\n        p = isl_printer_print_str(p, \".read();\");\n        p = isl_printer_end_line(p);      \n      }\n    }    \n\n    //p = isl_printer_indent(p, 2);  \n    p = isl_ast_node_print(body, p, print_options);    \n    //p = isl_printer_indent(p, -2);  \n        \n    if (inter == 0 || inter == 1) {      \n      if (!read) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, buf_name);\n        p = isl_printer_print_str(p, \".write(\");\n        p = isl_printer_print_str(p, buf_name);\n        p = isl_printer_print_str(p, \"_tmp);\");\n        p = isl_printer_end_line(p);      \n      }\n    }\n\n    p = ppcg_end_block(p);\n\n    isl_ast_expr_free(iterator);\n    isl_ast_expr_free(init);\n    isl_ast_expr_free(cond);\n    isl_ast_expr_free(inc);\n    isl_ast_node_free(body);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_for_catapult(__isl_take isl_printer *p,\n                                                  __isl_take isl_ast_print_options *print_options,\n                                                  __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  int pipeline;\n  int unroll;\n  int guard_start;\n  int guard_end;\n  /* for catapult fifos */\n  int n_fifo;\n  char **fifo_names;\n  isl_pw_qpolynomial **bounds;\n  int double_buffer, inter, read;\n  char *module_name, *buf_name;\n\n  pipeline = 0;\n 
 unroll = 0;\n  guard_start = 0;\n  guard_end = 0;\n  id = isl_ast_node_get_annotation(node);\n  n_fifo = 0;\n  fifo_names = NULL;\n  bounds = NULL;\n  double_buffer = 0;\n  inter = -1;\n  read = -1;\n  module_name = NULL;\n  buf_name = NULL;\n\n  if (id)\n  {\n    struct autosa_ast_node_userinfo *info;\n\n    info = (struct autosa_ast_node_userinfo *)isl_id_get_user(id);\n    if (info && info->is_pipeline)\n      pipeline = 1;\n    if (info && info->is_unroll)\n      unroll = 1;\n    if (info && info->is_guard_start)\n      guard_start = 1;\n    if (info && info->is_guard_end) {\n      guard_end = 1;\n      if (info->inter >= 0) {\n        double_buffer = info->double_buffer;\n        inter = info->inter;\n        read = info->read;\n        module_name = info->module_name;\n        buf_name = info->buf_name;\n      }\n    }\n  }\n\n  if (guard_start || guard_end)\n    p = print_for_with_guard(\n            node, p, print_options, pipeline, unroll, \n            guard_start, guard_end,\n            fifo_names, bounds, n_fifo,\n            double_buffer, inter, read, module_name, buf_name);\n  else if (pipeline)\n    p = print_for_with_pipeline(node, p, print_options);\n  else if (unroll)\n    p = print_for_with_unroll(node, p, print_options);\n  else\n    p = isl_ast_node_for_print(node, p, print_options);\n\n  isl_id_free(id);\n\n  return p;\n}\n\n/* Prints out the rest of the fields in the class for Catapult HLS. 
\n * If the function holds the inter and intra trans modules, also prints out\n * a private section declaring the inter/intra trans module instances\n * and the buffer channels.\n * \n */\nstatic __isl_give isl_printer *print_module_fields_catapult(\n  __isl_take isl_printer *p, struct autosa_prog *prog,\n  struct autosa_hw_module *module, struct hls_info *hls,\n  int inter, int boundary, int serialize, int types) \n{\n  p = print_str_new_line(p, \"}\");\n\n  // TODO: More to be printed out for other functions\n  if (inter == -1 && module->is_filter && module->is_buffer) {\n    /* Print the inter/intra trans modules and the buffer. */\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"private:\");\n    p = isl_printer_indent(p, 2);\n    /* inter trans module */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, module->name);\n    p = isl_printer_print_str(p, \"_inter_trans\");    \n    if (boundary)\n      p = isl_printer_print_str(p, \"_boundary\");    \n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, module->name);\n    p = isl_printer_print_str(p, \"_inter_trans\");\n    if (boundary)\n      p = isl_printer_print_str(p, \"_boundary\");    \n    p = isl_printer_print_str(p, \"_inst;\");\n    p = isl_printer_end_line(p);\n\n    /* intra trans module */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, module->name);\n    p = isl_printer_print_str(p, \"_intra_trans \");\n    p = isl_printer_print_str(p, module->name);\n    p = isl_printer_print_str(p, \"_intra_trans_inst;\");\n    p = isl_printer_end_line(p);    \n\n    /* buffer */\n    for (int i = 0; i < module->n_var; i++) {\n      struct autosa_kernel_var *var;\n      var = (struct autosa_kernel_var *)&module->var[i];\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"ac_channel<\");\n      p = isl_printer_print_str(p, module->name);      \n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_str(p, 
var->name);\n      p = isl_printer_print_str(p, \"> \");\n      p = isl_printer_print_str(p, module->name);      \n      //if (boundary)\n      //  p = isl_printer_print_str(p, \"_boundary\");    \n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_str(p, var->name);\n      p = isl_printer_print_str(p, \"_inst;\");\n      p = isl_printer_end_line(p);\n    }    \n  } \n\n  p = isl_printer_indent(p, -2);\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"};\");  \n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_module_core_headers_catapult(\n  __isl_take isl_printer *p, struct autosa_prog *prog, \n  struct autosa_hw_module *module, struct hls_info *hls,\n  int inter, int boundary, int serialize, int types)\n{\n  int n = isl_id_list_n_id(module->inst_ids);  \n\n  if (types) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"class \");\n    p = isl_printer_print_str(p, module->name);\n    if (inter == 0)\n      p = isl_printer_print_str(p, \"_intra_trans\");\n    if (inter == 1)\n      p = isl_printer_print_str(p, \"_inter_trans\");\n    if (boundary)\n      p = isl_printer_print_str(p, \"_boundary\");\n    if (serialize)\n      p = isl_printer_print_str(p, \"_serialize\");\n    p = isl_printer_print_str(p, \" {\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, 2);\n    p = print_str_new_line(p, \"public:\");\n\n    p = isl_printer_indent(p, 2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, module->name);\n    if (inter == 0)\n      p = isl_printer_print_str(p, \"_intra_trans\");\n    if (inter == 1)\n      p = isl_printer_print_str(p, \"_inter_trans\");\n    if (boundary)\n      p = isl_printer_print_str(p, \"_boundary\");\n    if (serialize)\n      p = isl_printer_print_str(p, \"_serialize\");\n    p = isl_printer_print_str(p, \"() {}\");\n    p = isl_printer_end_line(p);\n\n    p = print_str_new_line(p, \"#pragma hls_design interface\");\n    if ((inter == 
-1 && module->pipeline_at_default_func && !serialize && !module->is_filter) ||\n        (inter == -1 && module->pipeline_at_filter_func[0] && module->is_filter) ||\n        (inter == 0 && module->pipeline_at_filter_func[1]) ||\n        (inter == 1 && module->pipeline_at_filter_func[2]))\n      p = print_str_new_line(p, \"#pragma hls_pipeline_init_interval 1\");\n    p = print_module_core_header_catapult(p, prog, module, inter, boundary, serialize, 1);\n    p = isl_printer_print_str(p, \" {\");\n    p = isl_printer_end_line(p);\n  } else {\n    // TODO\n  }\n\n  return p;\n}\n\n/* Print the serialization module that connects the external memory to the \n * top-level I/O module. \n */\nstatic __isl_give isl_printer *autosa_print_serialize_module(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{  \n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);  \n\n  /* Print core. 
*/\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  if (hls->target == CATAPULT_HW)\n    p = print_module_core_headers_catapult(p, prog, module, hls, -1, boundary, 1, 1); // TODO  \n  \n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);    \n  }\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_print_str(p, \"#ifndef __SYNTHESIS__\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"// while () // Please add the fifo check for C sim.\");\n  p = isl_printer_print_str(p, \"#endif\");\n  p = isl_printer_end_line(p);\n  \n  p = print_module_serialize_body(p, module, hls);\n  p = isl_printer_indent(p, -2);  \n  if (hls->target == CATAPULT_HW)\n    p = print_module_fields_catapult(p, prog, module, hls, -1, boundary, 1, 1);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the default module. \n * For PE modules, we will print a wrapper function to speed up the HLS \n * synthesis. \n * For the rest of the modules, wrapper is disabled. 
\n */\nstatic __isl_give isl_printer *autosa_print_default_module(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  if (!boundary) {\n    if (!module->device_tree)\n      return p;\n  } else {\n    if (!module->boundary_tree)\n      return p;\n  }    \n\n  //bool wrapper = 0;\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n  \n  ///* Print wrapper for PE and L1 IO module */\n  //if (module->type == PE_MODULE || (module->type != PE_MODULE && module->level == 1)) \n  //  wrapper = 1;  \n\n  /* Print core. */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  if (hls->target == CATAPULT_HW)\n    p = print_module_core_headers_catapult(p, prog, module, hls, -1, boundary, 0, 1);  \n  \n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  //if (!prog->scop->options->autosa->use_cplusplus_template) {\n  p = print_module_iterators(p, hls->kernel_c, module);  \n  //}  \n  if (prog->scop->options->autosa->block_sparse) {\n    for (int i = 0; i < module->n_io_group; i++) {\n      struct autosa_array_ref_group *group = module->io_groups[i];\n      if (group->local_array->array_type == AUTOSA_EXT_ARRAY) {      \n        int n_lane = get_io_group_n_lane(module, NULL, group);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, group->array->name);\n        if (group->local_array->is_sparse)\n          p = isl_printer_print_str(p, \"_s_t\");\n        else\n          p = isl_printer_print_str(p, \"_t\");      \n        p = isl_printer_print_int(p, n_lane);\n        p = isl_printer_print_str(p, \" fifo_data_\");\n        p = isl_printer_print_str(p, group->array->name);\n        p = isl_printer_print_str(p, \";\");\n        p = 
isl_printer_end_line(p);\n      }\n    }\n  }  \n  if (module->type == PE_MODULE)\n    p = print_module_vars_catapult(p, module, -1);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);  \n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);  \n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_for_catapult, &hw_data);\n\n  if (!boundary)\n    p = isl_ast_node_print(module->device_tree, p, print_options);\n  else\n    p = isl_ast_node_print(module->boundary_tree, p, print_options);\n  p = isl_printer_indent(p, -2);\n  \n  if (hls->target == CATAPULT_HW)\n    p = print_module_fields_catapult(p, prog, module, hls, -1, boundary, 0, 1);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  /* If the module serialization is enabled, we will print out an extra module\n   * for serializing the data. 
*/\n  if (module->to_mem && module->options->autosa->host_serialize) {\n    p = autosa_print_serialize_module(p, module, prog, hls, boundary);\n  }\n\n  return p;\n}\n\n/* Print the inter_trans module.\n */\nstatic __isl_give isl_printer *autosa_print_inter_trans_module(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  if (boundary) {\n    if (!module->boundary_inter_tree)\n      return p;\n  } else {\n    if (!module->inter_tree)\n      return p;\n  }  \n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n  \n  p = print_module_core_headers_catapult(p, prog, module, hls, 1, boundary, 0, 1);\n    \n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);\n  }  \n  p = print_module_vars_catapult(p, module, 1); \n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);  \n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_for_catapult, &hw_data);  \n  \n  p = isl_ast_node_print((boundary == 0) ? 
module->inter_tree : module->boundary_inter_tree, p, print_options);\n  p = isl_printer_indent(p, -2);\n  \n  p = print_module_fields_catapult(p, prog, module, hls, 1, boundary, 0, 1);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  return p;  \n}\n\n/* Print the intra_trans module. \n */\nstatic __isl_give isl_printer *autosa_print_intra_trans_module(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  if (!module->intra_tree)\n    return p;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = print_module_core_headers_catapult(p, prog, module, hls, 0, boundary, 0, 1);\n  \n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);\n  }\n  p = print_module_vars_catapult(p, module, 1);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  //if (module->double_buffer)\n  //{\n  //  p = isl_printer_start_line(p);\n  //  p = isl_printer_print_str(p, \"if (!intra_trans_en) return;\");\n  //  p = isl_printer_end_line(p);\n  //  p = isl_printer_end_line(p);\n  //}\n  /* For local reduce, print the buffer initialization. 
*/  \n  for (int i = 0; i < module->n_var; i++) {\n    if (module->var[i].init_required) {\n      p = autosa_print_var_initialization(p, &module->var[i], hls->target);\n    }\n  }\n  p = isl_printer_end_line(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);  \n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                        &print_for_catapult, &hw_data);  \n    \n  p = isl_ast_node_print(module->intra_tree, p, print_options);\n  p = isl_printer_indent(p, -2);\n\n  p = print_module_fields_catapult(p, prog, module, hls, 0, boundary, 0, 1);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  return p;  \n}\n\nstatic __isl_give isl_printer *print_local_array_struct(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module,\n  struct autosa_kernel_var *var)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"struct \");\n  p = isl_printer_print_str(p, module->name);\n  p = isl_printer_print_str(p, \"_\");\n  p = isl_printer_print_str(p, var->name);\n  p = isl_printer_print_str(p, \" {\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, 2);\n  p = isl_printer_start_line(p);\n  if (var->array->local_array->is_sparse && module->type != PE_MODULE) {\n    p = isl_printer_print_str(p, var->array->name);\n    p = isl_printer_print_str(p, \"_s_t\");\n    p = isl_printer_print_int(p, var->n_lane);\n  } else {    \n    p = isl_printer_print_str(p, var->array->name);    \n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, var->n_lane);    \n  }\n  p = isl_printer_print_str(p, \" data\");\n  for (int i = 0; i < isl_vec_size(var->size); i++) {\n    isl_val *v;\n\n    p = 
isl_printer_print_str(p, \"[\");\n    v = isl_vec_get_element_val(var->size, i);\n    p = isl_printer_print_val(p, v);\n    isl_val_free(v);\n    p = isl_printer_print_str(p, \"]\");    \n  }\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, -2);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"};\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *autosa_print_host_code(__isl_take isl_printer *p,\n                                                      struct autosa_prog *prog, __isl_keep isl_ast_node *tree,\n                                                      struct autosa_hw_module **modules, int n_modules,\n                                                      struct autosa_hw_top_module *top,\n                                                      struct autosa_drain_merge_func **drain_merge_funcs, int n_drain_merge_funcs,\n                                                      struct hls_info *hls)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_ast_node_get_ctx(tree);\n  struct print_host_user_data data = {hls, prog, top};\n  struct print_hw_module_data hw_data = {hls, prog, NULL};\n  isl_printer *p_module;\n\n  /* Print the data pack types in the program. */\n  print_data_types_catapult(top, hls);\n\n  /* Print the macros for sparse data structure */\n  if (prog->scop->options->autosa->block_sparse) {\n    print_sparse_macros(top->kernel, hls);\n  }\n\n  /* Print the helper functions in the program. */\n  print_drain_merge_funcs(top->kernel, drain_merge_funcs, n_drain_merge_funcs, hls);\n\n  /* Print the host data serialization function. */\n  print_host_serialize_funcs(top->kernel, modules, n_modules, hls);\n\n  /* Print the default AST. 
*/\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_host_user_catapult, &data);\n\n  /* Print the macro definitions in the program. */\n  p = autosa_print_macros(p, tree);\n  p = isl_ast_node_print(tree, p, print_options);\n\n  /* Print the hw module ASTs. */\n  p_module = isl_printer_to_file(ctx, hls->kernel_c);\n  p_module = isl_printer_set_output_format(p_module, ISL_FORMAT_C);\n\n  /* Print the local buffer struct definitions. */\n  p_module = isl_printer_end_line(p_module);\n  for (int i = 0; i < n_modules; i++) {\n    if (modules[i]->type == PE_MODULE)\n      continue;\n    if (modules[i]->n_var > 0) {\n      for (int j = 0; j < modules[i]->n_var; j++)\n        p_module = print_local_array_struct(p_module, modules[i], &modules[i]->var[j]);\n      p_module = isl_printer_end_line(p_module);\n    }\n  }\n  p_module = print_str_new_line(p_module, \"#include <mc_scverify.h>\");\n  p_module = isl_printer_end_line(p_module);\n\n  for (int i = 0; i < n_modules; i++)\n  {\n    if (modules[i]->is_filter && modules[i]->is_buffer)\n    {\n      /* Print out the definitions for inter_trans and intra_trans function calls. */\n      /* Intra transfer function */\n      p_module = autosa_print_intra_trans_module(p_module, modules[i], prog, hls, 0); // todo\n \n      /* Inter transfer function */\n      p_module = autosa_print_inter_trans_module(p_module, modules[i], prog, hls, 0); // todo\n      if (modules[i]->boundary)\n        p_module = autosa_print_inter_trans_module(p_module, modules[i], prog, hls, 1); // todo\n    }\n\n    p_module = autosa_print_default_module(p_module, modules[i], prog, hls, 0);\n \n    if (modules[i]->boundary)\n    {\n      /* Print out the definitions for boundary trans function calls. 
*/\n      p_module = autosa_print_default_module(p_module, modules[i], prog, hls, 1);\n    }      \n  }\n  isl_printer_free(p_module);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_top_module_headers_catapult(\n  __isl_take isl_printer *p,\n  struct autosa_prog *prog, struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  struct autosa_kernel *kernel = top->kernel;\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"#pragma hls_design top\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");  \n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"class kernel0 {\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  p = print_str_new_line(p, \"p = isl_printer_indent(p, 2);\");\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");  \n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"public:\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_indent(p, 2);\");\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");  \n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"kernel0() {}\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");  \n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"#pragma hls_design interface\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"void CCS_BLOCK(run)(\");\n  p = print_kernel_arguments(p, prog, top->kernel, 1, hls); // todo\n  p = isl_printer_print_str(p, \")\\\");\");\n  p = isl_printer_end_line(p);\n  \n  p = 
print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"{\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_top_module_call_stmt(\n  __isl_take isl_printer *p,\n  __isl_take isl_ast_print_options *print_options,\n  __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *data = (struct print_hw_module_data *)(user);\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n    case AUTOSA_KERNEL_STMT_MODULE_CALL:\n      return autosa_kernel_print_module_call(p, stmt, data->prog, data->hls->target);\n  }\n\n  return p;  \n}\n\nstatic __isl_give isl_printer *print_top_module_call_inst(\n  __isl_take isl_printer *p,\n  __isl_take isl_ast_print_options *print_options,\n  __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *data = (struct print_hw_module_data *)(user);\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n    case AUTOSA_KERNEL_STMT_MODULE_CALL:\n      return autosa_kernel_print_module_call_inst(p, stmt, data->prog, data->hls->target);\n  }\n\n  return p;    \n}\n\nstatic char *extract_fifo_name_from_fifo_decl_name(isl_ctx *ctx, char *fifo_decl_name)\n{\n  int loc = 0;\n  char ch;\n  isl_printer *p_str = isl_printer_to_str(ctx);\n  char *name = NULL;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    if (ch == '.')\n      break;\n    char buf[2];\n    buf[0] = ch;\n    buf[1] = '\\0';\n    p_str = 
isl_printer_print_str(p_str, buf);\n    loc++;\n  }\n\n  name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return name;\n}\n\nstatic char *extract_fifo_width_from_fifo_decl_name(isl_ctx *ctx, char *fifo_decl_name)\n{\n  int loc = 0;\n  char ch;\n  isl_printer *p_str = isl_printer_to_str(ctx);\n  char *name = NULL;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    if (ch == '.')\n      break;\n    loc++;\n  }\n\n  loc++;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    char buf[2];\n    buf[0] = ch;\n    buf[1] = '\\0';\n    p_str = isl_printer_print_str(p_str, buf);\n    loc++;\n  }\n\n  name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return name;\n}\n\nstatic __isl_give isl_printer *print_top_module_fifo_stmt(__isl_take isl_printer *p,\n                                                          __isl_take isl_ast_print_options *print_options,\n                                                          __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *data = (struct print_hw_module_data *)(user);\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n  case AUTOSA_KERNEL_STMT_FIFO_DECL:\n    return autosa_kernel_print_fifo_decl(p, stmt, data->prog, data->hls);\n  }\n\n  return p;\n}\n\n/* This function prints the code that prints out the top function that \n * calls the hardware modules and declares the fifos.\n */\nstatic void print_top_gen_host_code(\n  struct autosa_prog *prog, __isl_keep isl_ast_node *node,\n  struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_ast_node_get_ctx(node);\n  isl_printer *p;\n  int fifo_depth = prog->scop->options->autosa->fifo_depth;\n  struct print_hw_module_data hw_data = {hls, prog, 
NULL};\n\n  /* Print the top module ASTs. */\n  p = isl_printer_to_file(ctx, hls->top_gen_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n\n  print_top_gen_headers(prog, top, hls);\n  fprintf(hls->top_gen_c, \" {\\n\");\n  p = isl_printer_indent(p, 2);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"FILE *fd = fopen(\\\"\");\n  p = isl_printer_print_str(p, hls->output_dir);\n  p = isl_printer_print_str(p, \"/resource_est/design_info.dat\\\", \\\"w\\\");\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"int fifo_cnt;\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_ctx *ctx = isl_ctx_alloc();\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_printer *p = isl_printer_to_file(ctx, f);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_end_line(p);\n\n  if (hls->target == CATAPULT_HW)\n    p = print_top_module_headers_catapult(p, prog, top, hls);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_indent(p, 2);\");\n  p = isl_printer_end_line(p);\n\n  int n_module_names = 0;\n  char **module_names = NULL;\n  for (int i = 0; i < top->n_hw_modules; i++)\n  {\n    /* Generate module call counter. 
*/\n    struct autosa_hw_module *module = top->hw_modules[i];\n    char *module_name;\n\n    if (module->is_filter && module->is_buffer)\n    {\n      module_name = concat(ctx, module->name, \"intra_trans\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n\n      module_name = concat(ctx, module->name, \"inter_trans\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n\n      if (module->boundary)\n      {\n        module_name = concat(ctx, module->name, \"inter_trans_boundary\");\n\n        n_module_names++;\n        module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n        module_names[n_module_names - 1] = module_name;\n      }\n    }\n\n    module_name = strdup(module->name);\n\n    n_module_names++;\n    module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n    module_names[n_module_names - 1] = module_name;\n\n    if (module->boundary)\n    {\n      module_name = concat(ctx, module->name, \"boundary\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n    }\n\n    if (module->n_pe_dummy_modules > 0)\n    {\n      for (int j = 0; j < module->n_pe_dummy_modules; j++)\n      {\n        struct autosa_pe_dummy_module *dummy_module = module->pe_dummy_modules[j];\n        struct autosa_array_ref_group *group = dummy_module->io_group;\n        isl_printer *p_str = isl_printer_to_str(ctx);\n        p_str = autosa_array_ref_group_print_prefix(group, p_str);\n        p_str = isl_printer_print_str(p_str, \"_PE_dummy\");\n        p_str = isl_printer_print_str(p_str, dummy_module->in? 
\"_in\" : \"_out\");\n        module_name = isl_printer_get_str(p_str);\n        isl_printer_free(p_str);\n\n        n_module_names++;\n        module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n        module_names[n_module_names - 1] = module_name;\n      }\n    }\n\n    if (module->is_serialized) { \n      if (module->boundary)      \n        module_name = concat(ctx, module->name, \"boundary_serialize\");\n      else\n        module_name = concat(ctx, module->name, \"serialize\");\n      \n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n    }\n  }\n  for (int i = 0; i < n_module_names; i++)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \"_cnt = 0;\");\n    p = isl_printer_end_line(p);\n  }\n\n  /* Print module calls. */\n  for (int i = 0; i < top->n_module_calls; i++)\n  {\n    /* Print AST */\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_top_module_call_stmt, &hw_data);\n\n    p = isl_ast_node_print(top->module_call_wrapped_trees[i],\n                           p, print_options);\n  }\n\n  /* module:module_name:module_cnt. 
*/\n  for (int i = 0; i < n_module_names; i++)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"fprintf(fd, \\\"module:\");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \":\\%d\\\\n\\\", \");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \"_cnt);\");\n    p = isl_printer_end_line(p);\n  }\n  p = isl_printer_end_line(p);\n\n  for (int i = 0; i < n_module_names; i++)\n  {\n    free(module_names[i]);\n  }\n  free(module_names);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_indent(p, -2);\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"}\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");  \n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_indent(p, -2);\");\n  p = isl_printer_end_line(p);\n\n  /* Print the private fields */\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"private:\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  p = print_str_new_line(p, \"p = isl_printer_indent(p, 2);\");\n\n  /* Print the function calls */\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"/* Module Declaration */\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  for (int i = 0; i < top->n_module_calls; i++) {\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_top_module_call_inst, &hw_data);\n    p = 
isl_ast_node_print(top->module_call_wrapped_trees[i],\n                           p, print_options);\n  }\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"/* Module Declaration */\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  /* Print the fifo decls */\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"/* FIFO Declaration */\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  \n  /* Print the serialize fifos if existing. */\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    struct autosa_array_ref_group *group = module->io_groups[0];\n    if (module->is_serialized) {\n      /* Generate fifo decl counter. */\n      char *fifo_name;\n      int fifo_w;  // bytes\n      fifo_w = module->data_pack_inter * group->array->size;\n      isl_printer *p_str;\n      p_str = isl_printer_to_str(ctx);\n      p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n      p_str = isl_printer_print_str(p_str, \"_\");\n      p_str = isl_printer_print_str(p_str, module->name);\n      p_str = isl_printer_print_str(p_str, \"_serialize\");\n      fifo_name = isl_printer_get_str(p_str);\n      isl_printer_free(p_str);\n\n      p = print_str_new_line(p, \"fifo_cnt = 1;\");\n      p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* \");\n      p = isl_printer_print_str(p, module->name);\n      p = isl_printer_print_str(p, \"_serialize fifo */ \");      \n      p = print_fifo_type_catapult(p, group, module->data_pack_inter);\n      p = isl_printer_print_str(p, \" \");\n      p = isl_printer_print_str(p, fifo_name);      \n     
 p = isl_printer_print_str(p, \";\\\");\");\n      p = isl_printer_end_line(p);\n      p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");      \n\n      /* fifo:fifo_name:fifo_cnt:fifo_width */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fprintf(fd, \\\"fifo:\");\n      p = isl_printer_print_str(p, fifo_name);\n      p = isl_printer_print_str(p, \":\\%d:\");\n      p = isl_printer_print_int(p, fifo_w);\n      p = isl_printer_print_str(p, \"\\\\n\\\", fifo_cnt);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_end_line(p);      \n      free(fifo_name);\n    }\n  }\n\n  for (int i = 0; i < top->n_fifo_decls; i++) {\n    /* Generate fifo decl counter. */\n    char *fifo_decl_name = top->fifo_decl_names[i];\n    char *fifo_name = extract_fifo_name_from_fifo_decl_name(ctx, fifo_decl_name);\n    char *fifo_w = extract_fifo_width_from_fifo_decl_name(ctx, fifo_decl_name);\n    p = print_str_new_line(p, \"fifo_cnt = 0;\");\n\n    /* Print AST */\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_top_module_fifo_stmt, &hw_data); \n\n    p = isl_ast_node_print(top->fifo_decl_wrapped_trees[i],\n                           p, print_options);\n\n    /* fifo:fifo_name:fifo_cnt:fifo_width */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"fprintf(fd, \\\"fifo:\");\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \":\\%d:\");\n    p = isl_printer_print_str(p, fifo_w);\n    p = isl_printer_print_str(p, \"\\\\n\\\", fifo_cnt);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_end_line(p);\n\n    free(fifo_name);\n    free(fifo_w);\n  }\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");    \n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"/* FIFO Declaration */\\\");\");  \n  p = 
print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  p = print_str_new_line(p, \"p = isl_printer_indent(p, -2);\");\n  p = print_str_new_line(p, \"p = isl_printer_indent(p, -2);\");\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"};\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  //if (hls->target == XILINX_HW)\n  //{\n  //  if (!hls->hls)\n  //  {\n  //    p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  //    p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"}\\\");\");\n  //    p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  //  }\n  //}\n\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"fclose(fd);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_printer_free(p);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_ctx_free(ctx);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, -2);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"}\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_end_line(p);\n\n  /* For internal testing only. */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"int main()\");\n  p = isl_printer_end_line(p);\n\n  p = ppcg_start_block(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"FILE *f = fopen(\\\"\");\n  p = isl_printer_print_str(p, hls->output_dir);\n  p = isl_printer_print_str(p, \"/src/top.cpp\\\", \\\"w\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"top_generate(f);\");\n  p = isl_printer_end_line(p);\n\n  p = ppcg_end_block(p);\n  p = isl_printer_free(p);\n\n  return;  \n}\n\n/* This function prints the tcl file for the catapult HLS project. 
*/\nstatic void print_tcl_code(\n  struct autosa_prog *prog, \n  struct autosa_hw_module **modules,\n  int n_modules,\n  struct hls_info *hls)\n{\n  isl_ctx *ctx = prog->ctx;\n  isl_printer *p;\n  \n  p = isl_printer_to_file(ctx, hls->tcl);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n\n  p = print_str_new_line(p, \"solution new -state initial\");\n  p = print_str_new_line(p, \"solution options defaults\");\n  p = print_str_new_line(p, \"solution options set /Input/CppStandard c++11\");\n  p = print_str_new_line(p, \"solution options set /Output/GenerateCycleNetlist false\");\n  p = print_str_new_line(p, \"solution options set /Flows/SCVerify/USE_CCS_BLOCK true\");\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"solution file add ./\");\n  p = isl_printer_print_str(p, hls->kernel_prefix);\n  p = isl_printer_print_str(p, \"_kernel.h -type CHEADER\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"solution file add ./\");\n  p = isl_printer_print_str(p, hls->kernel_prefix);\n  p = isl_printer_print_str(p, \"_kernel_hw.h -type CHEADER\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"solution file add ./\");\n  p = isl_printer_print_str(p, hls->kernel_prefix);\n  p = isl_printer_print_str(p, \".h -type CHEADER\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"solution file add ./\");\n  p = isl_printer_print_str(p, hls->kernel_prefix);\n  p = isl_printer_print_str(p, \"_host.cpp -type C++\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"directive set -PIPELINE_RAMP_UP true\");\n  p = print_str_new_line(p, \"directive set -PROTOTYPING_ENGINE oasys\");\n  p = print_str_new_line(p, \"directive set -CLUSTER_TYPE combinational\");\n  p = print_str_new_line(p, \"directive set -CLUSTER_FAST_MODE false\");\n  p = print_str_new_line(p, \"directive set 
-CLUSTER_RTL_SYN false\");\n  p = print_str_new_line(p, \"directive set -CLUSTER_OPT_CONSTANT_INPUTS true\");\n  p = print_str_new_line(p, \"directive set -CLUSTER_ADDTREE_IN_COUNT_THRESHOLD 0\");\n  p = print_str_new_line(p, \"directive set -CLUSTER_ADDTREE_IN_WIDTH_THRESHOLD 0\");\n  p = print_str_new_line(p, \"directive set -ROM_THRESHOLD 64\");\n  p = print_str_new_line(p, \"directive set -PROTOTYPE_ROM true\");\n  p = print_str_new_line(p, \"directive set -CHARACTERIZE_ROM false\");\n  p = print_str_new_line(p, \"directive set -OPT_CONST_MULTS use_library\");\n  p = print_str_new_line(p, \"directive set -CLOCK_OVERHEAD 20.000000\");\n  p = print_str_new_line(p, \"directive set -RESET_CLEARS_ALL_REGS use_library\");\n  p = print_str_new_line(p, \"directive set -START_FLAG {}\");\n  p = print_str_new_line(p, \"directive set -READY_FLAG {}\");\n  p = print_str_new_line(p, \"directive set -DONE_FLAG {}\");\n  p = print_str_new_line(p, \"directive set -TRANSACTION_DONE_SIGNAL true\");\n  p = print_str_new_line(p, \"directive set -STALL_FLAG false\");\n  p = print_str_new_line(p, \"directive set -IDLE_SIGNAL {}\");\n  p = print_str_new_line(p, \"directive set -REGISTER_IDLE_SIGNAL false\");\n  p = print_str_new_line(p, \"directive set -ARRAY_SIZE 1024\");\n  p = print_str_new_line(p, \"directive set -CHAN_IO_PROTOCOL use_library\");\n  p = print_str_new_line(p, \"directive set -IO_MODE super\");\n  p = print_str_new_line(p, \"directive set -UNROLL no\");\n  p = print_str_new_line(p, \"directive set -REALLOC true\");\n  p = print_str_new_line(p, \"directive set -MUXPATH true\");\n  p = print_str_new_line(p, \"directive set -TIMING_CHECKS true\");\n  p = print_str_new_line(p, \"directive set -ASSIGN_OVERHEAD 0\");\n  p = print_str_new_line(p, \"directive set -REGISTER_SHARING_LIMIT 0\");\n  p = print_str_new_line(p, \"directive set -REGISTER_SHARING_MAX_WIDTH_DIFFERENCE 8\");\n  p = print_str_new_line(p, \"directive set -SAFE_FSM false\");\n  p = print_str_new_line(p, 
\"directive set -NO_X_ASSIGNMENTS true\");\n  p = print_str_new_line(p, \"directive set -REG_MAX_FANOUT 0\");\n  p = print_str_new_line(p, \"directive set -FSM_BINARY_ENCODING_THRESHOLD 64\");\n  p = print_str_new_line(p, \"directive set -FSM_ENCODING none\");\n  p = print_str_new_line(p, \"directive set -LOGIC_OPT false\");\n  p = print_str_new_line(p, \"directive set -MEM_MAP_THRESHOLD 32\");\n  p = print_str_new_line(p, \"directive set -REGISTER_THRESHOLD 256\");\n  p = print_str_new_line(p, \"directive set -MERGEABLE true\");\n  p = print_str_new_line(p, \"directive set -SPECULATE true\");\n  p = print_str_new_line(p, \"directive set -DESIGN_GOAL area\");\n\n  p = print_str_new_line(p, \"go new\");\n  p = print_str_new_line(p, \"solution library add mgc_Xilinx-VIRTEX-uplus-2LV_beh -- -rtlsyntool Vivado -manufacturer Xilinx -family VIRTEX-uplus -speed -2LV -part xcvu11p-flga2577-2LV-e\");\n  p = print_str_new_line(p, \"solution library add Xilinx_RAMS\");\n  p = print_str_new_line(p, \"solution library add Xilinx_ROMS\");\n  p = print_str_new_line(p, \"solution library add amba\");\n  p = print_str_new_line(p, \"solution library add ccs_fpga_hic\");\n  p = print_str_new_line(p, \"solution library add Xilinx_FIFO\");\n\n  p = print_str_new_line(p, \"go libraries\");\n  p = print_str_new_line(p, \"directive set -CLOCKS {clk {-CLOCK_PERIOD 5.0 -CLOCK_EDGE rising -CLOCK_UNCERTAINTY 0.0 -CLOCK_HIGH_TIME 2.5 -RESET_SYNC_NAME rst -RESET_ASYNC_NAME arst_n -RESET_KIND sync -RESET_SYNC_ACTIVE high -RESET_ASYNC_ACTIVE low -ENABLE_ACTIVE high}}\");\n\n  p = print_str_new_line(p, \"go assembly\");\n  p = print_str_new_line(p, \"directive set -FIFO_DEPTH 1\");\n\n  /* Set all modules with identifiers to direct input. 
*/\n  const char *dims[] = {\"idx\", \"idy\", \"idz\"};\n  for (int i = 0; i < n_modules; i++) {\n    int n = isl_id_list_n_id(modules[i]->inst_ids);\n    if (modules[i]->is_filter && modules[i]->is_buffer) {\n      /* Intra transfer function */      \n      if (n > 0) {\n        for (int j = 0; j < n; j++) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"directive set /kernel0/\");\n          p = isl_printer_print_str(p, modules[i]->name);\n          p = isl_printer_print_str(p, \"_intra_trans/\");\n          p = isl_printer_print_str(p, dims[j]);\n          p = isl_printer_print_str(p, \":rsc -MAP_TO_MODULE {[DirectInput]}\");\n          p = isl_printer_end_line(p);\n        }\n      }\n\n      /* Inter transfer function */\n      if (n > 0) {\n        for (int j = 0; j < n; j++) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"directive set /kernel0/\");\n          p = isl_printer_print_str(p, modules[i]->name);\n          p = isl_printer_print_str(p, \"_inter_trans/\");\n          p = isl_printer_print_str(p, dims[j]);\n          p = isl_printer_print_str(p, \":rsc -MAP_TO_MODULE {[DirectInput]}\");\n          p = isl_printer_end_line(p);\n        }\n      }\n\n      if (modules[i]->boundary) {\n        if (n > 0) {\n          for (int j = 0; j < n; j++) {\n            p = isl_printer_start_line(p);\n            p = isl_printer_print_str(p, \"directive set /kernel0/\");\n            p = isl_printer_print_str(p, modules[i]->name);\n            p = isl_printer_print_str(p, \"_inter_trans_boundary/\");\n            p = isl_printer_print_str(p, dims[j]);\n            p = isl_printer_print_str(p, \":rsc -MAP_TO_MODULE {[DirectInput]}\");\n            p = isl_printer_end_line(p);\n          }\n        }\n      }\n    }\n\n    /* Default module */\n    if (n > 0) {\n      for (int j = 0; j < n; j++) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"directive set 
/kernel0/\");\n        p = isl_printer_print_str(p, modules[i]->name);\n        p = isl_printer_print_str(p, \"/\");\n        p = isl_printer_print_str(p, dims[j]);\n        p = isl_printer_print_str(p, \":rsc -MAP_TO_MODULE {[DirectInput]}\");\n        p = isl_printer_end_line(p);\n      }\n\n      /* Serialize */\n      if (modules[i]->to_mem && modules[i]->options->autosa->host_serialize) {\n        for (int j = 0; j < n; j++) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"directive set /kernel0/\");\n          p = isl_printer_print_str(p, modules[i]->name);\n          p = isl_printer_print_str(p, \"_serialize/\");\n          p = isl_printer_print_str(p, dims[j]);\n          p = isl_printer_print_str(p, \":rsc -MAP_TO_MODULE {[DirectInput]}\");\n          p = isl_printer_end_line(p);\n        }\n      }\n\n      if (modules[i]->boundary) {\n        for (int j = 0; j < n; j++) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"directive set /kernel0/\");\n          p = isl_printer_print_str(p, modules[i]->name);\n          p = isl_printer_print_str(p, \"_boundary/\");\n          p = isl_printer_print_str(p, dims[j]);\n          p = isl_printer_print_str(p, \":rsc -MAP_TO_MODULE {[DirectInput]}\");\n          p = isl_printer_end_line(p);\n        }\n\n        /* Serialize */\n        if (modules[i]->to_mem && modules[i]->options->autosa->host_serialize) {\n          for (int j = 0; j < n; j++) {\n            p = isl_printer_start_line(p);\n            p = isl_printer_print_str(p, \"directive set /kernel0/\");\n            p = isl_printer_print_str(p, modules[i]->name);\n            p = isl_printer_print_str(p, \"_boundary_serialize/\");\n            p = isl_printer_print_str(p, dims[j]);\n            p = isl_printer_print_str(p, \":rsc -MAP_TO_MODULE {[DirectInput]}\");\n            p = isl_printer_end_line(p);\n          }\n        } \n      }\n    }\n  }\n\n  /* Set local buffer 
properties. */\n  for (int i = 0; i < n_modules; i++) {\n    if (modules[i]->type == PE_MODULE)\n      continue;\n    for (int j = 0; j < modules[i]->n_var; j++) {\n      struct autosa_kernel_var *var;\n      var = (struct autosa_kernel_var *)&modules[i]->var[j];\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"directive set /kernel0/\");\n      p = isl_printer_print_str(p, modules[i]->name);\n      p = isl_printer_print_str(p, \"/\");\n      p = isl_printer_print_str(p, modules[i]->name);\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_str(p, var->name);\n      p = isl_printer_print_str(p, \"_inst:cns -MAP_TO_MODULE Xilinx_RAMS.BLOCK_1R1W_RBW_DUAL\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"directive set /kernel0/\");\n      p = isl_printer_print_str(p, modules[i]->name);\n      p = isl_printer_print_str(p, \"/\");\n      p = isl_printer_print_str(p, modules[i]->name);\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_str(p, var->name);\n      if (modules[i]->double_buffer)\n        p = isl_printer_print_str(p, \"_inst:cns -STAGE_REPLICATION 2\");\n      else\n        p = isl_printer_print_str(p, \"_inst:cns -STAGE_REPLICATION 1\");\n      p = isl_printer_end_line(p);\n\n      /* word width */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"directive set /kernel0/\");\n      p = isl_printer_print_str(p, modules[i]->name);\n      p = isl_printer_print_str(p, \"/\");\n      p = isl_printer_print_str(p, modules[i]->name);\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_str(p, var->name);\n      p = isl_printer_print_str(p, \"_inst -WORD_WIDTH \");\n      p = isl_printer_print_int(p, var->array->size * 8 * var->n_lane);\n      p = isl_printer_end_line(p);\n\n      if (modules[i]->boundary) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, 
\"directive set /kernel0/\");\n        p = isl_printer_print_str(p, modules[i]->name);\n        p = isl_printer_print_str(p, \"_boundary\");\n        p = isl_printer_print_str(p, \"/\");\n        p = isl_printer_print_str(p, modules[i]->name);\n        //p = isl_printer_print_str(p, \"_boundary_\");\n        p = isl_printer_print_str(p, \"_\");\n        p = isl_printer_print_str(p, var->name);\n        p = isl_printer_print_str(p, \"_inst:cns -MAP_TO_MODULE Xilinx_RAMS.BLOCK_1R1W_RBW_DUAL\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"directive set /kernel0/\");\n        p = isl_printer_print_str(p, modules[i]->name);\n        p = isl_printer_print_str(p, \"_boundary\");\n        p = isl_printer_print_str(p, \"/\");\n        p = isl_printer_print_str(p, modules[i]->name);\n        //p = isl_printer_print_str(p, \"_boundary_\");\n        p = isl_printer_print_str(p, \"_\");\n        p = isl_printer_print_str(p, var->name);\n        if (modules[i]->double_buffer)\n          p = isl_printer_print_str(p, \"_inst:cns -STAGE_REPLICATION 2\");\n        else\n          p = isl_printer_print_str(p, \"_inst:cns -STAGE_REPLICATION 1\");\n        p = isl_printer_end_line(p);\n\n        /* word width */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"directive set /kernel0/\");\n        p = isl_printer_print_str(p, modules[i]->name);\n        p = isl_printer_print_str(p, \"_boundary\");\n        p = isl_printer_print_str(p, \"/\");\n        p = isl_printer_print_str(p, modules[i]->name);\n        //p = isl_printer_print_str(p, \"_boundary_\");\n        p = isl_printer_print_str(p, \"_\");\n        p = isl_printer_print_str(p, var->name);\n        p = isl_printer_print_str(p, \"_inst -WORD_WIDTH \");\n        p = isl_printer_print_int(p, var->array->size * 8 * var->n_lane);\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n\n  p = print_str_new_line(p, \"go 
architect\");\n  p = print_str_new_line(p, \"// Insert directives for dependence if necessary\");\n  p = print_str_new_line(p, \"// Example: directive set /kernel0/PE/run/for:read_mem(local_C:rsc.@) -IGNORE_DEPENDENCY_FROM {for:write_mem(local_C:rsc.@) for:write_mem(local_C:rsc.@)}\");\n  \n  p = print_str_new_line(p, \"go allocate\");\n  p = print_str_new_line(p, \"go extract\");\n\n  p = isl_printer_free(p);\n\n  return;\n}\n\n/* Given an autosa_prog \"prog\" and the corresponding transformed AST\n * \"tree\", print the entire OpenCL/HLS code to \"p\".\n * \"types\" collects the types for which a definition has already been\n * printed.\n */\nstatic __isl_give isl_printer *print_hw(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, __isl_keep isl_ast_node *tree,\n    struct autosa_hw_module **modules, int n_modules,\n    struct autosa_hw_top_module *top_module,\n    struct autosa_drain_merge_func **drain_merge_funcs, int n_drain_merge_funcs,\n    struct autosa_types *types, void *user)\n{\n  struct hls_info *hls = (struct hls_info *)user;\n  isl_printer *p_tmp;\n\n  p_tmp = isl_printer_to_file(isl_printer_get_ctx(p), hls->kernel_c);\n  p_tmp = isl_printer_set_output_format(p_tmp, ISL_FORMAT_C);\n  p_tmp = autosa_print_types(p_tmp, types, prog);\n  p_tmp = isl_printer_free(p_tmp);  \n\n  /* Print OpenCL host and kernel function. */\n  p = autosa_print_host_code(p, prog, tree, modules, n_modules, top_module,\n                             drain_merge_funcs, n_drain_merge_funcs, hls);\n  /* Print separate top module code generation function. */\n  print_top_gen_host_code(prog, tree, top_module, hls);\n  /* Print the separate TCL file. 
*/\n  print_tcl_code(prog, modules, n_modules, hls);\n\n  return p;\n}\n\n/* Generate systolic arrays using Catapult HLS C.\n */\nint generate_autosa_catapult_hls_c(isl_ctx *ctx, struct ppcg_options *options,\n                                   const char *input)\n{\n  struct hls_info hls;\n  int r;\n\n  hls.target = CATAPULT_HW;  \n  hls.hls = 1;\n  hls.ctx = ctx;\n  hls.output_dir = options->autosa->output_dir;\n  hls.hcl = options->autosa->hcl;\n  hls_open_files(&hls, input);\n\n  r = generate_sa(ctx, input, hls.host_c, options, &print_hw, &hls);\n\n  hls_close_files(&hls);\n\n  return r;\n}"
  },
  {
    "path": "src/autosa_catapult_hls_c.h",
    "content": "#ifndef _AUTOSA_CATAPULT_HLS_C_H\n#define _AUTOSA_CATAPULT_HLS_C_H\n\n#include <pet.h>\n#include \"ppcg_options.h\"\n#include \"ppcg.h\"\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\nint generate_autosa_catapult_hls_c(isl_ctx *ctx, struct ppcg_options *options,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t const char *input);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif"
  },
  {
    "path": "src/autosa_codegen.cpp",
    "content": "#include <isl/aff.h>\n\n#include <barvinok/isl.h>\n\n#include \"autosa_codegen.h\"\n#include \"autosa_utils.h\"\n#include \"autosa_print.h\"\n#include \"autosa_schedule_tree.h\"\n#include \"autosa_comm.h\"\n\n/* Generate the I/O module name.\n * [io_group_name]_IO_L[X]_in/out\n */\nstatic char *generate_io_module_name(isl_ctx *ctx,\n                                     struct autosa_array_ref_group *group, int level, int read)\n{\n  isl_printer *p;\n\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_print_str(p, group->array->name);\n  if (group->group_type == AUTOSA_IO_GROUP)\n  {\n    if (group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  }\n  else if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, \"drain\");\n  }\n  p = isl_printer_print_str(p, \"_IO_L\");\n  p = isl_printer_print_int(p, level);\n  if (read)\n    p = isl_printer_print_str(p, \"_in\");\n  else\n    p = isl_printer_print_str(p, \"_out\");\n\n  char *str = isl_printer_get_str(p);\n  isl_printer_free(p);\n\n  return str;\n}\n\n/* Return an isl_multi_aff, with as elements the parameters in \"space\"\n * that have the names specified by the elements in \"names\".\n * If (some of) these parameters do not already appear in \"space\",\n * then they are added first.\n */\nstatic __isl_give isl_multi_aff *parameter_vector(__isl_take isl_space *space,\n                                                  __isl_keep isl_id_list *names)\n{\n  int i, n;\n  isl_local_space *ls;\n  isl_multi_aff *ma;\n\n  if (!names)\n    space = isl_space_free(space);\n\n  n = isl_id_list_n_id(names);\n  for (i = 0; i < n; ++i)\n  {\n    int pos;\n    isl_id *id;\n\n    id = isl_id_list_get_id(names, i);\n    pos = isl_space_find_dim_by_id(space, isl_dim_param, id);\n    if (pos >= 0)\n    {\n      isl_id_free(id);\n      continue;\n    }\n    pos 
= isl_space_dim(space, isl_dim_param);\n    space = isl_space_add_dims(space, isl_dim_param, 1);\n    space = isl_space_set_dim_id(space, isl_dim_param, pos, id);\n  }\n  ma = isl_multi_aff_zero(isl_space_copy(space));\n  ls = isl_local_space_from_space(isl_space_domain(space));\n  for (i = 0; i < n; ++i)\n  {\n    int pos;\n    isl_id *id;\n    isl_aff *aff;\n\n    id = isl_id_list_get_id(names, i);\n    pos = isl_space_find_dim_by_id(space, isl_dim_param, id);\n    isl_id_free(id);\n    aff = isl_aff_var_on_domain(isl_local_space_copy(ls),\n                                isl_dim_param, pos);\n    ma = isl_multi_aff_set_aff(ma, i, aff);\n  }\n  isl_local_space_free(ls);\n\n  return ma;\n}\n\n/* Return constraints on the domain elements that are greater or equal \n * to a sequence of parameters called \"names\", to the partial schedule of \"node\".\n * The number of members of the band node \"node\" should be smaller\n * than or equal to the number of elements in \"names\". \n * If it is smaller, then the first elements of \"names\" are equated to zero.\n */\nstatic __isl_give isl_union_set *set_schedule_ge(\n    __isl_keep isl_schedule_node *node, __isl_keep isl_id_list *names)\n{\n  int n, n_zero;\n  isl_multi_union_pw_aff *mupa, *mupa2;\n  isl_multi_aff *ma;\n  isl_space *space;\n  isl_union_set *domain;\n\n  if (!node)\n    return NULL;\n  n = isl_id_list_n_id(names);\n  if (n == 0)\n    return isl_schedule_node_get_universe_domain(node);\n  n_zero = n - isl_schedule_node_band_n_member(node);\n\n  mupa = isl_schedule_node_band_get_partial_schedule(node);\n  space = isl_multi_union_pw_aff_get_space(mupa);\n  space = isl_space_params(space);\n  space = isl_space_set_from_params(space);\n  space = isl_space_add_dims(space, isl_dim_set, n_zero);\n  ma = isl_multi_aff_zero(space);\n  domain = isl_schedule_node_get_universe_domain(node);\n  /* Generate the mupa that is on the same domain of partial schedule, with\n   * a function that maps to the n_zero dims to 
zero. */\n  mupa2 = isl_multi_union_pw_aff_multi_aff_on_domain(\n      isl_union_set_copy(domain), ma);\n\n  /* Generate the mupa with the n_zero dims as parameters and equal zero. */\n  mupa = isl_multi_union_pw_aff_range_product(mupa2, mupa);\n  space = isl_multi_union_pw_aff_get_space(mupa);\n  ma = parameter_vector(space, names);\n  /* Generate the mupa that is on the same domain of partial schedule, with\n   * a function that maps the domain elements to the parameters. */\n  mupa2 = isl_multi_union_pw_aff_multi_aff_on_domain(domain, ma);\n  mupa = isl_multi_union_pw_aff_sub(mupa, mupa2);\n\n  return isl_multi_union_pw_aff_nonneg_union_set(mupa);\n}\n\n/* Return constraints on the domain elements that are less than or equal to a sequence of\n * parameters called \"names\", to the partial schedule of \"node\".\n * The number of members of the band node \"node\" should be smaller\n * than or equal to the number of elements in \"names\". \n * If it is smaller, then the first elements of \"names\" are equated to zero.\n */\nstatic __isl_give isl_union_set *set_schedule_le(\n    __isl_keep isl_schedule_node *node, __isl_keep isl_id_list *names)\n{\n  int n, n_zero;\n  isl_multi_union_pw_aff *mupa, *mupa2;\n  isl_multi_aff *ma;\n  isl_space *space;\n  isl_union_set *domain;\n\n  if (!node)\n    return NULL;\n  n = isl_id_list_n_id(names);\n  if (n == 0)\n    return isl_schedule_node_get_universe_domain(node);\n  n_zero = n - isl_schedule_node_band_n_member(node);\n\n  mupa = isl_schedule_node_band_get_partial_schedule(node);\n  space = isl_multi_union_pw_aff_get_space(mupa);\n  space = isl_space_params(space);\n  space = isl_space_set_from_params(space);\n  space = isl_space_add_dims(space, isl_dim_set, n_zero);\n  ma = isl_multi_aff_zero(space);\n  domain = isl_schedule_node_get_universe_domain(node);\n  /* Generate the mupa that is on the same domain of partial schedule, with\n   * a function that maps to the n_zero dims to zero. 
*/\n  mupa2 = isl_multi_union_pw_aff_multi_aff_on_domain(\n      isl_union_set_copy(domain), ma);\n\n  /* Generate the mupa with the n_zero dims as parameters and equal zero. */\n  mupa = isl_multi_union_pw_aff_range_product(mupa2, mupa);\n  space = isl_multi_union_pw_aff_get_space(mupa);\n  ma = parameter_vector(space, names);\n  /* Generate the mupa that is on the same domain of partial schedule, with\n   * a function that maps the domain elements to the parameters. */\n  mupa2 = isl_multi_union_pw_aff_multi_aff_on_domain(domain, ma);\n  mupa = isl_multi_union_pw_aff_sub(mupa2, mupa);\n\n  return isl_multi_union_pw_aff_nonneg_union_set(mupa);\n}\n\n/* Construct an isl_multi_val for use as tile sizes for tiling \"node\"\n * from the elements in \"tile_size\".\n */\nstatic __isl_give isl_multi_val *construct_band_tiles_sizes(\n    __isl_keep isl_schedule_node *node, int *tile_size)\n{\n  isl_space *space;\n\n  if (!node)\n    return NULL;\n\n  space = isl_schedule_node_band_get_space(node);\n  return ppcg_multi_val_from_int_list(space, tile_size);\n}\n\n/* Return constraints on the domain elements that equate a sequence of\n * parameters called \"names\", to the partial schedule\n * of \"node\" modulo the integers in \"size\".\n * The number of elements in the array \"size\" should be equal\n * to the number of elements in \"names\".\n * The number of members of the band node \"node\" should be smaller\n * than or equal to this number.  
If it is smaller, then the first\n * elements of \"names\" are equated to zero.\n */\nstatic __isl_give isl_union_set *set_schedule_modulo(\n    __isl_keep isl_schedule_node *node, __isl_keep isl_id_list *names,\n    int *size)\n{\n  int n, n_zero;\n  isl_space *space;\n  isl_multi_aff *ma;\n  isl_multi_union_pw_aff *mupa, *mupa2;\n  isl_multi_val *mv;\n  isl_union_set *domain;\n\n  if (!node)\n    return NULL;\n  n = isl_id_list_n_id(names);\n  if (n == 0)\n    return isl_schedule_node_get_universe_domain(node);\n  n_zero = n - isl_schedule_node_band_n_member(node);\n\n  mupa = isl_schedule_node_band_get_partial_schedule(node);\n  mv = construct_band_tiles_sizes(node, size + n_zero);\n  mupa = isl_multi_union_pw_aff_mod_multi_val(mupa, mv);\n  space = isl_multi_union_pw_aff_get_space(mupa);\n  space = isl_space_params(space);\n  space = isl_space_set_from_params(space);\n  space = isl_space_add_dims(space, isl_dim_set, n_zero);\n  ma = isl_multi_aff_zero(space);\n\n  domain = isl_schedule_node_get_universe_domain(node);\n  mupa2 = isl_multi_union_pw_aff_multi_aff_on_domain(\n      isl_union_set_copy(domain), ma);\n  mupa = isl_multi_union_pw_aff_range_product(mupa2, mupa);\n\n  space = isl_multi_union_pw_aff_get_space(mupa);\n  ma = parameter_vector(space, names);\n\n  mupa2 = isl_multi_union_pw_aff_multi_aff_on_domain(domain, ma);\n  mupa = isl_multi_union_pw_aff_sub(mupa, mupa2);\n\n  return isl_multi_union_pw_aff_zero_union_set(mupa);\n}\n\n/* Generate two prefixes: fifo_prefix and buffer_prefix\n * fifo_prefix: fifo_A_0\n * buffer_prefix: local_A_0\n */\nstatic void init_suffix(struct autosa_hw_module *module,\n                        struct autosa_array_ref_group *group, char **fifo_suffix, char **buf_suffix)\n{\n  isl_ctx *ctx = isl_map_get_ctx(group->access);\n\n  isl_printer *p = isl_printer_to_str(ctx);\n  p = autosa_array_ref_group_print_fifo_name(group, p);\n  *fifo_suffix = isl_printer_get_str(p);\n  isl_printer_free(p);\n\n  p = 
isl_printer_to_str(ctx);\n  p = isl_printer_print_str(p, \"local_\");\n  p = isl_printer_print_str(p, group->array->name);\n  if ((group->group_type == AUTOSA_IO_GROUP && group->local_array->n_io_group > 1) ||\n      (group->group_type == AUTOSA_PE_GROUP && group->local_array->n_pe_group > 1))\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_int(p, group->nr);\n  }\n  if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, \"drain\");\n  }  \n  *buf_suffix = isl_printer_get_str(p);\n  isl_printer_free(p);\n}\n\n///* Return constraints on the domain elements that equate the partial schedule\n// * of \"node\" to the lower bound of partial schedule. \n// */\n//static __isl_give isl_union_set *schedule_eq_lb(\n//    __isl_keep isl_schedule_node *node)\n//{\n//  int n, n_zero;\n//  isl_multi_union_pw_aff *mupa, *mupa2;\n//  isl_multi_aff *ma;\n//  isl_space *space;\n//  isl_union_set *domain;\n//  isl_union_map *umap;\n//  isl_union_set *uset;\n//  isl_schedule_node *node2;\n//  isl_bool under_extension = isl_bool_false;\n//\n//  if (!node)\n//    return NULL;\n//\n//  /* Test if it is under extension node */\n//  node2 = isl_schedule_node_copy(node);\n//  while (node2)\n//  {\n//    if (isl_schedule_node_get_type(node2) == isl_schedule_node_extension)\n//    {\n//      under_extension = isl_bool_true;\n//      break;\n//    }\n//    if (isl_schedule_node_has_parent(node2))\n//      node2 = isl_schedule_node_parent(node2);\n//    else\n//      break;\n//  }\n//  isl_schedule_node_free(node2);\n//\n//  umap = isl_schedule_node_band_get_partial_schedule_union_map(node);\n//  if (!under_extension)\n//  {\n//    domain = isl_schedule_node_get_domain(node);\n//    umap = isl_union_map_intersect_domain(umap, domain);\n//  }\n//  uset = isl_union_map_range(isl_union_map_copy(umap));\n//  uset = isl_union_set_lexmin(uset);\n//  umap = isl_union_map_reverse(umap);\n//  uset = 
isl_union_set_apply(uset, umap);\n//\n//  return uset;\n//}\nstatic __isl_give isl_union_set *schedule_eq_lb(\n  __isl_keep isl_schedule_node *node)\n{\n  isl_schedule_node *child;\n  isl_union_map *prefix, *prefix_ge;\n  int depth1, depth2;\n  isl_set *prefix_range;\n  isl_map *sched_identity, *ge;\n  isl_union_set *domain;\n  isl_schedule_node *node_tmp;\n  isl_bool under_extension = isl_bool_false;\n\n  if (!node)\n    return NULL;\n\n  /* Test if \"node\" is under extension node */\n  node_tmp = isl_schedule_node_copy(node);\n  while (node_tmp) {\n    if (isl_schedule_node_get_type(node_tmp) == isl_schedule_node_extension) {\n      under_extension = isl_bool_true;\n      break;\n    }\n    if (isl_schedule_node_has_parent(node_tmp)) \n      node_tmp = isl_schedule_node_parent(node_tmp);\n    else\n      break;\n  }\n  isl_schedule_node_free(node_tmp);\n\n  if (under_extension) {\n//#ifdef _DEBUG    \n//    printf(\"debug: under extension\\n\");\n//    DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif    \n    /* Currently all the extension nodes are inserted with rectangular schedule domains.\n     * Therefore, we will safely call a routine that handles the rectangular \n     * domains to get the lower bound. 
\n     */\n    isl_union_map *umap;\n    isl_union_set *uset;\n    umap = isl_schedule_node_band_get_partial_schedule_union_map(node);\n    uset = isl_union_map_range(isl_union_map_copy(umap));\n    uset = isl_union_set_lexmin(uset);\n    umap = isl_union_map_reverse(umap);\n    uset = isl_union_set_apply(uset, umap);\n\n    return uset;\n  }\n\n  depth1 = isl_schedule_node_get_schedule_depth(node);\n  child = isl_schedule_node_child(isl_schedule_node_copy(node), 0);\n  depth2 = isl_schedule_node_get_schedule_depth(child);\n  prefix = isl_schedule_node_get_prefix_schedule_relation(child);\n  //DBGSCHDNODE(stdout, child, isl_schedule_node_get_ctx(child));\n  //DBGUMAP(stdout, prefix, isl_schedule_node_get_ctx(child));\n  isl_schedule_node_free(child);  \n  //isl_union_set *tmp_uset = isl_union_map_range(isl_union_map_copy(prefix));\n  //DBGUSET(stdout, tmp_uset, isl_union_set_get_ctx(tmp_uset));\n  //prefix_range = isl_set_from_union_set(tmp_uset);\n  prefix_range = isl_set_from_union_set(isl_union_map_range(isl_union_map_copy(prefix)));\n  ge = isl_map_lex_ge(isl_set_get_space(prefix_range));\n  /* Set the outer dims equal */\n  for (int i = 0; i < depth1; i++) {\n    ge = isl_map_equate(ge, isl_dim_in, i, isl_dim_out, i);\n  }\n  ge = isl_map_intersect_domain(ge, isl_set_copy(prefix_range));\n  ge = isl_map_intersect_range(ge, prefix_range);\n  prefix_ge = isl_union_map_apply_range(isl_union_map_copy(prefix), isl_union_map_from_map(ge));\n  prefix_ge = isl_union_map_lexmin(prefix_ge);\n  prefix = isl_union_map_intersect(prefix, prefix_ge);\n  domain = isl_union_map_domain(prefix);\n\n  return domain;\n}\n\n/* Return constraints on the domain elements that not equate the partial schedule\n * of \"node\" to the lower bound of partial schedule. 
\n */\nstatic __isl_give isl_union_set *schedule_neq_lb(\n    __isl_keep isl_schedule_node *node)\n{\n  isl_union_set *uset, *domain;\n  isl_union_map *umap;\n\n  if (!node)\n    return NULL;\n\n  uset = schedule_eq_lb(node);\n  umap = isl_schedule_node_band_get_partial_schedule_union_map(node);\n  domain = isl_union_map_domain(umap);\n  uset = isl_union_set_subtract(domain, uset);\n\n  return uset;\n}\n\n/* Return constraints on the domain elements that equate the partial schedule\n * of \"node\" to the upper bound of partial schedule. \n */\nstatic __isl_give isl_union_set *schedule_eq_ub(\n    __isl_keep isl_schedule_node *node)\n{\n  /* Compute the prefix schedule, \n   * Build a relation that sets the dimensions before the current band\n   * equal, and the current dim le. \n   * Intersect the relation with the schedule range.\n   * Apply the relation to the current prefix schedule range.\n   * Compute the lexmax of the range.\n   * Get the domain.\n   */\n  isl_schedule_node *child;\n  isl_union_map *prefix, *prefix_le;\n  int depth1, depth2;\n  isl_set *prefix_range;\n  isl_map *sched_identity, *le;\n  isl_union_set *domain;\n\n  if (!node)\n    return NULL;\n\n  depth1 = isl_schedule_node_get_schedule_depth(node);\n  child = isl_schedule_node_child(isl_schedule_node_copy(node), 0);\n  depth2 = isl_schedule_node_get_schedule_depth(child);\n  prefix = isl_schedule_node_get_prefix_schedule_relation(child);\n  isl_schedule_node_free(child);\n  prefix_range = isl_set_from_union_set(isl_union_map_range(isl_union_map_copy(prefix)));   \n  le = isl_map_lex_le(isl_set_get_space(prefix_range));  \n  /* Set the outer dims equal */\n  for (int i = 0; i < depth1; i++) {\n    le = isl_map_equate(le, isl_dim_in, i, isl_dim_out, i);\n  }\n  le = isl_map_intersect_domain(le, isl_set_copy(prefix_range));\n  le = isl_map_intersect_range(le, prefix_range);\n  prefix_le = isl_union_map_apply_range(isl_union_map_copy(prefix), isl_union_map_from_map(le));\n  prefix_le = 
isl_union_map_lexmax(prefix_le);\n  prefix = isl_union_map_intersect(prefix, prefix_le);\n  domain = isl_union_map_domain(prefix);\n\n  return domain;\n}\n\n/* Return constraints on the domain elements that not equate the partial schedule\n * of \"node\" to the upper bound of partial schedule. \n */\nstatic __isl_give isl_union_set *schedule_neq_ub(\n    __isl_keep isl_schedule_node *node)\n{\n  isl_union_set *uset, *domain, *sched_domain;\n  isl_union_map *umap;\n\n  if (!node)\n    return NULL;\n\n  uset = schedule_eq_ub(node);\n  domain = isl_schedule_node_get_domain(node);\n  umap = isl_schedule_node_band_get_partial_schedule_union_map(node);\n  umap = isl_union_map_intersect_domain(umap, domain);\n  sched_domain = isl_union_map_domain(umap);\n  uset = isl_union_set_subtract(sched_domain, uset);\n\n  return uset;\n}\n\n/* Internal struct used for add_io_copies_stmt_acc. */\nstruct add_io_copies_stmt_acc_data\n{\n  struct autosa_kernel *kernel;\n  struct autosa_array_ref_group *group;\n  struct autosa_stmt_access *ref;\n  struct autosa_array_tile *local_tile; /* Local buffer tile */\n  int n_lane;\n  int read;\n  char *stmt_name;\n  int insert_dependence;\n  struct autosa_hw_module *module;\n  int module_type; // 0 default 1 intra 1 inter\n};\n\n/* Create an IO statement. 
\n * \"io_group\" is the current I/O group that is analyzed.\n * \"local_tile\" is the tile that the current IO stmt accesses.\n * \"depth\" is the schedule depth that the current stmt is inserted at.\n */\nstatic __isl_give isl_multi_aff *autosa_create_io_access_stmt(\n    isl_ctx *ctx,\n    struct autosa_array_ref_group *local_group,\n    struct autosa_array_ref_group *io_group,\n    struct autosa_array_tile *tile,\n    int depth,\n    __isl_keep char *stmt_name)\n{\n  isl_space *space;\n  isl_id *id;\n  char buf[100];\n  struct autosa_array_ref_group_pair *pair =\n      (struct autosa_array_ref_group_pair *)malloc(\n          sizeof(struct autosa_array_ref_group_pair));\n  pair->local_group = local_group;\n  pair->io_group = io_group;\n  pair->local_tile = tile;\n  pair->in_use = 0;  \n  if (io_group->n_lane > 1 && io_group->local_array->array_type == AUTOSA_INT_ARRAY) {    \n    pair->simd_depth = depth;\n  } else {    \n    pair->simd_depth = -1;\n  }\n\n  space = isl_space_copy(io_group->array->space);\n  space = isl_space_from_range(space);\n  space = isl_space_add_dims(space, isl_dim_in, depth);\n  space = isl_space_wrap(space);\n  space = isl_space_map_from_set(space);\n\n  sprintf(buf, \"%s\", stmt_name);\n\n  id = isl_id_alloc(ctx, buf, pair);\n  id = isl_id_set_free_user(id, &free_group_pair);\n  space = isl_space_set_tuple_id(space, isl_dim_in, id);\n\n  return isl_multi_aff_identity(space);\n}\n\n/* Test if the array access \"ref\" is stride-0 or stride-1 under the current\n * schedule node.\n */\nstatic isl_bool is_acc_stride_one_at_node(\n    __isl_keep isl_schedule_node *node, struct autosa_stmt_access *ref)\n{\n  isl_union_set *domain;\n  isl_union_map *prefix;\n  isl_map *acc;\n  isl_bool is_zero = isl_bool_false, is_one = isl_bool_false;\n  \n  prefix = isl_schedule_node_get_prefix_schedule_union_map(node);\n\n  /* Scalar access */\n  if (ref->n_index == 0)\n    return isl_bool_true;\n\n  /* Transform the domain of access function to scheduling 
domains. */\n  acc = isl_map_copy(ref->access);\n  acc = isl_map_from_union_map(\n      isl_union_map_apply_domain(isl_union_map_from_map(acc), prefix));\n  is_one = access_is_stride_one(acc, ref->n_index - 1);\n\n  isl_map_free(acc);  \n  return is_one;\n}\n\n/* Insert the copy statement at the statement level.\n */\nstatic __isl_give isl_schedule_node *add_io_copies_stmt_acc_single(\n    __isl_take isl_schedule_node *node, void *user)\n{\n  struct add_io_copies_stmt_acc_data *data =\n      (struct add_io_copies_stmt_acc_data *)(user);\n  struct autosa_array_ref_group *group = data->group;\n  struct autosa_stmt_access *ref = data->ref;\n  char *stmt_name = data->stmt_name;\n  int read = data->read;\n  isl_union_set *uset, *empty_filter, *domain;\n  isl_set *set;\n  isl_space *space;\n  isl_id *id, *id2;\n  isl_ctx *ctx;\n  isl_union_map *access;\n  int empty;\n  struct autosa_array_tile *tile;\n  isl_multi_aff *ma, *from_access;\n  isl_multi_pw_aff *mpa;\n  isl_multi_union_pw_aff *mupa;\n  isl_schedule_node *graft;\n  int n_lane = data->n_lane;\n  int is_simd;\n  isl_id *hls_id;\n  isl_bool stride_one;\n  isl_bool insert_dependence = isl_bool_false;\n  isl_bool under_extension;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n    return node;  \n\n  /* Examine if the statement contains the access. 
*/\n  uset = isl_schedule_node_get_domain(node);\n  if (isl_union_set_is_empty(uset)) {\n    isl_union_set_free(uset);\n    return node;\n  }\n\n  set = isl_set_from_union_set(isl_union_set_copy(uset));\n  space = isl_set_get_space(set);\n  isl_set_free(set);\n  id = isl_space_get_tuple_id(space, isl_dim_set);\n  isl_space_free(space);\n  space = isl_map_get_space(ref->access);\n  id2 = isl_space_get_tuple_id(space, isl_dim_in);\n  empty_filter = isl_union_set_empty(isl_union_set_get_space(uset));\n  isl_union_set_free(uset);\n  isl_space_free(space);\n\n  if (id != id2)\n  {\n    isl_id_free(id);\n    isl_id_free(id2);\n    node = isl_schedule_node_insert_filter(node, empty_filter);\n    return node;\n  }\n  isl_id_free(id);\n  isl_id_free(id2);\n  ctx = isl_schedule_node_get_ctx(node);\n  is_simd = is_node_under_simd(node);\n\n  /* S -> [D -> A] */\n  access = io_comm_access_ref(data->kernel, node, group, ref, read);\n  //DBGUMAP(stdout, access, isl_union_map_get_ctx(access))\n\n  empty = isl_union_map_is_empty(access);\n  if (empty < 0 || empty)\n  {\n    isl_union_map_free(access);\n    isl_union_set_free(empty_filter);\n    if (empty < 0)\n      return isl_schedule_node_free(node);\n    return node;\n  }\n\n  /* Update the stmt_name. */\n  if (data->insert_dependence)\n  {\n    isl_schedule_node *node2;\n\n    node2 = isl_schedule_node_copy(node);\n    if (n_lane >= 1 && is_simd)\n    {\n      //node2 = isl_schedule_node_parent(node);\n      while (!is_marked(node2, \"simd\")) {\n        node2 = isl_schedule_node_parent(node2);\n      }\n      node2 = isl_schedule_node_child(node2, 0);\n    }\n    /* Test if the access is stride one at the current loop. */\n    stride_one = is_acc_stride_one_at_node(node2, ref);\n    if (stride_one)\n    {\n      /* Test if the loop bound/n_lane > 1. 
\n       * If so, insert a hls_dep mark.\n       * Only do this when there is a single access in the group.\n       */\n      int *ubs = NULL;\n      isl_schedule_node *node_copy = isl_schedule_node_copy(node2);\n      if (is_simd) {\n        while (node_copy && isl_schedule_node_has_parent(node_copy)) {\n          if (is_marked(node_copy, \"simd\")) \n            break;\n          node_copy = isl_schedule_node_parent(node_copy);\n        }\n      }\n      while (node_copy && isl_schedule_node_has_parent(node_copy))\n      {\n        if (isl_schedule_node_get_type(node_copy) == isl_schedule_node_band)\n          break;\n        node_copy = isl_schedule_node_parent(node_copy);\n      }\n      if (isl_schedule_node_get_type(node_copy) == isl_schedule_node_band)\n      {\n        int n = isl_schedule_node_band_n_member(node_copy);     \n        ubs = extract_band_upper_bounds(node_copy);\n        if (ubs[n - 1] / n_lane > 1)\n        {\n          insert_dependence = isl_bool_true;\n          /* Update the stmt_name. 
*/\n          int coalesce_depth;\n          int coalesce_bound;\n\n          //coalesce_depth = isl_schedule_node_get_schedule_depth(node_copy) - 1;\n          node_copy = isl_schedule_node_child(node_copy, 0);\n          coalesce_depth = isl_schedule_node_get_schedule_depth(node_copy) - 1;\n          coalesce_bound = ubs[n - 1] / n_lane;\n\n          isl_printer *p_str = isl_printer_to_str(ctx);\n          p_str = isl_printer_print_str(p_str, stmt_name);\n          p_str = isl_printer_print_str(p_str, \".\");\n          p_str = isl_printer_print_int(p_str, coalesce_depth);\n          p_str = isl_printer_print_str(p_str, \".\");\n          p_str = isl_printer_print_int(p_str, coalesce_bound);\n          free(stmt_name);\n          stmt_name = isl_printer_get_str(p_str);\n          isl_printer_free(p_str);\n        }\n      }\n      free(ubs);\n      isl_schedule_node_free(node_copy);\n    }\n    isl_schedule_node_free(node2);\n  }\n\n  from_access = autosa_create_io_access_stmt(\n      ctx, group, group, data->local_tile,\n      isl_schedule_node_get_schedule_depth(node), stmt_name);\n  free(stmt_name);\n\n  /* Create a register tiling. */\n  tile = create_register_tiling(node, group, ref);\n  ma = isl_multi_aff_copy(tile->tiling);\n  ma = isl_multi_aff_pullback_multi_aff(ma,\n                                        isl_multi_aff_copy(from_access));\n  mpa = isl_multi_pw_aff_from_multi_aff(ma);\n  mupa = isl_multi_union_pw_aff_from_multi_pw_aff(mpa);\n\n  /* [D -> A] */\n  domain = isl_union_map_range(access);\n  /* Only for read, we extend the access to a rectangular hull which helps to \n   * improve the memory coalescing. 
\n   */\n  if (read && !autosa_array_is_scalar(group->array))\n  {\n    isl_map *map;\n    isl_set *set;\n    set = isl_map_domain(isl_map_from_union_map(isl_union_set_unwrap(domain)));\n    map = group_tile_buffer(group, tile);\n    map = isl_map_intersect_domain(map, set);\n    domain = isl_union_set_from_set(isl_map_wrap(map));\n  }\n\n  /* read.fifoX[D -> A] */\n  domain = isl_union_set_preimage_multi_aff(domain, from_access);\n  /* read.fifoX[D -> A] -> D */\n  access = isl_union_set_wrapped_domain_map(domain);\n  /* D -> read.fifoX[D -> A] */\n  access = isl_union_map_reverse(access);\n  access = isl_union_map_coalesce(access);\n\n  graft = isl_schedule_node_from_extension(access);\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n  /* If the current statement is under the SIMD loop, we will add a filter \n   * to only transfer the data at one loop since we will later insert a \n   * statement to handle the data transfer of the entire SIMD loop.\n   */\n  if (data->kernel->options->autosa->isl_sink) {\n    if (n_lane >= 1 && is_simd)\n    {\n      /* The loop above is the SIMD loop.\n       * Check the node is below the simd mark. \n       */\n      int n_index;\n      int tile_size[1];\n      isl_id *id;\n      isl_printer *p_str;\n      isl_union_map *umap;\n      isl_union_set *filter;\n      /* Create a filter. */    \n      node = isl_schedule_node_parent(node);\n      if (data->read)\n        filter = schedule_eq_lb(node);\n      else\n        filter = schedule_eq_ub(node);\n      node = isl_schedule_node_insert_filter(node, filter);\n      node = isl_schedule_node_child(node, 0);\n      node = isl_schedule_node_child(node, 0);\n    }\n  }\n\n  /* Insert a \"pipeline\" mark under the band node. 
*/\n  hls_id = isl_id_alloc(ctx, \"hls_pipeline\", NULL);\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_mark(graft, hls_id);\n  graft = isl_schedule_node_parent(graft);\n\n  if (insert_dependence)\n  {\n    char *mark_name;\n    isl_id *id;\n    isl_printer *p_str = isl_printer_to_str(ctx);\n    p_str = isl_printer_print_str(p_str, \"hls_dependence.\");\n    p_str = autosa_array_ref_group_print_name(group, p_str);\n    mark_name = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n    id = isl_id_alloc(ctx, mark_name, NULL);\n    graft = isl_schedule_node_child(graft, 0);\n    graft = isl_schedule_node_child(graft, 0);\n    graft = isl_schedule_node_insert_mark(graft, id);\n    free(mark_name);\n  }\n\n  while (graft && isl_schedule_node_has_parent(graft))\n    graft = isl_schedule_node_parent(graft);\n\n  node = isl_schedule_node_graft_before(node, graft);\n  node = isl_schedule_node_insert_filter(node, empty_filter);\n  node = isl_schedule_node_parent(node);\n  node = isl_schedule_node_parent(node);\n  node = isl_schedule_node_parent(node);  \n\n  autosa_array_tile_free(tile);\n\n  return node;\n}\n\nstatic __isl_give isl_schedule_node *modify_simd_loop(\n  __isl_take isl_schedule_node *node, void *user)\n{\n  struct add_io_copies_stmt_acc_data *data =\n      (struct add_io_copies_stmt_acc_data *)(user);\n  if (data->n_lane >= 1 && is_marked(node, \"simd\")) {\n    int n_index;\n    int tile_size[1];\n    isl_id *id;\n    isl_printer *p_str;\n    isl_union_map *umap;\n    isl_union_set *filter;\n    isl_union_set *domain;\n\n    node = isl_schedule_node_child(node, 0);\n    /* Test if the domain is empty. 
*/\n    domain = isl_schedule_node_get_domain(node);\n    if (isl_union_set_is_empty(domain)) {\n      isl_union_set_free(domain);\n      node = isl_schedule_node_parent(node);\n      return node;  \n    }\n    isl_union_set_free(domain);\n\n    if (data->read)\n      filter = schedule_eq_lb(node);\n    else\n      filter = schedule_eq_ub(node);\n    node = isl_schedule_node_insert_filter(node, filter);\n    node = isl_schedule_node_parent(node);\n  }\n  return node;\n}\n\n/* Add copies at the stmt level for each array reference in the \"group\" \n * in the I/O modules.\n * \n * \"group\" is an I/O group.\n * \"read\" denotes whether data is copied in from or copied out to the external memory.\n * \"insert_dependence\" determines whether it is necessary to insert an hls_dependence mark.\n */\n__isl_give isl_schedule_node *add_io_copies_stmt_acc(\n  struct autosa_kernel *kernel,\n  struct autosa_array_ref_group *group,\n  __isl_take isl_schedule_node *node,\n  struct autosa_array_tile *tile, /* local tile */\n  int n_lane,\n  int read,\n  __isl_take char *stmt_name,\n  int before,\n  int insert_dependence,\n  struct autosa_hw_module *module,\n  int module_type)\n{\n  struct add_io_copies_stmt_acc_data data = {\n      kernel, group, NULL, tile, n_lane, read, stmt_name,\n      insert_dependence && group->n_ref == 1, module, module_type};\n\n  for (int i = 0; i < group->n_ref; i++)\n  {\n    struct autosa_stmt_access *ref = group->refs[i];\n    data.ref = ref;\n    //DBGMAP(stdout, ref->access, kernel->ctx)    \n    if ((read && ref->read) || (!read && ref->write)) {\n      node = isl_schedule_node_map_descendant_bottom_up(\n          node, &add_io_copies_stmt_acc_single, &data);\n    }\n  }\n//#ifndef ISL_SINK  \n  /* Modify the SIMD loop.\n   * If the current statement is under the SIMD loop, we will add a filter \n   * to only transfer the data at one loop since we will later insert a \n   * statement to handle the data transfer of the entire SIMD
loop.   \n   */\n  if (!kernel->options->autosa->isl_sink) {\n    node = isl_schedule_node_map_descendant_bottom_up(node, &modify_simd_loop, &data);\n  }\n//#endif  \n\n  return node;\n}\n\n/* Insert the copy statement at the node level to transfer the entire tile.\n * If \"is_buffer\" is set, add a marker for dependence false. This is\n * only for the Xilinx platform.\n */\nstatic __isl_give isl_schedule_node *add_io_copies_stmt_tile(\n  struct autosa_kernel *kernel,\n  struct autosa_array_ref_group *group,\n  __isl_take isl_schedule_node *node,\n  struct autosa_array_tile *local_tile, /* Local buffer */\n  struct autosa_array_tile *tile,       /* The tile to be copied */  \n  int n_lane,\n  int read,\n  __isl_take char *stmt_name,\n  int before, int is_buffer,\n  /* Whether it is proper to insert an hls_dependence mark for Xilinx platforms. */\n  int insert_dependence,\n  /* Whether to insert an access_serialize mark. */\n  int insert_serialize,\n  struct autosa_hw_module *module,\n  int module_type,\n  TPArrayTile *tuning_tile\n) {\n  isl_union_map *access = NULL;\n  int empty;\n  isl_multi_aff *from_access;\n  isl_multi_aff *ma;\n  isl_multi_pw_aff *mpa;\n  isl_multi_union_pw_aff *mupa;\n  isl_union_set *domain;\n  isl_schedule_node *graft;\n  int n;\n  isl_id *id;\n  isl_ctx *ctx = kernel->ctx;\n  int coalesce_depth;\n  int coalesce_bound;\n  isl_val *coalesce_bound_val;  \n  \n  access = io_comm_access(kernel, node, group, read);\n\n  empty = isl_union_map_is_empty(access);\n  if (empty < 0 || empty)\n  {\n    isl_union_map_free(access);\n    if (empty < 0)\n      return isl_schedule_node_free(node);\n    return node;\n  }\n\n  from_access = autosa_create_io_access_stmt(kernel->ctx, group, group,\n                                             local_tile, isl_schedule_node_get_schedule_depth(node), stmt_name);\n\n  ma = isl_multi_aff_copy(tile->tiling);  \n  ma = isl_multi_aff_pullback_multi_aff(ma,\n                                        isl_multi_aff_copy(from_access));\n  mpa
= isl_multi_pw_aff_from_multi_aff(ma);\n  mupa = isl_multi_union_pw_aff_from_multi_pw_aff(mpa);\n\n  domain = isl_union_map_range(access);\n  /* Restrict the buffer to the local tile size. */\n  if (!autosa_array_is_scalar(group->array))\n  {\n    isl_map *map;\n    isl_set *set;\n    set = isl_map_domain(isl_map_from_union_map(isl_union_set_unwrap(domain)));\n    map = group_tile_buffer(group, tile);\n    map = isl_map_intersect_domain(map, set);\n    domain = isl_union_set_from_set(isl_map_wrap(map));\n  }\n\n  domain = isl_union_set_preimage_multi_aff(domain, from_access);\n  access = isl_union_set_wrapped_domain_map(domain);\n  access = isl_union_map_reverse(access);\n  access = isl_union_map_coalesce(access);\n\n  graft = isl_schedule_node_from_extension(access);\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n  /* Split off the last dimension. */\n  n = isl_schedule_node_band_n_member(graft);\n  if (n > 1)\n  {\n    graft = isl_schedule_node_band_split(graft, n - 1);\n    graft = isl_schedule_node_child(graft, 0);\n  }\n\n  /* Insert a coalesce mark indicating the loop below could be used for\n   * memory coalescing.\n   */\n  id = isl_id_alloc(ctx, \"access_coalesce\", NULL);\n  graft = isl_schedule_node_insert_mark(graft, id);\n  graft = isl_schedule_node_child(graft, 0);\n\n  if (insert_serialize) {\n    id = isl_id_alloc(ctx, \"access_serialize\", NULL);\n    graft = isl_schedule_node_insert_mark(graft, id);\n    graft = isl_schedule_node_child(graft, 0);\n  }\n\n  if (kernel->options->autosa->tuning_method == 1) {\n    /* Insert the buffer information. */\n    id = isl_id_alloc(ctx, \"tuning_array_tile\", tuning_tile);\n    graft = isl_schedule_node_insert_mark(graft, id);\n    graft = isl_schedule_node_child(graft, 0);\n  }\n\n  if (group->local_array->is_sparse) {\n    n_lane *= (kernel->n_nzero * kernel->compress_ratio);\n  }\n\n  if (n_lane > 1) {\n    /* Perform data packing. 
\n     * We will tile the last dimension by the factor of data packing.\n     * Then we insert a filter to transfer data only once.\n     */\n    int tile_size[1];\n    isl_id *id;\n    isl_printer *p_str;\n    isl_union_map *umap;\n    isl_union_set *filter;\n    int depth;\n\n    /* Tile the last dimension. */\n    tile_size[0] = n_lane;\n    graft = autosa_tile_band(graft, tile_size);\n    graft = isl_schedule_node_child(graft, 0);\n    /* Create a filter. */\n    filter = schedule_eq_lb(graft);\n    graft = isl_schedule_node_insert_filter(graft, filter);\n    /* Move to the tile loop */\n    graft = isl_schedule_node_parent(graft);\n  }\n  free(stmt_name);\n  /* Insert a \"pipeline\" mark inside the band node. */\n  id = isl_id_alloc(ctx, \"hls_pipeline\", NULL);\n\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_mark(graft, id);\n  graft = isl_schedule_node_parent(graft);\n\n  if (is_buffer && !read && insert_dependence)\n  {\n    // TODO: should not be inter_trans or intra_trans.\n    // TODO: only add this pragma for io_transfer statements that require data packing.\n    /* Insert a \"dependence\" mark. \n     * This is not safe. Currently, we only insert the mark when there is at least \n     * one level of coalesce loop (coalesce_bound > 1) and\n     * when data_pack does not equal nxt_data_pack. 
\n     */\n    char *mark_name;\n    isl_printer *p_str = isl_printer_to_str(ctx);\n    p_str = isl_printer_print_str(p_str, \"hls_dependence.\");\n    p_str = autosa_array_ref_group_print_name(group, p_str);\n    mark_name = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n    id = isl_id_alloc(ctx, mark_name, NULL);\n    graft = isl_schedule_node_child(graft, 0);\n    graft = isl_schedule_node_child(graft, 0);\n    graft = isl_schedule_node_insert_mark(graft, id);\n    free(mark_name);\n  }\n\n  while (graft && isl_schedule_node_has_parent(graft))\n    graft = isl_schedule_node_parent(graft);\n\n  //DBGSCHDNODE(stdout, graft, isl_schedule_node_get_ctx(graft));\n  //DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n\n  if (before)\n  {\n    node = isl_schedule_node_graft_before(node, graft);\n  }\n  else\n  {\n    node = isl_schedule_node_graft_after(node, graft);\n  }\n\n  return node;\n}\n\n/* Set all the module io dims equal to the module identifiers above the io_level.\n * If the module is a filter, set the io dim greater than or equal to the \n * identifier at the io_level.\n * If the module is connected to the PEs, set the level 1 io dim equal to the\n * lb (for reads) or the ub (for writes).\n * The node should point to the \"array\" mark.\n */\nstatic __isl_give isl_schedule_node *add_io_ids_filter(\n  __isl_take isl_schedule_node *node, \n  __isl_keep isl_id_list *io_ids,  \n  int io_level, int n_io_ids, int is_filter, int to_pe, int read)\n{\n  isl_union_set *core;\n  int io_id = 0;\n\n  core = isl_union_set_universe(isl_schedule_node_get_domain(node));\n  //for (int i = n_io_ids + 1; i >= io_level; i--) {\n  for (int i = io_level + n_io_ids - 1; i >= io_level; i--) {\n    node = autosa_tree_move_down_to_io_mark(node, core, i);\n    node = isl_schedule_node_parent(node);\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band) {\n      isl_id *id;\n      isl_id_list *ids;\n      isl_union_set *uset;\n\n      ids = isl_id_list_from_id(isl_id_list_get_id(io_ids,
io_id));\n      if (io_id == n_io_ids - 1) {\n        if (is_filter)\n          uset = set_schedule_ge(node, ids);\n        else\n          uset = set_schedule_eq(node, ids);\n      } else {\n        uset = set_schedule_eq(node, ids);\n      }\n      io_id++;\n      node = isl_schedule_node_insert_filter(node, uset);\n      isl_id_list_free(ids);\n    }\n  }\n  if (to_pe && io_level > 1)\n  {\n    /* Add filter to only send data to boundary PEs. */\n    while (!isl_schedule_node_is_io_mark(node, 2)) {\n      node = isl_schedule_node_child(node, 0);\n    }\n    node = isl_schedule_node_child(node, 0);\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band) {\n      isl_union_set *uset;\n\n      if (read)\n        uset = schedule_eq_lb(node);\n      else\n        uset = schedule_eq_ub(node);\n      node = isl_schedule_node_insert_filter(node, uset);\n      node = isl_schedule_node_child(node, 0);\n    }\n  }\n\n  isl_union_set_free(core);\n\n  return node; \n}\n\nstatic __isl_give isl_printer *print_io_stmt_prefix(\n  __isl_take isl_printer *p,\n  int read, int dummy, int reduce,\n  struct autosa_array_ref_group *group)\n{\n  /* io_type */\n  p = isl_printer_print_str(p, read ? 
\"in\" : \"out\");\n  if (dummy)\n    p = isl_printer_print_str(p, \"_dummy\");\n  if (reduce)\n    p = isl_printer_print_str(p, \"_reduce\");\n  \n  /* fifo_name */\n  p = isl_printer_print_str(p, \".\");\n  if (group->group_type != AUTOSA_PE_GROUP)\n  {\n    p = isl_printer_print_str(p, \"fifo_\");\n  }\n  p = isl_printer_print_str(p, group->array->name);\n  if (group->group_type == AUTOSA_IO_GROUP)\n  {\n    if (group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  }\n  else if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, \"drain\");\n  }\n\n  /* cur_data_pack */\n  p = isl_printer_print_str(p, \".\");\n  p = isl_printer_print_int(p, group->n_lane);\n\n  /* next_data_pack */\n  p = isl_printer_print_str(p, \".1\");\n\n  return p;\n}\n\n/* Print the io transfer statement prefix in the format of:\n * in/out_trans[_dram]/[_dram_serialize]/[_boundary]/[_reduce_[op]].\n * [in_fifo_name].[out_fifo_name].[is_buffer].[cur_pack_lane].[nxt_pack_lane].\n * [coalesce_depth].[coalesce_bound].[if_branch_depth]\n */\nstatic __isl_give isl_printer *print_io_trans_stmt_prefix(\n  __isl_take isl_printer *p, \n  int read, int to_mem, int serialize, int boundary, int reduce,\n  char *reduce_op,\n  int in_local, int out_local,\n  int is_buffer,\n  char *fifo_suffix, int n_lane) \n{\n  /* io_trans_type */\n  p = isl_printer_print_str(p, read ? 
\"in_trans\" : \"out_trans\");\n  if (to_mem) {\n    p = isl_printer_print_str(p, \"_dram\");\n    if (serialize)\n      p = isl_printer_print_str(p, \"_serialize\");\n  }\n  if (boundary)\n    p = isl_printer_print_str(p, \"_boundary\");\n  if (reduce) {\n    p = isl_printer_print_str(p, \"_reduce_\");\n    p = isl_printer_print_str(p, reduce_op);\n  }\n\n  /* in_fifo_name */\n  p = isl_printer_print_str(p, \".\");\n  p = isl_printer_print_str(p, fifo_suffix);\n  if (in_local)\n    p = isl_printer_print_str(p, \"_local\");\n\n  /* out_fifo_name */\n  p = isl_printer_print_str(p, \".\");\n  p = isl_printer_print_str(p, fifo_suffix);\n  if (out_local)\n    p = isl_printer_print_str(p, \"_local\");  \n\n  /* is_buffer */\n  p = isl_printer_print_str(p, is_buffer == 0 ? \".0\" : \".1\");\n\n  /* cur_pack_lane */\n  p = isl_printer_print_str(p, \".\");\n  p = isl_printer_print_int(p, n_lane);  \n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_trans_stmt_coalesce(\n    __isl_take isl_printer *p,\n    __isl_keep isl_schedule_node *node,\n    struct autosa_io_buffer *buf,\n    int *coalesce_bound,\n    int n_lane\n    ) \n{\n  int coalesce_depth;\n  isl_val *coalesce_bound_val;\n  \n  coalesce_depth = isl_schedule_node_get_schedule_depth(node) + buf->tile->n - 1;\n  /* If the host serialization is enabled, we extend the coalesce bound to the \n   * entire buffer. 
Otherwise, only the last dimension is considered.\n   */    \n  if (buf->serialize) {\n    coalesce_bound_val = isl_val_copy(buf->tile->bound[buf->tile->n - 1].size);  \n    for (int i = 0; i < buf->tile->n - 1; i++) {\n      coalesce_bound_val = isl_val_mul(isl_val_copy(buf->tile->bound[i].size), \n                                       coalesce_bound_val);    \n    }    \n    if (buf->sparse) {\n      *coalesce_bound = isl_val_get_num_si(coalesce_bound_val) / (n_lane * buf->vec_len);      \n    } else {\n      *coalesce_bound = isl_val_get_num_si(coalesce_bound_val) / n_lane;      \n    }    \n    isl_val_free(coalesce_bound_val);\n  } else {\n    coalesce_bound_val = buf->tile->bound[buf->tile->n - 1].size;  \n    *coalesce_bound = isl_val_get_num_si(coalesce_bound_val) / n_lane;        \n  }\n  if (*coalesce_bound <= 1)\n    coalesce_depth = -1;\n\n  p = isl_printer_print_str(p, \".\");\n  p = isl_printer_print_int(p, coalesce_depth);\n  p = isl_printer_print_str(p, \".\");\n  p = isl_printer_print_int(p, *coalesce_bound);\n\n  return p;\n}\n\nstatic __isl_give isl_union_set *compute_io_group_access_domain(\n  __isl_keep isl_schedule_node *node,\n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  int read\n){\n  isl_union_map *group_access;\n  isl_union_set *group_domain;\n  isl_union_map *prefix;\n  isl_schedule_node *node_tmp;\n\n  node_tmp = isl_schedule_node_copy(node);\n  node_tmp = autosa_tree_move_up_to_kernel(node_tmp);\n  group_access = autosa_io_group_access_relation(group, kernel, read, !read);    \n  if (kernel->array_part_w > 0) {\n    /* Remove the local accesses below the array level. 
*/\n    node_tmp = autosa_tree_move_down_to_array(node_tmp, kernel->core);\n    prefix = isl_schedule_node_get_prefix_schedule_relation(node_tmp);\n    prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                              isl_union_pw_multi_aff_copy(kernel->contraction));\n    if (group->local_array->array_type == AUTOSA_INT_ARRAY)\n      group_access = remove_local_accesses_group_flow(kernel, group, group_access, prefix, read);  \n    isl_union_map_free(prefix);\n  }\n  isl_schedule_node_free(node_tmp);\n\n  group_domain = isl_union_map_domain(group_access);\n  group_domain = isl_union_set_coalesce(group_domain);\n\n  return group_domain;  \n}\n\n/* Compute the iteration domain used by the io_group and add the \n * domain as a filter at the top of the schedule tree.\n */\nstatic __isl_give isl_schedule_node *insert_io_group_access_domain(\n  __isl_take isl_schedule_node *node, \n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  int read)\n{\n  isl_union_set *group_domain;\n  group_domain = compute_io_group_access_domain(node, group, kernel, read);  \n  node = isl_schedule_node_insert_filter(node, group_domain);\n  return node;\n}\n\nstatic __isl_give isl_union_set *compute_io_group_access_domain_local_reduce(\n  __isl_keep isl_schedule_node *node,\n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  int read, int io_group, int drain_group)\n{\n  isl_union_map *group_access;\n  isl_union_set *group_domain;\n  isl_union_map *prefix;\n  isl_schedule_node *node_tmp;\n\n  node_tmp = isl_schedule_node_copy(node);\n  group_access = isl_union_map_empty(isl_map_get_space(group->access));\n\n  if (io_group) {\n    struct autosa_array_ref_group *cur_group = group;\n    group_access = isl_union_map_union(group_access,\n                                       autosa_io_group_access_relation(cur_group, kernel, read, !read));  \n    /* Remove the local accesses below the 
array level. */  \n    node_tmp = autosa_tree_move_up_to_kernel(node_tmp);  \n    node_tmp = autosa_tree_move_down_to_array(node_tmp, kernel->core);\n    prefix = isl_schedule_node_get_prefix_schedule_relation(node_tmp);\n    prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                              isl_union_pw_multi_aff_copy(kernel->contraction));\n    if (group->local_array->array_type == AUTOSA_INT_ARRAY)\n      group_access = remove_local_accesses_group_flow(kernel, cur_group, group_access, prefix, read);  \n    isl_union_map_free(prefix);                                                                \n  }\n  if (drain_group) {\n    struct autosa_array_ref_group *cur_group = group->attached_drain_group;\n    group_access = isl_union_map_union(group_access,\n                                       autosa_io_group_access_relation(cur_group, kernel, read, !read));\n  }\n  isl_schedule_node_free(node_tmp);\n\n  group_domain = isl_union_map_domain(group_access);\n  group_domain = isl_union_set_coalesce(group_domain);\n\n  return group_domain;  \n}\n\n/* Compute the iteration domain used by the io_group and add the \n * domain as a filter at the top of the schedule tree.\n * If io_group is one, consider io_group domain.\n * If drain_group is one, consider the attached drain group domain.\n */\nstatic __isl_give isl_schedule_node *insert_io_group_access_domain_local_reduce(\n  __isl_take isl_schedule_node *node, \n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  int read, int io_group, int drain_group)\n{\n  isl_union_set *group_domain;\n  group_domain = compute_io_group_access_domain_local_reduce(node, group, kernel, read, io_group, drain_group);\n  node = isl_schedule_node_insert_filter(node, group_domain);  \n  return node;\n}\n\n/* Insert a filter node that filters the valid access domain of the current\n * io group. 
The \"node\" should point to the \"kernel\" mark, and will be returned \n * at the \"kernel\" mark.\n */\n__isl_give isl_schedule_node *insert_io_group_domain(\n  __isl_take isl_schedule_node *node, \n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  struct autosa_gen *gen,\n  int read)\n{\n  node = isl_schedule_node_child(node, 0); // context\n  if (gen->options->autosa->local_reduce && group->attached_drain_group) \n    node = insert_io_group_access_domain_local_reduce(node, group, kernel, read, 0, 1);\n  else\n    node = insert_io_group_access_domain(node, group, kernel, read);\n  node = autosa_tree_move_up_to_kernel(node);\n\n  return node;\n}\n\nstatic __isl_give isl_union_set *compute_io_group_domain(\n  __isl_keep isl_schedule_node *node, \n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  struct autosa_gen *gen,\n  int read)\n{\n  isl_union_set *domain;\n  node = autosa_tree_move_down_to_kernel(isl_schedule_node_copy(node));\n  if (gen->options->autosa->local_reduce && group->attached_drain_group)\n    domain = compute_io_group_access_domain_local_reduce(node, group, kernel, read, 1, 1);\n  else\n    domain = compute_io_group_access_domain(node, group, kernel, read);\n  isl_schedule_node_free(node);\n\n  return domain;\n}\n\n/* Compute the minimal group domain to filter the elements at the io_level \"level\".\n * The original group domain is first inserted as a filter at the io_level \"level\".\n * Then, we compute the prefix schedule down to the io_level \"level\".\n * Next, we derive the range of the filtered prefix schedule, and compute the \n * domain elements whose prefix schedule falls in this range, using the\n * reverse of the unfiltered prefix schedule.\n */\nstatic __isl_give isl_union_set *compute_io_group_domain_at_level(\n  __isl_keep isl_union_set *group_domain,\n  __isl_keep isl_schedule_node *node,\n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  int level\n){\n  isl_union_map *prefix, *filter_prefix;\n  isl_union_set *filter_range, *filter_domain;\n  \n  node =
autosa_tree_move_down_to_io_mark(isl_schedule_node_copy(node), kernel->core, level);\n  prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n\n  node = isl_schedule_node_insert_filter(node, isl_union_set_copy(group_domain));\n  node = isl_schedule_node_child(node, 0);\n  filter_prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n  isl_schedule_node_free(node);\n  filter_range = isl_union_map_range(filter_prefix);\n  prefix = isl_union_map_reverse(prefix);\n  filter_domain = isl_union_set_apply(filter_range, prefix);\n\n  return filter_domain;\n}\n\n/* Extend the group domain so that the domain sets include elements that are\n * lexicographically less than or equal to the IO band at the io_level \"level\".\n */\nstatic __isl_give isl_union_set *extend_io_group_domain(\n  __isl_take isl_union_set *group_domain,\n  __isl_keep isl_schedule_node *node,\n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  int level\n){\n//#ifdef _DEBUG\n//  DBGUSET(stdout, group_domain, isl_schedule_node_get_ctx(node));\n//#endif\n  isl_union_map *prefix;\n  isl_set *group_range, *all_range;\n  isl_map *ge;\n\n  /* Get the entire range. */\n  node = autosa_tree_move_down_to_io_mark(isl_schedule_node_copy(node), kernel->core, level);  \n  prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n  all_range = isl_set_from_union_set(isl_union_map_range(isl_union_map_copy(prefix)));\n\n  //node = isl_schedule_node_insert_filter(node, isl_union_set_copy(group_domain));\n  //node = isl_schedule_node_child(node, 0);\n  //prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n  isl_schedule_node_free(node);\n  group_range = isl_set_from_union_set(isl_union_set_apply(group_domain, isl_union_map_copy(prefix)));\n//#ifdef _DEBUG\n//  DBGSET(stdout, group_range, kernel->ctx);\n//#endif\n  ge = isl_map_lex_ge(isl_set_get_space(group_range));\n  /* Equate all the dimensions except the last one. */\n  for (int i = 0; i < isl_set_dim(group_range,
isl_dim_set) - 1; i++) {\n    ge = isl_map_equate(ge, isl_dim_in, i, isl_dim_out, i);\n  }\n  ge = isl_map_intersect_domain(ge, isl_set_copy(all_range));\n  ge = isl_map_intersect_range(ge, all_range);\n//#ifdef _DEBUG\n//  DBGMAP(stdout, ge, kernel->ctx);\n//#endif\n  group_range = isl_set_apply(group_range, ge);\n  group_range = isl_set_coalesce(group_range);\n//#ifdef _DEBUG\n//  DBGSET(stdout, group_range, kernel->ctx);\n//#endif  \n  prefix = isl_union_map_reverse(prefix);\n  group_domain = isl_union_set_apply(isl_union_set_from_set(group_range), prefix);\n\n  return group_domain;\n} \n\nstatic __isl_give isl_schedule_node *insert_io_stmts_acc(\n  __isl_take isl_schedule_node *node,\n  int nxt_data_pack,\n  __isl_take isl_printer *p,\n  struct autosa_kernel *kernel, \n  struct autosa_array_ref_group *group,\n  struct autosa_io_buffer *buf, /* Local buffer */\n  int read, int is_buffer, \n  struct autosa_hw_module *module,\n  int module_type\n)\n{\n  char *stmt_name;\n\n  p = isl_printer_print_str(p, \".\");\n  p = isl_printer_print_int(p, nxt_data_pack);\n  stmt_name = isl_printer_get_str(p);\n  isl_printer_free(p);\n\n  int insert_hls_dep = is_buffer && !read && \n                       buf->n_lane != nxt_data_pack && \n                       kernel->options->autosa->insert_hls_dependence;\n\n  node = add_io_copies_stmt_acc(kernel, group, node,\n                                buf->tile, nxt_data_pack, read, stmt_name, read ? 
1 : 0,\n                                insert_hls_dep, module, module_type);\n\n  return node;\n}\n\nstatic __isl_give isl_schedule_node *insert_io_stmts_tile(\n    __isl_take isl_schedule_node *node,    \n    int nxt_data_pack,\n    __isl_take isl_printer *p,\n    struct autosa_kernel *kernel, \n    struct autosa_array_ref_group *group,\n    //struct autosa_io_buffer *buf,\n    struct autosa_io_buffer *local_buffer,      /* local buffer */\n    struct autosa_io_buffer *copy_buffer,       /* buffer to be transferred */\n    int read, int is_buffer,\n    struct autosa_hw_module *module,\n    int cut, /* If to cut the sub tree */\n    int module_type,\n    int if_depth /* If branch sched depth */\n)\n{\n  char *stmt_name;\n  int coalesce_bound;\n\n  p = isl_printer_print_str(p, \".\");\n  p = isl_printer_print_int(p, nxt_data_pack);  \n  \n  p = print_trans_stmt_coalesce(p, node, copy_buffer, &coalesce_bound, nxt_data_pack);   \n  module->coalesce_bound = coalesce_bound;\n  \n  if (if_depth != -1) {\n    p = isl_printer_print_str(p, \".\");\n    p = isl_printer_print_int(p, if_depth);\n  }\n\n  stmt_name = isl_printer_get_str(p);\n  isl_printer_free(p);\n\n  int insert_hls_dep = coalesce_bound > 1 && \n                       copy_buffer->n_lane != nxt_data_pack && \n                       kernel->options->autosa->insert_hls_dependence;  \n\n  node = add_io_copies_stmt_tile(kernel, group, node,\n                                 local_buffer->tile? local_buffer->tile : NULL, copy_buffer->tile, \n                                 nxt_data_pack,\n                                 //local_buffer? local_buffer->n_lane : nxt_data_pack,\n                                 read, stmt_name, read ? 1 : 0,\n                                 //nxt_data_pack, read, stmt_name, read ? 
1 : 0,\n                                 is_buffer & 0,\n                                 insert_hls_dep,\n                                 module->is_serialized,\n                                 module, module_type, copy_buffer->tuning_tile);\n\n  //DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n  \n  if (cut) {\n    node = isl_schedule_node_cut(node);\n    /* Insert empty filter. */\n    isl_union_set *empty_filter = isl_union_set_from_set(isl_set_empty(\n          isl_set_get_space(kernel->context)));\n    node = isl_schedule_node_insert_filter(node, empty_filter);\n  }  \n\n  return node;\n}\n\nstatic __isl_give isl_schedule_node *insert_filter_trans_stmts(\n  __isl_take isl_schedule_node *node,\n  isl_id_list *io_ids,\n  int io_id_level,\n  int io_level,\n  int read,\n  struct autosa_io_buffer *buf,\n  struct autosa_hw_module *module,\n  struct autosa_kernel *kernel,\n  struct autosa_gen *gen,\n  int boundary, int is_lower,\n  int is_buffer,\n  char *fifo_suffix,\n  struct autosa_array_ref_group *group,\n  __isl_keep isl_union_set *group_core,\n  int module_type\n)\n{\n  isl_id_list *ids;\n  isl_union_set *eq_filter, *neq_filter;\n  isl_ctx *ctx;\n  isl_printer *p;\n  int upper_io_level;\n  int lower_if = gen->options->autosa->lower_if_branch;\n  int if_depth = -1;\n  \n  ctx = isl_schedule_node_get_ctx(node);\n  if (io_id_level < 0) {\n    /* This is the highest-level module that also connects to the DRAM.\n     * Filter node is not required, since all data belongs to this module.\n     */\n    if (boundary == 0) {\n      return isl_schedule_node_free(node);\n    } else {\n      node = autosa_tree_move_down_to_io_mark(node, group_core, buf->level);\n      node = isl_schedule_node_child(node, 0);\n      goto INSERT_STMT;\n    }\n  }\n\n  if (lower_if) {\n    /* Lower the if branch inside the user statement. 
*/\n    node = autosa_tree_move_down_to_io_mark(node, group_core, io_level);\n    if_depth = isl_schedule_node_get_schedule_depth(node) -  1;\n\n    node = autosa_tree_move_down_to_io_mark(node, group_core, buf->level);\n    node = isl_schedule_node_child(node, 0);\n    goto INSERT_STMT;\n  }\n\n  node = autosa_tree_move_down_to_io_mark(node, group_core, io_level);\n  node = isl_schedule_node_parent(node);\n  ids = isl_id_list_from_id(isl_id_list_get_id(io_ids, io_id_level));\n  eq_filter = set_schedule_eq(node, ids);  \n  isl_id_list_free(ids);\n  \n  upper_io_level = io_level + 1;\n  node = autosa_tree_move_down_to_io_mark(node, group_core, io_level);  \n  node = isl_schedule_node_child(node, 0);  \n  node = isl_schedule_node_order_before(node, eq_filter); // point to the second tree.    \n\n  /* Pass the data not filtered */  \n  if (boundary) {\n    isl_union_set *empty_filter = isl_union_set_from_set(isl_set_empty(isl_set_get_space(kernel->context)));\n    node = isl_schedule_node_cut(node);\n    node = isl_schedule_node_insert_filter(node, empty_filter);\n  } else {\n    if (io_level != buf->level) {\n      node = autosa_tree_move_down_to_io_mark(node, group_core, buf->level);\n      node = isl_schedule_node_child(node, 0);\n    }\n    p = isl_printer_to_str(ctx);\n    p = print_io_trans_stmt_prefix(\n          p, read, module->to_mem, gen->options->autosa->host_serialize, boundary, 0, NULL,\n          0, 0, 0, fifo_suffix, buf->n_lane);    \n    if (!buf->tile) {\n      node = insert_io_stmts_acc(node, buf->n_lane, p, kernel, group, buf, read, is_buffer, module, module_type);   \n    } else {\n      node = insert_io_stmts_tile(node, buf->n_lane, p, kernel, group, buf, buf, read, is_buffer, module, 1, module_type, -1);\n    }\n  }\n\n  /* Keep the data filtered */\n  node = autosa_tree_move_up_to_kernel(node);  \n  node = autosa_tree_move_down_to_io_mark(node, group_core, io_level);\n  node = isl_schedule_node_child(node, 0); // sequence\n  node = 
isl_schedule_node_child(node, 0); // filter  \n  node = isl_schedule_node_child(node, 0); // filter  \n\n  if (io_level != buf->level) {\n    node = autosa_tree_move_down_to_io_mark(node, group_core, buf->level);\n    node = isl_schedule_node_child(node, 0);\n  }  \n\nINSERT_STMT:    \n  p = isl_printer_to_str(ctx);\n  p = print_io_trans_stmt_prefix(\n        p, read, module->to_mem, gen->options->autosa->host_serialize, boundary, 0, NULL,\n        !read && is_lower ? 1 : 0, read && is_lower? 1 : 0, is_buffer, fifo_suffix, buf->n_lane);\n\n  if (!buf->tile)  {\n    node = insert_io_stmts_acc(node, buf->n_lane, p, kernel, group, buf, read, is_buffer, module, module_type);   \n  } else {\n    node = insert_io_stmts_tile(node, buf->n_lane, p, kernel, group, buf, buf, read, is_buffer, module, 1, module_type, if_depth);\n  }  \n\n  return node;\n}\n\n/* The node points to the \"kernel\" mark.\n */\nstatic int get_local_reduce_sched_depth(\n  __isl_take isl_schedule_node *node,\n  struct autosa_kernel *kernel)\n{\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  if (kernel->array_part_w > 0) {\n    int pos = 0;\n    int n;\n    node = isl_schedule_node_parent(node);\n    n = isl_schedule_node_band_n_member(node);\n    for (pos = n - 1; pos >= 0; pos--)\n    {\n      if (isl_schedule_node_band_member_get_coincident(node, pos))\n        break;\n    }\n    if (pos == n - 1) {\n      node = isl_schedule_node_child(node, 0);\n    } else {\n      node = isl_schedule_node_band_split(node, pos + 1);\n      node = isl_schedule_node_child(node, 0);      \n    }\n  }\n\n  int depth = isl_schedule_node_get_schedule_depth(node);\n  isl_schedule_node_free(node);\n\n  return depth;\n}\n\n/* Generate the inter_trans module for the I/O group.\n * We will add data transfer statements into the schedule tree, \n * filters that restrain the space loops to the current module,\n * and add the module and function type mark above the tree.\n */\nstatic __isl_give isl_schedule 
*generate_io_module_inter_trans(\n  __isl_keep isl_schedule *sched, struct autosa_hw_module *module,\n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel, struct autosa_gen *gen,\n  int io_level, int space_dim, int read, int boundary)\n{\n  isl_schedule *new_sched;\n  isl_ctx *ctx;\n  isl_printer *p;  \n  int n_io_ids;\n  isl_id_list *io_ids;\n  isl_id *id;\n  char *fifo_suffix, *buf_suffix;\n  isl_union_set *empty_filter = NULL;  \n  char *stmt_name;\n  struct autosa_io_buffer *buf = NULL;  \n  isl_schedule_node *node;\n  int upper_io_level = io_level + 1;\n  int is_filter = 1;\n  int is_buffer = 1;\n  int i;\n  isl_union_set *group_core = NULL;\n\n  if (io_level > space_dim && boundary == 0) {\n    return NULL;\n  }\n\n  new_sched = isl_schedule_dup(sched);\n  //DBGSCHD(stdout, new_sched, gen->ctx);\n  node = isl_schedule_get_root(new_sched);\n  isl_schedule_free(new_sched);\n  ctx = isl_schedule_node_get_ctx(node);\n  \n  /* Compute the union of domains of all the array references in the group. */\n  node = autosa_tree_move_down_to_kernel(node);\n  node = isl_schedule_node_child(node, 0); // context\n  node = isl_schedule_node_child(node, 0);\n  if (gen->options->autosa->local_reduce && group->attached_drain_group)\n    node = insert_io_group_access_domain_local_reduce(node, group, kernel, read, 0, 1);\n  else\n    node = insert_io_group_access_domain(node, group, kernel, read);\n  node = isl_schedule_node_child(node, 0);\n  group_core = isl_union_set_universe(isl_schedule_node_get_domain(node));\n  node = autosa_tree_move_up_to_kernel(node);\n  \n  /* Add the filters. 
*/\n  n_io_ids = space_dim - io_level + 1;\n  io_ids = ppcg_scop_generate_names(gen->prog->scop, n_io_ids, \"p\");\n  n_io_ids = 0;  \n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = add_io_ids_filter(node, io_ids, io_level, space_dim - io_level + 1, is_filter, 0, read);\n  node = autosa_tree_move_up_to_kernel(node);\n  //DBGSCHDNODE(stdout, node, ctx);\n\n  /* Locate the buffer. */\n  for (i = io_level; i >= 1; i--)\n  {\n    buf = group->io_buffers[i - 1];\n    if (buf->tile != NULL)\n      break;\n  }\n  if (is_buffer)\n  {\n    if (i != io_level)\n    {\n      /* IO buffer is optimized out. */\n      is_buffer = 0;\n    }\n  }\n\n  if (buf->tile && buf->hoist_depth != -1) {\n    /* This buffer has been hoisted. */    \n    node = isl_schedule_node_child(node, 0); // context\n    node = isl_schedule_node_child(node, 0); // last inserted filter\n    node = isl_schedule_node_child(node, 0);\n    node = isl_schedule_node_insert_filter(node, isl_union_set_copy(buf->hoist_domain));\n    node = isl_schedule_node_child(node, 0);\n    isl_union_set_free(group_core);\n    group_core = isl_union_set_universe(isl_schedule_node_get_domain(node));    \n    node = autosa_tree_move_up_to_kernel(node);\n  }\n  \n  init_suffix(module, group, &fifo_suffix, &buf_suffix);\n  node = insert_filter_trans_stmts(node, io_ids, space_dim - io_level, io_level, read,\n      buf, module, kernel, gen, boundary, 0, is_buffer, fifo_suffix, group, group_core, 2);\n\n  free(fifo_suffix);\n  free(buf_suffix);      \n  isl_id_list_free(io_ids);\n  if (!node) {\n    isl_union_set_free(group_core);\n    return NULL;  \n  }\n\n  module->data_pack_inter = buf->n_lane;\n  /* Insert the \"io_module.inter_trans\" function mark. 
*/\n  node = autosa_tree_move_up_to_kernel(node);  \n  if (gen->options->autosa->local_reduce && group->attached_drain_group) {\n    node = autosa_tree_move_down_to_depth(\n              node, \n              get_local_reduce_sched_depth(isl_schedule_node_copy(node), kernel), \n              kernel->core);    \n  } else {\n    if (io_level > space_dim) {\n      node = autosa_tree_move_down_to_array(node, kernel->core);\n      node = isl_schedule_node_child(node, 0);  \n    } else {      \n      node = autosa_tree_move_down_to_io_mark(node, group_core, io_level);\n      node = isl_schedule_node_parent(node);\n      node = isl_schedule_node_parent(node);\n    }    \n  }\n  \n  if (gen->options->target == AUTOSA_TARGET_CATAPULT_HLS_C) {\n    id = isl_id_alloc(ctx, \"synth\", NULL);\n    node = isl_schedule_node_insert_mark(node, id);\n    node = autosa_tree_move_up_to_kernel(node);\n    node = isl_schedule_node_child(node, 0);\n  }\n  \n  id = isl_id_alloc(ctx, \"io_module.inter_trans\", NULL);\n  node = isl_schedule_node_insert_mark(node, id);\n\n  /* Add the module mark. */\n  id = isl_id_alloc(ctx, \"module\", module);\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);\n\n  new_sched = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);  \n  isl_union_set_free(group_core);\n\n  return new_sched;\n}\n\n/* The \"node\" points to the kernel mark. 
\n * This function should be called before inserting module ids into the schedule.\n */\nstatic __isl_give isl_schedule_node *insert_io_group_guard(\n  __isl_take isl_schedule_node *node, \n  struct autosa_gen *gen,\n  struct autosa_kernel *kernel,\n  int n_io_ids)\n{\n  isl_union_set *domain;\n  isl_set *guard;\n  isl_schedule_node *node_tmp;\n  isl_id_list *io_ids;\n  \n  node_tmp = isl_schedule_node_copy(node);\n  io_ids = ppcg_scop_generate_names(gen->prog->scop, n_io_ids, \"p\");\n  node_tmp = add_io_ids_filter(node_tmp, io_ids, 1, n_io_ids, 0, 0, 0);  \n  domain = isl_schedule_node_get_domain(node_tmp);\n  guard = isl_union_set_params(domain);\n  guard = isl_set_from_params(guard);\n  isl_schedule_node_free(node_tmp);\n  isl_id_list_free(io_ids);\n  \n//#ifdef _DEBUG\n//  DBGSET(stdout, guard, isl_set_get_ctx(guard));\n//#endif\n\n  //node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0); // context;\n  node = isl_schedule_node_child(node, 0); // filter;\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_guard(node, guard);\n  node = autosa_tree_move_up_to_kernel(node);\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n\n  return node;\n}\n\nstatic __isl_give isl_set *get_io_group_guard(\n  __isl_keep isl_schedule_node *node,\n  struct autosa_gen *gen,\n  struct autosa_kernel *kernel,\n  int n_io_ids)\n{\n  isl_union_set *domain;\n  isl_set *guard;\n  isl_schedule_node *node_tmp;\n  isl_id_list *io_ids;\n  int depth;\n  \n  node_tmp = isl_schedule_node_copy(node);\n  io_ids = ppcg_scop_generate_names(gen->prog->scop, n_io_ids, \"p\");  \n  node_tmp = add_io_ids_filter(node_tmp, io_ids, 1, n_io_ids, 0, 0, 0);  \n  isl_id_list_free(io_ids);\n\n  domain = isl_schedule_node_get_domain(node_tmp);\n  guard = isl_union_set_params(domain);\n  guard = isl_set_from_params(guard);\n  isl_schedule_node_free(node_tmp);\n  \n  return guard;\n}\n\n/* Generate the 
intra_trans module for the I/O group.\n * We will add data transfer statements into the schedule tree that \n * transfer data to/from the lower-level modules,\n * filters that restrain the space loops to the current module,\n * and add the module and function type mark above the tree.\n */\nstatic __isl_give isl_schedule *generate_io_module_intra_trans(\n  __isl_keep isl_schedule *sched, struct autosa_hw_module *module,\n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel, struct autosa_gen *gen,\n  int io_level, int space_dim, int read, int is_buffer)\n{\n  isl_ctx *ctx;\n  isl_printer *p;  \n  int n_io_ids;\n  isl_id_list *io_ids;  \n  isl_id *id;    \n  char *fifo_suffix, *buf_suffix;\n  isl_union_set *empty_filter = NULL;    \n  char *stmt_name;\n  struct autosa_io_buffer *buf = NULL;    \n  isl_schedule *new_sched;\n  isl_schedule_node *node;  \n  int i;\n  isl_set *guard;\n  isl_schedule_node *node_tmp;\n  isl_union_set *group_core = NULL;\n  isl_union_set *group_domain;\n\n  new_sched = isl_schedule_dup(sched);\n  node = isl_schedule_get_root(new_sched);  \n  node = autosa_tree_move_down_to_kernel(node);\n  isl_schedule_free(new_sched);\n  ctx = isl_schedule_node_get_ctx(node);\n  n_io_ids = space_dim - io_level + 1;\n  io_ids = ppcg_scop_generate_names(gen->prog->scop, n_io_ids, \"p\");  \n  int upper_io_level = io_level + 1;\n\n  /* Insert the group domain. */   \n  node = isl_schedule_node_child(node, 0); // context\n  node = isl_schedule_node_child(node, 0);\n  if (gen->options->autosa->local_reduce && group->attached_drain_group)\n    node = insert_io_group_access_domain_local_reduce(node, group, kernel, read, 1, 1);\n  else\n    node = insert_io_group_access_domain(node, group, kernel, read);  \n  node = isl_schedule_node_child(node, 0);\n  group_core = isl_union_set_universe(isl_schedule_node_get_domain(node)); \n  node = autosa_tree_move_up_to_kernel(node);\n\n  /* Add the filters. 
*/\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = add_io_ids_filter(node, io_ids, io_level, space_dim - io_level + 1, 0, module->to_pe, read);\n  node = autosa_tree_move_up_to_kernel(node);  \n\n  /* Add the data transfer statements. */\n  init_suffix(module, group, &fifo_suffix, &buf_suffix);\n\n  /* Locate the current buffer. */\n  for (i = io_level; i >= 1; i--)\n  {\n    buf = group->io_buffers[i - 1];\n    if (buf->tile != NULL)\n      break;\n  }  \n  if (is_buffer)\n  {\n    if (i != io_level)\n    {\n      /* IO buffer is optimized out. */\n      is_buffer = 0;\n    }\n  }\n\n  /* Insert the extra transfer statement. */\n  p = isl_printer_to_str(ctx);\n  p = print_io_trans_stmt_prefix(p, !read, 0, 0, 0, \n                                 gen->options->autosa->local_reduce && group->attached_drain_group,\n                                 gen->options->autosa->reduce_op,\n                                 !read, read, is_buffer, fifo_suffix, buf->n_lane);\n\n  /* Locate the next buffer after the current buffer. */\n  int cur_level = buf->level;\n  struct autosa_io_buffer *cur_buf = buf;\n  for (int i = cur_level - 1; i >= 1; i--)\n  {\n    buf = group->io_buffers[i - 1];\n    if (buf->tile != NULL)\n      break;\n  }\n\n  if (cur_level == 1 || !buf->tile)\n  {\n    node = insert_io_stmts_acc(node, group->n_lane, p, kernel, group, cur_buf, read, is_buffer, module, 1);\n    module->data_pack_intra = group->n_lane;                                  \n  }\n  else\n  {\n    /* Move the schedule node to the level of the next buffer. */\n    node = autosa_tree_move_down_to_io_mark(node, group_core, buf->level);\n    node = isl_schedule_node_child(node, 0);    \n    node = insert_io_stmts_tile(\n                node, buf->n_lane, p, kernel, group, \n                cur_buf, buf, !read, is_buffer, module, 1, 1, -1);\n    module->data_pack_intra = buf->n_lane;    \n  }\n\n  free(fifo_suffix);\n  free(buf_suffix);\n\n  /* Insert the function mark. 
*/    \n  node = autosa_tree_move_up_to_kernel(node);\n  \n  if (gen->options->autosa->local_reduce && group->attached_drain_group) {\n    node = autosa_tree_move_down_to_depth(\n              node, \n              get_local_reduce_sched_depth(isl_schedule_node_copy(node), kernel), \n              kernel->core);    \n  } else {\n    if (io_level > space_dim) {\n      node = autosa_tree_move_down_to_array(node, kernel->core);      \n      node = isl_schedule_node_child(node, 0);  \n    } else {\n      if (cur_buf->tile && cur_buf->hoist_depth != -1) {\n        /* This buffer has been hoisted. */        \n        node = autosa_tree_move_down_to_depth(node, cur_buf->hoist_depth, kernel->core);\n      } else {\n        node = autosa_tree_move_down_to_io_mark(node, group_core, io_level);\n        node = isl_schedule_node_parent(node);\n        node = isl_schedule_node_parent(node);\n      }\n    }    \n  }  \n\n  if (gen->options->target == AUTOSA_TARGET_CATAPULT_HLS_C) {\n    id = isl_id_alloc(ctx, \"synth\", NULL);\n    node = isl_schedule_node_insert_mark(node, id);\n    node = autosa_tree_move_up_to_kernel(node);\n    node = isl_schedule_node_child(node, 0);\n  }\n\n  id = isl_id_alloc(ctx, \"io_module.intra_trans\", NULL);\n  if (kernel->array_part_w == 0 && isl_schedule_node_get_schedule_depth(node) < group->io_level) {\n    node = autosa_tree_move_up_to_kernel(node);\n    node = isl_schedule_node_child(node, 0);\n    node = isl_schedule_node_insert_mark(node, id);  \n  } else {\n    node = isl_schedule_node_insert_mark(node, id);  \n  }  \n\n  /* Add the module mark. 
*/\n  id = isl_id_alloc(ctx, \"module\", module);\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);\n\n  /* Make the node atomic */\n  node = autosa_tree_move_down_to_pe(node, kernel->core);\n  node = autosa_atomic_ancestors(node);\n  new_sched = isl_schedule_node_get_schedule(node);\n\n  isl_schedule_node_free(node);\n  isl_id_list_free(io_ids);\n  isl_union_set_free(group_core);\n\n  return new_sched;\n}\n\n/* Create the local buffer variable for the \"group\".\n * Specifically, if \"tile\" is NULL, a register is created.\n * Otherwise, a local array is created. \n * We will also update the last dimension of the array based on the \n * data packing factor \"n_lane\".\n */\nstatic void create_io_module_var(isl_ctx *ctx,\n                                 struct autosa_array_ref_group *group,\n                                 struct autosa_array_tile *tile, struct autosa_kernel_var *var, int n_lane)\n{\n  isl_printer *p;\n\n  var->array = group->array;\n  var->type = autosa_array_ref_group_type(group);\n  var->n_lane = n_lane;\n  var->n_part = 1;\n\n  p = isl_printer_to_str(ctx);\n  p = autosa_array_ref_group_print_name(group, p);\n  var->name = isl_printer_get_str(p);\n  isl_printer_free(p);\n\n  if (tile == NULL)\n  {\n    /* Create a register. 
*/\n    var->size = isl_vec_alloc(ctx, 1);\n    var->size = isl_vec_set_element_si(var->size, 0, 1);\n  }\n  else\n  {\n    var->size = isl_vec_alloc(ctx, group->array->n_index);\n    for (int i = 0; i < group->array->n_index; ++i)\n    {\n      isl_val *size;\n\n      size = isl_val_copy(tile->bound[i].size);\n      if (i == group->array->n_index - 1) {        \n        if (group->local_array->is_sparse) {\n          size = isl_val_div(size, isl_val_int_from_si(ctx, n_lane * group->local_array->vec_len));          \n        } else {\n          if (n_lane > 1)\n            size = isl_val_div(size, isl_val_int_from_si(ctx, n_lane));          \n        }\n      }      \n      var->size = isl_vec_set_element_val(var->size, i, size);\n    }\n  }\n}\n\n/* Create the local buffers inside the I/O modules. */\nstatic isl_stat create_io_module_vars(\n    struct autosa_hw_module *module, struct autosa_kernel *kernel,\n    struct autosa_array_tile *tile, int init_required)\n{\n  module->var = isl_calloc_array(kernel->ctx, struct autosa_kernel_var, 1);\n  if (!module->var)\n    return isl_stat_error;\n  module->n_var = 1;\n  module->var[0].init_required = init_required;\n\n  create_io_module_var(kernel->ctx, module->io_groups[0],\n                       tile, &module->var[0], module->data_pack_inter);\n\n  return isl_stat_ok;\n}\n\n/* Generate the io_module for the outer loops that contain the \n * inter_trans and intra_trans modules.\n */\nstatic __isl_give isl_schedule *generate_io_module_outer(\n    __isl_keep isl_schedule *sched, struct autosa_hw_module *module,\n    struct autosa_array_ref_group *group,\n    struct autosa_kernel *kernel, struct autosa_gen *gen,\n    int io_level, int space_dim, int read, int boundary)\n{\n  isl_ctx *ctx;\n  int n_io_ids;\n  isl_id_list *io_ids;\n  isl_id *id;\n  isl_union_set *empty_filter = NULL;\n  const char *stmt_name1, *stmt_name2, *stmt_name5;  \n  char *stmt_name3, *stmt_name4;\n  isl_schedule_node *node, *graft1, *graft2, *graft3, 
*graft4, *graft5;\n  isl_schedule *new_sched;\n  int upper_io_level;\n  isl_space *space;\n  isl_union_set *domain;\n  struct autosa_io_buffer *buf;\n  isl_union_set *group_core = NULL;\n\n  if (io_level > space_dim && boundary == 0) {\n    return NULL;\n  }\n\n  new_sched = isl_schedule_dup(sched);\n  node = isl_schedule_get_root(new_sched);\n  isl_schedule_free(new_sched);\n  ctx = isl_schedule_node_get_ctx(node);\n  n_io_ids = space_dim - io_level + 1;\n\n  /* Compute the union of domains of all the array references in the group. */\n  node = autosa_tree_move_down_to_kernel(node);\n  node = isl_schedule_node_child(node, 0); // context\n  node = isl_schedule_node_child(node, 0);\n  if (gen->options->autosa->local_reduce && group->attached_drain_group)\n    node = insert_io_group_access_domain_local_reduce(node, group, kernel, read, 1, 1);\n  else\n    node = insert_io_group_access_domain(node, group, kernel, read);\n  node = isl_schedule_node_child(node, 0);\n  group_core = isl_union_set_universe(isl_schedule_node_get_domain(node));\n  node = autosa_tree_move_up_to_kernel(node);\n\n  io_ids = ppcg_scop_generate_names(gen->prog->scop, n_io_ids, \"p\");\n  n_io_ids = 0;\n  \n  if (io_level > space_dim && boundary == 1) {    \n    goto OUTER_INSERT_STMT;\n  }\n\n  upper_io_level = io_level + 1;\n  /* Add the filters. 
*/\n  n_io_ids = 0;\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  while (!isl_schedule_node_is_io_mark(node, upper_io_level))\n  {\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n      isl_id *id;\n      isl_id_list *ids;\n      isl_union_set *uset;\n\n      ids = isl_id_list_from_id(isl_id_list_get_id(io_ids, n_io_ids));\n      uset = set_schedule_eq(node, ids);\n      n_io_ids++;\n      node = isl_schedule_node_insert_filter(node, uset);\n      isl_id_list_free(ids);\n      node = isl_schedule_node_child(node, 0);\n    }\n    node = isl_schedule_node_child(node, 0);\n  }\n  node = autosa_tree_move_up_to_kernel(node);\n\n  /* Locate the buffer */\n  buf = group->io_buffers[io_level - 1];\n  if (buf->tile && buf->hoist_depth != - 1) {\n    /* This buffer has been hoisted. */\n    node = isl_schedule_node_child(node, 0); // context\n    node = isl_schedule_node_child(node, 0); // last inserted filter\n    node = isl_schedule_node_child(node, 0);\n    node = isl_schedule_node_insert_filter(node, isl_union_set_copy(buf->hoist_domain));\n    node = isl_schedule_node_child(node, 0);\n    isl_union_set_free(group_core);\n    group_core = isl_union_set_universe(isl_schedule_node_get_domain(node));    \n    node = autosa_tree_move_up_to_kernel(node);\n  }\n\nOUTER_INSERT_STMT:  \n  if (gen->options->target != AUTOSA_TARGET_CATAPULT_HLS_C) {\n    if (gen->options->autosa->local_reduce && group->attached_drain_group) {\n      node = autosa_tree_move_down_to_depth(\n                node, \n                get_local_reduce_sched_depth(isl_schedule_node_copy(node), kernel), \n                kernel->core);        \n    } else {\n      if (io_level > space_dim && boundary == 1) {\n        node = autosa_tree_move_down_to_array(node, kernel->core);\n        node = isl_schedule_node_child(node, 0);              \n      } else {      \n        node = autosa_tree_move_down_to_io_mark(node, group_core, io_level);\n        node = 
isl_schedule_node_parent(node);      \n      }    \n    }\n  } else {\n    /* Move to the node below the kernel mark. */\n    node = isl_schedule_node_child(node, 0);\n  }\n  isl_union_set_free(group_core);\n\n  /* Add the inter_trans and intra_trans function calls. */  \n  stmt_name1 = boundary == 0 ? \"io_module.inter_trans.0\" : \"io_module.inter_trans.1\";\n  stmt_name2 = \"io_module.intra_trans\";\n  isl_printer *p_str = isl_printer_to_str(ctx);\n  if (boundary == 0)\n    p_str = isl_printer_print_str(p_str, \"io_module.inter_intra.0.\");\n  else\n    p_str = isl_printer_print_str(p_str, \"io_module.inter_intra.1.\");\n  if (module->double_buffer)\n    p_str = isl_printer_print_int(p_str, 1);\n  else\n    p_str = isl_printer_print_int(p_str, 0);\n  stmt_name3 = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  p_str = isl_printer_to_str(ctx);\n  if (boundary == 0)\n    p_str = isl_printer_print_str(p_str, \"io_module.intra_inter.0.\");\n  else\n    p_str = isl_printer_print_str(p_str, \"io_module.intra_inter.1.\");\n  if (module->double_buffer)\n    p_str = isl_printer_print_int(p_str, 1);\n  else\n    p_str = isl_printer_print_int(p_str, 0);\n  stmt_name4 = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  \n  stmt_name5 = \"io_module.state_handle\";  \n  \n  node = isl_schedule_node_cut(node);\n\n  space = isl_space_set_alloc(ctx, 0, 0);\n  space = isl_space_set_tuple_name(space, isl_dim_set, stmt_name1);\n  domain = isl_union_set_from_set(isl_set_universe(space));\n  graft1 = isl_schedule_node_from_domain(domain);\n\n  space = isl_space_set_alloc(ctx, 0, 0);\n  space = isl_space_set_tuple_name(space, isl_dim_set, stmt_name2);\n  domain = isl_union_set_from_set(isl_set_universe(space));\n  graft2 = isl_schedule_node_from_domain(domain);\n\n  space = isl_space_set_alloc(ctx, 0, 0);\n  space = isl_space_set_tuple_name(space, isl_dim_set, stmt_name3);\n  domain = isl_union_set_from_set(isl_set_universe(space));\n  graft3 = 
isl_schedule_node_from_domain(domain);\n\n  space = isl_space_set_alloc(ctx, 0, 0);\n  space = isl_space_set_tuple_name(space, isl_dim_set, stmt_name4);\n  domain = isl_union_set_from_set(isl_set_universe(space));\n  graft4 = isl_schedule_node_from_domain(domain);\n\n  space = isl_space_set_alloc(ctx, 0, 0);\n  space = isl_space_set_tuple_name(space, isl_dim_set, stmt_name5);\n  domain = isl_union_set_from_set(isl_set_universe(space));\n  graft5 = isl_schedule_node_from_domain(domain);\n\n  free(stmt_name3);\n  free(stmt_name4);\n\n  if (read)\n  {\n    node = isl_schedule_node_graft_before(node, isl_schedule_node_copy(graft3));\n  }\n  else\n  {\n    node = isl_schedule_node_graft_before(node, isl_schedule_node_copy(graft4));\n  }\n  if (module->double_buffer && gen->options->target != AUTOSA_TARGET_CATAPULT_HLS_C)\n  {\n    /* Add misc statements for saving and switching states. */\n    node = isl_schedule_node_graft_before(node, isl_schedule_node_copy(graft5));\n  }\n  node = isl_schedule_node_cut(node);\n  /* Insert an empty filter */\n  empty_filter = isl_union_set_from_set(isl_set_empty(\n      isl_set_get_space(kernel->context)));\n  node = isl_schedule_node_insert_filter(node, empty_filter);\n\n  if (module->double_buffer && gen->options->target != AUTOSA_TARGET_CATAPULT_HLS_C)\n  {\n    /* Skip this if tuning_method is 1; it will be considered later in the latency estimation. */\n    if (gen->options->autosa->tuning_method != 1) {\n      /* Add the last function call. 
*/\n      node = autosa_tree_move_up_to_kernel(node);\n      node = isl_schedule_node_child(node, 0);\n      node = isl_schedule_node_child(node, 0);\n      node = isl_schedule_node_child(node, 0);\n      if (read)\n        node = isl_schedule_node_graft_after(node, isl_schedule_node_copy(graft2));\n      else\n        node = isl_schedule_node_graft_after(node, isl_schedule_node_copy(graft1));\n    }\n  }\n  isl_schedule_node_free(graft1);\n  isl_schedule_node_free(graft2);\n  isl_schedule_node_free(graft3);\n  isl_schedule_node_free(graft4);\n  isl_schedule_node_free(graft5);\n\n  /* Add the module mark. */\n  id = isl_id_alloc(ctx, \"module\", module);\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);\n\n  new_sched = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n\n  /* Update module information. */\n  if (!boundary || (io_level > space_dim && boundary == 1))\n  {\n    module->type = (group->group_type == AUTOSA_DRAIN_GROUP) ? DRAIN_MODULE : IO_MODULE;\n    module->level = io_level;\n    module->n_io_group++;\n    module->io_groups = (struct autosa_array_ref_group **)realloc(module->io_groups,\n                                                                  module->n_io_group * sizeof(struct autosa_array_ref_group *));\n    module->io_groups[module->n_io_group - 1] = group;\n    module->inst_ids = io_ids;\n    module->kernel = kernel;\n    module->is_buffer = 1;\n    module->is_filter = 1;\n    /* Create IO module variables. 
*/\n    for (int i = io_level; i >= 1; i--)\n    {\n      buf = group->io_buffers[i - 1];\n      if (buf->tile != NULL)\n        break;\n    }\n    if (gen->options->autosa->local_reduce && group->attached_drain_group) {\n      create_io_module_vars(module, kernel, buf->tile, 1);\n    } else {\n      create_io_module_vars(module, kernel, buf->tile, 0);\n    }\n  }\n  else\n  {\n    isl_id_list_free(io_ids);\n  }\n\n  return new_sched;\n}\n\n/* We will generate five separate schedules for this type of I/O module.\n * Schedule 1: Outer loops containing two marks for the inter_transfer \n *             and intra_transfer modules\n * Schedule 2: Inter_transfer function\n * Schedule 3: Intra_transfer function\n * Schedule 4: The boundary module for outer loops that is the last module\n *             in the chain.\n * Schedule 5: The boundary module for inter_transfer that is the last module\n *             in the chain.\n */\nstatic __isl_give struct autosa_hw_module *generate_filter_buffer_io_module(\n    __isl_take struct autosa_hw_module *module,\n    __isl_keep isl_schedule_node *node,\n    struct autosa_array_ref_group *group, struct autosa_kernel *kernel,\n    struct autosa_gen *gen,\n    int io_level, int space_dim, int is_filter, int is_buffer, int read)\n{\n  isl_schedule *sched;\n  isl_schedule *sched1, *sched2, *sched3;\n  isl_schedule *boundary_sched2, *boundary_sched1;\n\n  sched = isl_schedule_node_get_schedule(node);\n  \n  if (gen->options->autosa->double_buffer && kernel->array_part_w > 0)\n  {\n    isl_union_map *double_buffer_assignment;\n    /* Check if the double buffer assignment exists. */    \n    double_buffer_assignment = extract_sizes_from_str(kernel->ctx, gen->options->autosa->double_buffer_assignment);    \n    if (!double_buffer_assignment) {\n      /* Use the default strategy:\n       * Set all the modules to double buffer except the drain module.       
\n       */      \n      if (group->group_type == AUTOSA_DRAIN_GROUP) {\n        module->double_buffer = 0;\n      } else {\n        module->double_buffer = 1;\n      }      \n    } else {\n      isl_set *tmp;\n      tmp = extract_sa_sizes(double_buffer_assignment, group->local_array->array->name);      \n      \n      if (tmp) {        \n        module->double_buffer = 1;        \n      }\n      isl_set_free(tmp);\n    }\n    isl_union_map_free(double_buffer_assignment);\n  }\n  else\n  {    \n    module->double_buffer = 0;\n  }\n\n  /* Inter transfer function. */\n  sched2 = generate_io_module_inter_trans(sched, module, group, kernel, gen,\n                                          io_level, space_dim, read, 0);\n  if (is_filter)\n  {\n    /* Add the boundary module schedule. */\n    module->boundary = 1;\n    boundary_sched2 = generate_io_module_inter_trans(sched, module, group,\n                                                     kernel, gen, io_level, space_dim, read, 1);\n  }  \n  /* Intra transfer function. */\n  sched3 = generate_io_module_intra_trans(sched, module, group, kernel, gen,\n                                          io_level, space_dim, read, is_buffer);\n  /* Outer loops. */  \n  sched1 = generate_io_module_outer(sched, module, group, kernel, gen,\n                                    io_level, space_dim, read, 0);\n  if (is_filter)\n  {\n    /* Add the boundary module schedule. 
*/    \n    module->boundary = 1;\n    boundary_sched1 = generate_io_module_outer(sched, module, group, kernel, gen,\n                                               io_level, space_dim, read, 1);\n  }\n\n  isl_schedule_free(sched);\n\n  module->sched = NULL;\n  module->outer_sched = sched1;\n  module->inter_sched = sched2;\n  module->intra_sched = sched3;\n  if (gen->options->autosa->tuning_method == 1) {\n    module->tuning_sched = NULL;\n    if (sched2)\n      module->tuning_inter_sched = kernel->tuning_program->generate_tuning_schedule(isl_schedule_dup(sched2));\n    else\n      module->tuning_inter_sched = kernel->tuning_program->generate_tuning_schedule(isl_schedule_dup(boundary_sched2));\n    module->tuning_intra_sched = kernel->tuning_program->generate_tuning_schedule(isl_schedule_dup(sched3));\n    \n    module->tuning_num_sched = NULL;\n    if (sched2)\n      module->tuning_num_inter_sched = kernel->tuning_program->generate_tuning_schedule(isl_schedule_dup(sched2));\n    else\n      module->tuning_num_inter_sched = kernel->tuning_program->generate_tuning_schedule(isl_schedule_dup(boundary_sched2));\n    module->tuning_num_intra_sched = kernel->tuning_program->generate_tuning_schedule(isl_schedule_dup(sched3));\n\n    if (sched1)\n      module->tuning_outer_sched = kernel->tuning_program->generate_tuning_schedule(isl_schedule_dup(sched1));\n    else\n      module->tuning_outer_sched = kernel->tuning_program->generate_tuning_schedule(isl_schedule_dup(boundary_sched1));\n    /* Remove the filter ids */\n    isl_schedule *tuning_sched;\n    if (sched1)\n      tuning_sched = isl_schedule_dup(sched1);\n    else\n      tuning_sched = isl_schedule_dup(boundary_sched1);    \n    isl_schedule_node *root = isl_schedule_get_root(tuning_sched);        \n    if (io_level <= space_dim) {\n      root = autosa_tree_move_down_to_io_mark(root, kernel->core, io_level + 1);      \n      while (root && isl_schedule_node_has_parent(root)) {\n        root = 
isl_schedule_node_parent(root);\n        if (isl_schedule_node_get_type(root) == isl_schedule_node_filter) {\n          root = isl_schedule_node_delete(root);\n        }\n        if (autosa_tree_node_is_mark(root, \"array\"))\n          break;\n      }\n    }\n    if (root) {\n      isl_schedule_free(tuning_sched);\n      tuning_sched = isl_schedule_node_get_schedule(root);\n    }\n    isl_schedule_node_free(root);\n    module->tuning_num_outer_sched = kernel->tuning_program->generate_tuning_schedule(tuning_sched);\n  }  \n\n  if (module->boundary)\n  {\n    module->boundary_outer_sched = boundary_sched1;\n    module->boundary_inter_sched = boundary_sched2;\n  }\n\n  return module;\n}\n\n/* Internal struct for add_drain_merge_stmt_acc_single. */\nstruct drain_merge_stmt_acc_data\n{\n  struct autosa_kernel *kernel;\n  struct autosa_array_ref_group *group;\n  struct autosa_stmt_access *ref;\n};\n\nstatic __isl_give isl_multi_aff *autosa_create_drain_merge_stmt(\n    isl_ctx *ctx,\n    struct autosa_array_ref_group *io_group,\n    isl_schedule_node *node,\n    char *stmt_name)\n{\n  isl_space *space;\n  int depth;\n  char buf[100];\n  isl_id *id;\n\n  depth = isl_schedule_node_get_schedule_depth(node);\n  space = isl_space_copy(io_group->array->space);\n  space = isl_space_from_range(space);\n  space = isl_space_add_dims(space, isl_dim_in, depth);\n  space = isl_space_wrap(space);\n  space = isl_space_map_from_set(space);\n\n  sprintf(buf, \"%s\", stmt_name);\n\n  id = isl_id_alloc(ctx, buf, NULL);\n  space = isl_space_set_tuple_id(space, isl_dim_in, id);\n\n  return isl_multi_aff_identity(space);\n}\n\nstatic __isl_give isl_schedule_node *add_drain_merge_stmt_acc_single(\n    __isl_take isl_schedule_node *node, void *user)\n{\n  struct drain_merge_stmt_acc_data *data =\n      (struct drain_merge_stmt_acc_data *)(user);\n  struct autosa_array_ref_group *group = data->group;\n  struct autosa_kernel *kernel = data->kernel;\n  struct autosa_stmt_access *ref = 
data->ref;\n  struct autosa_array_tile *tile;\n  isl_union_set *uset, *empty_filter, *domain;\n  isl_set *set;\n  isl_space *space;\n  isl_id *id, *id2;\n  isl_ctx *ctx;\n  isl_union_map *access;\n  int empty;\n  isl_printer *p_str;\n  char *stmt_name;\n  isl_multi_aff *from_access, *ma;\n  isl_multi_pw_aff *mpa;\n  isl_multi_union_pw_aff *mupa;\n  isl_schedule_node *graft;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n    return node;\n\n  /* Examine if the statement contains the access. */\n  uset = isl_schedule_node_get_domain(node);\n  set = isl_set_from_union_set(isl_union_set_copy(uset));\n  space = isl_set_get_space(set);\n  isl_set_free(set);\n  id = isl_space_get_tuple_id(space, isl_dim_set);\n  isl_space_free(space);\n  space = isl_map_get_space(ref->access);\n  id2 = isl_space_get_tuple_id(space, isl_dim_in);\n  empty_filter = isl_union_set_empty(isl_union_set_get_space(uset));\n  isl_union_set_free(uset);\n  isl_space_free(space);\n\n  if (id != id2)\n  {\n    isl_id_free(id);\n    isl_id_free(id2);\n    node = isl_schedule_node_insert_filter(node, empty_filter);\n    return node;\n  }\n  isl_id_free(id);\n  isl_id_free(id2);\n  ctx = isl_schedule_node_get_ctx(node);\n\n  access = io_comm_access_ref(kernel, node, group, ref, 0);\n  empty = isl_union_map_is_empty(access);\n  if (empty < 0 || empty)\n  {\n    isl_union_map_free(access);\n    isl_union_set_free(empty_filter);\n    if (empty < 0)\n      return isl_schedule_node_free(node);\n    return node;\n  }\n\n  p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_print_str(p_str, \"drain_merge.\");\n  p_str = isl_printer_print_str(p_str, group->local_array->array->name);\n  stmt_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  from_access = autosa_create_drain_merge_stmt(ctx, group, node, stmt_name);\n  free(stmt_name);\n\n  /* Create a register tiling. 
*/\n  tile = create_register_tiling(node, group, ref);\n  ma = isl_multi_aff_copy(tile->tiling);\n  ma = isl_multi_aff_pullback_multi_aff(ma,\n                                        isl_multi_aff_copy(from_access));\n  mpa = isl_multi_pw_aff_from_multi_aff(ma);\n  mupa = isl_multi_union_pw_aff_from_multi_pw_aff(mpa);\n\n  domain = isl_union_map_range(access);\n  domain = isl_union_set_preimage_multi_aff(domain, from_access);\n  access = isl_union_set_wrapped_domain_map(domain);\n  access = isl_union_map_reverse(access);\n  access = isl_union_map_coalesce(access);\n\n  graft = isl_schedule_node_from_extension(access);\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n  while (graft && isl_schedule_node_has_parent(graft))\n    graft = isl_schedule_node_parent(graft);\n\n  node = isl_schedule_node_graft_before(node, graft);\n  node = isl_schedule_node_insert_filter(node, empty_filter);\n  node = isl_schedule_node_parent(node);\n  node = isl_schedule_node_parent(node);\n  node = isl_schedule_node_parent(node);\n\n  autosa_array_tile_free(tile);\n\n  return node;\n}\n\nstatic __isl_give isl_schedule_node *add_drain_merge_stmt_acc(\n    __isl_take isl_schedule_node *node, struct autosa_array_ref_group *group,\n    struct autosa_kernel *kernel)\n{\n  struct drain_merge_stmt_acc_data data = {kernel, group, NULL};\n  for (int i = 0; i < group->n_ref; i++)\n  {\n    data.ref = group->refs[i];\n    node = isl_schedule_node_map_descendant_bottom_up(\n        node, &add_drain_merge_stmt_acc_single, &data);\n  }\n  return node;\n}\n\n/* This function generates code that merges all drained values from the drain group.\n */\nstatic __isl_give struct autosa_drain_merge_func *generate_drain_merge_func(\n    struct autosa_array_ref_group *group, struct autosa_kernel *kernel,\n    struct autosa_gen *gen)\n{\n  isl_ctx *ctx;\n  isl_schedule_node *node;\n  int io_level;\n  int space_dim;\n  int n_io_ids;\n  isl_id_list 
*io_ids = NULL;\n  isl_union_map *group_access;\n  isl_union_set *group_domain;\n  isl_schedule *sched;\n  isl_id *id;\n  struct autosa_drain_merge_func *func = NULL;\n\n  ctx = gen->ctx;\n  node = isl_schedule_get_root(group->io_schedule);\n  io_level = group->io_level;\n  space_dim = group->space_dim;\n  n_io_ids = space_dim - io_level + 1;\n  io_ids = ppcg_scop_generate_names(gen->prog->scop, n_io_ids, \"p\");\n\n  /* Add the filters. */\n  n_io_ids = 0;\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  while (!isl_schedule_node_is_io_mark(node, io_level))\n  {\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n      isl_id *id;\n      isl_id_list *ids;\n      isl_union_set *uset;\n\n      ids = isl_id_list_from_id(isl_id_list_get_id(io_ids, n_io_ids));\n      uset = set_schedule_eq(node, ids);\n      n_io_ids++;\n      node = isl_schedule_node_insert_filter(node, uset);\n      isl_id_list_free(ids);\n      node = isl_schedule_node_child(node, 0);\n    }\n    node = isl_schedule_node_child(node, 0);\n  }\n  node = autosa_tree_move_up_to_kernel(node);\n\n  /* Add the data transfer statements. */\n  node = autosa_tree_move_down_to_io_mark(node, kernel->core, io_level);\n  node = add_drain_merge_stmt_acc(node, group, kernel);\n\n  /* Compute the union of domains of all the array references in the group. */\n  group_access = isl_union_map_empty(isl_map_get_space(group->access));\n  for (int i = 0; i < group->n_ref; i++)\n  {\n    struct autosa_stmt_access *ref = group->refs[i];\n    group_access = isl_union_map_union(group_access,\n                                       autosa_drain_group_ref_access_relation(group, ref, 0, 1,\n                                                                              kernel->expanded_domain));\n  }\n  group_domain = isl_union_map_domain(group_access);\n  group_domain = isl_union_set_coalesce(group_domain);\n  /* Add the group domain as the filter. 
*/\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0); // context\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_filter(node, group_domain);\n\n  /* Add the func mark. */\n  func = autosa_drain_merge_func_alloc(gen);\n  id = isl_id_alloc(ctx, \"drain_merge\", func);\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);\n\n  sched = isl_schedule_node_get_schedule(node);\n  func->sched = sched;\n  func->group = group;\n  func->kernel = kernel;\n  func->inst_ids = io_ids;\n\n  isl_schedule_node_free(node);\n\n  return func;\n}\n\nstruct add_serialize_stmt_acc_data\n{\n  struct autosa_array_ref_group *group;\n  struct autosa_stmt_access *ref;\n  struct autosa_kernel *kernel;\n  struct autosa_array_tile *local_tile;\n  char *stmt_name;\n  int read;\n  struct autosa_hw_module *module;\n};\n\nstatic __isl_give isl_schedule_node *add_serialize_stmt_acc_single(\n    __isl_take isl_schedule_node *node, void *user)\n{\n  struct add_serialize_stmt_acc_data *data =\n      (struct add_serialize_stmt_acc_data *)user;\n  struct autosa_array_ref_group *group = data->group;\n  struct autosa_stmt_access *ref = data->ref;\n  struct autosa_array_tile *tile;\n  isl_union_set *uset, *empty_filter, *domain;\n  isl_set *set;\n  isl_space *space;\n  isl_id *id, *id2;\n  isl_ctx *ctx;\n  isl_union_map *access;\n  int empty;\n  isl_multi_aff *from_access;\n  isl_multi_aff *ma;\n  isl_multi_pw_aff *mpa;\n  isl_multi_union_pw_aff *mupa;\n  isl_schedule_node *graft;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n    return node;\n\n  /* Examine if the statement contains the access. 
*/\n  uset = isl_schedule_node_get_domain(node);\n  set = isl_set_from_union_set(isl_union_set_copy(uset));\n  space = isl_set_get_space(set);\n  isl_set_free(set);\n  id = isl_space_get_tuple_id(space, isl_dim_set);\n  isl_space_free(space);\n  space = isl_map_get_space(ref->access);\n  id2 = isl_space_get_tuple_id(space, isl_dim_in);\n  empty_filter = isl_union_set_empty(isl_union_set_get_space(uset));\n  isl_union_set_free(uset);\n  isl_space_free(space);\n  if (id != id2)\n  {\n    isl_id_free(id);\n    isl_id_free(id2);\n    node = isl_schedule_node_insert_filter(node, empty_filter);\n    return node;\n  }\n  isl_id_free(id);\n  isl_id_free(id2);\n  ctx = isl_schedule_node_get_ctx(node);\n\n  /* S -> [D -> A] */\n  access = io_comm_access_ref(data->kernel, node, group, ref, data->read);  \n\n  empty = isl_union_map_is_empty(access);\n  if (empty < 0 || empty)\n  {\n    isl_union_map_free(access);\n    isl_union_set_free(empty_filter);\n    if (empty < 0)\n      return isl_schedule_node_free(node);\n    return node;\n  }\n\n  from_access = autosa_create_io_access_stmt(\n      ctx, group, group, data->local_tile,\n      isl_schedule_node_get_schedule_depth(node), data->stmt_name);\n\n  /* Create a register tiling. */\n  tile = create_register_tiling(node, group, ref);\n  ma = isl_multi_aff_copy(tile->tiling);\n  ma = isl_multi_aff_pullback_multi_aff(ma,\n                                        isl_multi_aff_copy(from_access));\n  mpa = isl_multi_pw_aff_from_multi_aff(ma);\n  mupa = isl_multi_union_pw_aff_from_multi_pw_aff(mpa);\n\n  domain = isl_union_map_range(access);\n  /* Update the serialization bound.
*/\n  group->local_array->serialize_bound = isl_set_card(isl_set_from_union_set(isl_union_set_copy(domain)));\n\n  domain = isl_union_set_preimage_multi_aff(domain, from_access);\n  access = isl_union_set_wrapped_domain_map(domain);\n  access = isl_union_map_reverse(access);\n  access = isl_union_map_coalesce(access);\n\n  graft = isl_schedule_node_from_extension(access);\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n  while (graft && isl_schedule_node_has_parent(graft))\n    graft = isl_schedule_node_parent(graft);\n\n  node = isl_schedule_node_graft_before(node, graft);\n  node = isl_schedule_node_insert_filter(node, empty_filter);\n  node = isl_schedule_node_parent(node);\n  node = isl_schedule_node_parent(node);\n  node = isl_schedule_node_parent(node);\n\n  autosa_array_tile_free(tile);\n\n  return node;\n}\n\nstatic __isl_give isl_schedule_node *add_serialize_stmt_acc(\n  __isl_take isl_schedule_node *node,\n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  struct autosa_array_tile *tile,\n  char *stmt_name,\n  int read,\n  struct autosa_hw_module *module)\n{\n  struct add_serialize_stmt_acc_data data = {\n      group, NULL, kernel, tile, stmt_name, read, module};\n\n  for (int i = 0; i < group->n_ref; i++)\n  {\n    struct autosa_stmt_access *ref = group->refs[i];\n    data.ref = ref;\n    node = isl_schedule_node_map_descendant_bottom_up(\n        node, &add_serialize_stmt_acc_single, &data);\n  }\n\n  return node;\n}\n\nstatic __isl_give isl_schedule_node *add_serialize_stmt_tile(\n  __isl_take isl_schedule_node *node,\n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  struct autosa_array_tile *local_tile, /* Local buffer */\n  struct autosa_array_tile *tile,       /* Tile to be copied */\n  char *stmt_name,\n  int read,\n  struct autosa_hw_module *module)\n{\n  isl_union_map *access;\n  int empty;\n  isl_multi_aff *from_access;\n  
isl_multi_aff *ma;\n  isl_multi_pw_aff *mpa;\n  isl_multi_union_pw_aff *mupa;\n  isl_union_set *domain;\n  isl_schedule_node *graft;\n  isl_ctx *ctx;\n\n  ctx = isl_schedule_node_get_ctx(node);\n  access = io_comm_access(kernel, node, group, read);\n  empty = isl_union_map_is_empty(access);\n  if (empty < 0 || empty)\n  {\n    isl_union_map_free(access);\n    if (empty < 0)\n      return isl_schedule_node_free(node);\n    return node;\n  }\n\n  from_access = autosa_create_io_access_stmt(kernel->ctx, group, group,\n                                             local_tile, isl_schedule_node_get_schedule_depth(node), stmt_name);\n\n  ma = isl_multi_aff_copy(tile->tiling);\n  ma = isl_multi_aff_pullback_multi_aff(ma,\n                                        isl_multi_aff_copy(from_access));\n  mpa = isl_multi_pw_aff_from_multi_aff(ma);\n  mupa = isl_multi_union_pw_aff_from_multi_pw_aff(mpa);\n\n  /* [D -> A] */\n  domain = isl_union_map_range(access);\n  /* Restrict the buffer to the local tile size. */\n  if (!autosa_array_is_scalar(group->array))\n  {\n    isl_map *map;\n    isl_set *set;\n    set = isl_map_domain(isl_map_from_union_map(isl_union_set_unwrap(domain)));\n    map = group_tile_buffer(group, tile);\n    map = isl_map_intersect_domain(map, set);\n    domain = isl_union_set_from_set(isl_map_wrap(map));\n  }\n\n  /* Extract the serialization bound. */\n  group->local_array->serialize_bound = isl_set_card(\n      isl_set_from_union_set(isl_union_set_copy(domain)));  \n\n  domain = isl_union_set_preimage_multi_aff(domain, from_access);\n  access = isl_union_set_wrapped_domain_map(domain);\n  access = isl_union_map_reverse(access);\n  access = isl_union_map_coalesce(access);\n\n  graft = isl_schedule_node_from_extension(access);\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n  if (group->local_array->is_sparse) {\n    /* We will need to modify the last dimension accordingly.
*/\n    int n = isl_schedule_node_band_n_member(graft);\n    if (n > 1) {\n      graft = isl_schedule_node_band_split(graft, n - 1);\n      graft = isl_schedule_node_child(graft, 0);\n    }\n    if (group->local_array->eff_compress_ratio > 1) {\n      int tile_size[1];\n      isl_union_set *filter;\n      \n      tile_size[0] = group->local_array->eff_compress_ratio;\n      graft = autosa_tile_band(graft, tile_size);\n      graft = isl_schedule_node_child(graft, 0);\n      filter = schedule_eq_lb(graft);\n      graft = isl_schedule_node_insert_filter(graft, filter);\n      graft = isl_schedule_node_parent(graft);\n    }\n  }\n\n  while (graft && isl_schedule_node_has_parent(graft))\n    graft = isl_schedule_node_parent(graft);\n\n  node = isl_schedule_node_graft_before(node, graft);\n\n  return node;\n}\n\n/* Generate a schedule for serializing/deserializing the host data.\n */\nstatic __isl_give isl_schedule *generate_serialize_schedule(\n    struct autosa_kernel *kernel,\n    struct autosa_array_ref_group *group,\n    struct autosa_hw_module *module,\n    struct autosa_gen *gen,\n    int in)\n{\n  isl_printer *p;\n  isl_schedule_node *node;\n  isl_ctx *ctx;\n  struct autosa_io_buffer *buf;\n  int io_level, i;\n  char *stmt_name;\n  isl_union_set *empty_filter;\n  isl_union_map *group_access;\n  isl_union_set *group_domain;\n  isl_id *id;\n  isl_schedule *sched;\n  isl_union_set *group_core = NULL;\n\n  ctx = gen->ctx;\n  if (gen->options->autosa->lower_int_io_L1_buffer && group->io_L1_lower_schedule)\n    node = isl_schedule_get_root(group->io_L1_lower_schedule);\n  else\n    node = isl_schedule_get_root(group->io_schedule);\n  node = autosa_tree_move_down_to_kernel(node);\n\n  /* Compute the union of domains of all the array references in the group. 
*/\n  node = isl_schedule_node_child(node, 0); // context\n  node = isl_schedule_node_child(node, 0);\n  if (gen->options->autosa->local_reduce && group->attached_drain_group)\n    node = insert_io_group_access_domain_local_reduce(node, group, kernel, in, 0, 1);\n  else\n    node = insert_io_group_access_domain(node, group, kernel, in);\n  node = isl_schedule_node_child(node, 0);\n  group_core = isl_union_set_universe(isl_schedule_node_get_domain(node));\n  node = autosa_tree_move_up_to_kernel(node);\n\n  /* Generate the statement */\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_print_str(p, in ? \"serialize\" : \"deserialize\");\n  stmt_name = isl_printer_get_str(p);\n  isl_printer_free(p);\n\n  io_level = module->level;\n  /* Locate the next buffer. */\n  for (i = io_level; i >= 1; i--)\n  {\n    buf = group->io_buffers[i - 1];\n    if (buf->tile != NULL)\n      break;\n  }\n  /* Move the schedule node to the level of the buffer.\n   * TODO: fix it when the buf->tile == NULL.\n   */\n  node = autosa_tree_move_down_to_depth(node, buf->tile->depth, group_core);\n  if (!buf->tile)\n  {\n    /* If there is more than one reference in the I/O group to be serialized,\n     * serialization is disabled for this module.\n     */\n    if (group->n_ref > 1)\n    {\n      /* Release the resources acquired above before bailing out. */\n      free(stmt_name);\n      isl_union_set_free(group_core);\n      isl_schedule_node_free(node);\n      return NULL;\n    }\n    else\n    {\n      node = add_serialize_stmt_acc(node, group, kernel, buf->tile, stmt_name, in, module);\n    }\n  }\n  else\n  {\n    node = add_serialize_stmt_tile(node, group, kernel, buf->tile, buf->tile, stmt_name, in, module);\n    node = isl_schedule_node_cut(node);\n    empty_filter = isl_union_set_from_set(isl_set_empty(isl_set_get_space(kernel->context)));\n    node = isl_schedule_node_insert_filter(node, empty_filter);\n  }\n  free(stmt_name);\n\n  /* Add the host_serialize mark.
*/\n  id = isl_id_alloc(ctx, \"host_serialize\", module);\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);\n\n  /* Update the array information */\n  group->local_array->host_serialize = 1;\n\n  sched = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n  isl_union_set_free(group_core);\n\n  return sched;\n}\n\n/* This function recalculates the bounds of the io module ids.\n * At each dimension, we insert a filter that equates the io id with the\n * corresponding schedule dimension.\n * We then compute the domain of these io ids and use it to update the\n * io schedule context.\n * The node points to \"array\".\n */\nstatic __isl_give isl_schedule_node *update_io_module_context(\n  __isl_take isl_schedule_node *node,\n  struct autosa_gen *gen,\n  int io_level, int n_io_ids)\n{\n  isl_union_set *domain;\n  isl_ctx *ctx;\n  isl_set *grid;\n  isl_schedule_node *tmp_node;\n  isl_id_list *io_ids;\n  isl_set *context;\n\n  ctx = isl_schedule_node_get_ctx(node);\n  tmp_node = isl_schedule_node_copy(node);\n\n  /* Add io ids filters down to the io_level */\n  io_ids = ppcg_scop_generate_names(gen->prog->scop, n_io_ids, \"p\");\n  tmp_node = add_io_ids_filter(tmp_node, io_ids, 1, n_io_ids, 0, 0, 0);\n  \n  /* Collect the domain down to the io_level */\n  domain = isl_schedule_node_get_domain(tmp_node);\n  grid = isl_union_set_params(domain);\n  grid = isl_set_from_params(grid);\n\n  isl_id_list_free(io_ids);\n  isl_schedule_node_free(tmp_node);\n\n  /* Update the context.
*/\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0); // context\n  context = isl_schedule_node_context_get_context(node);  \n  context = isl_set_intersect(context, grid);\n  context = isl_set_coalesce(context);\n\n  node = isl_schedule_node_delete(node);\n  node = isl_schedule_node_insert_context(node, context);\n\n  return node;\n}\n\n/* Generate the schedule for the I/O module.  \n * We will insert statements at the corresponding position in the schedule tree\n * to transfer the data.\n * The statement is in the format of:\n * in_trans/out_trans[_dram]/[_dram_serialize]/[_boundary].fifo_suffix[_local].\n * is_filter.is_buffer.filter_depth.filter_dim.buf_cur_lane.buf_nxt_lane.coalesce_depth.coalesce_ub\n * \n * If is_buffer is disabled, we will insert one I/O statement for \n * transferring the data between the same-level I/O modules and lower-level modules.\n * If is_buffer is enabled, we will insert two I/O statements:\n * - one for transferring the data between the same-level I/O modules and storing\n *   the data required for the lower-level I/O modules in the buffers.\n * - one for transferring the data to/from the lower-level I/O modules from/to \n *   the local buffers.\n * If host data serialization is enabled, we will generate a separate schedule \n * for serializing/deserializing the host data.\n */\nstatic isl_stat generate_default_io_module_schedule(\n  __isl_take struct autosa_hw_module *module,\n  __isl_keep isl_schedule_node *node,\n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  struct autosa_gen *gen,\n  int io_level, int space_dim,\n  int is_filter, int is_buffer,\n  int read, int boundary)\n{\n  isl_schedule *sched1, *sched2;\n  isl_ctx *ctx;\n  isl_printer *p;\n  char *io_mark;\n  int n_io_ids = 0;\n  isl_id_list *io_ids;\n  isl_id *id;\n  int is_mark;\n  isl_set *context;\n  char *fifo_suffix, *buf_suffix;\n  isl_union_set *empty_filter = NULL;\n  isl_union_set *eq_filter = 
NULL;\n  isl_union_set *neq_filter = NULL;\n  int depth;\n  char *stmt_name;\n  struct autosa_io_buffer *buf = NULL;\n  int i;\n  isl_union_set *id_filter;\n  isl_union_set *group_core = NULL;\n\n  ctx = isl_schedule_node_get_ctx(node);\n  sched1 = isl_schedule_node_get_schedule(node);\n  sched2 = isl_schedule_dup(sched1);\n  isl_schedule_free(sched1);\n  node = isl_schedule_get_root(sched2);\n  isl_schedule_free(sched2);  \n\n  /* Compute the union of domains of all the array references in the group. */\n  node = autosa_tree_move_down_to_kernel(node);\n  node = isl_schedule_node_child(node, 0); // context\n  node = isl_schedule_node_child(node, 0);\n  if (gen->options->autosa->local_reduce && group->attached_drain_group)\n    node = insert_io_group_access_domain_local_reduce(node, group, kernel, read, 0, 1);\n  else\n    node = insert_io_group_access_domain(node, group, kernel, read);  \n  node = isl_schedule_node_child(node, 0);\n  group_core = isl_union_set_universe(isl_schedule_node_get_domain(node));    \n  node = autosa_tree_move_up_to_kernel(node);  \n\n  /* Add the module id filters. */\n  n_io_ids = space_dim - io_level + 1;\n  io_ids = ppcg_scop_generate_names(gen->prog->scop, n_io_ids, \"p\");   \n  node = autosa_tree_move_down_to_array(node, kernel->core);  \n  node = add_io_ids_filter(node, io_ids, io_level, space_dim - io_level + 1, is_filter, module->to_pe, read);\n  node = autosa_tree_move_up_to_kernel(node);    \n\n  /* Add the data transfer statements. */  \n  init_suffix(module, group, &fifo_suffix, &buf_suffix);  \n  /* Locate the next buffer. */\n  for (i = io_level; i >= 1; i--)\n  {\n    buf = group->io_buffers[i - 1];\n    if (buf->tile != NULL)\n      break;\n  }\n  if (is_buffer)\n  {\n    if (i != io_level)\n    {\n      /* The buffer is optimized out at this level. */\n      is_buffer = 0;\n    }\n  }\n\n  if (buf->tile && buf->hoist_depth != -1) {\n    /* This buffer has been hoisted. 
*/    \n    node = isl_schedule_node_child(node, 0); // context\n    node = isl_schedule_node_child(node, 0); // last inserted filter\n    node = isl_schedule_node_child(node, 0);\n    node = isl_schedule_node_insert_filter(node, isl_union_set_copy(buf->hoist_domain));\n    node = isl_schedule_node_child(node, 0);\n    isl_union_set_free(group_core);\n    group_core = isl_union_set_universe(isl_schedule_node_get_domain(node));    \n    node = autosa_tree_move_up_to_kernel(node);\n  }\n\n  /* Move the schedule node to the level of the buffer. \n   * In the current implementation, there will also be a buffer at the \n   * innermost level.\n   */\n  if (is_filter) {\n    module->data_pack_inter = buf->n_lane;\n    module->data_pack_intra = buf->n_lane;\n    node = insert_filter_trans_stmts(\n              node, io_ids, space_dim - io_level, io_level, read,\n              buf, module, kernel, gen, boundary, 1, is_buffer, fifo_suffix, group, group_core, 0);\n  } else {\n    if (is_buffer) {\n      /* Insert two statements:\n       * - Load from upper stream I/O modules/DRAM to buffer\n       * - Write to downstream I/O modules from buffer\n       */\n      module->data_pack_inter = buf->n_lane;\n      /* Locate the next buffer after the current buffer. */\n      int cur_level = buf->level;\n      struct autosa_io_buffer *cur_buf = buf;\n      for (int i = cur_level - 1; i >= 1; i--)\n      {\n        buf = group->io_buffers[i - 1];\n        if (buf->tile != NULL)\n          break;\n      }\n\n      if (!buf->tile) {\n        module->data_pack_intra = group->n_lane;        \n      } else {\n        module->data_pack_intra = buf->n_lane;\n      }\n      \n      /* Insert the first statement. 
*/\n      node = autosa_tree_move_down_to_depth(node, cur_buf->tile->depth, kernel->core);\n      p = isl_printer_to_str(ctx);\n      p = print_io_trans_stmt_prefix(\n              p, read, module->to_mem, gen->options->autosa->host_serialize, boundary, 0, NULL,\n              0, 0, is_buffer, fifo_suffix, cur_buf->n_lane);\n      node = insert_io_stmts_tile(node, cur_buf->n_lane, p, kernel, group, \n              cur_buf, cur_buf, read, is_buffer, module, 0, 0, -1);\n            \n      /* Insert the second statement. */\n      p = isl_printer_to_str(ctx);\n      p = print_io_trans_stmt_prefix(\n              p, !read, 0, gen->options->autosa->host_serialize, boundary, 0, NULL,\n              !read, read, is_buffer, fifo_suffix, cur_buf->n_lane);\n      if (module->to_pe || !buf->tile) {\n        node = insert_io_stmts_acc(\n                  node, group->n_lane, p, kernel, group, cur_buf, read, is_buffer, module, 0);\n      } else {\n        node = autosa_tree_move_down_to_io_mark(node, group_core, buf->level);\n        node = isl_schedule_node_child(node, 0);        \n        node = insert_io_stmts_tile(node, buf->n_lane, p, kernel, group, \n                  cur_buf, buf, read, is_buffer, module, 1, 0, -1);\n      }\n    } else {\n      /* Insert one statement.\n       * Load from upper stream I/O modules/DRAM and write to\n       * downstream I/O modules.\n       */\n      if (buf->tile) {\n        int pe_depth;        \n        isl_schedule_node *node_tmp;\n\n        module->data_pack_inter = group->io_buffers[io_level - 1]->n_lane;\n        module->data_pack_intra = buf->n_lane;\n\n        node_tmp = isl_schedule_node_copy(node);\n        node_tmp = autosa_tree_move_down_to_pe(node_tmp, kernel->core);\n        pe_depth = isl_schedule_node_get_schedule_depth(node_tmp);\n        isl_schedule_node_free(node_tmp);\n        if (pe_depth == buf->tile->depth) {\n          node = autosa_tree_move_down_to_pe(node, kernel->core);\n        } else if (pe_depth > 
buf->tile->depth){\n          node = autosa_tree_move_down_to_depth(node, buf->tile->depth, kernel->core);\n        } else {\n          node = autosa_tree_move_up_to_kernel(node);\n          node = autosa_tree_move_down_to_depth(node, buf->tile->depth, kernel->core);\n        }        \n        p = isl_printer_to_str(ctx);\n        p = print_io_trans_stmt_prefix(\n              p, read, module->to_mem, gen->options->autosa->host_serialize, boundary, 0, NULL,\n              !read, read, is_buffer, fifo_suffix, module->data_pack_inter);\n        node = insert_io_stmts_tile(node, module->data_pack_intra, p, kernel, group, \n                  group->io_buffers[io_level - 1], buf, read, is_buffer, module, 1, 0, -1);\n      } else {\n        module->data_pack_inter = group->n_lane;\n        module->data_pack_intra = group->n_lane;\n\n        /* Allocate the printer before use; p is otherwise uninitialized here. */\n        p = isl_printer_to_str(ctx);\n        p = print_io_trans_stmt_prefix(\n                p, read, module->to_mem, gen->options->autosa->host_serialize, boundary, 0, NULL,\n                !read, read, is_buffer, fifo_suffix, group->n_lane);\n        node = insert_io_stmts_acc(node, group->n_lane, p, kernel, group, NULL, read, is_buffer, module, 0);\n      }\n    }\n  }\n\n  free(fifo_suffix);\n  free(buf_suffix);\n  isl_union_set_free(group_core);\n\n  /* Add the module mark. */\n  id = isl_id_alloc(ctx, \"module\", module);\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);\n\n  if (gen->options->autosa->tuning_method == 1 && !boundary) {\n    isl_schedule *orig_sched = isl_schedule_node_get_schedule(node);\n    module->tuning_sched = kernel->tuning_program->generate_tuning_schedule(isl_schedule_dup(orig_sched));\n\n    isl_schedule *tuning_sched = isl_schedule_dup(orig_sched);\n    isl_schedule_free(orig_sched);\n    /* Remove module filters.
*/\n    isl_schedule_node *root = isl_schedule_get_root(tuning_sched);\n    isl_schedule_free(tuning_sched);    \n    root = autosa_tree_move_down_to_io_mark(root, kernel->core, io_level);\n    while (isl_schedule_node_has_parent(root)) {\n      root = isl_schedule_node_parent(root);\n      if (isl_schedule_node_get_type(root) == isl_schedule_node_filter) {\n        root = isl_schedule_node_delete(root);\n      }\n      if (autosa_tree_node_is_mark(root, \"array\"))\n        break;\n    }    \n    tuning_sched = isl_schedule_node_get_schedule(root);\n    isl_schedule_node_free(root);\n    module->tuning_num_sched = kernel->tuning_program->generate_tuning_schedule(tuning_sched);\n  }\n\n  sched1 = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n\n  if (!boundary)\n  {\n    module->sched = sched1;\n    module->type = (group->group_type == AUTOSA_DRAIN_GROUP) ? DRAIN_MODULE : IO_MODULE;\n    module->level = io_level;\n    module->n_io_group++;\n    module->io_groups = (struct autosa_array_ref_group **)realloc(module->io_groups,\n                                                                  module->n_io_group * sizeof(struct autosa_array_ref_group *));\n    module->io_groups[module->n_io_group - 1] = group;\n    module->inst_ids = io_ids;\n    module->kernel = kernel;\n    module->is_buffer = is_buffer;\n    module->is_filter = is_filter;\n    /* Create IO module variables. 
*/\n    if (is_buffer)\n    {\n      for (int i = io_level; i >= 1; i--)\n      {\n        buf = group->io_buffers[i - 1];\n        if (buf->tile != NULL)\n          break;\n      }\n      create_io_module_vars(module, kernel, buf->tile, 0);\n    }\n  }\n  else\n  {\n    isl_id_list_free(io_ids);\n    module->boundary_sched = sched1;\n  }\n\n  return isl_stat_ok;\n}\n\n/* Generate the default I/O module when either is_filter or is_buffer is zero.\n */\nstatic __isl_give struct autosa_hw_module *generate_default_io_module(\n    __isl_take struct autosa_hw_module *module, __isl_keep isl_schedule_node *node,\n    struct autosa_array_ref_group *group, struct autosa_kernel *kernel,\n    struct autosa_gen *gen,\n    int io_level, int space_dim, int is_filter, int is_buffer, int read)\n{\n  isl_ctx *ctx = gen->ctx;\n\n  generate_default_io_module_schedule(module, node, group,\n                                      kernel, gen, io_level, space_dim, is_filter, is_buffer, read, 0);\n\n  if (is_filter)\n  {\n    /* Add the boundary module schedule. */\n    module->boundary = 1;\n    generate_default_io_module_schedule(module, node, group,\n                                        kernel, gen, io_level, space_dim, is_filter, is_buffer, read, 1);\n  }\n\n  return module;\n}\n\n/* Generate the I/O modules for transferring the data.\n * The I/O module is described by two features:\n * - is_filter: If the module is a filter node, it keeps the data \n *   that belongs to it and sends it to the lower-level I/O modules or PEs. \n *   Otherwise, it simply passes the data to downstream modules.\n * - is_buffer: If the module is buffered.
We will allocate a local buffer \n *   inside the module.\n */\nstatic __isl_give struct autosa_hw_module *generate_io_module_by_type(\n    __isl_take struct autosa_hw_module *module, __isl_keep isl_schedule_node *node,\n    struct autosa_array_ref_group *group, struct autosa_kernel *kernel,\n    struct autosa_gen *gen, int io_level, int space_dim,\n    int is_filter, int is_buffer, int read)\n{\n  if (is_filter && is_buffer)\n  {\n    module = generate_filter_buffer_io_module(module, node, group, kernel,\n                                              gen, io_level, space_dim, is_filter, is_buffer, read);\n  }\n  else\n  {    \n    module = generate_default_io_module(module, node, group, kernel,\n                                        gen, io_level, space_dim, is_filter, is_buffer, read);\n  }\n\n  return module;\n}\n\n/* This function updates the data pack factors for I/O modules that access\n * the external DRAM. The module data should also be serialized.\n */\nstatic int update_serialize_data_pack(struct autosa_gen *gen, struct autosa_hw_module *module)\n{\n  isl_union_map *sizes;\n  int *data_pack_ubs = NULL;\n  int dram_limit = 64; // bytes\n  int ele_size = module->io_groups[0]->array->size;\n  int n_lane = module->data_pack_inter;\n  int host_pack = -1;\n\n  sizes = extract_sizes_from_str(gen->ctx, module->options->autosa->data_pack_sizes);  \n  data_pack_ubs = read_data_pack_sizes_array(sizes, module->io_groups[0]->array->name);\n  if (data_pack_ubs) \n    dram_limit = data_pack_ubs[2];\n  free(data_pack_ubs);\n  isl_union_map_free(sizes);\n\n  if (module->io_groups[0]->local_array->is_sparse) {\n    /* Extract the sparse information */\n    int n_nzero = module->io_groups[0]->local_array->n_nzero;\n    int n_meta_data = module->io_groups[0]->local_array->n_meta_data;\n    for (int limit = dram_limit; limit >= (ele_size * n_lane * (n_nzero + n_meta_data)); limit -= (ele_size * n_lane * (n_nzero + n_meta_data))) {\n      if (limit % (ele_size * n_lane * 
(n_nzero + n_meta_data)) == 0 &&\n          module->coalesce_bound % (limit / (ele_size * n_lane * (n_nzero + n_meta_data))) == 0) {\n        host_pack = limit / ele_size;\n        break;\n      }\n    }\n  } else {    \n    isl_printer *p_str = isl_printer_to_str(gen->ctx);\n    p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);    \n    p_str = isl_printer_print_pw_qpolynomial(p_str, module->io_groups[0]->local_array->serialize_bound);    \n    char *serialize_bound = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);    \n    std::string serialize_bound_str(serialize_bound);    \n    int serialize_bound_int = stoi(serialize_bound_str);    \n    free(serialize_bound);\n\n    for (int limit = dram_limit; limit >= ele_size * n_lane; limit -= ele_size * n_lane) \n    {\n      /* Limit should be a power of two (exact integer test, limit > 0 here). */\n      if ((limit & (limit - 1)) != 0)\n        continue;\n      //if (limit % (ele_size * n_lane) == 0 && module->coalesce_bound % (limit / (ele_size * n_lane)) == 0)\n      if (limit % (ele_size * n_lane) == 0 && serialize_bound_int % (limit / (ele_size * n_lane)) == 0)\n      {\n        host_pack = limit / ele_size;\n        break;\n      }\n    }\n  }\n\n  return host_pack != -1? host_pack : module->data_pack_intra;\n}\n\n/* This function builds a set of I/O modules for each I/O group.\n * We first examine whether any flow dependence associated with the \n * current group is carried by the array_part loops. \n * In that case, credit control should be added to enforce the dependence.\n * TODO: to be implemented.\n * Next, we generate the copy-in set and copy-out set of I/O modules for \n * the I/O groups. 
At each I/O level, we generate one I/O module.\n * I/O module pruning is applied by default here.\n * Specifically, if the copy-out set of the current array_part loops equals \n * the copy-in set of the next array_part loops, the data does not need\n * to go off-chip, and we prune away such I/O modules.\n * If the I/O group has interior I/O at the PE level, the data required for the \n * next iteration should reside in the PEs.\n * Otherwise, we will connect the copy-out I/O modules to the copy-in I/O modules\n * and buffer the data on-chip. (TODO: not supported yet.)\n */\nstatic __isl_give struct autosa_hw_module **sa_io_module_gen(\n    struct autosa_array_ref_group *group,\n    struct autosa_gen *gen, int *n_modules, int in, int out)\n{  \n  isl_schedule_node *node;\n  isl_ctx *ctx;\n  struct autosa_kernel *kernel;\n  int space_dim;\n  int io_level;\n  struct autosa_hw_module **modules = NULL;\n  int module_cnt = 0;\n  int credit = 0;\n\n  ctx = gen->ctx;\n  if (gen->options->autosa->lower_int_io_L1_buffer && group->io_L1_lower_schedule) \n    node = isl_schedule_get_root(group->io_L1_lower_schedule);\n  else\n    node = isl_schedule_get_root(group->io_schedule);\n  \n  io_level = group->io_level;\n  space_dim = group->space_dim;  \n  kernel = gen->kernel;\n  node = autosa_tree_move_down_to_kernel(node);\n\n  /* Test if the deps in this I/O group are carried by array_part loops.\n   * If so, data hazards are possible, and we set credit to true\n   * so that we can enable credit control between the read and write I/O modules to \n   * prevent the data hazards. \n   * TODO: This is not supported yet.\n   */\n  if (gen->options->autosa->credit_control)\n  {\n    if (is_flow_dep_carried_by_array_part_loops(group->io_schedule, group, kernel))\n      credit = 1;\n  }\n\n  /* At each I/O level, generate one I/O module. */\n  /* Copy-in group. 
*/  \n  if (in && group->copy_in)\n  {    \n    for (int i = io_level; i >= 1; i--)\n    {\n      struct autosa_hw_module *module;\n      char *module_name = NULL;\n      char *io_mark = NULL;\n      isl_printer *p_str;\n      int is_filter;\n      int is_buffer;\n      int innermost, outermost;\n\n      /* Classify the module type. */\n      outermost = io_level;\n      if (group->io_type == AUTOSA_INT_IO)\n        innermost = 1;\n      else\n        innermost = 2; // IO_L1 is integrated into PEs. No need to generate.\n\n      /* Since we perform I/O clustering automatically, all the I/O modules\n       * except the outermost level will be in filter mode,\n       * which means that they pass data to downstream modules\n       * and filter out the data that they need for the lower-level modules\n       * they are connected to.\n       */  \n      if (i == outermost && outermost != innermost) {\n        is_filter = 0;\n        if (gen->options->autosa->lower_int_io_L1_buffer) {\n          is_filter = 1;\n        }\n      } else\n        is_filter = 1;\n      \n      /* All the innermost modules are buffered to decouple computation \n       * from data communication. Otherwise, possible data hazards might cause \n       * the design to get stuck.\n       */\n      if (i == innermost) \n        is_buffer = 1;\n      else\n        is_buffer = 0;\n\n      if (gen->options->autosa->two_level_buffer)\n      {\n        /* When two-level buffering is enabled, \n         * we implement a second-level buffer at the outermost I/O module.\n         */\n        if (i == outermost)\n          is_buffer = 1;\n      }\n      if (gen->options->autosa->lower_int_io_L1_buffer)\n      {\n        if (i == outermost) \n          is_buffer = group->io_buffers[outermost - 1]->tile? 
1 : 0;\n      }      \n\n      /* Generate the I/O module */\n      if (i >= innermost && i <= outermost)\n      {\n        module = autosa_hw_module_alloc(gen);\n        module_name = generate_io_module_name(ctx, group, i, 1);\n        module->name = module_name;\n        module->to_pe = (i == innermost) ? 1 : 0;\n        module->to_mem = (i == outermost) ? 1 : 0;\n        module->credit = (i == outermost) ? credit : 0;\n        module->n_array_ref = group->local_array->n_io_group_refs;\n        module->in = 1;\n        module->is_serialized = (gen->options->autosa->host_serialize && module->to_mem) ? 1 : 0;\n        if (module->to_mem)\n        {\n          /* Create the group_ref and mem_port mapping. */\n          for (int p = 0; p < group->n_mem_ports; p++)\n          {\n            int group_ref_offset = group->local_array->n_io_group_refs;\n            int mem_port_offset = group->mem_port_id;                 \n            group->local_array->group_ref_mem_port_map.push_back(group_ref_offset + p);\n            group->local_array->group_ref_mem_port_map.push_back(mem_port_offset + p);\n          }\n          group->local_array->n_io_group_refs += group->n_mem_ports;\n        }\n\n        module = generate_io_module_by_type(module, node, group, kernel,\n                                            gen, i, space_dim, is_filter, is_buffer, 1);\n        if (module->is_serialized)\n        {\n          /* Generate the schedule for serializing/deserializing the host data. */          \n          module->serialize_sched = generate_serialize_schedule(\n              kernel, group, module, gen, 1);\n          if (module->serialize_sched) {\n            /* Update the data packing factor. 
*/            \n            module->data_pack_serialize = update_serialize_data_pack(gen, module);            \n            module->io_groups[0]->local_array->n_lane = module->data_pack_serialize;\n            module->io_groups[0]->local_array->array->n_lane = module->data_pack_serialize;\n          }\n        } else {\n          module->is_serialized = 0;\n        }\n\n        module_cnt++;\n        modules = (struct autosa_hw_module **)realloc(modules,\n                                                      module_cnt * sizeof(struct autosa_hw_module *));\n        modules[module_cnt - 1] = module;\n      }\n    }\n  }\n  \n  /* Copy-out group. */  \n  if (out && group->copy_out)\n  {    \n    for (int i = 1; i <= io_level; i++)\n    {\n      struct autosa_hw_module *module;\n      char *module_name = NULL;\n      char *io_mark = NULL;\n      isl_printer *p_str;\n      int is_filter;\n      int is_buffer;\n      int innermost, outermost;\n\n      /* Classify the module type. */\n      outermost = io_level;\n      if (group->io_type == AUTOSA_INT_IO)\n        innermost = 1;\n      else\n        innermost = 2; // IO_L1 is integrated into PEs.\n\n      if (i == outermost && outermost != innermost)\n        is_filter = 0;\n      else\n        is_filter = 1;\n      \n      if (i == innermost) \n        is_buffer = 1;\n      else\n        is_buffer = 0;\n\n      if (gen->options->autosa->two_level_buffer)\n      {\n        /* When two-level buffering is enabled, \n         * we will implement a second-level buffer at the outermost I/O module.\n         */\n        if (i == outermost)\n          is_buffer = 1;\n      }\n\n      /* Generate the I/O module. */\n      if (i >= innermost && i <= outermost)\n      {\n        module = autosa_hw_module_alloc(gen);\n        module_name = generate_io_module_name(ctx, group, i, 0);\n        module->name = module_name;\n        module->to_pe = (i == innermost) ? 1 : 0;\n        module->to_mem = (i == outermost) ? 
1 : 0;\n        module->credit = (i == outermost) ? credit : 0;\n        module->n_array_ref = group->local_array->n_io_group_refs;\n        module->in = 0;\n        module->is_serialized = (gen->options->autosa->host_serialize && module->to_mem) ? 1 : 0;\n        if (module->to_mem)\n        {\n          /* Create the group_ref and mem_port mapping. */\n          for (int p = 0; p < group->n_mem_ports; p++)\n          {\n            int group_ref_offset = group->local_array->n_io_group_refs;\n            int mem_port_offset = group->mem_port_id;                        \n            group->local_array->group_ref_mem_port_map.push_back(group_ref_offset + p);\n            group->local_array->group_ref_mem_port_map.push_back(mem_port_offset + p);\n          }\n          group->local_array->n_io_group_refs += group->n_mem_ports;\n        }\n        \n        module = generate_io_module_by_type(module, node, group, kernel,\n                                            gen, i, space_dim, is_filter, is_buffer, 0);\n        if (module->is_serialized)\n        {\n          /* Generate the schedule for serializing/deserializing the host data. */          \n          module->serialize_sched = generate_serialize_schedule(\n              kernel, group, module, gen, 0);\n          if (module->serialize_sched) {\n            /* Update the data packing factor. 
*/\n            module->data_pack_serialize = update_serialize_data_pack(gen, module);            \n            module->io_groups[0]->local_array->n_lane = module->data_pack_serialize;\n            module->io_groups[0]->local_array->array->n_lane = module->data_pack_serialize;\n          }            \n        } else {\n          module->is_serialized = 0;\n        }\n\n        module_cnt++;\n        modules = (struct autosa_hw_module **)realloc(modules,\n                                                      module_cnt * sizeof(struct autosa_hw_module *));\n        modules[module_cnt - 1] = module;\n      }\n    }\n  }\n\n  isl_schedule_node_free(node);\n  *n_modules = module_cnt;\n  return modules;\n}\n\n/* If the band node \"node\" has more than \"n\" members, then split off\n * the first \"n\" of them.\n */\nstatic __isl_give isl_schedule_node *split_band(\n    __isl_take isl_schedule_node *node, int n)\n{\n  int dim;\n\n  dim = isl_schedule_node_band_n_member(node);\n  if (n < dim)\n    node = isl_schedule_node_band_split(node, n);\n\n  return node;\n}\n\n/* Compute the effective sa size as a list of the sizes in each dimension.\n *\n * The sa size specified by the user or set by default\n * in read_array_part_tile_sizes() and applied by the PE filter,\n * may be too large for the given code in the sense that\n * it may contain PEs that don't need to execute anything.\n * We therefore don't return this sa size, but instead the\n * smallest grid size that ensures that all blocks that actually\n * execute code are included in the grid.\n *\n * We first extract a description of the grid, i.e., the possible values\n * of the PE ids, from the domain elements in \"domain\" and\n * kernel->pe_filter.\n * The PE ids are parameters in kernel->pe_filter.\n * We simply need to change them into set dimensions.\n *\n * Then, for each PE dimension, we compute the maximal value of the PE id\n * and add one.\n */\nstatic __isl_give isl_multi_pw_aff *extract_sa_grid_size(\n    
struct autosa_kernel *kernel, __isl_take isl_union_set *domain)\n{\n  int i;\n  isl_set *grid;\n  isl_set *context;\n  isl_multi_pw_aff *size;\n\n  domain = isl_union_set_intersect(domain,\n                                   isl_union_set_copy(kernel->pe_filter));\n\n  grid = isl_union_set_params(domain);\n  grid = isl_set_from_params(grid);\n  grid = isl_set_add_dims(grid, isl_dim_set, kernel->n_sa_dim);\n\n  for (i = 0; i < kernel->n_sa_dim; ++i)\n  {\n    int pos;\n    isl_id *id;\n\n    if (!grid)\n      return NULL;\n\n    id = isl_id_list_get_id(kernel->pe_ids, i);\n    pos = isl_set_find_dim_by_id(grid, isl_dim_param, id);\n    isl_id_free(id);\n    if (pos < 0)\n      isl_die(isl_set_get_ctx(grid), isl_error_internal,\n              \"missing constraints on PE identifier\",\n              grid = isl_set_free(grid));\n    grid = isl_set_equate(grid, isl_dim_param, pos, isl_dim_set, i);\n    grid = isl_set_project_out(grid, isl_dim_param, pos, 1);\n  }\n\n  grid = isl_set_coalesce(grid);\n  size = ppcg_size_from_extent(grid);\n  context = isl_set_params(isl_set_copy(kernel->context));\n  return isl_multi_pw_aff_gist(size, context);\n}\n\n/* Internal struct for add_pe_ext_io_copies. */\nstruct autosa_add_pe_ext_io_copies_data\n{\n  struct autosa_kernel *kernel;\n  struct autosa_array_ref_group *pe_group;\n  struct autosa_array_ref_group *io_group;\n  struct autosa_stmt_access *ref;\n  int read;\n  int in; /* I/O direction */\n  int dummy;\n  int reduce;\n  isl_union_set *filter;\n};\n\n/* Find the PE group that contains the reference \"ref\" from the IO group.\n */\nstatic struct autosa_array_ref_group *autosa_find_pe_group(\n    struct autosa_local_array_info *local_array,\n    struct autosa_array_ref_group *io_group,\n    struct autosa_stmt_access *ref)\n{\n  /* As all accesses from the array are merged together for internal array,\n   * simply return the first PE group. 
\n   */\n  if (local_array->array_type == AUTOSA_INT_ARRAY)\n    return local_array->pe_groups[0];\n\n  for (int i = 0; i < local_array->n_pe_group; i++)\n  {\n    struct autosa_array_ref_group *pe_group = local_array->pe_groups[i];\n    if (pe_group->refs[0] == ref)\n      return pe_group;\n  }\n\n  return NULL;\n}\n\n/* Given a schedule node \"node\" of the type \"isl_schedule_node_leaf\", \n * test whether it is under any extension node.\n * If so, test whether the filter above the current node intersects with \n * the range of that extension. \n */\nstatic isl_bool leaf_node_is_extended(__isl_keep isl_schedule_node *node)\n{\n  isl_schedule_node *node_e;\n  isl_schedule_node *node_f;\n  isl_union_set *filter;\n  isl_union_map *extension;\n  isl_union_set *extension_range;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n    return isl_bool_error;\n\n  node_e = isl_schedule_node_copy(node);\n  node_f = isl_schedule_node_copy(node);\n\n  while (node_e && isl_schedule_node_has_parent(node_e))\n  {\n    if (isl_schedule_node_get_type(node_e) == isl_schedule_node_extension)\n      break;\n    node_e = isl_schedule_node_parent(node_e);\n  }\n\n  if (node_e == NULL || isl_schedule_node_get_type(node_e) != isl_schedule_node_extension)\n  {\n    isl_schedule_node_free(node_e);\n    isl_schedule_node_free(node_f);\n    return isl_bool_false;\n  }\n\n  extension = isl_schedule_node_extension_get_extension(node_e);\n\n  while (node_f && isl_schedule_node_has_parent(node_f))\n  {\n    if (isl_schedule_node_get_type(node_f) == isl_schedule_node_filter)\n      break;\n    node_f = isl_schedule_node_parent(node_f);\n  }\n\n  filter = isl_schedule_node_filter_get_filter(node_f);\n  extension_range = isl_union_map_range(extension);\n  filter = isl_union_set_intersect(filter, extension_range);\n  isl_schedule_node_free(node_e);\n  isl_schedule_node_free(node_f);\n  if (isl_union_set_is_empty(filter))\n  {\n    isl_union_set_free(filter);\n    return isl_bool_false;\n  }\n\n  
isl_union_set_free(filter);\n  return isl_bool_true;\n}\n\n/* Insert data transfer statements beside the program statements. \n * If the statement is under the SIMD loop, the data transfer statements \n * are inserted before/after the SIMD loop. \n * Otherwise, they are inserted before/after the statement.\n */\n__isl_give isl_schedule_node *add_pe_ext_io_copies_stmt(\n    __isl_take isl_schedule_node *node, void *user)\n{\n  struct autosa_add_pe_ext_io_copies_data *data =\n      (struct autosa_add_pe_ext_io_copies_data *)(user);\n  isl_union_set *domain;\n  isl_space *space;\n  isl_space *acc_space;\n  isl_id *id;\n  isl_union_map *access;\n  int empty;\n  isl_multi_aff *from_access;\n  isl_ctx *ctx;\n  isl_schedule_node *graft;\n  isl_multi_aff *ma;\n  isl_multi_pw_aff *mpa;\n  isl_multi_union_pw_aff *mupa;\n  struct autosa_array_ref_group *pe_group = data->pe_group;\n  struct autosa_array_ref_group *io_group = data->io_group;\n  struct autosa_array_tile *tile;\n  int read = data->read;\n  isl_union_map *sched;\n  isl_union_map *ref_access;\n  isl_map *acc;\n  isl_bool ok;\n  int is_simd;\n  isl_printer *p_str;\n  char *stmt_name;\n  isl_union_set *empty_filter;\n  int n_lane = io_group->n_lane;\n\n  /* Test if the current stmt contains the reference. */\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n    return node;\n\n  /* Test if the node is under any extension node and if the \n   * node is extended by the extension node. 
\n   */\n  if (!leaf_node_is_extended(node))\n  {\n    isl_set *set;\n    isl_id *new_id;\n    domain = isl_schedule_node_get_domain(node);\n    set = isl_set_from_union_set(domain);\n    space = isl_set_get_space(set);\n    isl_set_free(set);\n    id = isl_space_get_tuple_id(space, isl_dim_set);\n    isl_space_free(space);\n    acc_space = isl_map_get_space(data->ref->access);\n    new_id = isl_space_get_tuple_id(acc_space, isl_dim_in);\n    if (id != new_id)\n    {\n      isl_space_free(acc_space);\n      isl_id_free(id);\n      isl_id_free(new_id);\n\n      /* Insert empty filter for dummy module. */\n      if (data->dummy)\n      {\n        empty_filter = isl_union_set_from_set(\n            isl_set_empty(isl_set_get_space(data->kernel->context)));\n        node = isl_schedule_node_insert_filter(node, empty_filter);\n      }\n      return node;\n    }\n    isl_id_free(id);\n    isl_id_free(new_id);\n    isl_space_free(acc_space);\n  }\n  else\n  {\n    /* Simply return for the extension nodes. */\n    return node;\n  }\n\n  ctx = isl_schedule_node_get_ctx(node);\n  tile = NULL;\n  /* Examine if there is any SIMD mark above. */\n  is_simd = is_node_under_simd(node);\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, ctx);\n//#endif\n\n  /* Aggregate the copy-in/out access\n   * S -> [D -> A]\n   * S: statement domain elements\n   * D: prefix schedule dimensions\n   * A: access\n   */\n  if (is_simd)\n  {\n    /* We will insert the statements before/after the SIMD loop. 
*/\n    if (data->dummy)\n    {\n      isl_union_set *empty_filter;\n      empty_filter = isl_union_set_from_set(isl_set_empty(\n          isl_set_get_space(data->kernel->context)));\n      node = isl_schedule_node_insert_filter(node, empty_filter);\n    }\n    node = autosa_tree_move_up_to_mark(node, \"simd\");\n  }\n  access = io_comm_access_ref(data->kernel, node, io_group, data->ref, read);\n  empty = isl_union_map_is_empty(access);\n  if (empty < 0 || empty)\n  {\n    isl_union_map_free(access);\n    if (empty < 0)\n      return isl_schedule_node_free(node);\n    return autosa_tree_move_up_to_kernel(node);\n  }\n\n  if (data->dummy)\n  {\n    data->filter = isl_schedule_node_get_domain(node);\n  }\n\n  //pe_group->array->global = 1;\n  //pe_group->local_array->global = 1;\n\n  /* read.fifoX[D -> A] -> [D -> A] */\n  p_str = isl_printer_to_str(ctx);\n  if (data->dummy)\n    p_str = print_io_stmt_prefix(p_str, data->in, data->dummy, data->reduce, io_group);  \n  else\n    p_str = print_io_stmt_prefix(p_str, read, data->dummy, 0, io_group);\n  \n  stmt_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  from_access = autosa_create_io_access_stmt(ctx, pe_group, io_group,\n                                             autosa_array_ref_group_tile(pe_group),\n                                             isl_schedule_node_get_schedule_depth(node), stmt_name);\n  free(stmt_name);\n\n  /* Create a register tiling. 
*/\n  tile = create_register_tiling(node, pe_group, data->ref);\n  /* [D -> A] -> T */\n  ma = isl_multi_aff_copy(tile->tiling);\n  ma = isl_multi_aff_pullback_multi_aff(ma,\n                                        isl_multi_aff_copy(from_access));\n  mpa = isl_multi_pw_aff_from_multi_aff(ma);\n  /* read.fifoX[D -> A] -> T */\n  mupa = isl_multi_union_pw_aff_from_multi_pw_aff(mpa);\n  /* [D -> A] */\n  domain = isl_union_map_range(access);\n  /* read.fifoX[D -> A] */\n  domain = isl_union_set_preimage_multi_aff(domain, from_access);\n  /* read.fifoX[D -> A] -> D */\n  access = isl_union_set_wrapped_domain_map(domain);\n  /* D -> read.fifoX[D -> A] */\n  access = isl_union_map_reverse(access);\n  access = isl_union_map_coalesce(access);\n\n//#ifdef _DEBUG\n//  DBGUMAP(stdout, access, ctx);\n//#endif\n\n  graft = isl_schedule_node_from_extension(access);\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, graft, ctx);\n//  DBGMUPA(stdout, mupa, ctx);\n//#endif  \n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n  /* Modify the n_lane for the sparse data */\n  if (io_group->local_array->is_sparse) {\n    n_lane *= (io_group->local_array->compress_ratio * io_group->local_array->n_nzero);\n  }\n\n  if (n_lane > 1)\n  {\n    /* Perform data packing. */\n    int n_index;\n    int tile_size[1];\n    isl_id *id;\n    isl_union_map *umap;\n    isl_union_set *filter;\n\n    n_index = isl_schedule_node_band_n_member(graft);\n    /* Split off the last dimension. */\n    if (n_index > 1)\n    {\n      graft = isl_schedule_node_band_split(graft, n_index - 1);\n      graft = isl_schedule_node_child(graft, 0);\n    }\n    /* Tile the last dimension. */\n    tile_size[0] = n_lane;\n    graft = autosa_tile_band(graft, tile_size);\n    graft = isl_schedule_node_child(graft, 0);\n    /* Create a filter. 
*/\n    filter = schedule_eq_lb(graft);\n    graft = isl_schedule_node_insert_filter(graft, filter);\n  }\n\n  while (graft && isl_schedule_node_has_parent(graft))\n    graft = isl_schedule_node_parent(graft);\n\n  if (read) {\n    node = isl_schedule_node_graft_before(node, graft);\n  } else {\n    node = isl_schedule_node_graft_after(node, graft);\n  }\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n\n  if (data->dummy) {\n    /* insert an empty filter. */\n    empty_filter = isl_union_set_from_set(isl_set_empty(\n        isl_set_get_space(data->kernel->context)));\n    node = isl_schedule_node_insert_filter(node, empty_filter);\n  }\n\n  node = isl_schedule_node_parent(node); // filter\n  node = isl_schedule_node_parent(node); // sequence\n  node = isl_schedule_node_parent(node); // extension\n\n  autosa_array_tile_free(tile);\n\n  return node;\n}\n\n/* The \"node\" points to the \"PE\" mark.\n * Add data transfer statements for each array access in the group.\n */\nstatic __isl_give isl_schedule_node *add_pe_ext_io_copies(\n    struct autosa_kernel *kernel,\n    struct autosa_local_array_info *local_array,\n    struct autosa_array_ref_group *io_group,\n    __isl_take isl_schedule_node *node, int read)\n{\n  for (int i = 0; i < io_group->n_ref; i++)\n  {\n    struct autosa_stmt_access *ref = io_group->refs[i];\n\n    /* External arrays: every reference. Internal arrays: only the references\n     * matching the transfer direction (reads for copy-in, writes for copy-out). */\n    if ((io_group->local_array->array_type == AUTOSA_EXT_ARRAY) ||\n       ((io_group->local_array->array_type == AUTOSA_INT_ARRAY) && \n        ((read && ref->read) || (!read && ref->write))))\n    {\n      struct autosa_array_ref_group *pe_group =\n          autosa_find_pe_group(local_array, io_group, ref);\n      struct autosa_add_pe_ext_io_copies_data data =\n          {kernel, pe_group, io_group, ref, read, read, 0, 0, NULL};\n      node = isl_schedule_node_map_descendant_bottom_up(node,\n                                                        &add_pe_ext_io_copies_stmt, &data);\n    }\n  }\n\n  
return node;\n}\n\n/* Add the statements for copying data in/out for array references associated with\n * interior I/O.\n * The \"node\" points to the \"PE\" mark.\n */\n__isl_give isl_schedule_node *add_pe_int_io_copies(\n    struct autosa_kernel *kernel,\n    struct autosa_local_array_info *local_array,\n    struct autosa_array_ref_group *io_group,\n    __isl_take isl_schedule_node *node, int read)\n{\n  struct autosa_array_tile *tile;\n  isl_union_map *access;\n  isl_schedule_node *graft;\n  int empty;\n  isl_multi_aff *from_access;\n  isl_multi_aff *ma;\n  isl_multi_pw_aff *mpa;\n  isl_multi_union_pw_aff *mupa;\n  isl_union_set *domain;\n  struct autosa_array_ref_group *pe_group;\n  int n_lane = io_group->n_lane;\n  isl_printer *p_str;\n  char *stmt_name;\n  isl_id *id;\n\n  node = isl_schedule_node_child(node, 0);\n  /* For array references with interior I/O, \n   * search for the corresponding PE group. */\n  pe_group = autosa_find_pe_group(local_array, io_group, NULL);\n  tile = autosa_array_ref_group_tile(pe_group);\n\n  /* Aggregate the copy-in/out access \n   * S -> [D -> A] \n   * S: statement domain elements\n   * D: prefix schedule dimensions \n   * A: access */\n  access = io_comm_access(kernel, node, io_group, read);\n  empty = isl_union_map_is_empty(access);\n  if (empty < 0 || empty)\n  {\n    isl_union_map_free(access);\n    if (empty < 0)\n      return isl_schedule_node_free(node);\n    return autosa_tree_move_up_to_pe(node);\n  }\n\n  //pe_group->array->global = 1;\n  //pe_group->local_array->global = 1;\n\n  /* read.fifoX[D -> A] -> [D -> A] */\n  /* Generate statement name. 
*/\n  p_str = isl_printer_to_str(kernel->ctx);\n  p_str = print_io_stmt_prefix(p_str, read, 0, 0, io_group);  \n  stmt_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  from_access = autosa_create_io_access_stmt(kernel->ctx, pe_group, io_group,\n                                             autosa_array_ref_group_tile(pe_group),\n                                             isl_schedule_node_get_schedule_depth(node), stmt_name);\n  free(stmt_name);\n\n  /* [D -> A] -> T */\n  ma = isl_multi_aff_copy(tile->tiling);\n  ma = isl_multi_aff_pullback_multi_aff(ma,\n                                        isl_multi_aff_copy(from_access));\n  mpa = isl_multi_pw_aff_from_multi_aff(ma);\n  /* read.fifoX[D -> A] -> T */\n  mupa = isl_multi_union_pw_aff_from_multi_pw_aff(mpa);\n  /* [D -> A] */\n  domain = isl_union_map_range(access);\n  /* If the array is not a scalar, then we copy in/out the entire\n   * tile to/from the local memory. \n   */\n  if (read && !autosa_array_is_scalar(io_group->array))\n  {\n    isl_map *map;\n    isl_set *set;\n    set = isl_map_domain(isl_map_from_union_map(isl_union_set_unwrap(domain)));\n    map = group_tile_buffer(io_group, io_group->pe_tile);\n    map = isl_map_intersect_domain(map, set);\n    domain = isl_union_set_from_set(isl_map_wrap(map));\n  }\n\n  /* read.fifoX[D -> A] */\n  domain = isl_union_set_preimage_multi_aff(domain, from_access);\n  access = isl_union_set_wrapped_domain_map(domain);\n  access = isl_union_map_reverse(access);\n  access = isl_union_map_coalesce(access);\n\n  graft = isl_schedule_node_from_extension(access);\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n  if (n_lane > 1)\n  {\n    /* Perform data packing. */\n    int n_index;\n    int tile_size[1];\n    isl_id *id;\n    isl_union_map *umap;\n    isl_union_set *filter;\n\n    n_index = isl_schedule_node_band_n_member(graft);\n    /* Split off the last dimension. 
*/\n    if (n_index > 1)\n    {\n      graft = isl_schedule_node_band_split(graft, n_index - 1);\n      graft = isl_schedule_node_child(graft, 0);\n    }\n    /* Tile the last dimension. */\n    tile_size[0] = n_lane;\n    graft = autosa_tile_band(graft, tile_size);\n    graft = isl_schedule_node_child(graft, 0);\n    /* Create a filter. */\n    filter = schedule_eq_lb(graft);\n    graft = isl_schedule_node_insert_filter(graft, filter);\n    /* Move to the tile loop. */\n    graft = isl_schedule_node_parent(graft);\n  }\n\n  /* Insert a \"pipeline\" mark inside the band node. */\n  id = isl_id_alloc(kernel->ctx, \"hls_pipeline\", NULL);\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_mark(graft, id);\n  graft = isl_schedule_node_parent(graft);\n\n  while (graft && isl_schedule_node_has_parent(graft))\n    graft = isl_schedule_node_parent(graft);\n\n  if (read)\n  {\n    node = isl_schedule_node_graft_before(node, graft);\n  }\n  else\n  {\n    node = isl_schedule_node_graft_after(node, graft);\n  }\n\n  node = autosa_tree_move_up_to_pe(node);\n\n  return node;\n}\n\nstatic isl_bool find_latency_mark(__isl_keep isl_schedule_node *node, void *user)\n{\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_mark)\n  {\n    isl_id *id;\n\n    id = isl_schedule_node_mark_get_id(node);\n    if (!strcmp(isl_id_get_name(id), \"latency\"))\n    {\n      isl_id_free(id);\n      return isl_bool_false;\n    }\n    isl_id_free(id);\n  }\n\n  return isl_bool_true;\n}\n\n/* Insert a \"hls_pipeline\" mark after the innermost \"latency\" mark.\n * The loop will be eventually pipelined.\n * The \"hls_pipeline\" mark is placed under the band node.\n */\nstatic __isl_give isl_schedule_node *insert_pipeline_mark(\n    __isl_take isl_schedule_node *node, void *user)\n{\n  struct autosa_kernel *kernel = (struct autosa_kernel *)user;\n  isl_ctx *ctx = kernel->ctx;\n\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_mark)\n  {\n    isl_id 
*id;\n\n    id = isl_schedule_node_mark_get_id(node);\n    if (!strcmp(isl_id_get_name(id), \"latency\"))\n    {\n      /* Check whether there is any latency mark below the current mark. */\n      isl_bool no_inner_latency;\n      node = isl_schedule_node_child(node, 0);\n      no_inner_latency = isl_schedule_node_every_descendant(node,\n                                                            &find_latency_mark, NULL);\n      node = isl_schedule_node_parent(node);\n      if (no_inner_latency)\n      {\n        /* Insert the \"hls_pipeline\" mark below the band node. */\n        isl_id *hls_id;\n        hls_id = isl_id_alloc(ctx, \"hls_pipeline\", NULL);\n        node = isl_schedule_node_child(node, 0);\n        node = isl_schedule_node_child(node, 0);\n        node = isl_schedule_node_insert_mark(node, hls_id);\n\n        node = isl_schedule_node_parent(node);\n        node = isl_schedule_node_parent(node);\n      }\n    }\n    isl_id_free(id);\n  }\n\n  return node;\n}\n\n/* Tile the SIMD loop to account for sparsity. */\nstatic __isl_give isl_schedule_node *tile_simd_sparse(\n  __isl_take isl_schedule_node *node, void *user)\n{\n  struct autosa_kernel *kernel = (struct autosa_kernel *)user;\n  isl_ctx *ctx = kernel->ctx;\n\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_mark) {\n    isl_id *id;\n    int is_simd;\n\n    id = isl_schedule_node_mark_get_id(node);\n    /* Read the mark name before releasing the id. */\n    is_simd = !strcmp(isl_id_get_name(id), \"simd\");\n    isl_id_free(id);\n    if (is_simd) {\n      isl_union_map *umap;\n      isl_union_set *uset, *filter;\n      isl_set *set;\n      int new_ub = kernel->simd_w / kernel->compress_ratio;\n\n      umap = isl_schedule_node_get_subtree_schedule_union_map(node);\n      uset = isl_union_map_range(isl_union_map_copy(umap));\n      set = isl_set_from_union_set(uset);\n//#ifdef _DEBUG\n//      DBGSET(stdout, set, ctx);\n//      //exit(0);\n//#endif\n      set = isl_set_upper_bound_si(set, isl_dim_set, 0, new_ub - 1);\n      filter = isl_union_map_range(isl_union_map_intersect_domain(\n         
         isl_union_map_reverse(umap), isl_union_set_from_set(set)));                  \n//#ifdef _DEBUG\n//      DBGSET(stdout, set, ctx);\n//      exit(0);\n//#endif\n      while (isl_schedule_node_get_type(node) != isl_schedule_node_band) {\n        node = isl_schedule_node_child(node, 0);\n      }\n      node = isl_schedule_node_insert_filter(node, filter);\n      //node = isl_schedule_node_child(node, 0);           \n      while (isl_schedule_node_has_parent(node)) {\n        if (isl_schedule_node_get_type(node) == isl_schedule_node_mark) {\n          isl_id *id;\n          id = isl_schedule_node_mark_get_id(node);\n          if (!strcmp(isl_id_get_name(id), \"simd\")) {\n            isl_id_free(id);\n            break;\n          }\n          isl_id_free(id);\n        }\n        node = isl_schedule_node_parent(node);\n      }\n    }    \n  }\n\n  return node;\n}\n\n/* Insert a \"hls_unroll\" mark after the \"simd\" mark.\n * The loop will be eventually unrolled.\n * The \"hls_unroll\" mark is placed under the band node.\n */\nstatic __isl_give isl_schedule_node *insert_unroll_mark(\n  __isl_take isl_schedule_node *node, void *user)\n{\n  struct autosa_kernel *kernel = (struct autosa_kernel *)user;\n  isl_ctx *ctx = kernel->ctx;\n\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_mark)\n  {\n    isl_id *id;\n\n    id = isl_schedule_node_mark_get_id(node);\n    if (!strcmp(isl_id_get_name(id), \"simd\"))\n    {\n      isl_id *hls_id;\n      hls_id = isl_id_alloc(ctx, \"hls_unroll\", NULL);\n      \n      if (kernel->options->target == AUTOSA_TARGET_CATAPULT_HLS_C) {\n        /* The hls_unroll will be inserted above the loop. 
*/\n        node = isl_schedule_node_child(node, 0);        \n        node = isl_schedule_node_insert_mark(node, hls_id);        \n        node = isl_schedule_node_parent(node);\n      } else {\n        node = isl_schedule_node_child(node, 0);\n        node = isl_schedule_node_child(node, 0);\n        node = isl_schedule_node_insert_mark(node, hls_id);\n        node = isl_schedule_node_parent(node);\n        node = isl_schedule_node_parent(node);\n      }      \n    }\n    isl_id_free(id);\n  }\n\n  return node;\n}\n\n/* Insert a context node at \"node\" introducing the PE identifiers \n * along with their bounds, which are stored in kernel->sa_grid_size.\n */\nstatic __isl_give isl_schedule_node *insert_context(struct autosa_kernel *kernel,\n                                                    __isl_take isl_schedule_node *node)\n{\n  isl_set *context;\n\n  context = isl_set_universe(isl_set_get_space(kernel->context));\n  context = add_bounded_parameters_dynamic(context,\n                                           kernel->sa_grid_size, kernel->pe_ids);\n  node = isl_schedule_node_insert_context(node, context);\n\n  return node;\n}\n\n/* Create the local buffer variables inside the PE.\n * In addition, we scan through all the I/O groups of the array and set the\n * array partitioning factor of the local buffer to the lcm of their data\n * packing factors, so that every I/O group can access the packed elements\n * without bank conflicts.\n */\nstatic void create_pe_module_var(isl_ctx *ctx,\n                                 struct autosa_kernel *kernel,\n                                 struct autosa_array_ref_group *group,\n                                 struct autosa_kernel_var *var, struct autosa_local_array_info *local,\n                                 const char *suffix, int sparse_modify_size)\n{\n  struct autosa_array_tile *tile;\n  isl_printer *p;\n  isl_val *lcm = isl_val_int_from_si(ctx, 1);\n\n  var->array = 
group->array;\n  var->type = autosa_array_ref_group_type(group);\n  var->n_lane = 1;\n  /* Scan all the I/O groups, and compute the lcm of the group SIMD factors,\n   * set it as the partition factor of the variable. */\n  for (int i = 0; i < local->n_io_group; i++)\n  {\n    struct autosa_array_ref_group *io_group = local->io_groups[i];\n    isl_val *val = isl_val_int_from_si(ctx, io_group->n_lane);\n    isl_val *product = isl_val_mul(isl_val_copy(val), isl_val_copy(lcm));\n    isl_val *gcd = isl_val_gcd(val, lcm);\n    lcm = isl_val_div(product, gcd);\n  }  \n  var->n_part = isl_val_get_num_si(lcm);  \n  isl_val_free(lcm);\n\n  tile = autosa_array_ref_group_tile(group);\n\n  p = isl_printer_to_str(ctx);\n  p = autosa_array_ref_group_print_name(group, p);\n  if (suffix) {\n    p = isl_printer_print_str(p, suffix);\n  }\n  var->name = isl_printer_get_str(p);\n  isl_printer_free(p);\n\n  if (tile == NULL)\n  {\n    var->size = isl_vec_alloc(ctx, 1);\n    var->size = isl_vec_set_element_si(var->size, 0, 1);\n  }\n  else\n  {\n    var->size = isl_vec_alloc(ctx, group->array->n_index);\n    for (int i = 0; i < group->array->n_index; ++i)\n    {\n      isl_val *size;\n\n      size = isl_val_copy(tile->bound[i].size);\n      \n      if (i == group->array->n_index - 1) {\n        if (group->local_array->is_sparse || sparse_modify_size) {\n          size = isl_val_mul_ui(size, kernel->n_nzero);\n          size = isl_val_div_ui(size, kernel->vec_len);\n        }\n      }\n      var->size = isl_vec_set_element_val(var->size, i, size);\n    }\n  }\n}\n\n/* Create the local buffer variables inside the PE module. 
*/\nstatic isl_stat create_pe_module_vars(struct autosa_hw_module *module,\n                                      struct autosa_kernel *kernel)\n{\n  int n = 0;\n  for (int i = 0; i < kernel->n_array; ++i)\n  {\n    struct autosa_local_array_info *array = &kernel->array[i];\n\n    for (int j = 0; j < array->n_pe_group; j++)\n    {\n      struct autosa_array_ref_group *group = array->pe_groups[j];\n      enum autosa_group_access_type type;\n\n      type = autosa_array_ref_group_type(group);\n      if (type != AUTOSA_ACCESS_GLOBAL)\n        n++;      \n    }\n  }\n\n  module->var = isl_calloc_array(kernel->ctx, struct autosa_kernel_var, n);\n  if (!module->var)\n    return isl_stat_error;\n  module->n_var = n;\n\n  n = 0;\n  for (int i = 0; i < kernel->n_array; ++i)\n  {\n    struct autosa_local_array_info *array = &kernel->array[i];\n\n    for (int j = 0; j < array->n_pe_group; j++)\n    {\n      struct autosa_array_ref_group *group = array->pe_groups[j];\n      enum autosa_group_access_type type;\n\n      type = autosa_array_ref_group_type(group);\n      if (type == AUTOSA_ACCESS_GLOBAL)\n        continue;\n      if (kernel->sparse && array->array_type == AUTOSA_EXT_ARRAY && array->is_sparse == 0) {        \n        create_pe_module_var(kernel->ctx, kernel, group, &module->var[n], array, NULL, 1);\n        n++;\n      } else {\n        create_pe_module_var(kernel->ctx, kernel, group, &module->var[n], array, NULL, 0);\n        n++;\n      }      \n    }\n  }\n\n  return isl_stat_ok;\n}\n\n/* The \"node\" is pointed to the \"PE\" mark.\n */\nstatic __isl_give isl_schedule_node *add_pe_ext_io_copies_dummy(\n    struct autosa_kernel *kernel,\n    struct autosa_local_array_info *local_array,\n    struct autosa_array_ref_group *io_group,\n    __isl_take isl_schedule_node *node, int read, int in, int reduce)\n{\n  isl_union_set *filter = isl_union_set_from_set(isl_set_empty(\n      isl_set_get_space(kernel->context)));\n  for (int i = 0; i < io_group->n_ref; i++)\n  {\n   
 struct autosa_stmt_access *ref = io_group->refs[i];\n\n    if ((io_group->local_array->array_type == AUTOSA_EXT_ARRAY) ||\n        ((io_group->local_array->array_type == AUTOSA_INT_ARRAY) &&\n         ((read && ref->read) || (!read && ref->write))))\n    {\n      struct autosa_array_ref_group *pe_group = autosa_find_pe_group(\n          local_array, io_group, ref);\n      struct autosa_add_pe_ext_io_copies_data data =\n          {kernel, pe_group, io_group, ref, 1, in, 1, reduce, NULL};\n      node = isl_schedule_node_map_descendant_bottom_up(node,\n                                                        &add_pe_ext_io_copies_stmt, &data);\n      filter = isl_union_set_union(filter, data.filter);\n    }\n  }\n\n  filter = isl_union_set_coalesce(filter);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_filter(node, filter);\n  node = isl_schedule_node_parent(node);\n  return node;\n}\n\n/* Create the schedule for the PE dummy module that collects/sends the dummy data.\n * If \"in\" is 1, the generated dummy module collects the dummy data;\n * otherwise, it sends the dummy data.\n */\nstatic __isl_give isl_schedule *pe_module_dummy_gen(struct autosa_gen *gen,\n                                                    struct autosa_hw_module *module, \n                                                    struct autosa_array_ref_group *group,\n                                                    int in)\n{\n  isl_schedule *schedule;\n  isl_schedule_node *node;\n  isl_id *id, *hw_id;\n  struct autosa_kernel *kernel;\n\n  schedule = gen->schedule;\n  schedule = isl_schedule_dup(schedule);\n  node = isl_schedule_get_root(schedule);\n  isl_schedule_free(schedule);\n  node = autosa_tree_move_down_to_kernel(node);\n\n  id = isl_schedule_node_mark_get_id(node);\n  kernel = (struct autosa_kernel *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_child(node, 0);\n  
node = split_band(node, kernel->n_sa_dim);\n  node = autosa_tree_move_down_to_pe(node, kernel->core);\n  node = add_pe_ext_io_copies_dummy(\n            kernel, group->local_array, group, node, 1, in, \n            gen->options->autosa->local_reduce && group->attached_drain_group);\n\n  if (gen->options->target != AUTOSA_TARGET_CATAPULT_HLS_C) {\n    /* Insert \"pipeline\" mark under the last \"latency\" mark. */\n    node = isl_schedule_node_map_descendant_bottom_up(node,\n                                                      &insert_pipeline_mark, kernel);\n  }                                                    \n\n  ///* Insert \"unroll\" mark under the last \"simd\" mark. */\n  //node = isl_schedule_node_map_descendant_bottom_up(node,\n  //                                                  &insert_unroll_mark, kernel);\n  \n\n  /* Add module mark after the kernel mark. */\n  hw_id = isl_id_alloc(gen->ctx, \"module\", module);\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, hw_id);\n\n  /* Add the PE id filter. 
*/\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = insert_context(kernel, node);\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_filter(node,\n                                         isl_union_set_copy(kernel->pe_filter));\n\n  schedule = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n\n  return schedule;\n}\n\n/* Modify the input \"schedule\" to describe the PE module.\n * Set the schedule dimensions of space loops as parameters.\n *\n * For interior I/O groups\n * - add copy-in before PE computation (RAW, RAR)\n * - add copy-out after PE computation (RAW)\n *   - domain: S -> type[D -> access]\n *   - schedule: type[D -> access] -> tiling\n * For exterior I/O groups\n *   for each access in the group\n *   - add copy-in before user statement (RAW, RAR)\n *   - add copy-out after user statement (RAW, RAR)\n *     - domain: S -> type[D -> access]\n *     - schedule: type[D -> access] -> tiling \n *       (if any; otherwise, create a register tiling)\n * For WAW group \n * - for each access in the group\n *   - add write-out after user statement (WAW)\n *     - domain: S -> type[D -> access]\n *     - schedule: type[D -> access] -> tiling\n */\nstatic __isl_give struct autosa_hw_module *sa_pe_module_gen(struct autosa_gen *gen)\n{\n  isl_schedule_node *node;\n  isl_id *id;\n  struct autosa_kernel *kernel;\n  isl_schedule *schedule, *new_schedule;\n  int single_statement;\n  isl_union_set *domain;\n  struct autosa_hw_module *module;\n  isl_id *hw_id;\n\n  module = autosa_hw_module_alloc(gen);\n\n  /* Add the filters for PEs. 
*/\n  schedule = gen->schedule;\n  schedule = isl_schedule_dup(schedule);\n  node = isl_schedule_get_root(schedule);\n  node = autosa_tree_move_down_to_kernel(node);\n\n  id = isl_schedule_node_mark_get_id(node);\n  kernel = (struct autosa_kernel *)isl_id_get_user(id);\n  isl_id_free(id);\n  single_statement = kernel->single_statement;\n  domain = isl_schedule_node_get_domain(node);\n\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_child(node, 0);\n  node = split_band(node, kernel->n_sa_dim);\n  kernel->pe_ids = ppcg_scop_generate_names(gen->prog->scop,\n                                            kernel->n_sa_dim, \"p\");\n  kernel->pe_filter = set_schedule_modulo(node, kernel->pe_ids,\n                                          kernel->sa_dim);\n  kernel->sa_grid_size = extract_sa_grid_size(kernel, domain);\n\n  /* Add the statements for I/O groups with exterior I/O at the user \n   * statement level. \n   * Add the statements for I/O group with interior I/O at the PE level.\n   */\n  node = autosa_tree_move_down_to_pe(node, kernel->core);\n  /* Add copy-in/copy-out statements */\n  for (int i = 0; i < kernel->n_array; ++i)\n  {\n    struct autosa_local_array_info *array = &kernel->array[i];\n    for (int j = 0; j < array->n_io_group; j++)\n    {\n      struct autosa_array_ref_group *group = array->io_groups[j];      \n      if (group->local_array->array_type == AUTOSA_EXT_ARRAY)\n      {\n        if (group->pe_io_dir == IO_IN || group->pe_io_dir == IO_INOUT)\n          node = add_pe_ext_io_copies(kernel, array, group, node, 1);\n        if (group->pe_io_dir == IO_OUT || group->pe_io_dir == IO_INOUT)\n          node = add_pe_ext_io_copies(kernel, array, group, node, 0);        \n      }\n      else if (group->local_array->array_type == AUTOSA_INT_ARRAY)\n      {\n        if (group->io_type == AUTOSA_INT_IO)\n        {\n          if (group->pe_io_dir == IO_IN || group->pe_io_dir == IO_INOUT)\n            node = 
add_pe_int_io_copies(kernel, array, group, node, 1);\n          if (group->pe_io_dir == IO_OUT || group->pe_io_dir == IO_INOUT)\n            node = add_pe_int_io_copies(kernel, array, group, node, 0);          \n        }\n        else\n        {\n          if (group->pe_io_dir == IO_IN || group->pe_io_dir == IO_INOUT)\n            node = add_pe_ext_io_copies(kernel, array, group, node, 1);\n          if (group->pe_io_dir == IO_OUT || group->pe_io_dir == IO_INOUT)\n            node = add_pe_ext_io_copies(kernel, array, group, node, 0);          \n        }\n      }\n      module->n_io_group++;\n      module->io_groups = (struct autosa_array_ref_group **)realloc(\n          module->io_groups,\n          module->n_io_group * sizeof(struct autosa_array_ref_group *));\n      module->io_groups[module->n_io_group - 1] = group;\n    }\n    if (array->drain_group && array->drain_group->array_io_dir != IO_NULL)\n    {\n      node = add_pe_ext_io_copies(kernel, array, array->drain_group, node, 0);\n\n      module->n_io_group++;\n      module->io_groups = (struct autosa_array_ref_group **)realloc(\n          module->io_groups,\n          module->n_io_group * sizeof(struct autosa_array_ref_group *));\n      module->io_groups[module->n_io_group - 1] = array->drain_group;\n    }\n  }\n\n  if (gen->options->target != AUTOSA_TARGET_CATAPULT_HLS_C) {\n    /* Insert \"pipeline\" mark under the last \"latency\" mark. 
*/\n    node = isl_schedule_node_map_descendant_bottom_up(node,\n                                                      &insert_pipeline_mark, kernel);\n  }\n\n  //DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n\n  /* Insert \"unroll\" mark under the last \"simd\" mark */\n  node = isl_schedule_node_map_descendant_bottom_up(node,\n                                                    &insert_unroll_mark, kernel);\n\n  /* Tile the SIMD loop for sparsity */\n  if (kernel->sparse) {\n    node = isl_schedule_node_map_descendant_bottom_up(node,\n                                                      &tile_simd_sparse, kernel);\n  }\n\n  /* Add module mark after the kernel mark. */\n  hw_id = isl_id_alloc(gen->ctx, \"module\", module);\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, hw_id);\n\n  if (gen->options->autosa->tuning_method == 1) {\n    /* Generate another schedule for latency estimation. */    \n    isl_schedule *tuning_sched = isl_schedule_node_get_schedule(node);\n    module->tuning_num_sched = kernel->tuning_program->generate_tuning_schedule(tuning_sched);\n  }\n\n  /* Add the PE id filter. */\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = insert_context(kernel, node);\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_filter(node,\n                                         isl_union_set_copy(kernel->pe_filter));\n\n  //DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n\n  if (gen->options->autosa->tuning_method == 1) {\n    /* Generate another schedule for latency estimation. 
*/    \n    isl_schedule *tuning_sched = isl_schedule_node_get_schedule(node);\n    module->tuning_sched = kernel->tuning_program->generate_tuning_schedule(tuning_sched);    \n  }\n\n  isl_schedule_free(schedule);\n  new_schedule = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n\n  module->sched = new_schedule;\n  module->type = PE_MODULE;\n  module->name = strdup(\"PE\");\n  module->inst_ids = isl_id_list_copy(kernel->pe_ids);\n  create_pe_module_vars(module, kernel);\n  module->kernel = kernel;\n\n  /* For I/O groups with exterior I/O, we create input and output ports for each\n   * PE. However, for the first/last PE along the data transfer direction,\n   * the input/output port consumes/produces dummy data.\n   * We add dummy modules to handle this dummy data.\n   * \n   * In addition, when local reduce is enabled, the boundary PEs should only take\n   * in init values (i.e., 0); we also add dummy modules for this case.\n   */\n  module->n_pe_dummy_modules = 0;\n  module->pe_dummy_modules = NULL;\n  for (int i = 0; i < kernel->n_array; ++i)\n  {\n    struct autosa_local_array_info *array = &kernel->array[i];\n    //if (array->array_type == AUTOSA_INT_ARRAY)\n    //  continue;\n    for (int j = 0; j < array->n_io_group; j++)\n    {\n      struct autosa_array_ref_group *group = array->io_groups[j];\n      if (group->io_type == AUTOSA_INT_IO)\n        continue;\n      if (group->pe_io_dir != IO_INOUT)\n        continue;\n      if (group->copy_in == 0 && group->copy_out == 0)\n        continue;\n\n      /* Generate the dummy module. */\n      isl_schedule *sched;\n      int in = array->array_type == AUTOSA_INT_ARRAY? 
0 : 1;\n\n      sched = pe_module_dummy_gen(gen, module, group, in);\n      module->n_pe_dummy_modules++;\n      module->pe_dummy_modules =\n          (struct autosa_pe_dummy_module **)realloc(module->pe_dummy_modules,\n                                                    module->n_pe_dummy_modules * sizeof(struct autosa_pe_dummy_module *));\n      struct autosa_pe_dummy_module *dummy_module = autosa_pe_dummy_module_alloc();\n      dummy_module->module = module;\n      dummy_module->io_group = group;\n      dummy_module->sched = sched;\n      dummy_module->in = in;\n      module->pe_dummy_modules[module->n_pe_dummy_modules - 1] = dummy_module;\n    }\n  }\n\n  return module;\n}\n\n/* The input modules are ordered as follows:\n * PE module\n * I/O module (copy-in and copy-out)\n * Drain module\n * We reorder the modules into the following sequence:\n * I/O module (copy-in) \n * PE module \n * I/O module (copy-out)\n * Drain module\n * The modules are reordered so that C simulation (CSim) can proceed in the Xilinx environment.\n */\nstatic __isl_give struct autosa_hw_module **hw_module_reorder(\n    __isl_take struct autosa_hw_module **modules, int n_module)\n{\n  struct autosa_hw_module **modules_new = (struct autosa_hw_module **)\n      malloc(n_module * sizeof(struct autosa_hw_module *));\n  int pos = 0;\n\n  /* I/O module (copy-in) */\n  for (int i = 0; i < n_module; i++)\n  {\n    struct autosa_hw_module *module = modules[i];\n    if (module->type == IO_MODULE && module->in)\n    {\n      modules_new[pos] = module;\n      pos++;\n    }\n  }\n\n  /* PE module */\n  modules_new[pos] = modules[0];\n  pos++;\n\n  /* I/O module (copy-out) */\n  for (int i = 0; i < n_module; i++)\n  {\n    struct autosa_hw_module *module = modules[i];\n    if (module->type == IO_MODULE && !module->in)\n    {\n      modules_new[pos] = module;\n      pos++;\n    }\n  }\n\n  /* Drain module */\n  for (int i = 0; i < n_module; i++)\n  {\n    struct autosa_hw_module *module = modules[i];\n  
  if (module->type == DRAIN_MODULE)\n    {\n      modules_new[pos] = module;\n      pos++;\n    }\n  }\n\n  free(modules);\n  return modules_new;\n}\n\n/* Create the schedule that calls all the PE dummy modules.\n * We will work on the transformed IO schedule for the io group.\n * We delete the schedule nodes above the array mark and below the PE mark,\n * add a filter to only consider the last module in the transfer chain.\n * Then insert the module call extension nodes right under the space bands.\n */\nstatic __isl_give isl_schedule *pe_dummy_gen_module_call(struct autosa_gen *gen,\n                                                         struct autosa_pe_dummy_module *pe_dummy_module)\n{\n  struct autosa_array_ref_group *group;\n  isl_schedule *sched;\n  isl_schedule_node *node;\n  struct autosa_kernel *kernel;\n  struct autosa_hw_module *module;\n  int n_member;\n  isl_union_set *L1_filter;\n  isl_bool insert_L1 = isl_bool_false;\n  isl_printer *p_str;\n  isl_ctx *ctx;\n  char *stmt_name;\n  isl_id *id;\n  isl_union_map *prefix, *extension;\n  isl_union_set *domain, *range;\n\n  module = pe_dummy_module->module;\n  kernel = module->kernel;\n  ctx = gen->ctx;\n  group = pe_dummy_module->io_group;\n  sched = isl_schedule_dup(group->io_L1_schedule);\n  node = isl_schedule_get_root(sched);\n  isl_schedule_free(sched);\n  isl_space *space;\n  isl_union_set *empty_filter;\n  isl_schedule_node *graft;  \n  int lower_band_num = -1;\n\n  /* Delete the node above the array mark. 
*/\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_parent(node);\n  while (!(autosa_tree_node_is_kernel(node) || isl_schedule_node_get_type(node) == isl_schedule_node_context)) {\n    node = isl_schedule_node_delete(node);\n    node = isl_schedule_node_parent(node);\n  }\n\n//#ifdef _DEBUG\n//  if (!strcmp(group->array->name, \"U_tmp\") && pe_dummy_module->in == 0) {\n//    printf(\"here\\n\");\n//    printf(\"group id: %d\\n\", group->nr);\n//    DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//    isl_schedule *sched_tmp = isl_schedule_node_get_schedule(node);\n//    print_code(gen, isl_schedule_copy(sched_tmp), \"U_tmp_out.c\");\n//    isl_schedule_free(sched_tmp);\n//  }\n//#endif\n\n  /* Insert a filter. */\n  node = autosa_tree_move_down_to_mark(node, kernel->core, \"io_L1\");\n  node = isl_schedule_node_parent(node);\n  n_member = isl_schedule_node_band_n_member(node);\n  if (n_member > 1)\n  {\n    node = isl_schedule_node_band_split(node, n_member - 1);\n    node = isl_schedule_node_child(node, 0);\n  }\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n  {\n    if (pe_dummy_module->in)\n      L1_filter = schedule_eq_ub(node);\n    else\n      L1_filter = schedule_eq_lb(node);    \n    insert_L1 = isl_bool_true;\n  }\n\n//#ifdef _DEBUG\n//  if (!strcmp(group->array->name, \"U_tmp\") && pe_dummy_module->in == 0) {\n//    DBGUSET(stdout, L1_filter, gen->ctx);\n//  }\n//#endif\n\n//#ifdef _DEBUG\n//  if (!strcmp(group->array->name, \"U_tmp\") && !pe_dummy_module->in)\n//    DBGUSET(stdout, L1_filter, isl_schedule_node_get_ctx(node));\n//#endif\n\n  node = autosa_tree_move_down_to_mark(node, kernel->core, \"io_L1\");\n  node = isl_schedule_node_child(node, 0);\n  if (insert_L1)\n  {\n    node = isl_schedule_node_insert_filter(node, L1_filter);\n  }\n\n  /* Delete the node under the pe mark. 
*/\n  node = autosa_tree_move_down_to_pe(node, kernel->core);\n  node = isl_schedule_node_cut(node);\n\n  /* Make the ancestors atomic */\n  node = autosa_atomic_ancestors(node);\n\n//#ifdef _DEBUG\n//  if (!strcmp(group->array->name, \"U_tmp\") && pe_dummy_module->in == 0) {\n//    printf(\"here\\n\");\n//    printf(\"group id: %d\\n\", group->nr);\n//    DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//    isl_schedule *sched_tmp = isl_schedule_node_get_schedule(node);\n//    print_code(gen, isl_schedule_copy(sched_tmp), \"U_tmp_out2.c\");\n//    isl_schedule_free(sched_tmp);\n//  }\n//#endif\n\n  /* Test whether the range of the last dimension contains a single element */\n  lower_band_num = get_last_sched_dim_val(node);\n\n//#ifdef _DEBUG\n//  if (!strcmp(group->array->name, \"U_tmp\") && pe_dummy_module->in) {\n//    DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//  }\n//#endif\n\n  /* Graft an extension node. */\n  prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n  prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                            isl_union_pw_multi_aff_copy(kernel->contraction));\n  domain = isl_union_map_range(prefix);\n\n  p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_print_str(p_str, \"module_call.\");\n  p_str = autosa_array_ref_group_print_prefix(group, p_str);\n  p_str = isl_printer_print_str(p_str, \"_PE_dummy\");\n  p_str = isl_printer_print_str(p_str, pe_dummy_module->in? 
\"_in\" : \"_out\");\n  p_str = isl_printer_print_str(p_str, \".0.0\");\n  if (lower_band_num != -1) {\n    p_str = isl_printer_print_str(p_str, \".\");\n    p_str = isl_printer_print_int(p_str, lower_band_num);\n  }\n  stmt_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  space = isl_space_set_alloc(ctx, 0, 1);\n  space = isl_space_set_tuple_name(space, isl_dim_set, stmt_name);\n  free(stmt_name);\n\n  isl_point *pnt = isl_point_zero(space);\n  isl_set *set = isl_set_from_point(pnt);\n  range = isl_union_set_from_set(isl_set_copy(set));\n  extension = isl_union_map_from_domain_and_range(domain, range);\n  graft = isl_schedule_node_from_extension(extension);\n\n  isl_map *map = isl_set_identity(set);\n  map = isl_map_reset_tuple_id(map, isl_dim_out);\n  isl_union_map *umap = isl_union_map_from_map(map);\n  isl_multi_union_pw_aff *mupa = isl_multi_union_pw_aff_from_union_map(umap);\n\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n  graft = ppcg_set_schedule_node_type(graft, isl_ast_loop_atomic);\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, graft, isl_schedule_node_get_ctx(node));\n//#endif\n\n  while (graft && isl_schedule_node_has_parent(graft))\n    graft = isl_schedule_node_parent(graft);\n\n  node = isl_schedule_node_graft_before(node, graft);\n\n  /* Insert an empty filter. */\n  empty_filter = isl_union_set_from_set(isl_set_empty(\n      isl_set_get_space(kernel->context)));\n  node = isl_schedule_node_insert_filter(node, empty_filter);\n\n  /* Add module mark after the kernel mark. */\n  id = isl_id_alloc(ctx, \"module\", module);\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);\n\n  /* Add pe_dummy module mark after the module mark. 
*/\n  id = isl_id_alloc(ctx, \"pe_dummy_module\", pe_dummy_module);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n\n  sched = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n\n  return sched;\n}\n\n/* Create the schedule that calls all the PE modules.\n * We delete the schedule nodes above the array mark and below the PE mark,\n * then insert the module call extension nodes right under the space bands.\n */\nstatic isl_stat top_module_pe_gen_module_call(struct autosa_gen *gen,\n                                              struct autosa_hw_top_module *top, struct autosa_hw_module *module)\n{\n  isl_schedule *schedule;\n  isl_schedule_node *node, *graft;\n  isl_id *id;\n  struct autosa_kernel *kernel = gen->kernel;\n  isl_space *space;\n  isl_ctx *ctx;\n  isl_union_set *domain;\n  isl_union_set *empty_filter;\n  isl_printer *p_str;\n  char *stmt_name;\n\n  schedule = gen->schedule;\n  schedule = isl_schedule_dup(schedule);\n  node = isl_schedule_get_root(schedule);\n  isl_schedule_free(schedule);\n  ctx = isl_schedule_node_get_ctx(node);\n\n  /* Delete the node above the array mark. */\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_parent(node);\n  while (!autosa_tree_node_is_kernel(node))\n  {\n    node = isl_schedule_node_delete(node);\n    node = isl_schedule_node_parent(node);\n  }\n\n  /* Delete the node under the pe mark. */\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_child(node, 0);\n  node = split_band(node, kernel->n_sa_dim);\n\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_cut(node);\n\n  /* Graft an extension node. 
*/\n  p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_print_str(p_str, \"module_call.\");\n  p_str = isl_printer_print_str(p_str, module->name);\n  p_str = isl_printer_print_str(p_str, \".0.0\");\n  stmt_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  space = isl_space_set_alloc(ctx, 0, 0);\n  space = isl_space_set_tuple_name(space, isl_dim_set, stmt_name);\n  free(stmt_name);\n  domain = isl_union_set_from_set(isl_set_universe(space));\n  graft = isl_schedule_node_from_domain(domain);\n\n  node = isl_schedule_node_graft_before(node, graft);\n\n  /* Insert an empty filter */\n  empty_filter = isl_union_set_from_set(isl_set_empty(\n      isl_set_get_space(kernel->context)));\n  node = isl_schedule_node_insert_filter(node, empty_filter);\n\n  /* Add module mark after the kernel mark. */\n  id = isl_id_alloc(ctx, \"module\", module);\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);\n\n  schedule = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n\n  top->n_module_calls++;\n  top->module_call_scheds = (isl_schedule **)realloc(top->module_call_scheds,\n                                                     top->n_module_calls * sizeof(isl_schedule *));\n  top->module_call_scheds[top->n_module_calls - 1] = schedule;\n\n  if (module->n_pe_dummy_modules > 0 && gen->options->target != AUTOSA_TARGET_CATAPULT_HLS_C)\n  {\n    int inserted = 0;\n    /* Generate dummy module calls. 
*/\n    for (int i = 0; i < module->n_pe_dummy_modules; i++)\n    {\n      struct autosa_pe_dummy_module *pe_dummy_module;\n      isl_schedule *sched;\n\n      pe_dummy_module = module->pe_dummy_modules[i];\n      sched = pe_dummy_gen_module_call(gen, pe_dummy_module);\n\n      top->n_module_calls++;\n      top->module_call_scheds = (isl_schedule **)realloc(top->module_call_scheds,\n                                                         top->n_module_calls * sizeof(isl_schedule *));\n      /* If the module is out, we need to place it before the PE module call. */\n      if (!pe_dummy_module->in) {        \n        for (int j = top->n_module_calls - 2; j >= top->n_module_calls - 1 - inserted - 1; j--)\n          top->module_call_scheds[j + 1] = top->module_call_scheds[j];\n        top->module_call_scheds[top->n_module_calls - 1 - inserted - 1] = sched;\n      } else {\n        top->module_call_scheds[top->n_module_calls - 1] = sched;\n      }\n      inserted++;\n    }\n  }\n\n  return isl_stat_ok;\n}\n\n/* Generate the schedule that declares the fifos used in PEs. 
\n * If the io group data transfer direction at the PE level is INOUT,\n * we will add another extension node at the boundary of the transfer chain\n * to declare one more fifo.\n */\nstatic isl_stat top_module_pe_gen_fifo_decl(struct autosa_gen *gen,\n                                            struct autosa_hw_top_module *top, struct autosa_hw_module *module)\n{\n  isl_schedule *schedule;\n  isl_schedule_node *node, *graft;\n  isl_id *id;\n  struct autosa_kernel *kernel = gen->kernel;\n  isl_space *space;\n  isl_ctx *ctx = gen->ctx;\n  isl_union_set *domain;\n  isl_union_set *empty_filter;\n  isl_printer *p_str;\n  char *stmt_name;\n\n  for (int i = 0; i < module->n_io_group; i++)\n  {\n    struct autosa_array_ref_group *group = module->io_groups[i];\n    isl_multi_aff *io_trans;\n    isl_mat *io_trans_mat;\n    isl_id *id;\n    isl_union_set *L1_filter = NULL;\n    bool insert_L1 = isl_bool_false;\n    if (group->pe_io_dir == IO_NULL)\n      continue;\n\n    schedule = isl_schedule_dup(group->io_L1_schedule);\n    node = isl_schedule_get_root(schedule);\n    isl_schedule_free(schedule);\n\n    /* Delete the node above the array mark. 
*/\n    node = autosa_tree_move_down_to_array(node, kernel->core);\n    node = isl_schedule_node_parent(node);\n    while (!autosa_tree_node_is_kernel(node))\n    {\n      node = isl_schedule_node_delete(node);\n      node = isl_schedule_node_parent(node);\n    }\n\n    if (group->pe_io_dir == IO_INOUT)\n    {\n      int n_member;\n      node = autosa_tree_move_down_to_mark(node, kernel->core, \"io_L1\");\n      node = isl_schedule_node_parent(node);\n      n_member = isl_schedule_node_band_n_member(node);\n      node = isl_schedule_node_band_split(node, n_member - 1);\n      node = isl_schedule_node_child(node, 0);\n      if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n      {\n        L1_filter = schedule_eq_ub(node);\n        insert_L1 = isl_bool_true;\n      }\n      node = autosa_tree_move_up_to_array(node);\n    }\n\n    /* Delete the node under the pe mark. */\n    node = autosa_tree_move_down_to_pe(node, kernel->core);\n    node = isl_schedule_node_cut(node);\n\n    /* Graft an extension node. */\n    p_str = isl_printer_to_str(ctx);\n    p_str = isl_printer_print_str(p_str, \"fifo_decl.\");\n    p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n    stmt_name = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n    space = isl_space_set_alloc(ctx, 0, 0);\n    id = isl_id_alloc(ctx, stmt_name, group);\n    space = isl_space_set_tuple_id(space, isl_dim_set, id);\n    free(stmt_name);\n    domain = isl_union_set_from_set(isl_set_universe(space));\n    graft = isl_schedule_node_from_domain(domain);\n\n    node = isl_schedule_node_graft_before(node, graft);\n\n    if (insert_L1)\n    {\n      isl_set *set;\n      isl_multi_union_pw_aff *mupa;\n      isl_union_map *prefix;\n      isl_union_set *domain;\n      isl_union_set *range;\n      isl_union_map *extension;\n      isl_map *map;\n      isl_union_map *umap;\n\n      /* Graft an extension node for boundary PE. 
*/\n      node = isl_schedule_node_insert_filter(node, L1_filter);\n      node = isl_schedule_node_child(node, 0);\n      prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n      prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                                isl_union_pw_multi_aff_copy(kernel->contraction));\n      domain = isl_union_map_range(prefix);\n\n      p_str = isl_printer_to_str(ctx);\n      p_str = isl_printer_print_str(p_str, \"fifo_decl_boundary.\");\n      p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n      stmt_name = isl_printer_get_str(p_str);\n      isl_printer_free(p_str);\n      space = isl_space_set_alloc(ctx, 0, 1);\n      id = isl_id_alloc(ctx, stmt_name, group);\n      space = isl_space_set_tuple_id(space, isl_dim_set, id);\n      free(stmt_name);\n\n      isl_point *pnt = isl_point_zero(space);\n      set = isl_set_from_point(pnt);\n      range = isl_union_set_from_set(isl_set_copy(set));\n\n      extension = isl_union_map_from_domain_and_range(domain, range);\n      graft = isl_schedule_node_from_extension(extension);\n\n      map = isl_set_identity(set);\n      map = isl_map_reset_tuple_id(map, isl_dim_out);\n      umap = isl_union_map_from_map(map);\n      mupa = isl_multi_union_pw_aff_from_union_map(umap);\n\n      graft = isl_schedule_node_child(graft, 0);\n      graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n      while (graft && isl_schedule_node_has_parent(graft))\n        graft = isl_schedule_node_parent(graft);\n\n      node = isl_schedule_node_graft_before(node, graft);\n    }\n    else\n    {\n      isl_union_set_free(L1_filter);\n    }\n\n    /* Insert an empty filter. */\n    empty_filter = isl_union_set_from_set(isl_set_empty(\n        isl_set_get_space(kernel->context)));\n    node = isl_schedule_node_insert_filter(node, empty_filter);\n\n    /* Add module mark after the kernel mark. 
*/\n    id = isl_id_alloc(ctx, \"module\", module);\n    node = autosa_tree_move_up_to_kernel(node);\n    node = isl_schedule_node_child(node, 0);\n    node = isl_schedule_node_insert_mark(node, id);\n\n    schedule = isl_schedule_node_get_schedule(node);\n    isl_schedule_node_free(node);\n\n    top->n_fifo_decls++;\n    top->fifo_decl_scheds = (isl_schedule **)realloc(top->fifo_decl_scheds,\n                                                     top->n_fifo_decls * sizeof(isl_schedule *));\n    top->fifo_decl_scheds[top->n_fifo_decls - 1] = schedule;\n    top->fifo_decl_names = (char **)realloc(top->fifo_decl_names,\n                                            top->n_fifo_decls * sizeof(char *));\n    /* Generate the fifo_decl name in the format of\n     * [fifo_name]_[module_name].[fifo_width], where the width is in bytes.\n     */\n    p_str = isl_printer_to_str(ctx);\n    p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n    p_str = isl_printer_print_str(p_str, \"_\");\n    p_str = isl_printer_print_str(p_str, module->name);\n    p_str = isl_printer_print_str(p_str, \".\");\n    int n_lane = get_io_group_n_lane(module, NULL, group);    \n    int data_size = group->array->size;\n    int width = data_size * n_lane; // in bytes\n    p_str = isl_printer_print_int(p_str, width);\n    top->fifo_decl_names[top->n_fifo_decls - 1] = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n  }\n\n  return isl_stat_ok;\n}\n\n/* Generate module calls and fifo decls for the PE module. \n */\nstatic isl_stat top_module_pe_gen(struct autosa_gen *gen,\n                                  struct autosa_hw_top_module *top, struct autosa_hw_module *module)\n{\n  /* Generate the function call schedule. */\n  top_module_pe_gen_module_call(gen, top, module);\n\n  /* Generate the fifo declaration schedule. 
*/\n  top_module_pe_gen_fifo_decl(gen, top, module);\n\n  return isl_stat_ok;\n}\n\n/* The input \"node\" points to the node below the io_[module->level] mark.\n * Return the node pointing to the \"kernel\" mark.\n * We will insert two module call extension nodes: \n * module_call_upper: which contains the module name and arguments for the \n * inter-module transfer\n * module_call_lower: which contains arguments for the intra-module transfer\n * (i.e., transfer to the lower-level modules)\n */\nstatic __isl_give isl_schedule_node *io_gen_module_call(\n    __isl_take isl_schedule_node *node, struct autosa_hw_module *module,\n    struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n    int boundary, int serialize,\n    __isl_take isl_union_set *filter_domain)\n{\n  isl_printer *p_str;\n  char *stmt_name;\n  isl_space *space;\n  isl_union_set *domain, *empty_filter, *lower_level_filter;\n  isl_schedule_node *graft;\n  isl_bool insert_lower = isl_bool_false;\n  isl_ctx *ctx = isl_schedule_node_get_ctx(node);\n  isl_id *id;\n  isl_union_map *prefix, *extension, *umap;\n  isl_union_set *range;\n  isl_set *set;\n  isl_map *map;\n  isl_multi_union_pw_aff *mupa;\n  int lower_band_num = -1;  \n  isl_union_set *filter_range;\n  isl_bool upper_inserted;\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n\n  /* Collect the filter for the lower I/O module. */\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n  {\n    if (module->level > 1)\n    {\n      if (module->to_pe) {\n        if (module->in)\n          lower_level_filter = schedule_eq_lb(node);\n        else\n          lower_level_filter = schedule_eq_ub(node);\n      } else {\n        lower_level_filter = schedule_eq_lb(node);\n      }\n      \n      insert_lower = isl_bool_true;\n    }\n  }\n\n  /* Graft an extension node for module call. 
*/\n  prefix = isl_schedule_node_get_prefix_schedule_relation(node);  \n  prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                            isl_union_pw_multi_aff_copy(kernel->contraction));\n  domain = isl_union_map_range(isl_union_map_copy(prefix));\n  if (filter_domain) {\n    filter_range = isl_union_set_apply(isl_union_set_copy(filter_domain), isl_union_map_copy(prefix));\n    domain = isl_union_set_intersect(domain, filter_range);\n  }\n  isl_union_map_free(prefix);\n\n  p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_print_str(p_str, \"module_call_upper.\");\n  p_str = isl_printer_print_str(p_str, module->name);  \n  if (boundary) \n    p_str = isl_printer_print_str(p_str, \".1\");\n  else\n    p_str = isl_printer_print_str(p_str, \".0\");\n  if (serialize)\n    p_str = isl_printer_print_str(p_str, \".1\");\n  else\n    p_str = isl_printer_print_str(p_str, \".0\");\n\n  stmt_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  space = isl_space_set_alloc(ctx, 0, 1);\n  space = isl_space_set_tuple_name(space, isl_dim_set, stmt_name);\n  free(stmt_name);\n\n  isl_point *pnt = isl_point_zero(space);\n  set = isl_set_from_point(pnt);\n  range = isl_union_set_from_set(isl_set_copy(set));\n\n  extension = isl_union_map_from_domain_and_range(domain, range);\n  graft = isl_schedule_node_from_extension(extension);\n\n  map = isl_set_identity(set);\n  map = isl_map_reset_tuple_id(map, isl_dim_out);\n  umap = isl_union_map_from_map(map);\n  mupa = isl_multi_union_pw_aff_from_union_map(umap);\n\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n  while (graft && isl_schedule_node_has_parent(graft))\n    graft = isl_schedule_node_parent(graft);\n\n  node = isl_schedule_node_graft_before(node, graft);\n\n  if (module->level > 1)\n  {\n    node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level - 1);\n  }\n  
node = isl_schedule_node_cut(node);\n\n  /* Graft an extension node for lower level transfer. */\n  if (insert_lower)\n  {    \n    if (module->to_pe) {\n      node = isl_schedule_node_insert_filter(node, lower_level_filter);\n      node = isl_schedule_node_child(node, 0);\n    } else {\n      /* In case the lower band only contains one element, we will compute the \n       * value and append to the module_call name.\n       */\n      isl_schedule_node *node_tmp;\n      node_tmp = isl_schedule_node_copy(node);\n      node_tmp = isl_schedule_node_parent(node_tmp); // band\n      node_tmp = isl_schedule_node_insert_filter(node_tmp, isl_union_set_copy(lower_level_filter));\n      node_tmp = isl_schedule_node_child(node_tmp, 0);\n      lower_band_num = get_band_single_schedule_val(node_tmp);\n      isl_schedule_node_free(node_tmp);\n\n//#ifdef _DEBUG\n//      if (!strcmp(module->name, \"U_drain_IO_L2_out\")) {\n//        printf(\"test %d\\n\", lower_band_num);\n//      }\n//#endif \n\n      node = isl_schedule_node_insert_filter(node, lower_level_filter);\n      node = isl_schedule_node_child(node, 0);\n\n      //node = isl_schedule_node_parent(node); // band\n      //node = isl_schedule_node_insert_filter(node, lower_level_filter);\n      //node = isl_schedule_node_child(node, 0);      \n      //lower_band_num = get_band_single_schedule_val(node);     \n      //node = isl_schedule_node_child(node, 0);\n    }\n  }\n  {\n    isl_union_map *prefix;\n    isl_union_set *domain, *range;\n    isl_point *pnt;\n    isl_set *set;\n    isl_union_map *extension, *umap;\n    isl_map *map;\n    isl_multi_union_pw_aff *mupa;\n\n    prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n    prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                              isl_union_pw_multi_aff_copy(kernel->contraction));\n    domain = isl_union_map_range(isl_union_map_copy(prefix));\n    if (filter_domain) {\n      filter_range 
= isl_union_set_apply(isl_union_set_copy(filter_domain), isl_union_map_copy(prefix));    \n      domain = isl_union_set_intersect(domain, filter_range);     \n    }\n    isl_union_map_free(prefix);\n\n    p_str = isl_printer_to_str(ctx);\n    p_str = isl_printer_print_str(p_str, \"module_call_lower.\");\n    p_str = isl_printer_print_str(p_str, module->name);    \n    if (boundary) \n      p_str = isl_printer_print_str(p_str, \".1\");\n    else\n      p_str = isl_printer_print_str(p_str, \".0\");\n    if (serialize)\n      p_str = isl_printer_print_str(p_str, \".1\");\n    else\n      p_str = isl_printer_print_str(p_str, \".0\");\n\n    if (lower_band_num != -1) {\n      p_str = isl_printer_print_str(p_str, \".\");\n      p_str = isl_printer_print_int(p_str, lower_band_num);\n    }\n\n    stmt_name = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n    space = isl_space_set_alloc(ctx, 0, 1);\n    id = isl_id_alloc(ctx, stmt_name, group);\n    space = isl_space_set_tuple_id(space, isl_dim_set, id);\n    free(stmt_name);\n\n    pnt = isl_point_zero(space);\n    set = isl_set_from_point(pnt);\n    range = isl_union_set_from_set(isl_set_copy(set));\n\n    /* Build an identical union map from domain.\n     * Project out the range dims and only keep the last dim.\n     * Set the range name as stmt_name. 
*/    \n    extension = isl_union_map_from_domain_and_range(domain, range);\n    graft = isl_schedule_node_from_extension(extension);\n\n    map = isl_set_identity(set);\n    map = isl_map_reset_tuple_id(map, isl_dim_out);\n    umap = isl_union_map_from_map(map);\n    mupa = isl_multi_union_pw_aff_from_union_map(umap);\n\n    graft = isl_schedule_node_child(graft, 0);\n    graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n    while (graft && isl_schedule_node_has_parent(graft))\n      graft = isl_schedule_node_parent(graft);\n\n    node = isl_schedule_node_graft_after(node, graft);\n\n//    if (!strcmp(module->name, \"U_drain_IO_L2_out\")) {\n//      DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//    }    \n  }\n\n  /* Insert an empty filter. */\n  empty_filter = isl_union_set_from_set(isl_set_empty(isl_set_get_space(kernel->context)));\n  node = isl_schedule_node_insert_filter(node, empty_filter);\n\n  node = autosa_tree_move_up_to_kernel(node);\n  isl_union_set_free(filter_domain);\n\n  return node;\n}\n\n/* The input \"node\" points to the node below the io_[module->level] mark.\n * Return the node pointing to the \"kernel\" mark.\n * We will insert one module call extension node: \n * module_call_upper: which contains the module name and arguments for the \n * inter-module transfer\n * This function is used for Intel OpenCL only. 
We will not generate \n * the module_call_lower, which is defined as follows:\n * module_call_lower: which contains arguments for the intra-module transfer\n * (i.e., transfer to the lower-level modules)\n */\nstatic __isl_give isl_schedule_node *io_gen_ext_module(\n    __isl_take isl_schedule_node *node, struct autosa_hw_module *module,\n    struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n    int boundary)\n{\n  isl_printer *p_str;\n  char *stmt_name;\n  isl_space *space;\n  isl_union_set *domain, *empty_filter, *lower_level_filter;\n  isl_schedule_node *graft;\n  isl_bool insert_lower = isl_bool_false;\n  isl_ctx *ctx = isl_schedule_node_get_ctx(node);\n  isl_id *id;\n  isl_union_map *prefix, *extension, *umap;\n  isl_union_set *range;\n  isl_set *set;\n  isl_map *map;\n  isl_multi_union_pw_aff *mupa;\n\n  /* Graft an extension node for module call. */\n  prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n  prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                            isl_union_pw_multi_aff_copy(kernel->contraction));\n  domain = isl_union_map_range(prefix);\n\n  p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_print_str(p_str, \"ext_module_upper.\");\n  p_str = isl_printer_print_str(p_str, module->name);\n  if (boundary)\n    p_str = isl_printer_print_str(p_str, \".boundary\");\n  stmt_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  space = isl_space_set_alloc(ctx, 0, 0);\n  space = isl_space_set_tuple_name(space, isl_dim_set, stmt_name);\n  free(stmt_name);\n\n  isl_point *pnt = isl_point_zero(space);\n  set = isl_set_from_point(pnt);\n  range = isl_union_set_from_set(isl_set_copy(set));\n\n  extension = isl_union_map_from_domain_and_range(domain, range);\n  graft = isl_schedule_node_from_extension(extension);\n\n  map = isl_set_identity(set);\n  map = isl_map_reset_tuple_id(map, isl_dim_out);\n  umap = isl_union_map_from_map(map);\n  mupa = 
isl_multi_union_pw_aff_from_union_map(umap);\n\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n  while (graft && isl_schedule_node_has_parent(graft))\n    graft = isl_schedule_node_parent(graft);\n\n  node = isl_schedule_node_graft_before(node, graft);\n  node = isl_schedule_node_cut(node);\n\n  /* Insert an empty filter. */\n  empty_filter = isl_union_set_from_set(isl_set_empty(isl_set_get_space(kernel->context)));\n  node = isl_schedule_node_insert_filter(node, empty_filter);\n\n  node = autosa_tree_move_up_to_kernel(node);\n\n  return node;\n}\n\n/* Generate the calls for the io module connected to the external memory. \n * This function is used for Intel OpenCL only.\n * Since all fifos will be replaced with channels later, this function only \n * generates the upper module calls, ignoring the lower module call.\n */\nstatic isl_stat top_module_io_gen_ext_module(\n    struct autosa_gen *gen, struct autosa_hw_top_module *top,\n    struct autosa_hw_module *module,\n    struct autosa_array_ref_group *group)\n{\n  isl_schedule *schedule;\n  isl_ctx *ctx = gen->ctx;\n  isl_schedule_node *node, *graft;\n  isl_id *id;\n  struct autosa_kernel *kernel = gen->kernel;\n  isl_printer *p_str;\n  char *stmt_name;\n  isl_space *space;\n  isl_union_set *domain, *empty_filter, *lower_level_filter;\n  isl_bool insert_lower = isl_bool_false;\n  int boundary = module->boundary;\n  isl_union_set *boundary_filter, *non_boundary_filter;\n  isl_union_set_list *boundary_filters;\n  isl_union_set *group_domain_filter;\n  int single_ele = -1;\n  isl_union_set *group_domain_filter_level;\n\n  /* Only the top-level io module connected to the external memory is handled.\n   */\n  if (module->type == PE_MODULE || module->to_mem == 0)\n    return isl_stat_ok;\n\n  /* Transform the schedule. 
*/\n  schedule = isl_schedule_dup(group->io_schedule);\n  node = isl_schedule_get_root(schedule);\n  isl_schedule_free(schedule);\n\n  /* Compute the union of domains of all the array references in the group. */\n  group_domain_filter = compute_io_group_domain(node, group, kernel, gen, module->in);  \n  group_domain_filter = extend_io_group_domain(group_domain_filter, node, group, kernel, module->level);  \n  group_domain_filter_level = compute_io_group_domain_at_level(group_domain_filter, node, group, kernel, module->level);    \n\n  /* Delete the node above the array mark. */\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_parent(node);  \n  while (!(autosa_tree_node_is_kernel(node) || isl_schedule_node_get_type(node) == isl_schedule_node_context)) {\n    node = isl_schedule_node_delete(node);\n    node = isl_schedule_node_parent(node);\n  }\n\n  node = autosa_tree_move_up_to_kernel(node);\n\n  /* Collect the filter for the boundary and non-boundary I/O module. 
*/\n  if (boundary && (module->level <= group->space_dim))\n  {\n    node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n    node = isl_schedule_node_parent(node);\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n      /* Test whether the band contains only one element. */\n      isl_schedule_node *node_tmp;      \n      node_tmp = isl_schedule_node_copy(node);\n      if (group_domain_filter_level) {\n        node_tmp = isl_schedule_node_insert_filter(node_tmp, isl_union_set_copy(group_domain_filter_level));\n        node_tmp = isl_schedule_node_child(node_tmp, 0);\n      }\n      single_ele = get_band_single_schedule_val(node_tmp);\n      if (single_ele == -1) {\n        boundary_filter = schedule_eq_ub(node_tmp);\n        non_boundary_filter = schedule_neq_ub(node_tmp);\n      }\n      isl_schedule_node_free(node_tmp);\n\n      if (single_ele == -1) {\n        boundary_filters = isl_union_set_list_from_union_set(non_boundary_filter);\n        boundary_filters = isl_union_set_list_add(boundary_filters, boundary_filter);        \n\n        node = isl_schedule_node_child(node, 0); // io_mark\n        node = isl_schedule_node_child(node, 0); // band      \n        node = isl_schedule_node_insert_sequence(node, boundary_filters);\n        /* The node now is right below the io_[module->level] mark. */      \n      } else {\n        node = isl_schedule_node_child(node, 0); // io_mark\n        node = isl_schedule_node_child(node, 0); // band\n        node = isl_schedule_node_insert_filter(node, isl_union_set_copy(group_domain_filter_level));\n        node = isl_schedule_node_child(node, 0); // band\n      }\n    }\n  }\n  else\n  {\n    node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n    node = isl_schedule_node_child(node, 0);\n  }\n\n  ///* Collect the filter for the boundary and non-boundary I/O module. 
*/\n  //if (boundary)\n  //{\n  //  node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n  //  node = isl_schedule_node_parent(node);\n  //  if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n  //  {\n  //    boundary_filter = schedule_eq_ub(node);\n  //    non_boundary_filter = schedule_neq_ub(node);\n  //    boundary_filters = isl_union_set_list_from_union_set(non_boundary_filter);\n  //    boundary_filters = isl_union_set_list_add(boundary_filters, boundary_filter);\n//\n  //    node = isl_schedule_node_child(node, 0); // io_mark\n  //    node = isl_schedule_node_child(node, 0); // band\n  //    node = isl_schedule_node_insert_sequence(node, boundary_filters);\n  //    /* The node now is right below the io_[module->level] mark. */\n  //  }\n  //}\n  //else\n  //{\n  //  node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n  //  node = isl_schedule_node_child(node, 0);\n  //}\n\n  //if (boundary)\n  //{\n  //  node = isl_schedule_node_child(node, 0); // filter\n  //  node = isl_schedule_node_child(node, 0); // band\n  //  /* non-boundary */\n  //  node = io_gen_ext_module(node, module, kernel, group, 0);\n  //  node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n  //  node = isl_schedule_node_child(node, 0); // sequence\n  //  node = isl_schedule_node_child(node, 1); // filter\n  //  node = isl_schedule_node_child(node, 0); // band\n  //  /* boundary */\n  //  node = io_gen_ext_module(node, module, kernel, group, 1);\n  //}\n  //else\n  //{\n  //  node = io_gen_ext_module(node, module, kernel, group, 0);\n  //}\n  if (boundary && (module->level <= group->space_dim))\n  {\n    if (single_ele == -1) {\n      node = isl_schedule_node_child(node, 0); // filter\n      node = isl_schedule_node_child(node, 0); // band\n      \n      /* non-boundary */\n      //node = io_gen_module_call(node, module, kernel, group, 0, serialize, isl_union_set_copy(group_domain_filter));\n      node = 
io_gen_ext_module(node, module, kernel, group, 0);\n      node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n      node = isl_schedule_node_child(node, 0); // sequence\n      node = isl_schedule_node_child(node, 1); // filter\n      node = isl_schedule_node_child(node, 0); // band\n\n      /* boundary */\n      //node = io_gen_module_call(node, module, kernel, group, 1, serialize, isl_union_set_copy(group_domain_filter));\n      node = io_gen_ext_module(node, module, kernel, group, 1);\n    } else {\n      /* boundary */\n      //node = io_gen_module_call(node, module, kernel, group, 1, serialize, isl_union_set_copy(group_domain_filter));\n      node = io_gen_ext_module(node, module, kernel, group, 1);\n    }\n  } else {\n    //node = io_gen_module_call(node, module, kernel, group, boundary, serialize, isl_union_set_copy(group_domain_filter));\n    node = io_gen_ext_module(node, module, kernel, group, 0);\n  }\n\n\n  /* Clean up the schedule tree: remove the \"array\" and \"io_LX\" marks.\n   */\n  node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n  node = isl_schedule_node_delete(node);\n  node = autosa_tree_move_up_to_array(node);\n  node = isl_schedule_node_delete(node);\n  node = autosa_tree_move_up_to_kernel(node);\n\n  /* Add module mark after the kernel mark. */\n  id = isl_id_alloc(ctx, \"module\", module);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);  \n\n  schedule = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n  isl_union_set_free(group_domain_filter);\n  isl_union_set_free(group_domain_filter_level);\n\n  top->n_ext_module++;\n  top->ext_module_scheds = (isl_schedule **)realloc(top->ext_module_scheds,\n                                                    top->n_ext_module * sizeof(isl_schedule *));\n  top->ext_module_scheds[top->n_ext_module - 1] = schedule;\n\n  return isl_stat_ok;\n}\n\n/* Generate the module calls for the io 
module. \n * If serialize is set to 1, we are generating the extra serialization module.\n */\nstatic isl_stat top_module_io_gen_module_call(\n    struct autosa_gen *gen, struct autosa_hw_top_module *top,\n    struct autosa_hw_module *module,\n    struct autosa_array_ref_group *group,\n    int serialize)\n{\n  isl_schedule *schedule;\n  isl_ctx *ctx = gen->ctx;\n  isl_schedule_node *node, *graft;\n  isl_id *id;\n  struct autosa_kernel *kernel = gen->kernel;\n  isl_printer *p_str;\n  char *stmt_name;\n  isl_space *space;\n  isl_union_set *domain, *empty_filter, *lower_level_filter;\n  isl_bool insert_lower = isl_bool_false;\n  int boundary = module->boundary;\n  isl_union_set *boundary_filter, *non_boundary_filter;\n  isl_union_set_list *boundary_filters;\n  isl_union_set *group_domain_filter;\n  int single_ele = -1;\n  isl_union_set *group_domain_filter_level;\n\n  /* Transform the schedule. */\n  schedule = isl_schedule_dup(group->io_schedule);\n  node = isl_schedule_get_root(schedule);\n  isl_schedule_free(schedule);\n\n  /* Compute the union of domains of all the array references in the group. */\n  group_domain_filter = compute_io_group_domain(node, group, kernel, gen, module->in);  \n  group_domain_filter = extend_io_group_domain(group_domain_filter, node, group, kernel, module->level);  \n  group_domain_filter_level = compute_io_group_domain_at_level(group_domain_filter, node, group, kernel, module->level);    \n\n  /* Delete the node above the array mark. */\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_parent(node);  \n  while (!(autosa_tree_node_is_kernel(node) || isl_schedule_node_get_type(node) == isl_schedule_node_context)) {\n    node = isl_schedule_node_delete(node);\n    node = isl_schedule_node_parent(node);\n  }\n\n  node = autosa_tree_move_up_to_kernel(node);\n\n  /* Collect the filter for the boundary and non-boundary I/O module. 
*/\n  if (boundary && (module->level <= group->space_dim))\n  {\n    node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n    node = isl_schedule_node_parent(node);\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n      /* Test whether the band contains only one element. */\n      isl_schedule_node *node_tmp;      \n      node_tmp = isl_schedule_node_copy(node);\n      if (group_domain_filter_level) {\n        node_tmp = isl_schedule_node_insert_filter(node_tmp, isl_union_set_copy(group_domain_filter_level));\n        node_tmp = isl_schedule_node_child(node_tmp, 0);\n      }\n      single_ele = get_band_single_schedule_val(node_tmp);\n      if (single_ele == -1) {\n        boundary_filter = schedule_eq_ub(node_tmp);\n        non_boundary_filter = schedule_neq_ub(node_tmp);\n      }\n      isl_schedule_node_free(node_tmp);\n\n//#ifdef _DEBUG\n//      if (!strcmp(module->name, \"U_drain_IO_L2_out\")) {\n//        printf(\"single ele: %d\\n\", single_ele);\n//        DBGUSET(stdout, boundary_filter, ctx);\n//        DBGUSET(stdout, non_boundary_filter, ctx);\n//      }\n//#endif\n\n      if (single_ele == -1) {\n        //boundary_filter = schedule_eq_ub(node);\n        //non_boundary_filter = schedule_neq_ub(node);\n//#ifdef _DEBUG\n//        if (!strcmp(module->name, \"A_IO_L2_in\")) {\n//          printf(\"single ele: %d\\n\", single_ele);\n//          DBGUSET(stdout, boundary_filter, ctx);\n//          DBGUSET(stdout, non_boundary_filter, ctx);\n//        }\n//#endif\n        boundary_filters = isl_union_set_list_from_union_set(non_boundary_filter);\n        boundary_filters = isl_union_set_list_add(boundary_filters, boundary_filter);        \n\n        node = isl_schedule_node_child(node, 0); // io_mark\n        node = isl_schedule_node_child(node, 0); // band      \n        node = isl_schedule_node_insert_sequence(node, boundary_filters);\n        /* The node now is right below the io_[module->level] mark. 
*/      \n      } else {\n        node = isl_schedule_node_child(node, 0); // io_mark\n        node = isl_schedule_node_child(node, 0); // band\n        node = isl_schedule_node_insert_filter(node, isl_union_set_copy(group_domain_filter_level));\n        node = isl_schedule_node_child(node, 0); // band\n      }\n    }\n  }\n  else\n  {\n    node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n    node = isl_schedule_node_child(node, 0);\n  }\n\n  if (boundary && (module->level <= group->space_dim))\n  {\n//#ifdef _DEBUG\n//    DBGSCHDNODE(stdout, node, ctx);\n//#endif\n    if (single_ele == -1) {\n      node = isl_schedule_node_child(node, 0); // filter\n      node = isl_schedule_node_child(node, 0); // band\n      \n      //if (single_ele != -1) {\n      //  /* boundary */\n      //  node = io_gen_module_call(node, module, kernel, group, 1, serialize, isl_union_set_copy(group_domain_filter));  \n      //} else {\n      //  /* non-boundary */\n      //  node = io_gen_module_call(node, module, kernel, group, 0, serialize, isl_union_set_copy(group_domain_filter));\n      //}\n      /* non-boundary */\n      node = io_gen_module_call(node, module, kernel, group, 0, serialize, isl_union_set_copy(group_domain_filter));\n      node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n      node = isl_schedule_node_child(node, 0); // sequence\n      node = isl_schedule_node_child(node, 1); // filter\n      node = isl_schedule_node_child(node, 0); // band\n  \n      /* boundary */\n      node = io_gen_module_call(node, module, kernel, group, 1, serialize, isl_union_set_copy(group_domain_filter));\n    } else {\n      /* boundary */\n      node = io_gen_module_call(node, module, kernel, group, 1, serialize, isl_union_set_copy(group_domain_filter));\n    }\n  }\n  else \n  {\n    node = io_gen_module_call(node, module, kernel, group, boundary, serialize, isl_union_set_copy(group_domain_filter));\n  }\n\n  /* Add module mark after 
the kernel mark.auto */\n  id = isl_id_alloc(ctx, \"module\", module);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n\n  schedule = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n  isl_union_set_free(group_domain_filter);\n  isl_union_set_free(group_domain_filter_level);\n\n  top->n_module_calls++;\n  top->module_call_scheds = (isl_schedule **)realloc(top->module_call_scheds,\n                                                     top->n_module_calls * sizeof(isl_schedule *));\n  top->module_call_scheds[top->n_module_calls - 1] = schedule;\n\n  return isl_stat_ok;\n}\n\n/* Generate fifo decls for the I/O module.\n * Currently only works for filter I/O modules.\n */\nstatic isl_stat top_module_io_gen_fifo_decl(struct autosa_gen *gen,\n                                            struct autosa_hw_top_module *top,\n                                            struct autosa_hw_module *module, struct autosa_array_ref_group *group)\n{\n  isl_schedule *schedule;\n  isl_schedule_node *node, *graft;\n  isl_union_set *filter = NULL, *empty_filter;\n  struct autosa_kernel *kernel = gen->kernel;\n  bool insert_filter = isl_bool_false;\n  char *stmt_name;\n  isl_space *space;\n  isl_union_set *domain;\n  isl_printer *p_str;\n  isl_id *id;\n  isl_ctx *ctx = gen->ctx;\n\n  if (module->to_mem)\n    return isl_stat_ok;\n\n  schedule = isl_schedule_dup(group->io_schedule);\n  node = isl_schedule_get_root(schedule);\n  isl_schedule_free(schedule);\n\n  /* Delete the node above the array mark. 
*/\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_parent(node);\n  while (!autosa_tree_node_is_kernel(node))\n  {\n    node = isl_schedule_node_delete(node);\n    node = isl_schedule_node_parent(node);\n  }\n\n  node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n  node = isl_schedule_node_parent(node);\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n  {\n    filter = schedule_eq_ub(node);\n    insert_filter = isl_bool_true;\n  }\n  node = autosa_tree_move_up_to_array(node);\n  node = autosa_tree_move_down_to_io_mark(node, kernel->core, module->level);\n  node = isl_schedule_node_cut(node);\n\n  /* Graft an extension node. */\n  p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_print_str(p_str, \"fifo_decl.\");\n  p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n  stmt_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  space = isl_space_set_alloc(ctx, 0, 0);\n  id = isl_id_alloc(ctx, stmt_name, group);\n  space = isl_space_set_tuple_id(space, isl_dim_set, id);\n  free(stmt_name);\n  domain = isl_union_set_from_set(isl_set_universe(space));\n  graft = isl_schedule_node_from_domain(domain);\n\n  node = isl_schedule_node_graft_before(node, graft);\n\n  if (insert_filter)\n  {\n    isl_union_map *prefix, *extension, *umap;\n    isl_union_set *domain, *range;\n    isl_point *pnt;\n    isl_set *set;\n    isl_map *map;\n    isl_multi_union_pw_aff *mupa;\n\n    node = isl_schedule_node_insert_filter(node, filter);\n    node = isl_schedule_node_child(node, 0);\n\n    prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n    prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                              isl_union_pw_multi_aff_copy(kernel->contraction));\n    domain = isl_union_map_range(prefix);\n\n    p_str = isl_printer_to_str(ctx);\n    p_str = isl_printer_print_str(p_str, 
\"fifo_decl_boundary.\");\n    p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n    stmt_name = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n    space = isl_space_set_alloc(ctx, 0, 1);\n    id = isl_id_alloc(ctx, stmt_name, group);\n    space = isl_space_set_tuple_id(space, isl_dim_set, id);\n    free(stmt_name);\n\n    pnt = isl_point_zero(space);\n    set = isl_set_from_point(pnt);\n    range = isl_union_set_from_set(isl_set_copy(set));\n\n    extension = isl_union_map_from_domain_and_range(domain, range);\n    graft = isl_schedule_node_from_extension(extension);\n    map = isl_set_identity(set);\n    map = isl_map_reset_tuple_id(map, isl_dim_out);\n    umap = isl_union_map_from_map(map);\n    mupa = isl_multi_union_pw_aff_from_union_map(umap);\n\n    graft = isl_schedule_node_child(graft, 0);\n    graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\n    while (graft && isl_schedule_node_has_parent(graft))\n      graft = isl_schedule_node_parent(graft);\n\n    node = isl_schedule_node_graft_before(node, graft);\n  }\n\n  /* Insert an empty filter. */\n  empty_filter = isl_union_set_from_set(isl_set_empty(\n      isl_set_get_space(kernel->context)));\n  node = isl_schedule_node_insert_filter(node, empty_filter);\n\n  /* Add module mark after the kernel mark. 
*/\n  id = isl_id_alloc(ctx, \"module\", module);\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_mark(node, id);\n\n  schedule = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n\n  top->n_fifo_decls++;\n  top->fifo_decl_scheds = (isl_schedule **)realloc(top->fifo_decl_scheds,\n                                                   top->n_fifo_decls * sizeof(isl_schedule *));\n  top->fifo_decl_scheds[top->n_fifo_decls - 1] = schedule;\n  top->fifo_decl_names = (char **)realloc(top->fifo_decl_names,\n                                          top->n_fifo_decls * sizeof(char *));\n  /* Generate fifo_decl name in the format of\n   * [fifo_name].[fifo_width]\n   */\n  p_str = isl_printer_to_str(ctx);\n  p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n  p_str = isl_printer_print_str(p_str, \"_\");\n  p_str = isl_printer_print_str(p_str, module->name);\n  p_str = isl_printer_print_str(p_str, \".\");\n  int n_lane = get_io_group_n_lane(module, NULL, group);\n  int data_size = group->array->size;\n  int width = data_size * n_lane; // in bytes\n  p_str = isl_printer_print_int(p_str, width);\n  top->fifo_decl_names[top->n_fifo_decls - 1] = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return isl_stat_ok;\n}\n\n/* Generate the module calls and fifo decls for the io group. */\nstatic isl_stat top_module_io_gen(struct autosa_gen *gen,\n                                  struct autosa_hw_top_module *top,\n                                  struct autosa_hw_module *module)\n{\n  struct autosa_array_ref_group *group;\n  assert(module->n_io_group == 1);\n  group = module->io_groups[0];\n\n  /* Generate the function call schedule. */\n  if (module->is_serialized && module->in) {\n    /* Generate an axtra function call schedule for the serialize module. 
*/\n    top_module_io_gen_module_call(gen, top, module, group, 1);\n  }\n  top_module_io_gen_module_call(gen, top, module, group, 0);\n  if (module->is_serialized && !module->in) {\n    /* Generate an extra function call schedule for the serialized module. */\n    top_module_io_gen_module_call(gen, top, module, group, 1);\n  }\n\n  /* Generate the fifo declaration schedule. */\n  top_module_io_gen_fifo_decl(gen, top, module, group);\n\n  /* Generate the schedule that sets the arguments of the external memory module. */\n  if (gen->options->target == AUTOSA_TARGET_INTEL_OPENCL)\n  {\n    top_module_io_gen_ext_module(gen, top, module, group);\n  }\n\n  return isl_stat_ok;\n}\n\n/* Generate the top module that contains module calls and fifo declarations. */\n__isl_give struct autosa_hw_top_module *sa_top_module_gen(struct autosa_gen *gen)\n{\n  struct autosa_hw_top_module *top_module;\n\n  top_module = autosa_hw_top_module_alloc();\n  top_module->hw_modules = gen->hw_modules;\n  top_module->kernel = gen->kernel;\n  top_module->n_hw_modules = gen->n_hw_modules;\n\n  for (int i = 0; i < gen->n_hw_modules; i++)\n  {\n    struct autosa_hw_module *module = gen->hw_modules[i];\n    if (module->type == PE_MODULE)\n    {\n      top_module_pe_gen(gen, top_module, gen->hw_modules[i]);\n    }\n    else\n    {\n      top_module_io_gen(gen, top_module, gen->hw_modules[i]);\n    }\n  }\n\n  return top_module;\n}\n\n/* Build new schedules for each hardware component.\n * The generated schedules are:\n * [1. the default schedule (CPU code)]\n * 2. PE schedule\n * 3. I/O module schedule\n * 4. drain module schedule\n * 5. 
top module schedule\n */\nvoid generate_hw_modules(__isl_take isl_schedule *schedule,\n                         struct autosa_gen *gen, struct autosa_kernel *kernel)\n{\n  gen->schedule = schedule;\n  gen->n_hw_modules = 1;\n  gen->hw_modules = isl_calloc_array(gen->ctx,\n                                     struct autosa_hw_module *, gen->n_hw_modules);\n  gen->hw_modules[0] = NULL;\n  \n  /* IO module */\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *info = &kernel->array[i];    \n    info->n_io_group_refs = 0;\n    for (int j = 0; j < info->n_io_group; j++)\n    {\n      int n_hw_modules = 0;\n      struct autosa_hw_module **hw_modules;\n      hw_modules = sa_io_module_gen(info->io_groups[j], gen, &n_hw_modules, 1, 1);\n\n      gen->hw_modules = (struct autosa_hw_module **)realloc(gen->hw_modules,\n                                                            (gen->n_hw_modules + n_hw_modules) * sizeof(struct autosa_hw_module *));\n      for (int k = 0; k < n_hw_modules; k++)\n      {\n        gen->hw_modules[gen->n_hw_modules + k] = hw_modules[k];\n      }\n      gen->n_hw_modules += n_hw_modules;\n      if (hw_modules)\n        free(hw_modules);\n    }    \n  }    \n\n  /* Drain module */\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *info = &kernel->array[i];\n    if (!info->drain_group)\n      continue;\n    int n_hw_modules = 0;\n    struct autosa_hw_module **hw_modules;    \n    hw_modules = sa_io_module_gen(info->drain_group, gen, &n_hw_modules, 0, 1);    \n\n    if (n_hw_modules > 0)\n    {\n      gen->hw_modules = (struct autosa_hw_module **)realloc(gen->hw_modules,\n                                                            (gen->n_hw_modules + n_hw_modules) * sizeof(struct autosa_hw_module *));\n      for (int j = 0; j < n_hw_modules; j++)\n      {\n        gen->hw_modules[gen->n_hw_modules + j] = hw_modules[j];\n      }\n      gen->n_hw_modules += n_hw_modules;\n    }\n    
if (hw_modules)\n      free(hw_modules);\n  }    \n\n  /* PE module */\n  gen->hw_modules[0] = sa_pe_module_gen(gen);  \n\n  /* Reorder the sequence of the modules. */\n  gen->hw_modules = hw_module_reorder(gen->hw_modules, gen->n_hw_modules);\n\n  /* top module */\n  struct autosa_hw_top_module *top_module = sa_top_module_gen(gen);\n  gen->hw_top_module = top_module;  \n\n  /* Generate drain merge functions. */\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *info = &kernel->array[i];\n    if (!info->drain_group)\n      continue;\n    if (info->n_mem_ports == 1)\n      continue;\n    struct autosa_drain_merge_func *func =\n        generate_drain_merge_func(info->drain_group, kernel, gen);\n    gen->drain_merge_funcs = (struct autosa_drain_merge_func **)realloc(\n        gen->drain_merge_funcs, (gen->n_drain_merge_funcs + 1) *\n                                    sizeof(struct autosa_drain_merge_func *));\n    gen->drain_merge_funcs[gen->n_drain_merge_funcs] = func;\n    gen->n_drain_merge_funcs++;\n  }\n}\n\n/* Replace any reference to an array element in the range of \"copy\"\n * by a reference to all array elements (defined by the extent of the array).\n */\nstatic __isl_give isl_union_map *approximate_copy_out(\n    __isl_take isl_union_map *copy, struct autosa_prog *prog)\n{\n  int i;\n  isl_union_map *res;\n\n  res = isl_union_map_empty(isl_union_map_get_space(copy));\n\n  for (i = 0; i < prog->n_array; ++i)\n  {\n    isl_space *space;\n    isl_set *set;\n    isl_union_map *copy_i;\n    isl_union_set *extent, *domain;\n\n    space = isl_space_copy(prog->array[i].space);\n    extent = isl_union_set_from_set(isl_set_universe(space));\n    copy_i = isl_union_map_copy(copy);\n    copy_i = isl_union_map_intersect_range(copy_i, extent);\n    set = isl_set_copy(prog->array[i].extent);\n    extent = isl_union_set_from_set(set);\n    domain = isl_union_map_domain(copy_i);\n    copy_i = isl_union_map_from_domain_and_range(domain, 
extent);\n    res = isl_union_map_union(res, copy_i);\n  }\n\n  isl_union_map_free(copy);\n\n  return res;\n}\n\n/* Internal data structure for node_may_persist.\n *\n * \"tagger\" maps tagged iteration domains to the corresponding untagged\n *\titeration domain.\n *\n * \"may_persist_flow\" is the set of all tagged dataflow dependences\n * with those dependences removed that either precede or follow\n * the kernel launch in a sequence.\n * \"inner_band_flow\" is the set of all tagged dataflow dependences\n * that are local to a given iteration of the outer band nodes\n * with respect to the current node.\n * \"local_flow\" is equal to \"inner_band_flow\", except that the domain\n * and the range have been intersected with intermediate filters\n * on children of sets or sequences.\n */\nstruct ppcg_may_persist_data\n{\n  isl_union_pw_multi_aff *tagger;\n\n  isl_union_map *local_flow;\n  isl_union_map *inner_band_flow;\n  isl_union_map *may_persist_flow;\n};\n\n/* Update the information in \"data\" based on the band ancestor \"node\".\n *\n * In particular, we restrict the dependences in data->local_flow\n * to those dependences where the source and the sink occur in\n * the same iteration of the given band node.\n * We also update data->inner_band_flow to the new value of\n * data->local_flow.\n */\nstatic int update_may_persist_at_band(__isl_keep isl_schedule_node *node,\n                                      struct ppcg_may_persist_data *data)\n{\n  isl_multi_union_pw_aff *partial;\n  isl_union_pw_multi_aff *contraction;\n  isl_union_map *flow;\n\n  if (isl_schedule_node_band_n_member(node) == 0)\n    return 0;\n\n  partial = isl_schedule_node_band_get_partial_schedule(node);\n  contraction = isl_schedule_node_get_subtree_contraction(node);\n  partial = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(partial,\n                                                               contraction);\n  partial = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(partial,\n   
                                                            isl_union_pw_multi_aff_copy(data->tagger));\n\n  flow = data->local_flow;\n  flow = isl_union_map_eq_at_multi_union_pw_aff(flow, partial);\n  data->local_flow = flow;\n\n  isl_union_map_free(data->inner_band_flow);\n  data->inner_band_flow = isl_union_map_copy(data->local_flow);\n\n  return 0;\n}\n\n/* Given a set of local reaching domain elements \"domain\",\n * expand them to the corresponding leaf domain elements using \"contraction\"\n * and insert the array references tags using data->tagger.\n */\nstatic __isl_give isl_union_set *expand_and_tag(\n    __isl_take isl_union_set *domain,\n    __isl_take isl_union_pw_multi_aff *contraction,\n    struct ppcg_may_persist_data *data)\n{\n  domain = isl_union_set_preimage_union_pw_multi_aff(domain,\n                                                     contraction);\n  domain = isl_union_set_preimage_union_pw_multi_aff(domain,\n                                                     isl_union_pw_multi_aff_copy(data->tagger));\n  return domain;\n}\n\n/* Given a filter node that is the child of a set or sequence node,\n * restrict data->local_flow to refer only to those elements\n * in the filter of the node.\n * \"contraction\" maps the leaf domain elements of the schedule tree\n * to the corresponding domain elements at (the parent of) \"node\".\n */\nstatic int filter_flow(__isl_keep isl_schedule_node *node,\n                       struct ppcg_may_persist_data *data,\n                       __isl_take isl_union_pw_multi_aff *contraction)\n{\n  isl_union_set *filter;\n  isl_union_map *flow;\n\n  flow = data->local_flow;\n  filter = isl_schedule_node_filter_get_filter(node);\n  filter = expand_and_tag(filter, contraction, data);\n  flow = isl_union_map_intersect_domain(flow, isl_union_set_copy(filter));\n  flow = isl_union_map_intersect_range(flow, filter);\n  data->local_flow = flow;\n\n  return 0;\n}\n\n/* Given a filter node \"node\", collect the filters on all 
preceding siblings\n * (which are also filter nodes), add them to \"filters\" and return the result.\n */\nstatic __isl_give isl_union_set *add_previous_filters(\n    __isl_take isl_union_set *filters, __isl_keep isl_schedule_node *node)\n{\n  isl_schedule_node *sibling;\n\n  sibling = isl_schedule_node_copy(node);\n  while (sibling && isl_schedule_node_has_previous_sibling(sibling))\n  {\n    isl_union_set *filter;\n\n    sibling = isl_schedule_node_previous_sibling(sibling);\n    filter = isl_schedule_node_filter_get_filter(sibling);\n    filters = isl_union_set_union(filters, filter);\n  }\n  isl_schedule_node_free(sibling);\n  if (!sibling)\n    return isl_union_set_free(filters);\n\n  return filters;\n}\n\n/* Given a filter node \"node\", collect the filters on all following siblings\n * (which are also filter nodes), add them to \"filters\" and return the result.\n */\nstatic __isl_give isl_union_set *add_next_filters(\n    __isl_take isl_union_set *filters, __isl_keep isl_schedule_node *node)\n{\n  isl_schedule_node *sibling;\n\n  sibling = isl_schedule_node_copy(node);\n  while (sibling && isl_schedule_node_has_next_sibling(sibling))\n  {\n    isl_union_set *filter;\n\n    sibling = isl_schedule_node_next_sibling(sibling);\n    filter = isl_schedule_node_filter_get_filter(sibling);\n    filters = isl_union_set_union(filters, filter);\n  }\n  isl_schedule_node_free(sibling);\n  if (!sibling)\n    return isl_union_set_free(filters);\n\n  return filters;\n}\n\n/* Remove those flow dependences from data->may_persist_flow\n * that flow between elements of \"domain\" within the same iteration\n * of all outer band nodes.\n * \"contraction\" maps the leaf domain elements of the schedule tree\n * to the corresponding elements \"domain\".\n */\nstatic void remove_external_flow(struct ppcg_may_persist_data *data,\n                                 __isl_take isl_union_set *domain,\n                                 __isl_keep isl_union_pw_multi_aff *contraction)\n{\n  
isl_union_map *flow;\n\n  contraction = isl_union_pw_multi_aff_copy(contraction);\n  domain = expand_and_tag(domain, contraction, data);\n  flow = isl_union_map_copy(data->local_flow);\n  flow = isl_union_map_intersect_domain(flow, isl_union_set_copy(domain));\n  flow = isl_union_map_intersect_range(flow, domain);\n\n  data->may_persist_flow = isl_union_map_subtract(data->may_persist_flow,\n                                                  flow);\n}\n\n/* Update the information in \"data\" based on the filter ancestor \"node\".\n * We only need to modify anything if the filter is the child\n * of a set or sequence node.\n *\n * In the case of a sequence, we remove the dependences between\n * statement instances that are both executed either before or\n * after the subtree that will be mapped to a kernel, within\n * the same iteration of outer bands.\n *\n * In both cases, we restrict data->local_flow to the current child.\n */\nstatic int update_may_persist_at_filter(__isl_keep isl_schedule_node *node,\n                                        struct ppcg_may_persist_data *data)\n{\n  enum isl_schedule_node_type type;\n  isl_schedule_node *parent;\n  isl_space *space;\n  isl_union_pw_multi_aff *contraction;\n  isl_union_set *before, *after, *filter;\n\n  type = isl_schedule_node_get_parent_type(node);\n  if (type != isl_schedule_node_sequence && type != isl_schedule_node_set)\n    return 0;\n\n  parent = isl_schedule_node_copy(node);\n  parent = isl_schedule_node_parent(parent);\n  contraction = isl_schedule_node_get_subtree_contraction(parent);\n  isl_schedule_node_free(parent);\n\n  if (type == isl_schedule_node_set)\n    return filter_flow(node, data, contraction);\n\n  filter = isl_schedule_node_filter_get_filter(node);\n  space = isl_union_set_get_space(filter);\n  isl_union_set_free(filter);\n  before = isl_union_set_empty(space);\n  after = isl_union_set_copy(before);\n  before = add_previous_filters(before, node);\n  after = add_next_filters(after, 
node);\n\n  remove_external_flow(data, before, contraction);\n  remove_external_flow(data, after, contraction);\n\n  return filter_flow(node, data, contraction);\n}\n\n/* Update the information in \"data\" based on the ancestor \"node\".\n */\nstatic isl_stat update_may_persist_at(__isl_keep isl_schedule_node *node,\n                                      void *user)\n{\n  struct ppcg_may_persist_data *data = (struct ppcg_may_persist_data *)user;\n\n  switch (isl_schedule_node_get_type(node))\n  {\n  case isl_schedule_node_error:\n    return isl_stat_error;\n  case isl_schedule_node_context:\n  case isl_schedule_node_domain:\n  case isl_schedule_node_expansion:\n  case isl_schedule_node_extension:\n  case isl_schedule_node_guard:\n  case isl_schedule_node_leaf:\n  case isl_schedule_node_mark:\n  case isl_schedule_node_sequence:\n  case isl_schedule_node_set:\n    break;\n  case isl_schedule_node_band:\n    if (update_may_persist_at_band(node, data) < 0)\n      return isl_stat_error;\n    break;\n  case isl_schedule_node_filter:\n    if (update_may_persist_at_filter(node, data) < 0)\n      return isl_stat_error;\n    break;\n  }\n\n  return isl_stat_ok;\n}\n\n/* Determine the set of array elements that may need to be preserved\n * by a kernel constructed from the subtree at \"node\".\n * This includes the set of array elements that may need to be preserved\n * by the entire scop (prog->may_persist) and the elements for which\n * there is a potential flow dependence that may cross a kernel launch.\n *\n * To determine the second set, we start from all flow dependences.\n * From this set of dependences, we remove those that cannot possibly\n * require data to be preserved by a kernel launch.\n * In particular, we consider the following sets of dependences.\n * - dependences of which the write occurs inside the kernel.\n *   If the data is needed outside the kernel, then it will\n *   be copied out immediately after the kernel launch, so there\n *   is no need for any 
special care.\n * - dependences of which the read occurs inside the kernel and the\n *   corresponding write occurs inside the same iteration of the\n *   outer band nodes.  This means that the data is needed in\n *   the first kernel launch after the write, which is already\n *   taken care of by the standard copy-in.  That is, the data\n *   do not need to be preserved by any intermediate call to\n *   the same kernel.\n * - dependences of which the write and the read either both occur\n *   before the kernel launch or both occur after the kernel launch,\n *   within the same iteration of the outer band nodes with respect\n *   to the sequence that determines the ordering of the dependence\n *   and the kernel launch.  Such flow dependences cannot cross\n *   any kernel launch.\n *\n * For the remaining (tagged) dependences, we take the domain\n * (i.e., the tagged writes) and apply the tagged access relation\n * to obtain the accessed data elements.\n * These are then combined with the elements that may need to be\n * preserved by the entire scop.\n */\nstatic __isl_give isl_union_set *node_may_persist(\n    __isl_keep isl_schedule_node *node, struct autosa_prog *prog)\n{\n  struct ppcg_may_persist_data data;\n  isl_union_pw_multi_aff *contraction;\n  isl_union_set *domain;\n  isl_union_set *persist;\n  isl_union_map *flow, *local_flow;\n\n  data.tagger = prog->scop->tagger;\n\n  flow = isl_union_map_copy(prog->scop->tagged_dep_flow);\n  data.local_flow = isl_union_map_copy(flow);\n  data.inner_band_flow = isl_union_map_copy(flow);\n  data.may_persist_flow = flow;\n  if (isl_schedule_node_foreach_ancestor_top_down(node,\n                                                  &update_may_persist_at, &data) < 0)\n    data.may_persist_flow =\n        isl_union_map_free(data.may_persist_flow);\n  flow = data.may_persist_flow;\n  isl_union_map_free(data.local_flow);\n\n  domain = isl_schedule_node_get_domain(node);\n  contraction = 
isl_schedule_node_get_subtree_contraction(node);\n  domain = isl_union_set_preimage_union_pw_multi_aff(domain,\n                                                     contraction);\n  domain = isl_union_set_preimage_union_pw_multi_aff(domain,\n                                                     isl_union_pw_multi_aff_copy(data.tagger));\n  /* Subtract case 1: the write occurs inside the kernel. */\n  flow = isl_union_map_subtract_domain(flow, isl_union_set_copy(domain));\n  local_flow = data.inner_band_flow;\n  local_flow = isl_union_map_intersect_range(local_flow, domain);\n  /* Subtract case 2: the read occurs inside the kernel and the write\n   * occurs in the same iteration of the outer band nodes. */\n  flow = isl_union_map_subtract(flow, local_flow);\n\n  persist = isl_union_map_domain(flow);\n  persist = isl_union_set_apply(persist,\n                                isl_union_map_copy(prog->scop->tagged_may_writes));\n  persist = isl_union_set_union(persist,\n                                isl_union_set_copy(prog->may_persist));\n\n  return persist;\n}\n\n/* Return (the universe spaces of) the arrays that are declared\n * inside the scop corresponding to \"prog\" and for which all\n * potential writes inside the scop form a subset of \"domain\".\n */\nstatic __isl_give isl_union_set *extract_local_accesses(struct autosa_prog *prog,\n                                                        __isl_keep isl_union_set *domain)\n{\n  int i;\n  isl_union_set *local;\n\n  local = isl_union_set_empty(isl_union_set_get_space(domain));\n\n  for (i = 0; i < prog->n_array; ++i)\n  {\n    isl_set *set;\n    isl_union_map *to_outer;\n    isl_union_map *may_write;\n    isl_union_set *write_domain;\n    isl_union_set *fields;\n    int subset;\n\n    if (!prog->array[i].local)\n      continue;\n\n    set = isl_set_universe(isl_space_copy(prog->array[i].space));\n    to_outer = isl_union_map_copy(prog->to_outer);\n    to_outer = isl_union_map_intersect_range(to_outer,\n                                             isl_union_set_from_set(isl_set_copy(set)));\n    fields = isl_union_map_domain(to_outer);\n    
may_write = isl_union_map_copy(prog->may_write);\n    may_write = isl_union_map_intersect_range(may_write, fields);\n    write_domain = isl_union_map_domain(may_write);\n    subset = isl_union_set_is_subset(write_domain, domain);\n    isl_union_set_free(write_domain);\n\n    if (subset < 0)\n    {\n      isl_set_free(set);\n      return isl_union_set_free(local);\n    }\n    else if (subset)\n    {\n      local = isl_union_set_add_set(local, set);\n    }\n    else\n    {\n      isl_set_free(set);\n    }\n  }\n\n  return local;\n}\n\n/* For each array in \"prog\" of which an element appears in \"accessed\" and\n * that is not a read only scalar, create a zero-dimensional universe set\n * of which the tuple id has name \"<prefix>_<name of array>\" and a user\n * pointer pointing to the array (autosa_array_info).\n *\n * If the array is local to \"prog\", then make sure it will be declared\n * in the host code.\n *\n * Return the list of these universe sets.\n */\nstatic __isl_give isl_union_set_list *create_copy_filters(struct autosa_prog *prog,\n                                                          const char *prefix, __isl_take isl_union_set *accessed)\n{\n  int i;\n  isl_ctx *ctx;\n  isl_union_set_list *filters;\n\n  ctx = prog->ctx;\n  filters = isl_union_set_list_alloc(ctx, 0);\n  for (i = 0; i < prog->n_array; ++i)\n  {\n    struct autosa_array_info *array = &prog->array[i];\n    isl_space *space;\n    isl_set *accessed_i;\n    int empty;\n    char *name;\n    isl_id *id;\n    isl_union_set *uset;\n\n    if (autosa_array_is_read_only_scalar(array))\n      continue;\n\n    space = isl_space_copy(array->space);\n    accessed_i = isl_union_set_extract_set(accessed, space);\n    empty = isl_set_plain_is_empty(accessed_i);\n    isl_set_free(accessed_i);\n    if (empty < 0)\n    {\n      filters = isl_union_set_list_free(filters);\n      break;\n    }\n    if (empty)\n      continue;\n\n    array->global = 1;\n    array->local_array->global = 1;\n    if 
(array->local)\n      array->declare_local = 1;\n    if (!strcmp(prefix, \"to_device\"))\n      array->copy_in = 1;\n    if (!strcmp(prefix, \"from_device\"))\n      array->copy_out = 1;\n\n    name = concat(ctx, prefix, array->name);\n    id = name ? isl_id_alloc(ctx, name, array) : NULL;\n    free(name);\n    space = isl_space_set_alloc(ctx, 0, 0);\n    space = isl_space_set_tuple_id(space, isl_dim_set, id);\n    uset = isl_union_set_from_set(isl_set_universe(space));\n\n    filters = isl_union_set_list_add(filters, uset);\n  }\n  isl_union_set_free(accessed);\n\n  return filters;\n}\n\n/* Return the set of parameter values for which the array has a positive\n * size in all dimensions.\n * If the sizes are only valid for some parameter values, then those\n * constraints are also taken into account.\n */\n__isl_give isl_set *autosa_array_positive_size_guard(struct autosa_array_info *array)\n{\n  int i;\n  isl_space *space;\n  isl_set *guard;\n\n  if (!array)\n    return NULL;\n\n  space = isl_space_params(isl_space_copy(array->space));\n  guard = isl_set_universe(space);\n\n  for (i = 0; i < array->n_index; ++i)\n  {\n    isl_pw_aff *bound;\n    isl_set *guard_i, *zero;\n\n    bound = isl_multi_pw_aff_get_pw_aff(array->bound, i);\n    guard_i = isl_pw_aff_nonneg_set(isl_pw_aff_copy(bound));\n    zero = isl_pw_aff_zero_set(bound);\n    guard_i = isl_set_subtract(guard_i, zero);\n    guard = isl_set_intersect(guard, guard_i);\n  }\n\n  return guard;\n}\n\n/* Make sure that code for the statements in \"filters\" that\n * copy arrays to or from the device is only generated when\n * the size of the corresponding array is positive.\n * That is, add a set node underneath \"graft\" with \"filters\" as children\n * and for each child add a guard that selects the parameter\n * values for which the corresponding array has a positive size.\n * The array is available in the user pointer of the statement identifier.\n * \"depth\" is the schedule depth of the position where 
\"graft\"\n * will be added.\n */\nstatic __isl_give isl_schedule_node *insert_positive_size_guards(\n    __isl_take isl_schedule_node *graft,\n    __isl_take isl_union_set_list *filters, int depth)\n{\n  int i, n;\n\n  graft = isl_schedule_node_child(graft, 0);\n  graft = isl_schedule_node_insert_set(graft, filters);\n  n = isl_schedule_node_n_children(graft);\n  for (i = 0; i < n; ++i)\n  {\n    isl_union_set *filter;\n    isl_set *domain, *guard;\n    isl_id *id;\n    struct autosa_array_info *array;\n\n    graft = isl_schedule_node_child(graft, i);\n    filter = isl_schedule_node_filter_get_filter(graft);\n    domain = isl_set_from_union_set(filter);\n    id = isl_set_get_tuple_id(domain);\n    array = (struct autosa_array_info *)isl_id_get_user(id);\n    isl_id_free(id);\n    isl_set_free(domain);\n    guard = autosa_array_positive_size_guard(array);\n    guard = isl_set_from_params(guard);\n    guard = isl_set_add_dims(guard, isl_dim_set, depth);\n    graft = isl_schedule_node_child(graft, 0);\n    graft = isl_schedule_node_insert_guard(graft, guard);\n    graft = isl_schedule_node_parent(graft);\n    graft = isl_schedule_node_parent(graft);\n  }\n  graft = isl_schedule_node_parent(graft);\n\n  return graft;\n}\n\n/* Create a graft for copying arrays to or from the device,\n * whenever the size of the array is strictly positive.\n * Each statement is called \"<prefix>_<name of array>\" and\n * the identifier has a user pointer pointing to the array.\n * The graft will be added at the position specified by \"node\".\n * \"copy\" contains the array elements that need to be copied.\n * Only arrays of which some elements need to be copied\n * will have a corresponding statement in the graph.\n * Note though that each such statement will copy the entire array.\n */\nstatic __isl_give isl_schedule_node *create_copy_device(struct autosa_prog *prog,\n                                                        __isl_keep isl_schedule_node *node, const char *prefix,\n      
                                                  __isl_take isl_union_set *copy)\n{\n  int depth;\n  isl_ctx *ctx;\n  isl_space *space;\n  isl_union_set *all, *domain;\n  isl_union_set_list *filters;\n  isl_union_map *extension;\n  isl_schedule_node *graft;\n\n  ctx = prog->ctx;\n  depth = isl_schedule_node_get_schedule_depth(node);\n  filters = create_copy_filters(prog, prefix, copy);\n  all = isl_union_set_list_union(isl_union_set_list_copy(filters));\n\n  space = depth < 0 ? NULL : isl_space_set_alloc(ctx, 0, depth);\n  domain = isl_union_set_from_set(isl_set_universe(space));\n  extension = isl_union_map_from_domain_and_range(domain, all);\n  graft = isl_schedule_node_from_extension(extension);\n\n  if (!filters)\n    return isl_schedule_node_free(graft);\n  if (isl_union_set_list_n_union_set(filters) == 0)\n  {\n    isl_union_set_list_free(filters);\n    return graft;\n  }\n\n  return insert_positive_size_guards(graft, filters, depth);\n}\n\n/* Add nodes for copying outer arrays in and out of the device\n * before and after the subtree \"node\", which contains one or more kernels.\n * \"domain\" contains the original statement instances, i.e.,\n * those that correspond to the domains of the access relations in \"prog\".\n * In particular, the domain has not been contracted in any way.\n * \"prefix\" contains the prefix schedule at that point, in terms\n * of the same original statement instances.\n *\n * We first compute the sets of outer array elements that need\n * to be copied in and out and then graft in the nodes for\n * performing this copying.\n *\n * In particular, for each array that is possibly written anywhere in\n * the subtree \"node\" and that may be used after \"node\"\n * or that may be visible outside the corresponding scop,\n * we copy out its entire extent.\n *\n * Any array element that is read without first being written inside\n * the subtree \"node\" needs to be copied in.\n * Furthermore, if there are any array elements that\n * are 
copied out, but that may not be written inside \"node\", then\n * they also need to be copied in to ensure that the value after execution\n * is the same as the value before execution, at least for those array\n * elements that may have their values preserved by the scop or that\n * may be written before \"node\" and read after \"node\".\n * In case the array elements are structures, we need to take into\n * account that all members of the structures need to be written\n * by \"node\" before we can avoid copying the data structure in.\n *\n * Note that the may_write relation is intersected with the domain,\n * which has been intersected with the context.\n * This helps in those cases where the arrays are declared with a fixed size,\n * while the accesses are parametric and the context assigns a fixed value\n * to the parameters.\n *\n * If an element from a local array is read without first being written,\n * then there is no point in copying it in since it cannot have been\n * written prior to the scop. Warn about the uninitialized read instead.\n */\n__isl_give isl_schedule_node *sa_add_to_from_device(\n    __isl_take isl_schedule_node *node, __isl_take isl_union_set *domain,\n    __isl_take isl_union_map *prefix, struct autosa_prog *prog)\n{\n  isl_union_set *local;\n  isl_union_set *may_persist;\n  isl_union_map *may_write, *must_write, *copy_out, *not_written;\n  isl_union_map *read, *copy_in;\n  isl_union_map *tagged;\n  isl_union_map *local_uninitialized;\n  isl_schedule_node *graft;\n\n  /* Compute the copy-out that contains the live-out union\n   * domain of non-local flow dep. 
\n   */\n  tagged = isl_union_map_copy(prog->scop->tagged_reads);\n  tagged = isl_union_map_union(tagged,\n                               isl_union_map_copy(prog->scop->tagged_may_writes));\n  may_write = isl_union_map_copy(prog->may_write);\n  may_write = isl_union_map_intersect_domain(may_write,\n                                             isl_union_set_copy(domain));\n  /* Keep only the live-out union domain of non-local flow. */\n  may_write = remove_local_accesses(prog,\n                                    isl_union_map_copy(tagged), may_write,\n                                    isl_union_map_copy(prefix), 0);\n  may_write = isl_union_map_apply_range(may_write,\n                                        isl_union_map_copy(prog->to_outer));\n  may_write = isl_union_map_apply_domain(may_write,\n                                         isl_union_map_copy(prefix));\n  may_write = approximate_copy_out(may_write, prog);\n  copy_out = isl_union_map_copy(may_write);\n\n  /* Compute the copy-in. */\n  may_write = isl_union_map_apply_range(may_write,\n                                        isl_union_map_copy(prog->to_inner));\n  must_write = isl_union_map_copy(prog->must_write);\n  must_write = isl_union_map_apply_domain(must_write,\n                                          isl_union_map_copy(prefix));\n\n  may_persist = node_may_persist(node, prog);\n  may_write = isl_union_map_intersect_range(may_write, may_persist);\n  not_written = isl_union_map_subtract(may_write, must_write);\n\n  /* Detect the uninitialized reads. */\n  /* \"local\" contains the universe sets of the arrays that are declared locally\n   * and written by \"domain\". */\n  local = extract_local_accesses(prog, domain);\n  local = isl_union_set_apply(local, isl_union_map_copy(prog->to_inner));\n  local_uninitialized = isl_union_map_copy(prog->scop->live_in);\n  /* A local uninitialized read is defined as a read of a local array without\n   * first being written. 
*/\n  local_uninitialized = isl_union_map_intersect_range(local_uninitialized,\n                                                      local);\n  read = isl_union_map_copy(prog->read);\n  read = isl_union_map_intersect_domain(read, domain);\n  read = remove_local_accesses(prog, tagged, read,\n                               isl_union_map_copy(prefix), 1);\n  local_uninitialized = isl_union_map_intersect(local_uninitialized,\n                                                isl_union_map_copy(read));\n  if (!isl_union_map_is_empty(local_uninitialized))\n  {\n    fprintf(stderr,\n            \"possibly uninitialized reads (not copied in):\\n\");\n    isl_union_map_dump(local_uninitialized);\n  }\n  read = isl_union_map_subtract(read, local_uninitialized);\n  read = isl_union_map_apply_domain(read, prefix);\n  copy_in = isl_union_map_union(read, not_written);\n  copy_in = isl_union_map_apply_range(copy_in,\n                                      isl_union_map_copy(prog->to_outer));\n\n  /* Add in the copy-in/copy-out nodes. 
*/\n  graft = create_copy_device(prog, node, \"to_device\",\n                             isl_union_map_range(copy_in));\n  node = isl_schedule_node_graft_before(node, graft);\n  graft = create_copy_device(prog, node, \"from_device\",\n                             isl_union_map_range(copy_out));\n  node = isl_schedule_node_graft_after(node, graft);\n\n  return node;\n}\n\n/* Add nodes for initializing (\"init_device\") and clearing (\"clear_device\")\n * the device before and after \"node\".\n */\n__isl_give isl_schedule_node *sa_add_init_clear_device(\n    __isl_take isl_schedule_node *node, struct autosa_kernel *kernel)\n{\n  isl_ctx *ctx;\n  isl_space *space;\n  isl_union_set *domain;\n  isl_schedule_node *graft;\n  isl_id *id;\n\n  ctx = isl_schedule_node_get_ctx(node);\n\n  space = isl_space_set_alloc(ctx, 0, 0);\n  id = isl_id_alloc(ctx, \"init_device\", kernel);\n  //space = isl_space_set_tuple_name(space, isl_dim_set, \"init_device\");\n  space = isl_space_set_tuple_id(space, isl_dim_set, id);\n  domain = isl_union_set_from_set(isl_set_universe(space));\n  graft = isl_schedule_node_from_domain(domain);\n\n  node = isl_schedule_node_graft_before(node, graft);\n\n  space = isl_space_set_alloc(ctx, 0, 0);\n  id = isl_id_alloc(ctx, \"clear_device\", kernel);\n  //space = isl_space_set_tuple_name(space, isl_dim_set, \"clear_device\");\n  space = isl_space_set_tuple_id(space, isl_dim_set, id);\n  domain = isl_union_set_from_set(isl_set_universe(space));\n  graft = isl_schedule_node_from_domain(domain);\n\n  node = isl_schedule_node_graft_after(node, graft);\n\n  return node;\n}\n\n__isl_give isl_schedule_node *sa_add_drain_merge(\n    __isl_take isl_schedule_node *node, struct autosa_gen *gen)\n{\n  isl_ctx *ctx;\n\n  ctx = isl_schedule_node_get_ctx(node);\n  for (int i = 0; i < gen->n_drain_merge_funcs; i++)\n  {\n    isl_id *id;\n    isl_space *space;\n    isl_union_set *domain;\n    isl_schedule_node *graft;\n    struct autosa_drain_merge_func *func = 
gen->drain_merge_funcs[i];\n    struct autosa_array_ref_group *group = func->group;\n    if (group->local_array->n_mem_ports == 1)\n      continue;\n    space = isl_space_set_alloc(ctx, 0, 0);\n    id = isl_id_alloc(ctx, \"drain_merge\", func);\n    space = isl_space_set_tuple_id(space, isl_dim_set, id);\n    domain = isl_union_set_from_set(isl_set_universe(space));\n    graft = isl_schedule_node_from_domain(domain);\n    node = isl_schedule_node_graft_after(node, graft);\n  }\n\n  return node;\n}\n\n/***************************************************************\n * AST Codegen\n ***************************************************************/\n/* Internal data structure for at_domain.\n * \"prog\" represents the entire scop.\n * \"kernel\" points to the kernel to which the current schedule node\n * belongs. It is set by before_mark and reset by after_mark.\n * It may be NULL if we are outside any kernel.\n */\nstruct autosa_at_domain_data\n{\n  struct autosa_prog *prog;\n  struct autosa_kernel *kernel;\n  struct autosa_hw_module *module;\n  struct autosa_hw_top_module *top;\n  struct autosa_pe_dummy_module *pe_dummy_module;\n  struct autosa_drain_merge_func *drain_merge_func;\n  int filter_buffer;\n  int boundary;\n  int pe_dummy;\n  /* In the tuning mode. 
*/\n  int tuning;\n  int tuning_num;\n\n  /* Under a \"pipeline\" mark */\n  int under_pipeline;\n  /* Under an \"unroll\" mark */\n  int under_unroll;\n  /* Inside a \"pipeline\" for loop */\n  int in_pipeline_for;\n  /* Inside an \"unroll\" for loop */\n  int in_unroll_for;\n  /* Inside a for loop */\n  int in_for;\n};\n\n/* Internal data structure for the index and AST expression transformation\n * callbacks for pet_stmt_build_ast_exprs.\n *\n * \"kernel\" is the kernel for which we are computing AST expressions and\n * may be NULL if we are not inside a kernel.\n * \"accesses\" is the list of autosa_stmt_access in the statement.\n * \"iterator_map\" expresses the statement iterators in terms of\n * the AST loop iterators.\n * \"sched2copy\" expresses the outer copy_schedule_dim dimensions of\n * the kernel schedule in terms of the AST loop iterators and\n * may be NULL if we are not inside a kernel.\n *\n * The following fields are set in transform_index and used in transform_expr.\n * \"array\" is the array that is being accessed.\n * \"global\" is set if the global array is accessed (rather than\n * shared/private memory).\n * \"local_array\" refers to information on the array specialized\n * to the current kernel.\n * \"group\" is the reference group containing the access, or NULL if\n * no such group exists.\n */\nstruct autosa_transform_data\n{\n  struct autosa_kernel *kernel;\n  struct autosa_stmt_access *accesses;\n  isl_pw_multi_aff *iterator_map;\n  isl_pw_multi_aff *sched2copy;\n\n  struct autosa_array_info *array;\n  int global;\n  int reg;\n  struct autosa_local_array_info *local_array;\n  struct autosa_array_ref_group *group;\n};\n\n/* Set *depth (initialized to 0 by the caller) to the maximum\n * of the schedule depths of the leaf nodes for which this function is called.\n */\nstatic isl_bool update_depth(__isl_keep isl_schedule_node *node, void *user)\n{\n  int *depth = (int *)user;\n  int node_depth;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n    return isl_bool_true;\n  node_depth = 
isl_schedule_node_get_schedule_depth(node);\n  if (node_depth > *depth)\n    *depth = node_depth;\n\n  return isl_bool_false;\n}\n\n/* Given a mapping \"iterator_map\" from the AST schedule to a domain,\n * return the corresponding mapping from the AST schedule\n * to the outer kernel->copy_schedule_dim dimensions of\n * the schedule computed by AutoSA for this kernel.\n *\n * Note that kernel->copy_schedule_dim is at least as large as\n * the largest depth of any array reference group associated to the kernel.\n * This is needed as the returned schedule is used to extract a mapping\n * to the outer tile->depth dimensions in transform_index.\n */\nstatic __isl_give isl_pw_multi_aff *compute_sched_to_copy(\n    struct autosa_kernel *kernel, __isl_take isl_pw_multi_aff *iterator_map)\n{\n  isl_union_pw_multi_aff *upma;\n  isl_pw_multi_aff *pma;\n  isl_space *space;\n\n  space = isl_space_range(isl_pw_multi_aff_get_space(iterator_map));\n  space = isl_space_from_domain(space);\n  space = isl_space_add_dims(space, isl_dim_out,\n                             kernel->copy_schedule_dim);\n\n  upma = isl_union_pw_multi_aff_copy(kernel->copy_schedule);\n  pma = isl_union_pw_multi_aff_extract_pw_multi_aff(upma, space);\n  isl_union_pw_multi_aff_free(upma);\n\n  return isl_pw_multi_aff_pullback_pw_multi_aff(pma, iterator_map);\n}\n\n/* Return the autosa_stmt_access in the list \"accesses\" that corresponds\n * to \"ref_id\".\n */\nstatic struct autosa_stmt_access *find_access(struct autosa_stmt_access *accesses,\n                                              __isl_keep isl_id *ref_id)\n{\n  struct autosa_stmt_access *access;\n\n  for (access = accesses; access; access = access->next)\n    if (access->ref_id == ref_id)\n      return access;\n\n  return NULL;\n}\n\n/* Return the name of the outer array (of structs) accessed by \"access\".\n */\nstatic const char *get_outer_array_name(__isl_keep isl_map *access)\n{\n  isl_space *space;\n  const char *name;\n\n  space = 
isl_space_range(isl_map_get_space(access));\n  while (space && isl_space_is_wrapping(space))\n    space = isl_space_domain(isl_space_unwrap(space));\n  name = isl_space_get_tuple_name(space, isl_dim_set);\n  isl_space_free(space);\n\n  return name;\n}\n\n/* Return the index of the array called \"name\" in the list of arrays.\n */\nstatic int find_array_index(struct autosa_kernel *kernel, const char *name)\n{\n  int i;\n\n  for (i = 0; i < kernel->n_array; ++i)\n    if (!strcmp(name, kernel->array[i].array->name))\n      return i;\n\n  return -1;\n}\n\n/* Return a pointer to the autosa_array_ref_group in \"local\"\n * that contains the reference \"access\".\n * Return NULL if no such group can be found.\n */\nstatic struct autosa_array_ref_group *find_ref_group(\n    struct autosa_local_array_info *local, struct autosa_stmt_access *access)\n{\n  int i, j;\n\n  for (i = 0; i < local->n_group; ++i)\n  {\n    struct autosa_array_ref_group *group = local->groups[i];\n\n    for (j = 0; j < group->n_ref; ++j)\n      if (group->refs[j] == access)\n        return group;\n  }\n\n  return NULL;\n}\n\n/* Given a mapping \"iterator_map\" from the AST schedule to a domain,\n * return the corresponding mapping from the AST schedule\n * to the outer group->copy_schedule_dim dimensions of\n * the schedule computed by AutoSA for this kernel.\n *\n * Note that group->copy_schedule_dim is at least as large as\n * the largest depth of any array references associated to the group.\n * This is needed as the returned schedule is used to extract a mapping\n * to the outer tile->depth dimensions in transform_index.\n */\nstatic __isl_give isl_pw_multi_aff *compute_sched_to_copy_group(\n    __isl_take isl_pw_multi_aff *iterator_map,\n    struct autosa_array_ref_group *group)\n{\n  isl_union_pw_multi_aff *upma;\n  isl_pw_multi_aff *pma;\n  isl_space *space;\n\n  space = isl_space_range(isl_pw_multi_aff_get_space(iterator_map));\n  space = isl_space_from_domain(space);\n  space = 
isl_space_add_dims(space, isl_dim_out,\n                             group->copy_schedule_dim);\n\n  upma = isl_union_pw_multi_aff_copy(group->copy_schedule);\n  pma = isl_union_pw_multi_aff_extract_pw_multi_aff(upma, space);\n  isl_union_pw_multi_aff_free(upma);\n\n  return isl_pw_multi_aff_pullback_pw_multi_aff(pma, iterator_map);\n}\n\n/* Given an index expression \"index\" of the form\n *\n *\tL -> F(A),\n *\n * with F(A) either A or some subfield of A and L the AST loop iterators,\n * and a tiling \"tiling\" of the form\n *\n *\t[L -> A] -> T\n *\n * apply the tiling to the outer array in the index expression to obtain\n *\n *\tL -> T(A)\n *\n * If F(A) is some subfield of A, then separate the member access\n * into the base index expression and the field index expression,\n * apply the tiling to the base index expression and combine the result\n * with the field index expression.\n *\n * If F(A) is A, then modify index to keep track of the iterators\n *\n *\tL -> [L -> A]\n *\n * and combine the result with the tiling to obtain a tiled index expression\n * in terms of the AST loop iterators\n *\n *\tL -> T\n */\nstatic __isl_give isl_multi_pw_aff *tile_outer(\n    __isl_take isl_multi_pw_aff *index, __isl_take isl_multi_pw_aff *tiling)\n{\n  isl_bool is_wrapping;\n  isl_space *space;\n  isl_multi_pw_aff *mpa;\n\n  is_wrapping = isl_multi_pw_aff_range_is_wrapping(index);\n  if (is_wrapping < 0)\n    goto error;\n  if (is_wrapping)\n  {\n    isl_multi_pw_aff *field;\n\n    field = isl_multi_pw_aff_copy(index);\n    field = isl_multi_pw_aff_range_factor_range(field);\n    index = isl_multi_pw_aff_range_factor_domain(index);\n    index = tile_outer(index, tiling);\n    return isl_multi_pw_aff_range_product(index, field);\n  }\n\n  space = isl_space_domain(isl_multi_pw_aff_get_space(index));\n  space = isl_space_map_from_set(space);\n  mpa = isl_multi_pw_aff_identity(space);\n  index = isl_multi_pw_aff_range_product(mpa, index);\n  index = 
isl_multi_pw_aff_pullback_multi_pw_aff(tiling, index);\n\n  return index;\nerror:\n  isl_multi_pw_aff_free(index);\n  isl_multi_pw_aff_free(tiling);\n  return NULL;\n}\n\n/* Index transformation callback for pet_stmt_build_ast_exprs.\n *\n * \"index\" expresses the array indices in terms of statement iterators.\n *\n * We first reformulate \"index\" in terms of the AST loop iterators.\n * Then we check if we are accessing the global array or\n * a shared/private copy.  In particular, if we are not inside a kernel\n * then we must be accessing a global array.\n * In the former case, we simply return\n * the updated index.  If \"index\" is an affine expression rather\n * than an array access, then we also return the updated index here.\n *\n * If no reference groups have been computed for the array,\n * then we can only be accessing the global array.\n *\n * Otherwise, we apply the tiling to the index.\n * This tiling is of the form\n *\n *\t[D -> A] -> T\n *\n * where D corresponds to the outer tile->depth dimensions of\n * the kernel schedule.\n * The index is of the form\n *\n *\tL -> A\n *\n * We update the tiling to refer to the AST loop iterators\n *\n *\t[L -> A] -> T\n *\n * and combine it with the index to obtain a tiled index expression in terms\n * of the AST loop iterators\n *\n *\tL -> T\n *\n * Note that while the tiling applies directly to an outer array,\n * the index may refer to some subfield of this outer array.\n * In such cases, the result will refer to the same subfield of the tile.\n * That is, an index expression of the form L -> F(A) will be transformed\n * into an index expression of the form L -> F(T).\n */\nstatic __isl_give isl_multi_pw_aff *transform_index(\n    __isl_take isl_multi_pw_aff *index, __isl_keep isl_id *ref_id,\n    void *user)\n{\n  struct autosa_transform_data *data = (struct autosa_transform_data *)user;\n  struct autosa_stmt_access *access;\n  struct autosa_array_ref_group *group;\n  struct autosa_array_tile *tile;\n  
isl_pw_multi_aff *iterator_map;\n  int i;\n  int dim;\n  const char *name;\n  isl_space *space;\n  isl_multi_pw_aff *tiling;\n  isl_pw_multi_aff *pma;\n  isl_pw_multi_aff *sched2depth;\n  isl_pw_multi_aff *sched2copy;\n\n  data->array = NULL;\n\n  iterator_map = isl_pw_multi_aff_copy(data->iterator_map);\n  index = isl_multi_pw_aff_pullback_pw_multi_aff(index, iterator_map);\n\n  if (!data->kernel)\n    return index;\n\n  access = find_access(data->accesses, ref_id);\n  if (!access)\n    return index;\n  if (!isl_map_has_tuple_name(access->access, isl_dim_out))\n    return index;\n\n  name = get_outer_array_name(access->access);\n  if (!name)\n    return isl_multi_pw_aff_free(index);\n  i = find_array_index(data->kernel, name);\n  if (i < 0)\n    isl_die(isl_multi_pw_aff_get_ctx(index), isl_error_internal,\n            \"cannot find array\",\n            return isl_multi_pw_aff_free(index));\n  data->local_array = &data->kernel->array[i];\n  data->array = data->local_array->array;\n  group = find_ref_group(data->local_array, access);\n  data->group = group;\n  if (!group)\n  {\n    data->global = 1;\n    data->reg = 1;\n    return index;\n  }\n\n  tile = autosa_array_ref_group_tile(group);\n  data->global = !tile;\n  data->reg = !tile;\n  if (!tile)\n    return index;\n\n  /* recompute the sched2copy for each index. 
*/\n  if (group->group_type == AUTOSA_PE_GROUP) {\n    sched2copy = compute_sched_to_copy_group(isl_pw_multi_aff_copy(data->iterator_map), group);\n  }\n\n  space = isl_space_domain(isl_multi_aff_get_space(tile->tiling));\n  space = isl_space_range(isl_space_unwrap(space));\n  space = isl_space_map_from_set(space);\n  pma = isl_pw_multi_aff_identity(space);\n  if (group->group_type == AUTOSA_PE_GROUP) {\n    sched2depth = sched2copy;\n  } else {\n    sched2depth = isl_pw_multi_aff_copy(data->sched2copy);\n  }\n  dim = isl_pw_multi_aff_dim(sched2depth, isl_dim_out);\n  sched2depth = isl_pw_multi_aff_drop_dims(sched2depth, isl_dim_out,\n                                           tile->depth, dim - tile->depth);\n  pma = isl_pw_multi_aff_product(sched2depth, pma);\n  tiling = isl_multi_pw_aff_from_multi_aff(\n      isl_multi_aff_copy(tile->tiling));\n  tiling = isl_multi_pw_aff_pullback_pw_multi_aff(tiling, pma);\n\n  index = tile_outer(index, tiling);\n\n  return index;\n}\n\n/* Dereference \"expr\" by adding an index [0].\n * The original \"expr\" is assumed not to have any indices.\n *\n * If \"expr\" is a member access, then the dereferencing needs\n * to be applied to the structure argument of this member access.\n */\nstatic __isl_give isl_ast_expr *dereference(__isl_take isl_ast_expr *expr)\n{\n  isl_ctx *ctx;\n  isl_ast_expr *arg0, *res;\n  isl_ast_expr_list *list;\n\n  arg0 = isl_ast_expr_get_op_arg(expr, 0);\n  if (!arg0)\n    return isl_ast_expr_free(expr);\n  if (isl_ast_expr_get_type(arg0) == isl_ast_expr_op &&\n      isl_ast_expr_get_op_type(arg0) == isl_ast_op_member)\n  {\n    isl_ast_expr *arg;\n\n    arg = isl_ast_expr_get_op_arg(arg0, 0);\n    arg = dereference(arg);\n    arg0 = isl_ast_expr_set_op_arg(arg0, 0, arg);\n    expr = isl_ast_expr_set_op_arg(expr, 0, arg0);\n\n    return expr;\n  }\n  isl_ast_expr_free(arg0);\n\n  ctx = 
isl_ast_expr_get_ctx(expr);\n  res = isl_ast_expr_from_val(isl_val_zero(ctx));\n  list = isl_ast_expr_list_from_ast_expr(res);\n  res = isl_ast_expr_get_op_arg(expr, 0);\n  res = isl_ast_expr_access(res, list);\n  isl_ast_expr_free(expr);\n\n  return res;\n}\n\n/* Linearize the index expression \"expr\" based on the array bounds\n * of \"array\".\n *\n * That is, transform expression\n *\n *\tA[i_0][i_1]...[i_n]\n *\n * to\n *\n *\tA[(..((i_0 * b_1 + i_1) ... ) * b_n + i_n]\n *\n * where b_0, b_1, ..., b_n are the bounds on the array.\n *\n * If the base of \"expr\" is a member access, then the linearization needs\n * to be applied to the structure argument of this member access.\n *\n * In the base case, if \"expr\" has no arguments (other than the name of\n * the array), then we are passing an entire array to a function.\n * In this case, there is nothing to linearize.\n * Note that at this point an expression with no arguments can\n * only be an entire array because the scalar case and\n * the case of single struct are handled by the caller.\n *\n * If the number of specified index expressions in \"expr\"\n * is smaller than the dimension of the accessed array,\n * then the missing i_j also do not appear in the linearized expression.\n * Furthermore, since such an expression does not refer to a single\n * element while the default linearized expression would refer to\n * a single element, we return the expression\n *\n *\tA + (..((i_0 * b_1 + i_1) ... ) * b_l + i_l)\n *\n * instead.  
Note that because of the special case handling above,\n * we can assume here that there is at least one index expression.\n */\n__isl_give isl_ast_expr *autosa_local_array_info_linearize_index(\n    struct autosa_local_array_info *array, __isl_take isl_ast_expr *expr)\n{\n  int i, n;\n  isl_ast_expr *arg0;\n  isl_ast_expr *res;\n  isl_ast_expr_list *list;\n\n  arg0 = isl_ast_expr_get_op_arg(expr, 0);\n  if (isl_ast_expr_get_type(arg0) == isl_ast_expr_op &&\n      isl_ast_expr_get_op_type(arg0) == isl_ast_op_member)\n  {\n    isl_ast_expr *arg;\n\n    arg = isl_ast_expr_get_op_arg(arg0, 0);\n    arg = autosa_local_array_info_linearize_index(array, arg);\n    arg0 = isl_ast_expr_set_op_arg(arg0, 0, arg);\n    expr = isl_ast_expr_set_op_arg(expr, 0, arg0);\n\n    return expr;\n  }\n  isl_ast_expr_free(arg0);\n\n  if (isl_ast_expr_get_op_n_arg(expr) == 1)\n    return expr;\n\n  n = isl_ast_expr_get_op_n_arg(expr);\n  res = isl_ast_expr_get_op_arg(expr, 1);\n  for (i = 1; i < array->n_index; ++i)\n  {\n    isl_ast_expr *expr_i;\n\n    expr_i = isl_ast_expr_get_op_arg(array->bound_expr, 1 + i);\n    res = isl_ast_expr_mul(res, expr_i);\n\n    if (i + 1 >= n)\n      continue;\n    expr_i = isl_ast_expr_get_op_arg(expr, i + 1);\n    res = isl_ast_expr_add(res, expr_i);\n  }\n\n  if (1 + array->n_index > n)\n  {\n    res = isl_ast_expr_add(isl_ast_expr_get_op_arg(expr, 0), res);\n  }\n  else\n  {\n    list = isl_ast_expr_list_from_ast_expr(res);\n    res = isl_ast_expr_get_op_arg(expr, 0);\n    res = isl_ast_expr_access(res, list);\n  }\n\n  isl_ast_expr_free(expr);\n\n  return res;\n}\n\n/* AST expression transformation callback for pet_stmt_build_ast_exprs.\n *\n * If the AST expression refers to an array that is not accessed\n * at all, then this means the value of the expression is not used,\n * so we might as well print zero (NULL pointer) instead.\n *\n * If the AST expression refers to a global scalar that is not\n * a read-only scalar, then its address was passed to 
the kernel and\n * we need to dereference it.\n *\n * If the AST expression refers to an access to a global array,\n * then we linearize the access exploiting the bounds in data->local_array.\n */\nstatic __isl_give isl_ast_expr *transform_expr(__isl_take isl_ast_expr *expr,\n                                               __isl_keep isl_id *id, void *user)\n{\n  struct autosa_transform_data *data = (struct autosa_transform_data *)user;\n\n  if (!data->array)\n    return expr;\n\n  if (!data->array->accessed)\n  {\n    isl_ctx *ctx;\n\n    ctx = isl_ast_expr_get_ctx(expr);\n    isl_ast_expr_free(expr);\n    return isl_ast_expr_from_val(isl_val_zero(ctx));\n  }\n  if (autosa_array_is_read_only_scalar(data->array))\n    return expr;\n  if (!data->global)\n    return expr;\n  if (data->array->n_index == 0)\n    return dereference(expr);\n  if (!data->array->linearize)\n    return expr;\n\n  return autosa_local_array_info_linearize_index(data->local_array, expr);\n}\n\n/* This function is called for each instance of a user statement\n * in the kernel \"kernel\", identified by \"autosa_stmt\".\n * \"kernel\" may be NULL if we are not inside a kernel.\n *\n * We attach a struct autosa_kernel_stmt to the \"node\", containing\n * a computed AST expression for each access, through an annotation\n * with name \"user\".\n * These AST expressions are computed from iterator_map,\n * which expresses the domain elements in terms of the generated loops, \n * and sched2copy, which expresses the outer copy_schedule_dim dimensions of\n * the kernel schedule computed by AutoSA in terms of the generated loops.\n */\nstatic __isl_give isl_ast_node *create_domain_leaf(\n    struct autosa_kernel *kernel, __isl_take isl_ast_node *node,\n    __isl_keep isl_ast_build *build, struct autosa_stmt *autosa_stmt)\n{\n  struct autosa_transform_data data;\n  struct autosa_kernel_stmt *stmt;\n  isl_ctx *ctx;\n  isl_id *id;\n  isl_pw_multi_aff *sched2copy;\n  isl_map *map;\n  isl_pw_multi_aff 
*iterator_map;\n  isl_union_map *schedule;\n\n  if (!node)\n    return NULL;\n  ctx = isl_ast_node_get_ctx(node);\n\n  stmt = isl_calloc_type(ctx, struct autosa_kernel_stmt);\n  if (!stmt)\n    return isl_ast_node_free(node);\n\n  schedule = isl_ast_build_get_schedule(build);\n  map = isl_map_reverse(isl_map_from_union_map(schedule));\n  iterator_map = isl_pw_multi_aff_from_map(map);\n  if (kernel)\n    sched2copy = compute_sched_to_copy(kernel,\n                                       isl_pw_multi_aff_copy(iterator_map));\n  else\n    sched2copy = NULL;\n\n  stmt->type = AUTOSA_KERNEL_STMT_DOMAIN;\n  stmt->u.d.stmt = autosa_stmt;\n\n  data.kernel = kernel;\n  data.accesses = stmt->u.d.stmt->accesses;\n  data.iterator_map = iterator_map;\n  data.sched2copy = sched2copy;\n  stmt->u.d.ref2expr = pet_stmt_build_ast_exprs(stmt->u.d.stmt->stmt,\n                                                build, &transform_index, &data,\n                                                &transform_expr, &data);\n  isl_pw_multi_aff_free(iterator_map);\n  isl_pw_multi_aff_free(sched2copy);\n\n  id = isl_id_alloc(ctx, \"user\", stmt);\n  id = isl_id_set_free_user(id, &autosa_kernel_stmt_free);\n  if (!id)\n    autosa_kernel_stmt_free(stmt);\n  return isl_ast_node_set_annotation(node, id);\n}\n\n/* Does \"array\" need to be allocated on the device?\n * If it is a read-only scalar, then it will be passed as an argument\n * to the kernel and therefore does not require any allocation.\n * If this device memory is not accessed at all, then it does not\n * need to be allocated either.\n */\nint autosa_array_requires_device_allocation(struct autosa_array_info *array)\n{\n  if (autosa_array_is_read_only_scalar(array))\n    return 0;\n  if (!array->global)\n    return 0;\n  return 1;\n}\n\n/* Build AST expressions for the device array sizes of all arrays in \"prog\"\n * that require allocation on the device using \"build\", as well as\n * for the original array sizes of all arrays that need to be 
declared\n * on the host.\n * \"node\" is freed in case of error.\n */\nstatic __isl_give isl_ast_node *build_array_bounds(\n    __isl_take isl_ast_node *node, struct autosa_prog *prog,\n    __isl_keep isl_ast_build *build)\n{\n  int i;\n\n  for (i = 0; i < prog->n_array; ++i)\n  {\n    struct autosa_array_info *array = &prog->array[i];\n    isl_multi_pw_aff *size;\n    isl_ast_expr *expr;\n\n    if (!autosa_array_requires_device_allocation(array))\n      continue;\n\n    size = isl_multi_pw_aff_copy(array->bound);\n    expr = ppcg_build_size_expr(size, build);\n    array->bound_expr = expr;\n    if (!expr)\n      return isl_ast_node_free(node);\n  }\n\n  for (i = 0; i < prog->n_array; ++i)\n  {\n    struct autosa_array_info *array = &prog->array[i];\n    isl_set *extent;\n    isl_multi_pw_aff *size;\n    isl_ast_expr *expr;\n\n    if (!array->declare_local)\n      continue;\n    extent = isl_set_copy(array->declared_extent);\n    size = ppcg_size_from_extent(extent);\n    expr = ppcg_build_size_expr(size, build);\n    array->declared_size = expr;\n    if (!expr)\n      return isl_ast_node_free(node);\n  }\n\n  return node;\n}\n\n/* This function is called for each statement node in the AST\n * for copying to or from local memory.\n * Attach a pointer to an autosa_kernel_stmt representing the copy\n * statement to the node.\n * The statement name is \"read\" or \"write\", depending on whether we are\n * reading from global memory or writing to global memory.\n *\n * The schedule is of the form\n *\n *\ttype[D -> A] -> L\n *\n * where D corresponds to the outer tile->depth dimensions of\n * the kernel schedule, A to the global array and L to the outer\n * generated AST schedule.\n * We compute the inverse and strip off the type, resulting in\n *\n *\tL -> [D -> A]\n *\n * We combine this mapping with, on the one hand, the projection\n *\n *\t[D -> A] -> A\n *\n * and, on the other hand, the group tiling\n *\n *\t[D -> A] -> T\n *\n * resulting in\n *\n *\tL -> A\t\tand 
\tL -> T\n *\n * and store the corresponding expressions in stmt->index and stmt->local_index,\n * where stmt points to the ppcg_kernel_stmt that is attached to the node.\n * stmt->index is linearized if the global memory array is linearized.\n */\nstatic __isl_give isl_ast_node *create_access_leaf(struct autosa_kernel *kernel,\n                                                   struct autosa_array_ref_group *group, __isl_take isl_ast_node *node,\n                                                   __isl_keep isl_ast_build *build)\n{\n  struct autosa_kernel_stmt *stmt;\n  struct autosa_array_tile *tile;\n  isl_id *id;\n  isl_ast_expr *expr;\n  isl_space *space;\n  isl_map *access;\n  isl_pw_multi_aff *pma, *pma2;\n  const char *type;\n\n  stmt = isl_calloc_type(kernel->ctx, struct autosa_kernel_stmt);\n  if (!stmt)\n    return isl_ast_node_free(node);\n\n  /* type[D -> A] -> L */\n  access = isl_map_from_union_map(isl_ast_build_get_schedule(build));\n  type = isl_map_get_tuple_name(access, isl_dim_in);\n  stmt->u.c.read = type && !strcmp(type, \"read\");\n  /* L -> type[D -> A] */\n  access = isl_map_reverse(access);\n  pma = isl_pw_multi_aff_from_map(access);\n  pma = isl_pw_multi_aff_reset_tuple_id(pma, isl_dim_out);\n  space = isl_space_range(isl_pw_multi_aff_get_space(pma));\n  space = isl_space_unwrap(space);\n  /* [D -> A] -> A */\n  pma2 = isl_pw_multi_aff_range_map(space);\n  /* L -> A */\n  pma2 = isl_pw_multi_aff_pullback_pw_multi_aff(pma2,\n                                                isl_pw_multi_aff_copy(pma));\n  expr = isl_ast_build_access_from_pw_multi_aff(build, pma2);\n  if (group->array->linearize)\n    expr = autosa_local_array_info_linearize_index(group->local_array,\n                                                   expr);\n  stmt->u.c.index = expr;\n\n  tile = autosa_array_ref_group_tile(group);\n  /* [D -> A] -> T */\n  pma2 = isl_pw_multi_aff_from_multi_aff(\n      isl_multi_aff_copy(tile->tiling));\n  /* L -> T */\n  pma2 = 
isl_pw_multi_aff_pullback_pw_multi_aff(pma2, pma);\n  expr = isl_ast_build_access_from_pw_multi_aff(build, pma2);\n  stmt->u.c.local_index = expr;\n\n  stmt->u.c.array = group->array;\n  stmt->u.c.local_array = group->local_array;\n  stmt->type = AUTOSA_KERNEL_STMT_COPY;\n\n  id = isl_id_alloc(kernel->ctx, \"copy\", stmt);\n  id = isl_id_set_free_user(id, &autosa_kernel_stmt_free);\n  if (!id)\n    autosa_kernel_stmt_free(stmt);\n  return isl_ast_node_set_annotation(node, id);\n}\n\n/* This function is called for each instance of a user statement\n * in the kernel. This may be one of the original user statements\n * or a statement introduced by AutoSA.\n *\n * We first check if the statement id corresponds to a autosa statement,\n * which indicates the statement is an original user statement. Any statement\n * that is not an original user statement has been introduced by AutoSA and\n * requires special handling.\n *\n * If the user statement is one of the original user statements, then we call\n * create_domain_leaf.  \n * If it is \"init_device\", then we call build_array_bounds.  \n * Otherwise, we check if it is a copy statement and call the appropriate \n * functions.  \n * Statements that copy an array to/from the device do not need any \n * further treatment. 
Neither does \"clear_device\".\n */\nstatic __isl_give isl_ast_node *at_domain(__isl_take isl_ast_node *node,\n                                          __isl_keep isl_ast_build *build, void *user)\n{\n  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n  struct autosa_stmt *device_stmt;\n  isl_ast_expr *expr, *arg;\n  isl_id *id;\n  int is_sync;\n  const char *name;\n  void *p;\n\n  expr = isl_ast_node_user_get_expr(node);\n  arg = isl_ast_expr_get_op_arg(expr, 0);\n  id = isl_ast_expr_get_id(arg);\n  name = isl_id_get_name(id);\n  p = isl_id_get_user(id);\n  isl_ast_expr_free(expr);\n  isl_ast_expr_free(arg);\n\n  device_stmt = find_stmt(data->prog, id);\n  isl_id_free(id);\n\n  if (device_stmt)\n    return create_domain_leaf(data->kernel, node, build, device_stmt);\n  if (!prefixcmp(name, \"to_device_\") || !prefixcmp(name, \"from_device_\"))\n    return node;\n  if (!strcmp(name, \"init_device\"))\n    return build_array_bounds(node, data->prog, build);\n  if (!strcmp(name, \"clear_device\"))\n    return node;\n  if (!strcmp(name, \"drain_merge\"))\n    return node;\n  if (!strcmp(name, \"read\") || !strcmp(name, \"write\"))\n  {\n    struct autosa_array_ref_group *group = (struct autosa_array_ref_group *)p;\n    return create_access_leaf(data->kernel, group, node, build);\n  }\n\n  return node;\n}\n\n/* Build an access AST expression for the effective grid size using \"build\".\n * Store the result in kernel->grid_size_expr.\n */\nstatic isl_stat build_grid_size(struct autosa_kernel *kernel,\n                                __isl_keep isl_ast_build *build)\n{\n  isl_multi_pw_aff *size;\n\n  size = isl_multi_pw_aff_copy(kernel->grid_size);\n  size = isl_multi_pw_aff_set_tuple_name(size, isl_dim_out, \"grid\");\n  kernel->grid_size_expr = ppcg_build_size_expr(size, build);\n\n  if (!kernel->grid_size_expr)\n    return isl_stat_error;\n  return isl_stat_ok;\n}\n\n/* Build access AST expressions for the localized array sizes using 
\"build\".\n * Store the result in local->bound_expr.\n * Only do this for arrays for which localized bounds have been computed.\n */\nstatic isl_stat build_local_array_sizes(struct autosa_kernel *kernel,\n                                        __isl_keep isl_ast_build *build)\n{\n  int i;\n\n  for (i = 0; i < kernel->n_array; ++i)\n  {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    isl_multi_pw_aff *size;\n\n    if (local->n_group == 0)\n      continue;\n    size = isl_multi_pw_aff_copy(local->bound);\n    local->bound_expr = ppcg_build_size_expr(size, build);\n    if (!local->bound_expr)\n      return isl_stat_error;\n  }\n\n  return isl_stat_ok;\n}\n\n/* Build access AST expressions for the effective grid size and\n * the localized array sizes using \"build\".\n */\nstatic isl_stat build_grid_and_local_array_sizes(struct autosa_kernel *kernel,\n                                                 __isl_keep isl_ast_build *build)\n{\n  if (build_grid_size(kernel, build) < 0)\n    return isl_stat_error;\n  if (build_local_array_sizes(kernel, build) < 0)\n    return isl_stat_error;\n  return isl_stat_ok;\n}\n\n/* This function is called before the AST generator starts traversing\n * the schedule subtree of a node with mark \"mark\".\n *\n * If the mark is called \"kernel\", store the kernel pointer in data->kernel\n * for use in at_domain and build AST expressions for the grid size and\n * the localized array sizes.\n */\nstatic isl_stat before_mark(__isl_keep isl_id *mark,\n                            __isl_keep isl_ast_build *build, void *user)\n{\n  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n\n  if (!mark)\n    return isl_stat_error;\n  if (!strcmp(isl_id_get_name(mark), \"kernel\"))\n  {\n    data->kernel = (struct autosa_kernel *)isl_id_get_user(mark);\n    if (build_grid_and_local_array_sizes(data->kernel, build) < 0)\n      return isl_stat_error;\n  }\n  return isl_stat_ok;\n}\n\n/* This function is called 
after the AST generator has finished traversing\n * the schedule subtree of a mark node. \"node\" points to the corresponding\n * mark AST node.\n *\n * If the mark is called \"kernel\", then replace \"node\" by a user node\n * that \"calls\" the kernel, representing the launch of the kernel.\n * The original \"node\" is stored inside the kernel object so that\n * it can be used to print the device code.\n * Note that this assumes that a kernel is only launched once.\n * Also clear data->kernel.\n */\nstatic __isl_give isl_ast_node *after_mark(__isl_take isl_ast_node *node,\n                                           __isl_keep isl_ast_build *build, void *user)\n{\n  isl_ctx *ctx;\n  isl_id *id;\n  isl_ast_expr *expr;\n  isl_ast_expr_list *list;\n  struct autosa_kernel *kernel;\n  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n\n  ctx = isl_ast_node_get_ctx(node);\n  id = isl_ast_node_mark_get_id(node);\n  if (!id)\n    return isl_ast_node_free(node);\n  if (strcmp(isl_id_get_name(id), \"kernel\") || !data->kernel)\n  {\n    isl_id_free(id);\n    return node;\n  }\n  kernel = data->kernel;\n  data->kernel = NULL;\n  kernel->space = isl_ast_build_get_schedule_space(build);\n  kernel->tree = isl_ast_node_mark_get_node(node);\n  isl_ast_node_free(node);\n  expr = isl_ast_expr_from_id(isl_id_copy(id));\n  list = isl_ast_expr_list_alloc(ctx, 0);\n  expr = isl_ast_expr_call(expr, list);\n  node = isl_ast_node_alloc_user(expr);\n  node = isl_ast_node_set_annotation(node, id);\n\n  return node;\n}\n\n/* Use isl to generate code for both the host and the device\n * from \"schedule\".\n * The device code is marked by \"kernel\" mark nodes in the schedule tree,\n * containing a pointer to an autosa_kernel object.\n * The returned AST only contains the AST for the host code.\n * The ASTs for the device code are embedded in autosa_kernel objects\n * attached to the leaf nodes that call \"kernel\".\n */\n__isl_give isl_ast_node 
*sa_generate_code(struct autosa_gen *gen,\n                                          __isl_take isl_schedule *schedule)\n{\n  struct autosa_at_domain_data data;\n  isl_ast_build *build;\n  isl_ast_node *tree;\n  isl_id_list *iterators;\n  int depth;\n\n  if (schedule == NULL)\n    return NULL;\n\n  data.prog = gen->prog;\n  data.kernel = NULL;\n\n  depth = 0;\n  if (isl_schedule_foreach_schedule_node_top_down(schedule, &update_depth,\n                                                  &depth) < 0)\n    schedule = isl_schedule_free(schedule);\n  build = isl_ast_build_alloc(gen->prog->ctx);\n  iterators = ppcg_scop_generate_names(gen->prog->scop, depth, \"c\");\n  build = isl_ast_build_set_iterators(build, iterators);\n  build = isl_ast_build_set_at_each_domain(build, &at_domain, &data);\n  build = isl_ast_build_set_before_each_mark(build, &before_mark, &data);\n  build = isl_ast_build_set_after_each_mark(build, &after_mark, &data);\n  if (gen->prog->scop->options->debug->dump_final_schedule)\n    isl_schedule_dump(schedule);\n  tree = isl_ast_build_node_from_schedule(build, schedule);\n  isl_ast_build_free(build);\n\n  return tree;\n}\n\n/* Initialize the autosa_at_domain_data struct. 
*/\nstatic void autosa_at_domain_data_init(\n    struct autosa_at_domain_data *data, struct autosa_gen *gen)\n{\n  data->prog = gen->prog;\n  data->kernel = NULL;\n  data->module = NULL;\n  data->filter_buffer = 0;\n  data->under_unroll = 0;\n  data->under_pipeline = 0;\n  data->in_unroll_for = 0;\n  data->in_pipeline_for = 0;\n  data->in_for = 0;\n  data->boundary = 0;\n  data->pe_dummy = 0;\n  data->pe_dummy_module = NULL;\n  data->drain_merge_func = NULL;\n  data->tuning = 0;\n  data->tuning_num = 0;\n}\n\n/* Return a pointer to the autosa_array_ref_group in \"local\"\n * that contains the reference \"access\".\n * Return NULL if no such group can be found.\n */\nstatic struct autosa_array_ref_group *find_ref_group_module(\n    struct autosa_local_array_info *local, struct autosa_stmt_access *access)\n{\n  int i, j;\n\n  for (i = 0; i < local->n_pe_group; ++i)\n  {\n    struct autosa_array_ref_group *group = local->pe_groups[i];\n\n    for (j = 0; j < group->n_ref; ++j)\n      if (group->refs[j] == access)\n        return group;\n  }\n\n  return NULL;\n}\n\n/* Index transformation callback for pet_stmt_build_ast_exprs.\n *\n * \"index\" expresses the array indices in terms of statement iterators\n *\n * We first reformulate \"index\" in terms of the AST loop iterators.\n * Then we check if we are accessing the global array or\n * a shared/private copy.  In particular, if we are not inside a kernel\n * then we must be accessing a global array.\n * In the former case, we simply return\n * the updated index.  
If \"index\" is an affine expression rather\n * than an array access, then we also return the updated index here.\n *\n * If no reference groups have been computed for the array,\n * then we can only be accessing the global array.\n *\n * Otherwise, we apply the tiling to the index.\n * This tiling is of the form\n *\n *\t[D -> A] -> T\n *\n * where D corresponds to the outer tile->depth dimensions of\n * the kernel schedule.\n * The index is of the form\n *\n *\tL -> A\n *\n * We update the tiling to refer to the AST loop iterators\n *\n *\t[L -> A] -> T\n *\n * and combine it with the index to obtain a tiled index expression in terms\n * of the AST loop iterators\n *\n *\tL -> T\n *\n * Note that while the tiling applies directly to an outer array,\n * the index may refer to some subfield of this outer array.\n * In such cases, the result will refer to the same subfield of the tile.\n * That is, an index expression of the form L -> F(A) will be transformed\n * into an index expression of the form L -> F(T).\n */\nstatic __isl_give isl_multi_pw_aff *transform_index_module(\n    __isl_take isl_multi_pw_aff *index, __isl_keep isl_id *ref_id,\n    void *user)\n{\n  struct autosa_transform_data *data = (struct autosa_transform_data *)user;\n  struct autosa_stmt_access *access;\n  struct autosa_array_ref_group *group;\n  struct autosa_array_tile *tile;\n  isl_pw_multi_aff *iterator_map;\n  int i;\n  int dim;\n  const char *name;\n  isl_space *space;\n  isl_multi_pw_aff *tiling;\n  isl_pw_multi_aff *pma;\n  isl_pw_multi_aff *sched2depth;\n  isl_pw_multi_aff *sched2copy;\n\n  data->array = NULL;\n\n  iterator_map = isl_pw_multi_aff_copy(data->iterator_map);\n  index = isl_multi_pw_aff_pullback_pw_multi_aff(index, iterator_map);\n\n  if (!data->kernel)\n    return index;\n\n  access = find_access(data->accesses, ref_id);\n  if (!access)\n    return index;\n  if (!isl_map_has_tuple_name(access->access, isl_dim_out))\n    return index;\n\n  name = 
get_outer_array_name(access->access);\n  if (!name)\n    return isl_multi_pw_aff_free(index);\n  i = find_array_index(data->kernel, name);\n  if (i < 0)\n    isl_die(isl_multi_pw_aff_get_ctx(index), isl_error_internal,\n            \"cannot find array\",\n            return isl_multi_pw_aff_free(index));\n  data->local_array = &data->kernel->array[i];\n  data->array = data->local_array->array;\n\n  group = find_ref_group_module(data->local_array, access);\n  data->group = group;\n  if (!group)\n  {\n    data->global = 1;\n    data->reg = 1;\n    return index;\n  }\n\n  tile = autosa_array_ref_group_tile(group);\n  data->global = !tile;\n  data->reg = !tile;\n  if (!tile)\n    return index;\n\n  /* recompute the sched2copy for each index. */\n  if (group->group_type == AUTOSA_PE_GROUP)\n  {    \n    sched2copy = compute_sched_to_copy_group(\n        isl_pw_multi_aff_copy(data->iterator_map), group);    \n  }\n\n  space = isl_space_domain(isl_multi_aff_get_space(tile->tiling));\n  space = isl_space_range(isl_space_unwrap(space));\n  space = isl_space_map_from_set(space);\n  pma = isl_pw_multi_aff_identity(space);\n  if (group->group_type == AUTOSA_PE_GROUP)\n  {\n    sched2depth = sched2copy;\n  }\n  else\n  {\n    sched2depth = isl_pw_multi_aff_copy(data->sched2copy);\n  }\n  dim = isl_pw_multi_aff_dim(sched2depth, isl_dim_out);\n  sched2depth = isl_pw_multi_aff_drop_dims(sched2depth, isl_dim_out,\n                                           tile->depth, dim - tile->depth);\n  pma = isl_pw_multi_aff_product(sched2depth, pma);\n  tiling = isl_multi_pw_aff_from_multi_aff(\n      isl_multi_aff_copy(tile->tiling));\n  tiling = isl_multi_pw_aff_pullback_pw_multi_aff(tiling, pma);\n  index = tile_outer(index, tiling);\n\n  return index;\n}\n\n/* AST expression transformation callback for pet_stmt_build_ast_exprs.\n *\n * If the AST expression refers to an array that is not accessed\n * at all, then this means the value of the expression is not used,\n * so we might as well 
print zero (NULL pointer) instead.\n *\n * If the AST expression refers to a global scalar that is not\n * a read-only scalar, then its address was passed to the kernel and\n * we need to dereference it.\n *\n * If the AST expression refers to an array reference that is stored in\n * registers, then we replace the expr with a register access.\n *\n * If the AST expression refers to an access to a global array,\n * then we linearize the access exploiting the bounds in data->local_array.\n */\nstatic __isl_give isl_ast_expr *transform_expr_module(__isl_take isl_ast_expr *expr,\n                                                      __isl_keep isl_id *id, void *user)\n{\n  struct autosa_transform_data *data = (struct autosa_transform_data *)user;\n\n  if (!data->array)\n    return expr;\n\n  if (!data->array->accessed)\n  {\n    isl_ctx *ctx;\n\n    ctx = isl_ast_expr_get_ctx(expr);\n    isl_ast_expr_free(expr);\n    return isl_ast_expr_from_val(isl_val_zero(ctx));\n  }\n  if (autosa_array_is_read_only_scalar(data->array))\n    return expr;\n  if (!data->reg)\n    return expr;\n  if (data->reg)\n  {\n    isl_ctx *ctx;\n    char *local_name;\n    char buf[50];\n    isl_id *id;\n    isl_ast_expr *array;\n    isl_ast_expr_list *indices;\n    isl_ast_expr *indice;\n\n    ctx = isl_ast_expr_get_ctx(expr);\n    isl_ast_expr_free(expr);\n\n    /* Create a register access. 
*/\n    isl_printer *p_str = isl_printer_to_str(ctx);\n    p_str = autosa_array_ref_group_print_name(data->group, p_str);\n    local_name = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n    sprintf(buf, \"%s\", local_name);\n    free(local_name);\n\n    id = isl_id_alloc(ctx, buf, NULL);\n    array = isl_ast_expr_from_id(id);\n    indice = isl_ast_expr_from_val(isl_val_zero(ctx));\n    indices = isl_ast_expr_list_from_ast_expr(indice);\n    expr = isl_ast_expr_access(array, indices);\n\n    return expr;\n  }\n  if (data->array->n_index == 0)\n    return dereference(expr);\n  if (!data->array->linearize)\n    return expr;\n\n  return autosa_local_array_info_linearize_index(data->local_array, expr);\n}\n\n/* This function is called for each instance of a user statement\n * in the kernel \"kernel\", identified by \"autosa_stmt\".\n * \"kernel\" may be NULL if we are not inside a kernel.\n *\n * We attach a struct autosa_kernel_stmt to the \"node\", containing\n * a computed AST expression for each access, through an annotation\n * with name \"user\".\n * These AST expressions are computed from iterator_map,\n * which expresses the domain\n * elements in terms of the generated loops, and sched2copy,\n * which expresses the outer copy_schedule_dim dimensions of\n * the kernel schedule computed by PPCG in terms of the generated loops.\n */\nstatic __isl_give isl_ast_node *create_domain_leaf_module(\n    struct autosa_kernel *kernel, __isl_take isl_ast_node *node,\n    __isl_keep isl_ast_build *build, struct autosa_stmt *autosa_stmt)\n{\n  struct autosa_transform_data data;\n  struct autosa_kernel_stmt *stmt;\n  isl_ctx *ctx;\n  isl_id *id;\n  isl_pw_multi_aff *sched2copy;\n  isl_map *map;\n  isl_pw_multi_aff *iterator_map;\n  isl_union_map *schedule;\n\n  if (!node)\n    return NULL;\n  ctx = isl_ast_node_get_ctx(node);\n\n  stmt = isl_calloc_type(ctx, struct autosa_kernel_stmt);\n  if (!stmt)\n    return isl_ast_node_free(node);\n\n  schedule = 
isl_ast_build_get_schedule(build);\n  map = isl_map_reverse(isl_map_from_union_map(schedule));\n  iterator_map = isl_pw_multi_aff_from_map(map);\n  if (kernel)\n    sched2copy = compute_sched_to_copy(kernel,\n                                       isl_pw_multi_aff_copy(iterator_map));\n  else\n    sched2copy = NULL;\n\n  stmt->type = AUTOSA_KERNEL_STMT_DOMAIN;\n  stmt->u.d.stmt = autosa_stmt;\n\n  data.kernel = kernel;\n  data.accesses = stmt->u.d.stmt->accesses;\n  data.iterator_map = iterator_map;\n  data.sched2copy = sched2copy;\n  stmt->u.d.ref2expr = pet_stmt_build_ast_exprs(stmt->u.d.stmt->stmt,\n                                                build, &transform_index_module, &data,\n                                                &transform_expr_module, &data);\n\n  isl_pw_multi_aff_free(iterator_map);\n  isl_pw_multi_aff_free(sched2copy);\n\n  id = isl_id_alloc(ctx, \"user\", stmt);\n  id = isl_id_set_free_user(id, &autosa_kernel_stmt_free);\n  if (!id)\n    autosa_kernel_stmt_free(stmt);\n  return isl_ast_node_set_annotation(node, id);\n}\n\n/* This function extracts the reduce op in the stmt name, which is in the format of:\n * in/out_trans_reduce_[op]\n */\nstatic char *extract_io_stmt_reduce_op(\n  isl_ctx *ctx, const char *type)\n{\n  isl_printer *p_str;\n  char *op;\n  int loc = 0;\n  char ch;\n  int underscore_cnt = 0;\n\n  p_str = isl_printer_to_str(ctx);  \n  while ((ch = type[loc]) != '\\0')\n  {\n    if (ch == '.')\n      break;\n    if (ch == '_')\n      underscore_cnt++;\n    else if (underscore_cnt == 3) {\n      char buf[2];\n      buf[0] = ch;\n      buf[1] = '\\0';\n      p_str = isl_printer_print_str(p_str, buf);      \n    }\n    loc++;\n  }\n\n  op = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return op;\n}\n\n/* AutoSA stmt is in the format of\n * [].[].[]\n * This function extracts the integer field at the pos-th position.\n * If the position is not found, -1 is returned.\n */\nstatic int 
extract_autosa_stmt_int_field(\n  isl_ctx *ctx, const char *type, int pos) \n{\n  int loc = 0;\n  char ch;\n  int dot_time = 0;\n  isl_printer *p_str;\n  char *comp_str;\n  int ret;\n\n  while ((ch = type[loc]) != '\\0')\n  {\n    if (ch == '.')\n      dot_time++;\n    if (dot_time == pos)\n      break;\n    loc++;\n  }\n\n  if (ch == '\\0') {\n    //std::string stmt(type);\n    //std::string info = \"[AutoSA] Error: Wrong pos: \" + std::to_string(pos) + \n    //  \" in stmt: \" + stmt;\n    //throw std::runtime_error(info);\n    return -1;\n  }\n\n  p_str = isl_printer_to_str(ctx);\n  loc++;\n  while (((ch = type[loc]) != '\\0') && ((ch = type[loc]) != '.'))\n  {\n    char buf[2];\n    buf[0] = ch;\n    buf[1] = '\\0';\n    p_str = isl_printer_print_str(p_str, buf);\n    loc++;\n  }\n\n  comp_str = isl_printer_get_str(p_str);\n  ret = atoi(comp_str);\n  free(comp_str);\n  isl_printer_free(p_str);\n\n  return ret;\n}\n\n/* AutoSA stmt is in the format of\n * [].[].[]\n * This function extracts the string field at the pos-th position.\n * If the position is not found, NULL is returned.\n */\nstatic __isl_give char *extract_autosa_stmt_str_field(\n  isl_ctx *ctx, const char *type, int pos) \n{\n  int loc = 0;\n  char ch;\n  int dot_time = 0;\n  isl_printer *p_str;\n  char *comp_str;  \n\n  while ((ch = type[loc]) != '\\0')\n  {\n    if (ch == '.')\n      dot_time++;\n    if (dot_time == pos)\n      break;\n    loc++;\n  }\n\n  if (ch == '\\0') {    \n    return NULL;\n  }\n\n  p_str = isl_printer_to_str(ctx);\n  loc++;\n  while (((ch = type[loc]) != '\\0') && ((ch = type[loc]) != '.'))\n  {\n    char buf[2];\n    buf[0] = ch;\n    buf[1] = '\\0';\n    p_str = isl_printer_print_str(p_str, buf);\n    loc++;\n  }\n\n  comp_str = isl_printer_get_str(p_str);  \n  isl_printer_free(p_str);\n\n  return comp_str;\n}\n\nstatic __isl_give isl_ast_node *create_serialize_leaf(struct autosa_kernel *kernel,\n                                                      struct 
autosa_array_ref_group_pair *pair,\n                                                      __isl_take isl_ast_node *node,\n                                                      const char *name,\n                                                      __isl_keep isl_ast_build *build)\n{\n  struct autosa_kernel_stmt *stmt;\n  struct autosa_array_ref_group *group;\n  isl_ctx *ctx;\n  isl_map *access;\n  isl_set *set;\n  isl_pw_multi_aff *pma, *pma2;\n  isl_space *space;\n  isl_ast_expr *expr;\n  isl_id *id;\n\n  stmt = isl_calloc_type(kernel->ctx, struct autosa_kernel_stmt);\n  if (!stmt)\n    return isl_ast_node_free(node);\n  stmt->type = AUTOSA_KERNEL_STMT_HOST_SERIALIZE;\n  ctx = kernel->ctx;\n  group = pair->local_group;\n\n  /* Compute the global index. */\n  /* type[D -> A] -> L */\n  access = isl_map_from_union_map(isl_ast_build_get_schedule(build));\n  /* L -> type[D -> A] */\n  access = isl_map_reverse(access);\n  pma = isl_pw_multi_aff_from_map(access);\n  pma = isl_pw_multi_aff_reset_tuple_id(pma, isl_dim_out);\n  space = isl_space_range(isl_pw_multi_aff_get_space(pma));\n  space = isl_space_unwrap(space);\n  /* [D -> A] -> A */\n  pma2 = isl_pw_multi_aff_range_map(space);\n  /* L -> A */\n  pma2 = isl_pw_multi_aff_pullback_pw_multi_aff(pma2,\n                                                pma);\n  expr = isl_ast_build_access_from_pw_multi_aff(build, pma2);\n  expr = autosa_local_array_info_linearize_index(group->local_array, expr);\n\n  stmt->u.s.index = expr;\n  stmt->u.s.in = !prefixcmp(name, \"serialize\") ? 
1 : 0;\n  stmt->u.s.group = pair->io_group;\n\n  id = isl_id_alloc(kernel->ctx, \"serialize\", stmt);\n  id = isl_id_set_free_user(id, &autosa_kernel_stmt_free);\n  if (!id)\n    autosa_kernel_stmt_free(stmt);\n  return isl_ast_node_set_annotation(node, id);\n}\n\n/* This function is called for each statement node in the AST\n * for transferring through fifos.\n * Attach a pointer to an autosa_kernel_stmt representing the io\n * statement to the node.\n * The statement name is \"in\" or \"out\", depending on whether we are \n * transferring in or out via fifos.\n *\n * The schedule is of the form\n *\n *  type[D -> A] -> L\n *\n * where D corresponds to the outer tile->depth dimensions of \n * the kernel schedule, A to the global array and L to the outer \n * generated AST schedule.\n * We compute the inverse and strip off the type, resulting in\n *\n *  L -> [D -> A]\n *\n * We combine this mapping with the group tiling\n *\n *  [D -> A] -> T\n *\n * resulting in\n *   \n *  L -> T\n *\n * and store the corresponding expressions in stmt->local_index,\n * where stmt points to the autosa_kernel_stmt that is attached to the node.\n */\nstatic __isl_give isl_ast_node *create_io_leaf(struct autosa_kernel *kernel,\n                                               struct autosa_hw_module *module,\n                                               struct autosa_array_ref_group_pair *pair,\n                                               __isl_take isl_ast_node *node,\n                                               __isl_keep isl_ast_build *build)\n{\n  struct autosa_kernel_stmt *stmt;\n  struct autosa_array_tile *tile;\n  isl_multi_aff *new_tiling;\n  isl_map *access;\n  const char *type;\n  isl_pw_multi_aff *pma, *pma2;\n  isl_space *space;\n  isl_ast_expr *expr;\n  isl_id *id;\n  int is_trans;        // i/o transfer statement between on-chip modules\n  int is_trans_dram;   // i/o transfer statement between dram and on-chip modules\n  int is_trans_lower;  // i/o transfer 
statement with lower transfer\n  int is_trans_buf = 0; // i/o transfer statement with local buffers (only set for in/out_trans stmts)\n  int is_trans_boundary;\n  int is_trans_reduce;\n  int is_dummy;\n  int is_dummy_reduce;\n  int is_serialize; // is dram access to be serialized\n  struct autosa_array_ref_group *group = pair->local_group;\n  int depth;\n  isl_ctx *ctx;\n\n  stmt = isl_calloc_type(kernel->ctx, struct autosa_kernel_stmt);\n  if (!stmt)\n    return isl_ast_node_free(node);\n  ctx = kernel->ctx;\n\n  /* type[D -> A] -> L */\n  access = isl_map_from_union_map(isl_ast_build_get_schedule(build));\n  isl_set *set = isl_map_domain(isl_set_unwrap(isl_map_domain(isl_map_copy(access))));\n  depth = isl_set_dim(set, isl_dim_set);\n  isl_set_free(set);\n\n  type = isl_map_get_tuple_name(access, isl_dim_in);\n  /* The format of io_trans stmt name:\n   * in/out_trans[_dram]/[_dram_serialize]/[_boundary]/[_reduce_(reduce_op)].[in_fifo_name].[out_fifo_name].[is_buffer].\n   * [cur_pack_lane].[nxt_pack_lane].[coalesce_depth].[coalesce_bound].[if_depth]\n   * or\n   * in/out[_dummy][_reduce].[fifo_name].[cur_pack_lane].[nxt_pack_lane]\n   */\n\n  /* Classify the io stmt type. 
*/\n  is_trans = !prefixcmp(type, \"in_trans\") || !prefixcmp(type, \"out_trans\");\n  is_trans_dram = !prefixcmp(type, \"in_trans_dram\") || !prefixcmp(type, \"out_trans_dram\");\n  is_trans_boundary = !prefixcmp(type, \"in_trans_boundary\") || !prefixcmp(type, \"out_trans_boundary\");\n  is_trans_reduce = !prefixcmp(type, \"in_trans_reduce\") || !prefixcmp(type, \"out_trans_reduce\");\n  if (is_trans)\n  {    \n    is_trans_buf = extract_autosa_stmt_int_field(ctx, type, 3);    \n  }\n  if (!is_trans)\n  {\n    is_dummy = !prefixcmp(type, \"in_dummy\") || !prefixcmp(type, \"out_dummy\");\n  }\n  else\n  {\n    is_dummy = 0;\n  }\n  if (is_dummy) {\n    is_dummy_reduce = !prefixcmp(type, \"in_dummy_reduce\") || !prefixcmp(type, \"out_dummy_reduce\");\n  } else {\n    is_dummy_reduce = 0;\n  }  \n  if (is_trans_dram)\n  {    \n    is_serialize = !prefixcmp(type, \"in_trans_dram_serialize\") || !prefixcmp(type, \"out_trans_dram_serialize\");    \n  } else {\n    is_serialize = 0;\n  }\n  \n  stmt->u.i.simd_depth = pair->simd_depth;\n  stmt->u.i.dummy = is_dummy;\n  stmt->u.i.in = type && !prefixcmp(type, \"in\");\n  stmt->u.i.buf = is_trans_buf;    \n  stmt->u.i.serialize = is_serialize;  \n  if (is_trans) {\n    stmt->u.i.data_pack = extract_autosa_stmt_int_field(ctx, type, 4);\n    stmt->u.i.nxt_data_pack = extract_autosa_stmt_int_field(ctx, type, 5);\n    stmt->u.i.coalesce_depth = extract_autosa_stmt_int_field(ctx, type, 6);\n    stmt->u.i.coalesce_bound = extract_autosa_stmt_int_field(ctx, type, 7);\n    stmt->u.i.if_depth = extract_autosa_stmt_int_field(ctx, type, 8);    \n  } else {\n    stmt->u.i.data_pack = extract_autosa_stmt_int_field(ctx, type, 2);\n    stmt->u.i.nxt_data_pack = extract_autosa_stmt_int_field(ctx, type, 3);\n    stmt->u.i.coalesce_depth = -1;\n    stmt->u.i.coalesce_bound = -1;    \n  }\n  if (is_trans_reduce) {\n    stmt->u.i.reduce = 1;\n    stmt->u.i.reduce_op = extract_io_stmt_reduce_op(ctx, type);\n  } else {\n    stmt->u.i.reduce = 
is_dummy_reduce;\n    stmt->u.i.reduce_op = NULL;\n  }\n\n  /* Compute the global index. */\n  /* L -> type[D -> A] */\n  access = isl_map_reverse(access);\n  pma = isl_pw_multi_aff_from_map(access);\n  pma = isl_pw_multi_aff_reset_tuple_id(pma, isl_dim_out);\n\n  space = isl_space_range(isl_pw_multi_aff_get_space(pma));\n  space = isl_space_unwrap(space);\n  /* [D -> A] -> A */\n  pma2 = isl_pw_multi_aff_range_map(space);\n  /* L -> A */\n  pma2 = isl_pw_multi_aff_pullback_pw_multi_aff(pma2,\n                                                isl_pw_multi_aff_copy(pma));\n  expr = isl_ast_build_access_from_pw_multi_aff(build, pma2);\n  if (group->array->linearize)\n  {\n    expr = autosa_local_array_info_linearize_index(group->local_array,\n                                                   expr);\n\n    if (stmt->u.i.data_pack > 1)\n    {\n      /* Update the last dimension,\n       * divide it by the data packing factor.\n       */\n      isl_ast_expr *arg, *div;\n      arg = isl_ast_expr_get_op_arg(expr, 1);\n      div = isl_ast_expr_from_val(isl_val_int_from_si(kernel->ctx, stmt->u.i.data_pack));\n      arg = isl_ast_expr_div(arg, div);\n      expr = isl_ast_expr_set_op_arg(expr, 1, arg);\n    }\n  }\n  else\n  {\n    if (stmt->u.i.data_pack > 1)\n    {\n      /* Update the last dimension,\n       * divide it by the data packing factor.\n       */\n      int n_arg;\n      isl_ast_expr *arg, *div;\n      n_arg = isl_ast_expr_get_op_n_arg(expr);\n      arg = isl_ast_expr_get_op_arg(expr, n_arg - 1);\n      div = isl_ast_expr_from_val(isl_val_int_from_si(kernel->ctx, stmt->u.i.data_pack));\n      arg = isl_ast_expr_div(arg, div);\n      expr = isl_ast_expr_set_op_arg(expr, n_arg - 1, arg);\n    }\n  }\n\n  stmt->u.i.index = expr;\n\n  /* Compute the local index. 
*/\n  tile = pair->local_tile;\n  if (tile)\n  {\n    isl_ast_expr *arg, *div;\n    int n_arg;\n\n    /* [D -> A] -> T */\n    pma2 = isl_pw_multi_aff_from_multi_aff(\n        isl_multi_aff_copy(tile->tiling));\n    if (tile->depth < depth)\n    {\n      /* Extend the D dimension to depth in pma2. */\n      new_tiling = autosa_array_ref_group_recompute_tiling(tile, group, depth);\n      isl_pw_multi_aff_free(pma2);\n      pma2 = isl_pw_multi_aff_from_multi_aff(new_tiling);\n    }\n\n    /* L -> T */\n    pma2 = isl_pw_multi_aff_pullback_pw_multi_aff(pma2, pma);\n    expr = isl_ast_build_access_from_pw_multi_aff(build, pma2);\n    stmt->u.i.local_index = expr;\n    stmt->u.i.reg = 0;\n  }\n  else\n  {\n    /* Create a scalar expr. */\n    isl_printer *p_str;\n    char *local_name;\n    char buf[50];\n    isl_ast_expr *array, *indice;\n    isl_ast_expr_list *indices;\n\n    isl_pw_multi_aff_free(pma);\n    p_str = isl_printer_to_str(kernel->ctx);\n    p_str = autosa_array_ref_group_print_name(group, p_str);\n    local_name = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);        \n    sprintf(buf, \"%s\", local_name);    \n    free(local_name);    \n\n    id = isl_id_alloc(kernel->ctx, buf, NULL);\n    array = isl_ast_expr_from_id(id);\n    indice = isl_ast_expr_from_val(isl_val_zero(kernel->ctx));\n    indices = isl_ast_expr_list_from_ast_expr(indice);\n    expr = isl_ast_expr_access(array, indices);\n    stmt->u.i.local_index = expr;\n    stmt->u.i.reg = 1;\n  }\n\n  if (is_trans) {\n    stmt->u.i.in_fifo_name = extract_autosa_stmt_str_field(ctx, type, 1);\n    stmt->u.i.out_fifo_name = extract_autosa_stmt_str_field(ctx, type, 2);\n  } else {\n    stmt->u.i.in_fifo_name = extract_autosa_stmt_str_field(ctx, type, 1);\n    stmt->u.i.out_fifo_name = extract_autosa_stmt_str_field(ctx, type, 1);\n  }\n  \n  stmt->u.i.group = pair->io_group;\n  stmt->u.i.module = module;\n  stmt->u.i.array = group->array;\n  stmt->u.i.local_array = group->local_array;\n  if 
(is_trans)\n  {\n    if (is_trans_dram)\n    {\n      stmt->type = AUTOSA_KERNEL_STMT_IO_DRAM;\n    }\n    else\n    {\n      stmt->type = AUTOSA_KERNEL_STMT_IO_TRANSFER;      \n      stmt->u.i.filter_sched_depth = -1;\n      stmt->u.i.filter_param_id = -1;\n      if (is_trans_boundary)\n      {\n        stmt->u.i.boundary = 1;\n      }\n      else\n      {\n        stmt->u.i.boundary = 0;\n      }\n    }\n  }\n  else\n  {\n    stmt->type = AUTOSA_KERNEL_STMT_IO;\n  }\n\n  id = isl_id_alloc(kernel->ctx, \"io\", stmt);\n  id = isl_id_set_free_user(id, &autosa_kernel_stmt_free);\n  if (!id)\n    autosa_kernel_stmt_free(stmt);\n  return isl_ast_node_set_annotation(node, id);\n}\n\nstatic __isl_give isl_ast_node *create_drain_merge_leaf(struct autosa_kernel *kernel,\n                                                        struct autosa_drain_merge_func *func, __isl_take isl_ast_node *node,\n                                                        __isl_keep isl_ast_build *build)\n{\n  struct autosa_kernel_stmt *stmt;\n  struct autosa_array_ref_group *group;\n  isl_ctx *ctx;\n  isl_map *access;\n  isl_pw_multi_aff *pma, *pma2;\n  isl_space *space;\n  isl_ast_expr *expr;\n  isl_id *id;\n\n  stmt = isl_calloc_type(kernel->ctx, struct autosa_kernel_stmt);\n  if (!stmt)\n    return isl_ast_node_free(node);\n  ctx = kernel->ctx;\n  stmt->type = AUTOSA_KERNEL_STMT_DRAIN_MERGE;\n  stmt->u.dm.func = func;\n\n  /* Compute the global index. 
*/\n  /* type[D -> A] -> L */\n  access = isl_map_from_union_map(isl_ast_build_get_schedule(build));\n  /* L -> type[D -> A] */\n  access = isl_map_reverse(access);\n  pma = isl_pw_multi_aff_from_map(access);\n  pma = isl_pw_multi_aff_reset_tuple_id(pma, isl_dim_out);\n  space = isl_space_range(isl_pw_multi_aff_get_space(pma));\n  space = isl_space_unwrap(space);\n  /* [D -> A] -> A */\n  pma2 = isl_pw_multi_aff_range_map(space);\n  /* L -> A */\n  pma2 = isl_pw_multi_aff_pullback_pw_multi_aff(pma2,\n                                                isl_pw_multi_aff_copy(pma));\n  expr = isl_ast_build_access_from_pw_multi_aff(build, pma2);\n  isl_pw_multi_aff_free(pma);\n\n  /* Linearize the index. */\n  group = func->group;\n  expr = autosa_local_array_info_linearize_index(group->local_array, expr);\n  stmt->u.dm.index = expr;\n\n  id = isl_id_alloc(ctx, \"drain_merge\", stmt);\n  id = isl_id_set_free_user(id, &autosa_kernel_stmt_free);\n  if (!id)\n    autosa_kernel_stmt_free(stmt);\n  return isl_ast_node_set_annotation(node, id);\n}\n\n///* Extract the boundary field from the module call type, which is in the format of:\n// * io_module.[].boundary\n// * or\n// * module_call.module_name.boundary\n// * */\n//static int extract_is_boundary(isl_ctx *ctx, const char *type)\n//{\n//  int ret_val;\n//  char *boundary = extract_io_stmt_str_field(ctx, type, 2);\n//  if (boundary && !strcmp(boundary, \"boundary\")) {\n//    ret_val = 1;\n//  } else {\n//    ret_val = 0;\n//  }\n//  free(boundary);\n//  return ret_val;\n//}\n\n/* Extract the module_name field from the module call type, which is in the format of:\n * module_call.module_name.boundary\n */\nstatic char *extract_module_name(isl_ctx *ctx, const char *type)\n{\n  char ch;\n  int loc = 0;\n  int n_dot = 0;\n  isl_printer *p_str;\n  char *module_name;\n\n  while ((ch = type[loc]) != '\\0')\n  {\n    if (ch == '.')\n      n_dot++;\n    if (n_dot == 1)\n      break;\n    loc++;\n  }\n\n  loc++;\n  p_str = 
isl_printer_to_str(ctx);\n  while ((ch = type[loc]) != '\\0')\n  {\n    if (ch == '.')\n      break;\n    char buf[2];\n    buf[0] = ch;\n    buf[1] = '\\0';\n    p_str = isl_printer_print_str(p_str, buf);\n    loc++;\n  }\n\n  module_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return module_name;\n}\n\n/* There are two types of module call statements:\n * module_call_upper and module_call_lower\n * For module_call_lower, if the module is connected to PEs,\n * we will calculate the AST expression io_pe_expr which is the \n * PE indices described by IO ids.\n */\nstatic __isl_give isl_ast_node *create_ext_module_leaf(\n    struct autosa_kernel *kernel,\n    __isl_take isl_ast_node *node, struct autosa_hw_module *module,\n    struct autosa_pe_dummy_module *pe_dummy_module,\n    struct autosa_array_ref_group *group, const char *name,\n    __isl_keep isl_ast_build *build)\n{\n  struct autosa_kernel_stmt *stmt;\n  isl_id *id;\n  isl_ctx *ctx;\n  isl_multi_aff *trans;\n  isl_map *map;\n  isl_pw_multi_aff *pma;\n  isl_ast_expr *expr;\n\n  ctx = isl_ast_node_get_ctx(node);\n  stmt = isl_calloc_type(ctx, struct autosa_kernel_stmt);\n  if (!stmt)\n    return isl_ast_node_free(node);\n\n  stmt->type = AUTOSA_KERNEL_STMT_EXT_MODULE;\n  stmt->u.m.module = module;\n  stmt->u.m.group = group;\n  /* module_lower/upper.module_name.[is_boundary].[is_serialize] */\n  stmt->u.m.boundary = extract_autosa_stmt_int_field(ctx, name, 2);  \n  stmt->u.m.module_name = extract_autosa_stmt_str_field(ctx, name, 1);\n  //stmt->u.m.dummy = !suffixcmp(stmt->u.m.module_name, \"dummy\");\n  if (!suffixcmp(stmt->u.m.module_name, \"dummy_in\") || !suffixcmp(stmt->u.m.module_name, \"dummy_out\"))\n    stmt->u.m.dummy = 1;\n  else\n    stmt->u.m.dummy = 0;\n  stmt->u.m.pe_dummy_module = pe_dummy_module;\n  if (!prefixcmp(name, \"ext_module_lower\"))\n  {\n    stmt->u.m.lower = 1;\n    stmt->u.m.upper = 0;\n  }\n  else if (!prefixcmp(name, \"ext_module_upper\"))\n  {\n    
stmt->u.m.lower = 0;\n    stmt->u.m.upper = 1;\n  }\n  else\n  {\n    stmt->u.m.lower = 0;\n    stmt->u.m.upper = 0;\n  }\n\n  id = isl_id_alloc(ctx, \"ext_module\", stmt);\n  id = isl_id_set_free_user(id, &autosa_kernel_stmt_free);\n  if (!id)\n    autosa_kernel_stmt_free(stmt);\n  return isl_ast_node_set_annotation(node, id);\n}\n\n/* There are two types of module call statements:\n * module_call_upper and module_call_lower\n * For module_call_lower, if the module is connected to PEs,\n * we will calculate the AST expression io_pe_expr which is the \n * PE indices described by IO ids.\n */\nstatic __isl_give isl_ast_node *create_module_call_leaf(\n    struct autosa_kernel *kernel,\n    __isl_take isl_ast_node *node, struct autosa_hw_module *module,\n    struct autosa_pe_dummy_module *pe_dummy_module,\n    struct autosa_array_ref_group *group, const char *name,\n    __isl_keep isl_ast_build *build)\n{\n  struct autosa_kernel_stmt *stmt;\n  isl_id *id;\n  isl_ctx *ctx;\n  isl_multi_aff *trans;\n  isl_map *map;\n  isl_pw_multi_aff *pma;\n  isl_ast_expr *expr;\n\n  ctx = isl_ast_node_get_ctx(node);\n  stmt = isl_calloc_type(ctx, struct autosa_kernel_stmt);\n  if (!stmt)\n    return isl_ast_node_free(node);\n\n//#ifdef _DEBUG\n//  if (!strcmp(module->name, \"U_drain_IO_L2_out\")) {\n//    isl_union_map *sched_tmp;\n//    sched_tmp = isl_ast_build_get_schedule(build);\n//    DBGUMAP(stdout, sched_tmp, kernel->ctx);\n//    isl_space *space_tmp;\n//    space_tmp = isl_ast_build_get_schedule_space(build);\n//    DBGSPACE(stdout, space_tmp, kernel->ctx);\n//  }\n//#endif\n\n  stmt->type = AUTOSA_KERNEL_STMT_MODULE_CALL;\n  stmt->u.m.module = module;\n  stmt->u.m.group = group;\n  /* module_call_lower/upper.module_name.[is_boundary].[is_serialize].[lower_sched_val] */\n  stmt->u.m.boundary = extract_autosa_stmt_int_field(ctx, name, 2);\n  stmt->u.m.module_name = extract_autosa_stmt_str_field(ctx, name, 1);\n  //stmt->u.m.dummy = !suffixcmp(stmt->u.m.module_name, \"dummy\"); 
 \n  if (!suffixcmp(stmt->u.m.module_name, \"dummy_in\") || !suffixcmp(stmt->u.m.module_name, \"dummy_out\"))\n    stmt->u.m.dummy = 1;\n  else\n    stmt->u.m.dummy = 0;\n  stmt->u.m.pe_dummy_module = pe_dummy_module;\n  stmt->u.m.serialize = extract_autosa_stmt_int_field(ctx, name, 3);\n  stmt->u.m.lower_sched_val = extract_autosa_stmt_int_field(ctx, name, 4);  \n//#ifdef _DEBUG\n//  if (!strcmp(stmt->u.m.module_name, \"U_tmp_1_PE_dummy_in\"))\n//    printf(\"debug here\\n\");\n//#endif\n\n  if (!prefixcmp(name, \"module_call_lower\"))\n  {\n    stmt->u.m.lower = 1;\n    stmt->u.m.upper = 0;\n  }\n  else if (!prefixcmp(name, \"module_call_upper\"))\n  {\n    stmt->u.m.lower = 0;\n    stmt->u.m.upper = 1;\n  }\n  else\n  {\n    stmt->u.m.lower = 0;\n    stmt->u.m.upper = 0;\n  }\n\n  if (stmt->u.m.lower)\n  {\n    if (!stmt->u.m.boundary)\n    {\n      if ((module->type == IO_MODULE || module->type == DRAIN_MODULE) && !group->io_pe_expr)\n      {\n        if (module->to_pe)\n        {\n          isl_union_map *umap = isl_ast_build_get_schedule(build);\n          isl_union_set *uset = isl_union_map_range(umap);\n          isl_set *set = isl_set_from_union_set(uset);\n          isl_map *map = isl_set_identity(set);\n          map = isl_map_flatten_range(map);\n          trans = isl_multi_aff_copy(group->io_trans);\n          isl_map *map2 = isl_map_from_multi_aff(trans);\n          map2 = isl_map_reverse(map2);\n          map = isl_map_apply_range(map, map2);\n          isl_pw_multi_aff *pma = isl_pw_multi_aff_from_map(map);\n          expr = isl_ast_build_access_from_pw_multi_aff(build, pma);\n          group->io_pe_expr = expr;\n        }\n      }\n    }\n    /* boundary module */\n    if (stmt->u.m.boundary)\n    {\n      if ((module->type == IO_MODULE || module->type == DRAIN_MODULE) && !group->io_pe_expr_boundary)\n      {\n        if (module->to_pe)\n        {\n          isl_union_map *umap = isl_ast_build_get_schedule(build);\n          isl_union_set *uset = 
isl_union_map_range(umap);\n          isl_set *set = isl_set_from_union_set(uset);\n          isl_map *map = isl_set_identity(set);\n          map = isl_map_flatten_range(map);\n          trans = isl_multi_aff_copy(group->io_trans);\n          isl_map *map2 = isl_map_from_multi_aff(trans);\n          map2 = isl_map_reverse(map2);\n          map = isl_map_apply_range(map, map2);\n          isl_pw_multi_aff *pma = isl_pw_multi_aff_from_map(map);\n          expr = isl_ast_build_access_from_pw_multi_aff(build, pma);\n          group->io_pe_expr_boundary = expr;\n        }\n      }\n    }\n  }\n\n  id = isl_id_alloc(ctx, \"module_call\", stmt);\n  id = isl_id_set_free_user(id, &autosa_kernel_stmt_free);\n  if (!id)\n    autosa_kernel_stmt_free(stmt);\n  return isl_ast_node_set_annotation(node, id);\n}\n\n/* For fifo declaration statements, we compute the AST expressions of\n * the PE indices that are described by the IO ids if the fifo is connected to\n * PEs.\n */\nstatic __isl_give isl_ast_node *create_fifo_decl_leaf(\n    struct autosa_kernel *kernel,\n    __isl_take isl_ast_node *node, struct autosa_hw_module *module,\n    struct autosa_array_ref_group *group, const char *name,\n    __isl_keep isl_ast_build *build)\n{\n  struct autosa_kernel_stmt *stmt;\n  isl_id *id;\n  isl_ctx *ctx;\n  isl_multi_aff *trans;\n  isl_map *map;\n  isl_pw_multi_aff *pma;\n  isl_ast_expr *expr;\n\n  ctx = isl_ast_node_get_ctx(node);\n  stmt = isl_calloc_type(ctx, struct autosa_kernel_stmt);\n  if (!stmt)\n    return isl_ast_node_free(node);\n\n  /* Generate the AST expr of io_trans. 
*/\n  if (module->type == PE_MODULE && !group->io_L1_pe_expr)\n  {\n    isl_union_map *umap = isl_ast_build_get_schedule(build);\n    isl_union_set *uset = isl_union_map_range(umap);\n    isl_set *set = isl_set_from_union_set(uset);\n    isl_map *map = isl_set_identity(set);\n    map = isl_map_flatten_range(map);\n    trans = group->io_L1_trans;\n    isl_map *map2 = isl_map_from_multi_aff(isl_multi_aff_copy(trans));\n    map2 = isl_map_reverse(map2);\n    map = isl_map_apply_range(map, map2);\n    isl_pw_multi_aff *pma = isl_pw_multi_aff_from_map(map);\n    expr = isl_ast_build_access_from_pw_multi_aff(build, pma);\n    group->io_L1_pe_expr = expr;\n  }\n\n  stmt->type = AUTOSA_KERNEL_STMT_FIFO_DECL;\n  stmt->u.m.module = module;\n  stmt->u.m.group = group;\n  if (!prefixcmp(name, \"fifo_decl_boundary\"))\n    stmt->u.m.boundary = 1;\n  else\n    stmt->u.m.boundary = 0;\n  id = isl_id_alloc(ctx, \"fifo_decl\", stmt);\n  id = isl_id_set_free_user(id, &autosa_kernel_stmt_free);\n  if (!id)\n    autosa_kernel_stmt_free(stmt);\n  return isl_ast_node_set_annotation(node, id);\n}\n\n/* Attach a statement to the user node that describes the IO module type.\n */\nstatic __isl_give isl_ast_node *create_io_module_call_leaf(\n    struct autosa_kernel *kernel,\n    __isl_take isl_ast_node *node, struct autosa_hw_module *module,\n    const char *name, __isl_keep isl_ast_build *build)\n{\n  isl_id *id;\n  isl_ctx *ctx;\n  struct autosa_kernel_stmt *stmt;\n\n  ctx = isl_ast_node_get_ctx(node);\n  stmt = isl_calloc_type(ctx, struct autosa_kernel_stmt);\n  if (!stmt)\n    return isl_ast_node_free(node);\n\n  stmt->u.f.module = module;\n  stmt->u.f.boundary = extract_autosa_stmt_int_field(ctx, name, 2);\n  if (!prefixcmp(name, \"io_module.inter_trans\"))\n    stmt->type = AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_TRANS;\n  else if (!prefixcmp(name, \"io_module.intra_trans\"))\n    stmt->type = AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_TRANS;\n  else if (!prefixcmp(name, 
\"io_module.inter_intra\"))\n    stmt->type = AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_INTRA;\n  else if (!prefixcmp(name, \"io_module.intra_inter\"))\n    stmt->type = AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_INTER;\n  else if (!prefixcmp(name, \"io_module.state_handle\"))\n    stmt->type = AUTOSA_KERNEL_STMT_IO_MODULE_CALL_STATE_HANDLE;\n  id = isl_id_alloc(ctx, name, stmt);\n  id = isl_id_set_free_user(id, &autosa_kernel_stmt_free);\n  if (!id)\n    autosa_kernel_stmt_free(stmt);\n  return isl_ast_node_set_annotation(node, id);\n}\n\n/* This function is called for each instance of a user statement\n * in the kernel. This may be one of the original user statements\n * or a statement introduced by AutoSA.\n *\n * We first check if the statement id corresponds to an autosa_stmt,\n * which indicates the statement is an original user statement. Any statement\n * that is not an original user statement has been introduced by AutoSA and\n * requires special handling.\n *\n * If the user statement is one of the original user statements, then we call\n * create_domain_leaf_module.\n * If it is \"init_device\", then we call build_array_bounds.\n * Otherwise, we check if it is a copy statement and call the appropriate\n * functions.\n * Statements that copy an array to/from the device do not need any\n * further treatment. 
Neither does \"clear_device\".\n */\nstatic __isl_give isl_ast_node *at_domain_module(__isl_take isl_ast_node *node,\n                                                 __isl_keep isl_ast_build *build, void *user)\n{\n  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n  struct autosa_stmt *device_stmt;\n  isl_ast_expr *expr, *arg;\n  isl_id *id;\n  int is_sync;\n  const char *name;\n  void *p;\n\n  expr = isl_ast_node_user_get_expr(node);\n  arg = isl_ast_expr_get_op_arg(expr, 0);\n  id = isl_ast_expr_get_id(arg);\n  name = isl_id_get_name(id);\n  p = isl_id_get_user(id);\n  isl_ast_expr_free(expr);\n  isl_ast_expr_free(arg);\n\n  device_stmt = find_stmt(data->prog, id);\n  isl_id_free(id);\n\n  if (device_stmt)\n    return create_domain_leaf_module(data->kernel, node, build, device_stmt);\n\n  if (!prefixcmp(name, \"to_device_\") || !prefixcmp(name, \"from_device_\"))\n    return node;\n  if (!strcmp(name, \"init_device\"))\n    return build_array_bounds(node, data->prog, build);\n  if (!strcmp(name, \"clear_device\"))\n    return node;\n  if (!strcmp(name, \"read\") || !strcmp(name, \"write\"))\n  {\n    struct autosa_array_ref_group *group = (struct autosa_array_ref_group *)p;\n    return create_access_leaf(data->kernel, group, node, build);\n  }\n  if (!prefixcmp(name, \"in\") || !prefixcmp(name, \"out\"))\n  {\n    struct autosa_array_ref_group_pair *pair = (struct autosa_array_ref_group_pair *)p;\n    return create_io_leaf(data->kernel, data->module, pair, node, build);\n  }\n  if (!prefixcmp(name, \"module_call\"))\n  {\n    /* module_call.[module_name]\n     * module_call_lower.[module_name]\n     */\n    struct autosa_array_ref_group *group = NULL;\n    if (!prefixcmp(name, \"module_call_lower\"))\n      group = (struct autosa_array_ref_group *)p;\n    return create_module_call_leaf(data->kernel, node, data->module, data->pe_dummy_module, group, name, build);\n  }\n  if (!prefixcmp(name, \"fifo_decl\"))\n  {\n    /* 
fifo_decl.[fifo_name]\n     * fifo_decl_boundary.[fifo_name]\n     */\n    struct autosa_array_ref_group *group = (struct autosa_array_ref_group *)p;\n    return create_fifo_decl_leaf(data->kernel, node, data->module, group, name, build);\n  }\n  if (!prefixcmp(name, \"ext_module\"))\n  {\n    /* set_ext_module_args_upper.[module_name]\n     * set_ext_module_args_lower.[module_name]\n     */\n    struct autosa_array_ref_group *group = NULL;\n    if (!prefixcmp(name, \"ext_module_lower\"))\n      group = (struct autosa_array_ref_group *)p;\n    return create_ext_module_leaf(data->kernel, node, data->module,\n                                  data->pe_dummy_module, group, name, build);\n  }\n  if (!prefixcmp(name, \"io_module\"))\n  {\n    return create_io_module_call_leaf(data->kernel, node, data->module, name, build);\n  }\n  if (!prefixcmp(name, \"drain_merge\"))\n  {\n    return create_drain_merge_leaf(data->kernel, data->drain_merge_func, node, build);\n  }\n  if (!prefixcmp(name, \"serialize\") || !prefixcmp(name, \"deserialize\"))\n  {\n    struct autosa_array_ref_group_pair *pair = (struct autosa_array_ref_group_pair *)p;\n    return create_serialize_leaf(data->kernel, pair, node, name, build);\n  }\n\n  return node;\n}\n\n/* This function is called before the AST generator starts traversing\n * the schedule subtree of a node with mark \"mark\".\n *\n * If the mark is called \"kernel\", store the kernel pointer in data->kernel\n * for use in at_domain_module.\n * If the mark is called \"module\", store the module pointer in data->module\n * for use in at_domain_module.\n */\nstatic isl_stat before_mark_module(__isl_keep isl_id *mark,\n                                   __isl_keep isl_ast_build *build, void *user)\n{\n  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n\n  if (!mark)\n    return isl_stat_error;\n  if (!strcmp(isl_id_get_name(mark), \"kernel\"))\n  {\n    data->kernel = (struct autosa_kernel *)isl_id_get_user(mark);\n  
}\n  if (!strcmp(isl_id_get_name(mark), \"module\"))\n  {\n    data->module = (struct autosa_hw_module *)isl_id_get_user(mark);\n  }\n  if (!strcmp(isl_id_get_name(mark), \"pe_dummy_module\"))\n  {\n    data->pe_dummy_module = (struct autosa_pe_dummy_module *)isl_id_get_user(mark);\n    data->in_for = 0;\n  }\n  if (!strcmp(isl_id_get_name(mark), \"io_module.inter_trans\") ||\n      !strcmp(isl_id_get_name(mark), \"io_module.intra_trans\"))\n  {\n    data->filter_buffer = 1;\n    data->in_for = 0;\n  }\n  if (!strcmp(isl_id_get_name(mark), \"hls_pipeline\"))\n  {\n    data->under_pipeline = 1;\n  }\n  if (!strcmp(isl_id_get_name(mark), \"hls_unroll\"))\n  {\n    data->under_unroll = 1;\n  }\n  if (!strcmp(isl_id_get_name(mark), \"drain_merge\"))\n  {\n    data->drain_merge_func = (struct autosa_drain_merge_func *)isl_id_get_user(mark);\n  }\n  if (!strcmp(isl_id_get_name(mark), \"host_serialize\"))\n  {\n    data->module = (struct autosa_hw_module *)isl_id_get_user(mark);\n  }\n\n  return isl_stat_ok;\n}\n\n/* This function is called after the AST generator has finished traversing\n * the schedule subtree of a mark node. 
\"node\" points to the corresponding\n * mark AST node.\n *\n * If the mark is called \"module\", then replace \"node\" by a user node\n * that \"calls\" the module, representing the launch of the module.\n * The original \"node\" is stored inside the module object so that\n * it can be used to print the device code.\n * Also clear data->module.\n */\nstatic __isl_give isl_ast_node *after_mark_module(__isl_take isl_ast_node *node,\n                                                  __isl_keep isl_ast_build *build, void *user)\n{\n  isl_ctx *ctx;\n  isl_id *id;\n  isl_ast_expr *expr;\n  isl_ast_expr_list *list;\n  struct autosa_kernel *kernel;\n  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n  struct autosa_hw_module *module;\n  struct autosa_pe_dummy_module *pe_dummy_module;\n  struct autosa_drain_merge_func *func;\n  int tuning = data->tuning;\n  int tuning_num = data->tuning_num;\n\n  ctx = isl_ast_node_get_ctx(node);\n  id = isl_ast_node_mark_get_id(node);\n  if (!id)\n    return isl_ast_node_free(node);\n\n  if (!strcmp(isl_id_get_name(id), \"kernel\") && data->kernel)\n  {\n    isl_id_free(id);\n    if (tuning == 0 && tuning_num == 0) {\n      if (!data->kernel->space)\n        data->kernel->space = isl_ast_build_get_schedule_space(build);\n    }\n    data->kernel = NULL;\n    return node;\n  }\n  if (!strcmp(isl_id_get_name(id), \"io_module.inter_trans\"))\n  {\n    module = data->module;\n    if (tuning) {\n      if (!data->boundary)\n        module->tuning_inter_tree = isl_ast_node_mark_get_node(node);\n    } else if (tuning_num) {\n      if (!data->boundary)\n        module->tuning_num_inter_tree = isl_ast_node_mark_get_node(node);\n    } else {\n      if (!module->inter_space)\n        module->inter_space = isl_ast_build_get_schedule_space(build);\n\n      if (!data->boundary)\n        module->inter_tree = isl_ast_node_mark_get_node(node);\n      else\n        module->boundary_inter_tree = isl_ast_node_mark_get_node(node);     
 \n    }    \n    isl_ast_node_free(node);\n\n    expr = isl_ast_expr_from_id(isl_id_copy(id));\n    list = isl_ast_expr_list_alloc(ctx, 0);\n    expr = isl_ast_expr_call(expr, list);\n    node = isl_ast_node_alloc_user(expr);\n    node = isl_ast_node_set_annotation(node, id);\n\n    return node;\n  }\n  if (!strcmp(isl_id_get_name(id), \"io_module.intra_trans\"))\n  {\n    module = data->module;\n    if (tuning) {\n      module->tuning_intra_tree = isl_ast_node_mark_get_node(node);\n    } else if (tuning_num) {\n      module->tuning_num_intra_tree = isl_ast_node_mark_get_node(node);\n    } else { \n      if (!module->intra_space)\n        module->intra_space = isl_ast_build_get_schedule_space(build);\n      module->intra_tree = isl_ast_node_mark_get_node(node);\n    }\n    isl_ast_node_free(node);\n\n    expr = isl_ast_expr_from_id(isl_id_copy(id));\n    list = isl_ast_expr_list_alloc(ctx, 0);\n    expr = isl_ast_expr_call(expr, list);\n    node = isl_ast_node_alloc_user(expr);\n    node = isl_ast_node_set_annotation(node, id);\n\n    return node;\n  }\n  if (!strcmp(isl_id_get_name(id), \"drain_merge\"))\n  {  \n    if (tuning == 0 && tuning_num == 0) {\n      func = data->drain_merge_func;\n      func->device_tree = isl_ast_node_mark_get_node(node);\n    }\n    isl_ast_node_free(node);\n\n    expr = isl_ast_expr_from_id(isl_id_copy(id));\n    list = isl_ast_expr_list_alloc(ctx, 0);\n    expr = isl_ast_expr_call(expr, list);\n    node = isl_ast_node_alloc_user(expr);\n    node = isl_ast_node_set_annotation(node, id);\n\n    return node;\n  }\n  if (!strcmp(isl_id_get_name(id), \"host_serialize\"))\n  {\n    module = data->module;\n    data->module = NULL;\n    if (tuning == 0 && tuning_num == 0) {\n      module->serialize_tree = isl_ast_node_mark_get_node(node);\n    }\n    isl_ast_node_free(node);\n\n    expr = isl_ast_expr_from_id(isl_id_copy(id));\n    list = isl_ast_expr_list_alloc(ctx, 0);\n    expr = isl_ast_expr_call(expr, list);\n    node = 
isl_ast_node_alloc_user(expr);\n    node = isl_ast_node_set_annotation(node, id);\n\n    return node;\n  }\n  if (!strcmp(isl_id_get_name(id), \"hls_pipeline\"))\n  {\n    isl_id_free(id);\n    data->under_pipeline = 0;\n\n    return node;\n  }\n  if (!strcmp(isl_id_get_name(id), \"hls_unroll\"))\n  {\n    isl_id_free(id);\n    data->under_unroll = 0;\n\n    return node;\n  }\n  if (strcmp(isl_id_get_name(id), \"module\") || !data->module)\n  {\n    isl_id_free(id);\n    return node;\n  }\n  /* Prepare for boundary I/O module. */\n  if (data->boundary && data->filter_buffer == 0)\n  {\n    module = data->module;\n    data->module = NULL;\n    if (tuning == 0 && tuning_num == 0) {\n      module->boundary_tree = isl_ast_node_mark_get_node(node);\n      if (!module->space)\n        module->space = isl_ast_build_get_schedule_space(build);\n    }\n    \n    isl_ast_node_free(node);\n    \n    expr = isl_ast_expr_from_id(isl_id_copy(id));\n    list = isl_ast_expr_list_alloc(ctx, 0);\n    expr = isl_ast_expr_call(expr, list);\n    node = isl_ast_node_alloc_user(expr);\n    node = isl_ast_node_set_annotation(node, id);\n\n    return node;\n  }\n\n  /* Prepare for PE dummy module */\n  if (data->pe_dummy && data->filter_buffer == 0)\n  {\n    module = data->module;\n    data->module = NULL;\n    if (tuning == 0 && tuning_num == 0) {\n      pe_dummy_module = data->pe_dummy_module;      \n      pe_dummy_module->device_tree = isl_ast_node_mark_get_node(node);\n      if (!module->space)\n        module->space = isl_ast_build_get_schedule_space(build);\n    }\n    \n    data->pe_dummy_module = NULL;\n    isl_ast_node_free(node);\n    \n    expr = isl_ast_expr_from_id(isl_id_copy(id));\n    list = isl_ast_expr_list_alloc(ctx, 0);\n    expr = isl_ast_expr_call(expr, list);\n    node = isl_ast_node_alloc_user(expr);\n    node = isl_ast_node_set_annotation(node, id);\n\n    return node;\n  }\n\n  if (!data->boundary && data->filter_buffer == 0)\n  {\n    module = data->module;\n    
data->module = NULL;\n    if (tuning) {\n      module->tuning_device_tree = isl_ast_node_mark_get_node(node);\n    } else if (tuning_num) {\n      module->tuning_num_device_tree = isl_ast_node_mark_get_node(node);\n    } else {    \n      module->device_tree = isl_ast_node_mark_get_node(node);\n      if (!module->space)\n        module->space = isl_ast_build_get_schedule_space(build);\n    }\n    isl_ast_node_free(node);\n    \n    expr = isl_ast_expr_from_id(isl_id_copy(id));\n    list = isl_ast_expr_list_alloc(ctx, 0);\n    expr = isl_ast_expr_call(expr, list);\n    node = isl_ast_node_alloc_user(expr);\n    node = isl_ast_node_set_annotation(node, isl_id_copy(id));\n  }\n  isl_id_free(id);\n\n  return node;\n}\n\nstatic __isl_give isl_id *before_for_module(\n    __isl_keep isl_ast_build *build, void *user)\n{\n  isl_id *id;\n  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n  struct autosa_ast_node_userinfo *node_info;\n\n  node_info = alloc_ast_node_userinfo();\n  /* TODO: Update the info for Catapult HLS. 
*/\n  \n  id = isl_id_alloc(isl_ast_build_get_ctx(build), \"\", node_info);\n  id = isl_id_set_free_user(id, free_ast_node_userinfo);\n\n  return id;\n}\n\n//static __isl_give isl_id *before_for_module_call(\n//    __isl_keep isl_ast_build *build, void *user)\n//{\n//  isl_id *id;\n//  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n//  struct autosa_ast_node_userinfo *node_info;\n//\n//#ifdef _DEBUG\n//  if (!strcmp(data->module->name, \"U_drain_IO_L2_out\")) {\n//    isl_union_map *sched_tmp;\n//    sched_tmp = isl_ast_build_get_schedule(build);\n//    DBGUMAP(stdout, sched_tmp, data->kernel->ctx);\n//  }\n//#endif\n//\n//  node_info = alloc_ast_node_userinfo();\n//  id = isl_id_alloc(isl_ast_build_get_ctx(build), \"\", node_info);\n//  id = isl_id_set_free_user(id, free_ast_node_userinfo);\n//\n//  return id;\n//}\n\nstatic __isl_give isl_ast_node *after_for_module(\n    __isl_take isl_ast_node *node, __isl_keep isl_ast_build *build,\n    void *user)\n{\n  isl_id *id;\n  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n  struct autosa_ast_node_userinfo *node_info;\n\n  id = isl_ast_node_get_annotation(node);\n  node_info = (struct autosa_ast_node_userinfo *)isl_id_get_user(id);\n\n  //if (node_info->is_outermost_for)\n  //{\n  //node_info->is_outermost_for = 0;\n  //data->in_for = 0;\n  //}\n\n  isl_id_free(id);\n\n  return node;\n}\n\n/* Generate AST from the schedule for AutoSA hardware modules. 
\n * If \"iterator_prefix\" is set, we will use it as the iterator prefix.\n * Otherwise, we use the default value \"c\".\n */\nstatic __isl_give isl_ast_node *autosa_generate_ast_from_schedule(\n    __isl_take isl_schedule *schedule,\n    struct autosa_at_domain_data data, struct autosa_gen *gen,\n    const char *iterator_prefix)\n{\n  isl_ast_build *build;\n  isl_ast_node *tree;\n  isl_id_list *iterators;\n  int depth;\n\n  if (schedule == NULL)\n    return NULL;\n\n  depth = 0;\n  if (isl_schedule_foreach_schedule_node_top_down(schedule, &update_depth,\n                                                  &depth) < 0)\n    schedule = isl_schedule_free(schedule);\n  build = isl_ast_build_alloc(gen->prog->ctx);\n  iterators = ppcg_scop_generate_names(gen->prog->scop, depth,\n                                       iterator_prefix == NULL ? \"c\" : iterator_prefix);\n  build = isl_ast_build_set_iterators(build, iterators);\n  build = isl_ast_build_set_at_each_domain(build, &at_domain_module, &data);\n  build = isl_ast_build_set_before_each_mark(build, &before_mark_module, &data);\n  build = isl_ast_build_set_after_each_mark(build, &after_mark_module, &data);\n  build = isl_ast_build_set_before_each_for(build, &before_for_module, &data);\n  build = isl_ast_build_set_after_each_for(build, &after_for_module, &data);\n\n  if (gen->prog->scop->options->debug->dump_final_schedule)\n    isl_schedule_dump(schedule);\n  tree = isl_ast_build_node_from_schedule(build, schedule);\n  isl_ast_build_free(build);\n\n  return tree;\n}\n\nstruct loop_infinitize_check_data\n{\n  /* Indicates if we are checking the outermost loop bands. */\n  isl_bool outer_for;\n  struct autosa_hw_module *module;\n  /* Indicates if we have found any infinitizable loop. */\n  isl_bool found;\n  /* Number of infinitizable loops. 
*/\n  int n_loops;\n};\n\nstruct iterator_used_data\n{\n  isl_ast_expr *iterator;\n  isl_bool used;\n  struct autosa_hw_module *module;\n  isl_bool has_inter_intra;\n};\n\n/* Search if the isl_ast_expr_id \"key\" exists in the ast_expr \"expr\".\n */\nstatic isl_bool search_expr_id(__isl_keep isl_ast_expr *expr, __isl_keep isl_ast_expr *key)\n{\n  enum isl_ast_expr_type type;\n\n  type = isl_ast_expr_get_type(expr);\n  if (type == isl_ast_expr_id)\n  {\n    return isl_ast_expr_is_equal(expr, key);\n  }\n  else if (type == isl_ast_expr_int)\n  {\n    return isl_bool_false;\n  }\n  else if (type == isl_ast_expr_op)\n  {\n    isl_size n_arg = isl_ast_expr_op_get_n_arg(expr);\n    for (int i = 0; i < n_arg; i++)\n    {\n      isl_ast_expr *arg = isl_ast_expr_op_get_arg(expr, i);\n      isl_bool found = search_expr_id(arg, key);\n      isl_ast_expr_free(arg);\n      if (found == isl_bool_true)\n        return isl_bool_true;\n    }\n  }\n\n  return isl_bool_false;\n}\n\nstruct search_id_to_expr_id_data\n{\n  bool found;\n  isl_ast_expr *iterator;\n};\n\nisl_stat search_id_to_expr_id(__isl_take isl_id *key,\n                              __isl_take isl_ast_expr *val, void *user)\n{\n  struct search_id_to_expr_id_data *data = (struct search_id_to_expr_id_data *)user;\n  data->found = (int)search_expr_id(val, data->iterator) || data->found;  \n\n  isl_id_free(key);\n  isl_ast_expr_free(val);\n  return isl_stat_ok;\n}\n\nstatic isl_bool iterator_used(__isl_keep isl_ast_node *node, void *user)\n{\n  struct iterator_used_data *data = (struct iterator_used_data *)user;\n  enum isl_ast_node_type type;\n  \n\n  type = isl_ast_node_get_type(node);\n  if (type == isl_ast_node_for)\n  {\n    isl_ast_expr *expr;\n    isl_bool found = isl_bool_false;\n\n    /* Init */\n    expr = isl_ast_node_for_get_init(node);\n    found = search_expr_id(expr, data->iterator);\n    isl_ast_expr_free(expr);\n    if (found)\n    {\n      data->used = isl_bool_true;\n      return isl_bool_false;\n    
}\n\n    /* Cond */\n    expr = isl_ast_node_for_get_cond(node);\n    found = search_expr_id(expr, data->iterator);\n    isl_ast_expr_free(expr);\n    if (found)\n    {\n      data->used = isl_bool_true;\n      return isl_bool_false;\n    }\n  }\n  else if (type == isl_ast_node_if)\n  {\n    isl_ast_expr *expr;\n    isl_bool found = isl_bool_false;\n\n    /* Cond */\n    expr = isl_ast_node_if_get_cond(node);\n    found = search_expr_id(expr, data->iterator);\n    isl_ast_expr_free(expr);\n    if (found)\n    {\n      data->used = isl_bool_true;\n      return isl_bool_false;\n    }\n  }\n  else if (type == isl_ast_node_block)\n  {\n    /* We do nothing here. */\n    return isl_bool_true;\n  }\n  else if (type == isl_ast_node_mark)\n  {\n    /* We do nothing here. */\n    return isl_bool_true;\n  }\n  else if (type == isl_ast_node_user)\n  {\n    isl_ast_expr *expr;\n    isl_bool found = isl_bool_false;\n    isl_id *id;\n    struct autosa_kernel_stmt *stmt;\n\n    id = isl_ast_node_get_annotation(node);\n    stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n    isl_id_free(id);\n\n    if (stmt->type == AUTOSA_KERNEL_STMT_DOMAIN)\n    {\n      /* TODO: At present, we only test if the array index contains the iterator.\n       */\n      isl_id_to_ast_expr *ref2expr = stmt->u.d.ref2expr;\n      struct search_id_to_expr_id_data local_data;\n      local_data.found = isl_bool_false;\n      local_data.iterator = data->iterator;\n      isl_id_to_ast_expr_foreach(ref2expr, &search_id_to_expr_id, &local_data);\n      if (local_data.found)\n      {\n        data->used = isl_bool_true;\n        return isl_bool_false;\n      }\n    }\n    else if (stmt->type == AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_TRANS ||\n             stmt->type == AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_TRANS ||\n             stmt->type == AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_INTRA ||\n             stmt->type == AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_INTER)\n    {\n      isl_ast_node 
*nested_node;\n      struct iterator_used_data nested_used_data;\n\n      data->has_inter_intra = isl_bool_true;\n\n      /* Search under the nested inter_trans AST tree. */\n      nested_node = data->module->inter_tree;\n      nested_used_data.iterator = data->iterator;\n      nested_used_data.used = data->used;\n      nested_used_data.module = data->module;\n      nested_used_data.has_inter_intra = isl_bool_false;\n      isl_ast_node_foreach_descendant_top_down(nested_node, &iterator_used,\n                                               &nested_used_data);\n      found = nested_used_data.used;\n      if (found)\n      {\n        data->used = isl_bool_true;\n        return isl_bool_false;\n      }\n\n      /* Search under the nested intra_trans AST tree. */\n      nested_node = data->module->intra_tree;\n      nested_used_data.iterator = data->iterator;\n      nested_used_data.used = data->used;\n      nested_used_data.module = data->module;\n      nested_used_data.has_inter_intra = isl_bool_false;\n      isl_ast_node_foreach_descendant_top_down(nested_node, &iterator_used,\n                                               &nested_used_data);\n      found = nested_used_data.used;\n      if (found)\n      {\n        data->used = isl_bool_true;\n        return isl_bool_false;\n      }\n    }\n    else if (stmt->type == AUTOSA_KERNEL_STMT_IO_TRANSFER)\n    {\n      int filter_depth = stmt->u.i.filter_sched_depth;\n      if (stmt->u.i.boundary)\n        filter_depth = -1;\n      if (filter_depth < 0)\n        return isl_bool_true;\n\n      /* Check whether the iterator equals c[filter_depth]. 
*/\n      isl_printer *p_str;\n      char *filter_iterator;\n      char *cur_iterator;\n      p_str = isl_printer_to_str(isl_ast_node_get_ctx(node));\n      p_str = isl_printer_print_str(p_str, \"c\");\n      p_str = isl_printer_print_int(p_str, filter_depth);\n      filter_iterator = isl_printer_get_str(p_str);\n      p_str = isl_printer_flush(p_str);\n\n      p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);\n      p_str = isl_printer_print_ast_expr(p_str, data->iterator);\n      cur_iterator = isl_printer_get_str(p_str);\n      isl_printer_free(p_str);\n\n      if (!strcmp(filter_iterator, cur_iterator))\n        found = isl_bool_true;\n      free(filter_iterator);\n      free(cur_iterator);\n\n      if (found)\n      {\n        data->used = isl_bool_true;\n        return isl_bool_false;\n      }\n    }\n  }\n\n  return isl_bool_true;\n}\n\nstatic isl_bool loop_infinitize_check(__isl_keep isl_ast_node *node, void *user)\n{\n  struct loop_infinitize_check_data *data = (struct loop_infinitize_check_data *)user;\n  enum isl_ast_node_type type;\n\n  /* Only check the for loops in the outermost loop band. */\n  if (!data->outer_for)\n    return isl_bool_false;\n\n  type = isl_ast_node_get_type(node);\n  if (type == isl_ast_node_block || type == isl_ast_node_user)\n  {\n    data->outer_for = isl_bool_false;\n    return isl_bool_false;\n  }\n  if (type == isl_ast_node_for && !isl_ast_node_for_is_degenerate(node))\n  {\n    isl_ast_expr *iterator;\n    isl_ast_node *body;\n    isl_bool used = isl_bool_false;\n    struct iterator_used_data used_data;\n    isl_id *id;\n\n    iterator = isl_ast_node_for_get_iterator(node);\n    body = isl_ast_node_for_get_body(node);\n    /* Examine if the iterator exists in any AST expressions in the sub tree. 
*/\n    used_data.iterator = iterator;\n    used_data.used = isl_bool_false;\n    used_data.module = data->module;\n    used_data.has_inter_intra = isl_bool_false;\n    isl_ast_node_foreach_descendant_top_down(body, &iterator_used, &used_data);\n\n    if (!used_data.used)\n    {\n      /* This loop is legal to be infinitized. */\n      struct autosa_ast_node_userinfo *node_info;\n\n      data->n_loops++;\n      id = isl_ast_node_get_annotation(node);\n      if (id)\n      {\n        node_info = (struct autosa_ast_node_userinfo *)isl_id_get_user(id);\n        if (node_info)\n        {\n          node_info->is_infinitize_legal = 1;\n          if (!data->found)\n          {\n            node_info->is_first_infinitizable_loop = 1;\n            data->found = isl_bool_true;\n          }\n\n          if (used_data.has_inter_intra)\n          {\n            isl_space *space;\n            int n;\n            isl_printer *p_str;\n            char *iterator_str;\n            /* Update the inter/intra_trans module space. \n             * Remove the corresponding iterators from the sub module space. 
\n             */\n            p_str = isl_printer_to_str(isl_id_get_ctx(id));\n            p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);\n            p_str = isl_printer_print_ast_expr(p_str, iterator);\n            iterator_str = isl_printer_get_str(p_str);\n            isl_printer_free(p_str);\n\n            space = data->module->inter_space;\n            n = isl_space_find_dim_by_name(space, isl_dim_set, iterator_str);\n            if (n >= 0)\n              space = isl_space_drop_dims(space, isl_dim_set, n, 1);\n            data->module->inter_space = space;\n\n            space = data->module->intra_space;\n            n = isl_space_find_dim_by_name(space, isl_dim_set, iterator_str);\n            if (n >= 0)\n              space = isl_space_drop_dims(space, isl_dim_set, n, 1);\n            data->module->intra_space = space;\n\n            free(iterator_str);\n          }\n        }\n        isl_id_free(id);\n      }\n    }\n    else\n    {\n      /* The iterator is used below; stop descending here. */\n      isl_ast_expr_free(iterator);\n      isl_ast_node_free(body);\n      return isl_bool_false;\n    }\n\n    isl_ast_expr_free(iterator);\n    isl_ast_node_free(body);\n  }\n\n  return isl_bool_true;\n}\n\n/* Try to apply the loop infinitization optimization.\n * This optimization is useful for Intel devices since we can replace some\n * for loops with a single while (1) loop to reduce the loop control overhead.\n * We examine the outermost for loop band from outside to inside.\n * For each for loop, we check whether the loop iterator appears in any AST\n * expression below it. If not, the loop is marked to be infinitized later.\n * When printing out for loops later, such loops will be skipped. 
\n * Since module ASTs are nested, we examine module->device_tree and, if it\n * exists, module->boundary_tree. Whenever we encounter an AST node calling\n * io_module.inter_trans/io_module.intra_trans, we also search\n * module->inter_tree and module->intra_tree.\n */\nstatic void loop_infinitization_optimize(struct autosa_hw_module *module)\n{\n  if (module->double_buffer || module->to_mem)\n    return;\n\n  if (module->device_tree)\n  {\n    isl_ast_node *node = module->device_tree;\n    struct loop_infinitize_check_data data = {isl_bool_true, module, isl_bool_false};\n    isl_ast_node_foreach_descendant_top_down(node, &loop_infinitize_check, &data);\n  }\n  if (module->boundary_tree)\n  {\n    isl_ast_node *node = module->boundary_tree;\n    struct loop_infinitize_check_data data = {isl_bool_true, module, isl_bool_false};\n    isl_ast_node_foreach_descendant_top_down(node, &loop_infinitize_check, &data);\n  }\n}\n\n/* Mark all for loops as visited.\n */\nstatic isl_bool update_for_visit(__isl_keep isl_ast_node *node, void *user)\n{\n  enum isl_ast_node_type type;\n\n  type = isl_ast_node_get_type(node);\n  if (type == isl_ast_node_for)\n  {\n    struct autosa_ast_node_userinfo *info;\n    isl_id *id;\n\n    id = isl_ast_node_get_annotation(node);\n    if (id)\n    {\n      info = (struct autosa_ast_node_userinfo *)isl_id_get_user(id);\n      if (info)\n        info->visited = 1;\n    }\n    isl_id_free(id);\n  }\n\n  return isl_bool_true;\n}\n\nstruct count_loop_data {\n  int pe;\n  int io;\n  int under_simd;\n  int find_simd_loop;\n  int n_loop;\n  int under_latency;\n  int find_latency_loop;\n  int n_latency_loop;\n};\n\nstatic isl_bool count_loop(__isl_keep isl_ast_node *node, void *user)\n{\n  struct count_loop_data *data = (struct count_loop_data *)user;\n  enum isl_ast_node_type type;\n\n  type = isl_ast_node_get_type(node);\n  if (type == isl_ast_node_for) {\n    data->n_loop++;\n    if (data->pe) {\n      if (data->under_simd) {\n   
     data->find_simd_loop = 1;\n      }\n      if (data->under_latency) {\n        data->n_latency_loop++;\n      }\n    }\n  } else if (type == isl_ast_node_mark) {\n    isl_id *id;\n    id = isl_ast_node_mark_get_id(node);\n    if (!strcmp(isl_id_get_name(id), \"simd\")) {\n      data->under_simd = 1;\n    }\n    if (!strcmp(isl_id_get_name(id), \"latency\")) {\n      data->under_latency = 1;\n    }\n    isl_id_free(id);\n  }\n\n  return isl_bool_true;\n}\n\nstruct loop_coalesce_update_data {\n  int update_level_for_pe;\n  int update_level_for_io;\n};\n\nstatic isl_bool update_latency_coalesce(__isl_keep isl_ast_node *node, void *user)\n{\n  struct count_loop_data *data = (struct count_loop_data *)user;\n  enum isl_ast_node_type type;\n\n  type = isl_ast_node_get_type(node);\n  if (type == isl_ast_node_for) {\n    if (data->under_latency && data->find_latency_loop == 0) {\n      struct autosa_ast_node_userinfo *info;\n      isl_id *id;\n\n      id = isl_ast_node_get_annotation(node);\n      if (id) {\n        info = (struct autosa_ast_node_userinfo *)isl_id_get_user(id);\n        if (info)\n          info->n_coalesce_loop = data->n_latency_loop - ((data->find_simd_loop == 1)? 1 : 0);\n      }\n      isl_id_free(id);\n      data->find_latency_loop = 1;\n    }\n  } else if (type == isl_ast_node_mark) {\n    isl_id *id;\n    id = isl_ast_node_mark_get_id(node);\n    if (!strcmp(isl_id_get_name(id), \"latency\")) {\n      data->under_latency = 1;\n    }\n    isl_id_free(id);\n  }\n\n  return isl_bool_true;\n}\n\n/* If the AST node is a for loop node, we first extract the annotated\n * userinfo from the node. 
If the loop is marked to be infinitized, we\n * skip this loop.\n * Otherwise, since we visit the AST in a top-down manner, this is the outermost\n * loop to be annotated with the loop_coalesce pragma.\n * We then mark all the children nodes of this node as visited.\n * The next unvisited for node we meet will be the next\n * outermost loop to be annotated.\n *\n * If the module is a PE module or an intra_trans I/O module with data packing,\n * we also update the for loop levels beneath the current for node.\n */\nstatic isl_bool loop_coalesce_update(__isl_keep isl_ast_node *node, void *user)\n{\n  struct loop_coalesce_update_data *data = (struct loop_coalesce_update_data *)user;\n  enum isl_ast_node_type type;\n\n  type = isl_ast_node_get_type(node);\n  if (type == isl_ast_node_for)\n  {\n    struct autosa_ast_node_userinfo *info;\n    isl_id *id;\n\n    id = isl_ast_node_get_annotation(node);\n    if (id)\n    {\n      info = (struct autosa_ast_node_userinfo *)isl_id_get_user(id);\n      if (info && !info->is_infinitize_legal && !info->visited)\n      {\n        /* This is the outermost loop to be coalesced.\n         * We then visit all the children nodes and set the visit flag.\n         */\n        info->visited = 1;\n        info->is_outermost_for = 1;\n        /* Update the children. 
*/\n        isl_ast_node_foreach_descendant_top_down(node, &update_for_visit, NULL);\n        if (data->update_level_for_io) {\n          info->is_dep_free = 1;\n        } else if (data->update_level_for_pe) {\n          struct count_loop_data tmp_data = \n            {data->update_level_for_pe, data->update_level_for_io, 0, 0, 0, 0, 0, 0};\n          isl_ast_node_foreach_descendant_top_down(node, &count_loop, &tmp_data);\n          if (tmp_data.pe && tmp_data.find_simd_loop) {          \n            info->n_coalesce_loop = tmp_data.n_loop - tmp_data.n_latency_loop; \n            /* Update the coalesce info for the latency hiding loop */\n            tmp_data.under_latency = 0;\n            tmp_data.find_latency_loop = 0;            \n            isl_ast_node_foreach_descendant_top_down(node, &update_latency_coalesce, &tmp_data);\n          } else if (tmp_data.io) {\n            info->n_coalesce_loop = tmp_data.n_loop - 1;\n          } else {\n            info->n_coalesce_loop = 0;\n          }          \n        }\n      }\n      isl_id_free(id);\n    }\n  }\n\n  return isl_bool_true;\n}\n\n/* This function will mark the outermost for loop which is not infinitized \n * to be added with \"loop_coalesce\" pragma later in the generated OpenCL code.\n * We will examine all the AST trees to be printed for this module.\n */\nstatic void loop_coalesce_optimize(struct autosa_hw_module *module)\n{\n  isl_ast_node *node;\n  struct loop_coalesce_update_data data = {0, 0};\n  if (module->type == PE_MODULE)\n    data.update_level_for_pe = 1;      \n\n  if (module->device_tree)\n  {\n    node = module->device_tree;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_coalesce_update, &data);\n  }\n  if (module->inter_tree)\n  {\n    node = module->inter_tree;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_coalesce_update, &data);\n  }\n  if (module->intra_tree)\n  {\n    if (module->data_pack_inter != module->data_pack_intra && module->in == 0)\n      
data.update_level_for_io = 1;\n    node = module->intra_tree;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_coalesce_update, &data);\n    data.update_level_for_io = 0;\n  }\n  if (module->boundary_outer_tree)\n  {\n    node = module->boundary_outer_tree;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_coalesce_update, &data);\n  }\n  if (module->boundary_inter_tree)\n  {\n    node = module->boundary_inter_tree;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_coalesce_update, &data);\n  }\n  if (module->boundary_tree)\n  {\n    node = module->boundary_tree;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_coalesce_update, &data);\n  }\n}\n\n/* The first four fields must stay layout-compatible with\n * struct loop_infinitize_check_data, since loop_guards_optimize() also\n * passes this struct to loop_infinitize_check().\n */\nstruct loop_guards_update_data {\n  /* Indicates if we are checking the outermost loop bands. */\n  isl_bool outer_for;\n  struct autosa_hw_module *module;\n  /* Indicates if we have found any infinitizable loop. */\n  isl_bool found;\n  /* Number of infinitizable loops. */\n  int n_loops;\n  int start_updated;\n  int end_updated;\n  /* Store the last for loop info. */\n  struct autosa_ast_node_userinfo *info;\n  int module_type; // default: 0 outer: 1 intra: 2 inter: 3\n  int double_buffer;\n  char *module_name;\n  char *buf_name;\n  int inter;\n  int read;\n};\n\n/* We mark the guard_start at the outermost for loop.\n * As for the guard_end, for inter/intra trans modules we mark it at the last\n * for loop before the double buffer mark; for the remaining modules, we mark\n * it at the last infinitizable loop.\n */\nstatic isl_bool loop_guards_update(__isl_keep isl_ast_node *node, void *user)\n{\n  struct loop_guards_update_data *data = (struct loop_guards_update_data *)user;\n  enum isl_ast_node_type type;\n\n  type = isl_ast_node_get_type(node);\n  if (type == isl_ast_node_for) {\n    struct autosa_ast_node_userinfo *info;\n    isl_id *id;\n\n    if (data->end_updated) {\n      /* Count the loops inside the guards. 
*/\n      data->n_loops++;\n    } else {\n      data->n_loops--;\n    }\n\n    id = isl_ast_node_get_annotation(node);\n    if (id) {\n      info = (struct autosa_ast_node_userinfo *)isl_id_get_user(id);\n      if (info && !data->end_updated)\n        info->visited = true;\n\n      if (info && !data->start_updated) {\n        data->start_updated = 1;\n        info->is_guard_start = 1;\n      }\n      if (info && info->is_infinitize_legal && !data->end_updated) {\n        /* Once the counter of infinitizable loops drops to zero, this is the\n         * last infinitizable loop in the outermost loop band.\n         */\n        if (data->n_loops == 0) {\n          info->is_guard_end = 1;\n          data->end_updated = 1;\n          /* Update the local buffer information if needed. */\n          if (data->module_type == 2 || data->module_type == 3) {\n            info->double_buffer = data->double_buffer;\n            info->module_name = data->module_name;\n            info->inter = data->module_type - 2;\n            info->read = data->read;\n            info->buf_name = data->buf_name;\n          } else {\n            info->double_buffer = -1;\n            info->module_name = NULL;\n            info->inter = -1;\n            info->read = -1;\n            info->buf_name = NULL;\n          }\n        }\n      }\n      data->info = info;\n    }\n    isl_id_free(id);\n  } else if (type == isl_ast_node_mark) {\n    isl_id *id = isl_ast_node_mark_get_id(node);\n    const char *name = isl_id_get_name(id);\n    if (name && !strcmp(name, \"synth\") && data->info) {\n      data->info->is_guard_end = 1;\n      data->end_updated = 1;\n      data->n_loops = 0;\n      if (data->module_type == 2 || data->module_type == 3) {\n        data->info->double_buffer = data->double_buffer;\n        data->info->module_name = data->module_name;\n        data->info->inter = data->module_type - 2;\n        data->info->read = data->read;\n        data->info->buf_name = data->buf_name;\n      }\n    }\n    isl_id_free(id);\n  }\n\n  return isl_bool_true;\n}\n\nstatic isl_bool 
loop_pipeline_update(__isl_keep isl_ast_node *node, void *user)\n{\n  enum isl_ast_node_type type;\n\n  type = isl_ast_node_get_type(node);\n  if (type == isl_ast_node_for) {\n    struct autosa_ast_node_userinfo *info;\n    isl_id *id;\n\n    id = isl_ast_node_get_annotation(node);\n    if (id) {\n      info = (struct autosa_ast_node_userinfo *)isl_id_get_user(id);\n      if (info && !info->visited) {\n        /* This is the outermost loop to be pipelined.\n         * We visit all the children nodes and update the visit flag.\n         */\n        info->visited = 1;\n        info->is_pipeline = 1;\n        /* Update the children. */\n        isl_ast_node_foreach_descendant_top_down(node, &update_for_visit, NULL);\n      }\n    }\n    isl_id_free(id);\n  }\n\n  return isl_bool_true;\n}\n\n/* Mark the loop guard_start before the outermost loop.\n * Store the fifo guard information:\n * - names of the fifos\n * - number of elements to be read\n * Mark the loop guard_end.\n * - For inter/intra trans modules, mark it at the end of the outer loop.\n *   Store the information about\n *   - module name\n *   - buffer name\n *   - fifo name\n * - For other modules, put it after the last loop in the outermost loop band.\n */\nstatic void loop_guards_optimize(struct autosa_hw_module *module)\n{\n  /* Mark the loop guard start before the outermost loop. 
*/\n  if (module->device_tree) {    \n    isl_ast_node *node = module->device_tree;\n    struct loop_guards_update_data data = \n      {isl_bool_true, module, isl_bool_false, 0, 0, 0, NULL, 0};\n    data.double_buffer = module->double_buffer;\n    data.module_name = module->name;\n    data.buf_name = NULL;\n    data.inter = -1;\n    data.read = -1;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_infinitize_check, &data);\n    isl_ast_node_foreach_descendant_top_down(node, &loop_guards_update, &data);\n    if (data.n_loops == 0)\n      module->pipeline_at_default_func = 1;\n    else {      \n      /* Find the first for loop under the guard_end. Mark it as pipeline. */\n      isl_ast_node_foreach_descendant_top_down(node, &loop_pipeline_update, NULL);\n    }\n  }\n  if (module->inter_tree) {    \n    isl_ast_node *node = module->inter_tree;\n    struct loop_guards_update_data data = {isl_bool_true, module, isl_bool_false, 0, 0, 0, NULL, 3};\n    data.double_buffer = module->double_buffer;\n    data.module_name = module->name;\n    if (module->n_var > 0) {\n      data.buf_name = (&(module->var[0]))->name;\n    } else {\n      data.buf_name = NULL;\n    }\n    data.inter = 1;\n    data.read = (module->in)? 0 : 1;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_infinitize_check, &data);\n    isl_ast_node_foreach_descendant_top_down(node, &loop_guards_update, &data);    \n    if (data.n_loops == 0) {\n      module->pipeline_at_filter_func[2] = 1;      \n    } else {\n      /* Find the first for loop under the guard_end. Mark it as pipeline. 
*/\n      isl_ast_node_foreach_descendant_top_down(node, &loop_pipeline_update, NULL);\n    }\n  }\n  if (module->intra_tree) {\n    isl_ast_node *node = module->intra_tree;\n    struct loop_guards_update_data data = {isl_bool_true, module, isl_bool_false, 0, 0, 0, NULL, 2};\n    data.double_buffer = module->double_buffer;\n    data.module_name = module->name;\n    if (module->n_var > 0) {\n      data.buf_name = (&(module->var[0]))->name;\n    } else {\n      data.buf_name = NULL;\n    }\n    data.inter = 0;\n    data.read = (module->in)? 1 : 0;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_infinitize_check, &data);\n    isl_ast_node_foreach_descendant_top_down(node, &loop_guards_update, &data);        \n    if (data.n_loops == 0) {\n      module->pipeline_at_filter_func[1] = 1;          \n    } else {\n      /* Find the first for loop under the guard_end. Mark it as pipeline. */\n      isl_ast_node_foreach_descendant_top_down(node, &loop_pipeline_update, NULL);\n    }\n  }\n  if (module->boundary_outer_tree) {    \n    isl_ast_node *node = module->boundary_outer_tree;\n    struct loop_guards_update_data data = {isl_bool_true, module, isl_bool_false, 0, 0, 0, NULL, 1};\n    data.double_buffer = module->double_buffer;\n    data.module_name = module->name;\n    data.buf_name = NULL;\n    data.inter = -1;\n    data.read = -1;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_infinitize_check, &data);\n    isl_ast_node_foreach_descendant_top_down(node, &loop_guards_update, &data);\n    if (data.n_loops != 0) {\n      /* Find the first for loop under the guard_end. Mark it as pipeline. 
*/\n      isl_ast_node_foreach_descendant_top_down(node, &loop_pipeline_update, NULL);\n    }\n  }\n  if (module->boundary_inter_tree) {    \n    isl_ast_node *node = module->boundary_inter_tree;\n    struct loop_guards_update_data data = {isl_bool_true, module, isl_bool_false, 0, 0, 0, NULL, 3};\n    data.double_buffer = module->double_buffer;\n    data.module_name = module->name;\n    if (module->n_var > 0) {\n      data.buf_name = (&(module->var[0]))->name;\n    } else {\n      data.buf_name = NULL;\n    }\n    data.inter = 1;\n    data.read = (module->in)? 0 : 1;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_infinitize_check, &data);\n    isl_ast_node_foreach_descendant_top_down(node, &loop_guards_update, &data);\n    if (data.n_loops != 0) {\n      /* Find the first for loop under the guard_end. Mark it as pipeline. */\n      isl_ast_node_foreach_descendant_top_down(node, &loop_pipeline_update, NULL);\n    }\n  }\n  if (module->boundary_tree) {    \n    isl_ast_node *node = module->boundary_tree;\n    struct loop_guards_update_data data = {isl_bool_true, module, isl_bool_false, 0, 0, 0, NULL, 0};\n    data.double_buffer = module->double_buffer;\n    data.module_name = module->name;\n    data.buf_name = NULL;\n    data.inter = -1;\n    data.read = -1;\n    isl_ast_node_foreach_descendant_top_down(node, &loop_infinitize_check, &data);\n    isl_ast_node_foreach_descendant_top_down(node, &loop_guards_update, &data);\n    if (data.n_loops != 0) {\n      /* Find the first for loop under the guard_end. Mark it as pipeline. 
*/\n      isl_ast_node_foreach_descendant_top_down(node, &loop_pipeline_update, NULL);\n    }\n  }\n\n  return;\n}\n\n/* Delete every mark node except the following:\n * kernel, module, pe_dummy_module,\n * io_module.inter_trans, io_module.intra_trans,\n * hls_pipeline, hls_unroll,\n * drain_merge, host_serialize, synth\n */\nstatic __isl_give isl_schedule_node *delete_marker_catapult(\n  __isl_take isl_schedule_node *node, void *user)\n{\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_mark) {\n    isl_id *id;\n    const char *name;\n    id = isl_schedule_node_mark_get_id(node);\n    name = isl_id_get_name(id);\n    if (!(!strcmp(name, \"kernel\") || !strcmp(name, \"module\") || !strcmp(name, \"pe_dummy_module\") ||\n        !strcmp(name, \"io_module.inter_trans\") || !strcmp(name, \"io_module.intra_trans\") ||\n        !strcmp(name, \"hls_pipeline\") || !strcmp(name, \"hls_unroll\") ||\n        !strcmp(name, \"drain_merge\") || !strcmp(name, \"host_serialize\") ||\n        !strcmp(name, \"synth\")))\n    {\n      /* Delete the current marker. */\n      node = isl_schedule_node_delete(node);\n    }\n    /* Release the id only after \"name\" is no longer needed. */\n    isl_id_free(id);\n  }\n  return node;\n}\n\n/* There are three schedules to handle in this module:\n * - outer loop schedule\n * - inter trans schedule\n * - intra trans schedule\n * We first generate the ASTs for the inter trans and intra trans functions.\n * The AST trees below the inter trans and intra trans marks are stored\n * separately.\n * The outer loop AST prints out these two AST trees while handling\n * the inter trans and intra trans function calls.\n */\nisl_stat sa_filter_buffer_io_module_generate_code(struct autosa_gen *gen,\n                                                  struct autosa_hw_module *module)\n{\n  isl_schedule *schedule;\n  struct autosa_at_domain_data data;\n  isl_ast_node *tree;\n\n  /* Generate AST for inter transfer function call. 
*/\n  schedule = module->inter_sched;\n  if (gen->options->target == AUTOSA_TARGET_CATAPULT_HLS_C) {\n    /* Delete the unnecessary marker. */\n    schedule = isl_schedule_map_schedule_node_bottom_up(\n      schedule, &delete_marker_catapult, NULL);\n  }\n  autosa_at_domain_data_init(&data, gen);\n  tree = autosa_generate_ast_from_schedule(schedule, data, gen,\n                                           module->double_buffer && gen->options->autosa->double_buffer_style == 0 ? \"inter_c\" : NULL);\n  isl_ast_node_free(tree);\n  if (gen->options->autosa->tuning_method == 1 && module->tuning_inter_sched) {\n    schedule = module->tuning_inter_sched;\n    autosa_at_domain_data_init(&data, gen);\n    data.tuning = 1;\n    data.tuning_num = 0;\n    tree = autosa_generate_ast_from_schedule(schedule, data, gen,\n                                             module->double_buffer && gen->options->autosa->double_buffer_style == 0 ? \"inter_c\" : NULL);\n    isl_ast_node_free(tree);\n  }\n  if (gen->options->autosa->tuning_method == 1 && module->tuning_inter_sched) {\n    schedule = module->tuning_num_inter_sched;\n    autosa_at_domain_data_init(&data, gen);\n    data.tuning = 0;\n    data.tuning_num = 1;\n    tree = autosa_generate_ast_from_schedule(schedule, data, gen,\n                                             module->double_buffer && gen->options->autosa->double_buffer_style == 0 ? \"inter_c\" : NULL);\n    isl_ast_node_free(tree);\n  }\n\n  if (module->boundary)\n  {\n    /* Generate boundary module AST. */\n    schedule = module->boundary_inter_sched;\n    if (gen->options->target == AUTOSA_TARGET_CATAPULT_HLS_C) {\n      /* Delete the unnecessary marker. 
*/\n      schedule = isl_schedule_map_schedule_node_bottom_up(\n        schedule, &delete_marker_catapult, NULL);\n    }\n    autosa_at_domain_data_init(&data, gen);\n    data.boundary = 1;\n    tree = autosa_generate_ast_from_schedule(schedule, data, gen,\n                                             module->double_buffer && gen->options->autosa->double_buffer_style == 0 ? \"inter_c\" : NULL);\n    isl_ast_node_free(tree);\n  }\n\n  /* Generate AST for intra transfer function call. */\n  schedule = module->intra_sched;  \n  if (gen->options->target == AUTOSA_TARGET_CATAPULT_HLS_C) {\n    /* Delete the unnecessary marker. */\n    schedule = isl_schedule_map_schedule_node_bottom_up(\n      schedule, &delete_marker_catapult, NULL);\n  }\n  autosa_at_domain_data_init(&data, gen);\n  tree = autosa_generate_ast_from_schedule(schedule, data, gen,\n                                           module->double_buffer && gen->options->autosa->double_buffer_style == 0 ? \"intra_c\" : NULL);\n  isl_ast_node_free(tree);\n  if (gen->options->autosa->tuning_method == 1 && module->tuning_inter_sched) {\n    schedule = module->tuning_intra_sched;\n    autosa_at_domain_data_init(&data, gen);\n    data.tuning = 1;\n    data.tuning_num = 0;\n    tree = autosa_generate_ast_from_schedule(schedule, data, gen,\n                                             module->double_buffer && gen->options->autosa->double_buffer_style == 0 ? \"inter_c\" : NULL);\n    isl_ast_node_free(tree);\n  }\n  if (gen->options->autosa->tuning_method == 1 && module->tuning_inter_sched) {\n    schedule = module->tuning_num_intra_sched;\n    autosa_at_domain_data_init(&data, gen);\n    data.tuning = 0;\n    data.tuning_num = 1;\n    tree = autosa_generate_ast_from_schedule(schedule, data, gen,\n                                             module->double_buffer && gen->options->autosa->double_buffer_style == 0 ? \"inter_c\" : NULL);\n    isl_ast_node_free(tree);\n  }\n\n  /* Generate AST for outer loop function call. 
*/\n  schedule = module->outer_sched;  \n  if (gen->options->target == AUTOSA_TARGET_CATAPULT_HLS_C) {\n    /* Delete the unnecessary marker. */\n    schedule = isl_schedule_map_schedule_node_bottom_up(\n      schedule, &delete_marker_catapult, NULL);\n  }\n  autosa_at_domain_data_init(&data, gen);\n  tree = autosa_generate_ast_from_schedule(schedule, data, gen,\n                                           module->double_buffer && gen->options->autosa->double_buffer_style == 0 ? \"outer_c\" : NULL);\n  module->tree = tree;\n  if (gen->options->autosa->tuning_method == 1 && module->tuning_inter_sched) {\n    schedule = module->tuning_outer_sched;\n    autosa_at_domain_data_init(&data, gen);\n    data.tuning = 1;\n    data.tuning_num = 0;\n    tree = autosa_generate_ast_from_schedule(schedule, data, gen,\n                                             module->double_buffer && gen->options->autosa->double_buffer_style == 0 ? \"inter_c\" : NULL);\n    module->tuning_tree = tree;\n  }\n  if (gen->options->autosa->tuning_method == 1 && module->tuning_inter_sched) {\n    schedule = module->tuning_num_outer_sched;\n    autosa_at_domain_data_init(&data, gen);\n    data.tuning = 0;\n    data.tuning_num = 1;\n    tree = autosa_generate_ast_from_schedule(schedule, data, gen,\n                                             module->double_buffer && gen->options->autosa->double_buffer_style == 0 ? \"inter_c\" : NULL);\n    module->tuning_num_tree = tree;\n  }\n\n  if (module->boundary)\n  {\n    /* Generate boundary module AST. */\n    schedule = module->boundary_outer_sched;    \n    if (gen->options->target == AUTOSA_TARGET_CATAPULT_HLS_C) {\n      /* Delete the unnecessary marker. 
*/\n      schedule = isl_schedule_map_schedule_node_bottom_up(\n        schedule, &delete_marker_catapult, NULL);\n    }\n    autosa_at_domain_data_init(&data, gen);\n    data.boundary = 1;\n    tree = autosa_generate_ast_from_schedule(schedule, data, gen,\n                                             module->double_buffer && gen->options->autosa->double_buffer_style == 0 ? \"outer_c\" : NULL);\n    isl_ast_node_free(tree);\n  }\n\n  /* Perform loop infinitization optimization. */\n  if (gen->options->target == AUTOSA_TARGET_INTEL_OPENCL &&\n      gen->options->autosa->loop_infinitize)\n  {\n    loop_infinitization_optimize(module);\n  }\n  /* Perform loop coalesce optimization. \n   * This step should be always after the loop infinitization opt.\n   */\n  if (gen->options->target == AUTOSA_TARGET_INTEL_OPENCL)\n  {\n    loop_coalesce_optimize(module);\n  }\n  if (gen->options->target == AUTOSA_TARGET_CATAPULT_HLS_C) \n  {    \n    loop_guards_optimize(module);    \n  }\n\n  return isl_stat_ok;\n}\n\n/* Use isl to generate code for host data serialization/deserialization. 
\n */\nisl_stat sa_host_serialize_generate_code(struct autosa_gen *gen,\n                                         struct autosa_hw_module *module)\n{\n  isl_schedule *schedule;\n  struct autosa_at_domain_data data;\n  isl_ast_node *tree;\n\n  schedule = module->serialize_sched;\n  autosa_at_domain_data_init(&data, gen);\n  tree = autosa_generate_ast_from_schedule(schedule, data, gen, NULL);\n  isl_ast_node_free(tree);\n\n  return isl_stat_ok;\n}\n\n/* Use isl to generate code for the hw module from \"schedule\".\n * The device code of the hw module is marked by \"module\" mark nodes in the\n * schedule tree, each containing a pointer to an autosa_hw_module object.\n * The generated AST, stored in module->tree, only contains the AST for the host code.\n * The ASTs for the device code are embedded in autosa_hw_module objects\n * attached to the leaf nodes that call \"module\".\n */\nisl_stat sa_module_generate_code(struct autosa_gen *gen,\n                                 struct autosa_hw_module *module)\n{\n  isl_schedule *schedule;\n  struct autosa_at_domain_data data;\n  isl_ast_node *tree;\n\n  schedule = module->sched;\n  if (gen->options->target == AUTOSA_TARGET_CATAPULT_HLS_C) {\n    /* Delete the unnecessary markers. */\n    schedule = isl_schedule_map_schedule_node_bottom_up(\n      schedule, &delete_marker_catapult, NULL);\n  }\n  autosa_at_domain_data_init(&data, gen);\n  tree = autosa_generate_ast_from_schedule(schedule, data, gen, NULL);\n  module->tree = tree;\n  if (gen->options->autosa->tuning_method == 1 && module->tuning_sched) {\n    /* Generate the tuning AST. 
*/    \n    schedule = module->tuning_sched;\n    autosa_at_domain_data_init(&data, gen);\n    data.tuning = 1;\n    data.tuning_num = 0;\n    tree = autosa_generate_ast_from_schedule(schedule, data, gen, NULL);\n    module->tuning_tree = tree;\n  }\n  if (gen->options->autosa->tuning_method == 1 && module->tuning_num_sched) {\n    schedule = module->tuning_num_sched;\n    autosa_at_domain_data_init(&data, gen);\n    data.tuning = 0;\n    data.tuning_num = 1;\n    tree = autosa_generate_ast_from_schedule(schedule, data, gen, NULL);\n    module->tuning_num_tree = tree;    \n  }\n\n  if (module->boundary)\n  {\n    /* Generate boundary module AST */\n    schedule = module->boundary_sched;\n    if (gen->options->target == AUTOSA_TARGET_CATAPULT_HLS_C) {\n      /* Delete the unnecessary marker. */\n      schedule = isl_schedule_map_schedule_node_bottom_up(\n        schedule, &delete_marker_catapult, NULL);\n    } \n    autosa_at_domain_data_init(&data, gen);\n    data.boundary = 1;\n    tree = autosa_generate_ast_from_schedule(schedule, data, gen, NULL);\n    isl_ast_node_free(tree);\n  }\n\n  if (module->n_pe_dummy_modules > 0)\n  {\n    /* Generate dummy module AST */\n    for (int i = 0; i < module->n_pe_dummy_modules; i++)\n    {\n      struct autosa_pe_dummy_module *dummy_module = module->pe_dummy_modules[i];\n      schedule = dummy_module->sched;\n      autosa_at_domain_data_init(&data, gen);\n      data.pe_dummy = 1;\n      data.pe_dummy_module = dummy_module;\n      tree = autosa_generate_ast_from_schedule(schedule, data, gen, NULL);\n      isl_ast_node_free(tree);\n    }\n  }\n\n  /* Perform loop infinitization optimization. */\n  if (gen->options->target == AUTOSA_TARGET_INTEL_OPENCL &&\n      gen->options->autosa->loop_infinitize)\n  {\n    loop_infinitization_optimize(module);\n  }\n  /* Perform loop coalesce optimization. 
\n   * This step must always follow the loop infinitization optimization.\n   */\n  if (gen->options->target == AUTOSA_TARGET_INTEL_OPENCL)\n  {\n    loop_coalesce_optimize(module);\n  }\n  /* Mark the loop guards. */\n  if (gen->options->target == AUTOSA_TARGET_CATAPULT_HLS_C)\n  {\n    loop_guards_optimize(module);\n  }\n\n  return isl_stat_ok;\n}\n\nisl_stat sa_drain_merge_generate_code(struct autosa_gen *gen,\n                                      struct autosa_drain_merge_func *func)\n{\n  isl_schedule *schedule;\n  struct autosa_at_domain_data data;\n  isl_ast_node *tree;\n\n  schedule = func->sched;\n  autosa_at_domain_data_init(&data, gen);\n  tree = autosa_generate_ast_from_schedule(schedule, data, gen, NULL);\n  func->tree = tree;\n\n  return isl_stat_ok;\n}\n\n/* This function is called after the AST generator has finished traversing\n * the schedule subtree of a mark node. \"node\" points to the corresponding\n * mark AST node.\n *\n * If the mark is called \"module\", then replace \"node\" by a user node\n * that \"calls\" the module, representing the printing of fifo decls.\n * The AST subtree below the mark is stored in fifo_decl_wrapped_trees.\n */\nstatic __isl_give isl_ast_node *after_mark_fifo_decl(\n    __isl_take isl_ast_node *node,\n    __isl_keep isl_ast_build *build, void *user)\n{\n  isl_ctx *ctx;\n  isl_id *id;\n  isl_ast_expr *expr;\n  isl_ast_expr_list *list;\n  struct autosa_kernel *kernel;\n  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n  struct autosa_hw_module *module;\n  struct autosa_hw_top_module *top;\n\n  ctx = isl_ast_node_get_ctx(node);\n  id = isl_ast_node_mark_get_id(node);\n  if (!id)\n    return isl_ast_node_free(node);\n\n  if (!strcmp(isl_id_get_name(id), \"kernel\") && data->kernel)\n  {\n    isl_id_free(id);\n    if (!data->kernel->space)\n      data->kernel->space = isl_ast_build_get_schedule_space(build);\n    data->kernel = NULL;\n    return node;\n  }\n  if (strcmp(isl_id_get_name(id), 
\"module\") || !data->module)\n  {\n    isl_id_free(id);\n    return node;\n  }\n  top = data->top;\n  data->top = NULL;\n  top->n_fifo_decl_wrapped++;\n  top->fifo_decl_wrapped_trees = (isl_ast_node **)realloc(\n      top->fifo_decl_wrapped_trees,\n      top->n_fifo_decl_wrapped * sizeof(isl_ast_node *));\n  top->fifo_decl_wrapped_trees[top->n_fifo_decl_wrapped - 1] =\n      isl_ast_node_mark_get_node(node);\n  isl_ast_node_free(node);\n\n  expr = isl_ast_expr_from_id(isl_id_copy(id));\n  list = isl_ast_expr_list_alloc(ctx, 0);\n  expr = isl_ast_expr_call(expr, list);\n  node = isl_ast_node_alloc_user(expr);\n  node = isl_ast_node_set_annotation(node, id);\n\n  return node;\n}\n\n/* Generate code for declaring fifos given the input schedule \"schedule\". \n */\n__isl_give isl_ast_node *sa_fifo_decl_generate_code(\n    struct autosa_gen *gen, __isl_take isl_schedule *schedule)\n{\n  struct autosa_at_domain_data data;\n  isl_ast_build *build;\n  isl_ast_node *tree;\n  isl_id_list *iterators;\n\n  int depth;\n\n  if (schedule == NULL)\n    return NULL;\n\n  data.prog = gen->prog;\n  data.kernel = NULL;\n  data.module = NULL;\n  data.top = gen->hw_top_module;\n\n  depth = 0;\n  if (isl_schedule_foreach_schedule_node_top_down(schedule, &update_depth,\n                                                  &depth) < 0)\n    schedule = isl_schedule_free(schedule);\n  build = isl_ast_build_alloc(gen->prog->ctx);\n  iterators = ppcg_scop_generate_names(gen->prog->scop, depth, \"c\");\n  build = isl_ast_build_set_iterators(build, iterators);\n  build = isl_ast_build_set_at_each_domain(build, &at_domain_module, &data);\n  build = isl_ast_build_set_before_each_mark(build, &before_mark_module, &data);\n  build = isl_ast_build_set_after_each_mark(build, &after_mark_fifo_decl, &data);\n  if (gen->prog->scop->options->debug->dump_final_schedule)\n    isl_schedule_dump(schedule);\n  tree = isl_ast_build_node_from_schedule(build, schedule);\n  isl_ast_build_free(build);\n\n  return 
tree;\n}\n\n/* This function is called after the AST generator has finished traversing\n * the schedule subtree of a mark node. \"node\" points to the corresponding\n * mark AST node.\n *\n * If the mark is called \"module\", then replace \"node\" by a user node\n * that \"calls\" the module, representing the printing of module calls.\n * The AST subtree below the mark is stored in module_call_wrapped_trees.\n */\nstatic __isl_give isl_ast_node *after_mark_module_call(\n    __isl_take isl_ast_node *node,\n    __isl_keep isl_ast_build *build, void *user)\n{\n  isl_ctx *ctx;\n  isl_id *id;\n  isl_ast_expr *expr;\n  isl_ast_expr_list *list;\n  struct autosa_kernel *kernel;\n  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n  struct autosa_hw_module *module;\n  struct autosa_hw_top_module *top;\n\n  ctx = isl_ast_node_get_ctx(node);\n  id = isl_ast_node_mark_get_id(node);\n  if (!id)\n    return isl_ast_node_free(node);\n\n  if (!strcmp(isl_id_get_name(id), \"kernel\") && data->kernel)\n  {\n    isl_id_free(id);\n    if (!data->kernel->space)\n      data->kernel->space = isl_ast_build_get_schedule_space(build);\n    data->kernel = NULL;\n    return node;\n  }\n  if (strcmp(isl_id_get_name(id), \"module\") || !data->module)\n  {\n    isl_id_free(id);\n    return node;\n  }\n  top = data->top;\n  data->top = NULL;\n  top->n_module_call_wrapped++;\n  top->module_call_wrapped_trees = (isl_ast_node **)realloc(\n      top->module_call_wrapped_trees,\n      top->n_module_call_wrapped * sizeof(isl_ast_node *));\n  top->module_call_wrapped_trees[top->n_module_call_wrapped - 1] =\n      isl_ast_node_mark_get_node(node);\n  isl_ast_node_free(node);\n\n  expr = isl_ast_expr_from_id(isl_id_copy(id));\n  list = isl_ast_expr_list_alloc(ctx, 0);\n  expr = isl_ast_expr_call(expr, list);\n  node = isl_ast_node_alloc_user(expr);\n  node = isl_ast_node_set_annotation(node, id);\n\n  return node;\n}\n\n/* Generate code for calling modules given the input 
schedule \"schedule\". \n */\n__isl_give isl_ast_node *sa_module_call_generate_code(\n    struct autosa_gen *gen, __isl_take isl_schedule *schedule)\n{\n  struct autosa_at_domain_data data;\n  isl_ast_build *build;\n  isl_ast_node *tree;\n  isl_id_list *iterators;\n\n  int depth;\n\n  if (schedule == NULL)\n    return NULL;\n\n  data.prog = gen->prog;\n  data.kernel = NULL;\n  data.module = NULL;\n  data.pe_dummy_module = NULL;\n  data.top = gen->hw_top_module;\n\n  depth = 0;\n  if (isl_schedule_foreach_schedule_node_top_down(schedule, &update_depth,\n                                                  &depth) < 0)\n    schedule = isl_schedule_free(schedule);\n  build = isl_ast_build_alloc(gen->prog->ctx);\n  iterators = ppcg_scop_generate_names(gen->prog->scop, depth, \"c\");\n  build = isl_ast_build_set_iterators(build, iterators);\n  build = isl_ast_build_set_at_each_domain(build, &at_domain_module, &data);\n  build = isl_ast_build_set_before_each_mark(build, &before_mark_module, &data);\n  build = isl_ast_build_set_after_each_mark(build, &after_mark_module_call, &data);\n  //build = isl_ast_build_set_before_each_for(build, &before_for_module_call, &data);\n  if (gen->prog->scop->options->debug->dump_final_schedule)\n    isl_schedule_dump(schedule);\n  tree = isl_ast_build_node_from_schedule(build, schedule);\n  isl_ast_build_free(build);\n\n  return tree;\n}\n\n/* This function is called after the AST generator has finished traversing\n * the schedule subtree of a mark node. 
\"node\" points to the corresponding\n * mark AST node.\n *\n * If the mark is called \"module\", then replace \"node\" by a user node\n * that \"calls\" the module, representing the printing of the code that sets\n * the arguments of the I/O modules connected to the external memory.\n * The AST subtree below the mark is stored in ext_module_wrapped_trees.\n */\nstatic __isl_give isl_ast_node *after_mark_ext_module(\n    __isl_take isl_ast_node *node,\n    __isl_keep isl_ast_build *build, void *user)\n{\n  isl_ctx *ctx;\n  isl_id *id;\n  isl_ast_expr *expr;\n  isl_ast_expr_list *list;\n  struct autosa_kernel *kernel;\n  struct autosa_at_domain_data *data = (struct autosa_at_domain_data *)user;\n  struct autosa_hw_module *module;\n  struct autosa_hw_top_module *top;\n\n  ctx = isl_ast_node_get_ctx(node);\n  id = isl_ast_node_mark_get_id(node);\n  if (!id)\n    return isl_ast_node_free(node);\n\n  if (!strcmp(isl_id_get_name(id), \"kernel\") && data->kernel)\n  {\n    isl_id_free(id);\n    if (!data->kernel->space)\n      data->kernel->space = isl_ast_build_get_schedule_space(build);\n    data->kernel = NULL;\n    return node;\n  }\n  if (strcmp(isl_id_get_name(id), \"module\") || !data->module)\n  {\n    isl_id_free(id);\n    return node;\n  }\n  top = data->top;\n  data->top = NULL;\n  top->n_ext_module_wrapped++;\n  top->ext_module_wrapped_trees = (isl_ast_node **)realloc(\n      top->ext_module_wrapped_trees,\n      top->n_ext_module_wrapped * sizeof(isl_ast_node *));\n  top->ext_module_wrapped_trees[top->n_ext_module_wrapped - 1] =\n      isl_ast_node_mark_get_node(node);\n  isl_ast_node_free(node);\n\n  expr = isl_ast_expr_from_id(isl_id_copy(id));\n  list = isl_ast_expr_list_alloc(ctx, 0);\n  expr = isl_ast_expr_call(expr, list);\n  node = isl_ast_node_alloc_user(expr);\n  node = isl_ast_node_set_annotation(node, id);\n\n  return node;\n}\n\n/* Generate code for setting arguments of the io modules connected to the\n * external memory given the input schedule \"schedule\". 
\n */\n__isl_give isl_ast_node *sa_set_ext_module_args_generate_code(\n    struct autosa_gen *gen, __isl_take isl_schedule *schedule)\n{\n  struct autosa_at_domain_data data;\n  isl_ast_build *build;\n  isl_ast_node *tree;\n  isl_id_list *iterators;\n\n  int depth;\n\n  if (schedule == NULL)\n    return NULL;\n\n  data.prog = gen->prog;\n  data.kernel = NULL;\n  data.module = NULL;\n  data.pe_dummy_module = NULL;\n  data.top = gen->hw_top_module;\n\n  depth = 0;\n  if (isl_schedule_foreach_schedule_node_top_down(schedule, &update_depth,\n                                                  &depth) < 0)\n    schedule = isl_schedule_free(schedule);\n  build = isl_ast_build_alloc(gen->prog->ctx);\n  iterators = ppcg_scop_generate_names(gen->prog->scop, depth, \"c\");\n  build = isl_ast_build_set_iterators(build, iterators);\n  build = isl_ast_build_set_at_each_domain(build, &at_domain_module, &data);\n  build = isl_ast_build_set_before_each_mark(build, &before_mark_module, &data);\n  build = isl_ast_build_set_after_each_mark(build,\n                                            &after_mark_ext_module, &data);\n  if (gen->prog->scop->options->debug->dump_final_schedule)\n    isl_schedule_dump(schedule);\n  tree = isl_ast_build_node_from_schedule(build, schedule);\n  isl_ast_build_free(build);\n\n  return tree;\n}\n\n/* Generate AST for module calls and fifo decls in the top module.\n */\nisl_stat sa_top_module_generate_code(struct autosa_gen *gen)\n{\n  struct autosa_hw_top_module *top = gen->hw_top_module;\n  /* fifo declaration */\n  top->fifo_decl_trees = (isl_ast_node **)malloc(\n      top->n_fifo_decls * sizeof(isl_ast_node *));\n  for (int i = 0; i < top->n_fifo_decls; i++)\n  {\n    top->fifo_decl_trees[i] = sa_fifo_decl_generate_code(gen,\n                                                         top->fifo_decl_scheds[i]);\n  }\n\n  /* module call */\n  top->module_call_trees = (isl_ast_node **)malloc(\n      top->n_module_calls * sizeof(isl_ast_node *));\n  for (int 
i = 0; i < top->n_module_calls; i++)\n  {\n    top->module_call_trees[i] = sa_module_call_generate_code(gen,\n                                                             top->module_call_scheds[i]);\n  }\n\n  if (gen->options->target == AUTOSA_TARGET_INTEL_OPENCL)\n  {\n    top->ext_module_trees = (isl_ast_node **)malloc(\n        top->n_ext_module * sizeof(isl_ast_node *));\n    for (int i = 0; i < top->n_ext_module; i++)\n    {\n      top->ext_module_trees[i] = sa_set_ext_module_args_generate_code(gen,\n                                                                      top->ext_module_scheds[i]);\n    }\n\n    //    for (int i = 0; i < top->n_ext_module; i++) {\n    //      isl_ast_node_free(top->ext_module_trees[i]);\n    //      isl_ast_node_free(top->ext_module_wrapped_trees[i]);\n    //    }\n    //    free(top->ext_module_trees);\n    //    free(top->ext_module_wrapped_trees);\n    //    top->ext_module_trees = NULL;\n    //    top->ext_module_wrapped_trees = NULL;\n    //    top->n_ext_module = 0;\n  }\n\n  return isl_stat_ok;\n}\n\n/* Representation of a statement inside a generated AST.\n *\n * \"stmt\" refers to the original statement.\n * \"ref2expr\" maps the reference identifier of each access in\n * the statement to an AST expression that should be printed\n * at the place of the access.\n */\nstruct ppcg_stmt {\n\tstruct pet_stmt *stmt;\n\n\tisl_id_to_ast_expr *ref2expr;\n};\n\nstatic __isl_give isl_printer *print_user(__isl_take isl_printer *p,\n  __isl_take isl_ast_print_options *print_options,\n  __isl_keep isl_ast_node *node, void *user)\n{\n\tstruct ppcg_stmt *stmt;\n\tisl_id *id;\n  const char *stmt_name;\n\n\tid = isl_ast_node_get_annotation(node);\n\tstmt = (struct ppcg_stmt *)isl_id_get_user(id);\n  stmt_name = isl_id_get_name(id);\n\tisl_id_free(id);\n\n  if (stmt)\n\t  p = pet_stmt_print_body(stmt->stmt, p, stmt->ref2expr);\n  else\n    p = isl_printer_print_str(p, stmt_name);\n\n\tisl_ast_print_options_free(print_options);\n  return 
p;\n}\n\n///* Set *depth (initialized to 0 by the caller) to the maximum\n// * of the schedule depths of the leaf nodes for which this function is called.\n// */\n//static isl_bool update_depth(__isl_keep isl_schedule_node *node, void *user)\n//{\n//\tint *depth = (int *)user;\n//\tint node_depth;\n//\n//\tif (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n//\t\treturn isl_bool_true;\n//\tnode_depth = isl_schedule_node_get_schedule_depth(node);\n//\tif (node_depth > *depth)\n//\t\t*depth = node_depth;\n//\n//\treturn isl_bool_false;\n//}\n\n/* Find the element in scop->stmts that has the given \"id\".\n */\nstatic struct pet_stmt *pet_find_stmt(struct ppcg_scop *scop, __isl_keep isl_id *id)\n{\n\tint i;\n\n\tfor (i = 0; i < scop->pet->n_stmt; ++i) {\n\t\tstruct pet_stmt *stmt = scop->pet->stmts[i];\n\t\tisl_id *id_i;\n\n\t\tid_i = isl_set_get_tuple_id(stmt->domain);\n\t\tisl_id_free(id_i);\n\n\t\tif (id_i == id)\n\t\t\treturn stmt;\n\t}\n\n\tisl_die(isl_id_get_ctx(id), isl_error_internal,\n\t\t\"statement not found\", return NULL);\n}\n\n/* Index transformation callback for pet_stmt_build_ast_exprs.\n *\n * \"index\" expresses the array indices in terms of statement iterators\n * \"iterator_map\" expresses the statement iterators in terms of\n * AST loop iterators.\n *\n * The result expresses the array indices in terms of\n * AST loop iterators.\n */\nstatic __isl_give isl_multi_pw_aff *pullback_index(\n\t__isl_take isl_multi_pw_aff *index, __isl_keep isl_id *id, void *user)\n{\n\tisl_pw_multi_aff *iterator_map = (isl_pw_multi_aff *)user;\n\n\titerator_map = isl_pw_multi_aff_copy(iterator_map);\n\treturn isl_multi_pw_aff_pullback_pw_multi_aff(index, iterator_map);\n}\n\nstatic void ppcg_stmt_free(void *user)\n{\n\tstruct ppcg_stmt *stmt = (struct ppcg_stmt *)user;\n\n\tif (!stmt)\n\t\treturn;\n\n\tisl_id_to_ast_expr_free(stmt->ref2expr);\n\n\tfree(stmt);\n}\n\n/* Transform the accesses in the statement associated to the domain\n * called by \"node\" 
to refer to the AST loop iterators, construct\n * corresponding AST expressions using \"build\",\n * collect them in a ppcg_stmt and annotate the node with the ppcg_stmt.\n */\nstatic __isl_give isl_ast_node *at_each_domain(__isl_take isl_ast_node *node,\n\t__isl_keep isl_ast_build *build, void *user)\n{\n\tstruct ppcg_scop *scop = (struct ppcg_scop *)user;\n\tisl_ast_expr *expr, *arg;\n\tisl_ctx *ctx;\n\tisl_id *id;\n\tisl_map *map;\n\tisl_pw_multi_aff *iterator_map;\n\tstruct ppcg_stmt *stmt;\n\n\tctx = isl_ast_node_get_ctx(node);\n\tstmt = isl_calloc_type(ctx, struct ppcg_stmt);\n\tif (!stmt)\n\t\tgoto error;\n\n\texpr = isl_ast_node_user_get_expr(node);\n\targ = isl_ast_expr_get_op_arg(expr, 0);\n\tisl_ast_expr_free(expr);\n\tid = isl_ast_expr_get_id(arg);\n\tisl_ast_expr_free(arg);\n\tstmt->stmt = pet_find_stmt(scop, id);\n\tisl_id_free(id);\n\tif (!stmt->stmt) {\n\t\t/* Leave statements that are not found in the scop unannotated. */\n\t\tppcg_stmt_free(stmt);\n\t\treturn node;\n\t}\n\n\tmap = isl_map_from_union_map(isl_ast_build_get_schedule(build));\n\tmap = isl_map_reverse(map);\n\titerator_map = isl_pw_multi_aff_from_map(map);\n\tstmt->ref2expr = pet_stmt_build_ast_exprs(stmt->stmt, build,\n\t\t\t\t    &pullback_index, iterator_map, NULL, NULL);\n\tisl_pw_multi_aff_free(iterator_map);\n\n\tid = isl_id_alloc(isl_ast_node_get_ctx(node), NULL, stmt);\n\tid = isl_id_set_free_user(id, &ppcg_stmt_free);\n\treturn isl_ast_node_set_annotation(node, id);\nerror:\n\tppcg_stmt_free(stmt);\n\treturn isl_ast_node_free(node);\n}\n\n/* For internal debugging.\n * Print out the code from the given schedule.\n */\nvoid print_code(struct autosa_gen *gen, __isl_take isl_schedule *schedule, const char *output_f)\n{\n  isl_ast_node *tree;\n  isl_printer *p;\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = gen->ctx;\n  FILE *f;\n  int depth;\n  isl_ast_build *build;\n  isl_id_list *iterators;\n\n  depth = 0;\n  if (isl_schedule_foreach_schedule_node_top_down(schedule, &update_depth, &depth) < 0)\n  {\n    isl_schedule_free(schedule);\n    return;\n  }\n  build = 
isl_ast_build_alloc(ctx);\n  iterators = ppcg_scop_generate_names(gen->prog->scop, depth, \"c\");\n  build = isl_ast_build_set_iterators(build, iterators);\n  build = isl_ast_build_set_at_each_domain(build, &at_each_domain, gen->prog->scop);\n  tree = isl_ast_build_node_from_schedule(build, schedule);\n  isl_ast_build_free(build);\n\n  f = fopen(output_f, \"w\");\n  p = isl_printer_to_file(ctx, f);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_user, NULL);\n  p = isl_ast_node_print(tree, p, print_options);\n\n  isl_ast_node_free(tree);\n  fclose(f);\n  isl_printer_free(p);\n}\n\n/* Dump the intermediate code. */\nvoid dump_intermediate_code(\n  struct autosa_gen *gen, __isl_take isl_schedule *schedule, const char *stage)\n{\n  FILE *tmp_f;\n  isl_printer *p;\n  isl_ast_node *tree = sa_generate_code(gen, schedule);\n\n  p = isl_printer_to_str(gen->ctx);\n  p = isl_printer_print_str(p, gen->options->autosa->output_dir);\n  p = isl_printer_print_str(p, \"/src/tmp.\");\n  p = isl_printer_print_str(p, stage);\n  p = isl_printer_print_str(p, \".cpp\");\n  char *f_path = isl_printer_get_str(p);\n  isl_printer_free(p);\n  tmp_f = fopen(f_path, \"w\");\n  free(f_path);\n  p = isl_printer_to_file(gen->ctx, tmp_f);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  isl_ast_print_options *print_options;\n  print_options = isl_ast_print_options_alloc(gen->ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_cpu_user, NULL);\n  p = isl_ast_node_print(tree, p, print_options);\n  p = isl_printer_free(p);\n  fclose(tmp_f);\n  isl_ast_node_free(tree);\n}"
  },
  {
    "path": "src/autosa_codegen.h",
    "content": "#ifndef _AUTOSA_CODEGEN_H\n#define _AUTOSA_CODEGEN_H\n\n#include \"print.h\"\n#include \"util.h\"\n\n#include \"autosa_common.h\"\n\nvoid generate_hw_modules(__isl_take isl_schedule *schedule,\n                         struct autosa_gen *gen, struct autosa_kernel *kernel);\n\n__isl_give isl_schedule_node *sa_add_to_from_device(\n    __isl_take isl_schedule_node *node, __isl_take isl_union_set *domain,\n    __isl_take isl_union_map *prefix, struct autosa_prog *prog);\n__isl_give isl_schedule_node *sa_add_init_clear_device(\n    __isl_take isl_schedule_node *node, struct autosa_kernel *kernel);\n__isl_give isl_schedule_node *sa_add_drain_merge(\n    __isl_take isl_schedule_node *node, struct autosa_gen *gen);\n\n__isl_give isl_ast_node *sa_generate_code(struct autosa_gen *gen,\n                                          __isl_take isl_schedule *schedule);\nisl_stat sa_filter_buffer_io_module_generate_code(struct autosa_gen *gen,\n                                                  struct autosa_hw_module *module);\nisl_stat sa_module_generate_code(struct autosa_gen *gen,\n                                 struct autosa_hw_module *module);\nisl_stat sa_top_module_generate_code(struct autosa_gen *gen);\nisl_stat sa_drain_merge_generate_code(struct autosa_gen *gen,\n                                      struct autosa_drain_merge_func *func);\nisl_stat sa_host_serialize_generate_code(struct autosa_gen *gen,\n                                         struct autosa_hw_module *module);                                      \n\nint autosa_array_requires_device_allocation(struct autosa_array_info *array);\n\n__isl_give isl_schedule_node *insert_io_group_domain(\n  __isl_take isl_schedule_node *node, \n  struct autosa_array_ref_group *group,\n  struct autosa_kernel *kernel,\n  struct autosa_gen *gen,\n  int read);\n\nvoid print_code(struct autosa_gen *gen, __isl_take isl_schedule *schedule, const char *output_f);\nvoid dump_intermediate_code(\n  struct autosa_gen 
*gen, __isl_take isl_schedule *schedule, const char *stage);\n\n#endif"
  },
  {
    "path": "src/autosa_comm.cpp",
    "content": "/* Define functions for communication management. */\n\n#include <isl/ilp.h>\n\n#include \"autosa_schedule_tree.h\"\n#include \"autosa_utils.h\"\n#include \"autosa_print.h\"\n#include \"autosa_codegen.h\"\n#include \"autosa_comm.h\"\n\n/* Internal data structure for autosa_group_references.\n */\nstruct autosa_group_data\n{\n  struct autosa_gen *gen;\n  struct ppcg_scop *scop;\n  /* The schedule depth where the kernel launch will be \n   * introduced.\n   */\n  int kernel_depth;\n  /* The schedule depth at which the copying in/from local_memory\n   * is computed. The copy operation may then later\n   * be hoisted to a higher level.\n   */\n  int local_depth;\n  /* The schedule depth of \"pe\" mark. */\n  int pe_depth;\n  isl_schedule *schedule;\n\n  /* All the schedules are formulated in terms of the original statement\n   * instances, i.e., those that appear in the domains of the access \n   * relations. \n   */\n  /* Contains the kernel_depth dimensions of the host schedule. */\n  isl_union_map *host_sched;\n  /* Contains the first local_depth dimensions of the kernel schedule. */\n  isl_union_map *local_sched;\n  /* Contains the first local_depth dimensions of the kernel schedule. */\n  isl_union_map *copy_sched;\n  /* Contains the first pe_depth dimensions of the kernel schedule. */\n  isl_union_map *pe_sched;\n  /* A union map representation of the entire kernel schedule. */\n  isl_union_map *full_sched;\n};\n\n/* Return the prefix schedule at \"node\" as a relation\n * between domain elements and schedule dimensions after detecting\n * equalities in this relation.\n */\nstatic __isl_give isl_union_map *prefix_with_equalities(\n    __isl_keep isl_schedule_node *node)\n{\n  isl_union_map *schedule;\n\n  schedule = isl_schedule_node_get_prefix_schedule_relation(node);\n  /* Simplify. 
*/\n  schedule = isl_union_map_detect_equalities(schedule);\n\n  return schedule;\n}\n\n/* Expand the domain of the schedule \"s\" by plugging in\n * the contraction \"contraction\" and return the result.\n */\nstatic isl_union_map *expand(__isl_take isl_union_map *s,\n                             __isl_keep isl_union_pw_multi_aff *contraction)\n{\n  contraction = isl_union_pw_multi_aff_copy(contraction);\n  s = isl_union_map_preimage_domain_union_pw_multi_aff(s, contraction);\n  return s;\n}\n\n/* Fill up the groups of array with singleton groups, i.e., one group\n * per reference, initializing all the necessary fields.\n * In particular the access field is initialized to the scheduled\n * access relation of the array reference.\n *\n * Return the number of elements initialized, i.e., the number of\n * active references in the current kernel.\n */\nstatic int populate_array_references_pe(struct autosa_local_array_info *local,\n                                        struct autosa_array_ref_group **groups, struct autosa_group_data *data)\n{\n  int i;\n  int j;\n  int n;\n  isl_ctx *ctx = isl_union_map_get_ctx(data->pe_sched);\n\n  n = 0;\n  for (i = 0; i < local->array->n_ref; ++i)\n  {\n    isl_union_map *umap;\n    isl_map *map;\n    struct autosa_array_ref_group *group;\n    struct autosa_stmt_access *access = local->array->refs[i];\n\n    map = isl_map_copy(access->access);\n    umap = isl_union_map_from_map(map);\n    umap = isl_union_map_apply_domain(umap,\n                                      isl_union_map_copy(data->pe_sched));\n\n    if (isl_union_map_is_empty(umap))\n    {\n      isl_union_map_free(umap);\n      continue;\n    }\n\n    map = isl_map_from_union_map(umap);\n    map = isl_map_detect_equalities(map);\n    \n    group = new autosa_array_ref_group;\n    group = autosa_array_ref_group_init(group);\n    if (!group)\n    {\n      isl_map_free(map);\n      return -1;\n    }\n    group->local_array = local;\n    group->array = local->array;\n    
group->access = map;\n    group->write = access->write;\n    group->exact_write = access->exact_write;\n    group->slice = access->n_index < local->array->n_index;\n    group->refs = &local->array->refs[i];\n    group->n_ref = 1;\n    group->io_type = AUTOSA_UNKNOWN_IO;\n    group->dir = NULL;\n    group->old_dir = NULL;\n    group->group_type = AUTOSA_PE_GROUP;\n    group->local_tile = NULL;\n    group->io_trans = NULL;\n    group->io_pe_expr = NULL;\n    group->n_io_buffer = 0;\n    group->io_buffers = NULL;\n    group->copy_schedule = NULL;\n    group->pe_tile = NULL;\n    group->tuning_refs.push_back(std::shared_ptr<TPArrayRef>(local->array->tuning_refs[i]));\n    group->tuning_pe_tile = NULL;\n\n    groups[n++] = group;\n  }\n\n  return n;\n}\n\n/* Combine the given two groups into a single group, containing\n * the references of both groups.\n */\nstatic struct autosa_array_ref_group *join_groups(\n    struct autosa_array_ref_group *group1,\n    struct autosa_array_ref_group *group2)\n{\n  int i, j;\n  isl_ctx *ctx;\n  struct autosa_array_ref_group *group;\n\n  if (!group1 || !group2)\n    return NULL;\n\n  ctx = isl_map_get_ctx(group1->access);\n  //group = isl_calloc_type(ctx, struct autosa_array_ref_group);\n  group = new autosa_array_ref_group;\n  group = autosa_array_ref_group_init(group);\n  if (!group)\n    return NULL;\n  group->local_array = group1->local_array;\n  group->array = group1->array;\n  group->access = isl_map_union(isl_map_copy(group1->access),\n                                isl_map_copy(group2->access));\n  group->write = group1->write || group2->write;\n  group->exact_write = group1->exact_write && group2->exact_write;\n  group->slice = group1->slice || group2->slice;\n  //group->n_ref = group1->n_ref + group2->n_ref;\n  //group->refs = isl_alloc_array(ctx, struct autosa_stmt_access *,\n  //                              group->n_ref);\n  //if (!group->refs)\n  //  return autosa_array_ref_group_free(group);  \n  group->n_ref = 
group1->n_ref;\n  group->refs = isl_alloc_array(ctx, struct autosa_stmt_access *,\n                                group->n_ref);\n  if (!group->refs)\n    return autosa_array_ref_group_free(group);\n  for (i = 0; i < group1->n_ref; ++i) {\n    group->refs[i] = group1->refs[i];\n    group->tuning_refs.push_back(std::shared_ptr<TPArrayRef>(group1->tuning_refs[i]));\n  }\n  /* Append the references of group2 that do not already appear in group1. */\n  for (i = 0; i < group2->n_ref; ++i) {\n    struct autosa_stmt_access *ref = group2->refs[i];\n    bool found = false;\n    for (j = 0; j < group1->n_ref; j++) {\n      if (isl_map_is_equal(ref->tagged_access, group1->refs[j]->tagged_access)) {\n        found = true;\n        break;\n      }\n    }\n    if (!found) {\n      group->n_ref++;\n      group->refs = (struct autosa_stmt_access **)realloc(group->refs,\n                        group->n_ref * sizeof(struct autosa_stmt_access *));\n      group->refs[group->n_ref - 1] = group2->refs[i];\n      group->tuning_refs.push_back(std::shared_ptr<TPArrayRef>(group2->tuning_refs[i]));\n    }\n  }\n\n  group->io_type = group1->io_type;\n  group->dir = isl_vec_copy(group1->dir);\n  group->group_type = group1->group_type;\n  group->pe_io_dir = group1->pe_io_dir;\n  group->array_io_dir = group1->array_io_dir;\n  group->io_trans = group1->io_trans;\n  group->io_pe_expr = group1->io_pe_expr;\n  group->io_L1_pe_expr = group1->io_L1_pe_expr;\n  group->n_io_buffer = group1->n_io_buffer;\n  group->io_buffers = group1->io_buffers;\n  group->n_mem_ports = group1->n_mem_ports;\n  group->local_tile = NULL;\n  group->pe_tile = NULL;\n  /* Merge the tuning refs */\n  for (auto ref : group1->tuning_refs) {\n    group->tuning_refs.push_back(std::shared_ptr<TPArrayRef>(ref));\n  }\n\n  return group;\n}\n\n/* Combine the given two groups into a single group and free\n * the original two groups.\n */\nstatic struct autosa_array_ref_group *join_groups_and_free(\n    struct 
autosa_array_ref_group *group1,\n    struct autosa_array_ref_group *group2)\n{\n  struct autosa_array_ref_group *group;\n\n  group = join_groups(group1, group2);\n  autosa_array_ref_group_free(group1);\n  autosa_array_ref_group_free(group2);\n  return group;\n}\n\nstatic void set_array_groups_default(struct autosa_local_array_info *array,\n                                     int n, struct autosa_array_ref_group **groups)\n{\n  int i;\n\n  array->n_group = n;\n  array->groups = groups;\n\n  for (i = 0; i < n; ++i)\n    groups[i]->nr = i;\n}\n\n/* Default grouping. Simply group all array references together\n * if any of them is associated with RAW/RAR carried by space loops.\n */\nstatic int group_array_references_default(struct autosa_kernel *kernel,\n                                          struct autosa_local_array_info *local, struct autosa_group_data *data)\n{\n  int i, j;\n  int n;\n  isl_ctx *ctx = isl_union_map_get_ctx(data->pe_sched);\n  struct autosa_array_ref_group **groups;\n  int merge_all = 0;\n  isl_schedule_node *node;\n\n  groups = isl_calloc_array(ctx, struct autosa_array_ref_group *,\n                            local->array->n_ref);\n  if (!groups)\n    return -1;\n\n  n = populate_array_references_pe(local, groups, data);\n\n  /* Check whether any of the array references is associated with a RAW or\n   * RAR dependence carried at the space loops. If so, merge all the groups.\n   */\n  for (int i = 0; i < n; ++i)\n  {\n    struct autosa_array_ref_group *group_i = groups[i];\n    for (int j = 0; j < group_i->n_ref; ++j)\n    {\n      struct autosa_stmt_access *ref_i = group_i->refs[j];\n      for (int k = 0; k < ref_i->n_io_info; ++k)\n      {\n        if (ref_i->io_info[k]->dep->type == AUTOSA_DEP_RAW)\n        {\n          merge_all = 1;\n          break;\n        }\n      }\n    }\n  }\n\n  if (merge_all)\n  {\n    /* Join all references together. 
*/\n    for (int i = 1; i < n; ++i)\n    {      \n      groups[0] = join_groups_and_free(groups[0], groups[i]);\n    }\n    n = 1;\n  }\n\n  set_array_groups_default(local, n, groups);\n\n  return 0;\n}\n\n/* Return the union of all read (read = 1) and/or write (write = 1)\n * access relations in the group.\n */\n__isl_give isl_union_map *autosa_array_ref_group_access_relation(\n    struct autosa_array_ref_group *group, int read, int write)\n{\n  int i;\n  isl_union_map *access;\n\n  access = isl_union_map_empty(isl_map_get_space(group->access));\n  for (i = 0; i < group->n_ref; ++i)\n  {\n    isl_map *map_i;\n\n    if (!((read && group->refs[i]->read) ||\n          (write && group->refs[i]->write)))\n      continue;\n    map_i = isl_map_copy(group->refs[i]->access);\n    access = isl_union_map_union(access,\n                                 isl_union_map_from_map(map_i));\n  }\n\n  return access;\n}\n\n/* Map the domain of \"access\" to the outer data->pe_depth\n * schedule dimensions.   \n */\nstatic __isl_give isl_map *local_access_pe(struct autosa_array_ref_group *group,\n                                           __isl_keep isl_union_map *access, struct autosa_group_data *data)\n{\n  isl_union_map *local;\n\n  local = isl_union_map_copy(access);\n  /* Group at the PE level. 
*/\n  local = isl_union_map_apply_domain(local,\n                                     isl_union_map_copy(data->pe_sched));\n  return isl_map_from_union_map(local);\n}\n\n/* Given an array access \"access\", check if for any index i there is\n * a shift a(p) and a stride g such that\n *\n *\ta(p) + i = 0 mod g\n *\n * If so, record the information in tile->bound[i]->stride and\n * tile->bound[i]->shift.\n * Otherwise, set tile->bound[i]->stride to 1 (and tile->bound[i]->shift to 0).\n * Return isl_bool_true if any non-trivial stride was found.\n *\n * Note that the stride info returned by isl_map_get_range_stride_info\n * is of the form\n *\n *\ti = o(p) + g n\n *\n * a(p) can therefore be taken to be equal to -o(p).\n */\nstatic isl_bool detect_strides(struct autosa_array_tile *tile,\n                               __isl_keep isl_map *access)\n{\n  int i;\n  isl_bool has_strides = isl_bool_false;\n\n  for (i = 0; i < tile->n; ++i)\n  {\n    struct autosa_array_bound *bound = &tile->bound[i];\n    isl_stride_info *si;\n\n    si = isl_map_get_range_stride_info(access, i);\n    bound->stride = isl_stride_info_get_stride(si);\n    bound->shift = isl_aff_neg(isl_stride_info_get_offset(si));\n    isl_stride_info_free(si);\n\n    if (!has_strides)\n      has_strides = isl_val_gt_si(bound->stride, 1);\n    if (has_strides < 0)\n      return isl_bool_error;\n  }\n\n  return has_strides;\n}\n\n/* Given an array access \"access\", remove the strides based\n * on the information in tile->bound[i]->stride and tile->bound[i]->shift.\n *\n * In particular let the access be A[a] and\n * let the shifts s_i(p) and the strides g_i be such that\n *\n *  S(p) + a = 0 mod G\n *\n * Replace the access by\n *\n *  A[(a + S(p))/G]\n *\n * First collect the shifts s_i into an isl_multi_aff and\n * the strides into the scaling function A[i] -> A[G i].\n * Then add the shifts to the original access and\n * take the preimage over the scaling.\n */\nstatic __isl_give isl_map 
*remove_strides(__isl_take isl_map *access,\n                                          struct autosa_array_tile *tile)\n{\n  int i;\n  isl_space *space;\n  isl_multi_aff *shift, *scale;\n  isl_multi_val *stride;\n\n  space = isl_map_get_space(access);\n  shift = isl_multi_aff_zero(isl_space_copy(space));\n  space = isl_space_range(space);\n  stride = isl_multi_val_zero(isl_space_copy(space));\n  scale = isl_multi_aff_identity(isl_space_map_from_set(space));\n  for (i = 0; i < tile->n; ++i)\n  {\n    struct autosa_array_bound *bound = &tile->bound[i];\n    isl_aff *shift_i;\n    isl_val *stride_i;\n\n    shift_i = isl_aff_copy(bound->shift);\n    stride_i = isl_val_copy(bound->stride);\n    shift = isl_multi_aff_set_aff(shift, i, shift_i);\n    stride = isl_multi_val_set_val(stride, i, stride_i);\n  }\n  scale = isl_multi_aff_scale_multi_val(scale, stride);\n\n  access = isl_map_sum(access, isl_map_from_multi_aff(shift));\n  access = isl_map_preimage_range_multi_aff(access, scale);\n\n  return access;\n}\n\n/* Check if we can find a memory tile for the given array\n * based on the given accesses, and if so, put the results in \"tile\".\n *\n * We project the accesses on each index in turn and look for a parametric\n * offset such that the size is constant, after removing\n * any stride that may appear in the accesses.\n *\n * tile->depth is initialized to the input dimension of the computed bounds.\n */\nisl_bool can_tile(__isl_keep isl_map *access,\n                  struct autosa_array_tile *tile)\n{\n  int i;\n  isl_bool has_strides, valid;\n  isl_fixed_box *box;\n  isl_multi_aff *offset;\n  isl_multi_val *size;\n\n  if (!tile)\n    return isl_bool_error;\n\n  isl_map_free(isl_map_detect_equalities(isl_map_copy(access)));\n\n  has_strides = detect_strides(tile, access);\n  if (has_strides < 0)\n    return isl_bool_error;\n\n  tile->depth = isl_map_dim(access, isl_dim_in);\n\n  access = isl_map_copy(access);\n  if (has_strides)\n    access = remove_strides(access, 
tile);\n\n  box = isl_map_get_range_simple_fixed_box_hull(access);\n  isl_map_free(access);\n\n  valid = isl_fixed_box_is_valid(box);\n  if (valid >= 0 && valid)\n  {\n    offset = isl_fixed_box_get_offset(box);\n    size = isl_fixed_box_get_size(box);\n    for (i = 0; i < tile->n; ++i)\n    {\n      tile->bound[i].size = isl_multi_val_get_val(size, i);\n      tile->bound[i].lb = isl_multi_aff_get_aff(offset, i);\n    }\n    isl_multi_aff_free(offset);\n    isl_multi_val_free(size);\n  }\n  isl_fixed_box_free(box);\n\n  return valid;\n}\n\nstruct check_contraction_data {\n  bool legal;\n  struct autosa_array_ref_group *group;\n  struct autosa_kernel *kernel;\n  isl_union_map *prefix;\n  isl_union_pw_multi_aff *prefix_upma;\n  int depth;\n};\n\nstruct check_stmt_contain_acc_data {\n  struct autosa_kernel *kernel;\n  struct autosa_array_ref_group *group;\n};\n\n/* Test whether the statement with the domain \"set\" is free of the array accesses\n * in the current array group: return isl_bool_false if the statement contains\n * any of them, and isl_bool_true otherwise. \n */\nstatic isl_bool check_stmt_contain_acc(__isl_keep isl_set *set, void *user)\n{\n  isl_space *space;\n  isl_id *id;\n  struct autosa_stmt *stmt;\n  struct check_stmt_contain_acc_data *data = (struct check_stmt_contain_acc_data *)user;\n  struct autosa_stmt_access *accesses, *access;\n\n  space = isl_set_get_space(set);\n  id = isl_space_get_tuple_id(space, isl_dim_set);\n  isl_space_free(space);\n  stmt = find_stmt(data->kernel->prog, id);\n  isl_id_free(id);\n  accesses = stmt->accesses;\n\n  for (access = accesses; access; access = access->next)\n  {\n    //if (access == data->group->refs[0])\n    //{\n    //  return isl_bool_false;\n    //}\n    for (int i = 0; i < data->group->n_ref; i++) {\n      if (access == data->group->refs[i])\n        return isl_bool_false;\n    }\n  }\n\n  return isl_bool_true;\n}\n\n/* Check if the pe_group is mapped to a single register.\n * Specifically, check for each array access in the current pe_group, \n * if all the loops above the array access and below 
the PE mark are\n * parallel loops.\n */\nstatic __isl_give isl_schedule_node *check_contraction(\n  __isl_take isl_schedule_node *node, void *user)\n{\n  struct check_contraction_data *data = (struct check_contraction_data *)user;\n  isl_union_set *domain;\n  isl_bool not_contain_acc;\n  struct check_stmt_contain_acc_data check_data;\n  isl_schedule_node *tmp_node;\n  isl_ctx *ctx = isl_schedule_node_get_ctx(node);\n\n  //DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n    return node;\n\n  if (!data->legal)\n    return node;\n\n  /* Test if the statement contains the access from the group. */\n  domain = isl_schedule_node_get_domain(node);\n  check_data.kernel = data->kernel;\n  check_data.group = data->group;\n  not_contain_acc = isl_union_set_every_set(domain, &check_stmt_contain_acc, &check_data);\n  isl_union_set_free(domain);  \n\n  /* Then check if all the loops above the statement until the PE mark are parallel loops. */\n  tmp_node = isl_schedule_node_copy(node);\n  if (!not_contain_acc) {    \n    isl_schedule_node *tmp_node2;\n\n    /* If the node is under SIMD, we will move up to the \"SIMD\" mark, and \n     * compute the tiling at this level.\n     */\n    int is_simd;\n    is_simd = is_node_under_simd(tmp_node);\n    if (is_simd) {\n      tmp_node = autosa_tree_move_up_to_mark(tmp_node, \"simd\");      \n    }\n\n    tmp_node2 = isl_schedule_node_copy(tmp_node);\n\n    /* Check if all band nodes above are parallel loops. 
*/    \n    while (!(autosa_tree_node_is_mark(tmp_node, \"pe\"))) {    \n      if (isl_schedule_node_get_type(tmp_node) == isl_schedule_node_band) {\n        int dim = isl_schedule_node_band_n_member(tmp_node);\n        for (int i = 0; i < dim; i++) {\n          if (!isl_schedule_node_band_member_get_coincident(tmp_node, i)) {\n            data->legal = false;\n            break;\n          }\n        }\n      }\n      tmp_node = isl_schedule_node_parent(tmp_node);\n    }\n\n    if (data->prefix == NULL) {\n      data->prefix = isl_schedule_node_get_prefix_schedule_union_map(tmp_node2);\n      data->prefix_upma = isl_schedule_node_get_prefix_schedule_union_pw_multi_aff(tmp_node2);\n      data->depth = isl_schedule_node_get_schedule_depth(tmp_node2);\n    } else {\n      /* Find the depth that shares the same prefix schedule with the current one. */\n      /* Lift the node until it reaches a scheduling depth no greater than data->depth. */\n      while (isl_schedule_node_get_schedule_depth(tmp_node2) > data->depth)\n        tmp_node2 = isl_schedule_node_parent(tmp_node2);\n      if (isl_schedule_node_get_schedule_depth(tmp_node2) < data->depth) {\n        /* Lower the node until its scheduling depth equals data->depth */                  \n        tmp_node2 = isl_schedule_node_band_split(tmp_node2, \n                      data->depth - isl_schedule_node_get_schedule_depth(tmp_node2));\n        tmp_node2 = isl_schedule_node_child(tmp_node2, 0);\n      }\n\n      /* Lift the node until its prefix schedule matches data->prefix. 
*/\n      isl_union_map *tmp_prefix = isl_schedule_node_get_prefix_schedule_union_map(tmp_node2);\n      int tmp_depth = isl_schedule_node_get_schedule_depth(tmp_node2);      \n      isl_set *tmp_prefix_range = isl_set_from_union_set(isl_union_map_range(tmp_prefix));\n      isl_set *prefix_range = isl_set_from_union_set(isl_union_map_range(isl_union_map_copy(data->prefix)));\n      \n      //DBGUSET(stdout, prefix_range, ctx);\n\n      int common_depth = 0;\n      for (common_depth = 0; common_depth < tmp_depth; common_depth++) {\n        isl_set *tmp_range = isl_set_project_out(isl_set_copy(tmp_prefix_range), isl_dim_set, common_depth, tmp_depth - common_depth);\n        isl_set *range = isl_set_project_out(isl_set_copy(prefix_range), isl_dim_set, common_depth, tmp_depth - common_depth);\n        isl_set *diff = isl_set_subtract(tmp_range, range);\n        if (!isl_set_is_empty(diff)) {\n          common_depth--;\n          isl_set_free(diff);\n          break;\n        }\n        isl_set_free(diff);\n      }\n      isl_set_free(tmp_prefix_range);\n      isl_set_free(prefix_range);\n\n      /* Lift the node until it reaches common_depth */\n      while (isl_schedule_node_get_schedule_depth(tmp_node2) > common_depth) {\n        tmp_node2 = isl_schedule_node_parent(tmp_node2);\n      }\n      if (isl_schedule_node_get_schedule_depth(tmp_node2) < common_depth) {\n        tmp_node2 = isl_schedule_node_band_split(tmp_node2, \n                      common_depth - isl_schedule_node_get_schedule_depth(tmp_node2));\n        tmp_node2 = isl_schedule_node_child(tmp_node2, 0);\n      }\n \n      /* Update the scheduling information */      \n      isl_union_map_free(data->prefix);\n      isl_union_pw_multi_aff_free(data->prefix_upma);\n      data->prefix = isl_schedule_node_get_prefix_schedule_union_map(tmp_node2);\n      data->prefix_upma = isl_schedule_node_get_prefix_schedule_union_pw_multi_aff(tmp_node2);\n      data->depth = 
isl_schedule_node_get_schedule_depth(tmp_node2);\n    }    \n    isl_schedule_node_free(tmp_node2);\n  }\n  isl_schedule_node_free(tmp_node);\n\n  return node;\n}\n\n/* Compute the tiling of the group at the PE level.\n * If array_contraction is enabled, check if all loops under the PE mark\n * and before the SIMD marks are parallel loops. \n * If so, contract the local tile to a single register.\n */\nstatic isl_stat compute_group_bounds_core_pe(struct autosa_kernel *kernel,\n                                             struct autosa_array_ref_group *group, struct autosa_group_data *data)\n{\n  isl_ctx *ctx = isl_space_get_ctx(group->array->space);\n  int use_local = kernel->options->autosa->use_local_memory;\n  isl_stat r = isl_stat_ok;\n  isl_union_map *access;\n  isl_map *acc;\n  isl_bool ok;\n\n  if (!use_local)\n    return isl_stat_ok;\n  if (autosa_array_is_read_only_scalar(group->array))\n    return isl_stat_ok;\n  if (!group->exact_write)\n    return isl_stat_ok;\n  if (group->slice)\n    return isl_stat_ok;\n\n  /* Collect all accesses in the group. */\n  access = autosa_array_ref_group_access_relation(group, 1, 1);\n  /* Create local tile */\n  if (use_local)\n  {\n    struct check_contraction_data contract_data;\n    isl_schedule_node *node;        \n    contract_data.legal = false;\n    contract_data.prefix = NULL;\n    contract_data.prefix_upma = NULL;\n\n    /* Create a tile. */\n    group->local_tile = autosa_array_tile_create(ctx,\n                                                 group->array->n_index);\n\n    /* Check if array contraction is possible. 
*/\n    if ((kernel->options->autosa->local_reduce && kernel->options->autosa->array_contraction) ||\n       (kernel->options->autosa->tuning_method == 1 && kernel->options->autosa->array_contraction)) {      \n      contract_data.group = group;\n      contract_data.kernel = kernel;\n      contract_data.legal = true;\n      contract_data.prefix = NULL;\n      contract_data.prefix_upma = NULL;\n      contract_data.depth = -1;      \n      node = isl_schedule_get_root(kernel->schedule);\n      node = autosa_tree_move_down_to_pe(node, kernel->core);      \n      node = isl_schedule_node_map_descendant_bottom_up(node, &check_contraction, &contract_data);\n      isl_schedule_node_free(node);      \n    }\n    \n    if (contract_data.legal) {\n      /* We are able to create a register tiling. */      \n      acc = isl_map_from_union_map(isl_union_map_apply_domain(isl_union_map_copy(access), \n                                                              isl_union_map_copy(contract_data.prefix)));\n      group->copy_schedule_dim = contract_data.depth;\n      group->copy_schedule = contract_data.prefix_upma;\n      group->copy_schedule = isl_union_pw_multi_aff_pullback_union_pw_multi_aff(group->copy_schedule,\n                                                                                isl_union_pw_multi_aff_copy(kernel->contraction));\n    } else {\n      isl_union_pw_multi_aff_free(contract_data.prefix_upma);\n      /* Map the domain to the outer scheduling dimensions */\n      acc = local_access_pe(group, access, data);  \n      node = isl_schedule_get_root(kernel->schedule);\n      node = autosa_tree_move_down_to_pe(node, kernel->core);\n      if (kernel->options->autosa->tuning_method == 1)\n        group->tuning_local_tile = TP_infer_tiled_array(data->gen, kernel, node, group, 1, 1);\n      isl_schedule_node_free(node);\n    }\n    if (contract_data.prefix) \n      isl_union_map_free(contract_data.prefix);\n\n    /* Collect the shift and scale factors of the tile. 
*/\n    ok = can_tile(acc, group->local_tile);\n    if (ok < 0)\n      r = isl_stat_error;\n    else if (!ok)\n      group->local_tile =\n          autosa_array_tile_free(group->local_tile);\n    isl_map_free(acc);\n  }\n\n  if (r < 0)\n  {\n    isl_union_map_free(access);\n    return r;\n  }\n\n  isl_union_map_free(access);\n  return isl_stat_ok;\n}\n\n/* Internal struct for compute_group_bounds_core_pe_acc. */\nstruct compute_local_tile_acc_data\n{\n  struct autosa_kernel *kernel;\n  struct autosa_array_ref_group *group;\n  int depth;\n  isl_union_map *prefix;\n  isl_union_pw_multi_aff *prefix_upma;\n  int status;\n};\n\n/* Examine the schedule depth and prefix schedule used to calculate the \n * register tiling. Specifically, if the access is under the SIMD loop,\n * we will move up to the \"SIMD\" mark and compute tiling at this level.\n * Otherwise, we will compute the tiling at the statement level.\n * In addition, if the access is found in more than one loop, we will \n * not create a register tiling. Instead, we create a local buffer at the PE level.\n */\nstatic __isl_give isl_schedule_node *compute_local_tile_acc(\n    __isl_take isl_schedule_node *node, void *user)\n{\n  struct compute_local_tile_acc_data *data = (struct compute_local_tile_acc_data *)user;\n  struct autosa_array_ref_group *group = data->group;\n  struct autosa_stmt_access *acc = group->refs[0];\n  isl_union_set *domain;\n  isl_union_map *prefix;\n  isl_union_pw_multi_aff *prefix_upma;\n  isl_bool not_contain_acc;\n  int depth;\n  struct check_stmt_contain_acc_data check_data;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n    return node;\n\n  /* Test if the statement contains the access. 
*/\n  domain = isl_schedule_node_get_domain(node);\n  check_data.kernel = data->kernel;\n  check_data.group = data->group;\n  not_contain_acc = isl_union_set_every_set(domain, &check_stmt_contain_acc, &check_data);\n  isl_union_set_free(domain);\n\n  if (!not_contain_acc)\n  {\n    int is_simd;\n    is_simd = is_node_under_simd(node);\n    if (is_simd)\n    {\n      /* If the node is under SIMD, we will move up to the \"SIMD\" mark, and \n       * compute the tiling at this level. \n       */\n      isl_schedule_node *new_node;\n\n      new_node = isl_schedule_node_copy(node);\n      new_node = autosa_tree_move_up_to_mark(new_node, \"simd\");\n      prefix = isl_schedule_node_get_prefix_schedule_union_map(new_node);\n      prefix_upma = isl_schedule_node_get_prefix_schedule_union_pw_multi_aff(new_node);\n      depth = isl_schedule_node_get_schedule_depth(new_node);\n      isl_schedule_node_free(new_node);\n    }\n    else\n    {\n      prefix = isl_schedule_node_get_prefix_schedule_union_map(node);\n      prefix_upma = isl_schedule_node_get_prefix_schedule_union_pw_multi_aff(node);\n      depth = isl_schedule_node_get_schedule_depth(node);\n    }\n    if (data->depth == -1)\n    {\n      data->depth = depth;\n      data->prefix = prefix;\n      data->prefix_upma = prefix_upma;\n      data->status = 1;\n    }\n    else\n    {\n      /* The array reference is found in more than one loop. \n       * We will compute the tiling at the PE level. 
\n       */\n      isl_union_map_free(prefix);\n      isl_union_pw_multi_aff_free(prefix_upma);\n      data->status = 0;\n    }\n  }\n\n  return node;\n}\n\n/* Compute the tiling of the group at the statement level.\n */\nstatic isl_stat compute_group_bounds_core_pe_acc(struct autosa_kernel *kernel,\n                                                 struct autosa_array_ref_group *group, struct autosa_group_data *data)\n{\n  isl_ctx *ctx = isl_space_get_ctx(group->array->space);\n  int use_local = kernel->options->autosa->use_local_memory;\n  isl_stat r = isl_stat_ok;\n  isl_union_map *access;\n  isl_map *acc;\n  isl_bool ok;\n  isl_schedule_node *node;\n\n  if (!use_local)\n    return isl_stat_ok;\n  if (autosa_array_is_read_only_scalar(group->array))\n    return isl_stat_ok;\n  if (!group->exact_write)\n    return isl_stat_ok;\n  if (group->slice)\n    return isl_stat_ok;\n\n  /* Collect all accesses in the group */\n  access = autosa_array_ref_group_access_relation(group, 1, 1);\n  /* Create local tile */\n  if (use_local)\n  {\n    struct compute_local_tile_acc_data tile_data;\n\n    tile_data.kernel = kernel;\n    tile_data.group = group;\n    tile_data.status = 0;\n    tile_data.depth = -1;\n    tile_data.prefix = NULL;\n    /* Create a tile. */\n    group->local_tile = autosa_array_tile_create(ctx, group->array->n_index);\n    /* Map the domain to the outer scheduling dimensions */\n    node = isl_schedule_get_root(kernel->schedule);\n    node = autosa_tree_move_down_to_pe(node, kernel->core);\n    node = isl_schedule_node_map_descendant_bottom_up(node, &compute_local_tile_acc, &tile_data);\n    isl_schedule_node_free(node);\n    if (tile_data.status)\n    {\n      /* We are able to create a register tiling. */\n      acc = isl_map_from_union_map(isl_union_map_apply_domain(isl_union_map_copy(access),\n                                                              tile_data.prefix));\n      /* Update the copy schedule. 
*/\n      group->copy_schedule_dim = tile_data.depth;\n      group->copy_schedule = tile_data.prefix_upma;\n      group->copy_schedule = isl_union_pw_multi_aff_pullback_union_pw_multi_aff(group->copy_schedule,\n                                                                                isl_union_pw_multi_aff_copy(kernel->contraction));\n    }\n    else\n    {\n      /* We will create the tiling at the PE level. */\n      acc = local_access_pe(group, access, data);\n      /* Update the copy schedule */\n      node = isl_schedule_get_root(kernel->schedule);\n      node = autosa_tree_move_down_to_pe(node, kernel->core);\n      group->copy_schedule_dim = isl_schedule_node_get_schedule_depth(node);\n      group->copy_schedule =\n          isl_schedule_node_get_prefix_schedule_union_pw_multi_aff(node);\n      group->copy_schedule = isl_union_pw_multi_aff_pullback_union_pw_multi_aff(\n          group->copy_schedule, isl_union_pw_multi_aff_copy(kernel->contraction));\n      isl_schedule_node_free(node);\n    }\n    /* Collect the shift and scale factors of the tile. 
*/\n    ok = can_tile(acc, group->local_tile);\n    if (ok < 0)\n      r = isl_stat_error;\n    else if (!ok)\n      group->local_tile = autosa_array_tile_free(group->local_tile);\n    isl_map_free(acc);\n  }\n\n  if (r < 0)\n  {\n    isl_union_map_free(access);\n    return r;\n  }\n\n  isl_union_map_free(access);\n  return isl_stat_ok;\n}\n\n/* Compute the local memory tiles for the array\n * reference group \"group\" of array \"array\" and set the tile depth.\n * Return 0 on success and -1 on error.\n */\nstatic int compute_group_bounds_pe(struct autosa_kernel *kernel,\n                                   struct autosa_array_ref_group *group, struct autosa_group_data *data)\n{\n  if (!group)\n    return -1;\n  if (compute_group_bounds_core_pe(kernel, group, data) < 0)\n    return -1;\n\n  return 0;\n}\n\n/* Compute the register tiles for the array\n * reference group \"group\" of array \"array\" and set the tile depth.\n * Return 0 on success and -1 on error.\n */\nstatic int compute_group_bounds_pe_acc(struct autosa_kernel *kernel,\n                                       struct autosa_array_ref_group *group, struct autosa_group_data *data)\n{\n  if (!group)\n    return -1;\n  if (compute_group_bounds_core_pe_acc(kernel, group, data) < 0)\n    return -1;\n\n  return 0;\n}\n\n/* Set array->n_pe_group and array->pe_groups to n and groups.\n *\n * Additionally, set the \"nr\" field of each group.\n */\nstatic void set_array_groups_pe(struct autosa_local_array_info *array,\n                                int n, struct autosa_array_ref_group **groups)\n{\n  int i;\n\n  array->n_pe_group = n;\n  array->pe_groups = groups;\n\n  for (i = 0; i < n; ++i)\n    groups[i]->nr = i;\n}\n\n/* Populate the array reference groups with one array reference each.\n * If any of the array references is associated with RAW, the array reference\n * is from an internal array, and we merge all the array references into \n * one single group.\n * Otherwise, the array reference is from an external 
array, and we do nothing\n * here. \n * For an internal array, we compute the group tiling at the PE level.\n * For an external array, we compute the group tiling at the statement level.\n * Return -1 on error.\n */\nstatic int group_array_references_pe(struct autosa_kernel *kernel,\n                                     struct autosa_local_array_info *local, struct autosa_group_data *data)\n{\n  int i, j;\n  int n;\n  isl_ctx *ctx = isl_union_map_get_ctx(data->pe_sched);\n  struct autosa_array_ref_group **groups;\n  int merge_all = 0;\n  isl_schedule_node *node;\n\n  groups = isl_calloc_array(ctx, struct autosa_array_ref_group *,\n                            local->array->n_ref);\n  if (!groups)\n    return -1;\n\n  n = populate_array_references_pe(local, groups, data);\n\n  /* Examine if any of the array references is associated with RAW. \n   * If so, merge all the groups. \n   */\n  for (int i = 0; i < n; ++i)\n  {\n    struct autosa_array_ref_group *group_i = groups[i];\n    for (int j = 0; j < group_i->n_ref; ++j)\n    {\n      struct autosa_stmt_access *ref_i = group_i->refs[j];\n      for (int k = 0; k < ref_i->n_io_info; ++k)\n      {\n        if (ref_i->io_info[k]->dep->type == AUTOSA_DEP_RAW)\n        {\n          merge_all = 1;\n          break;\n        }\n      }\n    }\n  }  \n\n  if (merge_all)\n  {\n    /* Join all references together. */\n    for (int i = 1; i < n; ++i)\n    {\n      groups[0] = join_groups_and_free(groups[0], groups[i]);\n    }\n    n = 1;\n  }\n\n  if (merge_all)\n  {\n    /* Internal array. 
*/\n    for (i = 0; i < n; ++i)\n    {\n      if (compute_group_bounds_pe(kernel, groups[i], data) < 0)\n      {\n        for (j = 0; j < n; j++)\n        {\n          autosa_array_ref_group_free(groups[j]);\n        }\n        free(groups);\n        return -1;\n      }\n\n      if (groups[i]->copy_schedule_dim == 0) {\n        /* Update the copy schedule at the PE level */\n        node = isl_schedule_get_root(kernel->schedule);\n        node = autosa_tree_move_down_to_pe(node, kernel->core);\n        groups[i]->copy_schedule_dim = isl_schedule_node_get_schedule_depth(node);\n        groups[i]->copy_schedule =\n            isl_schedule_node_get_prefix_schedule_union_pw_multi_aff(node);\n        groups[i]->copy_schedule =\n            isl_union_pw_multi_aff_pullback_union_pw_multi_aff(groups[i]->copy_schedule,\n                                                               isl_union_pw_multi_aff_copy(kernel->contraction));\n        isl_schedule_node_free(node);\n      }\n    }\n  }\n  else\n  {\n    /* External array. \n     * We will build the tiling for each array access. 
*/\n    for (i = 0; i < n; ++i)\n    {\n      if (compute_group_bounds_pe_acc(kernel, groups[i], data) < 0)\n      {\n        for (j = 0; j < n; j++)\n        {\n          autosa_array_ref_group_free(groups[j]);\n        }\n        free(groups);\n        return -1;\n      }\n    }\n  }\n\n  set_array_groups_pe(local, n, groups);\n\n  return 0;\n}\n\n/* Fill up the groups array with singleton groups, i.e., one group\n * per reference, initializing the array, access, write, n_ref and refs fields.\n * In particular the access field is initialized to the scheduled\n * access relation of the array reference.\n *\n * Return the number of elements initialized, i.e., the number of\n * active references in the current kernel.\n */\nstatic int populate_array_references_io(struct autosa_local_array_info *local,\n                                        struct autosa_array_ref_group **groups, struct autosa_group_data *data)\n{\n  int i;\n  int j;\n  int n;\n  isl_ctx *ctx = isl_union_map_get_ctx(data->pe_sched);\n\n  n = 0;\n  for (i = 0; i < local->array->n_ref; ++i)\n  {\n    for (j = 0; j < local->array->refs[i]->n_io_info; ++j)\n    {\n      if (!((local->array->refs[i]->io_info[j]->dep->type == AUTOSA_DEP_RAR) ||\n         (local->array->refs[i]->io_info[j]->dep->type == AUTOSA_DEP_RAW)))\n         continue;\n\n      isl_union_map *umap;\n      isl_map *map;\n      struct autosa_array_ref_group *group;\n      struct autosa_stmt_access *access = local->array->refs[i];\n\n      map = isl_map_copy(access->access);\n      umap = isl_union_map_from_map(map);\n      umap = isl_union_map_apply_domain(umap,\n                                        isl_union_map_copy(data->copy_sched));\n\n      if (isl_union_map_is_empty(umap))\n      {\n        isl_union_map_free(umap);\n        continue;\n      }\n\n      map = isl_map_from_union_map(umap);\n      map = isl_map_detect_equalities(map);\n\n      //group = isl_calloc_type(ctx, struct autosa_array_ref_group);\n      group = new 
autosa_array_ref_group;\n      group = autosa_array_ref_group_init(group);\n      if (!group)\n      {\n        isl_map_free(map);\n        return -1;\n      }\n      group->local_array = local;\n      group->array = local->array;\n      group->access = map; // not used\n      group->write = access->write;\n      group->exact_write = access->exact_write;\n      group->slice = access->n_index < local->array->n_index;\n      group->refs = &local->array->refs[i];\n      group->n_ref = 1;\n      group->io_type = access->io_info[j]->io_type;\n      group->dir = isl_vec_copy(access->io_info[j]->dir);\n      group->old_dir = isl_vec_copy(group->dir);\n      group->group_type = AUTOSA_IO_GROUP;\n      group->pe_io_dir = IO_NULL;\n      group->array_io_dir = IO_NULL;\n      group->io_trans = NULL;\n      group->io_pe_expr = NULL;\n      group->io_L1_pe_expr = NULL;\n      group->n_io_buffer = 0;\n      group->io_buffers = NULL;\n      group->copy_schedule = NULL;\n      group->pe_tile = NULL;\n      group->n_mem_ports = 1;\n      group->local_tile = NULL;\n      //std::cout << local->array->tuning_refs[i]->to_str() << std::endl;\n      group->tuning_refs.push_back(std::shared_ptr<TPArrayRef>(local->array->tuning_refs[i]));\n      group->tuning_pe_tile = NULL;\n\n      groups[n++] = group;\n    }\n  }\n\n  return n;\n}\n\n/* Examine if two groups share the same I/O modules:\n * - with the same I/O type\n * - with the same I/O direction\n */\nstatic int share_io(struct autosa_array_ref_group *group1,\n                    struct autosa_array_ref_group *group2)\n{\n  if (group1->io_type != group2->io_type)\n    return 0;\n\n  for (int i = 0; i < isl_vec_size(group1->dir); i++)\n  {\n    if (isl_vec_cmp_element(group1->dir, group2->dir, i))\n      return 0;\n  }\n\n  return 1;\n}\n\n/* If two groups have shared I/O (as determined by\n * the \"share\" function),\n * then merge the two groups into one.\n * TODO: If \"compute_bounds\" is set, then call compute_group_bounds\n * on 
the merged groups.\n *\n * Return the updated number of groups.\n * Return -1 on error.\n */\nstatic int group_io(struct autosa_kernel *kernel,\n                    int n, struct autosa_array_ref_group **groups,\n                    int (*share)(struct autosa_array_ref_group *group1,\n                                 struct autosa_array_ref_group *group2),\n                    int compute_bounds,\n                    struct autosa_group_data *data)\n{\n  int i, j;\n\n  for (i = 0; i < n; ++i)\n  {\n    for (j = n - 1; j > i; --j)\n    {\n      if (!share(groups[i], groups[j]))\n        continue;\n\n      groups[i] = join_groups_and_free(groups[i], groups[j]);\n      if (j != n - 1)\n        groups[j] = groups[n - 1];\n      groups[n - 1] = NULL;\n      n--;\n\n      if (!groups[i])\n        return -1;\n      //\t\t\tif (compute_bounds &&\n      //\t\t\t    compute_group_bounds_io(kernel, groups[i], data) < 0)\n      //\t\t\t\treturn -1;\n    }\n  }\n\n  return n;\n}\n\n/* If two groups share the same I/O type and I/O direction,\n * merge the two groups into one.\n *\n * Return the updated number of groups.\n */\nstatic int group_share_io(struct autosa_kernel *kernel,\n                          int n, struct autosa_array_ref_group **groups,\n                          struct autosa_group_data *data)\n{\n  return group_io(kernel, n, groups, &share_io, 0, data);\n}\n\n/* Perform interior I/O elimination.\n * If the I/O group has interior I/O, assign a new data transfer direction \n * at the PE level.\n * At present, we assign the first dim to 1 by default.\n */\nstatic isl_stat autosa_interior_io_eliminate(\n    struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n    struct autosa_gen *gen, struct autosa_group_data *data)\n{\n  if (isl_vec_is_zero(group->dir))\n  {\n    /* This group will generate interior I/O, which needs to be eliminated. \n     * By default, set the first dim to be 1. \n     * Hack: For LU, we set the last dim to be 1. 
\n     * A nonzero int_io_dir option selects the last dim instead of the first.\n     */\n    if (gen->options->autosa->int_io_dir == 0)\n      group->dir = isl_vec_set_element_si(group->dir, 0, 1);\n    else\n      group->dir = isl_vec_set_element_si(group->dir, isl_vec_size(group->dir) - 1, 1);\n\n    /* Update the array info */\n    for (int i = 0; i < group->n_ref; i++)\n    {\n      struct autosa_stmt_access *ref = group->refs[i];\n      for (int j = 0; j < ref->n_io_info; j++)\n      {\n        struct autosa_io_info *io_info = ref->io_info[j];\n        if (io_info->io_type == group->io_type && isl_vec_is_zero(io_info->dir))\n        {\n          isl_vec_free(io_info->dir);\n          io_info->dir = isl_vec_copy(group->dir);\n        }\n      }\n    }\n  }\n  return isl_stat_ok;\n}\n\n/* The \"node\" points to the current space band.\n * We will cluster it using the direction \"dir\".\n * Specifically, following the space-time transformation using projection and \n * scheduling vectors, we assign projection vector d = dir, scheduling vector\n * s = dir.\n * Next, we compose the new transformation matrix:\n * \n * T = [ P\n *      ---\n *       s ]\n * where P d^T = 0, i.e., the rows of P span the null space of d.\n * \n * This new transformation matrix is applied to the space band.\n * We will return the transformation matrix in \"io_trans_mat\" and its\n * multi_aff form in \"io_trans_ma\".\n */\nstatic __isl_give isl_schedule_node *io_cluster(\n    __isl_take isl_schedule_node *node,\n    __isl_keep isl_vec *dir, isl_mat **io_trans_mat, isl_multi_aff **io_trans_ma)\n{\n  isl_multi_union_pw_aff *mupa;\n  isl_mat *trans_mat, *d_mat, *null_mat;\n  int space_dim;\n  isl_ctx *ctx;\n  isl_space *space;\n  isl_multi_aff *ma;\n  std::vector<TPIterator *> iters;\n\n  mupa = isl_schedule_node_band_get_partial_schedule(node);\n  space_dim = isl_schedule_node_band_n_member(node);\n  ctx = isl_schedule_node_get_ctx(node);\n\n  /* Store the tuning iters */\n  for (int i = 0; i < isl_schedule_node_band_n_member(node); i++) {\n    iters.push_back((TPIterator 
*)isl_schedule_node_band_member_get_iter(node, i));\n    //std::cout << \"io cluster: \" << iters[iters.size() - 1]->name << \", \" << \n    //    iters[iters.size() - 1]->space_time << std::endl;\n  }\n\n  /* Build the transformation matrix. */\n  trans_mat = isl_mat_alloc(ctx, space_dim, space_dim);\n  d_mat = isl_mat_alloc(ctx, 1, space_dim);\n  for (int i = 0; i < isl_vec_size(dir); i++)\n  {\n    d_mat = isl_mat_set_element_val(d_mat, 0, i,\n                                    isl_vec_get_element_val(dir, i));\n  }\n  null_mat = isl_mat_right_kernel(d_mat);  \n\n  for (int i = 0; i < isl_mat_cols(null_mat); i++)\n    for (int j = 0; j < isl_mat_rows(null_mat); j++)\n    {\n      trans_mat = isl_mat_set_element_val(trans_mat, i, j,\n                                          isl_mat_get_element_val(null_mat, j, i));\n    }\n  for (int i = 0; i < isl_vec_size(dir); i++)\n  {\n    trans_mat = isl_mat_set_element_val(trans_mat, isl_mat_cols(null_mat), i,\n                                        isl_vec_get_element_val(dir, i));\n  }\n  *io_trans_mat = trans_mat;\n\n  /* Convert the transformation matrix to multi_aff. */\n  space = isl_multi_union_pw_aff_get_space(mupa);\n  space = isl_space_map_from_set(space);\n  ma = isl_multi_aff_identity(space);\n\n  for (int i = 0; i < isl_mat_rows(trans_mat); i++)\n  {\n    isl_aff *aff = isl_multi_aff_get_aff(ma, i);\n    for (int j = 0; j < isl_mat_cols(trans_mat); j++)\n    {\n      isl_val *val = isl_mat_get_element_val(trans_mat, i, j);      \n      aff = isl_aff_set_coefficient_si(aff, isl_dim_in, j, isl_val_get_num_si(val));      \n      isl_val_free(val);\n    }\n    ma = isl_multi_aff_set_aff(ma, i, aff);\n  }\n\n  /* Apply the new transformation on the original partial schedule. */\n  mupa = isl_multi_union_pw_aff_apply_multi_aff(mupa, isl_multi_aff_copy(ma));\n  *io_trans_ma = ma;\n\n  node = isl_schedule_node_delete(node);\n  /* Insert the new partial schedule. 
*/\n  node = isl_schedule_node_insert_partial_schedule(node, mupa);\n  /* Add back the tuning iterators.\n   * Since all the io dirs are unit vectors, which means only loop permutation is \n   * allowed, we simply swap the iter infos.\n   */\n  std::vector<int> swap_index;  \n  for (int i = 0; i < isl_mat_rows(*io_trans_mat); i++) {\n    int tmp = 0;\n    for (int j = 0; j < isl_mat_cols(*io_trans_mat); j++) {\n      isl_val *val_tmp = isl_mat_get_element_val(*io_trans_mat, i, j);\n      tmp += isl_val_get_num_si(val_tmp) * j;\n      isl_val_free(val_tmp);\n    }\n    swap_index.push_back(tmp);\n  }\n  // Restore the loop iterators\n  for (int i = 0; i < isl_schedule_node_band_n_member(node); i++) {\n    //std::cout << \"swapped iter: \" << iters[swap_index[i]]->name << std::endl;\n    node = isl_schedule_node_band_member_set_iter(node, i, iters[swap_index[i]]);\n  }\n\n  isl_mat_free(null_mat);\n\n  return node;\n}\n\nstatic isl_stat extract_set_max_dim(__isl_take isl_basic_set *bset, void *user)\n{\n  isl_val *val;\n  isl_val **max_val = (isl_val **)user;\n\n  val = isl_basic_set_dim_max_val(bset, 0);\n  if (isl_val_gt(val, *max_val))\n  {\n    isl_val_free(*max_val);\n    *max_val = val;\n  }\n  else\n  {\n    isl_val_free(val);\n  }\n\n  return isl_stat_ok;\n}\n\n/* Insert the global context for introducing the IO module identifiers. \n * The \"node\" points to the \"kernel\" mark.\n * Return the node at the same position.\n */\nstatic __isl_give isl_schedule_node *insert_io_module_context(\n  __isl_take isl_schedule_node *node,\n  struct autosa_array_ref_group *group,\n  struct autosa_gen *gen, struct autosa_kernel *kernel)\n{\n  int n_io_ids;\n  isl_id_list *io_ids;\n  isl_set *context;\n\n  n_io_ids = group->space_dim;\n  if (n_io_ids <= 0)\n    return node;\n  io_ids = ppcg_scop_generate_names(gen->prog->scop, n_io_ids, \"p\");\n  n_io_ids = 0;\n\n  /* Update the context. 
*/\n  context = isl_set_universe(isl_set_get_space(kernel->context));\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n\n  while (!isl_schedule_node_is_io_mark(node, 1))\n  {\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n      isl_union_map *umap;\n      isl_union_set *uset;\n      isl_multi_pw_aff *size;\n      isl_id *id;\n      isl_id_list *ids;\n      isl_union_set *domain;\n      isl_union_pw_multi_aff *contraction;\n\n      umap = isl_schedule_node_band_get_partial_schedule_union_map(node);\n      domain = isl_schedule_node_get_domain(node);\n      contraction = isl_schedule_node_get_subtree_contraction(node);\n      domain = isl_union_set_preimage_union_pw_multi_aff(domain, contraction);\n      umap = isl_union_map_intersect_domain(umap, domain);\n      uset = isl_union_map_range(umap);\n      size = ppcg_size_from_extent(isl_set_from_union_set(uset));\n      ids = isl_id_list_from_id(isl_id_list_get_id(io_ids, n_io_ids));\n      n_io_ids++;\n      context = add_bounded_parameters_dynamic(context, size, ids);\n      isl_id_list_free(ids);\n      isl_multi_pw_aff_free(size);\n    }\n    node = isl_schedule_node_child(node, 0);\n  }\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_context(node, context);\n  node = autosa_tree_move_up_to_kernel(node);\n\n  isl_id_list_free(io_ids);\n\n  return node;\n}\n\n/* Perform HBM/Multi-port DRAM optimization.\n */\nstatic __isl_give isl_schedule_node *hbm_optimize(\n    __isl_take isl_schedule_node *node,\n    isl_multi_aff **io_trans_ma,\n    struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n    struct autosa_gen *gen)\n{\n  isl_union_set *uset;\n  isl_set *set;\n  isl_basic_set *bset;\n  isl_union_map *umap;\n  isl_val *val;\n  isl_ctx *ctx = gen->ctx;\n  int tile_len = 1;\n  int *tile_size = NULL;\n  cJSON *hbm_json, *hbm_mode_json;\n  const char *hbm_mode;\n  isl_printer 
*p_str;\n  char *module_name;\n  int *ubs = NULL;\n\n  /* Parse the tuning configuration. */\n  hbm_json = cJSON_GetObjectItemCaseSensitive(gen->tuning_config, \"hbm\");\n  if (!hbm_json)\n  {\n    /* Default in auto mode. */\n    hbm_mode = \"auto\";\n  }\n  else\n  {\n    hbm_mode_json = cJSON_GetObjectItemCaseSensitive(hbm_json, \"mode\");\n    hbm_mode = hbm_mode_json->valuestring;\n  }\n\n  ubs = extract_band_upper_bounds(node);\n  if (!strcmp(hbm_mode, \"auto\"))\n  {\n    /* HBM optimization is set in AUTO mode. \n     * We will pick up the tiling factors by default.\n     */\n    tile_size = read_default_hbm_tile_sizes(kernel, tile_len);\n  }\n  else\n  {\n    /* HBM optimization is set in MANUAL mode. \n     * We will take the user specification to select the HBM factors.\n     */\n    char *name;\n    isl_printer *p_str;\n    p_str = isl_printer_to_str(ctx);\n    p_str = isl_printer_print_str(p_str, \"hbm_\");\n    p_str = autosa_array_ref_group_print_prefix(group, p_str);\n    name = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n\n    tile_size = read_hbm_tile_sizes(kernel, tile_len, name);\n    if (!tile_size)\n    {\n      /* User hasn't specified the tiling factors for HBM optimization yet,\n       * we will dump out the number and upper bounds of the last-level IO loops\n       * and exit the program.\n       */\n\n      FILE *fp;\n      char *content;\n      cJSON *tuning, *hbm_json, *loops_json;\n      isl_printer *p_str;\n      char *tuning_path;\n\n      tuning = cJSON_CreateObject();\n      hbm_json = cJSON_CreateObject();\n      cJSON_AddItemToObject(tuning, name, hbm_json);\n      loops_json = cJSON_CreateArray();\n      cJSON_AddItemToObject(hbm_json, \"tilable_loops\", loops_json);\n      for (int i = 0; i < tile_len; i++)\n      {\n        cJSON *loop = cJSON_CreateNumber(ubs[i]);\n        cJSON_AddItemToArray(loops_json, loop);\n      }\n      p_str = isl_printer_to_str(ctx);\n      p_str = isl_printer_print_str(p_str, 
kernel->options->autosa->output_dir);\n      p_str = isl_printer_print_str(p_str, \"/tuning.json\");\n      tuning_path = isl_printer_get_str(p_str);\n      fp = fopen(tuning_path, \"w\");\n      content = cJSON_Print(tuning);\n      fprintf(fp, \"%s\", content);\n      cJSON_Delete(tuning);\n      isl_printer_free(p_str);\n      free(tuning_path);\n      free(name);\n      free(ubs);\n      exit(0);\n    }\n    free(name);\n  }\n\n  p_str = isl_printer_to_str(ctx);\n  p_str = autosa_array_ref_group_print_prefix(group, p_str);\n  module_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  printf(\"[AutoSA] #HBM port for %s: %d \\n\", module_name, tile_size[0]);\n  free(module_name);\n\n  /* Check whether the tile factor is greater than or equal to the loop bound. */\n  umap = isl_schedule_node_band_get_partial_schedule_union_map(node);\n  uset = isl_union_map_range(umap);\n  set = isl_set_from_union_set(uset);\n  val = isl_val_zero(ctx);\n  isl_set_foreach_basic_set(set, &extract_set_max_dim, &val);\n  isl_set_free(set);\n  if (isl_val_get_num_si(val) <= tile_size[0])\n  {\n    /* The current loop bound is not larger than the tile size, \n     * so there is no need to tile further. \n     */\n    // TODO: At present, we require the loop bound to be greater than the tile factor.\n    // This is because we can't handle a loop with bound one, since\n    // such a loop will be degenerated. Fix it in the future.\n    free(tile_size);\n    free(ubs);\n    isl_val_free(val);\n    printf(\"[AutoSA] HBM optimization failed! Please try to use a smaller HBM port number.\\n\");\n    return node;\n  }\n  isl_val_free(val);\n\n  group->n_mem_ports = tile_size[0];\n  group->space_dim++;\n\n  tile_size[0] = ubs[0] / tile_size[0];\n  node = autosa_tile_band(node, tile_size);\n  node = isl_schedule_node_child(node, 0);\n\n  /* Update the transformation function. 
*/\n  isl_aff *aff = isl_multi_aff_get_aff(*io_trans_ma, 0);\n  isl_aff *tile_aff, *point_aff;\n  tile_aff = isl_aff_scale_down_ui(isl_aff_copy(aff), tile_size[0]);\n  tile_aff = isl_aff_floor(tile_aff);\n  point_aff = isl_aff_scale_down_ui(isl_aff_copy(aff), tile_size[0]);\n  point_aff = isl_aff_floor(point_aff);\n  point_aff = isl_aff_scale_val(point_aff, isl_val_int_from_ui(ctx, tile_size[0]));\n  point_aff = isl_aff_sub(aff, point_aff);\n\n  isl_aff_list *aff_list = isl_aff_list_from_aff(tile_aff);\n  aff_list = isl_aff_list_add(aff_list, point_aff);\n  for (int n = 1; n < isl_multi_aff_dim(*io_trans_ma, isl_dim_out); n++)\n  {\n    aff = isl_multi_aff_get_aff(*io_trans_ma, n);\n    aff_list = isl_aff_list_add(aff_list, aff);\n  }\n\n  isl_space *space = isl_multi_aff_get_space(*io_trans_ma);\n  isl_multi_aff_free(*io_trans_ma);\n  space = isl_space_add_dims(space, isl_dim_out, 1);\n  *io_trans_ma = isl_multi_aff_from_aff_list(space, aff_list);\n  free(tile_size);\n  free(ubs);\n\n  return node;\n}\n\n/* This function examines if the accessed elements of the I/O group \n * are fully overlapped at the PE level.\n * We will create a relation \"overlap\"\n * \n *  [[D -> R] -> [D' -> R']]\n * \n * of pairs of domain iterations accessing the reference group, where \n * D' immediately follows D (lexicographically) in the array_part loops \n * and the PE loops are equal.\n * \n * This relation is intersected with all flow dependences to derive the \n * pairs of iterations that overlap due to the flow dependences.\n * \n * Next, we construct a relation \"external\"\n * that contains pairs of iteration domains with flow dependences that \n * access the elements of the I/O group.\n * \n * We subtract \"overlap\" from \"external\". 
If the difference is empty,\n * the elements accessed by one PE overlap between different array partitions, \n * and we return true.\n * Otherwise, we return false.\n */\nstatic isl_bool internal_group_in_out_overlap(\n    __isl_keep isl_schedule_node *node,\n    struct autosa_kernel *kernel,\n    struct autosa_array_ref_group *group, int read)\n{\n  int empty;\n  struct autosa_prog *prog = kernel->prog;\n  isl_union_pw_multi_aff *tagger;\n  isl_union_map *prefix;\n  isl_union_map *access, *tagged;\n  isl_union_set *domain;\n  isl_set *prefix_range;\n  isl_map *lt;\n  int n_sched_dim;\n  isl_union_map *overlap;\n  isl_union_map *external, *universe;\n  isl_union_set *access_domain;\n  isl_union_set *tag_set;\n  isl_map *sched_identity;\n  int pe_depth, array_depth;\n\n  node = isl_schedule_node_copy(node);\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  array_depth = isl_schedule_node_get_schedule_depth(node);\n  node = autosa_tree_move_down_to_pe(node, kernel->core);\n  pe_depth = isl_schedule_node_get_schedule_depth(node);\n  prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n  prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                            isl_union_pw_multi_aff_copy(kernel->contraction));\n  isl_schedule_node_free(node);\n  access = autosa_io_group_access_relation(group, kernel, read, !read);\n  tagged = group_tagged_access_relation(group);\n\n  /* Remove the local dependency first. */\n  access = remove_local_accesses_group_flow(kernel, group, access, prefix, read);\n\n  /* Tagger maps the tagged iteration domain to untagged iteration domain.\n   * Iteration domain is tagged to the access function.\n   * e.g. 
[S1[i,j,k] -> _pet_ref_1[]] -> S1[(i),(j),(k)]\n   */\n  tagger = isl_union_pw_multi_aff_copy(prog->scop->tagger);\n  domain = isl_union_map_domain(isl_union_map_copy(tagged));\n  tagger = isl_union_pw_multi_aff_intersect_domain(tagger,\n                                                   isl_union_set_copy(domain));\n  prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix, tagger);\n\n  prefix_range = isl_set_from_union_set(isl_union_map_range(isl_union_map_copy(prefix)));\n  n_sched_dim = isl_set_dim(prefix_range, isl_dim_set);\n  sched_identity = isl_set_identity(isl_set_copy(prefix_range));\n\n  lt = isl_map_lex_lt_first(isl_map_get_space(sched_identity), array_depth);\n  isl_map_free(sched_identity);\n\n  /* Set the space dims equal. */\n  for (int i = array_depth; i < n_sched_dim; i++)\n  {\n    lt = isl_map_equate(lt, isl_dim_in, i, isl_dim_out, i);\n  }\n  lt = isl_map_intersect_domain(lt, isl_set_copy(prefix_range));\n  lt = isl_map_intersect_range(lt, prefix_range);\n  lt = isl_map_lexmin(lt);\n\n  overlap = isl_union_map_apply_range(isl_union_map_copy(prefix),\n                                      isl_union_map_from_map(lt));\n  overlap = isl_union_map_apply_range(overlap, isl_union_map_reverse(prefix));\n  overlap = isl_union_map_coalesce(overlap);\n\n  /* Derive the overlapping set. 
*/\n  overlap = isl_union_map_intersect(overlap,\n                                    isl_union_map_copy(prog->scop->tagged_dep_flow));\n  empty = isl_union_map_is_empty(overlap);\n\n  external = isl_union_map_copy(prog->scop->tagged_dep_flow);\n  universe = isl_union_map_universe(isl_union_map_copy(access));\n  access_domain = isl_union_map_domain(universe);\n  domain = isl_union_set_universe(domain);\n  universe = isl_union_set_unwrap(domain);\n  universe = isl_union_map_intersect_domain(universe, access_domain);\n  /* D -> __pet_ref_1 */\n  domain = isl_union_map_wrap(universe);\n  if (read)\n    external = isl_union_map_intersect_range(external, domain);\n  else\n    external = isl_union_map_intersect_domain(external, domain);\n  external = isl_union_map_intersect_params(external,\n                                            isl_set_copy(prog->scop->context));\n  /* external contains the flow deps that are associated with the group access. */\n\n  external = isl_union_map_subtract(external, overlap);\n  /* external now only contains the non-overlapping RAW pairs. */\n\n  if (read)\n  {\n    tag_set = isl_union_map_range(external);\n    external = wrapped_reference_to_access(tag_set, tagged);\n  }\n  else\n  {\n    tag_set = isl_union_map_domain(external);\n    external = wrapped_reference_to_access(tag_set, tagged);\n  }\n\n  if (empty < 0)\n    external = isl_union_map_free(external);\n  else if (empty)\n    external = isl_union_map_universe(external);\n\n  access = isl_union_map_intersect(access, external);\n  empty = isl_union_map_is_empty(access);\n  isl_union_map_free(access);\n\n  if (empty)\n    return isl_bool_true;\n  else\n    return isl_bool_false;\n}\n\n/* This function examines whether the dependences in the I/O group are carried \n * by the loops above the \"array\" node. 
\n */\nstatic isl_bool io_group_carried_by_array_loops(\n    __isl_keep isl_schedule_node *node,\n    struct autosa_kernel *kernel,\n    struct autosa_array_ref_group *group, int read)\n{\n  isl_union_map *prefix, *identity_sched;\n  isl_union_map *access, *tagged;\n  isl_union_pw_multi_aff *tagger;\n  isl_union_set *domain, *access_domain;\n  struct autosa_prog *prog = kernel->prog;\n  isl_set *prefix_range;\n  int n_sched_dim;\n  isl_map *sched_identity;\n  isl_union_map *external, *universe;\n  isl_union_set *tag_set;\n  int empty;  \n\n  node = isl_schedule_node_copy(node);\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n\n  /* Test if the array partition band is empty */\n  node = isl_schedule_node_parent(node);\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_band) {\n    /* No array partitioning, directly return. */\n    isl_schedule_node_free(node);\n    return isl_bool_false;\n  }\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n\n  prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n  prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                            isl_union_pw_multi_aff_copy(kernel->contraction));\n  isl_schedule_node_free(node);\n  access = autosa_io_group_access_relation(group, kernel, read, !read);  \n  /* Remove the local dependence first. 
*/\n  access = remove_local_accesses_group_flow(kernel, group, access, prefix, read);\n\n  tagged = group_tagged_access_relation(group);\n  tagger = isl_union_pw_multi_aff_copy(prog->scop->tagger);\n  domain = isl_union_map_domain(isl_union_map_copy(tagged));\n  tagger = isl_union_pw_multi_aff_intersect_domain(tagger,\n                                                   isl_union_set_copy(domain));\n\n  prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix, tagger);  \n  identity_sched = isl_union_map_apply_range(prefix, \n                                             isl_union_map_reverse(isl_union_map_copy(prefix)));\n  identity_sched = isl_union_map_intersect(identity_sched,\n                                           isl_union_map_copy(prog->scop->tagged_dep_flow));\n  empty = isl_union_map_is_empty(identity_sched);\n\n  external = isl_union_map_copy(prog->scop->tagged_dep_flow);\n  universe = isl_union_map_universe(isl_union_map_copy(access));\n  access_domain = isl_union_map_domain(universe);\n  domain = isl_union_set_universe(domain);\n  universe = isl_union_set_unwrap(domain);\n  universe = isl_union_map_intersect_domain(universe, access_domain);\n  domain = isl_union_map_wrap(universe);\n  if (read)\n    external = isl_union_map_intersect_range(external, domain);\n  else\n    external = isl_union_map_intersect_domain(external, domain);\n  external = isl_union_map_intersect_params(external,\n                                            isl_set_copy(prog->scop->context));\n  external = isl_union_map_subtract(external, identity_sched);\n\n  if (read)\n  {\n    tag_set = isl_union_map_range(external);\n    external = wrapped_reference_to_access(tag_set, tagged);\n  }\n  else\n  {\n    tag_set = isl_union_map_domain(external);\n    external = wrapped_reference_to_access(tag_set, tagged);\n  }\n\n  if (empty < 0)\n    external = isl_union_map_free(external);\n  else if (empty)\n    external = isl_union_map_universe(external);\n\n  access = 
isl_union_map_intersect(access, external);\n  empty = isl_union_map_is_empty(access);\n  isl_union_map_free(access);\n\n  if (empty)\n    return isl_bool_false;\n  else\n    return isl_bool_true;   \n}\n\n/* Return whether inter-PE communication is required for this group.\n * There are several cases to consider:\n * - For an I/O group with RAR dependences\n *   - if the group has exterior I/O, both in/out PE comm are required.\n *   - if the group has interior I/O, only in PE comm is required.\n * - For an I/O group with RAW dependences\n *   - if the group has exterior I/O, both in/out PE comm are required.\n *   - if the group has interior I/O, the PE comm direction equals the array-level I/O direction. \n */\nstatic isl_bool is_inter_pe_comm_valid(\n    __isl_keep isl_schedule_node *node,\n    struct autosa_kernel *kernel,\n    struct autosa_array_ref_group *group, int read)\n{\n  int external_group = 1;\n\n  if (group->group_type == AUTOSA_PE_GROUP)\n    return isl_bool_true;\n  \n  /* External group: all deps along the group direction are RAR. */\n  for (int i = 0; i < group->n_ref; i++)\n  {\n    struct autosa_stmt_access *ref = group->refs[i];\n    for (int j = 0; j < ref->n_io_info; j++)\n    {\n      struct autosa_io_info *io_info = ref->io_info[j];\n      if (io_info->io_type == group->io_type && !isl_vec_cmp(io_info->dir, group->dir))\n      {\n        if (io_info->dep->type != AUTOSA_DEP_RAR)\n        {\n          external_group = 0;\n          break;\n        }\n      }\n    }\n  }\n\n  if (external_group)\n  {\n    if (group->io_type == AUTOSA_EXT_IO)      \n      return isl_bool_true;\n    else {\n      if (read)\n        return isl_bool_true;\n      else\n        return isl_bool_false;\n    }   \n  } else {\n    if (group->io_type == AUTOSA_EXT_IO)\n      return isl_bool_true;\n    else {\n      if (read) \n        return (group->array_io_dir == IO_IN || group->array_io_dir == IO_INOUT)? 
isl_bool_true : isl_bool_false;\n      else \n        return (group->array_io_dir == IO_OUT || group->array_io_dir == IO_INOUT)? isl_bool_true : isl_bool_false;\n    }\n  }\n\n  return isl_bool_true;\n}\n\n/* Return whether the current I/O module needs to be generated. \n * There are several cases to consider:\n * - For an I/O group with only RAR dependences, no copy-out module is generated.\n * - For an I/O group with RAW dependences:\n *   - If the dep is carried by the array loops\n *     - if the group has interior I/O and the next read equals the previous write, no copy-in/copy-out module is generated.\n *   - Else, if the dep is not carried by the array loops\n *     - no copy-in/copy-out module is generated.\n */\nisl_bool is_io_module_valid(\n    __isl_keep isl_schedule_node *node,\n    struct autosa_kernel *kernel,\n    struct autosa_array_ref_group *group, int read)\n{\n  int external_group = 1;\n\n  if (group->group_type == AUTOSA_PE_GROUP)\n    return isl_bool_true;\n  if (group->group_type == AUTOSA_DRAIN_GROUP && read)\n    return isl_bool_false;\n  if (group->attached_drain_group)\n    return isl_bool_true;\n\n  /* External group: all deps along the group direction are RAR. */\n  for (int i = 0; i < group->n_ref; i++)\n  {\n    struct autosa_stmt_access *ref = group->refs[i];\n    for (int j = 0; j < ref->n_io_info; j++)\n    {\n      struct autosa_io_info *io_info = ref->io_info[j];\n      if (io_info->io_type == group->io_type && !isl_vec_cmp(io_info->dir, group->dir))\n      {\n        if (io_info->dep->type != AUTOSA_DEP_RAR)\n        {\n          external_group = 0;\n          break;\n        }\n      }\n    }\n  }\n\n  if (external_group)\n  {\n    if (read)\n      return isl_bool_true;\n    else\n      return isl_bool_false;\n  }\n\n  /* Internal group */\n  if (io_group_carried_by_array_loops(node, kernel, group, read)) {\n    if (group->io_type == AUTOSA_INT_IO &&\n        internal_group_in_out_overlap(node, kernel, group, read))\n      return isl_bool_false;\n  } else {\n    return isl_bool_false;\n  }\n\n  
return isl_bool_true;\n}\n\n/* This function computes the schedule for the I/O modules that transfer\n * the data for the I/O group \"group\".\n * We will cluster I/O modules level by level. \n * We will first insert an \"IO_L1\" mark below the space loops, which indicates\n * that IO_L1 modules are allocated beside each PE.\n * Next, to cluster IO_L1 modules, we look at the space loops above the current\n * mark. We will perform a space-time transformation to cluster the I/O modules.\n * In the current implementation, we will always use the projection vector (1,X)\n * to project all I/O modules along the direction of (1,X) together, and \n * schedule them following the direction of (1,X).\n * After one clustering, we will insert a new I/O mark below the new space loops.\n * This is done iteratively until we run out of the available space loops.\n * The transformed space band will look like:\n * \"array\" mark\n * |\n * \"IO_LX\" mark\n * |\n * X \n * | \n * \"IO_LY\" mark\n * |\n * Y\n * |\n * \"PE\" mark\n */\nstatic isl_stat compute_io_group_schedule(\n    struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n    struct autosa_gen *gen)\n{\n  isl_printer *p_str;\n  char *io_str;\n  int io_level = 0;\n  int i;\n  isl_ctx *ctx = gen->ctx;\n  isl_id *id;\n  isl_schedule *sched;\n  isl_mat *io_trans_mat = NULL;\n  isl_multi_aff *io_trans_ma = NULL;\n  isl_map *io_trans_map = NULL;\n  isl_schedule_node *node;\n  int space_dim;\n  isl_schedule *schedule;\n\n  /* Sink to the space band */\n  schedule = isl_schedule_dup(kernel->schedule);\n  node = isl_schedule_get_root(schedule);\n  isl_schedule_free(schedule);\n\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_child(node, 0);\n  //DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n  space_dim = isl_schedule_node_band_n_member(node);  \n  group->space_dim = space_dim;\n\n  /* Insert the IO_L1 mark. 
*/\n  node = isl_schedule_node_child(node, 0);\n  p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_print_str(p_str, \"io_L\");\n  p_str = isl_printer_print_int(p_str, io_level + 1);\n  io_str = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  id = isl_id_alloc(ctx, io_str, NULL);\n  free(io_str);\n  node = isl_schedule_node_insert_mark(node, id);\n  io_level++;\n  node = isl_schedule_node_parent(node);\n\n  /* Cluster the I/O modules from innermost space loops to outermost loops. */\n  for (int i = space_dim - 1; i >= 0; i--)\n  {\n    isl_mat *io_trans_mat_i;\n    isl_multi_aff *io_trans_ma_i;\n    isl_vec *dir;\n    isl_mat *mat;\n\n    /* Perform space-time transformation on the current band. */    \n    if (i == space_dim - 1)\n    {      \n      dir = isl_vec_dup(group->dir);\n    }\n    else\n    {\n      /* By default, we set the first element of the direction vector as 1. */\n      dir = isl_vec_zero(ctx, i + 1);\n      dir = isl_vec_set_element_si(dir, 0, 1);\n    }\n    node = io_cluster(node, dir, &io_trans_mat_i, &io_trans_ma_i);\n    isl_vec_free(dir);\n\n    if (io_level == 1)\n    {\n      sched = isl_schedule_node_get_schedule(node);\n      group->io_L1_schedule = isl_schedule_dup(sched);\n      // TODO: if the space schedule is to be degenerated, we\n      // will need to update the io_trans/io_L1_trans as well.\n      group->io_L1_trans = isl_multi_aff_copy(io_trans_ma_i);\n\n      isl_schedule_free(sched);\n      io_trans_mat = io_trans_mat_i;\n      io_trans_ma = io_trans_ma_i;\n    }\n    else\n    {\n      isl_multi_aff_free(io_trans_ma_i);\n      /* Update the transformation matrix. 
*/\n      int nrow = isl_mat_rows(io_trans_mat);\n      int ncol = isl_mat_cols(io_trans_mat);\n      isl_mat *extend_mat = isl_mat_alloc(ctx, nrow, ncol);\n      isl_mat *product_mat = isl_mat_alloc(ctx, nrow, ncol);\n      for (int r = 0; r < nrow; r++)\n        for (int c = 0; c < ncol; c++)\n        {\n          extend_mat = isl_mat_set_element_si(extend_mat, r, c, 0);\n          product_mat = isl_mat_set_element_si(product_mat, r, c, 0);\n        }\n\n      for (int r = 0; r < isl_mat_rows(io_trans_mat_i); r++)\n        for (int c = 0; c < isl_mat_cols(io_trans_mat_i); c++)\n        {\n          extend_mat = isl_mat_set_element_val(extend_mat, r, c,\n                                               isl_mat_get_element_val(io_trans_mat_i, r, c));\n        }\n      for (int r = isl_mat_rows(io_trans_mat_i); r < nrow; r++)\n      {\n        extend_mat = isl_mat_set_element_si(extend_mat, r, r, 1);\n      }\n      for (int r = 0; r < nrow; r++)\n        for (int c = 0; c < ncol; c++)\n        {\n          for (int k = 0; k < nrow; k++)\n          {\n            isl_val *v1, *v2, *v3;\n            v1 = isl_mat_get_element_val(extend_mat, r, k);\n            v2 = isl_mat_get_element_val(io_trans_mat, k, c);\n            v3 = isl_mat_get_element_val(product_mat, r, c);\n            v1 = isl_val_mul(v1, v2);\n            v3 = isl_val_add(v1, v3);\n            product_mat = isl_mat_set_element_val(product_mat, r, c, v3);\n          }\n        }\n      isl_mat_free(io_trans_mat);\n      isl_mat_free(extend_mat);\n      isl_mat_free(io_trans_mat_i);\n      io_trans_mat = product_mat;\n\n      /* Reset the transformation function. 
*/\n      for (int r = 0; r < nrow; r++)\n      {\n        isl_aff *aff = isl_multi_aff_get_aff(io_trans_ma, r);\n        for (int c = 0; c < ncol; c++)\n        {\n          isl_val *val = isl_mat_get_element_val(io_trans_mat, r, c);          \n          aff = isl_aff_set_coefficient_si(aff, isl_dim_in, c, isl_val_get_num_si(val));          \n          isl_val_free(val);\n        }\n        io_trans_ma = isl_multi_aff_set_aff(io_trans_ma, r, aff);\n      }\n    }\n\n    /* Split the band and insert the IO mark. */\n    if (i > 0)\n    {\n      node = isl_schedule_node_band_split(node, i);\n      node = isl_schedule_node_child(node, 0);\n    }\n\n    /* If the multi-port DRAM/HBM is to be used, we will need to tile the loop again.\n     */\n    if (i == 0 && gen->options->autosa->hbm)\n    {\n      /* Test if this group contains both copy-in and copy-out set. \n       * At present, HBM optimization is not supported for this type of I/O group.\n       * We will need to make sure the copy-in and copy-out set for each HBM channel \n       * do not overlap since we only support fixed HBM port mapping for now.\n       * Therefore, for this type of I/O group, we will disable the HBM optimization.\n       * TODO: Relax this constraint in the future.\n       */\n      printf(\"[AutoSA] Apply HBM optimization.\\n\");\n      if (group->group_type == AUTOSA_IO_GROUP &&\n          is_flow_dep_carried_by_array_part_loops(kernel->schedule, group, kernel))\n      {\n        isl_printer *p_str;\n        char *module_name;\n        p_str = isl_printer_to_str(ctx);\n        p_str = autosa_array_ref_group_print_prefix(group, p_str);\n        module_name = isl_printer_get_str(p_str);\n        isl_printer_free(p_str);\n\n        printf(\"[AutoSA] The flow dependence is carried by the array partitioning loops.\\n\");\n        printf(\"[AutoSA] HBM optimization for the group: %s is omitted.\\n\", module_name);\n        free(module_name);\n        goto next;\n      }\n      if 
(group->io_type == AUTOSA_EXT_IO && i == space_dim - 1)\n      {\n        printf(\"[AutoSA] HBM optimization failed! Not enough I/O modules.\\n\");\n        goto next;\n      }\n      node = hbm_optimize(node, &io_trans_ma, kernel, group, gen);\n    }\n  next:\n    p_str = isl_printer_to_str(ctx);\n    p_str = isl_printer_print_str(p_str, \"io_L\");\n    p_str = isl_printer_print_int(p_str, io_level + 1);\n    io_str = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n    id = isl_id_alloc(ctx, io_str, NULL);\n    free(io_str);\n    node = isl_schedule_node_insert_mark(node, id);\n    node = isl_schedule_node_parent(node);\n    io_level++;\n  }\n\n  isl_mat_free(io_trans_mat);  \n\n  group->io_level = io_level;\n  group->io_trans = io_trans_ma;\n\n  /* Insert the context node for the IO ids. \n   * NOTE: We will update this again in the later IO module generation.\n   */\n  node = autosa_tree_move_up_to_kernel(node);\n  node = insert_io_module_context(node, group, gen, kernel);\n\n  /* Determine if the I/O module for this group could be eliminated.\n   */\n  group->copy_in = 0;\n  group->copy_out = 0;\n  if (is_io_module_valid(node, kernel, group, 1))\n  {\n    group->copy_in = 1;\n    group->array_io_dir = (group->array_io_dir == IO_OUT)? IO_INOUT : IO_IN;\n  }\n  if (is_io_module_valid(node, kernel, group, 0))\n  {\n    group->copy_out = 1;\n    group->array_io_dir = (group->array_io_dir == IO_IN)? IO_INOUT : IO_OUT;\n  }\n  /* For drain group, copy-out module is always required. */\n  if (group->group_type == AUTOSA_DRAIN_GROUP) {\n    group->copy_out = 1;\n    group->array_io_dir = (group->array_io_dir == IO_IN)? IO_INOUT : IO_OUT;\n  }\n\n  if (group->copy_in || group->copy_out)\n  {\n    group->mem_port_id = group->local_array->n_mem_ports;\n    group->local_array->n_mem_ports += group->n_mem_ports;\n  }\n\n  /* Determine if the inter-PE communication is required. 
*/\n  if (is_inter_pe_comm_valid(node, kernel, group, 1)) {\n    group->pe_io_dir = (group->pe_io_dir == IO_OUT)? IO_INOUT : IO_IN;\n  }\n  if (is_inter_pe_comm_valid(node, kernel, group, 0)) {\n    group->pe_io_dir = (group->pe_io_dir == IO_IN)? IO_INOUT : IO_OUT;\n  }\n  if (group->group_type == AUTOSA_DRAIN_GROUP) {\n    group->pe_io_dir = (group->pe_io_dir == IO_IN)? IO_INOUT : IO_OUT;\n  }\n\n  /* Store the I/O schedule. */\n  sched = isl_schedule_node_get_schedule(node);\n  group->io_schedule = isl_schedule_dup(sched);\n  isl_schedule_free(sched);\n  isl_schedule_node_free(node);\n\n  return isl_stat_ok;\n}\n\nstatic __isl_give isl_map *local_access_io_at_node(struct autosa_kernel *kernel,\n                                                   struct autosa_array_ref_group *group,\n                                                   __isl_keep isl_union_map *access, __isl_keep isl_schedule_node *node)\n{\n  isl_union_map *local, *sched;\n  isl_union_pw_multi_aff *contraction;\n\n  local = isl_union_map_copy(access);\n  sched = prefix_with_equalities(node);\n  // TODO: fix the contraction\n  contraction = isl_schedule_node_get_subtree_contraction(node);\n  /* #ifdef _DEBUG\n  isl_printer *pd = isl_printer_to_file(isl_schedule_node_get_ctx(node), stdout);\n  pd = isl_printer_print_union_pw_multi_aff(pd, contraction);\n  pd = isl_printer_end_line(pd);\n  isl_printer_free(pd);\n#endif */\n\n  sched = expand(sched, contraction);\n  local = isl_union_map_apply_domain(local, sched);\n\n  isl_union_pw_multi_aff_free(contraction);\n\n  return isl_map_from_union_map(local);\n}\n\n/* Compute the local memory tiles for the drain group \"group\"\n * of array \"array\". 
Return isl_stat_ok on success and isl_stat_error on error.\n *\n * If the array is a read-only scalar or if the user requested not to use local\n * memory, then we do not need to do anything.\n */\nisl_stat compute_group_bounds_drain_at_node(struct autosa_kernel *kernel,\n                                            struct autosa_array_ref_group *group, __isl_keep isl_schedule_node *node,\n                                            struct autosa_io_buffer *buffer)\n{\n  isl_ctx *ctx = isl_space_get_ctx(group->array->space);\n  int use_local = kernel->options->autosa->use_local_memory;\n  isl_stat r = isl_stat_ok;\n  isl_union_map *access;\n  isl_map *acc;\n  isl_bool ok;\n\n  if (!use_local)\n    return isl_stat_ok;\n  if (autosa_array_is_read_only_scalar(group->array))\n    return isl_stat_ok;\n  if (!group->exact_write)\n    return isl_stat_ok;\n  if (group->slice)\n    return isl_stat_ok;\n\n  /* Collect all accesses in the group. */\n  access = autosa_array_ref_group_access_relation(group, 0, 1);\n  /* Create local tile */\n  if (use_local)\n  {\n    /* Create a tile */\n    buffer->tile = autosa_array_tile_create(ctx, group->array->n_index);\n    /* Map the domain to the outer scheduling dimensions */\n    acc = local_access_io_at_node(kernel, group, access, node);\n    /* Collect the shift and scale factors of the tile */\n    ok = can_tile(acc, buffer->tile);\n    if (ok < 0)\n      r = isl_stat_error;\n    else if (!ok)\n      buffer->tile = autosa_array_tile_free(buffer->tile);\n    isl_map_free(acc);\n  }\n\n  if (r < 0)\n  {\n    isl_union_map_free(access);\n    return r;\n  }\n\n  isl_union_map_free(access);\n  return isl_stat_ok;\n}\n\n/* Should this array reference group be mapped to local or global\n * memory?\n * If the array is a read-only scalar, it is mapped to global memory.\n * Otherwise, it is mapped to local memory. 
\n */\nenum autosa_group_access_type autosa_array_ref_group_type(\n    struct autosa_array_ref_group *group)\n{\n  if (autosa_array_is_read_only_scalar(group->array))\n    return AUTOSA_ACCESS_GLOBAL;\n  else\n    return AUTOSA_ACCESS_LOCAL;\n}\n\n/* Return the effective array_tile associated to \"group\" or\n * NULL if there is no such array_tile.\n */\nstruct autosa_array_tile *autosa_array_ref_group_tile(\n    struct autosa_array_ref_group *group)\n{\n  switch (autosa_array_ref_group_type(group))\n  {\n  case AUTOSA_ACCESS_GLOBAL:\n    return NULL;\n  case AUTOSA_ACCESS_LOCAL:\n    return group->local_tile;\n  }\n\n  return NULL;\n}\n\n/* Should this array reference group be mapped to local or global\n * memory?\n */\nenum autosa_group_access_type autosa_cpu_array_ref_group_type(\n    struct autosa_array_ref_group *group)\n{\n  if (group->local_tile)\n    return AUTOSA_ACCESS_LOCAL;\n  return AUTOSA_ACCESS_GLOBAL;\n}\n\n/* Given a description of an array tile \"tile\" and the \"space\"\n *\n *\t{ D -> A }\n *\n * where D represents the first tile->depth schedule dimensions\n * and A represents the array, construct an isl_multi_aff\n *\n *\t{ [D[i] -> A[a]] -> A'[a'] }\n *\n * with A' a scaled down copy of A according to the shifts and strides\n * in \"tile\".  
In particular,\n *\n *\ta' = (a + shift(i))/stride\n *\n * \"insert_array\" represents\n *\n *\t{ [D -> A] -> D }\n *\n * and is used to insert A into the domain of functions that only\n * reference D.\n */\nstatic __isl_give isl_multi_aff *strided_tile(\n    struct autosa_array_tile *tile, __isl_keep isl_space *space,\n    __isl_keep isl_multi_aff *insert_array)\n{\n  int i;\n  isl_ctx *ctx;\n  isl_multi_aff *shift;\n  isl_multi_val *stride;\n  isl_space *space2;\n  isl_local_space *ls;\n  isl_multi_aff *tiling;\n\n  ctx = isl_space_get_ctx(space);\n  space2 = isl_space_domain(isl_space_copy(space));\n  ls = isl_local_space_from_space(space2);\n  space2 = isl_space_range(isl_space_copy(space));\n  stride = isl_multi_val_zero(space2);\n  shift = isl_multi_aff_zero(isl_space_copy(space));\n\n  for (i = 0; i < tile->n; ++i)\n  {\n    struct autosa_array_bound *bound = &tile->bound[i];\n    isl_val *stride_i;\n    isl_aff *shift_i;\n\n    stride_i = isl_val_copy(bound->stride);\n    shift_i = isl_aff_copy(bound->shift);\n\n    stride = isl_multi_val_set_val(stride, i, stride_i);\n    shift = isl_multi_aff_set_aff(shift, i, shift_i);\n  }\n  isl_local_space_free(ls);\n\n  shift = isl_multi_aff_pullback_multi_aff(shift,\n                                           isl_multi_aff_copy(insert_array));\n\n  tiling = isl_multi_aff_range_map(isl_space_copy(space));\n  tiling = isl_multi_aff_add(tiling, shift);\n  tiling = isl_multi_aff_scale_down_multi_val(tiling, stride);\n\n  return tiling;\n}\n\n/* Print the name of the local copy of a given group of array references.\n */\n__isl_give isl_printer *autosa_array_ref_group_print_name(\n    struct autosa_array_ref_group *group, __isl_take isl_printer *p)\n{\n  int global = 0;\n  enum autosa_group_access_type type;\n\n  type = autosa_array_ref_group_type(group);\n  if (type == AUTOSA_ACCESS_LOCAL)\n    p = isl_printer_print_str(p, \"local_\");\n  else\n    global = 1;\n\n  p = isl_printer_print_str(p, group->array->name);\n  if 
(!global)\n  {\n    if (group->group_type == AUTOSA_IO_GROUP && group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n    else if (group->group_type == AUTOSA_PE_GROUP && group->local_array->n_pe_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  }\n\n  return p;\n}\n\n/* Compute a tiling for the array reference group \"group\".\n *\n * The tiling is of the form\n *\n *\t{ [D[i] -> A[a]] -> T[t] }\n *\n * where D represents the first tile->depth schedule dimensions,\n * A represents the global array and T represents the local memory \n * tile.  The name of T is the name of the local array.\n *\n * If there is any stride in the accesses, then the mapping is\n *\n *\tt = (a + shift(i))/stride - lb(i)\n *\n * otherwise, it is simply\n *\n *\tt = a - lb(i)\n *\n * Compute the tiling based on the \"tile\". If \"tile\" is NULL, \n * compute the tiling based on the tile from the \"group\".\n */\nvoid autosa_array_ref_group_compute_tiling(\n    struct autosa_array_tile *tile,\n    struct autosa_array_ref_group *group)\n{\n  int i;\n  isl_space *space;\n  isl_multi_aff *tiling, *lb, *insert_array;\n  isl_printer *p;\n  char *local_name;\n\n  if (tile == NULL && autosa_array_ref_group_tile(group) == NULL)\n    return;\n\n  if (tile == NULL)\n    tile = autosa_array_ref_group_tile(group);\n\n  space = isl_map_get_space(group->access);\n  space = isl_space_from_range(isl_space_range(space));\n  /* Build D[i] -> A[a] */\n  space = isl_space_add_dims(space, isl_dim_in, tile->depth);\n  /* Build [D[i] -> A[a]] -> D[i] */\n  insert_array = isl_multi_aff_domain_map(isl_space_copy(space));\n\n  for (i = 0; i < tile->n; ++i)\n    if (tile->bound[i].shift)\n      break;\n\n  if (i < tile->n)\n    tiling = strided_tile(tile, space, insert_array);\n  else\n    tiling = isl_multi_aff_range_map(isl_space_copy(space));\n\n  lb = 
isl_multi_aff_zero(space);\n  for (i = 0; i < tile->n; ++i)\n  {\n    isl_aff *lb_i = isl_aff_copy(tile->bound[i].lb);\n    lb = isl_multi_aff_set_aff(lb, i, lb_i);\n  }\n  lb = isl_multi_aff_pullback_multi_aff(lb, insert_array);\n\n  tiling = isl_multi_aff_sub(tiling, lb);\n\n  p = isl_printer_to_str(isl_multi_aff_get_ctx(tiling));\n  p = autosa_array_ref_group_print_name(group, p);\n  local_name = isl_printer_get_str(p);\n  isl_printer_free(p);\n  tiling = isl_multi_aff_set_tuple_name(tiling, isl_dim_out, local_name);\n  free(local_name);\n\n  tile->tiling = tiling;\n}\n\n/* Compute the tiling bounds for the drain group at the PE level. \n */\nstatic isl_stat compute_group_bounds_drain_at_node_PE(\n    struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n    __isl_keep isl_schedule_node *node)\n{\n  isl_ctx *ctx = isl_space_get_ctx(group->array->space);\n  int use_local = kernel->options->autosa->use_local_memory;\n  isl_stat r = isl_stat_ok;\n  isl_union_map *access;\n  isl_map *acc;\n  isl_bool ok;\n\n  if (!use_local)\n    return isl_stat_ok;\n  if (autosa_array_is_read_only_scalar(group->array))\n    return isl_stat_ok;\n  if (!group->exact_write)\n    return isl_stat_ok;\n  if (group->slice)\n    return isl_stat_ok;\n\n  /* Collect all accesses in the group. */\n  access = autosa_array_ref_group_access_relation(group, 0, 1);\n  /* Create local tile. */\n  if (use_local)\n  {\n    /* Create a tile. */\n    group->pe_tile = autosa_array_tile_create(ctx, group->array->n_index);\n    /* Map the domain to the outer scheduling dimensions. */\n    acc = local_access_io_at_node(kernel, group, access, node);\n    /* Collect the shift and scale factors of the tile. 
*/\n    ok = can_tile(acc, group->pe_tile);\n    if (ok < 0)\n      r = isl_stat_error;\n    else if (!ok)\n      group->pe_tile = autosa_array_tile_free(group->pe_tile);\n    isl_map_free(acc);\n  }\n\n  if (r < 0)\n  {\n    isl_union_map_free(access);\n    return r;\n  }\n\n  isl_union_map_free(access);\n  return isl_stat_ok;\n}\n\n/* Compute the drain group tiling at the PE level. */\nstatic isl_stat compute_drain_tiling_at_PE(struct autosa_kernel *kernel,\n                                           struct autosa_array_ref_group *group)\n{\n  isl_schedule_node *node;\n  struct autosa_array_tile *tile;\n\n  node = isl_schedule_get_root(kernel->schedule);\n  node = autosa_tree_move_down_to_pe(node, kernel->core);\n  compute_group_bounds_drain_at_node_PE(kernel, group, node);\n  autosa_array_ref_group_compute_tiling(group->pe_tile, group);\n  isl_schedule_node_free(node);\n\n  return isl_stat_ok;\n}\n\n/* Compute the local memory tiles for the io group \"group\"\n * of array \"array\". Return isl_stat_ok on success and isl_stat_error on error.\n *\n * If the array is a read-only scalar or if the user requested not to use local\n * memory, then we do not need to do anything.\n */\nisl_stat compute_group_bounds_io_at_node(struct autosa_kernel *kernel,\n                                         struct autosa_array_ref_group *group, __isl_keep isl_schedule_node *node,\n                                         struct autosa_io_buffer *buffer)\n{\n  isl_ctx *ctx = isl_space_get_ctx(group->array->space);\n  int use_local = kernel->options->autosa->use_local_memory;\n  isl_stat r = isl_stat_ok;\n  isl_union_map *access;\n  isl_map *acc;\n  isl_bool ok;\n\n  if (!use_local)\n    return isl_stat_ok;\n  if (autosa_array_is_read_only_scalar(group->array))\n    return isl_stat_ok;\n  if (!group->exact_write)\n    return isl_stat_ok;\n  if (group->slice)\n    return isl_stat_ok;\n\n  /* Collect all accesses in the group. 
*/\n  access = autosa_array_ref_group_access_relation(group, 1, 1);\n  /* Create local tile. */\n  if (use_local)\n  {\n    /* Create a tile. */\n    buffer->tile = autosa_array_tile_create(ctx, group->array->n_index);\n    /* Map the domain to the outer scheduling dimensions. */\n    acc = local_access_io_at_node(kernel, group, access, node);\n    /* Collect the shift and scale factors of the tile. */\n    ok = can_tile(acc, buffer->tile);\n    if (ok < 0)\n      r = isl_stat_error;\n    else if (!ok)\n      buffer->tile = autosa_array_tile_free(buffer->tile);\n    isl_map_free(acc);\n  }\n\n  if (r < 0)\n  {\n    isl_union_map_free(access);\n    return r;\n  }\n\n  isl_union_map_free(access);\n  return isl_stat_ok;\n}\n\n/* Compute the tiling group bounds for the io group at the PE level. */\nisl_stat compute_group_bounds_io_at_node_PE(\n    struct autosa_kernel *kernel,\n    struct autosa_array_ref_group *group, __isl_keep isl_schedule_node *node)\n{\n  isl_ctx *ctx = isl_space_get_ctx(group->array->space);\n  int use_local = kernel->options->autosa->use_local_memory;\n  isl_stat r = isl_stat_ok;\n  isl_union_map *access;\n  isl_map *acc;\n  isl_bool ok;\n\n  if (!use_local)\n    return isl_stat_ok;\n  if (autosa_array_is_read_only_scalar(group->array))\n    return isl_stat_ok;\n  if (!group->exact_write)\n    return isl_stat_ok;\n  if (group->slice)\n    return isl_stat_ok;\n\n  /* Collect all accesses in the group. */\n  access = autosa_array_ref_group_access_relation(group, 1, 1);\n  /* Create local tile. */\n  if (use_local)\n  {\n    /* Create a tile. */\n    group->pe_tile = autosa_array_tile_create(ctx, group->array->n_index);\n    /* Map the domain to the outer scheduling dimensions. */\n    acc = local_access_io_at_node(kernel, group, access, node);\n    /* Collect the shift and scale factors of the tile. 
*/\n    ok = can_tile(acc, group->pe_tile);\n    if (ok < 0)\n      r = isl_stat_error;\n    else if (!ok)\n      group->pe_tile = autosa_array_tile_free(group->pe_tile);\n    isl_map_free(acc);\n  }\n\n  if (r < 0)\n  {\n    isl_union_map_free(access);\n    return r;\n  }\n\n  isl_union_map_free(access);\n  return isl_stat_ok;\n}\n\n/* Create the tiling for the IO group at the PE level. */\nstatic isl_stat compute_io_tiling_at_PE(struct autosa_kernel *kernel,\n                                        struct autosa_array_ref_group *group)\n{\n  isl_schedule_node *node;\n  struct autosa_array_tile *tile;\n\n  node = isl_schedule_get_root(kernel->schedule);\n  node = autosa_tree_move_down_to_pe(node, kernel->core);\n  compute_group_bounds_io_at_node_PE(kernel, group, node);\n  autosa_array_ref_group_compute_tiling(group->pe_tile, group);\n  isl_schedule_node_free(node);\n\n  return isl_stat_ok;\n}\n\n/* Insert the IO module filter ids into the schedule.\n * \"node\" points to the IO_L[io_level] mark.\n * Return the new node, which points to the same position.\n */\nstatic __isl_give isl_schedule_node *insert_io_module_ids(\n    struct autosa_gen *gen, struct autosa_kernel *kernel,\n    __isl_take isl_schedule_node *node, int space_dim, int io_level)\n{\n  int n_io_ids;\n  isl_id_list *io_ids;\n  isl_set *context;\n  isl_union_set *filter = NULL;\n\n  n_io_ids = space_dim - io_level + 1;\n  if (n_io_ids <= 0)\n    return node;\n  io_ids = ppcg_scop_generate_names(gen->prog->scop, n_io_ids, \"p\");\n  n_io_ids = 0;\n\n  /* Add the filters. 
*/\n  n_io_ids = 0;\n  node = autosa_tree_move_up_to_array(node);\n  while (!isl_schedule_node_is_io_mark(node, io_level))\n  {\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n      isl_id *id;\n      isl_id_list *ids;\n      isl_union_set *uset;\n\n      ids = isl_id_list_from_id(isl_id_list_get_id(io_ids, n_io_ids));\n      uset = set_schedule_eq(node, ids);\n      n_io_ids++;      \n      if (filter == NULL)\n        filter = uset;\n      else\n        filter = isl_union_set_union(filter, uset);      \n      //node = isl_schedule_node_insert_filter(node, uset);\n      //node = isl_schedule_node_child(node, 0);      \n      isl_id_list_free(ids);      \n    }\n    node = isl_schedule_node_child(node, 0);\n  }\n\n  isl_id_list_free(io_ids);\n  /* Insert the filter. */\n  node = autosa_tree_move_up_to_kernel(node);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_filter(node, filter);\n  node = autosa_tree_move_down_to_io_mark(node, kernel->core, io_level);\n\n  return node;\n}\n\n/* Allocate I/O buffers at each I/O level.\n * If two-level buffer is disabled, we will only allocate buffer \n * at the innermost level for each group:\n * - drain group @ io_L1\n * - io group @ io_L1 (INT_IO) | io_L2 (EXT_IO)\n * If two-level buffer is turned on, we will also allocate buffers\n * at the outermost level for each group.\n */\nstatic isl_stat compute_io_group_buffer(struct autosa_kernel *kernel,\n                                        struct autosa_array_ref_group *group, struct autosa_gen *gen)\n{\n  isl_schedule_node *node;\n  int io_level = group->io_level;\n  int i;\n  int two_level_buffer = gen->options->autosa->two_level_buffer;\n\n  node = isl_schedule_get_root(group->io_schedule);\n\n  /* Compute the group tiling at each I/O level. 
*/\n  node = autosa_tree_move_down_to_pe(node, kernel->core);\n  i = 1;\n  assert(group->io_buffers == NULL);\n  assert(group->n_io_buffer == 0);\n  group->io_buffers = NULL;\n  group->n_io_buffer = 0;\n  while (i <= io_level)\n  {\n    isl_schedule_node *node_cp = NULL;\n    node = isl_schedule_node_parent(node);\n    if (isl_schedule_node_is_io_mark(node, i))\n    {\n      /* In the automatic mode, AutoSA only computes the tiling at L1\n       * for drain group and I/O group with interior I/O, and at L2 for I/O \n       * group with exterior I/O.\n       */\n      (group->n_io_buffer)++;\n      group->io_buffers = (struct autosa_io_buffer **)realloc(\n          group->io_buffers, sizeof(struct autosa_io_buffer *) * group->n_io_buffer);\n      group->io_buffers[group->n_io_buffer - 1] = autosa_io_buffer_alloc();          \n      group->io_buffers[group->n_io_buffer - 1]->level = i;\n      group->io_buffers[group->n_io_buffer - 1]->tile = NULL;\n\n      node_cp = isl_schedule_node_copy(node);      \n      if (group->group_type == AUTOSA_DRAIN_GROUP)\n      {\n        if (i == 1)\n        {\n          /* Compute the group tiling at this level */\n          compute_group_bounds_drain_at_node(kernel, group, node_cp,\n                                             group->io_buffers[group->n_io_buffer - 1]);\n          autosa_array_ref_group_compute_tiling(\n              group->io_buffers[group->n_io_buffer - 1]->tile, group);\n          compute_drain_tiling_at_PE(kernel, group);\n          if (gen->options->autosa->tuning_method == 1) {                        \n            group->io_buffers[group->n_io_buffer - 1]->tuning_tile = TP_infer_tiled_array(gen, kernel, node, group, 0, 1);\n            isl_schedule_node *new_node = isl_schedule_get_root(kernel->schedule);\n            new_node = autosa_tree_move_down_to_pe(new_node, kernel->core);            \n            group->tuning_pe_tile = TP_infer_tiled_array(gen, kernel, node, group, 0, 1); \n            
isl_schedule_node_free(new_node);\n          }\n        }\n        else\n        {\n          group->io_buffers[group->n_io_buffer - 1]->tile = NULL;\n        }\n      }\n      else if (group->group_type == AUTOSA_IO_GROUP)\n      {\n        if ((group->io_type == AUTOSA_EXT_IO && i == 2) ||\n            (group->io_type == AUTOSA_INT_IO && i == 1))\n        {\n          /* Compute the group tiling at this level. */\n          compute_group_bounds_io_at_node(kernel, group, node_cp,\n                                          group->io_buffers[group->n_io_buffer - 1]);\n          autosa_array_ref_group_compute_tiling(\n              group->io_buffers[group->n_io_buffer - 1]->tile, group);\n          if (group->io_type == AUTOSA_INT_IO && i == 1)\n          {\n            compute_io_tiling_at_PE(kernel, group);\n          }\n          if (gen->options->autosa->tuning_method == 1) {\n            group->io_buffers[group->n_io_buffer - 1]->tuning_tile = TP_infer_tiled_array(gen, kernel, node, group, 1, 1);\n            if (group->io_type == AUTOSA_INT_IO && i == 1) {\n              isl_schedule_node *new_node = isl_schedule_get_root(kernel->schedule);\n              new_node = autosa_tree_move_down_to_pe(new_node, kernel->core);              \n              group->tuning_pe_tile = TP_infer_tiled_array(gen, kernel, node, group, 1, 1); \n              isl_schedule_node_free(new_node);\n            }\n          }          \n        }\n        else\n        {\n          group->io_buffers[group->n_io_buffer - 1]->tile = NULL;\n        }\n      }\n      else\n      {\n        group->io_buffers[group->n_io_buffer - 1]->tile = NULL;\n      }\n      if (two_level_buffer)\n      {\n        if (i == io_level)\n        {          \n          /* Compute the group tiling at the outermost I/O module. 
*/\n          if (group->group_type == AUTOSA_DRAIN_GROUP)\n            compute_group_bounds_drain_at_node(kernel, group, node_cp, group->io_buffers[group->n_io_buffer - 1]);\n          else if (group->group_type == AUTOSA_IO_GROUP)\n            compute_group_bounds_io_at_node(kernel, group, node_cp, group->io_buffers[group->n_io_buffer - 1]);\n\n          autosa_array_ref_group_compute_tiling(group->io_buffers[group->n_io_buffer - 1]->tile, group);\n        }\n      }      \n      isl_schedule_node_free(node_cp);\n      i++;\n    }\n  }\n\n  isl_schedule_node_free(node);\n\n  return isl_stat_ok;\n}\n\n/* Adjust the fields of \"tile\" to reflect the new input dimension \"depth\".\n * The dimensions beyond \"depth\" are assumed not to affect the tile,\n * so they can simply be dropped.\n */\nstatic int tile_adjust_depth(struct autosa_array_tile *tile, int depth)\n{\n  int i;\n\n  if (tile->depth == depth)\n    return 0;\n\n  for (i = 0; i < tile->n; ++i)\n  {\n    tile->bound[i].lb = isl_aff_drop_dims(tile->bound[i].lb,\n                                          isl_dim_in, depth, tile->depth - depth);\n    if (!tile->bound[i].lb)\n      return -1;\n    if (!tile->bound[i].shift)\n      continue;\n    tile->bound[i].shift = isl_aff_drop_dims(tile->bound[i].shift,\n                                             isl_dim_in, depth, tile->depth - depth);\n    if (!tile->bound[i].shift)\n      return -1;\n  }\n\n  tile->depth = depth;\n\n  return 0;\n}\n\n/* Compute the number of outer schedule tile dimensions that affect\n * the offset of \"tile\".\n * If there is no such dimension, then return the index\n * of the first kernel dimension, i.e., data->kernel_depth.\n */\nstatic int compute_tile_depth(struct autosa_group_data *data,\n                              struct autosa_array_tile *tile)\n{\n  int i, j;\n\n  for (j = tile->depth - 1; j >= data->kernel_depth; --j)\n  {\n    for (i = 0; i < tile->n; ++i)\n    {\n      isl_aff *lb;\n      isl_aff *shift;\n\n      lb = 
tile->bound[i].lb;\n      if (isl_aff_involves_dims(lb, isl_dim_in, j, 1))\n        break;\n\n      shift = tile->bound[i].shift;\n      if (!shift)\n        continue;\n      if (isl_aff_involves_dims(shift, isl_dim_in, j, 1))\n        break;\n    }\n    if (i < tile->n)\n      break;\n  }\n\n  return ++j;\n}\n\n/* Determine the number of schedule dimensions that affect the offset of the\n * local tile \"tile\" and store the result in tile->depth, with\n * a lower bound of data->kernel_depth.\n * Also adjust the fields of the tile to only refer to the tile->depth\n * outer schedule dimensions.\n */\nstatic isl_stat tile_set_depth(struct autosa_group_data *data,\n                               struct autosa_array_tile *tile)\n{\n  if (tile_adjust_depth(tile, compute_tile_depth(data, tile)) < 0)\n    return isl_stat_error;\n\n  return isl_stat_ok;\n}\n\n/* Internal struct used for update_group_simd. */\nstruct update_group_simd_data\n{\n  struct autosa_array_ref_group *group;\n  struct autosa_kernel *kernel;\n  int updated;\n};\n\n/* Examine whether any array reference in \"group\" is under the SIMD loop.\n * If so, examine whether the array reference has a stride of 1 under the SIMD loop.\n * If so, update the SIMD lane of \"group\".\n */\nstatic isl_bool update_group_simd(__isl_keep isl_schedule_node *node, void *user)\n{\n  struct update_group_simd_data *data = (struct update_group_simd_data *)user;\n\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_mark)\n  {\n    isl_id *id;\n    isl_union_set *domain;\n    struct autosa_array_ref_group *group = data->group;\n\n    id = isl_schedule_node_mark_get_id(node);\n    if (strcmp(isl_id_get_name(id), \"simd\"))\n    {\n      isl_id_free(id);\n      return isl_bool_true;\n    }\n\n    isl_id_free(id);\n    /* \"node\" is only borrowed (__isl_keep) from the traversal, so move down on a copy. 
*/\n    isl_schedule_node *child = isl_schedule_node_child(isl_schedule_node_copy(node), 0);\n    domain = isl_schedule_node_get_domain(child);\n    isl_schedule_node_free(child);\n    for (int i = 0; i < group->n_ref; i++)\n    {\n      struct autosa_stmt_access *ref = group->refs[i];\n   
   for (int j = 0; j < ref->n_io_info; j++)\n      {\n        struct autosa_io_info *info = ref->io_info[j];\n        if (info->io_type == group->io_type && !isl_vec_cmp(info->dir, group->dir))\n        {\n          /* Test if either the source or dest of the dependence associated with\n           * the array reference is intersected with the current loop domain. */\n          struct autosa_dep *dep = info->dep;\n          isl_basic_map *bmap;\n          isl_map *map;\n          isl_set *src, *dest;\n          isl_union_set *uset;\n          bmap = isl_basic_map_copy(dep->isl_dep);\n          map = isl_map_from_basic_map(bmap);\n          map = isl_map_factor_domain(map);\n          src = isl_map_domain(isl_map_copy(map));\n          dest = isl_map_range(map);\n          uset = isl_union_set_union(isl_union_set_from_set(src),\n                                     isl_union_set_from_set(dest));\n          uset = isl_union_set_intersect(uset, isl_union_set_copy(domain));\n          if (!isl_union_set_is_empty(uset))\n          {\n            if (ref->simd_stride == 1) {\n              group->n_lane = data->kernel->simd_w;\n              data->updated = 1;\n            }\n          }\n          isl_union_set_free(uset);\n        }\n      }\n    }\n    isl_union_set_free(domain);\n  }\n\n  return isl_bool_true;\n}\n\n/* Select the data pack factor for I/O buffers. For this function, the array\n * that the I/O group is associated with is a sparse matrix.\n * The unit of the data packing factor is non_zero_num elements plus one offset.\n */\nstatic isl_stat compute_io_group_data_pack_sparse(\n  struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n  struct autosa_gen *gen, int max_n_lane)\n{\n  isl_schedule_node *node;\n  isl_union_map *sizes;\n  int *data_pack_ubs = NULL;\n  struct update_group_simd_data data;\n  int ele_size = group->array->size; // bytes\n  /* Given the maximal DRAM port width as 64 Bytes, \n   * compute the maximal data pack factor. 
*/\n  //if (max_n_lane == -1)\n  //  max_n_lane = 64 / ele_size;\n\n  group->n_lane = 1;\n  node = isl_schedule_get_root(kernel->schedule);\n  data.group = group;\n  data.kernel = kernel;\n  data.updated = 0;\n  isl_schedule_node_foreach_descendant_top_down(node, &update_group_simd, &data);\n  isl_schedule_node_free(node);\n\n  /* Update the group n_lane considering the sparse information */\n  if (group->n_lane % kernel->vec_len != 0) {\n    printf(\"[AutoSA] Error: The sparse block size is not a sub-multiple of the SIMD factor. Abort!\\n\");\n    exit(1);\n  }\n  group->n_lane /= kernel->vec_len;\n  \n  /* If data packing is disabled, simply update the data packing factor of \n   * each I/O buffer to the SIMD lanes that are required. \n   */\n  if (!gen->options->autosa->data_pack) {\n    for (int i = 0; i < group->io_level; i++) {\n      struct autosa_io_buffer *buf = group->io_buffers[i];\n      buf->n_lane = group->n_lane;\n      /* Update the sparse information */\n      buf->sparse = 1;\n      buf->vec_len = kernel->vec_len;\n    }\n    return isl_stat_ok;\n  }\n\n  int cur_n_lane = group->n_lane;\n  int status = false;\n  /* Parse the data pack settings. */\n  sizes = extract_sizes_from_str(gen->ctx, gen->options->autosa->data_pack_sizes);\n  //data_pack_ubs = read_data_pack_sizes(sizes, 3);\n  data_pack_ubs = read_data_pack_sizes_array(sizes, group->array->name);\n  if (!data_pack_ubs) {\n    /* Use the default numbers. 
*/\n    data_pack_ubs = isl_alloc_array(gen->ctx, int, 3);\n    data_pack_ubs[0] = 8;\n    data_pack_ubs[1] = 32;\n    data_pack_ubs[2] = 64;\n  }\n\n  int cur_max_n_lane;\n  for (int i = 0; i < group->io_level; i++) {\n    struct autosa_io_buffer *buf = group->io_buffers[i];\n    if (i == 0)\n      cur_max_n_lane = std::max(group->n_lane, data_pack_ubs[0] / (kernel->n_nzero * ele_size + 1));\n    else if (i > 0 && i < group->io_level - 1)\n      cur_max_n_lane = std::max(group->n_lane, data_pack_ubs[1] / (kernel->n_nzero * ele_size + 1));\n    else\n      cur_max_n_lane = std::max(group->n_lane, data_pack_ubs[2] / ((kernel->n_nzero + kernel->n_meta_data) * ele_size));\n    if (buf->tile) {      \n      int n_lane = cur_n_lane;\n      isl_val *size = isl_val_copy(buf->tile->bound[group->array->n_index - 1].size);\n      if (i == group->io_level - 1 && group->local_array->host_serialize) {\n        for (int n = 0; n < group->array->n_index - 1; n++) {\n          size = isl_val_mul(size, isl_val_copy(buf->tile->bound[n].size));\n        }        \n      }      \n      size = isl_val_div(size, isl_val_int_from_si(gen->ctx, kernel->vec_len));\n\n      while (n_lane <= cur_max_n_lane) {\n        /* The lane should be multiples of SIMD lane. */\n        if (n_lane % group->n_lane == 0) {\n          isl_val *val = isl_val_int_from_si(gen->ctx, n_lane);\n          /* The lane should be sub-multiples of the last dim of the array. */\n          if (isl_val_is_divisible_by(size, val)) {\n            cur_n_lane = n_lane;\n            status = true;\n          }\n          isl_val_free(val);\n        }\n        //n_lane *= 2;\n        n_lane += 1;\n      }\n      if (status) {\n        buf->n_lane = cur_n_lane;        \n      } else {\n        printf(\"[AutoSA] Error: Cannot find data pack factors as sub-multiples of the last dim of the local array. 
Abort!\\n\");\n        printf(\"[AutoSA] Please try to use different tiling factors.\\n\");\n        exit(1);\n      }\n      isl_val_free(size);\n    } else {\n      buf->n_lane = cur_n_lane;\n    }    \n    /* Update the sparse information */\n    buf->sparse = 1;\n    buf->vec_len = kernel->vec_len;\n  }\n  isl_union_map_free(sizes);\n  free(data_pack_ubs);\n\n  return isl_stat_ok;\n}\n\n/* Select the data pack factor for I/O buffers. The data pack factor\n * should be sub-multiples of the last dimension of the local array.\n * Meanwhile, it should also be sub-multiples of the data pack factors \n * selected for the upper-level I/O buffers.\n * \n * If SIMD vectorization is enabled, and the data stored in the I/O buffer is \n * to be vectorized, the data pack factor should also be multiples of the SIMD factor.\n */\nstatic isl_stat compute_io_group_data_pack(struct autosa_kernel *kernel,\n                                           struct autosa_array_ref_group *group,\n                                           struct autosa_gen *gen,\n                                           int max_n_lane)\n{\n  isl_schedule_node *node;\n  isl_union_map *sizes;\n  isl_val *size;\n  int *data_pack_ubs = NULL;\n  struct update_group_simd_data data;\n  int ele_size = group->array->size; // bytes\n  /* Given the maximal DRAM port width as 64 Bytes, \n   * compute the maximal data pack factor. */\n  if (max_n_lane == -1)\n    max_n_lane = 64 / ele_size;\n  /* Parse the data pack settings. */\n  /* For L1 buffers, we restrain the fifo widths to be no more than 256 bits \n   * given hardware consideration (on Xilinx). 
\n   * Specifically, for FIFOs with depth * width > 512 bits, HLS will \n   * use BRAM/SRL to implement the FIFOs, which could significantly increase \n   * the BRAM/LUT usage and cause routing failure.\n   * \n   * Furthermore, for L1 buffers that reside at the io_L1 level (beside PEs), we \n   * further restrain the FIFO widths to be no more than 64 bits to mitigate \n   * potential routing congestion.\n   */  \n  sizes = extract_sizes_from_str(gen->ctx, gen->options->autosa->data_pack_sizes);  \n  data_pack_ubs = read_data_pack_sizes_array(sizes, group->array->name);  \n  if (!data_pack_ubs)\n  {\n    /* Use the default numbers. */\n    data_pack_ubs = isl_alloc_array(gen->ctx, int, 3);\n    data_pack_ubs[0] = 16;\n    //data_pack_ubs[1] = 32;\n    data_pack_ubs[1] = 64;\n    data_pack_ubs[2] = 64;\n  }\n  //std::cout << data_pack_ubs[0] << std::endl;\n  //std::cout << data_pack_ubs[1] << std::endl;\n  //std::cout << data_pack_ubs[2] << std::endl;\n\n  /* Examine whether any of the array references in the group is used by the SIMD loop.\n   * The default SIMD lane for the group is 1. \n   * If any of the array references in the group is under the SIMD loop, and \n   * if the stride of reference under the loop is one. 
The SIMD lane of the \n   * group is then updated to the SIMD lane of the loop.\n   */\n  group->n_lane = 1;\n  node = isl_schedule_get_root(kernel->schedule);\n  data.group = group;\n  data.kernel = kernel;\n  data.updated = 0;\n  isl_schedule_node_foreach_descendant_top_down(node, &update_group_simd, &data);\n  isl_schedule_node_free(node);\n\n  if (gen->options->autosa->tuning_method == 1) {    \n    /* Update the data packing factor */\n    for (int i = 0; i < group->io_level; i++) {\n      struct autosa_io_buffer *buf = group->io_buffers[i];\n      if (buf->tuning_tile && buf->tuning_tile->data_pack_factor_inter == NULL) {        \n        /* Inter */\n        class TPParameter *dp = new TPParameter(\"p\" + std::to_string(kernel->tuning_program->params.size()));\n        dp->tune = false;\n        dp->attr = \"data_pack_factor\";\n        dp->tags.insert(\"auto_infer\");\n        dp->tags.insert(\"power_of_two\");\n        /* Update the bounds */\n        /* lb */\n        if (data.updated == 0) {          \n          dp->bounds.push_back(std::make_shared<TPExpr>(\"literal\", new TPConst(1)));\n        } else {\n          /* Find the SIMD tiling factor */\n          for (auto param : kernel->tuning_program->params) {\n            if (param->attr == \"SIMD_tiling_factor\") {              \n              dp->bounds.push_back(std::make_shared<TPExpr>(\"literal\", param->dup()));\n              dp->multiples.push_back(std::make_shared<TPExpr>(\"literal\", param->dup()));\n            }\n          }\n        }\n        /* ub */\n        int user_max_n_lane;\n        if (i == 0)\n          user_max_n_lane = data_pack_ubs[0] / ele_size;\n        else if (i > 0 && i < group->io_level - 1)\n          user_max_n_lane = data_pack_ubs[1] / ele_size;\n        else\n          user_max_n_lane = data_pack_ubs[2] / ele_size;\n        TPExpr *ub = buf->tuning_tile->sizes[buf->tuning_tile->sizes.size() - 1]->dup();\n        ub = ub->min(new TPExpr(\"literal\", new 
TPConst(user_max_n_lane)));\n        ub = ub->max(dp->bounds[0]->dup());\n        dp->bounds.push_back(std::shared_ptr<TPExpr>(ub));        \n        dp->divisors.push_back(std::shared_ptr<TPExpr>(buf->tuning_tile->sizes[buf->tuning_tile->sizes.size() - 1]->dup()));\n        assert(dp->bounds.size() == 2);    \n        buf->tuning_tile->data_pack_factor_inter = dp;\n        kernel->tuning_program->params.push_back(dp);\n        kernel->tuning_program->param_map[dp->name] = dp;\n\n        /* Intra */\n        if (data.updated == 0) {\n          buf->tuning_tile->data_pack_factor_intra = std::make_shared<TPExpr>(\"literal\", new TPConst(1));          \n        } else {\n          /* Find the SIMD tiling factor */\n          for (auto param : kernel->tuning_program->params) {\n            if (param->attr == \"SIMD_tiling_factor\") {              \n              buf->tuning_tile->data_pack_factor_intra = std::make_shared<TPExpr>(\"literal\", param->dup());              \n            }\n          }\n        }\n\n        break;\n      }\n    }            \n  }\n\n  if (max_n_lane % group->n_lane != 0)\n  {\n    printf(\"[AutoSA] Error: The data is not aligned to the DRAM port. 
Abort!\\n\");\n    printf(\"[AutoSA] Please try to use a SIMD factor as sub-multiples of %d.\\n\", max_n_lane);\n    exit(1);\n  }\n\n  /* If data packing is disabled, simply update the data packing factor of \n   * each I/O buffer to the SIMD lanes that are required.\n   */\n  if (!gen->options->autosa->data_pack)\n  {\n    for (int i = 0; i < group->io_level; i++)\n    {\n      struct autosa_io_buffer *buf = group->io_buffers[i];\n      buf->n_lane = group->n_lane;\n    }\n    return isl_stat_ok;\n  }\n\n  int cur_n_lane = group->n_lane;\n  int status = false;\n  int cur_max_n_lane;\n  for (int i = 0; i < group->io_level; i++)\n  {\n    struct autosa_io_buffer *buf = group->io_buffers[i];\n    if (i == 0)\n      cur_max_n_lane = std::max(group->n_lane, data_pack_ubs[0] / ele_size);\n    else if (i > 0 && i < group->io_level - 1)\n      cur_max_n_lane = std::max(group->n_lane, data_pack_ubs[1] / ele_size);\n    else\n      cur_max_n_lane = std::max(group->n_lane, data_pack_ubs[2] / ele_size);\n    if (buf->tile && group->array->n_index > 0)\n    {      \n      size = isl_val_copy(buf->tile->bound[group->array->n_index - 1].size);\ncompute_data_pack:      \n      int n_lane = cur_n_lane;\n      while (n_lane <= cur_max_n_lane)\n      {\n        /* The lane should be multiples of SIMD lane. */\n        if (n_lane % group->n_lane == 0)\n        {\n          isl_val *val = isl_val_int_from_si(gen->ctx, n_lane);\n          /* The lane should be sub-multiples of the last dim of the array. */\n          if (isl_val_is_divisible_by(size, val))\n          {\n            cur_n_lane = n_lane;\n            status = true;\n          }\n          isl_val_free(val);\n        }\n        n_lane = n_lane * 2;\n      }\n      if (status)\n      {\n        buf->n_lane = cur_n_lane;\n      }\n      else\n      {\n        printf(\"[AutoSA] Error: Cannot find data pack factors as sub-multiples of the last dim of the local array. 
Abort!\\n\");\n        printf(\"[AutoSA] Please try to use different tiling factors.\\n\");\n        exit(1);\n      }\n      isl_val_free(size);      \n    } else if (i == group->io_level - 1 && !gen->options->autosa->host_serialize) {\n      /* If it is the outermost loop, try to extend the data packing factor again. \n       * If the host serialization is enabled, as there is a re-packing later.\n       * We won't do anything here. \n       */\n      /* Locate the next buffer. */            \n      struct autosa_io_buffer *nxt_buf;\n      for (int j = i; j >= 0; j--) {\n        nxt_buf = group->io_buffers[j];\n        if (nxt_buf->tile) \n          break;                  \n      }\n      if (nxt_buf->tile) {        \n        size = isl_val_copy(nxt_buf->tile->bound[group->array->n_index - 1].size);\n        goto compute_data_pack;\n      }        \n    } else\n    {\n      buf->n_lane = cur_n_lane;\n    }\n  }\n  isl_union_map_free(sizes);\n  free(data_pack_ubs);\n\n  return isl_stat_ok;\n}\n\n/* Lift up the L1 I/O buffer between the paralle loops and non-parallel loops\n * in the array loop band.\n * If there is no array loop band. Lift up the L1 I/O buffer above the array mark.\n */\nstatic isl_stat hoist_L1_io_buffer_local_reduce(\n  struct autosa_kernel *kernel,\n  struct autosa_array_ref_group *group,\n  struct autosa_gen *gen,\n  struct autosa_group_data *data)\n{\n  struct autosa_io_buffer *cur_buffer;\n  isl_schedule_node *node, *node_cp;\n  int n;\n\n  /* Find the L1 buffer. 
*/\n  for (int i = 1; i <= group->io_level; i++) \n  {\n    cur_buffer = group->io_buffers[i - 1];\n    if (cur_buffer->tile)\n      break;\n  }\n\n  autosa_array_tile_free(cur_buffer->tile);\n  node = isl_schedule_get_root(group->io_schedule);\n  node = autosa_tree_move_down_to_io_mark(node, kernel->core, cur_buffer->level);\n  node = insert_io_module_ids(gen, kernel, node, group->space_dim, cur_buffer->level);\n  node = autosa_tree_move_up_to_array(node);\n\n  if (kernel->array_part_w > 0) {\n    int pos = 0;\n    node = isl_schedule_node_parent(node);\n    n = isl_schedule_node_band_n_member(node);\n    for (pos = n - 1; pos >= 0; pos--)\n    {\n      if (isl_schedule_node_band_member_get_coincident(node, pos))\n        break;\n    }\n    if (pos == n - 1) {\n      node = isl_schedule_node_child(node, 0);\n    } else {\n      node = isl_schedule_node_band_split(node, pos + 1);\n      node = isl_schedule_node_child(node, 0);      \n    }\n  } \n  \n  if (group->group_type == AUTOSA_DRAIN_GROUP)\n    compute_group_bounds_drain_at_node(kernel, group, node, cur_buffer);\n  else if (group->group_type == AUTOSA_IO_GROUP)\n    compute_group_bounds_io_at_node(kernel, group, node, cur_buffer);\n  autosa_array_ref_group_compute_tiling(cur_buffer->tile, group);\n  \n  return isl_stat_ok;\n}\n\nstruct update_int_io_L1_buffer_data {\n  struct autosa_array_ref_group *group;  \n  struct autosa_kernel *kernel;\n  bool inserted;\n  bool tile_computed;\n  int depth;\n};\n\nstatic __isl_give isl_schedule_node *update_int_io_L1_depth(__isl_take isl_schedule_node *node, void *user)\n{\n  struct update_int_io_L1_buffer_data *data = (struct update_int_io_L1_buffer_data *)user;\n  int under_simd, n;\n  struct autosa_array_ref_group *group;\n  isl_schedule_node *insert_node = NULL;  \n  isl_union_set *domain;\n  int is_carried = 0;\n\n  if (data->inserted)\n    return node;\n  /* Examine if the node is under the SIMD mark */\n  under_simd = is_node_under_simd(node);\n  if (under_simd)\n 
   return node;\n  \n  if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    return node;\n\n  domain = isl_schedule_node_get_domain(node);\n  if (isl_union_set_is_empty(domain)) {\n    isl_union_set_free(domain);\n    return node;\n  }\n  isl_union_set_free(domain);\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n\n  n = isl_schedule_node_band_n_member(node);\n  /* Examine whether the dependences of the current I/O group are carried by the current band. */\n  group = data->group;\n  for (int i = 0; i < n; i++) {\n    isl_schedule_node *node_tmp = isl_schedule_node_copy(node);\n    if (n > 1) {\n      if (i > 0) {\n        node_tmp = isl_schedule_node_band_split(node_tmp, i);\n        node_tmp = isl_schedule_node_child(node_tmp, 0);\n      }\n      if (n - i - 1 > 0) {\n        node_tmp = isl_schedule_node_band_split(node_tmp, 1);\n      }\n    }\n\n    for (int j = 0; j < group->n_ref; j++) {\n      struct autosa_stmt_access *ref = group->refs[j];\n      for (int k = 0; k < ref->n_io_info; k++) {\n        struct autosa_io_info *io_info = ref->io_info[k];\n        if (io_info->io_type == group->io_type && \n            !isl_vec_cmp(io_info->dir, group->dir)) {\n          if (is_dep_carried_by_node(io_info->dep->isl_dep, node_tmp)) {\n            ///* Insert the I/O buffer below the current node */\n            //insert_node = isl_schedule_node_copy(node_tmp);\n            //insert_node = isl_schedule_node_child(insert_node, 0);\n            is_carried = 1;\n            break;\n          }\n        }\n      }\n      if (is_carried)\n        break;      \n    }\n\n    if (is_carried) {\n      insert_node = isl_schedule_node_copy(node_tmp);\n      //insert_node = isl_schedule_node_child(insert_node, 0);\n      isl_schedule_node_free(node_tmp);\n      break;\n    }\n\n    isl_schedule_node_free(node_tmp);\n  }\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, insert_node, 
isl_schedule_node_get_ctx(insert_node));\n//#endif\n\n  if (insert_node) {\n    data->depth = isl_schedule_node_get_schedule_depth(insert_node);\n    data->inserted = true;\n    isl_schedule_node_free(insert_node);\n  }\n  \n  return node;\n}\n\nstatic __isl_give isl_schedule_node *update_int_io_L1_buffer(\n  __isl_take isl_schedule_node *node, void *user)\n{\n  struct update_int_io_L1_buffer_data *data = (struct update_int_io_L1_buffer_data *)user;\n  int under_simd;\n  isl_union_set *domain;\n  struct autosa_array_ref_group *group;\n\n  ///* Examine if the node is under the SIMD mark */\n  //under_simd = is_node_under_simd(node);\n  //if (under_simd)\n  //  return node;\n\n  if (data->tile_computed)\n    return node;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    return node;\n  \n  domain = isl_schedule_node_get_domain(node);\n  if (isl_union_set_is_empty(domain)) {\n    isl_union_set_free(domain);\n    return node;\n  }\n  isl_union_set_free(domain);\n\n  if (isl_schedule_node_get_schedule_depth(node) < data->depth) {\n    /* Check the child node */\n    node = isl_schedule_node_child(node, 0);\n  }\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n\n  if (isl_schedule_node_get_schedule_depth(node) == data->depth) {\n    /* Find the L1 buffer */\n    struct autosa_io_buffer *cur_buffer;\n    group = data->group;\n    for (int i = 1; i < group->io_level; i++) {\n      cur_buffer = group->io_buffers[i - 1];\n      if (cur_buffer->tile)\n        break;\n    }\n\n    autosa_array_tile_free(cur_buffer->tile);\n    if (group->group_type == AUTOSA_DRAIN_GROUP)\n      compute_group_bounds_drain_at_node(data->kernel, group, node, cur_buffer);\n    else if (group->group_type == AUTOSA_IO_GROUP)\n      compute_group_bounds_io_at_node(data->kernel, group, node, cur_buffer);\n    autosa_array_ref_group_compute_tiling(cur_buffer->tile, group);\n\n    data->tile_computed = true;\n  }\n\n  return 
node;\n}\n\n//static __isl_give isl_schedule_node *update_int_io_L1_buffer(__isl_take isl_schedule_node *node, void *user)\n//{\n//  struct update_int_io_L1_buffer_data *data = (struct update_int_io_L1_buffer_data *)user;\n//  int under_simd, n;\n//  struct autosa_array_ref_group *group;\n//  isl_schedule_node *insert_node = NULL;  \n//  isl_union_set *domain;\n//  int is_carried = 0;\n//\n//  if (data->inserted)\n//    return node;\n//  /* Examine if the node is under the SIMD mark */\n//  under_simd = is_node_under_simd(node);\n//  if (under_simd)\n//    return node;\n//  \n//  if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n//    return node;\n//\n//  domain = isl_schedule_node_get_domain(node);\n//  if (isl_union_set_is_empty(domain)) {\n//    isl_union_set_free(domain);\n//    return node;\n//  }\n//  isl_union_set_free(domain);\n//\n//  n = isl_schedule_node_band_n_member(node);\n//  /* Examine if the dependences of the current I/O group are carreid by the current band. 
*/\n//  group = data->group;\n//  for (int i = 0; i < n; i++) {\n//    isl_schedule_node *node_tmp = isl_schedule_node_copy(node);\n//    if (n > 1) {\n//      if (i > 0) {\n//        node_tmp = isl_schedule_node_band_split(node_tmp, i);\n//        node_tmp = isl_schedule_node_child(node_tmp, 0);\n//      }\n//      if (n - i - 1 > 0) {\n//        node_tmp = isl_schedule_node_band_split(node_tmp, 1);\n//      }\n//    }\n//\n//    for (int j = 0; j < group->n_ref; j++) {\n//      struct autosa_stmt_access *ref = group->refs[j];\n//      for (int k = 0; k < ref->n_io_info; k++) {\n//        struct autosa_io_info *io_info = ref->io_info[k];\n//        if (io_info->io_type == group->io_type && \n//            !isl_vec_cmp(io_info->dir, group->dir)) {\n//          if (is_dep_carried_by_node(io_info->dep->isl_dep, node_tmp)) {\n//            ///* Insert the I/O buffer below the current node */\n//            //insert_node = isl_schedule_node_copy(node_tmp);\n//            //insert_node = isl_schedule_node_child(insert_node, 0);\n//            is_carried = 1;\n//            break;\n//          }\n//        }\n//      }\n//      if (is_carried)\n//        break;      \n//    }\n//\n//    if (!is_carried) {\n//      insert_node = isl_schedule_node_copy(node_tmp);\n//      insert_node = isl_schedule_node_child(insert_node, 0);\n//      break;\n//    }\n//\n//    isl_schedule_node_free(node_tmp);\n//  }\n//\n//  if (insert_node) {      \n////#ifdef _DEBUG\n////    DBGSCHDNODE(stdout, insert_node, isl_schedule_node_get_ctx(insert_node));\n////#endif\n//\n//    /* Find the L1 buffer */\n//    struct autosa_io_buffer *cur_buffer;\n//    for (int i = 1; i < group->io_level; i++) {\n//      cur_buffer = group->io_buffers[i - 1];\n//      if (cur_buffer->tile)\n//        break;\n//    }\n//    autosa_array_tile_free(cur_buffer->tile);\n//    if (group->group_type == AUTOSA_DRAIN_GROUP)\n//      compute_group_bounds_drain_at_node(data->kernel, group, insert_node, cur_buffer);\n//   
 else if (group->group_type == AUTOSA_IO_GROUP)\n//      compute_group_bounds_io_at_node(data->kernel, group, insert_node, cur_buffer);\n//    autosa_array_ref_group_compute_tiling(cur_buffer->tile, group);\n//\n////#ifdef _DEBUG    \n////    printf(\"%d\\n\", cur_buffer->tile->depth);\n////#endif\n//\n//    isl_schedule_node_free(insert_node);\n//    data->inserted = true;\n//  }\n//  \n//  return node;\n//}\n\nstatic __isl_give isl_schedule_node *insert_io_L1_mark(\n  __isl_take isl_schedule_node *node, void *user)\n{\n  int *depth = (int *)user;\n\n  if (isl_schedule_node_get_schedule_depth(node) == *depth && \n      isl_schedule_node_get_type(node) == isl_schedule_node_band) \n  {\n    isl_id *id;\n    id = isl_id_alloc(isl_schedule_node_get_ctx(node), \"io_L1\", NULL);\n    node = isl_schedule_node_child(node, 0);\n    node = isl_schedule_node_insert_mark(node, id);\n    node = isl_schedule_node_parent(node);\n  }\n\n  return node;\n}\n\n/* This function generates a new I/O schedule when the L1 I/O buffer is lowered.\n * Specifically, the L1 I/O band node, together with its mark node, is sunk to schedule\n * depth (depth - 1). \n * This function assumes that the entire schedule tree is fully permutable. 
\n * The legality should be checked before calling this function.\n */\nstatic __isl_give isl_schedule *generate_io_L1_lower_schedule(\n  __isl_keep isl_schedule *schedule,\n  struct autosa_kernel *kernel,\n  int depth)\n{\n  isl_schedule_node *node;\n  isl_schedule *new_schedule;\n\n  new_schedule = isl_schedule_dup(schedule);\n  node = isl_schedule_get_root(new_schedule);\n  isl_schedule_free(new_schedule);\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n\n  node = autosa_tree_move_down_to_io_mark(node, kernel->core, 1);\n  node = isl_schedule_node_delete(node);\n  node = isl_schedule_node_parent(node);\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n  /* Sink the L1 band to (depth - 1) */\n  node = autosa_node_sink_to_depth(node, depth - 1);\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n  /* Insert the io_L1 mark */\n  int depth_inc = depth - 1;\n  node = isl_schedule_node_map_descendant_bottom_up(node, &insert_io_L1_mark, &depth_inc);\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n\n  new_schedule = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n  return new_schedule;\n}\n\n/* This function tries to lower the L1 buffer for the interior I/O module (for external array)\n * to help reduce the memory resource usage.\n * \n * It first checks if the I/O group is with the interior I/O, and if the array is\n * an external array.\n * If so, one L1 I/O buffer is allocated by default. \n * Next, it examines if there is at least one parallel loop (independent of the \n * reuse dependence) from innermost. 
L1 buffer will be lowered to the boundary\n * between the non-parallel and parallel loops.\n */\nstatic isl_stat lower_int_io_L1_buffer(\n  struct autosa_kernel *kernel,\n  struct autosa_array_ref_group *group,\n  struct autosa_gen *gen)\n{\n  if (!(group->io_type == AUTOSA_INT_IO && group->local_array->array_type == AUTOSA_EXT_ARRAY))\n    return isl_stat_ok;\n\n  isl_schedule_node *node;\n  struct update_int_io_L1_buffer_data data = {group, kernel, false, false, -1};\n\n  node = isl_schedule_get_root(group->io_schedule);\n  /* Insert the domain filter for the current I/O group */\n  node = autosa_tree_move_down_to_kernel(node);\n  /* This function only works for copy-in modules */\n  node = insert_io_group_domain(node, group, kernel, gen, 1);  \n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, gen->ctx);\n//  //printf(\"%s\\n\", group->array->name);\n//#endif\n  /* Update the depth to insert the buffer */\n  node = isl_schedule_node_map_descendant_bottom_up(node, &update_int_io_L1_depth, &data);\n  isl_schedule_node_free(node);\n  \n  if (data.inserted) {\n    /* Generate the new I/O schedule */\n    group->io_L1_lower_schedule = \n      generate_io_L1_lower_schedule(group->io_schedule, kernel, data.depth);\n    /* Update the L1 buffer */\n    node = isl_schedule_get_root(group->io_L1_lower_schedule);    \n    node = isl_schedule_node_map_descendant_bottom_up(node, &update_int_io_L1_buffer, &data);\n    isl_schedule_node_free(node);\n  }\n\n  return isl_stat_ok;\n}\n\n/* This function is used when lower IO L1 buffer is enabled.\n * An extra second-level buffer is inserted to increase the effective DRAM BW.\n */\nstatic isl_stat insert_L2_io_buffer(\n  struct autosa_kernel *kernel,\n  struct autosa_array_ref_group *group,\n  struct autosa_gen *gen\n){\n  if (!(group->io_type == AUTOSA_INT_IO && group->local_array->array_type == AUTOSA_EXT_ARRAY))\n    return isl_stat_ok;\n\n  isl_schedule_node *node;\n  struct autosa_io_buffer *buffer;\n\n  node = 
isl_schedule_get_root(group->io_L1_lower_schedule);\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  buffer = group->io_buffers[group->io_level - 1];\n  if (group->group_type == AUTOSA_DRAIN_GROUP)\n    compute_group_bounds_drain_at_node(kernel, group, node, buffer);\n  else if (group->group_type == AUTOSA_IO_GROUP)\n    compute_group_bounds_io_at_node(kernel, group, node, buffer);\n\n  autosa_array_ref_group_compute_tiling(buffer->tile, group);\n  isl_schedule_node_free(node);\n\n  return isl_stat_ok;\n}\n\n/* This function hoists the L1 I/O buffer to reduce data communication.\n * It tries to hoist up the buffer as long as the local buffer size is independent of the outer loop.\n */\nstatic isl_stat hoist_L1_io_buffer(\n  struct autosa_kernel *kernel, \n  struct autosa_array_ref_group *group,\n  struct autosa_gen *gen,\n  struct autosa_group_data *data  \n) {\n  struct autosa_io_buffer *cur_buffer;\n  int io_level = group->io_level;\n  isl_schedule_node *node, *node_cp;\n  int n, i;  \n  std::vector<isl_val *> cur_dims;\n  std::vector<isl_val *> prev_dims;\n  isl_union_set *L1_io_buffer_domain = NULL;\n  int L1_io_buffer_depth = -1;\n  \n  struct autosa_array_tile *cur_tile;\n\n  for (int i = io_level; i >= 1; i--) {\n    cur_buffer = group->io_buffers[i - 1];\n    if (cur_buffer->tile)\n      break;\n  }\n\n  for (int i = 0; i < cur_buffer->tile->n; i++) {\n    prev_dims.push_back(cur_buffer->tile->bound[i].size);\n  }    \n  cur_tile = cur_buffer->tile;\n\n  node = isl_schedule_get_root(group->io_schedule);\n  //DBGSCHDNODE(stdout, node, gen->ctx);  \n  /* Insert the filter ids. 
*/\n  node = autosa_tree_move_down_to_io_mark(node, kernel->core, cur_buffer->level);  \n  node = insert_io_module_ids(gen, kernel, node, group->space_dim, cur_buffer->level);  \n  node = autosa_tree_move_up_to_array(node);\n  node = isl_schedule_node_parent(node);\n  //DBGSCHDNODE(stdout, node, gen->ctx);  \n  n = isl_schedule_node_band_n_member(node);  \n  for (i = n - 1; i > 0; i--) {\n    node_cp = isl_schedule_node_copy(node);\n    node_cp = isl_schedule_node_band_split(node_cp, i);\n    node_cp = isl_schedule_node_child(node_cp, 0);\n    if (group->group_type == AUTOSA_DRAIN_GROUP)\n      compute_group_bounds_drain_at_node(kernel, group, node_cp, cur_buffer);\n    else if (group->group_type == AUTOSA_IO_GROUP)\n      compute_group_bounds_io_at_node(kernel, group, node_cp, cur_buffer);\n    autosa_array_ref_group_compute_tiling(cur_buffer->tile, group);\n    /* Test if the last dim is changed. */    \n    bool is_equal = true;\n    for (int d = 0; d < cur_buffer->tile->n; d++) {\n      if (!isl_val_eq(cur_buffer->tile->bound[d].size, prev_dims[d])) {\n        //DBGVAL(stdout, cur_buffer->tile->bound[d].size, gen->ctx);\n        //DBGVAL(stdout, prev_dims[d], gen->ctx);\n        is_equal = false;\n        break;\n      }\n    }    \n    autosa_array_tile_free(cur_buffer->tile);    \n    if (!is_equal) {            \n      isl_schedule_node_free(node_cp);\n      break;\n    } else {      \n      L1_io_buffer_depth = isl_schedule_node_get_schedule_depth(node_cp);\n      L1_io_buffer_domain = isl_union_set_free(L1_io_buffer_domain);\n      /* Compute the domain. 
*/      \n      isl_union_map *partial = isl_schedule_node_band_get_partial_schedule_union_map(node_cp);\n      /* Delete the module id filter */\n      node_cp = autosa_tree_move_up_to_kernel(node_cp);\n      node_cp = isl_schedule_node_child(node_cp, 0); \n      node_cp = isl_schedule_node_child(node_cp, 0); \n      node_cp = isl_schedule_node_delete(node_cp);\n      node_cp = autosa_tree_move_down_to_array(node_cp, kernel->core);\n      node_cp = isl_schedule_node_parent(node_cp);\n      isl_union_set *domain = isl_schedule_node_get_domain(node_cp);\n      partial = isl_union_map_intersect_domain(partial, domain);\n      isl_union_set *range = isl_union_map_range(isl_union_map_copy(partial));      \n      range = isl_union_set_lexmin(range);      \n      partial = isl_union_map_intersect_range(partial, range);      \n      L1_io_buffer_domain = isl_union_map_domain(partial);\n      isl_schedule_node_free(node_cp);\n    }\n  }  \n  isl_schedule_node_free(node);\n  cur_buffer->tile = cur_tile;\n  cur_buffer->hoist_depth = L1_io_buffer_depth;\n  cur_buffer->hoist_domain = L1_io_buffer_domain;\n\n  return isl_stat_ok;\n}\n\n/* This function tries to hoist the L2 I/O buffer to increase memory \n * coalescing. \n * \n * Specifically, we start from the original position where the L2 buffer\n * is inserted. We check whether the last dimension of the L2 buffer is \n * larger than the last dimension of the L1 buffer.\n * If not, we try to hoist the L2 buffer until the last dimension is increased.\n * \n * If we cannot increase the last dimension, we reallocate the L2 buffer\n * at the outermost I/O level and try to hoist up the buffer if the local \n * buffer size is independent of the outer loop. This helps reduce communication.\n * \n * If the buffer location is unchanged, we finally check whether the last dimension\n * of the array can be packed as multiples of 512 bits. 
This is because the maximal DRAM\n * port width is 512 bits.\n * This is helpful because on Xilinx FPGAs, we limit the maximal on-chip FIFO \n * width to 256 bits. Repacking the data to 512 bits at the L2 I/O buffer \n * could help improve the effective DRAM bandwidth.\n *\n * If it is not a multiple of 512 bits, generating the L2 I/O buffer brings\n * no overall benefit. In this case, we free up the L2 I/O buffer, \n * and no L2 I/O buffer is generated.\n */\nstatic isl_stat hoist_L2_io_buffer(\n  struct autosa_kernel *kernel,\n  struct autosa_array_ref_group *group, \n  struct autosa_gen *gen,\n  struct autosa_group_data *data)\n{\n  struct autosa_io_buffer *cur_buffer, *nxt_buffer;\n  int io_level = group->io_level;\n  bool is_last_dim_equal = false;\n  isl_val *cur_last_dim, *nxt_last_dim;\n  isl_schedule_node *node, *node_cp;\n  int i, n;\n  int old_depth, new_depth;\n\n  cur_buffer = group->io_buffers[io_level - 1];\n  for (int i = io_level - 1; i >= 1; i--)\n  {\n    nxt_buffer = group->io_buffers[i - 1];\n    if (nxt_buffer->tile)\n      break;\n  }\n\n  /* Check whether the last dimensions of the current buffer\n   * and the next buffer are equal.\n   */\n  cur_last_dim = cur_buffer->tile->bound[cur_buffer->tile->n - 1].size;\n  nxt_last_dim = nxt_buffer->tile->bound[nxt_buffer->tile->n - 1].size;\n  is_last_dim_equal = isl_val_eq(cur_last_dim, nxt_last_dim);\n\n  if (is_last_dim_equal)\n  {\n    /* Try to hoist the I/O buffer until the last dimension is increased. */\n    autosa_array_tile_free(cur_buffer->tile);\n    node = isl_schedule_get_root(group->io_schedule);\n    /* Insert the filter ids. 
*/\n    node = autosa_tree_move_down_to_io_mark(node, kernel->core, io_level);\n    node = insert_io_module_ids(gen, kernel, node, group->space_dim, io_level);    \n    node = autosa_tree_move_up_to_array(node);    \n    node = isl_schedule_node_parent(node);\n    n = isl_schedule_node_band_n_member(node);\n    for (i = n - 1; i > 0; i--)\n    {\n      node_cp = isl_schedule_node_copy(node);\n      node_cp = isl_schedule_node_band_split(node_cp, i);\n      node_cp = isl_schedule_node_child(node_cp, 0);\n      if (group->group_type == AUTOSA_DRAIN_GROUP)\n        compute_group_bounds_drain_at_node(kernel, group, node_cp, cur_buffer);\n      else if (group->group_type == AUTOSA_IO_GROUP)\n        compute_group_bounds_io_at_node(kernel, group, node_cp, cur_buffer);\n      autosa_array_ref_group_compute_tiling(cur_buffer->tile, group);\n      /* Test if the last dim is increased. */\n      cur_last_dim = cur_buffer->tile->bound[cur_buffer->tile->n - 1].size;      \n      is_last_dim_equal = isl_val_eq(cur_last_dim, nxt_last_dim);\n      isl_schedule_node_free(node_cp);\n      if (!is_last_dim_equal)\n      {\n        break;\n      }\n      autosa_array_tile_free(cur_buffer->tile);\n    }\n    if (i == 0)\n    {\n      /* In this case, none of the second level array part loops helps \n       * increase the burst length. We will allocate the buffer again \n       * at the innermost array_L2 loop and try to hoist up the buffer \n       * to save the communication. 
\n       */\n      int old_depth, new_depth;\n      node = isl_schedule_node_child(node, 0);\n      if (group->group_type == AUTOSA_DRAIN_GROUP)\n        compute_group_bounds_drain_at_node(kernel, group, node, cur_buffer);\n      else if (group->group_type == AUTOSA_IO_GROUP)\n        compute_group_bounds_io_at_node(kernel, group, node, cur_buffer);\n      autosa_array_ref_group_compute_tiling(cur_buffer->tile, group);\n    }\n    isl_schedule_node_free(node);\n  }\n  /* Test if the buffer position could be further hoisted. */\n  old_depth = cur_buffer->tile->depth;\n  tile_set_depth(data, cur_buffer->tile);\n  new_depth = cur_buffer->tile->depth;\n  if (is_last_dim_equal && new_depth == old_depth)\n  {\n    /* In this case, the buffer couldn't be hoisted up, and it doesn't \n     * increase the burst length. \n     * We will test if the last dimension is a multiple of 512 bits (64 bytes).\n     */\n    cur_last_dim = cur_buffer->tile->bound[cur_buffer->tile->n - 1].size;\n    long dim_val = isl_val_get_num_si(cur_last_dim);\n    if ((dim_val * group->array->size) % 64 != 0)\n    {\n      /* There is no benefit in generating the \n       * second-level buffer. We will free up the tile.\n       */\n      autosa_array_tile_free(cur_buffer->tile);\n      cur_buffer->tile = NULL;\n    }\n  }\n  else\n  {\n    if (new_depth != old_depth)\n    {\n      isl_multi_aff_free(cur_buffer->tile->tiling);\n      autosa_array_ref_group_compute_tiling(cur_buffer->tile, group);\n    }\n  }\n\n  return isl_stat_ok;\n}\n\n/* Return the prefix I/O schedule at io_level \"level\". 
*/\nstatic __isl_give isl_union_map *get_io_schedule_at_level(\n    __isl_keep isl_schedule *sched, int level)\n{\n  isl_schedule_node *node;\n  struct autosa_kernel *kernel;\n  isl_id *id;\n  isl_union_map *io_sched;\n\n  node = isl_schedule_get_root(sched);\n  node = autosa_tree_move_down_to_kernel(node);\n  id = isl_schedule_node_mark_get_id(node);\n  kernel = (struct autosa_kernel *)isl_id_get_user(id);\n  isl_id_free(id);\n  node = autosa_tree_move_down_to_io_mark(node, kernel->core, level);\n  io_sched = prefix_with_equalities(node);\n  io_sched = expand(io_sched, kernel->contraction);\n  isl_schedule_node_free(node);\n\n  return io_sched;\n}\n\n/* Map the domain of \"access\" to the prefix I/O schedule of \"group\":\n * the io_L2 schedule for an exterior I/O group and the io_L1 schedule\n * for an interior I/O group.\n */\nstatic __isl_give isl_map *local_access_io(struct autosa_array_ref_group *group,\n                                           __isl_keep isl_union_map *access, struct autosa_group_data *data)\n{\n  isl_union_map *local;\n  local = isl_union_map_copy(access);\n\n  if (group->io_type == AUTOSA_EXT_IO)\n  {\n    /* Group at the IO_L2 level */\n    isl_union_map *new_sched = get_io_schedule_at_level(group->io_schedule, 2);\n    local = isl_union_map_apply_domain(local,\n                                       new_sched);\n  }\n  else if (group->io_type == AUTOSA_INT_IO)\n  {\n    /* Group at the IO_L1 level. */\n    isl_union_map *new_sched = get_io_schedule_at_level(group->io_schedule, 1);\n    local = isl_union_map_apply_domain(local,\n                                       new_sched);\n  }\n  return isl_map_from_union_map(local);\n}\n\n/* Compute the local memory tiles for the array reference group \"group\"\n * of array \"array\". 
Return isl_stat_ok on success and isl_stat_error on error.\n *\n * If the array is a read-only scalar, if the user requested not to use \n * local memory, if the group contains writes that are not exact, or if the \n * group only accesses a slice of the array, then we do not need to do anything.\n *\n * For an interior I/O group, the tiling is computed at the io_L1 level.\n * For an exterior I/O group, the tiling is computed at the io_L2 level.\n */\nstatic isl_stat compute_group_bounds_core_io(struct autosa_kernel *kernel,\n                                             struct autosa_array_ref_group *group,\n                                             struct autosa_group_data *data)\n{\n  isl_ctx *ctx = isl_space_get_ctx(group->array->space);\n  int use_local = kernel->options->autosa->use_local_memory;\n  isl_stat r = isl_stat_ok;\n  isl_union_map *access;\n  isl_map *acc;\n  isl_bool ok;\n\n  if (!use_local)\n    return isl_stat_ok;\n  if (autosa_array_is_read_only_scalar(group->array))\n    return isl_stat_ok;\n  if (!group->exact_write)\n    return isl_stat_ok;\n  if (group->slice)\n    return isl_stat_ok;\n\n  /* Collect all accesses in the group. \n   * TODO: Overapproximation */\n  access = autosa_array_ref_group_access_relation(group, 1, 1);\n  /* Create local tile */\n  if (use_local)\n  {\n    /* Create a tile. */\n    group->local_tile = autosa_array_tile_create(ctx,\n                                                 group->array->n_index);\n    /* Map the domain to the outer scheduling dimensions. */\n    acc = local_access_io(group, access, data);\n    /* Collect the shift and scale factors of the tile. 
*/\n    ok = can_tile(acc, group->local_tile);\n    if (ok < 0)\n      r = isl_stat_error;\n    else if (!ok)\n      group->local_tile =\n          autosa_array_tile_free(group->local_tile);\n    isl_map_free(acc);\n  }\n\n  if (r < 0)\n  {\n    isl_union_map_free(access);\n    return r;\n  }\n\n  isl_union_map_free(access);\n  return isl_stat_ok;\n}\n\n/* Compute the local memory tiles for the array\n * reference group \"group\" of array \"array\" and set the tile depth.\n * Return 0 on success and -1 on error.\n */\nstatic int compute_group_bounds_io(struct autosa_kernel *kernel,\n                                   struct autosa_array_ref_group *group,\n                                   struct autosa_group_data *data)\n{\n  if (!group)\n    return -1;\n  if (compute_group_bounds_core_io(kernel, group, data) < 0)\n    return -1;\n\n  return 0;\n}\n\n/* Set array->n_io_group and array->io_groups to n and groups.\n *\n * Additionally, set the \"nr\" field of each group.\n */\nstatic void set_array_groups_io(struct autosa_local_array_info *array,\n                                int n, struct autosa_array_ref_group **groups)\n{\n  int i;\n\n  array->n_io_group = n;\n  array->io_groups = groups;\n\n  for (i = 0; i < n; ++i)\n    groups[i]->nr = i;\n}\n\n/* Group array references together if they share the I/O modules.\n * Return -1 on error.\n *\n * Two array references are grouped together if they share:\n * - I/O direction \"dir\" \n * - I/O type \"io_type\"\n * In addition, they should either all be under the SIMD loop or all be outside of it.\n *\n * For exterior I/O pairs, calculate the group tiling at the io_L2 level.\n * For interior I/O pairs, calculate the group tiling at the io_L1 level.\n */\nstatic int group_array_references_io(struct autosa_kernel *kernel,\n                                     struct autosa_local_array_info *local, struct autosa_group_data *data)\n{\n  int i, j;\n  int n;\n  isl_ctx *ctx = isl_union_map_get_ctx(data->pe_sched);\n  struct autosa_array_ref_group **groups;\n\n  
/* Count the total number of groups. \n   * We first populate the groups with the number of total communication pairs \n   * (io_info).\n   * We only consider io_info with RAR/RAW for IO groups.\n   */\n  n = 0;\n  for (i = 0; i < local->array->n_ref; i++)\n  {    \n    struct autosa_stmt_access *ref = local->array->refs[i];\n    for (j = 0; j < ref->n_io_info; j++) {\n      struct autosa_io_info *io_info = ref->io_info[j];\n      if (io_info->dep->type == AUTOSA_DEP_RAW || io_info->dep->type == AUTOSA_DEP_RAR)\n        n++;      \n    }    \n  }\n\n  groups = (struct autosa_array_ref_group **)calloc(n,\n                                                    sizeof(struct autosa_array_ref_group *));\n  //groups = new autosa_array_ref_group*[n];\n  if (!groups)\n    return -1;\n\n  /* Populate the groups. */\n  n = populate_array_references_io(local, groups, data);\n\n  /* Group references that share the same I/O direction and I/O type. */\n  n = group_share_io(kernel, n, groups, data);\n\n  /* Perform interior I/O elimination. 
*/\n  for (i = 0; i < n; ++i)\n  {\n    autosa_interior_io_eliminate(kernel, groups[i], data->gen, data);\n  }\n\n  set_array_groups_io(local, n, groups);\n\n  return 0;\n}\n\n/* Internal struct used for extract_access_waw_domain */\nstruct extract_access_waw_domain_data\n{\n  struct autosa_stmt_access *ref;\n  isl_set *drain_domain;\n};\n\n/* Check if the WAW dependence \"dep\" originates from the access \"data->ref\".\n * If so, calculate the write-out (drain) domain as:\n * acc domain - waw src_domain\n */\nstatic void extract_access_waw_domain(__isl_keep isl_basic_map *dep, void *user)\n{\n  isl_space *space;\n  isl_space *src_space;\n  isl_id *src_id;\n  isl_set *src_domain;\n  struct extract_access_waw_domain_data *data =\n      (struct extract_access_waw_domain_data *)(user);\n  isl_basic_map *bmap;\n  isl_map *map;\n\n  space = isl_basic_map_get_space(dep);\n  src_space = isl_space_unwrap(isl_space_domain(space));\n  src_id = isl_space_get_tuple_id(src_space, isl_dim_out);\n  isl_space_free(src_space);\n\n  if (src_id != data->ref->ref_id)\n  {\n    isl_id_free(src_id);\n    return;\n  }\n  isl_id_free(src_id);\n\n  bmap = isl_basic_map_copy(dep);\n  map = isl_map_from_basic_map(bmap);\n  map = isl_map_factor_domain(map);\n  src_domain = isl_map_domain(map);\n\n  data->drain_domain = isl_set_subtract(data->drain_domain, src_domain);\n\n  return;\n}\n\n/* Extract the write-out domain for the given access. */\nstatic isl_bool extract_access_waw_domain_wrap(__isl_keep isl_map *map, void *user)\n{\n  isl_basic_map_list *bmap_list = isl_map_get_basic_map_list(map);\n  for (int i = 0; i < isl_map_n_basic_map(map); i++)\n  {\n    isl_basic_map *dep = isl_basic_map_list_get_basic_map(bmap_list, i);\n    extract_access_waw_domain(dep, user);\n    isl_basic_map_free(dep);\n  }\n  isl_basic_map_list_free(bmap_list);\n  return isl_bool_true;\n}\n\n/* Compute the local memory tiles for the array reference group \"group\"\n * of array \"array\". 
Return isl_stat_ok on success and isl_stat_error on error.\n *\n * The tiling is computed at the PE level.\n */\nstatic isl_stat compute_group_bounds_core_drain(struct autosa_kernel *kernel,\n                                                struct autosa_array_ref_group *group, struct autosa_group_data *data)\n{\n  isl_ctx *ctx = isl_space_get_ctx(group->array->space);\n  int use_local = kernel->options->autosa->use_local_memory;\n  isl_stat r = isl_stat_ok;\n  isl_union_map *access;\n  isl_map *acc;\n  isl_bool ok;\n\n  if (!use_local)\n    return isl_stat_ok;\n  if (autosa_array_is_read_only_scalar(group->array))\n    return isl_stat_ok;\n  if (!group->exact_write)\n    return isl_stat_ok;\n  if (group->slice)\n    return isl_stat_ok;\n\n  /* Collect all accesses in the group. */\n  /* This is overapproximated. */\n  access = autosa_array_ref_group_access_relation(group, 0, 1);\n  /* Create local tile */\n  if (use_local)\n  {\n    /* Create a tile. */\n    group->local_tile = autosa_array_tile_create(ctx,\n                                                 group->array->n_index);\n    /* Map the domain to the outer scheduling dimensions */\n    acc = local_access_io(group, access, data);\n    /* Collect the shift and scale factors of the tile. 
*/\n    ok = can_tile(acc, group->local_tile);\n    if (ok < 0)\n      r = isl_stat_error;\n    else if (!ok)\n      group->local_tile =\n          autosa_array_tile_free(group->local_tile);\n    isl_map_free(acc);\n  }\n\n  if (r < 0)\n  {\n    isl_union_map_free(access);\n    return r;\n  }\n\n  isl_union_map_free(access);\n  return isl_stat_ok;\n}\n\n/* Compute the local memory tiles for the array\n * reference group \"group\" of array \"array\" and set the tile depth.\n * Return 0 on success and -1 on error.\n */\nstatic int compute_group_bounds_drain(struct autosa_kernel *kernel,\n                                      struct autosa_array_ref_group *group, struct autosa_group_data *data)\n{\n  if (!group)\n    return -1;\n  if (compute_group_bounds_core_drain(kernel, group, data) < 0)\n    return -1;\n\n  return 0;\n}\n\n/* Group array references together if they are associated with WAW dep and need \n * to be drained out.\n * Return -1 on error.\n *\n * Calculate the group tiling at the PE level.\n */\nstatic int group_array_references_drain(struct autosa_kernel *kernel,\n                                        struct autosa_local_array_info *local, struct autosa_group_data *data)\n{\n  local->drain_group = NULL;\n  if (local->array->local)\n    return 0;\n\n  int i, j;\n  int n;\n  isl_ctx *ctx = isl_union_map_get_ctx(data->pe_sched);\n  struct autosa_array_ref_group **groups = NULL;\n  isl_union_map *dep_waw = kernel->scop->tagged_dep_waw;  \n\n  /* Populate the groups. 
*/\n  n = 0;\n  for (int i = 0; i < local->array->n_ref; ++i)\n  {\n    struct autosa_stmt_access *access = local->array->refs[i];    \n    if (!access->write)\n      continue;\n    isl_set *domain = isl_map_domain(isl_map_copy(access->access));\n    isl_set *access_domain = isl_union_set_extract_set(\n        kernel->expanded_domain,\n        isl_set_get_space(domain));\n    isl_set_free(domain);\n    \n    struct extract_access_waw_domain_data drain_data = {access, access_domain};\n    isl_union_map_every_map(dep_waw, &extract_access_waw_domain_wrap, &drain_data);    \n    if (!isl_set_is_empty(drain_data.drain_domain))\n    {\n      isl_map *map;\n      isl_union_map *umap;\n\n      map = isl_map_copy(access->access);\n      umap = isl_union_map_from_map(map);\n      umap = isl_union_map_apply_domain(umap,\n                                        isl_union_map_copy(data->pe_sched));\n\n      map = isl_map_from_union_map(umap);\n      map = isl_map_detect_equalities(map);\n\n      /* Add this access relation to the group. */\n      //struct autosa_array_ref_group *group =\n      //    isl_calloc_type(ctx, struct autosa_array_ref_group);\n      struct autosa_array_ref_group *group = new autosa_array_ref_group;\n      group = autosa_array_ref_group_init(group);\n      if (!group)\n      {\n        isl_map_free(map);\n        isl_set_free(drain_data.drain_domain);\n        return -1;\n      }\n\n      group->local_array = local;\n      group->array = local->array;\n      group->access = map;\n      group->write = access->write;\n      group->exact_write = access->exact_write;\n      group->slice = access->n_index < local->array->n_index;\n      group->refs = &local->array->refs[i];\n      group->n_ref = 1;\n      group->io_type = AUTOSA_INT_IO;\n      group->dir = isl_vec_zero(ctx, kernel->n_sa_dim);\n      group->old_dir = isl_vec_zero(ctx, kernel->n_sa_dim);\n      /* Perform interior I/O elimination by default. 
*/\n      if (kernel->options->autosa->int_io_dir == 0)\n        group->dir = isl_vec_set_element_si(group->dir, 0, 1);\n      else\n        group->dir = isl_vec_set_element_si(group->dir, isl_vec_size(group->dir) - 1, 1);\n      group->group_type = AUTOSA_DRAIN_GROUP;\n      group->pe_io_dir = IO_OUT;\n      group->array_io_dir = IO_OUT;\n      group->io_pe_expr = NULL;\n      group->io_L1_pe_expr = NULL;\n      group->n_io_buffer = 0;\n      group->io_buffers = NULL;\n      group->copy_schedule = NULL;\n      group->pe_tile = NULL;\n      group->local_tile = NULL;\n      group->n_mem_ports = 1;\n      group->tuning_refs.push_back(std::shared_ptr<TPArrayRef>(local->array->tuning_refs[i]));\n      group->tuning_pe_tile = NULL;\n\n      //groups = (struct autosa_array_ref_group **)realloc(groups, (++n) *\n      //                                                               sizeof(struct autosa_array_ref_group *));      \n      struct autosa_array_ref_group **groups_tmp = isl_calloc_array(ctx, struct autosa_array_ref_group *, ++n);\n      for (int g = 0; g < n - 1; g++) {\n        groups_tmp[g] = groups[g];\n      }\n      free(groups);\n      groups = groups_tmp;      \n      groups[n - 1] = group;\n    }\n    isl_set_free(drain_data.drain_domain);\n  }\n\n  /* Join all references together. */\n  for (i = 1; i < n; ++i)\n  {\n    groups[0] = join_groups_and_free(groups[0], groups[i]);\n  }\n  if (n > 1)\n    n = 1;\n\n  /* Set the group. 
*/\n  if (n > 0)\n  {\n    groups[0]->nr = 0;\n    local->drain_group = groups[0];\n  }\n  else\n  {\n    local->drain_group = NULL;\n  }\n  free(groups);\n\n  return 0;\n}\n\nstatic int gcd(int n1, int n2)\n{\n  while (n1 != n2)\n  {\n    if (n1 > n2)\n      n1 -= n2;\n    else\n      n2 -= n1;\n  }\n\n  return n1;\n}\n\n/* Compute a tiling for all the array reference groups in \"kernel\".\n */\nstatic void compute_group_tilings_pe(struct autosa_kernel *kernel)\n{\n  int i, j;\n\n  for (i = 0; i < kernel->n_array; ++i)\n  {\n    struct autosa_local_array_info *array = &kernel->array[i];\n\n    for (j = 0; j < array->n_pe_group; ++j)\n      autosa_array_ref_group_compute_tiling(NULL, array->pe_groups[j]);\n  }\n}\n\n/* Compute a tiling for all the array reference groups in \"kernel\".\n */\nstatic void compute_group_tilings_io(struct autosa_kernel *kernel)\n{\n  int i, j;\n\n  for (i = 0; i < kernel->n_array; ++i)\n  {\n    struct autosa_local_array_info *array = &kernel->array[i];\n\n    for (j = 0; j < array->n_io_group; ++j)\n      autosa_array_ref_group_compute_tiling(NULL, array->io_groups[j]);\n  }\n}\n\n/* Compute a tiling for all the array reference groups in \"kernel\".\n */\nstatic void compute_group_tilings_drain(struct autosa_kernel *kernel)\n{\n  int i, j;\n\n  for (i = 0; i < kernel->n_array; ++i)\n  {\n    struct autosa_local_array_info *array = &kernel->array[i];\n    if (!array->drain_group)\n      continue;\n    autosa_array_ref_group_compute_tiling(NULL, array->drain_group);\n  }\n}\n\n/* Update the I/O schedules by I/O module clustering. 
*/\nstatic isl_stat autosa_io_clustering(struct autosa_kernel *kernel,\n                                     struct autosa_gen *gen, struct autosa_group_data *data)\n{\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    for (int j = 0; j < local->n_io_group; j++)\n    {\n      compute_io_group_schedule(kernel, local->io_groups[j], gen);\n    }\n    if (local->drain_group)\n    {\n      compute_io_group_schedule(kernel, local->drain_group, gen);\n    }\n  }\n  return isl_stat_ok;\n}\n\n/* Allocate I/O buffers inside I/O modules. */\nstatic isl_stat autosa_io_buffer_allocate(struct autosa_kernel *kernel,\n                                          struct autosa_gen *gen, struct autosa_group_data *data)\n{\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    for (int j = 0; j < local->n_io_group; j++)\n    {      \n      compute_io_group_buffer(kernel, local->io_groups[j], gen);            \n      if (!gen->options->autosa->lower_int_io_L1_buffer) {\n        /* Hoist the L1 I/O buffer. \n         * Do not touch internal arrays when local reduction is enabled.\n         */\n        if (!(gen->options->autosa->local_reduce && local->array_type == AUTOSA_INT_ARRAY)) {\n          if (kernel->array_part_w > 0)\n            hoist_L1_io_buffer(kernel, local->io_groups[j], gen, data);\n        }\n      }      \n      if (gen->options->autosa->two_level_buffer)\n      {\n        /* Seek the opportunity to hoist up the L2 I/O buffers. */\n        hoist_L2_io_buffer(kernel, local->io_groups[j], gen, data);\n      }            \n      if (gen->options->autosa->local_reduce && local->io_groups[j]->attached_drain_group)\n      {\n        if (gen->options->autosa->two_level_buffer) {\n          /* At present, two-level buffering and local reduction can't be enabled at the same time. 
*/\n          throw std::runtime_error(\"[AutoSA] Error: Two-level buffer and local reduce can't be used at the same time.\");\n        }        \n      }      \n      if (gen->options->autosa->lower_int_io_L1_buffer) {\n        /* Lower the L1 buffer for interior I/O module if possible. */\n        lower_int_io_L1_buffer(kernel, local->io_groups[j], gen);\n        /* Enable the second-level buffer for this array */\n        insert_L2_io_buffer(kernel, local->io_groups[j], gen);\n        /* Seek the opportunity to hoist up the L2 I/O buffers. */\n        //hoist_L2_io_buffer(kernel, local->io_groups[j], gen, data);\n      }            \n    }    \n    if (local->drain_group)\n    {      \n      compute_io_group_buffer(kernel, local->drain_group, gen);\n      if (gen->options->autosa->two_level_buffer)\n      {\n        hoist_L2_io_buffer(kernel, local->drain_group, gen, data);\n      }\n    }\n  }\n  return isl_stat_ok;\n}\n\n/* Compute data packing factors. */\nstatic isl_stat autosa_io_data_pack(struct autosa_kernel *kernel,\n                                    struct autosa_gen *gen, struct autosa_group_data *data)\n{\n  /* Initialize the I/O buffers */\n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    for (int j = 0; j < local->n_io_group; j++) {\n      struct autosa_array_ref_group *group = local->io_groups[j];\n      //if (group->copy_in || group->copy_out) {\n        for (int k = 0; k < group->io_level; k++) {\n          struct autosa_io_buffer *buf = group->io_buffers[k];\n          buf->sparse = 0;\n          buf->vec_len = 0;        \n          buf->serialize = (gen->options->autosa->host_serialize == 1)? 
1 : 0;\n        }      \n      //}\n    }\n    if (local->drain_group) {\n      struct autosa_array_ref_group *group = local->drain_group;\n      for (int k = 0; k < group->io_level; k++) {\n        struct autosa_io_buffer *buf = group->io_buffers[k];\n        buf->sparse = 0;\n        buf->vec_len = 0;        \n        buf->serialize = (gen->options->autosa->host_serialize == 1)? 1 : 0;\n      }      \n    }\n  }\n\n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    for (int j = 0; j < local->n_io_group; j++) {\n      //if (local->io_groups[j]->copy_in || local->io_groups[j]->copy_out) {\n        if (local->is_sparse)\n          compute_io_group_data_pack_sparse(kernel, local->io_groups[j], gen, -1);\n        else\n          compute_io_group_data_pack(kernel, local->io_groups[j], gen, -1);\n      //}\n    }\n    if (local->drain_group) {\n      if (local->is_sparse)\n        compute_io_group_data_pack_sparse(kernel, local->drain_group, gen, -1);\n      else\n        compute_io_group_data_pack(kernel, local->drain_group, gen, -1);\n    }\n  }\n  return isl_stat_ok;\n}\n\n/* Construct a map from domain_space to domain_space that increments\n * the dimension at position \"pos\" and leaves all other dimensions constant. 
\n */\nstatic __isl_give isl_map *next(__isl_take isl_space *domain_space, int pos)\n{\n  isl_space *space;\n  isl_aff *aff;\n  isl_multi_aff *next;\n\n  space = isl_space_map_from_set(domain_space);\n  next = isl_multi_aff_identity(space);\n  aff = isl_multi_aff_get_aff(next, pos);\n  aff = isl_aff_add_constant_si(aff, 1);\n  next = isl_multi_aff_set_aff(next, pos, aff);\n\n  return isl_map_from_multi_aff(next);\n}\n\n/* This function generates different possible loop orderings for the array partitioning loop band.\n * For I/O groups with external arrays, we select the loops that do not appear in the \n * array indices and place them innermost in the generated loop ordering.\n * There are several considerations here.\n * 1. Why consider loop indices rather than data dependences?\n * For external groups, we don't handle overlapping reuse. For example, when the \n * reuse vector is (1,-1), the next iteration only loads part of the data\n * and reuses some data left over from the previous iteration. However, this brings additional \n * hardware overhead for indexing the new data and rearranging the old data. \n * Therefore, such overlapping reuse is not considered. In other words, only reuse \n * vectors in the form of unit vectors are considered. \n * Therefore, we only look for loop indices that do not appear in the array indices.\n * 2. Why put them innermost?\n * Placing reuse loops innermost maximizes locality and minimizes data communication.\n * The relative order between reuse loops and non-reuse loops doesn't matter since overlapping reuse \n * is not supported. Therefore, we only pick one arbitrary loop order for this group.\n * \n * For I/O groups with internal arrays, we similarly examine the array indices\n * and select the loops that don't appear in these indices. 
This is done for the same \n * reason argued above, i.e., to simplify the architecture.\n * \n * Drain I/O groups are not considered, as their data communication volume is fixed and \n * is not affected by the loop permutation.\n */\nstatic void explore_loop_permute(struct autosa_kernel *kernel, struct autosa_gen *gen) {\n  /* Count the number of possible loop permutations. */\n  int n_order = 0;\n  std::vector<std::unordered_set<int>> loop_orderings;\n  isl_schedule_node *node = isl_schedule_get_root(kernel->schedule);\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  isl_union_map *prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n  isl_schedule_node_free(node);\n  \n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    for (int j = 0; j < local->n_io_group; j++) {\n      struct autosa_array_ref_group *group = local->io_groups[j];      \n      for (int r = 0; r < group->n_ref; r++) {\n        struct autosa_stmt_access *ref = group->refs[r];\n        isl_map *acc = isl_map_from_union_map(isl_union_map_apply_domain(\n                          isl_union_map_from_map(isl_map_copy(ref->access)),\n                          isl_union_map_copy(prefix)));\n        int n_dim = isl_map_dim(acc, isl_dim_in);\n        std::unordered_set<int> reuse_loops;\n        for (int d = 0; d < n_dim; d++) {\n          /* We will test if the array elements accessed by the iterations incremented \n           * at position \"d\" are the same as the original array elements.\n           */\n          isl_space *space = isl_map_get_space(acc);\n          space = isl_space_domain(space);\n          isl_map *next_iter = next(space, d);                    \n          isl_map *map = isl_map_apply_domain(next_iter, isl_map_copy(acc));\n          map = isl_map_apply_range(map, isl_map_copy(acc));\n          map = isl_map_coalesce(map);          \n          isl_set *domain = 
isl_map_domain(isl_map_copy(map));\n          isl_set *range = isl_map_range(isl_map_copy(map));          \n          isl_map_free(map);\n          if (isl_set_is_subset(domain, range) && isl_set_is_subset(range, domain)) {            \n            //std::cout << d << std::endl;\n            reuse_loops.insert(d);\n          }          \n          isl_set_free(domain);\n          isl_set_free(range);\n        }        \n        isl_map_free(acc);\n        if (reuse_loops.size() > 0) {\n          // Prune the duplicated ordering.\n          int d = 0;\n          for (d = 0; d < loop_orderings.size(); d++) {\n            if (loop_orderings[d] == reuse_loops) \n              break;\n          }\n          if (d == loop_orderings.size())\n            loop_orderings.push_back(reuse_loops);\n        }\n      }\n    }\n  }  \n  isl_union_map_free(prefix);\n  n_order = loop_orderings.size();\n\n  /* When more than one loop permutation is found,\n   * we print a temp file named after the index of the next loop ordering to the \n   * temporary directory. For example, when there are two orderings, \n   * in the first-time compilation, we print a file named \"permute_1\" to the tmp directory.\n   * The AutoSA wrapper script will then call the compilation again, using this index \n   * as the next loop ordering to be selected.\n   * This process is iterated until all the orderings are explored.\n   * For the last ordering, we print \"permute_done\" instead, which instructs the \n   * wrapper script to stop calling the program.\n   * If at most one ordering is found, there is nothing to explore.\n   */\n  if (n_order <= 1)\n    return;\n\n  int cur_n_order = gen->options->autosa->loop_permute_order;\n  if (gen->options->autosa->tuning_method == 1) {\n    /* Print the tmp file. 
*/  \n    isl_printer *p_str = isl_printer_to_str(gen->ctx);\n    p_str = isl_printer_print_str(p_str, gen->options->autosa->output_dir);\n    p_str = isl_printer_print_str(p_str, \"/permute_\");\n    if (cur_n_order == n_order - 1) {\n      p_str = isl_printer_print_str(p_str, \"done\");\n    } else {\n      p_str = isl_printer_print_int(p_str, cur_n_order + 1);\n    }\n    char *file_name = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n    FILE *fp = fopen(file_name, \"w\");\n    if (fp)\n      fclose(fp);\n    else\n      printf(\"[AutoSA] Warning: Failed to create the file %s.\\n\", file_name);\n    free(file_name);\n\n    kernel->tuning_program->id2 = cur_n_order;\n  }\n\n  /* Modify the loop ordering. */\n  std::unordered_set<int> order = loop_orderings[cur_n_order];\n  node = isl_schedule_get_root(kernel->schedule);\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  node = isl_schedule_node_parent(node);\n  int n_dim = isl_schedule_node_band_n_member(node);\n  std::unordered_map<int, int> pos_map;\n  for (int p = 0; p < n_dim; p++) {\n    pos_map[p] = p;\n  }\n  int n_processed = 0;\n  for (auto o : order) {\n    //std::cout << o << std::endl;\n    /* Move the \"o\"-th loop inside */    \n    node = loop_interchange_at_node(node, pos_map[o], n_dim - 1 - n_processed);\n    pos_map[n_dim - 1 - n_processed] = pos_map[o];\n    pos_map[o] = n_dim - 1 - n_processed;\n    n_processed++;\n  }\n  //DBGSCHDNODE(stdout, node, gen->ctx);\n  isl_schedule_free(kernel->schedule);\n  kernel->schedule = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n\n  // TODO: Test if we need to update anything else\n}\n\n/* Group references of all arrays in \"kernel\".\n * Each array is associated with three types of groups:\n * PE group: Assign the local buffers inside PEs.\n * I/O group: Assign the I/O modules for transferring data between\n *   PEs and the external memory\n * Drain group: Assign the I/O modules for transferring out the results from\n *   PEs to the external memory.\n 
*/\nisl_stat sa_io_construct_optimize(struct autosa_kernel *kernel, struct 
autosa_gen *gen)\n{\n  int r = 0;\n  struct autosa_group_data data;\n  isl_schedule_node *node;\n  isl_union_pw_multi_aff *contraction;\n\n  node = isl_schedule_get_root(kernel->schedule);\n  node = autosa_tree_move_down_to_kernel(node);\n\n  /* Set autosa_group_data. */\n  data.scop = kernel->prog->scop;\n  data.gen = gen;\n  data.kernel_depth = isl_schedule_node_get_schedule_depth(node);\n  data.host_sched = isl_schedule_node_get_prefix_schedule_relation(node);\n  node = autosa_tree_move_down_to_pe(node, kernel->core);\n  data.pe_depth = isl_schedule_node_get_schedule_depth(node);\n  data.pe_sched = prefix_with_equalities(node);\n  contraction = isl_union_pw_multi_aff_copy(kernel->contraction);\n  data.host_sched = expand(data.host_sched, contraction);\n  data.copy_sched = isl_union_map_copy(data.pe_sched);\n  data.pe_sched = expand(data.pe_sched, contraction);\n  isl_union_pw_multi_aff_free(contraction);\n  data.full_sched = isl_union_map_copy(data.pe_sched);\n  data.full_sched = isl_union_map_flat_range_product(data.full_sched,\n                                                     isl_schedule_node_get_subtree_schedule_union_map(node));\n  data.schedule = kernel->schedule;\n\n  /* Create the default array reference groups (PPCG heritage). */\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    r = group_array_references_default(kernel, &kernel->array[i], &data);\n    if (r < 0)\n      break;\n  }\n\n  /* Group the array references for the PE.\n   * These groups will be used for allocating local buffers inside PEs.\n   */\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    r = group_array_references_pe(kernel, &kernel->array[i], &data);\n    if (r < 0)\n      break;\n  }\n\n  /* Group the array references for the I/O modules. 
*/\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    r = group_array_references_io(kernel, &kernel->array[i], &data);\n    if (r < 0)\n      break;\n  }\n\n  /* Group the array references for the drain data */\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    r = group_array_references_drain(kernel, &kernel->array[i], &data);\n    if (r < 0)\n      break;\n  }\n\n  if (kernel->scop->options->autosa->explore_loop_permute == 1) {\n      /* Explore different loop orderings of the array partitioning band. */\n    explore_loop_permute(kernel, gen);\n  }\n\n  /* Perform I/O Optimization */  \n  /* I/O module clustering */\n  autosa_io_clustering(kernel, gen, &data);\n\n  /* Local reduce */\n  if (gen->options->autosa->local_reduce) \n  {\n    printf(\"[AutoSA] Warning: Local reduction is enabled. The legality should be guaranteed by users.\\n\");\n    // TODO: In the future, add a legality check for this optimization.\n    /* Check if there is one exterior I/O group.\n     * Then attach the drain group to this I/O group and set the drain group to NULL.\n     */\n    for (int i = 0; i < kernel->n_array; i++)\n    {\n      int ext_group_cnt = 0;\n      int group_id = -1;\n      struct autosa_local_array_info *local = &kernel->array[i];\n      for (int j = 0; j < local->n_io_group; j++)\n      {\n        if (local->io_groups[j]->io_type == AUTOSA_EXT_IO &&\n            local->array_type == AUTOSA_INT_ARRAY) {\n          ext_group_cnt++;\n          group_id = j;\n        }\n      }\n      if (local->drain_group && ext_group_cnt == 1) {\n        local->io_groups[group_id]->attached_drain_group = local->drain_group;\n        local->io_groups[group_id]->copy_out = 1;\n        local->drain_group = NULL;\n        local->io_groups[group_id]->copy_in = 0;\n        local->n_mem_ports = 1;\n      }    \n    }\n  }\n\n  if (gen->options->autosa->host_serialize)\n  {\n    /* Check if there is only one I/O/drain group for each array.\n     * Otherwise, we will disable the host 
serialize.\n     */\n    for (int i = 0; i < kernel->n_array; i++)\n    {\n      int module_cnt = 0;\n      struct autosa_local_array_info *local = &kernel->array[i];\n      for (int j = 0; j < local->n_io_group; j++)\n      {\n        if (local->io_groups[j]->copy_in)\n          module_cnt++;\n        if (local->io_groups[j]->copy_out)\n          module_cnt++;\n      }\n      if (local->drain_group)\n      {\n        if (local->drain_group->copy_out)\n          module_cnt++;\n      }\n      if (module_cnt > 1) {\n        gen->options->autosa->host_serialize = 0;\n        // TODO: In the future, we should separate this check for each array.\n        printf(\"[AutoSA] Warning: More than one IO/drain group found for array: %s. Host data serialization is disabled.\\n\", local->array->name);\n      }\n    }\n  }\n  if (gen->options->autosa->host_serialize)\n  {\n    /* Disable the two-level buffering when host data serialization is enabled. */\n    gen->options->autosa->two_level_buffer = 0;\n    printf(\"[AutoSA] Warning: Two-level buffering is disabled because host data serialization is enabled.\\n\");\n  }\n  if (gen->options->autosa->host_serialize && gen->options->autosa->hbm)\n  {\n    printf(\"[AutoSA] Error: Host serialization and HBM can't be enabled at the same time!\\n\");\n    exit(1);\n  }\n\n  /* Print the IO grouping information */\n  print_io_grouping_info(stdout, kernel);\n\n  /* Test if there is any IO group with internal array and needs copy-in. \n   * Such designs can't run due to the HLS limitation. 
\n   * Code generation will proceed as usual, but the result is only useful for tuning purposes.\n   */\n  bool is_safe = true;\n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    for (int j = 0; j < local->n_io_group; j++) {\n      struct autosa_array_ref_group *group = local->io_groups[j];\n      if (group->copy_in && local->array_type == AUTOSA_INT_ARRAY) {\n        is_safe = false;\n      }\n    }\n  }\n  if (!is_safe) {\n    printf(\"[AutoSA] Warning: The generated program contains feedback loops and can't be synthesized by HLS.\\n\");\n    printf(\"                  The compilation flow will proceed as usual.\\n\");\n  }\n    \n  /* I/O buffer allocation */\n  autosa_io_buffer_allocate(kernel, gen, &data);  \n  /* I/O module data pack */\n  autosa_io_data_pack(kernel, gen, &data);    \n\n  /* Different I/O groups of the same array access DRAM through the same \n   * global array pointer, so the outermost data packing factors must be \n   * the same across these groups.\n   * Here we check whether they are the same.\n   * If not, we repack the I/O groups to make them equal. 
\n   */\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    int n_lane = -1;\n    bool repack = false;\n    for (int j = 0; j < local_array->n_io_group; j++)\n    {      \n      struct autosa_array_ref_group *group = local_array->io_groups[j];\n      if (!(group->copy_in || group->copy_out))\n        continue;\n      int cur_n_lane = group->io_buffers[group->n_io_buffer - 1]->n_lane;\n      if (n_lane == -1)\n        n_lane = cur_n_lane;\n      else\n      {\n        int new_n_lane = gcd(n_lane, cur_n_lane);\n        /* Repack if the common factor differs from either the factor\n         * accumulated so far or the factor of the current group. */\n        if (new_n_lane != n_lane || new_n_lane != cur_n_lane)\n          repack = true;\n        n_lane = new_n_lane;\n      }\n    }\n    if (local_array->drain_group)\n    {      \n      struct autosa_array_ref_group *group = local_array->drain_group;\n      int cur_n_lane = group->io_buffers[group->n_io_buffer - 1]->n_lane;\n      if (n_lane == -1)\n        n_lane = cur_n_lane;\n      else\n      {\n        int new_n_lane = gcd(n_lane, cur_n_lane);\n        if (new_n_lane != n_lane || new_n_lane != cur_n_lane)\n          repack = true;\n        n_lane = new_n_lane;\n      }\n    }\n\n    if (repack)\n    {\n      /* Repack the data of all the I/O buffers with the common factor. */\n      for (int j = 0; j < local_array->n_io_group; j++)\n      {\n        struct autosa_array_ref_group *group = local_array->io_groups[j];\n        compute_io_group_data_pack(kernel, group, gen, n_lane);\n      }\n      if (local_array->drain_group)\n      {\n        struct autosa_array_ref_group *group = local_array->drain_group;\n        compute_io_group_data_pack(kernel, group, gen, n_lane);\n      }\n    }\n\n    local_array->n_lane = std::max(1, n_lane);\n    local_array->array->n_lane = std::max(1, n_lane);\n  }\n\n  isl_union_map_free(data.host_sched);\n  isl_union_map_free(data.copy_sched);\n  isl_union_map_free(data.full_sched);\n  isl_union_map_free(data.pe_sched);\n  isl_schedule_node_free(node);\n\n  /* Compute a tiling for all the array reference groups in \"kernel\". 
*/\n  compute_group_tilings_pe(kernel);\n  compute_group_tilings_io(kernel);\n  compute_group_tilings_drain(kernel);\n\n  return isl_stat_ok;\n}\n\n/* Return the access relation associated with the comm pair of the array reference\n * \"ref\" in the current I/O group \"group\".\n * For each reference, if \n * - extract copy-in access (read == 1) \n *   - read access\n *     - RAR: extract the union of the src and dest domain of dep\n *     - RAW: extract the dest domain of dep\n * - extract copy-out access (write == 1)\n *   - write access\n *     - RAW: extract the src domain of dep \n */\n__isl_give isl_union_map *autosa_io_group_ref_access_relation(\n    struct autosa_array_ref_group *group,\n    struct autosa_stmt_access *ref,\n    int read, int write)\n{\n  isl_union_map *access;\n  isl_map *map;\n\n  access = isl_union_map_empty(isl_map_get_space(ref->access));\n  for (int i = 0; i < ref->n_io_info; i++)\n  {\n    struct autosa_io_info *info_i = ref->io_info[i];\n    if (info_i->io_type == group->io_type &&\n        !isl_vec_cmp(info_i->dir, group->dir))\n    {\n      isl_map *dep = isl_map_factor_domain(isl_map_from_basic_map(\n          isl_basic_map_copy(info_i->dep->isl_dep)));\n      isl_set *dep_src = isl_map_domain(isl_map_copy(dep));\n      isl_set *dep_dest = isl_map_range(dep);\n      if (info_i->dep->type == AUTOSA_DEP_RAR)\n      {\n        isl_set *domain = isl_set_union(dep_src, dep_dest);\n        domain = isl_set_coalesce(domain);\n        access = isl_union_map_union(access,\n                                     isl_union_map_from_map(isl_map_intersect_domain(\n                                         isl_map_copy(ref->access), domain)));\n      }\n      else if (info_i->dep->type == AUTOSA_DEP_RAW)\n      {\n        isl_set *domain;\n        if (ref->read)\n        {\n          domain = dep_dest;\n          isl_set_free(dep_src);\n        }\n        else\n        {\n          domain = dep_src;\n          isl_set_free(dep_dest);\n        }\n  
      access = isl_union_map_union(access,\n                                     isl_union_map_from_map(isl_map_intersect_domain(\n                                         isl_map_copy(ref->access), domain)));\n      }\n      else\n      {\n        isl_set_free(dep_src);\n        isl_set_free(dep_dest);\n      }\n    }\n  }\n\n  return access;\n}\n\n/* Return the access relation associated with the comm pair of the array reference\n * \"ref\" in the current drain group \"group\".\n * For each reference, domain = domain - src domain of WAW dep.\n */\n__isl_give isl_union_map *autosa_drain_group_ref_access_relation(\n    struct autosa_array_ref_group *group,\n    struct autosa_stmt_access *ref,\n    int read, int write, __isl_keep isl_union_set *domain)\n{\n  isl_union_map *access;\n  isl_set *acc_domain;\n  isl_space *space;\n\n  access = isl_union_map_empty(isl_map_get_space(group->access));\n  acc_domain = isl_map_domain(isl_map_copy(ref->access));\n  space = isl_set_get_space(acc_domain);\n  isl_set_free(acc_domain);\n  acc_domain = isl_union_set_extract_set(domain, space);\n  for (int i = 0; i < ref->n_io_info; i++)\n  {\n    struct autosa_io_info *info_i = ref->io_info[i];\n    if (info_i->dep->type == AUTOSA_DEP_WAW)\n    {\n      isl_set *src_domain;\n      isl_space *space, *src_space;\n      isl_id *src_id;\n\n      space = isl_basic_map_get_space(info_i->dep->isl_dep);\n      src_space = isl_space_unwrap(isl_space_domain(space));\n      src_id = isl_space_get_tuple_id(src_space, isl_dim_out);\n      isl_space_free(src_space);\n      if (src_id != ref->ref_id)\n      {\n        isl_id_free(src_id);\n        continue;\n      }\n      isl_id_free(src_id);\n      src_domain = isl_map_domain(isl_map_factor_domain(isl_map_from_basic_map(\n          isl_basic_map_copy(info_i->dep->isl_dep))));\n      acc_domain = isl_set_subtract(acc_domain, src_domain);\n    }\n  }\n  access = isl_union_map_union(access,\n                               
isl_union_map_from_map(isl_map_intersect_domain(\n                                   isl_map_copy(ref->access), acc_domain)));\n\n  return access;\n}\n\n/* For each reference, if \n * - extract copy-in access (read == 1) \n *   - read access\n *     - RAR: extract the union of the src and dest domain of dep\n *     - RAW: extract the dest domain of dep\n * - extract copy-out access (write == 1)\n *   - write access\n *     - RAW: extract the src domain of dep \n */\n__isl_give isl_union_map *autosa_io_group_access_relation(\n  struct autosa_array_ref_group *group, \n  struct autosa_kernel *kernel,\n  int read, int write)\n{\n  isl_union_map *access;\n\n  access = isl_union_map_empty(isl_map_get_space(group->access));\n  for (int i = 0; i < group->n_ref; ++i)\n  {\n    struct autosa_stmt_access *ref_i = group->refs[i];\n\n    if (!((read && group->refs[i]->read) ||\n          (write && group->refs[i]->write)))\n      continue;\n\n    if (group->group_type == AUTOSA_IO_GROUP) \n    {\n      access = isl_union_map_union(access,\n                                   autosa_io_group_ref_access_relation(group, ref_i, read, write));\n    } else if (group->group_type == AUTOSA_DRAIN_GROUP) \n    {\n      access = isl_union_map_union(access,\n                                   autosa_drain_group_ref_access_relation(group, ref_i, read, write,\n                                                                          kernel->expanded_domain));\n    }\n  }\n\n  /* Simplify the access relation. 
*/\n  access = isl_union_map_coalesce(access);\n\n  return access;\n}\n\n/* Return the union of all tagged access relations in the group.\n */\n__isl_give isl_union_map *group_tagged_access_relation(\n    struct autosa_array_ref_group *group)\n{\n  int i;\n  isl_union_map *access;\n\n  access = isl_union_map_empty(isl_map_get_space(group->access));\n  for (i = 0; i < group->n_ref; ++i)\n  {\n    isl_map *map_i;\n\n    map_i = isl_map_copy(group->refs[i]->tagged_access);\n    access = isl_union_map_union(access,\n                                 isl_union_map_from_map(map_i));\n  }\n\n  return access;\n}\n\n/* Given a set of wrapped references \"ref\", return the corresponding\n * access relations based on the tagged access relations \"tagged\".\n *\n * The elements of \"ref\" are of the form\n *\n *\t[D -> R]\n *\n * with D an iteration domain and R a reference.\n * The elements of \"tagged\" are of the form\n *\n *\t[D -> R] -> A\n *\n * with A an array.\n *\n * Extend \"tagged\" to include the iteration domain in the range, i.e.,\n *\n *\t[D -> R] -> [D -> A]\n *\n * apply the result to \"ref\" and then unwrap the resulting set\n * to obtain relations of the form\n *\n *\tD -> A\n */\n__isl_give isl_union_map *wrapped_reference_to_access(\n    __isl_take isl_union_set *ref, __isl_take isl_union_map *tagged)\n{\n  isl_union_map *tag2access;\n\n  tag2access = isl_union_map_copy(tagged);\n  tag2access = isl_union_map_universe(tag2access);\n  tag2access = isl_union_set_unwrap(isl_union_map_domain(tag2access));\n\n  /* Construct [D -> R] -> D */\n  tag2access = isl_union_map_domain_map(tag2access);\n\n  /* Construct [D -> R] -> [D -> A] */\n  tag2access = isl_union_map_range_product(tag2access, tagged);\n\n  ref = isl_union_set_coalesce(ref);\n  ref = isl_union_set_apply(ref, tag2access);\n\n  return isl_union_set_unwrap(ref);\n}\n\n/* Given an access relation \"access\" from one or more array reference groups,\n * remove those reads (if \"read\" is 1) or writes (if 
\"read\" is 0)\n * that are only needed to communicate data within\n * the same iteration of \"sched\".\n * The domain of \"sched\" corresponds to the original statement instances,\n * i.e., those that appear in the domains of the access relations.\n * \"tagged\" contains all tagged access relations to all\n * the array reference groups accessed by \"access\" from statement\n * instances scheduled by \"sched\".\n *\n * If the access is a read then it is either an element of\n *\n *\tlive_in union (range flow)\n *\n * where live_in and flow may be overapproximations, or\n * it reads an uninitialized value (that is not live-in because\n * there is an intermediate kill) or it reads a value that was\n * written within the same (compound) statement instance.\n * If the access is a write then it is either an element of\n *\n *\tlive_out union (domain flow)\n *\n * or it writes a value that is never read (and is not live-out\n * because of an intermediate kill) or only\n * within the same (compound) statement instance.\n * In both cases, the access relation is also a subset of\n * the group access relation.\n *\n * The cases where an uninitialized value is read or a value is written\n * that is never read or where the dataflow occurs within a statement\n * instance are also considered local and may also be removed.\n *\n * Essentially, we compute the intersection of \"access\" with either\n *\n *\tlive_in union (range non-local-flow)\n *\n * or\n *\n *\tlive_out union (domain non-local-flow)\n *\n * We first construct a relation \"local\"\n *\n *\t[[D -> R] -> [D' -> R']]\n *\n * of pairs of domain iterations accessing the reference group\n * and references in the group that are coscheduled by \"sched\".\n *\n * If this relation does not intersect the dataflow dependences,\n * then there is nothing we can possibly remove, unless the dataflow\n * dependences themselves only relate a subset of the accesses.\n * In particular, the accesses may not be involved in any 
dataflow\n * dependences, either because they are uninitialized reads/dead writes\n * or because the dataflow occurs inside a statement instance.\n *\n * Since the computation below may break up the access relation\n * into smaller pieces, we only perform the intersection with\n * the non-local dependent accesses if the local pairs\n * intersect the dataflow dependences. Otherwise, we intersect\n * with the universe of the non-local dependent accesses.\n * This should at least remove accesses from statements that\n * do not participate in any dependences.\n *\n * In particular, we remove the \"local\" dataflow dependences from\n * the set of all dataflow dependences, or at least those\n * that may contribute to a domain/range that intersects\n * the domain of \"access\".\n * Note that if the potential dataflow dependences are an overapproximation\n * of the actual dataflow dependences, then the result remains an\n * overapproximation of the non-local dataflow dependences.\n * Copying to/from global memory is only needed for the references\n * in the domain/range of the result or for accesses that are live out/in\n * for the entire scop.\n *\n * We therefore map the domain/range of the \"external\" relation\n * to the corresponding access relation and take the union with\n * the live out/in relation.\n */\n__isl_give isl_union_map *remove_local_accesses(\n    struct autosa_prog *prog, __isl_take isl_union_map *tagged,\n    __isl_take isl_union_map *access, __isl_take isl_union_map *sched,\n    int read)\n{\n  int empty;\n  isl_union_pw_multi_aff *tagger;\n  isl_union_set *domain, *access_domain;\n  isl_union_map *local, *external, *universe;\n  isl_union_set *tag_set;\n\n  if (isl_union_map_is_empty(access))\n  {\n    isl_union_map_free(sched);\n    isl_union_map_free(tagged);\n    return access;\n  }\n\n  /* Tagger maps the tagged iteration domain to untagged iteration domain. 
\n   * Iteration domain is tagged to the access function.\n   * e.g., [S1[i,j,k]->_pet_ref_1[]] -> S1[(i),(j),(k)]\n   */\n  tagger = isl_union_pw_multi_aff_copy(prog->scop->tagger);\n  domain = isl_union_map_domain(isl_union_map_copy(tagged));\n  tagger = isl_union_pw_multi_aff_intersect_domain(tagger,\n                                                   isl_union_set_copy(domain));\n  sched = isl_union_map_preimage_domain_union_pw_multi_aff(sched, tagger);\n\n  /* Construct the relation \"local\"\n   * [[D -> R] -> [D' -> R']]\n   */\n  local = isl_union_map_apply_range(sched,\n                                    isl_union_map_reverse(isl_union_map_copy(sched)));\n  /* Derive the local dependence set. */\n  local = isl_union_map_intersect(local,\n                                  isl_union_map_copy(prog->scop->tagged_dep_flow));\n\n  empty = isl_union_map_is_empty(local);\n\n  external = isl_union_map_copy(prog->scop->tagged_dep_flow);\n  universe = isl_union_map_universe(isl_union_map_copy(access));\n  access_domain = isl_union_map_domain(universe);\n  domain = isl_union_set_universe(domain);\n  universe = isl_union_set_unwrap(domain);\n  universe = isl_union_map_intersect_domain(universe, access_domain);\n  domain = isl_union_map_wrap(universe);\n  if (read)\n    external = isl_union_map_intersect_range(external, domain);\n  else\n    external = isl_union_map_intersect_domain(external, domain);\n  external = isl_union_map_intersect_params(external,\n                                            isl_set_copy(prog->scop->context));\n  external = isl_union_map_subtract(external, local);\n  /* So far external contains only access non-local RAW pairs. 
*/\n\n  if (read)\n  {\n    tag_set = isl_union_map_range(external);\n    external = wrapped_reference_to_access(tag_set, tagged);\n    external = isl_union_map_union(external,\n                                   isl_union_map_copy(prog->scop->live_in));\n  }\n  else\n  {\n    tag_set = isl_union_map_domain(external);\n    external = wrapped_reference_to_access(tag_set, tagged);\n    external = isl_union_map_union(external,\n                                   isl_union_map_copy(prog->scop->live_out));\n  }\n\n  if (empty < 0)\n    external = isl_union_map_free(external);\n  else if (empty)\n    external = isl_union_map_universe(external);\n\n  access = isl_union_map_intersect(access, external);\n\n  return access;\n}\n\n/* Extended from remove_local_accesses.\n * Given an access relation \"access\" from one or more array reference groups,\n * remove those reads (if \"read\" is 1) or writes (if \"read\" is 0)\n * that are only needed to communicate data within\n * the same iteration of \"sched\".\n * Unlike remove_local_accesses, the live-in and live-out accesses are excluded.\n * This function only considers RAW (flow) dependences.\n * \n * \"tagged\" contains all tagged accesses in the group.\n * \"access\" contains the accesses of interest for the current group.\n * \"sched\" is the prefix schedule; the depth of its scheduling domain is the\n * depth at which the copy statements are inserted.\n */\n__isl_give isl_union_map *remove_local_accesses_flow(\n  struct autosa_prog *prog, __isl_take isl_union_map *tagged,\n  __isl_take isl_union_map *access, __isl_take isl_union_map *sched,\n  int read)\n{\n  int empty;\n  isl_union_pw_multi_aff *tagger;\n  isl_union_set *domain, *access_domain;\n  isl_union_map *local, *external, *universe;\n  isl_union_set *tag_set;\n\n  if (isl_union_map_is_empty(access))\n  {\n    isl_union_map_free(sched);\n    isl_union_map_free(tagged);\n    return access;\n  }\n\n  /* Tagger maps the tagged iteration domain to untagged iteration domain. 
\n   * Iteration domain is tagged to the access function.\n   * e.g., [S1[i,j,k]->_pet_ref_1[]] -> S1[(i),(j),(k)]\n   */\n  tagger = isl_union_pw_multi_aff_copy(prog->scop->tagger);\n  domain = isl_union_map_domain(isl_union_map_copy(tagged));\n  tagger = isl_union_pw_multi_aff_intersect_domain(tagger,\n                                                   isl_union_set_copy(domain));\n  sched = isl_union_map_preimage_domain_union_pw_multi_aff(sched, tagger);\n\n  /* Construct the relation \"local\"\n   * [[D -> R] -> [D' -> R']]\n   * D contains all the iteration domains accessing the elements in the group.\n   */\n  local = isl_union_map_apply_range(sched,\n                                    isl_union_map_reverse(isl_union_map_copy(sched)));\n  /* Derive the local dependence set. */\n  local = isl_union_map_intersect(local,\n                                  isl_union_map_copy(prog->scop->tagged_dep_flow));\n  empty = isl_union_map_is_empty(local);\n\n  external = isl_union_map_copy(prog->scop->tagged_dep_flow);\n  universe = isl_union_map_universe(isl_union_map_copy(access));\n  access_domain = isl_union_map_domain(universe);\n  domain = isl_union_set_universe(domain);\n  universe = isl_union_set_unwrap(domain);\n  universe = isl_union_map_intersect_domain(universe, access_domain);\n  domain = isl_union_map_wrap(universe);\n  if (read)\n    external = isl_union_map_intersect_range(external, domain);\n  else\n    external = isl_union_map_intersect_domain(external, domain);\n  external = isl_union_map_intersect_params(external,\n                                            isl_set_copy(prog->scop->context));\n  external = isl_union_map_subtract(external, local);\n  /* So far \"external\" contains only iteration elements accessing the \n   * non-local RAW pairs. */\n\n  if (read)\n  {\n    tag_set = isl_union_map_range(external);\n    external = wrapped_reference_to_access(tag_set, tagged);\n    //    /* Temporarily commented out, we don't consider live-in so far. 
*/\n    //\t\texternal = isl_union_map_union(external,\n    //\t\t\t\tisl_union_map_copy(prog->scop->live_in));\n  }\n  else\n  {\n    tag_set = isl_union_map_domain(external);\n    external = wrapped_reference_to_access(tag_set, tagged);\n    //    /* Temporarily commented out, we don't consider live-out so far. */\n    //\t\texternal = isl_union_map_union(external,\n    //\t\t\t\tisl_union_map_copy(prog->scop->live_out));\n  }\n\n  if (empty < 0)\n    external = isl_union_map_free(external);\n  else if (empty)\n    external = isl_union_map_universe(external);\n\n  access = isl_union_map_intersect(access, external);\n\n  return access;\n}\n\n/* Given an access relation \"access\" from \"group\", remove those reads\n * if (\"read\" is 1) or writes (if \"read\" is 0) that are only needed to\n * communicate data within the same iteration of the schedule \"prefix\"\n * at the position where the copying of the group is inserted.\n * That is, the output dimension of \"prefix\"\n * is equal to tile->depth.\n * The domain of \"prefix\" corresponds to the original statement instances,\n * i.e., those that appear in the domains of the access relations.\n *\n * Extract the tagged access relation of \"group\" and\n * then call remove_local_accesses_flow.\n */\n__isl_give isl_union_map *remove_local_accesses_group_flow(\n  struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n  __isl_take isl_union_map *access, __isl_keep isl_union_map *prefix,\n  int read)\n{\n  isl_union_map *sched, *tagged;\n\n  if (isl_union_map_is_empty(access))\n    return access;\n\n  tagged = group_tagged_access_relation(group);\n  sched = isl_union_map_copy(prefix);\n\n  return remove_local_accesses_flow(kernel->prog, tagged, access, sched, read);\n}\n\n/* Given an access relation \"access\" from \"group\", remove those reads\n * if (\"read\" is 1) or writes (if \"read\" is 0) that are only needed to\n * communicate data within the same iteration of the schedule \"prefix\"\n * at the 
position where the copying of the group is inserted.\n * That is, the output dimension of \"prefix\"\n * is equal to tile->depth.\n * The domain of \"prefix\" corresponds to the original statement instances,\n * i.e., those that appear in the domains of the access relations.\n *\n * Extract the tagged access relation of \"group\" and\n * then call remove_local_accesses.\n */\n__isl_give isl_union_map *remove_local_accesses_group(\n    struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n    __isl_take isl_union_map *access, __isl_keep isl_union_map *prefix,\n    int read)\n{\n  isl_union_map *sched, *tagged;\n\n  if (isl_union_map_is_empty(access))\n    return access;\n\n  tagged = group_tagged_access_relation(group);\n  sched = isl_union_map_copy(prefix);\n\n  return remove_local_accesses(kernel->prog, tagged, access, sched, read);\n}\n\n/* Compute the access relation for the access \"ref\".\n * Return the map D -> [S -> A]\n * where D is the iteration domain, S is the scheduling domains with the depth\n * of \"node\".\n */\n__isl_give isl_union_map *io_comm_access_ref(\n    struct autosa_kernel *kernel, __isl_keep isl_schedule_node *node,\n    struct autosa_array_ref_group *group,\n    struct autosa_stmt_access *ref,\n    int read)\n{\n  isl_union_map *prefix;\n  isl_union_map *access;  \n\n  prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n  prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                            isl_union_pw_multi_aff_copy(kernel->contraction));\n  if (group->group_type == AUTOSA_IO_GROUP) {\n    access = autosa_io_group_ref_access_relation(group, ref, read, !read);\n  } else if (group->group_type == AUTOSA_DRAIN_GROUP) {\n    access = autosa_drain_group_ref_access_relation(\n        group, ref, read, !read, kernel->expanded_domain);\n  }\n\n  if (group->local_array->array_type == AUTOSA_INT_ARRAY)\n    access = remove_local_accesses_group_flow(kernel, group, 
access, prefix, read);\n\n  if (group->group_type == AUTOSA_IO_GROUP && group->attached_drain_group && !read) {\n    // TODO: temporary solution. We assume the io group and attached drain group\n    // always share the same access. Could be buggy.\n    access = isl_union_map_union(access, \n                                 autosa_drain_group_ref_access_relation(group->attached_drain_group, ref, read, !read, kernel->expanded_domain));\n  }\n\n  access = isl_union_map_range_product(prefix, access);\n\n  return access;\n}\n\n/* Compute the access relation for the accesses in the group.\n * Return the map D -> [S -> A]\n * where D is the iteration domain, S is the scheduling domains with the depth\n * of \"node\".\n */\n__isl_give isl_union_map *io_comm_access(\n    struct autosa_kernel *kernel, __isl_keep isl_schedule_node *node,\n    struct autosa_array_ref_group *group, int read)\n{\n  isl_union_map *prefix;\n  isl_union_map *access;\n\n  prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n  prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                            isl_union_pw_multi_aff_copy(kernel->contraction));\n  access = isl_union_map_empty(isl_map_get_space(group->access));\n  for (int i = 0; i < group->n_ref; i++)\n  {\n    struct autosa_stmt_access *ref = group->refs[i];\n    if (group->group_type == AUTOSA_IO_GROUP) {\n      access = isl_union_map_union(access, autosa_io_group_ref_access_relation(\n                                               group, ref, read, !read));      \n    } else if (group->group_type == AUTOSA_DRAIN_GROUP)\n      access = isl_union_map_union(access, autosa_drain_group_ref_access_relation(\n                                               group, ref, read, !read, kernel->expanded_domain));\n  }\n\n  if (group->attached_drain_group) {    \n    for (int i = 0; i < group->attached_drain_group->n_ref; i++) {\n      struct autosa_stmt_access *ref = 
group->attached_drain_group->refs[i];\n      access = isl_union_map_union(access, autosa_drain_group_ref_access_relation(\n                                               group->attached_drain_group, ref, read, !read, kernel->expanded_domain));      \n    }\n  }\n\n  if (group->local_array->array_type == AUTOSA_INT_ARRAY)\n    access = remove_local_accesses_group_flow(kernel, group, access, prefix, read);\n\n  access = isl_union_map_range_product(prefix, access);\n\n  return access;\n}\n\nvoid free_group_pair(void *user)\n{\n  struct autosa_array_ref_group_pair *pair =\n      (struct autosa_array_ref_group_pair *)user;\n  free(pair);\n}\n\n/* Create a register tiling at the \"node\" for array reference \"ref\".\n */\nstruct autosa_array_tile *create_register_tiling(\n    isl_schedule_node *node,\n    struct autosa_array_ref_group *group,\n    struct autosa_stmt_access *ref)\n{\n  isl_union_map *access;\n  isl_map *access_i;\n  isl_ctx *ctx;\n  isl_union_map *sched;\n  isl_bool ok;\n  struct autosa_array_tile *tile;\n  isl_union_set *domain;\n\n  ctx = isl_schedule_node_get_ctx(node);\n  access = isl_union_map_from_map(isl_map_copy(ref->access));\n  tile = autosa_array_tile_create(ctx, group->array->n_index);\n  sched = isl_schedule_node_get_prefix_schedule_union_map(node);\n  domain = isl_schedule_node_get_domain(node);\n  sched = isl_union_map_intersect_domain(sched, domain);\n  access = isl_union_map_apply_domain(access, sched);\n  access_i = isl_map_from_union_map(access);\n  ok = can_tile(access_i, tile);\n  isl_map_free(access_i);\n  autosa_array_ref_group_compute_tiling(tile, group);\n\n  return tile;\n}\n\n/* Return the extent of \"array\", recomputed from the bounds.\n * The recomputed extent may be simpler than the original extent.\n */\nstatic __isl_give isl_set *array_extent(struct autosa_array_info *array)\n{\n  int i;\n  isl_id *id;\n  isl_space *space;\n  isl_local_space *ls;\n  isl_set *extent;\n\n  id = isl_set_get_tuple_id(array->extent);\n  space = 
isl_set_get_space(array->extent);\n  extent = isl_set_universe(isl_space_copy(space));\n  ls = isl_local_space_from_space(space);\n  for (i = 0; i < array->n_index; ++i)\n  {\n    isl_pw_aff *bound;\n    isl_aff *aff;\n    isl_pw_aff *index;\n    isl_set *lt;\n\n    extent = isl_set_lower_bound_si(extent, isl_dim_set, i, 0);\n\n    aff = isl_aff_var_on_domain(isl_local_space_copy(ls),\n                                isl_dim_set, i);\n    index = isl_pw_aff_from_aff(aff);\n    bound = isl_multi_pw_aff_get_pw_aff(array->bound, i);\n    bound = isl_pw_aff_from_range(bound);\n    bound = isl_pw_aff_add_dims(bound, isl_dim_in, array->n_index);\n    bound = isl_pw_aff_set_tuple_id(bound, isl_dim_in,\n                                    isl_id_copy(id));\n    lt = isl_pw_aff_lt_set(index, bound);\n    extent = isl_set_intersect(extent, lt);\n  }\n  isl_local_space_free(ls);\n  isl_id_free(id);\n\n  return extent;\n}\n\n/* Return a map from the first group->local_tile->depth dimensions\n * of the computed schedule to the array tile in\n * global memory that corresponds to the local memory copy.\n *\n * In particular, return a map\n *\n *\t{ D[i] -> A[a] }\n *\n * with constraints\n *\n *\ttile_offset(i) <= a <= tile_offset(i) + tile_size - 1\t\t(1)\n *\n * and\n *\n *\t0 <= a <= array_size - 1\t\t\t\t\t(2)\n *\n * Note that if some stride has been detected (i.e., when\n * group->local_tile->bound[i].shift is set), then a in (1) refers\n * to the shifted and scaled down version.\n *\n * Constraints (1) are obtained by mapping the size constraints on the\n * local memory tile back to the access relation.\n * Constraints (2) are obtained from the (recomputed) extent.\n */\n__isl_give isl_map *group_tile(struct autosa_array_ref_group *group)\n{\n  int i;\n  int n_index = group->array->n_index;\n  isl_map *tile;\n  isl_space *space;\n  isl_set *local;\n  isl_set *extent;\n\n  space = isl_multi_aff_get_space(group->local_tile->tiling);\n  space = isl_space_range(space);\n  
local = isl_set_universe(space);\n  for (i = 0; i < n_index; ++i)\n  {\n    isl_val *bound;\n\n    local = isl_set_lower_bound_si(local, isl_dim_set, i, 0);\n    bound = isl_val_copy(group->local_tile->bound[i].size);\n    bound = isl_val_sub_ui(bound, 1);\n    local = isl_set_upper_bound_val(local, isl_dim_set, i, bound);\n  }\n  local = isl_set_preimage_multi_aff(local,\n                                     isl_multi_aff_copy(group->local_tile->tiling));\n  tile = isl_set_unwrap(local);\n  extent = array_extent(group->array);\n  tile = isl_map_intersect_range(tile, extent);\n\n  return tile;\n}\n\n/* Return a map from the first tile->depth dimensions\n * of the computed schedule to the array tile in\n * global memory that corresponds to the local memory copy.\n *\n * In particular, return a map\n *\n *\t{ D[i] -> A[a] }\n *\n * with constraints\n *\n *\ttile_offset(i) <= a <= tile_offset(i) + tile_size - 1\t\t(1)\n *\n * and\n *\n *\t0 <= a <= array_size - 1\t\t\t\t\t(2)\n *\n * Note that if some stride has been detected (i.e., when\n * group->local_tile->bound[i].shift is set), then a in (1) refers\n * to the shifted and scaled down version.\n *\n * Constraints (1) are obtained by mapping the size constraints on the\n * local memory tile back to the access relation.\n * Constraints (2) are obtained from the (recomputed) extent.\n */\n__isl_give isl_map *group_tile_buffer(struct autosa_array_ref_group *group,\n                                      struct autosa_array_tile *tile)\n{\n  int i;\n  int n_index = group->array->n_index;\n  isl_map *map;\n  isl_space *space;\n  isl_set *local;\n  isl_set *extent;\n\n  space = isl_multi_aff_get_space(tile->tiling);\n  space = isl_space_range(space);\n  local = isl_set_universe(space);\n\n  for (i = 0; i < n_index; ++i)\n  {\n    isl_val *bound;\n\n    local = isl_set_lower_bound_si(local, isl_dim_set, i, 0);\n    bound = isl_val_copy(tile->bound[i].size);\n    bound = isl_val_sub_ui(bound, 1);\n    local = 
isl_set_upper_bound_val(local, isl_dim_set, i, bound);\n  }\n  local = isl_set_preimage_multi_aff(local,\n                                     isl_multi_aff_copy(tile->tiling));\n  map = isl_set_unwrap(local);\n  extent = array_extent(group->array);\n  map = isl_map_intersect_range(map, extent);\n\n  return map;\n}\n\n/* Return the data packing factor used to transfer the data of \"group\" across\n * \"module\".\n * Specifically, we use data_pack_inter for IO modules.\n * For PE modules, if the array is an external array, the factor equals the\n * io_group SIMD lane (group->n_lane).\n * If the array is an internal array, we use the SIMD lane for the drain group.\n * For an IO group, we use the SIMD lane if the IO is an external I/O;\n * otherwise, we use the data packing factor of the local buffer\n * (io_buffers[0]->n_lane) which is allocated inside the PE.\n */\nint get_io_group_n_lane(struct autosa_hw_module *module,\n                        struct autosa_pe_dummy_module *dummy_module,\n                        struct autosa_array_ref_group *group)\n{\n  int n_lane;\n\n  if (module && module->type == PE_MODULE || dummy_module)\n  {\n    n_lane = (group->local_array->array_type == AUTOSA_EXT_ARRAY) ? group->n_lane : ((group->group_type == AUTOSA_DRAIN_GROUP) ? group->n_lane : ((group->io_type == AUTOSA_EXT_IO) ? group->n_lane : group->io_buffers[0]->n_lane));\n  }\n  else\n  {\n    n_lane = module->data_pack_inter;\n  }\n\n  return n_lane;\n}\n\n/* Given a description of an array tile \"tile\" and the \"space\"\n *\n *\t{ D -> A }\n *\n * where D represents the first tile->depth schedule dimensions\n * and A represents the array, construct an isl_multi_aff\n *\n *\t{ [D[i] -> A[a]] -> A'[a'] }\n *\n * with A' a scaled down copy of A according to the shifts and strides\n * in \"tile\".  
In particular,\n *\n *\ta' = (a + shift(i))/stride\n *\n * \"insert_array\" represents\n *\n *\t{ [D -> A] -> D }\n *\n * and is used to insert A into the domain of functions that only\n * reference D.\n */\nstatic __isl_give isl_multi_aff *strided_tile_depth(\n    struct autosa_array_tile *tile, __isl_keep isl_space *space,\n    __isl_keep isl_multi_aff *insert_array, int depth)\n{\n  int i;\n  isl_ctx *ctx;\n  isl_multi_aff *shift;\n  isl_multi_val *stride;\n  isl_space *space2;\n  isl_local_space *ls;\n  isl_multi_aff *tiling;\n\n  ctx = isl_space_get_ctx(space);\n  space2 = isl_space_domain(isl_space_copy(space));\n  ls = isl_local_space_from_space(space2);\n  space2 = isl_space_range(isl_space_copy(space));\n  stride = isl_multi_val_zero(space2);\n  shift = isl_multi_aff_zero(isl_space_copy(space));\n\n  for (i = 0; i < tile->n; ++i)\n  {\n    struct autosa_array_bound *bound = &tile->bound[i];\n    isl_val *stride_i;\n    isl_aff *shift_i;\n\n    stride_i = isl_val_copy(bound->stride);\n    shift_i = isl_aff_copy(bound->shift);\n\n    shift_i = isl_aff_insert_dims(shift_i, isl_dim_in, tile->depth, depth - tile->depth);\n\n    stride = isl_multi_val_set_val(stride, i, stride_i);\n    shift = isl_multi_aff_set_aff(shift, i, shift_i);\n  }\n  isl_local_space_free(ls);\n\n  shift = isl_multi_aff_pullback_multi_aff(shift,\n                                           isl_multi_aff_copy(insert_array));\n\n  tiling = isl_multi_aff_range_map(isl_space_copy(space));\n  tiling = isl_multi_aff_add(tiling, shift);\n  tiling = isl_multi_aff_scale_down_multi_val(tiling, stride);\n\n  return tiling;\n}\n\n/* Recompute the tiling by extending the scheduling domain to the \"depth\". 
*/\n__isl_give isl_multi_aff *autosa_array_ref_group_recompute_tiling(\n    struct autosa_array_tile *tile,\n    struct autosa_array_ref_group *group,\n    int depth)\n{\n  int i;\n  isl_space *space;\n  isl_multi_aff *tiling, *lb, *insert_array;\n  isl_printer *p;\n  char *local_name;\n\n  if (tile == NULL)\n    return NULL;\n\n  space = isl_map_get_space(group->access);\n  space = isl_space_from_range(isl_space_range(space));\n  /* Build D[i] -> A[a] */\n  space = isl_space_add_dims(space, isl_dim_in, depth);\n  /* Build [D[i] -> A[a]] -> D[i] */\n  insert_array = isl_multi_aff_domain_map(isl_space_copy(space));\n\n  for (i = 0; i < tile->n; ++i)\n    if (tile->bound[i].shift)\n      break;\n\n  if (i < tile->n)\n    tiling = strided_tile_depth(tile, space, insert_array, depth);\n  else\n    tiling = isl_multi_aff_range_map(isl_space_copy(space));\n\n  lb = isl_multi_aff_zero(space);\n  for (i = 0; i < tile->n; ++i)\n  {\n    isl_aff *lb_i = isl_aff_copy(tile->bound[i].lb);\n    lb_i = isl_aff_insert_dims(lb_i, isl_dim_in, tile->depth, depth - tile->depth);\n    lb = isl_multi_aff_set_aff(lb, i, lb_i);\n  }\n  lb = isl_multi_aff_pullback_multi_aff(lb, insert_array);\n\n  tiling = isl_multi_aff_sub(tiling, lb);\n\n  p = isl_printer_to_str(isl_multi_aff_get_ctx(tiling));\n  p = autosa_array_ref_group_print_name(group, p);\n  local_name = isl_printer_get_str(p);\n  isl_printer_free(p);\n  tiling = isl_multi_aff_set_tuple_name(tiling, isl_dim_out, local_name);\n  free(local_name);\n\n  return tiling;\n}\n\nvoid print_io_grouping_info(FILE *fp, struct autosa_kernel *kernel)\n{\n  const char *io_types[] = {\"AUTOSA_INT_IO\", \"AUTOSA_EXT_IO\", \"AUTOSA_UNKNOWN_IO\"};\n\n  fprintf(fp, \"================= IO Grouping Information =================\\n\");\n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    fprintf(fp, \"===================================================\\n\");\n    fprintf(fp, \"Array: 
%s\\n\", local->array->name);\n    fprintf(fp, \"===================================================\\n\");\n    fprintf(fp, \"local: %d\\n\", local->array->local);\n    fprintf(fp, \"n_io_groups: %d\\n\", local->n_io_group);\n    fprintf(fp, \"n_drain_groups: %d\\n\", local->drain_group? 1 : 0);\n    for (int j = 0; j < local->n_io_group; j++) {\n      struct autosa_array_ref_group *group = local->io_groups[j];\n      fprintf(fp, \"------------------------------\\n\");\n      fprintf(fp, \"IO Group: %d\\n\", j);\n      fprintf(fp, \"------------------------------\\n\");\n      fprintf(fp, \"copy_in: %d\\n\", group->copy_in);\n      fprintf(fp, \"copy_out: %d\\n\", group->copy_out);\n      fprintf(fp, \"io_type: %s\\n\", io_types[group->io_type]);\n      char *vec_str = isl_vec_to_str(group->dir);\n      fprintf(fp, \"io_dir: %s\\n\", vec_str);\n      free(vec_str);\n    }\n    if (local->drain_group) {\n      struct autosa_array_ref_group *group = local->drain_group;\n      fprintf(fp, \"------------------------------\\n\");\n      fprintf(fp, \"Drain Group\\n\");\n      fprintf(fp, \"------------------------------\\n\");\n      fprintf(fp, \"copy_in: %d\\n\", group->copy_in);\n      fprintf(fp, \"copy_out: %d\\n\", group->copy_out);\n      fprintf(fp, \"io_type: %s\\n\", io_types[group->io_type]);\n      char *vec_str = isl_vec_to_str(group->dir);\n      fprintf(fp, \"io_dir: %s\\n\", vec_str);\n      free(vec_str);      \n    }\n  }\n  fprintf(fp, \"================= IO Grouping Information =================\\n\");\n}"
  },
  {
    "path": "src/autosa_comm.h",
    "content": "#ifndef _AUTOSA_COMM_H\n#define _AUTOSA_COMM_H\n\n#include \"autosa_common.h\"\n\n#if defined(__cplusplus)\nextern \"C\" {\n#endif   \n\nisl_stat sa_io_construct_optimize(struct autosa_kernel *kernel, struct autosa_gen *gen);\nenum autosa_group_access_type autosa_array_ref_group_type(\n\tstruct autosa_array_ref_group *group);\nenum autosa_group_access_type autosa_cpu_array_ref_group_type(\n\tstruct autosa_array_ref_group *group);\t\nstruct autosa_array_tile *autosa_array_ref_group_tile(\n\tstruct autosa_array_ref_group *group);  \n__isl_give isl_printer *autosa_array_ref_group_print_name(\n\tstruct autosa_array_ref_group *group, __isl_take isl_printer *p);\n__isl_give isl_union_map *autosa_io_group_ref_access_relation(\n  struct autosa_array_ref_group *group,\n  struct autosa_stmt_access *ref,\n  int read, int write);\n__isl_give isl_union_map *autosa_array_ref_group_access_relation(\n\tstruct autosa_array_ref_group *group, int read, int write);\t\n__isl_give isl_union_map *autosa_io_group_access_relation(\n  struct autosa_array_ref_group *group, \n  struct autosa_kernel *kernel,\n  int read, int write);\n__isl_give isl_union_map *autosa_drain_group_ref_access_relation(\n  struct autosa_array_ref_group *group,\n  struct autosa_stmt_access *ref,\n  int read, int write, __isl_keep isl_union_set *domain);\t\n__isl_give isl_union_map *group_tagged_access_relation(\n\tstruct autosa_array_ref_group *group);\n__isl_give isl_union_map *remove_local_accesses_flow(\n\tstruct autosa_prog *prog, __isl_take isl_union_map *tagged,\n\t__isl_take isl_union_map *access, __isl_take isl_union_map *sched,\n\tint read);\t\n__isl_give isl_union_map *wrapped_reference_to_access(\n\t__isl_take isl_union_set *ref, __isl_take isl_union_map *tagged);\t\n__isl_give isl_union_map *remove_local_accesses(\n\tstruct autosa_prog *prog, __isl_take isl_union_map *tagged,\n\t__isl_take isl_union_map *access, __isl_take isl_union_map *sched,\n\tint read);\t\n__isl_give isl_union_map 
*remove_local_accesses_group_flow(\n\tstruct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n\t__isl_take isl_union_map *access, __isl_keep isl_union_map *prefix,\n\tint read);\n__isl_give isl_union_map *remove_local_accesses_group(\n\tstruct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n\t__isl_take isl_union_map *access, __isl_keep isl_union_map *prefix,\n\tint read);\t\n__isl_give isl_union_map *io_comm_access_ref(\n  struct autosa_kernel *kernel, __isl_keep isl_schedule_node *node,\n  struct autosa_array_ref_group *group, \n  struct autosa_stmt_access *ref,\n  int read);\t\n__isl_give isl_union_map *io_comm_access(\n  struct autosa_kernel *kernel, __isl_keep isl_schedule_node *node,\n  struct autosa_array_ref_group *group, int read);\t\nvoid free_group_pair(void *user);\nstruct autosa_array_tile *create_register_tiling(\n  isl_schedule_node *node,\n  struct autosa_array_ref_group *group,\n  struct autosa_stmt_access *ref);\n__isl_give isl_map *group_tile(struct autosa_array_ref_group *group);\t\n__isl_give isl_map *group_tile_buffer(struct autosa_array_ref_group *group,\n  struct autosa_array_tile *tile);\nint get_io_group_n_lane(struct autosa_hw_module *module, \n  struct autosa_pe_dummy_module *dummy_module,\n  struct autosa_array_ref_group *group);\n__isl_give isl_multi_aff *autosa_array_ref_group_recompute_tiling(\n  struct autosa_array_tile *tile,\n  struct autosa_array_ref_group *group,\n  int depth);  \nisl_bool is_io_module_valid(\n  __isl_keep isl_schedule_node *node,  \n  struct autosa_kernel *kernel, \n  struct autosa_array_ref_group *group, int read);  \nvoid print_io_grouping_info(FILE *fp, struct autosa_kernel *kernel);\n\n#if defined(__cplusplus)\n}\n#endif \n\n#endif"
  },
  {
    "path": "src/autosa_common.cpp",
    "content": "/* Defines functions used for AutoSA structs. */\n\n#include <isl/id.h>\n#include <cJSON/cJSON.h>\n\n#include \"autosa_common.h\"\n#include \"autosa_utils.h\"\n#include \"autosa_print.h\"\n\n/****************************************************************\n * AutoSA kernel\n ****************************************************************/\n/* Free the AutoSA kernel struct. */\nvoid *autosa_kernel_free(struct autosa_kernel *kernel)\n{\n  if (!kernel)\n    return NULL;\n\n  isl_schedule_free(kernel->schedule);\n  isl_ast_node_free(kernel->tree);\n  isl_union_map_free(kernel->sizes);\n  isl_union_map_free(kernel->used_sizes);\n  isl_union_set_free(kernel->core);\n  isl_set_free(kernel->context);\n  isl_multi_pw_aff_free(kernel->sa_grid_size);\n  isl_union_set_free(kernel->arrays);\n  isl_union_pw_multi_aff_free(kernel->copy_schedule);\n  isl_space_free(kernel->space);\n  isl_id_list_free(kernel->block_ids);\n  isl_id_list_free(kernel->thread_ids);\n  isl_id_list_free(kernel->pe_ids);\n  isl_union_set_free(kernel->pe_filter);\n  isl_multi_pw_aff_free(kernel->grid_size);\n  isl_ast_expr_free(kernel->grid_size_expr);\n  isl_union_pw_multi_aff_free(kernel->contraction);\n  isl_union_set_free(kernel->expanded_domain);\n  isl_set_free(kernel->host_domain);\n  isl_union_set_free(kernel->domain);\n  for (int i = 0; i < kernel->n_array; ++i)\n  {\n    struct autosa_local_array_info *array = &kernel->array[i];\n    for (int j = 0; j < array->n_group; ++j)\n      autosa_array_ref_group_free(array->groups[j]);\n    free(array->groups);    \n    for (int j = 0; j < array->n_pe_group; ++j)\n      autosa_array_ref_group_free(array->pe_groups[j]);\n    free(array->pe_groups);\n    for (int j = 0; j < array->n_io_group; ++j)\n      autosa_array_ref_group_free(array->io_groups[j]);\n    free(array->io_groups);\n    autosa_array_ref_group_free(array->drain_group);\n\n    isl_multi_pw_aff_free(array->bound);\n    isl_ast_expr_free(array->bound_expr);\n    \n    
isl_pw_qpolynomial_free(array->serialize_bound);\n  }\n  if (kernel->array) {\n    delete[] kernel->array;\n    //free(kernel->array);\n  }\n\n  for (int i = 0; i < kernel->n_var; i++)\n  {\n    free(kernel->var[i].name);\n    isl_vec_free(kernel->var[i].size);\n  }\n  free(kernel->var);\n  delete kernel->tuning_program;\n\n  free(kernel);\n  return NULL;\n}\n\n/* Create a copy of the AutoSA kernel struct \"kernel\".\n * The isl objects are copied, but pointer members such as \"array\", \"var\"\n * and \"tuning_program\" are shared with \"kernel\" (shallow copy). */\nstruct autosa_kernel *autosa_kernel_copy(struct autosa_kernel *kernel)\n{\n  struct autosa_kernel *kernel_dup = (struct autosa_kernel *)malloc(\n      sizeof(struct autosa_kernel));\n  kernel_dup->ctx = kernel->ctx;\n  kernel_dup->schedule = isl_schedule_copy(kernel->schedule);\n  kernel_dup->scop = kernel->scop;\n  kernel_dup->options = kernel->options;\n  kernel_dup->n_sa_dim = kernel->n_sa_dim;\n  for (int i = 0; i < kernel->n_sa_dim; i++)\n  {\n    kernel_dup->sa_dim[i] = kernel->sa_dim[i];\n  }\n  kernel_dup->array_part_w = kernel->array_part_w;\n  kernel_dup->space_w = kernel->space_w;\n  kernel_dup->time_w = kernel->time_w;\n  kernel_dup->type = kernel->type;\n  kernel_dup->sa_grid_size = isl_multi_pw_aff_copy(kernel->sa_grid_size);\n  kernel_dup->sizes = isl_union_map_copy(kernel->sizes);\n  kernel_dup->used_sizes = isl_union_map_copy(kernel->used_sizes);\n  kernel_dup->id = kernel->id;\n  kernel_dup->space_time_id = kernel->space_time_id;\n  kernel_dup->core = isl_union_set_copy(kernel->core);\n  kernel_dup->arrays = isl_union_set_copy(kernel->arrays);\n  kernel_dup->n_array = kernel->n_array;\n  kernel_dup->array = kernel->array;\n  kernel_dup->copy_schedule = isl_union_pw_multi_aff_copy(kernel->copy_schedule);\n  kernel_dup->copy_schedule_dim = kernel->copy_schedule_dim;\n  kernel_dup->space = isl_space_copy(kernel->space);\n  kernel_dup->tree = isl_ast_node_copy(kernel->tree);\n  kernel_dup->n_var = kernel->n_var;\n  kernel_dup->var = kernel->var;\n  kernel_dup->block_ids = isl_id_list_copy(kernel->block_ids);\n  kernel_dup->thread_ids = 
isl_id_list_copy(kernel->thread_ids);\n  kernel_dup->pe_ids = isl_id_list_copy(kernel->pe_ids);\n  kernel_dup->pe_filter = isl_union_set_copy(kernel->pe_filter);\n  kernel_dup->n_grid = kernel->n_grid;\n  kernel_dup->n_block = kernel->n_block;\n  for (int i = 0; i < kernel->n_grid; i++)\n  {\n    kernel_dup->grid_dim[i] = kernel->grid_dim[i];\n  }\n  for (int i = 0; i < kernel->n_block; i++)\n  {\n    kernel_dup->block_dim[i] = kernel->block_dim[i];\n  }\n  kernel_dup->grid_size = isl_multi_pw_aff_copy(kernel->grid_size);\n  kernel_dup->grid_size_expr = isl_ast_expr_copy(kernel->grid_size_expr);\n  kernel_dup->context = isl_set_copy(kernel->context);\n  kernel_dup->contraction = isl_union_pw_multi_aff_copy(kernel->contraction);\n  kernel_dup->expanded_domain = isl_union_set_copy(kernel->expanded_domain);\n  kernel_dup->host_domain = isl_set_copy(kernel->host_domain);\n  kernel_dup->domain = isl_union_set_copy(kernel->domain);\n  kernel_dup->single_statement = kernel->single_statement;\n  kernel_dup->sparse = kernel->sparse;\n  kernel_dup->vec_len = kernel->vec_len;\n  kernel_dup->n_nzero = kernel->n_nzero;\n  kernel_dup->compress_ratio = kernel->compress_ratio;\n  kernel_dup->n_meta_data = kernel->n_meta_data;\n  kernel_dup->eff_compress_ratio = kernel->eff_compress_ratio;\n\n  // TODO: Deep-copy\n  kernel_dup->tuning_program = kernel->tuning_program;\n\n  return kernel_dup;\n}\n\n/* Allocate a new AutoSA kernel struct with the given schedule. 
*/\nstruct autosa_kernel *autosa_kernel_from_schedule(__isl_take isl_schedule *schedule)\n{\n  struct autosa_kernel *kernel = (struct autosa_kernel *)malloc(\n      sizeof(struct autosa_kernel));\n  kernel->ctx = isl_schedule_get_ctx(schedule);\n  kernel->schedule = schedule;\n  kernel->scop = NULL;\n  kernel->prog = NULL;\n  kernel->options = NULL;\n  kernel->n_sa_dim = 0;\n  kernel->array_part_w = 0;\n  kernel->space_w = 0;\n  kernel->time_w = 0;\n  kernel->type = 0;\n  kernel->sa_grid_size = NULL;\n  kernel->sizes = NULL;\n  kernel->used_sizes = NULL;\n  kernel->id = 0;\n  kernel->core = NULL;\n  kernel->arrays = NULL;\n  kernel->n_array = 0;\n  kernel->array = NULL;\n  kernel->copy_schedule = NULL;\n  kernel->copy_schedule_dim = -1;\n  kernel->space = NULL;\n  kernel->tree = NULL;\n  kernel->n_var = 0;\n  kernel->var = NULL;\n  kernel->block_ids = NULL;\n  kernel->thread_ids = NULL;\n  kernel->pe_ids = NULL;\n  kernel->pe_filter = NULL;\n  kernel->n_grid = 0;\n  kernel->n_block = 0;\n  kernel->grid_size = NULL;\n  kernel->grid_size_expr = NULL;\n  kernel->context = NULL;\n  kernel->contraction = NULL;\n  kernel->expanded_domain = NULL;\n  kernel->host_domain = NULL;\n  kernel->domain = NULL;\n  kernel->single_statement = 0;\n  kernel->sparse = 0;\n  kernel->vec_len = 0;\n  kernel->n_nzero = 0;\n  kernel->compress_ratio = 0;\n  kernel->n_meta_data = 0;\n  kernel->eff_compress_ratio = 0;\n  kernel->tuning_program = NULL;\n\n  return kernel;\n}\n\nstruct autosa_kernel *autosa_kernel_alloc(isl_ctx *ctx, struct ppcg_scop *scop)\n{\n  struct autosa_kernel *kernel;\n  isl_space *space;\n  isl_map *id;\n\n  if (!scop)\n    return NULL;\n\n  kernel = isl_calloc_type(ctx, struct autosa_kernel);\n  if (!kernel)\n    return NULL;\n\n  kernel->ctx = ctx;\n  kernel->scop = scop;\n  kernel->prog = NULL;\n  kernel->options = NULL;\n  kernel->n_sa_dim = 0;\n  kernel->array_part_w = 0;\n  kernel->space_w = 0;\n  kernel->time_w = 0;\n  kernel->type = 0;\n  kernel->sa_grid_size = 
NULL;\n  kernel->sizes = NULL;\n  kernel->used_sizes = NULL;\n  kernel->id = 0;\n  kernel->core = NULL;\n  kernel->arrays = NULL;\n  kernel->n_array = 0;\n  kernel->array = NULL;\n  kernel->copy_schedule = NULL;\n  kernel->copy_schedule_dim = -1;\n  kernel->space = NULL;\n  kernel->tree = NULL;\n  kernel->n_var = 0;\n  kernel->var = NULL;\n  kernel->block_ids = NULL;\n  kernel->thread_ids = NULL;\n  kernel->pe_ids = NULL;\n  kernel->pe_filter = NULL;\n  kernel->n_grid = 0;\n  kernel->n_block = 0;\n  kernel->grid_size = NULL;\n  kernel->grid_size_expr = NULL;\n  kernel->context = NULL;\n  kernel->contraction = NULL;\n  kernel->expanded_domain = NULL;\n  kernel->host_domain = NULL;\n  kernel->domain = NULL;\n  kernel->single_statement = 0;  \n  kernel->sparse = 0;\n  kernel->vec_len = 0;\n  kernel->n_nzero = 0;\n  kernel->compress_ratio = 0;\n  kernel->n_meta_data = 0;\n  kernel->eff_compress_ratio = 0;\n  kernel->tuning_program = NULL;\n\n  return kernel;\n}\n\n/****************************************************************\n * AutoSA access\n ****************************************************************/\n/* Create an identical mapping. */\nstatic __isl_give isl_map *same(__isl_take isl_space *domain_space)\n{\n  isl_space *space;\n  isl_aff *aff;\n  isl_multi_aff *next;\n\n  space = isl_space_map_from_set(domain_space);\n  next = isl_multi_aff_identity(space);\n\n  return isl_map_from_multi_aff(next);\n}\n\n/* Construct a map from domain_space to domain_space that increments\n * the dimension at position \"pos\" and leaves all other dimensions constant. 
\n */\nstatic __isl_give isl_map *next(__isl_take isl_space *domain_space, int pos)\n{\n  isl_space *space;\n  isl_aff *aff;\n  isl_multi_aff *next;\n\n  space = isl_space_map_from_set(domain_space);\n  next = isl_multi_aff_identity(space);\n  aff = isl_multi_aff_get_aff(next, pos);\n  aff = isl_aff_add_constant_si(aff, 1);\n  next = isl_multi_aff_set_aff(next, pos, aff);\n\n  return isl_map_from_multi_aff(next);\n}\n\n/* Check if \"access\" has a stride-0 access.\n * The access is already transformed to the scheduling domain.\n * We first create an identity mapping \"next_element\" that maps the accessed\n * elements to the same elements.\n * Then, we create a mapping \"map\" that maps the array elements accessed by the\n * current iteration to the elements accessed by the next iteration, where the\n * next iteration increments the innermost scheduling dimension.\n * The access is stride-0 if \"map\" is a subset of \"next_element\".\n * Note that \"pos\" is not used by this check.\n */\nisl_bool access_is_stride_zero(__isl_keep isl_map *access, int pos)\n{\n  isl_space *space;\n  int dim;\n  isl_map *next_element, *map, *next_iter;\n  isl_set *accessed;\n  isl_bool empty, zero;\n\n  space = isl_map_get_space(access);\n  space = isl_space_range(space);\n  dim = isl_space_dim(space, isl_dim_set);\n  if (dim == 0)\n    next_element = isl_map_empty(isl_space_map_from_set(space));\n  else\n    next_element = same(space);\n\n  accessed = isl_map_range(isl_map_copy(access));\n  map = isl_map_copy(next_element);\n  map = isl_map_intersect_domain(map, isl_set_copy(accessed));\n  map = isl_map_intersect_range(map, accessed);\n  empty = isl_map_is_empty(map);\n  isl_map_free(map);\n\n  if (empty < 0 || empty)\n  {\n    isl_map_free(next_element);\n    return empty;\n  }\n\n  space = isl_map_get_space(access);\n  space = isl_space_domain(space);\n  next_iter = next(space, isl_map_dim(access, isl_dim_in) - 1);\n  map = isl_map_apply_domain(next_iter, isl_map_copy(access));\n  map = isl_map_apply_range(map, isl_map_copy(access));\n  zero = 
isl_map_is_subset(map, next_element);\n\n  isl_map_free(next_element);\n  isl_map_free(map);\n\n  return zero;\n}\n\n/* Check if \"access\" has a stride-1 access along array dimension \"pos\".\n * The access is already transformed to the scheduling domain.\n * We first create a mapping \"next_element\" that maps each array element\n * to the element whose index along dimension \"pos\" is larger by one.\n * Then, we create a mapping \"map\" that maps the array elements accessed by the\n * current iteration to the elements accessed by the next iteration, where the\n * next iteration increments the innermost scheduling dimension.\n * The access is stride-1 if \"map\" is non-empty and a subset of \"next_element\".\n */\nisl_bool access_is_stride_one(__isl_keep isl_map *access, int pos)\n{\n  isl_space *space;\n  int dim;\n  isl_map *next_element, *map, *next_iter;\n  isl_set *accessed;\n  isl_bool empty, coalesced;\n\n  space = isl_map_get_space(access);\n  space = isl_space_range(space);\n  dim = isl_space_dim(space, isl_dim_set);\n  if (dim == 0)\n    next_element = isl_map_empty(isl_space_map_from_set(space));\n  else\n    next_element = next(space, pos);\n\n  accessed = isl_map_range(isl_map_copy(access));\n  map = isl_map_copy(next_element);\n  map = isl_map_intersect_domain(map, isl_set_copy(accessed));\n  map = isl_map_intersect_range(map, accessed);\n  empty = isl_map_is_empty(map);\n  isl_map_free(map);\n\n  if (empty < 0 || empty)\n  {\n    isl_map_free(next_element);\n    return empty;\n  }\n\n  space = isl_map_get_space(access);\n  space = isl_space_domain(space);\n  next_iter = next(space, isl_map_dim(access, isl_dim_in) - 1);  \n  map = isl_map_apply_domain(next_iter, isl_map_copy(access));\n  map = isl_map_apply_range(map, isl_map_copy(access));\n  if (isl_map_is_empty(map))\n  {\n    isl_map_free(next_element);\n    isl_map_free(map);\n    return isl_bool_false;\n  }\n  coalesced = isl_map_is_subset(map, next_element);\n\n  isl_map_free(next_element);\n  isl_map_free(map);\n\n  return coalesced;\n}\n\nvoid *autosa_acc_free(struct autosa_acc *acc)\n{\n  if 
(!acc)\n    return NULL;\n\n  isl_map_free(acc->tagged_map);\n  isl_map_free(acc->map);\n  isl_space_free(acc->id);\n\n  free(acc);\n\n  return NULL;\n}\n\nstruct autosa_io_buffer *autosa_io_buffer_alloc()\n{\n  struct autosa_io_buffer *io_buffer = (struct autosa_io_buffer *)malloc(sizeof(struct autosa_io_buffer));\n  io_buffer->tile = NULL;\n  io_buffer->level = -1;\n  io_buffer->n_lane = -1;\n  io_buffer->serialize = -1;\n  io_buffer->sparse = -1;\n  io_buffer->vec_len = -1;  \n  io_buffer->tuning_tile = NULL;\n  io_buffer->hoist_depth = -1;\n  io_buffer->hoist_domain = NULL;\n\n  return io_buffer;\n}\n\n/****************************************************************\n * AutoSA dep\n ****************************************************************/\n/* Free up the dependence. */\nvoid *autosa_dep_free(__isl_take struct autosa_dep *dep)\n{\n  if (!dep)\n    return NULL;\n\n  if (dep->src)\n    dep->src = isl_id_free(dep->src);\n  if (dep->dest)\n    dep->dest = isl_id_free(dep->dest);\n  if (dep->disvec)\n    isl_vec_free(dep->disvec);\n  if (dep->src_sched_domain)\n    isl_set_free(dep->src_sched_domain);\n  if (dep->dest_sched_domain)\n    isl_set_free(dep->dest_sched_domain);\n  if (dep->isl_dep)\n    isl_basic_map_free(dep->isl_dep);\n\n  free(dep);\n\n  return NULL;\n}\n\n/****************************************************************\n * AutoSA iterator\n ****************************************************************/\n\n__isl_null struct autosa_iter *autosa_iter_free(struct autosa_iter *iter)\n{\n  if (!iter)\n    return NULL;\n\n  free(iter->name);\n  free(iter->ts_name);\n  isl_aff_free(iter->lb);\n  isl_aff_free(iter->ub);\n\n  free(iter);\n\n  return NULL;\n}\n\n/****************************************************************\n * AutoSA array\n ****************************************************************/\n\nstatic void free_array_info(struct autosa_prog *prog)\n{\n  int i;\n\n  for (i = 0; i < prog->n_array; ++i)\n  {\n    
free(prog->array[i].type);\n    free(prog->array[i].name);\n    isl_multi_pw_aff_free(prog->array[i].bound);\n    isl_ast_expr_free(prog->array[i].bound_expr);\n    isl_space_free(prog->array[i].space);\n    isl_set_free(prog->array[i].declared_extent);\n    isl_set_free(prog->array[i].extent);\n    isl_ast_expr_free(prog->array[i].declared_size);\n    free(prog->array[i].refs);\n    isl_union_map_free(prog->array[i].dep_order);\n  }\n  //free(prog->array);\n  delete[] prog->array;\n}\n\n/* Is the array \"array\" being extracted a read-only scalar?\n *\n * That is, is \"array\" a scalar that is never possibly written to.\n * An array containing structures is never considered to be a scalar.\n */\nstatic int is_read_only_scalar(struct autosa_array_info *array,\n                               struct autosa_prog *prog)\n{\n  isl_set *space;\n  isl_union_map *write;\n  int empty;\n\n  if (array->has_compound_element)\n    return 0;\n  if (array->n_index != 0)\n    return 0;\n\n  write = isl_union_map_copy(prog->may_write);\n  space = isl_set_universe(isl_space_copy(array->space));\n  write = isl_union_map_intersect_range(write,\n                                        isl_union_set_from_set(space));\n  empty = isl_union_map_is_empty(write);\n  isl_union_map_free(write);\n\n  return empty;\n}\n\n/* Compute and return the extent of \"array\", taking into account the set of\n * accessed elements.\n *\n * In particular, the extent in the outer dimension is taken\n * from \"accessed\", while the extents in the remaining dimensions\n * are taken from array->extent.\n *\n * The extent in the outer dimension cannot be taken from array->extent\n * because that may be unbounded.  
Furthermore, even if it is bounded,\n * it may be larger than the piece of the array that is being accessed.\n */\nstatic __isl_give isl_set *compute_extent(struct pet_array *array,\n                                          __isl_keep isl_set *accessed)\n{\n  int n_index;\n  isl_id *id;\n  isl_set *outer;\n  isl_set *extent;\n\n  extent = isl_set_copy(array->extent);\n\n  n_index = isl_set_dim(accessed, isl_dim_set);\n  if (n_index == 0)\n    return extent;\n\n  extent = isl_set_project_out(extent, isl_dim_set, 0, 1);\n  outer = isl_set_copy(accessed);\n  outer = isl_set_project_out(outer, isl_dim_set, 1, n_index - 1);\n  extent = isl_set_flat_product(outer, extent);\n  id = isl_set_get_tuple_id(accessed);\n  extent = isl_set_set_tuple_id(extent, id);\n\n  return extent;\n}\n\n/* Return the name of the outer array (of structs) accessed by \"access\".\n */\nstatic const char *get_outer_array_name(__isl_keep isl_map *access)\n{\n  isl_space *space;\n  const char *name;\n\n  space = isl_space_range(isl_map_get_space(access));\n  while (space && isl_space_is_wrapping(space))\n    space = isl_space_domain(isl_space_unwrap(space));\n  name = isl_space_get_tuple_name(space, isl_dim_set);\n  isl_space_free(space);\n\n  return name;\n}\n\n/* Collect all references to the given array and store pointers to them\n * in array->refs.\n */\nstatic isl_stat collect_references(struct autosa_prog *prog,\n                                   struct autosa_array_info *array)\n{\n  int i;\n  int n;\n\n  n = 0;\n  for (i = 0; i < prog->n_stmts; ++i)\n  {\n    struct autosa_stmt *stmt = &prog->stmts[i];\n    struct autosa_stmt_access *access;\n\n    for (access = stmt->accesses; access; access = access->next)\n    {\n      const char *name;\n      name = get_outer_array_name(access->access);\n      if (name && !strcmp(array->name, name))\n        n++;\n    }\n  }\n\n  array->refs = isl_alloc_array(prog->ctx, struct autosa_stmt_access *, n);\n  if (!array->refs)\n    return 
isl_stat_error;\n  array->n_ref = n;\n\n  n = 0;\n  for (i = 0; i < prog->n_stmts; ++i)\n  {\n    struct autosa_stmt *stmt = &prog->stmts[i];\n    struct autosa_stmt_access *access;\n\n    for (access = stmt->accesses; access; access = access->next)\n    {\n      const char *name;\n      name = get_outer_array_name(access->access);\n      if (!name || strcmp(array->name, name))\n        continue;\n\n      array->refs[n++] = access;\n    }\n  }\n\n  return isl_stat_ok;\n}\n\n/* Is \"array\" only accessed as individual, fixed elements?\n * That is, does each access to \"array\" access a single, fixed element?\n */\nstatic isl_bool only_fixed_element_accessed(struct autosa_array_info *array)\n{\n  int i;\n\n  for (i = 0; i < array->n_ref; ++i)\n    if (!array->refs[i]->fixed_element)\n      return isl_bool_false;\n\n  return isl_bool_true;\n}\n\n/* Compute bounds on the host array \"pa\" based on the corresponding\n * accessed elements in \"arrays\"\n * and collect all references to the array.\n * Store the results in \"info\".\n *\n * If the array is zero-dimensional and does not contain structures,\n * i.e., if the array is a scalar, we check whether it is read-only.\n * We also check whether the array is accessed at all.\n */\nstatic isl_stat extract_array_info(struct autosa_prog *prog,\n                                   struct autosa_array_info *info, struct pet_array *pa,\n                                   __isl_keep isl_union_set *arrays)\n{\n  int empty;\n  const char *name;\n  int n_index;\n  isl_multi_pw_aff *bounds;\n  isl_set *accessed, *extent;\n\n  n_index = isl_set_dim(pa->extent, isl_dim_set);\n  name = isl_set_get_tuple_name(pa->extent);\n\n  info->space = isl_set_get_space(pa->extent);\n  info->name = strdup(name);\n  info->n_index = n_index;\n  info->linearize = prog->scop->options->linearize_device_arrays;\n\n  info->type = strdup(pa->element_type);\n  info->size = pa->element_size;\n  info->local = pa->declared && !pa->exposed;\n  
info->has_compound_element = pa->element_is_record;\n  info->read_only_scalar = is_read_only_scalar(info, prog);\n\n  info->declared_extent = isl_set_copy(pa->extent);\n  accessed = isl_union_set_extract_set(arrays,\n                                       isl_space_copy(info->space));\n  empty = isl_set_is_empty(accessed);\n  extent = compute_extent(pa, accessed);\n  isl_set_free(accessed);\n  info->extent = extent;\n  if (empty < 0)\n    return isl_stat_error;\n  info->accessed = !empty;\n  bounds = ppcg_size_from_extent(isl_set_copy(extent));\n  bounds = isl_multi_pw_aff_gist(bounds, isl_set_copy(prog->context));\n  if (!bounds)\n    return isl_stat_error;\n  if (!isl_multi_pw_aff_is_cst(bounds))\n    info->linearize = prog->scop->options->linearize_device_arrays;\n    //info->linearize = 1;\n  info->bound = bounds;\n\n  if (collect_references(prog, info) < 0)\n    return isl_stat_error;\n  info->only_fixed_element = only_fixed_element_accessed(info);\n  info->declare_local = 0;\n  info->dep_order = NULL;\n  info->declared_size = NULL;\n  info->global = 0;\n  info->bound_expr = NULL;\n\n  /* AutoSA Extended */\n  info->n_lane = 0;\n  info->local_array = NULL;\n  info->copy_in = 0;\n  info->copy_out = 0;  \n  /* AutoSA Extended */\n\n  return isl_stat_ok;\n}\n\n/* Remove independence from the order constraints \"order\" on array \"array\".\n * Since the pairs of iterations in the filter relation of an independence\n * are guaranteed to be completely independent by the user, there is\n * no need to ensure that live ranges are ordered along those pairs.\n * We make an exception for local variables, though, as the independence\n * guarantee does not apply to those.\n *\n * The order constraints are used in two places.\n * Those on scalars are used in check_scalar_live_ranges to check if\n * we need to force the scalar to be private.  
Any non-local scalar\n * should not be forced to be private if it only appears in independent loops.\n * Those on non-scalars are added to the coincidence constraints\n * in compute_schedule because we do not support any array expansion.\n * Accesses to non-local arrays should not prevent a loop from being\n * considered coincident, so we should indeed remove those constraints\n * from the order constraints.\n */\nstatic __isl_give isl_union_map *remove_independences(struct autosa_prog *prog,\n                                                      struct autosa_array_info *array, __isl_take isl_union_map *order)\n{\n  int i;\n\n  for (i = 0; i < prog->scop->pet->n_independence; ++i)\n  {\n    struct pet_independence *pi = prog->scop->pet->independences[i];\n    if (isl_union_set_contains(pi->local, array->space))\n      continue;\n\n    order = isl_union_map_subtract(order,\n                                   isl_union_map_copy(pi->filter));\n  }\n\n  return order;\n}\n\n/* Can \"array\" be mapped to private memory?\n * That is, is it only accessed as individual elements with\n * constant index expressions?\n */\nstatic isl_bool autosa_array_can_be_private(struct autosa_array_info *array)\n{\n  if (!array)\n    return isl_bool_error;\n  return array->only_fixed_element ? isl_bool_true : isl_bool_false;\n}\n\n/* For each array in \"prog\", store the (untagged) order dependences\n * derived from the array in array->dep_order.\n * In particular, consider all references that access the given array\n * and take the order dependences that have one of these references\n * as source.  
(Since an order dependence relates two references to\n * the same array, the target of these order dependences will also\n * be one of these references.)\n * Additionally, store the union of these array->dep_order relations\n * for all arrays that cannot be mapped to private memory in prog->array_order.\n */\nstatic void collect_order_dependences(struct autosa_prog *prog)\n{\n  int i;\n  isl_space *space;\n  isl_union_map *accesses;\n\n  space = isl_union_map_get_space(prog->read);\n  prog->array_order = isl_union_map_empty(space);\n\n  accesses = isl_union_map_copy(prog->scop->tagged_reads);\n  accesses = isl_union_map_union(accesses,\n                                 isl_union_map_copy(prog->scop->tagged_may_writes));\n  accesses = isl_union_map_universe(accesses);\n  accesses = isl_union_map_apply_range(accesses,\n                                       isl_union_map_copy(prog->to_outer));\n\n  for (i = 0; i < prog->n_array; ++i)\n  {\n    struct autosa_array_info *array = &prog->array[i];\n    isl_set *set;\n    isl_union_set *uset;\n    isl_union_map *order;\n\n    set = isl_set_universe(isl_space_copy(array->space));\n    uset = isl_union_set_from_set(set);\n    uset = isl_union_map_domain(\n        isl_union_map_intersect_range(isl_union_map_copy(accesses),\n                                      uset));\n    order = isl_union_map_copy(prog->scop->tagged_dep_order);\n    order = isl_union_map_intersect_domain(order, uset);\n    order = isl_union_map_zip(order);\n    order = isl_union_set_unwrap(isl_union_map_domain(order));\n    order = remove_independences(prog, array, order);\n    array->dep_order = order;\n\n    if (autosa_array_can_be_private(array))\n      continue;\n\n    prog->array_order = isl_union_map_union(prog->array_order,\n                                            isl_union_map_copy(array->dep_order));\n  }\n\n  isl_union_map_free(accesses);\n}\n\n/* Construct a autosa_array_info for each array referenced by prog->scop and\n * collect them in 
prog->array.\n * \n * The sizes are based on the extents and the set of possibly accessed\n * elements by \"prog\".\n * If there are any member accesses involved, then they are first mapped\n * to the outer arrays of structs.\n * Only extract autosa_array_info entries for these outer arrays.\n * \n * If we are allowing live range reordering, then also set \n * the dep_order field. Otherwise leave it NULL.\n */\nisl_stat collect_array_info(struct autosa_prog *prog)\n{\n  int i;\n  isl_stat r = isl_stat_ok;\n  isl_union_set *arrays;\n\n  prog->n_array = 0;\n  //prog->array = isl_calloc_array(prog->ctx,\n  //                               struct autosa_array_info, prog->scop->pet->n_array);\n  prog->array = new autosa_array_info[prog->scop->pet->n_array];\n  if (!prog->array)\n    return isl_stat_error;\n\n  arrays = isl_union_map_range(isl_union_map_copy(prog->read));\n  arrays = isl_union_set_union(arrays,\n                               isl_union_map_range(isl_union_map_copy(prog->may_write)));\n\n  arrays = isl_union_set_apply(arrays,\n                               isl_union_map_copy(prog->to_outer));\n\n  arrays = isl_union_set_coalesce(arrays);\n\n  for (i = 0; i < prog->scop->pet->n_array; ++i)\n  {\n    isl_bool field;\n\n    field = isl_set_is_wrapping(prog->scop->pet->arrays[i]->extent);\n    if (field < 0)\n      break;\n    if (field)\n      continue;\n    if (extract_array_info(prog, &prog->array[prog->n_array++],\n                           prog->scop->pet->arrays[i], arrays) < 0)\n      r = isl_stat_error;\n  }\n  if (i < prog->scop->pet->n_array)\n    r = isl_stat_error;\n\n  isl_union_set_free(arrays);\n\n  if (prog->scop->options->live_range_reordering)\n    collect_order_dependences(prog);\n\n  return r;\n}\n\n/* Is \"array\" a read-only scalar?\n */\nint autosa_array_is_read_only_scalar(struct autosa_array_info *array)\n{\n  return array->read_only_scalar;\n}\n\n/* Check if a autosa array is a scalar.  
A scalar is a value that is not stored\n * as an array or through a pointer reference, but as a single data element.\n * At the moment, scalars are represented as zero-dimensional arrays.\n * Note that the single data element may be an entire structure.\n */\nint autosa_array_is_scalar(struct autosa_array_info *array)\n{\n  return array->n_index == 0;\n}\n\n/* Does \"kernel\" need to be passed an argument corresponding to array \"i\"?\n *\n * The argument is only needed if the kernel accesses this device memory.\n */\nint autosa_kernel_requires_array_argument(struct autosa_kernel *kernel, int i)\n{\n  return kernel->array[i].global;\n}\n\n/* If group->n_ref == 1, then group->refs was set by\n * populate_array_references to point directly into\n * group->array->refs and should not be freed.\n * If group->n_ref > 1, then group->refs was set by join_groups\n * to point to a newly allocated array.\n */\nstruct autosa_array_ref_group *autosa_array_ref_group_free(\n    struct autosa_array_ref_group *group)\n{\n  if (!group)\n    return NULL;\n  autosa_array_tile_free(group->local_tile); // TODO: fix it\n  autosa_array_tile_free(group->pe_tile);\n  isl_map_free(group->access);\n  if (group->n_ref > 1)\n    free(group->refs);\n  isl_vec_free(group->dir);\n  isl_vec_free(group->old_dir);\n  isl_multi_aff_free(group->io_trans);\n  isl_multi_aff_free(group->io_L1_trans);\n  isl_ast_expr_free(group->io_pe_expr);\n  isl_ast_expr_free(group->io_L1_pe_expr);\n  isl_ast_expr_free(group->io_pe_expr_boundary);\n  isl_ast_expr_free(group->io_L1_pe_expr_boundary);\n  /* Free io buffers */\n  for (int i = 0; i < group->n_io_buffer; i++)\n  {        \n    autosa_array_tile_free(group->io_buffers[i]->tile);\n    isl_union_set_free(group->io_buffers[i]->hoist_domain);\n    if (group->io_buffers[i]->tuning_tile) {      \n      delete group->io_buffers[i]->tuning_tile;      \n    }\n    free(group->io_buffers[i]);\n  }\n  free(group->io_buffers);\n  isl_schedule_free(group->io_schedule);\n  
if (group->io_L1_schedule)\n    isl_schedule_free(group->io_L1_schedule);\n  isl_schedule_free(group->io_L1_lower_schedule);\n  isl_union_pw_multi_aff_free(group->copy_schedule);\n  if (group->attached_drain_group)\n    autosa_array_ref_group_free(group->attached_drain_group);\n  group->tuning_refs.clear();\n  delete group->tuning_pe_tile;\n  delete group->tuning_local_tile;\n  //free(group);\n  delete group;\n\n  return NULL;\n}\n\nstruct autosa_array_ref_group *autosa_array_ref_group_init(\n    struct autosa_array_ref_group *group)\n{\n  group->local_array = NULL;\n  group->array = NULL;\n  group->nr = -1;\n  group->access = NULL;\n  group->write = -1;\n  group->exact_write = -1;\n  group->slice = -1;\n  group->min_depth = -1;\n  group->shared_tile = NULL;\n  group->private_tile = NULL;\n  group->local_tile = NULL;\n  group->n_ref = 0;\n  group->refs = NULL;\n  group->io_buffers = NULL;\n  group->n_io_buffer = 0;\n  group->io_type = AUTOSA_UNKNOWN_IO;\n  group->pe_io_dir = IO_UNKNOWN;\n  group->array_io_dir = IO_UNKNOWN;\n  group->dir = NULL;\n  group->old_dir = NULL;\n  group->io_trans = NULL;\n  group->io_L1_trans = NULL;\n  group->io_pe_expr = NULL;\n  group->io_L1_pe_expr = NULL;\n  group->io_pe_expr_boundary = NULL;\n  group->io_L1_pe_expr_boundary = NULL;\n  group->io_schedule = NULL;\n  group->io_L1_schedule = NULL;\n  group->io_L1_lower_schedule = NULL;\n  group->io_level = 0;\n  group->space_dim = 0;\n  group->n_lane = 0;\n  group->copy_schedule_dim = 0;\n  group->copy_schedule = NULL;\n  group->attached_drain_group = NULL;\n  group->tuning_pe_tile = NULL;\n  group->tuning_local_tile = NULL;\n\n  return group;\n}\n\nstruct autosa_array_tile *autosa_array_tile_free(struct autosa_array_tile *tile)\n{\n  int j;\n\n  if (!tile)\n    return NULL;\n\n  for (j = 0; j < tile->n; ++j)\n  {\n    isl_val_free(tile->bound[j].size);\n    isl_val_free(tile->bound[j].stride);\n    isl_aff_free(tile->bound[j].lb);\n    isl_aff_free(tile->bound[j].shift);\n  }\n  
free(tile->bound);\n  isl_multi_aff_free(tile->tiling);\n  free(tile);\n\n  return NULL;\n}\n\n/* Create a autosa_array_tile for an array of dimension \"n_index\".\n */\nstruct autosa_array_tile *autosa_array_tile_create(isl_ctx *ctx, int n_index)\n{\n  int i;\n  struct autosa_array_tile *tile;\n\n  tile = isl_calloc_type(ctx, struct autosa_array_tile);\n  if (!tile)\n    return NULL;\n\n  tile->ctx = ctx;\n  tile->bound = isl_alloc_array(ctx, struct autosa_array_bound, n_index);\n  if (!tile->bound)\n    return autosa_array_tile_free(tile);\n\n  tile->n = n_index;\n\n  for (i = 0; i < n_index; ++i)\n  {\n    tile->bound[i].size = NULL;\n    tile->bound[i].lb = NULL;\n    tile->bound[i].stride = NULL;\n    tile->bound[i].shift = NULL;\n  }\n\n  return tile;\n}\n\n/* Compute the size of the tile specified by \"tile\"\n * in number of elements and return the result.\n */\n__isl_give isl_val *autosa_array_tile_size(struct autosa_array_tile *tile)\n{\n  int i;\n  isl_val *size;\n\n  if (!tile)\n    return NULL;\n\n  size = isl_val_one(tile->ctx);\n\n  for (i = 0; i < tile->n; ++i)\n    size = isl_val_mul(size, isl_val_copy(tile->bound[i].size));\n\n  return size;\n}\n\n/****************************************************************\n * AutoSA statement\n ****************************************************************/\nstatic void *free_autosa_io_info(struct autosa_io_info *io_info)\n{\n  autosa_dep_free(io_info->dep);\n  isl_vec_free(io_info->dir);\n  isl_vec_free(io_info->old_dir);\n\n  free(io_info);\n  return NULL;\n}\n\nstatic void *free_stmts(struct autosa_stmt *stmts, int n)\n{\n  int i;\n\n  if (!stmts)\n    return NULL;\n\n  for (i = 0; i < n; ++i)\n  {\n    struct autosa_stmt_access *access, *next;\n\n    for (access = stmts[i].accesses; access; access = next)\n    {\n      next = access->next;\n      isl_id_free(access->ref_id);\n      isl_map_free(access->access);\n      isl_map_free(access->tagged_access);\n\n      for (int k = 0; k < access->n_io_info; 
k++)\n        free_autosa_io_info(access->io_info[k]);\n      free(access->io_info);\n\n      free(access);\n    }\n\n    isl_id_free(stmts[i].id);\n  }\n  free(stmts);\n\n  return NULL;\n}\n\n/* Has statement \"stmt\" been killed from \"scop\"?\n * That is, is the instance set of \"scop\" free from any\n * instances of \"stmt\"?\n */\nstatic isl_bool is_stmt_killed(struct ppcg_scop *scop, struct pet_stmt *stmt)\n{\n  isl_space *space;\n  isl_set *left;\n  isl_bool empty;\n\n  if (!scop || !stmt)\n    return isl_bool_error;\n  space = isl_set_get_space(stmt->domain);\n  left = isl_union_set_extract_set(scop->domain, space);\n  empty = isl_set_plain_is_empty(left);\n  isl_set_free(left);\n\n  return empty;\n}\n\n/* Given a tagged access relation to a single array \"tagged\", extract it\n * as a map, taking into account that the input may be empty.\n * If the access relation is empty, then it does not contain\n * any space information, so we try to recover it from the index\n * expression.\n * The space of the index expression is of the form I -> A,\n * with I the statement instances and A the array, or [I -> F] -> A,\n * with F the filters corresponding to arguments.\n * We first drop F, if present, obtaining I -> A.\n * Then we construct I -> R, with R the reference tag,\n * combine the two into I -> [R -> A] and uncurry to obtain\n * the final result [I -> R] -> A.\n * Note that the index expression may have a lower dimension\n * than that of the array, but this dimension is not used\n * if the access relation is empty.\n */\nstatic __isl_give isl_map *extract_single_tagged_access(\n    __isl_take isl_union_map *tagged, __isl_keep pet_expr *expr)\n{\n  int empty;\n  isl_id *id;\n  isl_space *space, *space2;\n  isl_multi_pw_aff *index;\n\n  empty = isl_union_map_is_empty(tagged);\n  if (empty < 0)\n    goto error;\n  if (!empty)\n    return isl_map_from_union_map(tagged);\n  isl_union_map_free(tagged);\n\n  index = pet_expr_access_get_index(expr);\n  space = 
isl_multi_pw_aff_get_space(index);\n  isl_multi_pw_aff_free(index);\n  if (isl_space_domain_is_wrapping(space))\n    space = isl_space_domain_factor_domain(space);\n  space2 = isl_space_copy(space);\n  space2 = isl_space_from_domain(isl_space_domain(space));\n  id = pet_expr_access_get_ref_id(expr);\n  space2 = isl_space_set_tuple_id(space2, isl_dim_out, id);\n  space = isl_space_range_product(space2, space);\n  space = isl_space_uncurry(space);\n\n  return isl_map_empty(space);\nerror:\n  isl_union_map_free(tagged);\n  return NULL;\n}\n\n/* Does the index expression \"index\" of \"expr\" represent an access\n * to a single element?\n * That is, is \"index\" completely specified?\n *\n * If \"expr\" accesses elements from different spaces (i.e., fields\n * of a structure), then it does not access a single element.\n * Otherwise, if the single space of the access matches the space\n * of \"index\", then the index expression is completely specified\n * (no pointer to a lower-dimensional slice of the accessed array)\n * and a single element is being accessed.\n */\nstatic isl_bool complete_index(__isl_keep pet_expr *expr,\n                               __isl_keep isl_multi_pw_aff *index)\n{\n  isl_union_map *read, *write, *all;\n  isl_map *map;\n  isl_space *space1, *space2;\n  isl_bool complete;\n\n  read = pet_expr_access_get_may_read(expr);\n  write = pet_expr_access_get_may_write(expr);\n  all = isl_union_map_union(read, write);\n  if (!all)\n    return isl_bool_error;\n  if (isl_union_map_n_map(all) != 1)\n  {\n    isl_union_map_free(all);\n    return isl_bool_false;\n  }\n  map = isl_map_from_union_map(all);\n  space1 = isl_map_get_space(map);\n  isl_map_free(map);\n  space2 = isl_multi_pw_aff_get_space(index);\n  complete = isl_space_tuple_is_equal(space1, isl_dim_out,\n                                      space2, isl_dim_out);\n  isl_space_free(space1);\n  isl_space_free(space2);\n\n  return complete;\n}\n\n/* Does \"expr\" access a single, fixed element 
(independently of the statement\n * instance)?\n * That is, does it have a completely specified constant index expression?\n *\n * Note that it is not sufficient for the index expression to be\n * piecewise constant.  isl_multi_pw_aff_is_cst therefore cannot be used.\n */\nstatic isl_bool accesses_fixed_element(__isl_keep pet_expr *expr)\n{\n  int i, n;\n  isl_multi_pw_aff *index;\n  isl_bool fixed = isl_bool_true;\n\n  index = pet_expr_access_get_index(expr);\n  if (!index)\n    return isl_bool_error;\n  n = isl_multi_pw_aff_dim(index, isl_dim_out);\n  for (i = 0; i < n; ++i)\n  {\n    isl_pw_aff *pa;\n\n    /* Examine the index expression of dimension i (not only the first one). */\n    pa = isl_multi_pw_aff_get_pw_aff(index, i);\n    fixed = (isl_pw_aff_n_piece(pa) == 1) ? isl_bool_true : isl_bool_false;\n    if (fixed)\n      fixed = isl_pw_aff_is_cst(pa);\n    isl_pw_aff_free(pa);\n    if (fixed < 0 || !fixed)\n      break;\n  }\n  if (fixed >= 0 && fixed)\n    fixed = complete_index(expr, index);\n  isl_multi_pw_aff_free(index);\n\n  return fixed;\n}\n\n/* Extract an autosa_stmt_access from \"expr\", append it to the list\n * that ends in *data->next_access and update the end of the list.\n * If the access expression performs a write, then it is considered\n * exact only if it appears in a single expression statement and\n * if its may access relation is equal to its must access relation.\n *\n * The combined set of may accesses may be a union if member accesses\n * are involved, but the entire set is derived from a single reference and\n * therefore from a single index expression.  
These accesses therefore\n * all map to the same outer array.\n */\nstatic int extract_access(__isl_keep pet_expr *expr, void *user)\n{\n  struct ppcg_extract_access_data *data = (struct ppcg_extract_access_data *)user;\n  isl_union_map *tagged;\n  struct autosa_stmt_access *access;\n  isl_ctx *ctx = pet_expr_get_ctx(expr);\n  isl_multi_pw_aff *index;\n\n  access = isl_alloc_type(ctx, struct autosa_stmt_access);\n  if (!access)\n    return -1;\n  access->next = NULL;\n  access->read = pet_expr_access_is_read(expr);\n  access->write = pet_expr_access_is_write(expr);\n  tagged = pet_expr_access_get_tagged_may_read(expr);\n  tagged = isl_union_map_union(tagged,\n                               pet_expr_access_get_tagged_may_write(expr));\n  tagged = isl_union_map_apply_range(tagged,\n                                     isl_union_map_copy(data->any_to_outer));\n  if (!access->write)\n  {\n    access->exact_write = 1;\n  }\n  else if (!data->single_expression)\n  {\n    access->exact_write = 0;\n  }\n  else\n  {\n    isl_union_map *must, *may;\n    may = isl_union_map_copy(tagged);\n    may = isl_union_map_domain_factor_domain(may);\n    must = pet_expr_access_get_must_write(expr);\n    access->exact_write = isl_union_map_is_equal(must, may);\n    isl_union_map_free(must);\n    isl_union_map_free(may);\n  }\n  index = pet_expr_access_get_index(expr);\n  access->n_index = isl_multi_pw_aff_dim(index, isl_dim_out);\n  isl_multi_pw_aff_free(index);\n  access->ref_id = pet_expr_access_get_ref_id(expr);\n  access->tagged_access = extract_single_tagged_access(tagged, expr);\n  access->access = isl_map_copy(access->tagged_access);\n  access->access = isl_map_domain_factor_domain(access->access);\n  access->fixed_element = accesses_fixed_element(expr);\n\n  /* AutoSA Extended */\n  access->n_io_info = 0;\n  access->io_info = NULL;\n  access->layout_trans = -1;\n  access->simd_dim = -1;\n  access->simd_stride = -1;\n  /* AutoSA Extended */\n\n  *data->next_access = access;\n  
data->next_access = &(*data->next_access)->next;\n\n  if (!access->access || access->fixed_element < 0)\n    return -1;\n\n  return 0;\n}\n\n/* Construct a linked list of autosa_stmt_access objects,\n * one for each access expression in the statement body.\n * \"any_to_outer\" maps all intermediate arrays to their outer arrays.\n */\nstatic int pet_stmt_extract_accesses(struct autosa_stmt *stmt,\n                                     __isl_keep isl_union_map *any_to_outer)\n{\n  struct ppcg_extract_access_data data;\n\n  stmt->accesses = NULL;\n  data.next_access = &stmt->accesses;\n  data.single_expression =\n      pet_tree_get_type(stmt->stmt->body) == pet_tree_expr;\n  data.any_to_outer = any_to_outer;\n  return pet_tree_foreach_access_expr(stmt->stmt->body,\n                                      &extract_access, &data);\n}\n\n/* Return an array of autosa_stmt representing the statements in \"scop\".\n * Do not collect array accesses for statements that have been killed.\n */\nstruct autosa_stmt *extract_stmts(isl_ctx *ctx, struct ppcg_scop *scop,\n                                  __isl_keep isl_union_map *any_to_outer)\n{\n  int i;\n  struct autosa_stmt *stmts;\n\n  stmts = isl_calloc_array(ctx, struct autosa_stmt, scop->pet->n_stmt);\n  if (!stmts)\n    return NULL;\n\n  for (i = 0; i < scop->pet->n_stmt; ++i)\n  {\n    struct autosa_stmt *s = &stmts[i];\n    isl_bool killed;\n\n    s->id = isl_set_get_tuple_id(scop->pet->stmts[i]->domain);\n    s->stmt = scop->pet->stmts[i];\n    killed = is_stmt_killed(scop, scop->pet->stmts[i]);\n    if (killed < 0)\n      return (struct autosa_stmt *)free_stmts(stmts, i + 1);\n    if (killed)\n      continue;\n    if (pet_stmt_extract_accesses(s, any_to_outer) < 0)\n      return (struct autosa_stmt *)free_stmts(stmts, i + 1);\n  }\n\n  return stmts;\n}\n\nvoid autosa_kernel_stmt_free(void *user)\n{\n  struct autosa_kernel_stmt *stmt = (struct autosa_kernel_stmt *)user;\n\n  if (!stmt)\n    return;\n\n  switch 
(stmt->type)\n  {\n  case AUTOSA_KERNEL_STMT_COPY:\n    isl_ast_expr_free(stmt->u.c.index);\n    isl_ast_expr_free(stmt->u.c.local_index);\n    break;\n  case AUTOSA_KERNEL_STMT_DOMAIN:\n    isl_id_to_ast_expr_free(stmt->u.d.ref2expr);\n    break;\n  case AUTOSA_KERNEL_STMT_SYNC:\n    break;\n  case AUTOSA_KERNEL_STMT_IO:\n  case AUTOSA_KERNEL_STMT_IO_TRANSFER:\n  case AUTOSA_KERNEL_STMT_IO_TRANSFER_BUF:\n  case AUTOSA_KERNEL_STMT_IO_DRAM:    \n    free(stmt->u.i.in_fifo_name);\n    free(stmt->u.i.out_fifo_name);\n    isl_ast_expr_free(stmt->u.i.local_index);\n    isl_ast_expr_free(stmt->u.i.index);\n    free(stmt->u.i.reduce_op);\n    break;\n  case AUTOSA_KERNEL_STMT_MODULE_CALL:\n  case AUTOSA_KERNEL_STMT_EXT_MODULE:\n    free(stmt->u.m.module_name);\n    break;\n  case AUTOSA_KERNEL_STMT_FIFO_DECL:\n    break;\n  case AUTOSA_KERNEL_STMT_DRAIN_MERGE:\n    isl_ast_expr_free(stmt->u.dm.index);\n    break;\n  case AUTOSA_KERNEL_STMT_HOST_SERIALIZE:\n    isl_ast_expr_free(stmt->u.s.index);\n    break;\n  }\n\n  free(stmt);\n}\n\n/* Find the element in prog->stmts that has the given \"id\".\n * Return NULL if no such autosa_stmt can be found.\n */\nstruct autosa_stmt *find_stmt(struct autosa_prog *prog, __isl_keep isl_id *id)\n{\n  int i;\n\n  for (i = 0; i < prog->n_stmts; ++i)\n  {\n    if (id == prog->stmts[i].id)\n      break;\n  }\n\n  return i < prog->n_stmts ? &prog->stmts[i] : NULL;\n}\n\n/****************************************************************\n * AutoSA prog\n ****************************************************************/\n/* Compute the set of inner array elements that may have their values\n * preserved by \"prog\".  
In particular, collect the array elements of\n * arrays that are not local to \"prog\" and remove those elements that\n * are definitely killed or definitely written by \"prog\".\n */\nstatic __isl_give isl_union_set *compute_may_persist(struct autosa_prog *prog)\n{\n  int i;\n  isl_union_set *may_persist, *killed;\n  isl_union_map *must_kill;\n\n  may_persist = isl_union_set_empty(isl_set_get_space(prog->context));\n  for (i = 0; i < prog->n_array; ++i)\n  {\n    isl_set *extent;\n\n    if (prog->array[i].local)\n      continue;\n\n    extent = isl_set_copy(prog->array[i].extent);\n    may_persist = isl_union_set_add_set(may_persist, extent);\n  }\n\n  may_persist = isl_union_set_intersect_params(may_persist,\n                                               isl_set_copy(prog->context));\n  may_persist = isl_union_set_apply(may_persist,\n                                    isl_union_map_copy(prog->to_inner));\n  must_kill = isl_union_map_copy(prog->tagged_must_kill);\n  killed = isl_union_map_range(must_kill);\n  must_kill = isl_union_map_copy(prog->must_write);\n  killed = isl_union_set_union(killed, isl_union_map_range(must_kill));\n\n  may_persist = isl_union_set_subtract(may_persist, killed);\n  return may_persist;\n}\n\nstruct autosa_prog *autosa_prog_alloc(isl_ctx *ctx, struct ppcg_scop *scop)\n{\n  struct autosa_prog *prog;\n  isl_space *space;\n  isl_map *id;\n\n  if (!scop)\n    return NULL;\n\n  prog = isl_calloc_type(ctx, struct autosa_prog);\n  if (!prog)\n    return NULL;\n\n  prog->ctx = ctx;\n  prog->scop = scop;\n  prog->context = isl_set_copy(scop->context);\n  prog->n_stmts = scop->pet->n_stmt;\n  prog->any_to_outer = pet_scop_compute_outer_to_any(scop->pet);\n  prog->any_to_outer = isl_union_map_reverse(prog->any_to_outer);\n  space = isl_union_map_get_space(prog->any_to_outer);\n  space = isl_space_set_from_params(space);\n  space = isl_space_add_dims(space, isl_dim_set, 1);\n  space = isl_space_map_from_set(space);\n  id = 
isl_map_identity(space);\n  prog->any_to_outer = isl_union_map_add_map(prog->any_to_outer, id);\n  prog->stmts = extract_stmts(ctx, scop, prog->any_to_outer);\n  prog->read = isl_union_map_copy(scop->reads);\n  prog->may_write = isl_union_map_copy(scop->may_writes);\n  prog->must_write = isl_union_map_copy(scop->must_writes);\n  prog->tagged_must_kill = isl_union_map_copy(scop->tagged_must_kills);\n  prog->to_inner = pet_scop_compute_outer_to_inner(scop->pet);\n  prog->to_outer = isl_union_map_copy(prog->to_inner);\n  prog->to_outer = isl_union_map_reverse(prog->to_outer);\n\n  if (!prog->stmts)\n    return (struct autosa_prog *)autosa_prog_free(prog);\n\n  if (collect_array_info(prog) < 0)\n    return (struct autosa_prog *)autosa_prog_free(prog);\n  prog->may_persist = compute_may_persist(prog); // TODO\n\n  return prog;\n}\n\nvoid *autosa_prog_free(struct autosa_prog *prog)\n{\n  if (!prog)\n    return NULL;\n  free_array_info(prog);\n  free_stmts(prog->stmts, prog->n_stmts);\n  isl_union_map_free(prog->any_to_outer);\n  isl_union_map_free(prog->to_outer);\n  isl_union_map_free(prog->to_inner);\n  isl_union_map_free(prog->read);\n  isl_union_map_free(prog->may_write);\n  isl_union_map_free(prog->must_write);\n  isl_union_map_free(prog->tagged_must_kill);\n  isl_union_map_free(prog->array_order);\n  isl_union_set_free(prog->may_persist);\n  isl_set_free(prog->context);\n  free(prog);\n\n  return NULL;\n}\n\n/****************************************************************\n * AutoSA hw module\n ****************************************************************/\nstruct autosa_hw_module *autosa_hw_module_alloc(struct autosa_gen *gen)\n{\n  struct autosa_hw_module *module = (struct autosa_hw_module *)malloc(\n      sizeof(struct autosa_hw_module));\n  module->options = gen->options;\n  module->name = NULL;\n  module->tree = NULL;\n  module->device_tree = NULL;\n  module->inst_ids = NULL;\n  module->n_var = 0;\n  module->var = NULL;\n  module->kernel = NULL;\n  
module->n_io_group = 0;\n  module->io_groups = NULL;\n  module->to_pe = 0;\n  module->to_mem = 0;\n  module->double_buffer = 0;\n  module->is_filter = 0;\n  module->is_buffer = 0;\n  module->outer_sched = NULL;\n  module->inter_sched = NULL;\n  module->intra_sched = NULL;\n  module->inter_space = NULL;\n  module->intra_space = NULL;\n  module->space = NULL;\n  module->inter_tree = NULL;\n  module->intra_tree = NULL;\n  module->credit = 0;\n  module->boundary_sched = NULL;\n  module->boundary_tree = NULL;\n  module->boundary = 0;\n  module->boundary_outer_sched = NULL;\n  module->boundary_inter_sched = NULL;\n  module->boundary_outer_tree = NULL;\n  module->boundary_inter_tree = NULL;\n  module->n_pe_dummy_modules = 0;\n  module->pe_dummy_modules = NULL;\n  module->n_array_ref = 0;\n  module->serialize_sched = NULL;\n  module->serialize_tree = NULL;\n  module->coalesce_bound = -1;\n  module->is_serialized = 0;\n  module->use_FF = 0;\n  module->in = -1;\n  module->pipeline_at_default_func = 0;\n  module->pipeline_at_filter_func[0] = 0;\n  module->pipeline_at_filter_func[1] = 0;\n  module->pipeline_at_filter_func[2] = 0;\n\n  module->n_fifo_serialize = 0;\n  module->fifo_bounds_serialize = NULL;\n  module->fifo_names_serialize = NULL;\n  module->n_fifo_default = 0;\n  module->fifo_names_default = NULL;\n  module->fifo_bounds_default = NULL;\n  module->n_fifo_inter = 0;\n  module->fifo_names_inter = NULL;\n  module->fifo_bounds_inter = NULL;\n  module->n_fifo_intra = 0;\n  module->fifo_names_intra = NULL;\n  module->fifo_bounds_intra = NULL;\n\n  module->tuning_sched = NULL;\n  module->tuning_outer_sched = NULL;\n  module->tuning_inter_sched = NULL;\n  module->tuning_intra_sched = NULL;\n  module->tuning_tree = NULL;\n  module->tuning_device_tree = NULL;\n  module->tuning_intra_tree = NULL;\n  module->tuning_inter_tree = NULL;\n\n  module->tuning_num_sched = NULL;\n  module->tuning_num_outer_sched = NULL;\n  module->tuning_num_inter_sched = NULL;\n  
module->tuning_num_intra_sched = NULL;\n  module->tuning_num_tree = NULL;\n  module->tuning_num_device_tree = NULL;\n  module->tuning_num_intra_tree = NULL;\n  module->tuning_num_inter_tree = NULL;  \n\n  return module;\n}\n\nvoid *autosa_hw_module_free(struct autosa_hw_module *module)\n{\n  if (!module)\n    return NULL;\n\n  free(module->name);\n\n  isl_ast_node_free(module->tree);\n  isl_ast_node_free(module->device_tree);\n  isl_ast_node_free(module->inter_tree);\n  isl_ast_node_free(module->intra_tree);\n  isl_ast_node_free(module->boundary_tree);\n  isl_ast_node_free(module->boundary_outer_tree);\n  isl_ast_node_free(module->boundary_inter_tree);\n  isl_ast_node_free(module->serialize_tree);\n\n  isl_space_free(module->inter_space);\n  isl_space_free(module->intra_space);\n  isl_space_free(module->space);\n\n  isl_id_list_free(module->inst_ids);\n  for (int i = 0; i < module->n_var; i++)\n  {\n    free(module->var[i].name);\n    isl_vec_free(module->var[i].size);\n  }\n  free(module->var);\n  free(module->io_groups);  \n  for (int i = 0; i < module->n_pe_dummy_modules; i++)\n  {\n    autosa_pe_dummy_module_free(module->pe_dummy_modules[i]);\n  }\n  free(module->pe_dummy_modules);\n\n  if (module->n_fifo_serialize > 0) {\n    for (int i = 0; i < module->n_fifo_serialize; i++) {\n      free(module->fifo_names_serialize[i]);\n      isl_pw_qpolynomial_free(module->fifo_bounds_serialize[i]);\n    }\n    free(module->fifo_bounds_serialize);\n    free(module->fifo_names_serialize);\n  }\n  if (module->n_fifo_default > 0) {\n    for (int i = 0; i < module->n_fifo_default; i++) {\n      free(module->fifo_names_default[i]);\n      isl_pw_qpolynomial_free(module->fifo_bounds_default[i]);\n    }\n    free(module->fifo_bounds_default);\n    free(module->fifo_names_default);\n  }\n  if (module->n_fifo_inter > 0) {\n    for (int i = 0; i < module->n_fifo_inter; i++) {\n      free(module->fifo_names_inter[i]);\n      isl_pw_qpolynomial_free(module->fifo_bounds_inter[i]);\n   
 }\n    free(module->fifo_bounds_inter);\n    free(module->fifo_names_inter);\n  }\n  if (module->n_fifo_intra > 0) {\n    for (int i = 0; i < module->n_fifo_intra; i++) {\n      free(module->fifo_names_intra[i]);\n      isl_pw_qpolynomial_free(module->fifo_bounds_intra[i]);\n    }\n    free(module->fifo_bounds_intra);\n    free(module->fifo_names_intra);\n  }\n\n  isl_ast_node_free(module->tuning_tree);\n  isl_ast_node_free(module->tuning_device_tree);\n  isl_ast_node_free(module->tuning_inter_tree);\n  isl_ast_node_free(module->tuning_intra_tree);\n\n  isl_ast_node_free(module->tuning_num_tree);\n  isl_ast_node_free(module->tuning_num_device_tree);\n  isl_ast_node_free(module->tuning_num_inter_tree);\n  isl_ast_node_free(module->tuning_num_intra_tree);\n\n  free(module);\n\n  return NULL;\n}\n\nstruct autosa_hw_top_module *autosa_hw_top_module_alloc()\n{\n  struct autosa_hw_top_module *module = (struct autosa_hw_top_module *)malloc(\n      sizeof(struct autosa_hw_top_module));\n\n  module->n_module_calls = 0;\n  module->n_fifo_decls = 0;\n  module->module_call_scheds = NULL;\n  module->fifo_decl_scheds = NULL;\n  module->module_call_trees = NULL;\n  module->fifo_decl_trees = NULL;\n  module->fifo_decl_names = NULL;\n\n  module->n_module_call_wrapped = 0;\n  module->n_fifo_decl_wrapped = 0;\n  module->module_call_wrapped_trees = NULL;\n  module->fifo_decl_wrapped_trees = NULL;\n\n  module->kernel = NULL;\n  module->hw_modules = NULL;\n  module->n_hw_modules = 0;\n\n  module->n_ext_module = 0;\n  module->ext_module_scheds = NULL;\n  module->ext_module_trees = NULL;\n  module->n_ext_module_wrapped = 0;\n  module->ext_module_wrapped_trees = NULL;\n\n  return module;\n}\n\nvoid *autosa_hw_top_module_free(struct autosa_hw_top_module *module)\n{\n  if (!module)\n    return NULL;\n\n  if (module->module_call_trees)\n  {\n    for (int i = 0; i < module->n_module_calls; i++)\n    {\n      isl_ast_node_free(module->module_call_trees[i]);\n    }\n  }\n\n  if 
(module->fifo_decl_trees)\n  {\n    for (int i = 0; i < module->n_fifo_decls; i++)\n    {\n      isl_ast_node_free(module->fifo_decl_trees[i]);\n      free(module->fifo_decl_names[i]);\n    }\n  }\n\n  if (module->module_call_wrapped_trees)\n  {\n    for (int i = 0; i < module->n_module_call_wrapped; i++)\n    {\n      isl_ast_node_free(module->module_call_wrapped_trees[i]);\n    }\n  }\n\n  if (module->fifo_decl_wrapped_trees)\n  {\n    for (int i = 0; i < module->n_fifo_decl_wrapped; i++)\n    {\n      isl_ast_node_free(module->fifo_decl_wrapped_trees[i]);\n    }\n  }\n\n  if (module->ext_module_trees)\n  {\n    for (int i = 0; i < module->n_ext_module; i++)\n    {\n      isl_ast_node_free(module->ext_module_trees[i]);\n    }\n  }\n\n  if (module->ext_module_wrapped_trees)\n  {\n    for (int i = 0; i < module->n_ext_module_wrapped; i++)\n    {\n      isl_ast_node_free(module->ext_module_wrapped_trees[i]);\n    }\n  }\n\n  free(module->module_call_scheds);\n  free(module->fifo_decl_scheds);\n  free(module->ext_module_scheds);\n  free(module->module_call_trees);\n  free(module->fifo_decl_trees);\n  free(module->ext_module_trees);\n  free(module->module_call_wrapped_trees);\n  free(module->fifo_decl_wrapped_trees);\n  free(module->ext_module_wrapped_trees);\n  free(module->fifo_decl_names);\n  free(module);\n\n  return NULL;\n}\n\nstruct autosa_pe_dummy_module *autosa_pe_dummy_module_alloc()\n{\n  struct autosa_pe_dummy_module *module = (struct autosa_pe_dummy_module *)malloc(\n      sizeof(struct autosa_pe_dummy_module));\n  module->module = NULL;\n  module->io_group = NULL;\n  module->sched = NULL;\n  module->tree = NULL;\n  module->device_tree = NULL;\n\n  return module;\n}\n\nvoid *autosa_pe_dummy_module_free(struct autosa_pe_dummy_module *module)\n{\n  if (!module)\n    return NULL;\n\n  isl_ast_node_free(module->tree);\n  isl_ast_node_free(module->device_tree);\n  free(module);\n\n  return NULL;\n}\n\nstruct autosa_drain_merge_func 
*autosa_drain_merge_func_alloc(struct autosa_gen *gen)\n{\n  struct autosa_drain_merge_func *func = (struct autosa_drain_merge_func *)\n      malloc(sizeof(struct autosa_drain_merge_func));\n  func->group = NULL;\n  func->kernel = NULL;\n  func->inst_ids = NULL;\n  func->sched = NULL;\n  func->tree = NULL;\n  func->device_tree = NULL;\n\n  return func;\n}\n\nvoid *autosa_drain_merge_func_free(struct autosa_drain_merge_func *func)\n{\n  if (!func)\n    return NULL;\n\n  isl_id_list_free(func->inst_ids);\n  isl_ast_node_free(func->tree);\n  isl_ast_node_free(func->device_tree);\n  free(func);\n\n  return NULL;\n}\n\n/****************************************************************\n * AutoSA AST node\n ****************************************************************/\nstruct autosa_ast_node_userinfo *alloc_ast_node_userinfo()\n{\n  struct autosa_ast_node_userinfo *info =\n      (struct autosa_ast_node_userinfo *)malloc(sizeof(\n          struct autosa_ast_node_userinfo));\n  info->is_pipeline = 0;\n  info->is_unroll = 0;\n  info->is_outermost_for = 0;\n  info->is_infinitize_legal = 0;\n  info->is_first_infinitizable_loop = 0;  \n  info->is_dep_free = 0;\n  info->n_coalesce_loop = 0;\n  info->visited = 0;\n\n  info->is_guard_start = 0;\n  info->is_guard_end = 0;\n  info->n_fifo = 0;\n  info->fifo_names = NULL;\n  info->bounds = NULL;\n  info->module_name = NULL;\n\n  return info;\n}\n\nvoid free_ast_node_userinfo(void *ptr)\n{\n  struct autosa_ast_node_userinfo *info = (struct autosa_ast_node_userinfo *)ptr;  \n\n  free(info);\n}\n\n/****************************************************************\n * AutoSA PE opt\n ****************************************************************/\n/* Internal data structure for extract_size_of_type.\n * \"type\" specifies the name of the space that we want to extract.\n * \"res\" is used to store the subset of that space.\n */\nstruct autosa_extract_size_data\n{\n  const char *type;\n  isl_set *res;\n};\n\n/* This function is called 
for each set in a union_set.\n * If the name of the set matches data->type, we store the\n * set in data->res and return isl_stat_error to abort the iteration.\n */\nstatic isl_stat extract_size_of_type(__isl_take isl_set *size, void *user)\n{\n  struct autosa_extract_size_data *data = (struct autosa_extract_size_data *)user;\n  const char *name;\n\n  name = isl_set_get_tuple_name(size);\n  if (name && !strcmp(name, data->type))\n  {\n    data->res = size;\n    return isl_stat_error;\n  }\n\n  isl_set_free(size);\n  return isl_stat_ok;\n}\n\n/* Given a union map { kernel[] -> *[...] },\n * return the range in the space called \"type\" for the kernel.\n */\n__isl_give isl_set *extract_sa_sizes(__isl_keep isl_union_map *sizes,\n                                     const char *type)\n{\n  isl_space *space;\n  isl_set *dom;\n  isl_union_set *local_sizes;\n  struct autosa_extract_size_data data = {type, NULL};\n\n  if (!sizes)\n    return NULL;\n\n  space = isl_union_map_get_space(sizes);\n  space = isl_space_set_from_params(space);\n  //space = isl_space_add_dims(space, isl_dim_set, 1);\n  space = isl_space_set_tuple_name(space, isl_dim_set, \"kernel\");\n  dom = isl_set_universe(space);\n  //dom = isl_set_fix_si(dom, isl_dim_set, 0, id);\n\n  local_sizes = isl_union_set_apply(isl_union_set_from_set(dom),\n                                    isl_union_map_copy(sizes));\n  isl_union_set_foreach_set(local_sizes, &extract_size_of_type, &data);\n  isl_union_set_free(local_sizes);\n  return data.res;\n}\n\n/* Given a singleton set, extract the \"len\" elements of the single integer tuple\n * into *sizes. 
\n *\n * If the element value is \"-1\", the loop at the same position is not tiled.\n *  \n * If \"set\" is NULL, then the \"sizes\" array is not updated.\n */\nstatic isl_stat read_sa_sizes_from_set(__isl_take isl_set *set, int *sizes, int len)\n{\n  int i;\n  int dim;\n\n  if (!set)\n    return isl_stat_ok;\n\n  dim = isl_set_dim(set, isl_dim_set);\n  if (dim < len)\n    isl_die(isl_set_get_ctx(set), isl_error_invalid,\n            \"fewer sa_sizes than required\", return isl_stat_error);\n\n  for (i = 0; i < len; ++i)\n  {\n    isl_val *v;\n\n    v = isl_set_plain_get_val_if_fixed(set, isl_dim_set, i);\n    if (!v)\n      goto error;    \n    sizes[i] = isl_val_get_num_si(v);    \n    isl_val_free(v);\n  }\n\n  isl_set_free(set);\n  return isl_stat_ok;\nerror:\n  isl_set_free(set);\n  return isl_stat_error;\n}\n\n/* Given a union map { kernel[] -> *[...] },\n * return the range in the space called \"type\" for the kernel.\n */\nstatic __isl_give isl_set *extract_config_sizes(__isl_keep isl_union_map *sizes,\n  const char *type)\n{\n  isl_space *space;\n  isl_set *dom;\n  isl_union_set *local_sizes;\n  struct autosa_extract_size_data data = {type, NULL};\n\n  if (!sizes)\n    return NULL;\n  \n  space = isl_union_map_get_space(sizes);\n  space = isl_space_set_from_params(space);\n  //space = isl_space_add_dims(space, isl_dim_set, 1);\n  space = isl_space_set_tuple_name(space, isl_dim_set, \"kernel\");\n  dom = isl_set_universe(space);\n//#ifdef _DEBUG\n//  isl_printer *pd = isl_printer_to_file(isl_set_get_ctx(dom), stdout);\n//  pd = isl_printer_print_set(pd, dom);\n//  pd = isl_printer_end_line(pd);\n//#endif\n\n  local_sizes = isl_union_set_apply(isl_union_set_from_set(dom),\n                                    isl_union_map_copy(sizes));\n\n//#ifdef _DEBUG\n//  pd = isl_printer_print_union_set(pd, local_sizes);\n//  pd = isl_printer_end_line(pd);\n//#endif\n  isl_union_set_foreach_set(local_sizes, &extract_size_of_type, &data);                                 
\n  isl_union_set_free(local_sizes);\n  return data.res;\n}\n\n/* Given a singleton set, extract the \"len\" elements of the single integer tuple\n * into *sizes. \n *\n * If the element value is \"-1\", the loop at the same position is not tiled.\n *  \n * If \"set\" is NULL, then the \"sizes\" array is not updated.\n */\nstatic isl_stat read_config_sizes_from_set(__isl_take isl_set *set, \n  int *sizes, int len)\n{\n  int i;\n  int dim;\n\n  if (!set)\n    return isl_stat_ok;\n\n  dim = isl_set_dim(set, isl_dim_set);\n  if (dim < len)\n    isl_die(isl_set_get_ctx(set), isl_error_invalid,\n            \"fewer sizes than required\", return isl_stat_error);\n\n  for (i = 0; i < len; ++i)\n  {\n    isl_val *v;\n\n    v = isl_set_plain_get_val_if_fixed(set, isl_dim_set, i);\n    if (!v)\n      goto error;\n    sizes[i] = isl_val_get_num_si(v);\n    isl_val_free(v);\n  }\n\n  isl_set_free(set);\n  return isl_stat_ok;\nerror:\n  isl_set_free(set);\n  return isl_stat_error;\n}\n\n/* Add the map { kernel[id] -> type[sizes] } to sa->used_sizes\n * if the option debug->dump_sa_sizes is set.\n * Not implemented yet: this function is currently a no-op.\n */\nstatic void set_sa_used_sizes(struct autosa_kernel *sa, const char *type, int id,\n                              int *sizes, int len)\n{\n  // TODO: fix it\n}\n\n/* Extract the user-specified HBM tile sizes stored under \"name\" in the\n * \"sa_sizes\" command line option.\n * \"tile_len\" is the number of tile sizes needed.\n * Return a pointer to the tile sizes, or NULL if fewer than \"tile_len\"\n * sizes are specified or on error.\n * Defaults (n_hbm_port in each dimension) are provided by\n * read_default_hbm_tile_sizes().\n */\nint *read_hbm_tile_sizes(struct autosa_kernel *sa, int tile_len, char *name)\n{\n  int n;\n  int *tile_size;\n  isl_set *size;\n\n  tile_size = isl_alloc_array(sa->ctx, int, tile_len);\n  if (!tile_size)\n    return NULL;\n\n  size = extract_sa_sizes(sa->sizes, name);\n  if (isl_set_dim(size, isl_dim_set) < tile_len)\n  {\n   
 free(tile_size);\n    isl_set_free(size);\n    return NULL;\n  }\n  if (read_sa_sizes_from_set(size, tile_size, tile_len) < 0)\n    goto error;\n  set_sa_used_sizes(sa, name, sa->id, tile_size, tile_len);\n\n  return tile_size;\nerror:\n  free(tile_size);\n  return NULL;\n}\n\nint read_mem_port_map(__isl_keep isl_union_map *port_map, char *name)\n{\n  isl_set *size;\n  int port;\n\n  size = extract_sa_sizes(port_map, name);\n  if (isl_set_dim(size, isl_dim_set) != 1) {\n    isl_set_free(size);\n    return -1;\n  }\n  if (read_sa_sizes_from_set(size, &port, 1) < 0)\n    goto error;\n  \n  return port;\nerror:\n  return -1;\n}\n\nint *read_default_hbm_tile_sizes(struct autosa_kernel *sa, int tile_len)\n{\n  int n;\n  int *tile_size;\n\n  tile_size = isl_alloc_array(sa->ctx, int, tile_len);\n  if (!tile_size)\n    return NULL;\n  for (n = 0; n < tile_len; ++n)\n    tile_size[n] = sa->scop->options->autosa->n_hbm_port;\n\n  return tile_size;\n}\n\n/* Extract the three user-specified data pack sizes for array \"name\".\n * Return NULL if the entry does not contain exactly three sizes or on error.\n */\nint *read_data_pack_sizes_array(__isl_keep isl_union_map *sizes, char *name)\n{\n  isl_set *size;\n  int *data_pack_sizes;\n  \n  size = extract_sa_sizes(sizes, name);\n  if (isl_set_dim(size, isl_dim_set) != 3) {\n    isl_set_free(size);\n    return NULL;\n  }\n  data_pack_sizes = (int *)malloc(3 * sizeof(int));\n  if (!data_pack_sizes) {\n    isl_set_free(size);\n    return NULL;\n  }\n  if (read_sa_sizes_from_set(size, data_pack_sizes, 3) < 0)\n    goto error;\n\n  return data_pack_sizes;\nerror:\n  free(data_pack_sizes);\n  return NULL;\n}\n\n/* Extract user specified data pack sizes from the \"data_pack_sizes\" command line\n * option, defaulting to 8, 32, 64, corresponding to the upper bounds of data \n * pack factors at the innermost, in-between, and outermost I/O module levels.\n * Return a pointer to the data pack sizes (or NULL on error).\n */\nint *read_data_pack_sizes(__isl_keep isl_union_map *sizes, int tile_len)\n{\n  int n;\n  int *tile_size;\n  isl_set *size;\n  isl_ctx *ctx;\n\n  ctx = isl_union_map_get_ctx(sizes);\n  
tile_size = isl_alloc_array(ctx, int, tile_len);\n  if (!tile_size)\n    return NULL;\n\n  size = extract_config_sizes(sizes, \"data_pack\");\n//#ifdef _DEBUG\n//  isl_printer *pd = isl_printer_to_file(ctx, stdout);\n//  pd = isl_printer_print_union_map(pd, sizes);\n//  pd = isl_printer_end_line(pd);\n//  if (!size)\n//    printf(\"null\\n\");\n//  pd = isl_printer_print_set(pd, size);\n//  pd = isl_printer_end_line(pd);\n//  isl_printer_free(pd);\n//#endif\n  \n  if (isl_set_dim(size, isl_dim_set) < tile_len) \n  {\n    free(tile_size);\n    isl_set_free(size);\n    return NULL;\n  }\n  if (read_config_sizes_from_set(size, tile_size, tile_len) < 0)\n    goto error;\n\n  return tile_size;\nerror:\n  free(tile_size);\n  return NULL;\n}\n\n/* Extract the user-specified \"array_part\" tile sizes from the \"sa_sizes\"\n * command line option.\n * \"tile_len\" is the number of tile sizes needed.\n * Return a pointer to the tile sizes, or NULL if fewer than \"tile_len\"\n * sizes are specified or on error.\n * Defaults (option->sa_tile_size in each dimension) are provided by\n * read_default_array_part_tile_sizes().\n */\nint *read_array_part_tile_sizes(struct autosa_kernel *sa, int tile_len)\n{\n  int n;\n  int *tile_size;\n  isl_set *size;\n\n  tile_size = isl_alloc_array(sa->ctx, int, tile_len);\n  if (!tile_size)\n    return NULL;\n\n  size = extract_sa_sizes(sa->sizes, \"array_part\");\n  if (isl_set_dim(size, isl_dim_set) < tile_len)\n  {\n    free(tile_size);\n    isl_set_free(size);\n    return NULL;\n  }\n  if (read_sa_sizes_from_set(size, tile_size, tile_len) < 0)\n    goto error;\n  set_sa_used_sizes(sa, \"array_part\", sa->id, tile_size, tile_len);\n\n  return tile_size;\nerror:\n  free(tile_size);\n  return NULL;\n}\n\nint *read_default_array_part_tile_sizes(struct autosa_kernel *sa, int tile_len)\n{\n  int n;\n  int *tile_size;\n\n  tile_size = isl_alloc_array(sa->ctx, int, tile_len);\n  if (!tile_size)\n    return NULL;\n  for (n = 0; n < 
tile_len; ++n)\n    tile_size[n] = sa->scop->options->autosa->sa_tile_size;\n\n  return tile_size;\n}\n\n/* Extract the user-specified \"latency\" tile sizes from the \"sa_sizes\"\n * command line option.\n * \"tile_len\" is the number of tile sizes needed.\n * Return a pointer to the tile sizes, or NULL if fewer than \"tile_len\"\n * sizes are specified or on error.\n * Defaults (option->sa_tile_size / 2 in each dimension) are provided by\n * read_default_latency_tile_sizes().\n */\nint *read_latency_tile_sizes(struct autosa_kernel *sa, int tile_len)\n{\n  int n;\n  int *tile_size;\n  isl_set *size;\n\n  tile_size = isl_alloc_array(sa->ctx, int, tile_len);\n  if (!tile_size)\n    return NULL;\n\n  size = extract_sa_sizes(sa->sizes, \"latency\");\n  if (isl_set_dim(size, isl_dim_set) < tile_len)\n  {\n    free(tile_size);\n    isl_set_free(size);\n    return NULL;\n  }\n  if (read_sa_sizes_from_set(size, tile_size, tile_len) < 0)\n    goto error;\n  set_sa_used_sizes(sa, \"latency\", sa->id, tile_size, tile_len);\n\n  return tile_size;\nerror:\n  free(tile_size);\n  return NULL;\n}\n\nint *read_default_latency_tile_sizes(struct autosa_kernel *sa, int tile_len)\n{\n  int n;\n  int *tile_size;\n\n  tile_size = isl_alloc_array(sa->ctx, int, tile_len);\n  if (!tile_size)\n    return NULL;\n  for (n = 0; n < tile_len; ++n)\n    tile_size[n] = sa->scop->options->autosa->sa_tile_size / 2;\n\n  return tile_size;\n}\n\nint *read_simd_tile_sizes(struct autosa_kernel *sa, int tile_len)\n{\n  int n;\n  int *tile_size;\n  isl_set *size;\n\n  tile_size = isl_alloc_array(sa->ctx, int, tile_len);\n  if (!tile_size)\n    return NULL;\n\n  size = extract_sa_sizes(sa->sizes, \"simd\");\n  if (isl_set_dim(size, isl_dim_set) < tile_len)\n  {\n    free(tile_size);\n    isl_set_free(size);\n    return NULL;\n  }\n  if (read_sa_sizes_from_set(size, tile_size, tile_len) < 0)\n    goto error;\n  set_sa_used_sizes(sa, \"simd\", sa->id, tile_size, tile_len);\n\n  return tile_size;
\nerror:\n  free(tile_size);\n  return NULL;\n}\n\nint *read_default_simd_tile_sizes(struct autosa_kernel *sa, int tile_len)\n{\n  int n;\n  int *tile_size;\n\n  tile_size = isl_alloc_array(sa->ctx, int, tile_len);\n  if (!tile_size)\n    return NULL;\n  for (n = 0; n < tile_len; ++n)\n    tile_size[n] = sa->scop->options->autosa->sa_tile_size / 2;\n\n  return tile_size;\n}\n\n/* Return the kernel id stored in the \"space_time\" entry of \"sizes\",\n * or -1 if no id is specified or it cannot be read.\n */\nint read_space_time_kernel_id(__isl_keep isl_union_map *sizes)\n{\n  isl_set *size;\n  int kernel_id;\n  int dim;\n  size = extract_sa_sizes(sizes, \"space_time\");\n  if (!size)\n    return -1;\n  dim = isl_set_dim(size, isl_dim_set);\n  if (dim == 0)\n  {\n    /* No kernel id given; release the set before returning. */\n    isl_set_free(size);\n    return -1;\n  }\n  /* read_sa_sizes_from_set() takes ownership of \"size\". */\n  if (read_sa_sizes_from_set(size, &kernel_id, 1) < 0)\n    return -1;\n  return kernel_id;\n}\n\nint *read_array_part_L2_tile_sizes(struct autosa_kernel *sa, int tile_len)\n{\n  int n;\n  int *tile_size;\n  isl_set *size;\n\n  tile_size = isl_alloc_array(sa->ctx, int, tile_len);\n  if (!tile_size)\n    return NULL;\n\n  size = extract_sa_sizes(sa->sizes, \"array_part_L2\");\n  if (isl_set_dim(size, isl_dim_set) < tile_len)\n  {\n    free(tile_size);\n    isl_set_free(size);\n    return NULL;\n  }\n  if (read_sa_sizes_from_set(size, tile_size, tile_len) < 0)\n    goto error;\n  set_sa_used_sizes(sa, \"array_part_L2\", sa->id, tile_size, tile_len);\n\n  return tile_size;\nerror:\n  free(tile_size);\n  return NULL;\n}\n\nint *read_default_array_part_L2_tile_sizes(struct autosa_kernel *sa, int tile_len)\n{\n  int n;\n  int *tile_size;\n\n  tile_size = isl_alloc_array(sa->ctx, int, tile_len);\n  if (!tile_size)\n    return NULL;\n  for (n = 0; n < tile_len; ++n)\n    tile_size[n] = sa->scop->options->autosa->sa_tile_size;\n\n  return tile_size;\n}\n\n/****************************************************************\n * AutoSA latency and resource estimation\n ****************************************************************/\nstruct extract_loop_info_data\n{\n  cJSON *loop_struct;\n};\n\n/* Extract the loop info containing: 
iterator, lower bound,\n * upper bound, and stride.\n * Return the pointer to the loop child.\n */\nstatic cJSON *extract_isl_ast_node_for(__isl_keep isl_ast_node *node, cJSON *loop,\n                                       isl_bool degenerate)\n{\n  cJSON *loop_info = cJSON_CreateObject();\n  cJSON *loop_child = cJSON_CreateObject();\n  isl_printer *p_str = NULL;\n  isl_ctx *ctx = isl_ast_node_get_ctx(node);\n  char *str = NULL;\n\n  /* Extract the loop info */\n  isl_ast_expr *init, *cond, *inc, *iterator, *arg;\n  init = isl_ast_node_for_get_init(node);\n  cond = isl_ast_node_for_get_cond(node);\n  inc = isl_ast_node_for_get_inc(node);\n  iterator = isl_ast_node_for_get_iterator(node);\n\n  /* iterator */\n  p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);\n  p_str = isl_printer_print_ast_expr(p_str, iterator);\n  str = isl_printer_get_str(p_str);\n  cJSON_AddStringToObject(loop_info, \"iter\", str);\n  isl_printer_free(p_str);\n  free(str);\n  isl_ast_expr_free(iterator);\n\n  /* lower bound */\n  p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);\n  p_str = isl_printer_print_ast_expr(p_str, init);\n  str = isl_printer_get_str(p_str);\n  cJSON_AddStringToObject(loop_info, \"lb\", str);\n  isl_printer_free(p_str);\n  free(str);\n  isl_ast_expr_free(init);\n\n  if (!degenerate)\n  {\n    /* upper bound */\n    p_str = isl_printer_to_str(ctx);\n    p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);\n    arg = isl_ast_expr_op_get_arg(cond, 1);\n    p_str = isl_printer_print_ast_expr(p_str, arg);\n    str = isl_printer_get_str(p_str);\n    cJSON_AddStringToObject(loop_info, \"ub\", str);\n    isl_printer_free(p_str);\n    free(str);\n    isl_ast_expr_free(arg);\n\n    /* stride */\n    p_str = isl_printer_to_str(ctx);\n    p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);\n    p_str = isl_printer_print_ast_expr(p_str, inc);\n    str = 
isl_printer_get_str(p_str);\n    cJSON_AddStringToObject(loop_info, \"stride\", str);\n    isl_printer_free(p_str);\n    free(str);\n  }\n  else\n  {\n    const cJSON *lb;\n\n    lb = cJSON_GetObjectItemCaseSensitive(loop_info, \"lb\");\n    cJSON_AddStringToObject(loop_info, \"ub\", lb->valuestring);\n    cJSON_AddStringToObject(loop_info, \"stride\", \"1\");\n  }\n  isl_ast_expr_free(cond);\n  isl_ast_expr_free(inc);\n\n  cJSON_AddItemToObject(loop, \"loop_info\", loop_info);\n  cJSON_AddItemToObject(loop, \"child\", loop_child);\n\n  return loop_child;\n}\n\nstatic cJSON *extract_isl_ast_node_block(__isl_keep isl_ast_node *node, cJSON *block)\n{\n  cJSON *block_child = cJSON_CreateArray();\n  cJSON_AddItemToObject(block, \"child\", block_child);\n\n  return block_child;\n}\n\nstatic cJSON *extract_isl_ast_node_mark(__isl_keep isl_ast_node *node, cJSON *mark)\n{\n  cJSON *mark_child = cJSON_CreateObject();\n  isl_id *id = isl_ast_node_mark_get_id(node);\n  char *name = (char *)isl_id_get_name(id);\n  isl_id_free(id);\n  cJSON_AddStringToObject(mark, \"mark_name\", name);\n  cJSON_AddItemToObject(mark, \"child\", mark_child);\n\n  return mark_child;\n}\n\nstatic cJSON *extract_isl_ast_node_user(__isl_keep isl_ast_node *node, cJSON *user)\n{\n  isl_ctx *ctx = isl_ast_node_get_ctx(node);\n  isl_ast_expr *expr = isl_ast_node_user_get_expr(node);\n  isl_printer *p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);\n  p_str = isl_printer_print_ast_expr(p_str, expr);\n  char *user_expr = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  cJSON_AddStringToObject(user, \"user_expr\", user_expr);\n  free(user_expr);\n  isl_ast_expr_free(expr);\n\n  return user;\n}\n\nstatic cJSON *extract_loop_info_at_ast_node(__isl_keep isl_ast_node *node,\n                                            cJSON *loop_struct)\n{\n  enum isl_ast_node_type type;\n  isl_ctx *ctx = isl_ast_node_get_ctx(node);\n  type = 
isl_ast_node_get_type(node);\n\n  switch (type)\n  {\n  case isl_ast_node_for:\n  {\n    isl_bool degenerate = isl_ast_node_for_is_degenerate(node);\n    /* Extract the loop information and insert it into the loop struct */\n    cJSON *loop = cJSON_CreateObject();\n    cJSON *loop_child = extract_isl_ast_node_for(node, loop, degenerate);\n    if (cJSON_IsObject(loop_struct))\n    {\n      cJSON_AddItemToObject(loop_struct, \"loop\", loop);\n    }\n    else if (cJSON_IsArray(loop_struct))\n    {\n      cJSON *item = cJSON_CreateObject();\n      cJSON_AddItemToObject(item, \"loop\", loop);\n      cJSON_AddItemToArray(loop_struct, item);\n    }\n    isl_ast_node *child_node;\n    /* Update the JSON pointer */\n    child_node = isl_ast_node_for_get_body(node);\n    extract_loop_info_at_ast_node(child_node, loop_child);\n    isl_ast_node_free(child_node);\n\n    break;\n  }\n  case isl_ast_node_block:\n  {\n    /* Extract the block information and insert it into the loop struct */\n    isl_ast_node_list *child_list = isl_ast_node_block_get_children(node);\n    int n_child = isl_ast_node_list_n_ast_node(child_list);\n    cJSON *block = cJSON_CreateObject();\n    cJSON *block_child = extract_isl_ast_node_block(node, block);\n    if (cJSON_IsObject(loop_struct))\n    {\n      cJSON_AddItemToObject(loop_struct, \"block\", block);\n    }\n    else if (cJSON_IsArray(loop_struct))\n    {\n      cJSON *item = cJSON_CreateObject();\n      cJSON_AddItemToObject(item, \"block\", block);\n      cJSON_AddItemToArray(loop_struct, item);\n    }\n\n    isl_ast_node *child_node;\n    for (int i = 0; i < n_child; i++)\n    {\n      cJSON *child_struct;\n      child_node = isl_ast_node_list_get_ast_node(child_list, i);\n      extract_loop_info_at_ast_node(child_node, block_child);\n      isl_ast_node_free(child_node);\n    }\n    isl_ast_node_list_free(child_list);\n\n    break;\n  }\n  case isl_ast_node_user:\n  {\n    /* Extract the user information and insert it into the loop struct 
*/\n    cJSON *user = cJSON_CreateObject();\n    user = extract_isl_ast_node_user(node, user);\n\n    if (cJSON_IsObject(loop_struct))\n    {\n      cJSON_AddItemToObject(loop_struct, \"user\", user);\n    }\n    else if (cJSON_IsArray(loop_struct))\n    {\n      cJSON *item = cJSON_CreateObject();\n      cJSON_AddItemToObject(item, \"user\", user);\n      cJSON_AddItemToArray(loop_struct, item);\n    }\n\n    break;\n  }\n  case isl_ast_node_if:\n  {\n    cJSON *if_struct = cJSON_CreateObject();\n    cJSON *then_struct = cJSON_CreateObject();\n    cJSON *else_struct = NULL;\n    if (cJSON_IsObject(loop_struct))\n    {\n      cJSON_AddItemToObject(loop_struct, \"if\", if_struct);\n    }\n    else if (cJSON_IsArray(loop_struct))\n    {\n      cJSON *item = cJSON_CreateObject();\n      cJSON_AddItemToObject(item, \"if\", if_struct);\n      cJSON_AddItemToArray(loop_struct, item);\n    }\n\n    isl_ast_node *child_node;\n    child_node = isl_ast_node_if_get_then_node(node);\n    cJSON_AddItemToObject(if_struct, \"then\", then_struct);\n    extract_loop_info_at_ast_node(child_node, then_struct);\n    isl_ast_node_free(child_node);\n\n    child_node = isl_ast_node_if_get_else_node(node);\n    if (child_node)\n    {\n      else_struct = cJSON_CreateObject();\n      cJSON_AddItemToObject(if_struct, \"else\", else_struct);\n      extract_loop_info_at_ast_node(child_node, else_struct);\n      isl_ast_node_free(child_node);\n    }\n\n    break;\n  }\n  case isl_ast_node_mark:\n  {\n    /* Extract the mark id and insert it into the loop struct */\n    cJSON *mark = cJSON_CreateObject();\n    cJSON *mark_child = extract_isl_ast_node_mark(node, mark);\n    if (cJSON_IsObject(loop_struct))\n    {\n      cJSON_AddItemToObject(loop_struct, \"mark\", mark);\n    }\n    else if (cJSON_IsArray(loop_struct))\n    {\n      cJSON *item = cJSON_CreateObject();\n      cJSON_AddItemToObject(item, \"mark\", mark);\n      cJSON_AddItemToArray(loop_struct, item);\n    }\n\n    isl_ast_node 
*child_node;\n    child_node = isl_ast_node_mark_get_node(node);\n    extract_loop_info_at_ast_node(child_node, mark_child);\n    isl_ast_node_free(child_node);\n\n    break;\n  }\n  default:\n    break;\n  }\n\n  return NULL;\n}\n\n/* Extract the loop structure and detailed information of the hardware module into \n * a JSON struct. If \"print\" is set, we will print out the JSON file. \n * Otherwise, return it as a string.\n */\nstatic char *extract_loop_info_from_module(\n    struct autosa_gen *gen, __isl_keep isl_ast_node *tree,\n    char *module_name, int double_buffer, int in,\n    int print)\n{\n  if (!tree)\n    return NULL;\n\n  cJSON *loop_struct = cJSON_CreateObject();\n  cJSON *module_props = cJSON_CreateObject();\n  char *json_str = NULL;\n\n  cJSON_AddStringToObject(loop_struct, \"module_name\", module_name);\n  cJSON_AddNumberToObject(module_props, \"double_buffer\", double_buffer);  \n  cJSON_AddNumberToObject(module_props, \"in\", in);\n  cJSON_AddItemToObject(loop_struct, \"module_prop\", module_props);\n  \n  extract_loop_info_at_ast_node(tree, loop_struct);\n\n  /* Print the JSON file */\n  json_str = cJSON_Print(loop_struct);\n\n  if (!print)\n  {\n    cJSON_Delete(loop_struct);\n    return json_str;\n  }\n  else\n  {\n    char *file_name;\n    FILE *fp;\n    isl_printer *p_str;\n    const cJSON *module_name = NULL;\n\n    module_name = cJSON_GetObjectItemCaseSensitive(loop_struct, \"module_name\");\n    p_str = isl_printer_to_str(gen->ctx);\n    p_str = isl_printer_print_str(p_str, gen->options->autosa->output_dir);\n    p_str = isl_printer_print_str(p_str, \"/latency_est/\");\n    p_str = isl_printer_print_str(p_str, module_name->valuestring);\n    p_str = isl_printer_print_str(p_str, \"_loop_info.json\");\n    file_name = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n    cJSON_Delete(loop_struct);\n\n    fp = fopen(file_name, \"w\");\n    if (!fp)\n    {\n      printf(\"[AutoSA] Error: Cannot open file: %s\\n\", file_name);\n    
  exit(1);\n    }\n    free(file_name);\n    fprintf(fp, \"%s\", json_str);\n    fclose(fp);\n    free(json_str);\n\n    return NULL;\n  }\n}\n\n/* Extract the loop structure and detailed information of the hardware module into \n * a JSON struct.\n */\nisl_stat sa_extract_loop_info(struct autosa_gen *gen, struct autosa_hw_module *module)\n{\n  char *module_name = NULL;\n  char *json_str = NULL;\n  isl_ctx *ctx = gen->ctx;\n\n  if (module->is_filter && module->is_buffer)\n  {\n    /* Parse the loop structure of the intra trans module */\n    module_name = concat(ctx, module->name, \"intra_trans\");\n    json_str = extract_loop_info_from_module(gen, module->intra_tree, module_name, module->double_buffer, module->in, 1);\n    free(module_name);\n\n    /* Parse the loop structure of the inter trans module */\n    module_name = concat(ctx, module->name, \"inter_trans\");\n    json_str = extract_loop_info_from_module(gen, module->inter_tree, module_name, module->double_buffer, module->in, 1);\n    free(module_name);\n\n    if (module->boundary)\n    {\n      module_name = concat(ctx, module->name, \"inter_trans_boundary\");\n      json_str = extract_loop_info_from_module(gen, module->boundary_inter_tree, module_name, module->double_buffer, module->in, 1);\n      free(module_name);\n    }\n  }\n\n  /* Parse the loop structure of the default module */\n  json_str = extract_loop_info_from_module(gen, module->device_tree, module->name, module->double_buffer, module->in, 1);\n\n  /* Parse the loop structure of the boundary module */\n  if (module->boundary)\n  {\n    module_name = concat(ctx, module->name, \"boundary\");\n    json_str = extract_loop_info_from_module(gen, module->boundary_tree, module_name, module->double_buffer, module->in, 1);\n    free(module_name);\n  }\n\n  /* Parse the loop structure of the dummy module */\n  if (module->n_pe_dummy_modules > 0)\n  {\n    for (int i = 0; i < module->n_pe_dummy_modules; i++)\n    {\n      struct autosa_pe_dummy_module 
*dummy_module = module->pe_dummy_modules[i];\n      struct autosa_array_ref_group *group = dummy_module->io_group;\n\n      /* Generate module name */\n      isl_printer *p_str = isl_printer_to_str(gen->ctx);\n      p_str = autosa_array_ref_group_print_prefix(group, p_str);\n      p_str = isl_printer_print_str(p_str, \"_PE_dummy\");\n      module_name = isl_printer_get_str(p_str);\n      isl_printer_free(p_str);\n      json_str = extract_loop_info_from_module(gen, dummy_module->device_tree, module_name, 0, 0, 1);\n      free(module_name);\n    }\n  }\n\n  return isl_stat_ok;\n}\n\n/* Extract the array type information that will be used for latency estimation.\n */\nisl_stat sa_extract_array_info(struct autosa_kernel *kernel)\n{\n  cJSON *array_info = cJSON_CreateObject();\n  char *json_str = NULL;\n  FILE *fp;\n  isl_printer *p_str;\n  char *file_path;\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    cJSON *array = cJSON_CreateObject();\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    char *array_name = local_array->array->name; /* Name of the array */\n    char *array_type = local_array->array->type; /* Element type */\n\n    cJSON *n_lane = cJSON_CreateNumber(local_array->n_lane);          /* Data pack factor of the array */\n    cJSON *array_size = cJSON_CreateNumber(local_array->array->size); /* Element size */\n\n    cJSON_AddItemToObject(array, \"n_lane\", n_lane);\n    cJSON_AddStringToObject(array, \"ele_type\", array_type);\n    cJSON_AddItemToObject(array, \"ele_size\", array_size);\n    cJSON_AddItemToObject(array_info, array_name, array);\n  }\n\n  /* Print out the JSON */\n  json_str = cJSON_Print(array_info);\n  p_str = isl_printer_to_str(kernel->ctx);\n  p_str = isl_printer_print_str(p_str, kernel->options->autosa->output_dir);\n  p_str = isl_printer_print_str(p_str, \"/latency_est/array_info.json\");\n  file_path = isl_printer_get_str(p_str);\n  fp = fopen(file_path, \"w\");\n  if (!fp)\n  {\n    printf(\"[AutoSA] 
Error: Cannot open file: %s\\n\", file_path);\n    exit(1);\n  }\n  isl_printer_free(p_str);\n  free(file_path);\n  fprintf(fp, \"%s\", json_str);\n  fclose(fp);\n  free(json_str);\n  cJSON_Delete(array_info);\n\n  return isl_stat_ok;\n}\n\nisl_stat TP_extract_loop_info(struct autosa_gen *gen, struct autosa_hw_module *module) {\n  std::vector<isl_ast_node *> asts;  \n  if (module->is_filter && module->is_buffer) {\n    if (module->in) {\n      //std::cout << module->name << std::endl;\n      //DBGASTNODE(stdout, module->tuning_device_tree, gen->ctx);\n      //DBGASTNODE(stdout, module->tuning_intra_tree, gen->ctx);\n      //DBGASTNODE(stdout, module->tuning_inter_tree, gen->ctx);\n    }\n     asts.push_back(module->tuning_device_tree);\n     asts.push_back(module->tuning_intra_tree);\n     asts.push_back(module->tuning_inter_tree);              \n  } else {\n    /* Default module */\n    //if (!module->in) {\n    //  std::cout << module->name << std::endl;\n    //  DBGASTNODE(stdout, module->tuning_device_tree, gen->ctx);\n    //}\n    asts.push_back(module->tuning_device_tree);        \n  }\n  gen->kernel->tuning_program->extract_module_loop_info(      \n      std::string(module->name), asts);\n  \n  return isl_stat_ok;\n}\n\nisl_stat TP_extract_module_attr(struct autosa_gen *gen, struct autosa_hw_module *module) {\n  gen->kernel->tuning_program->extract_module_attr(      \n      std::string(module->name), module->double_buffer, module->in, \n      (module->type == IO_MODULE || module->type == DRAIN_MODULE)? 1 : 0,\n      module->to_mem, module->is_serialized, module->to_pe, module->is_filter);\n  if (module->is_filter && module->is_buffer) {\n    gen->kernel->tuning_program->extract_module_attr(\n      std::string(module->name) + std::string(\"_inter\"),  module->double_buffer, module->in, \n      (module->type == IO_MODULE || module->type == DRAIN_MODULE)? 
1 : 0,\n      module->to_mem, module->is_serialized, module->to_pe, module->is_filter);\n    gen->kernel->tuning_program->extract_module_attr(\n      std::string(module->name) + std::string(\"_intra\"), module->double_buffer, module->in, \n      (module->type == IO_MODULE || module->type == DRAIN_MODULE)? 1 : 0,\n      module->to_mem, module->is_serialized, module->to_pe, module->is_filter);  \n  }\n\n  return isl_stat_ok;\n}\n\n/* Extract the memory (BRAM) and computation (DSP) information that will be used for \n * resource estimation in the auto-tuner.\n */\nisl_stat TP_extract_resource_info(struct autosa_gen *gen, struct autosa_hw_module *module) {\n  /* memory */\n  //std::cout << module->name << \": \" << module->is_buffer << std::endl;\n  if ((module->type == IO_MODULE || module->type == DRAIN_MODULE) && module->is_buffer) {    \n    int double_buffer = module->double_buffer;\n    struct autosa_array_ref_group *group = module->io_groups[0];\n    for (int i = 0; i < group->n_io_buffer; i++) {\n      if (group->io_buffers[i]->tuning_tile) {    \n        std::vector<isl_ast_node *> asts;\n        if (module->is_filter) {\n          asts.push_back(module->tuning_num_device_tree);\n          asts.push_back(module->tuning_num_inter_tree);\n          gen->kernel->tuning_program->extract_module_memory_info(\n              std::string(module->name), double_buffer, group->io_buffers[i]->tuning_tile, asts);\n        } else {\n          asts.push_back(module->tuning_num_device_tree);\n          gen->kernel->tuning_program->extract_module_memory_info(\n              std::string(module->name), double_buffer, group->io_buffers[i]->tuning_tile, asts);\n        }\n      }\n    }    \n  } else if (module->type == PE_MODULE) {    \n    //if (!((gen->kernel->options->autosa->local_reduce && gen->kernel->options->autosa->array_contraction) ||         \n    //      (gen->kernel->options->autosa->tuning_method == 1 && gen->kernel->options->autosa->array_contraction))) {\n      for 
(int i = 0; i < gen->kernel->n_array; i++) {\n        struct autosa_local_array_info *array = &(gen->kernel->array[i]);\n        for (int j = 0; j < array->n_pe_group; j++) {\n          struct autosa_array_ref_group *group = array->pe_groups[j];\n          if (group->tuning_local_tile) {\n            std::vector<isl_ast_node *> asts;\n            asts.push_back(module->tuning_num_device_tree);\n            gen->kernel->tuning_program->extract_module_memory_info(\n                std::string(module->name), 0, group->tuning_local_tile, asts);\n          }\n        }\n      }\n    //}    \n  }\n\n  /* compute */\n  if (module->type == PE_MODULE) {\n    std::string ele_type = std::string(module->io_groups[0]->array->type);\n    gen->kernel->tuning_program->extract_module_compute_info(\n        std::string(module->name), ele_type, module->tuning_num_device_tree);\n  }  \n\n  /* io */\n  if ((module->type == IO_MODULE || module->type == DRAIN_MODULE)) {    \n    struct autosa_array_ref_group *group = module->io_groups[0];\n    std::vector<isl_ast_node *> asts;\n    if (module->is_filter) {\n      asts.push_back(module->tuning_num_device_tree);\n      asts.push_back(module->tuning_num_inter_tree);         \n    } else {\n      asts.push_back(module->tuning_num_device_tree);      \n    }\n    gen->kernel->tuning_program->extract_module_io_info(\n      std::string(module->name), module->level, asts);\n  }\n\n  return isl_stat_ok;\n}\n\n/* Extract the array references in the prog and build a mapping in the tuning program. 
\n */\nisl_stat TP_extract_array_info(struct autosa_gen *gen, struct autosa_kernel *kernel) {\n  struct autosa_prog *prog = gen->prog;  \n  isl_schedule *schedule = kernel->schedule;\n  isl_schedule_node *root = isl_schedule_get_root(schedule);\n  isl_union_map *umap_schedule = isl_schedule_node_get_subtree_schedule_union_map(root);\n  //DBGUMAP(stdout, umap_schedule, gen->ctx);\n  isl_schedule_node_free(root);\n\n  for (int i = 0; i < prog->n_array; i++) {\n    struct autosa_array_info *array = &(prog->array[i]);\n    TPArray *tp_arr = new TPArray(std::string(array->name));\n    assert(array->tuning_refs.size() == 0);\n    array->tuning_refs.clear();\n    for (int j = 0; j < array->n_ref; j++) {\n      struct autosa_stmt_access *ref = array->refs[j];\n      isl_map *access = ref->access;      \n      /* Build the tuning program array access representation. */\n      std::shared_ptr<TPArrayRef> tp_ref = kernel->tuning_program->build_array_ref(std::string(array->name), access, schedule);\n     \n      tp_arr->refs.push_back(std::shared_ptr<TPArrayRef>(tp_ref));      \n      array->tuning_refs.push_back(std::shared_ptr<TPArrayRef>(tp_ref));\n    }\n    kernel->tuning_program->arrays.push_back(tp_arr);    \n  }  \n  isl_union_map_free(umap_schedule);\n\n  return isl_stat_ok;\n}\n\n/* Generate a tiled array reference. 
*/\nTPArrayTile *TP_infer_tiled_array(\n  struct autosa_gen *gen, struct autosa_kernel *kernel, \n  __isl_keep struct isl_schedule_node *node,\n  struct autosa_array_ref_group *group,\n  int read, int write)\n{\n  // Collect all accesses in the group\n  std::vector<std::shared_ptr<TPArrayRef>> group_refs;\n  for (int i = 0; i < group->n_ref; i++) {\n    if (!((read && group->refs[i]->read) ||\n          (write && group->refs[i]->write)))\n      continue;\n    group_refs.push_back(group->tuning_refs[i]);\n  }\n\n  // Collect the fixed iter dimensions\n  std::vector<TPIterator *> fixed_iters;  \n  isl_schedule_node *new_node = isl_schedule_node_copy(node);\n  while (isl_schedule_node_has_parent(new_node)) {\n    if (isl_schedule_node_get_type(new_node) == isl_schedule_node_band) {\n      for (int i = 0; i < isl_schedule_node_band_n_member(new_node); i++) {\n        TPIterator *iter = (TPIterator *)isl_schedule_node_band_member_get_iter(new_node, i);\n        if (iter) {\n          fixed_iters.push_back(iter);\n        } else {\n          std::cout << \"not found\" << std::endl;\n        }\n      }\n    }\n    new_node = isl_schedule_node_parent(new_node);\n  }\n  isl_schedule_node_free(new_node);  \n\n  // Infer the tile bounds\n  TPArrayTile *array_tile = new TPArrayTile();\n  array_tile = kernel->tuning_program->infer_tiled_array_bounds(array_tile, group_refs, fixed_iters);  \n  array_tile->name = std::string(group->array->name);\n  array_tile->type = std::string(group->array->type);\n  array_tile->ele_size = group->array->size;\n\n  return array_tile;\n}\n\n/* Extract the memory type of the local array.\n * Heuristics: \n * Compute the buffer utilization (18Kb BRAM):\n * - If the buffer port width < 18 bits, util = #ele / 1024\n * - Otherwise, util = #ele / 512\n * (The utilization is computed but is not currently used in the decision.)\n * \n * If the buffer is inside an I/O or drain module connected to DRAM, use URAM\n * if URAM is allowed, otherwise use BRAM.\n * Otherwise:\n * - If 
the buffer uses a primitive type (n_lane == 1) and #ele <= 8, use FF\n * - Otherwise, use BRAM\n * LUTRAM (type 1) is currently never selected.\n */\nint extract_memory_type(struct autosa_hw_module *module,\n                        struct autosa_kernel_var *var, int uram)\n{\n  /* 0: FF 1: LUTRAM 2: BRAM 3: URAM */\n  int use_memory = 0;\n  int var_size = 1;\n  float bram_util;\n\n  for (int i = 0; i < isl_vec_size(var->size); ++i)\n  {\n    isl_val *v = isl_vec_get_element_val(var->size, i);    \n    long v_i = isl_val_get_num_si(v);\n    var_size *= v_i;\n    isl_val_free(v);\n  }\n  if (var->array->size * var->n_lane < 3)\n    bram_util = (float)var_size / 1024;\n  else\n    bram_util = (float)var_size / 512;\n  \n  if (module->type != PE_MODULE && module->to_mem == 1) {\n    if (uram)\n      use_memory = 3;\n    else\n      use_memory = 2;\n  } else {    \n    //if (module->type == IO_MODULE && module->level == 1) {          \n    //  use_memory = 1;      \n    //} else {\n    //  if (var->n_lane == 1 && var_size <= 8)\n    //    use_memory = 0;\n    //  else\n    //    use_memory = 2;\n    //}    \n    if (var->n_lane == 1 && var_size <= 8)\n      use_memory = 0;\n    else\n      use_memory = 2;\n  }  \n\n  if (use_memory == 0) \n    module->use_FF = 1;\n\n  return use_memory;\n}\n\nstatic cJSON *extract_buffer_info_from_module(struct autosa_gen *gen,\n                                              struct autosa_hw_module *module,\n                                              struct autosa_kernel_var *var, const char *suffix)\n{\n  cJSON *buffer = cJSON_CreateObject();\n\n  /* Generate buffer name */\n  char *buffer_name = var->name;\n  if (suffix)\n    buffer_name = concat(gen->ctx, buffer_name, suffix);\n  cJSON_AddStringToObject(buffer, \"buffer_name\", buffer_name);\n  if (suffix)\n    free(buffer_name);\n\n  /* Generate buffer port width */\n  int n_lane = var->n_lane;\n  int ele_size = var->array->size;\n  int port_w = n_lane * 
ele_size; // in bytes\n  cJSON *port_width = cJSON_CreateNumber(port_w);\n  cJSON_AddItemToObject(buffer, \"port_width\", port_width);\n\n  /* Generate buffer size */\n  int size = 1;\n  for (int j = 0; j < isl_vec_size(var->size); j++)\n  {\n    isl_val *v;\n    int v_int;\n    v = isl_vec_get_element_val(var->size, j);\n    v_int = isl_val_get_num_si(v);\n    isl_val_free(v);\n    size *= v_int;\n  }\n  cJSON *buffer_size = cJSON_CreateNumber(size);\n  cJSON_AddItemToObject(buffer, \"buffer_depth\", buffer_size);\n\n  /* Partition number */\n  cJSON *n_part = cJSON_CreateNumber(var->n_part);\n  cJSON_AddItemToObject(buffer, \"partition_number\", n_part);\n\n  /* Buffer memory type */\n  int mem_type = extract_memory_type(module, var, gen->options->autosa->uram);\n  if (mem_type == 0)\n    cJSON_AddStringToObject(buffer, \"mem_type\", \"FF\");\n  else if (mem_type == 1)\n    cJSON_AddStringToObject(buffer, \"mem_type\", \"LUTRAM\");\n  else if (mem_type == 2)\n    cJSON_AddStringToObject(buffer, \"mem_type\", \"BRAM\");\n  else\n    cJSON_AddStringToObject(buffer, \"mem_type\", \"URAM\");\n\n  ///* Array map */\n  //if (module->double_buffer) {\n  //  cJSON_AddStringToObject(buffer, \"array_map\", \"horizontal\");\n  //}\n\n  return buffer;\n}\n\n/* If \"buffer\" is set to 1, extract local buffer information. 
*/\nstatic cJSON *extract_design_info_from_module(struct autosa_gen *gen,\n                                              struct autosa_hw_module *module, char *module_name, int buffer)\n{\n  cJSON *info = cJSON_CreateObject();\n  int double_buffer = module->double_buffer;\n\n  if (module->type == PE_MODULE)\n  {\n    /* Extract the SIMD factor, latency-hiding length, and FIFO data lanes */\n    cJSON *unroll = cJSON_CreateNumber(gen->kernel->simd_w);\n    cJSON_AddItemToObject(info, \"unroll\", unroll);\n    cJSON *lat_hide_len = cJSON_CreateNumber(gen->kernel->lat_hide_len);\n    cJSON_AddItemToObject(info, \"latency_hide_len\", lat_hide_len);\n\n    int *fifo_lanes_num = (int *)malloc(module->n_io_group * sizeof(int));\n    for (int i = 0; i < module->n_io_group; i++)\n      fifo_lanes_num[i] = module->io_groups[i]->n_lane;\n    cJSON *fifo_lanes = cJSON_CreateIntArray(fifo_lanes_num, module->n_io_group);\n    cJSON_AddItemToObject(info, \"fifo_lanes\", fifo_lanes);\n    free(fifo_lanes_num);\n  }\n  else\n  {\n    /* Extract the input and output data lanes and width */\n    cJSON *data_pack_inter = cJSON_CreateNumber(module->data_pack_inter);\n    cJSON *data_pack_intra = cJSON_CreateNumber(module->data_pack_intra);\n    cJSON_AddItemToObject(info, \"data_pack_inter\", data_pack_inter);\n    cJSON_AddItemToObject(info, \"data_pack_intra\", data_pack_intra);\n\n    struct autosa_array_ref_group *group = module->io_groups[0];\n    struct autosa_array_info *array = group->array;\n    cJSON_AddStringToObject(info, \"ele_type\", array->type);\n    cJSON *data_size = cJSON_CreateNumber(array->size);\n    cJSON_AddItemToObject(info, \"ele_size\", data_size);\n\n    /* Mark whether the module accesses DRAM */\n    if (module->to_mem) {\n      cJSON_AddNumberToObject(info, \"access_mem\", 1);\n    } else {\n      cJSON_AddNumberToObject(info, \"access_mem\", 0);\n    }\n  }\n  /* Extract the local buffers */\n  if (buffer)\n  {\n    cJSON *buffers = cJSON_CreateArray();\n    for (int i = 0; i < module->n_var; ++i)\n   
 {\n      cJSON *buffer = NULL;\n      struct autosa_kernel_var *var = &module->var[i];\n      if (double_buffer)\n      {\n        buffer = extract_buffer_info_from_module(gen, module, var, \"ping\");\n        cJSON_AddItemToArray(buffers, buffer);\n        buffer = extract_buffer_info_from_module(gen, module, var, \"pong\");\n        cJSON_AddItemToArray(buffers, buffer);\n      }\n      else\n      {\n        buffer = extract_buffer_info_from_module(gen, module, var, NULL);\n        cJSON_AddItemToArray(buffers, buffer);\n      }\n    }\n    cJSON_AddItemToObject(info, \"local_buffers\", buffers);\n  }\n\n  return info;\n}\n\nstatic cJSON *extract_design_info_from_serialize_module(struct autosa_gen *gen,\n                                                        struct autosa_hw_module *module, char *module_name)\n{\n  cJSON *info = cJSON_CreateObject();\n  /* Extract the input and output data lanes and width */\n  cJSON *data_pack_inter = cJSON_CreateNumber(module->data_pack_serialize);\n  cJSON *data_pack_intra = cJSON_CreateNumber(module->data_pack_intra);\n  cJSON_AddItemToObject(info, \"data_pack_inter\", data_pack_inter);\n  cJSON_AddItemToObject(info, \"data_pack_intra\", data_pack_intra);\n\n  struct autosa_array_ref_group *group = module->io_groups[0];\n  struct autosa_array_info *array = group->array;\n  cJSON_AddStringToObject(info, \"ele_type\", array->type);\n  cJSON *data_size = cJSON_CreateNumber(array->size);\n  cJSON_AddItemToObject(info, \"ele_size\", data_size);\n\n  return info;\n}\n\n/* Extract the data packing factor \"n_lane\" for the PE dummy module.\n * Note that for a PE dummy module with an internal array, if the I/O type is \n * interior I/O, we use the n_lane of the IO_L1 buffer.\n */\nstatic cJSON *extract_design_info_from_pe_dummy_module(struct autosa_gen *gen,\n                                                       struct autosa_pe_dummy_module *module, char *module_name)\n{\n  cJSON *info = cJSON_CreateObject();\n  struct 
autosa_array_ref_group *group = module->io_group;\n  int n_lane = (group->local_array->array_type == AUTOSA_EXT_ARRAY) ? group->n_lane : ((group->group_type == AUTOSA_DRAIN_GROUP) ? group->n_lane : (group->io_type == AUTOSA_EXT_IO) ? group->n_lane : group->io_buffers[0]->n_lane);\n  cJSON *data_pack = cJSON_CreateNumber(n_lane);\n  cJSON_AddItemToObject(info, \"unroll\", data_pack);\n\n  return info;\n}\n\n/* Extract the design information into a JSON struct for resource estimation.\n * If the module contains buffers, extract the buffer information.\n * For I/O modules, extract:\n * - input and output data lanes and width\n * For PE modules, extract:\n * - SIMD factor, if any\n */\nisl_stat sa_extract_design_info(struct autosa_gen *gen)\n{\n  cJSON *design_info = cJSON_CreateObject();\n  char *json_str = NULL;\n  FILE *fp;\n  struct autosa_hw_top_module *top = gen->hw_top_module;\n  isl_ctx *ctx = gen->ctx;\n  isl_printer *p_str;\n  char *file_path;\n\n  /* kernel id */\n  //DBGVAR(std::cout, gen->kernel->id);\n  cJSON *kernel_id = cJSON_CreateNumber(gen->kernel->id);\n  cJSON_AddItemToObject(design_info, \"kernel_id\", kernel_id);\n\n  /* module */\n  cJSON *modules = cJSON_CreateObject();\n  cJSON_AddItemToObject(design_info, \"modules\", modules);\n  for (int i = 0; i < gen->n_hw_modules; i++)\n  {\n    struct autosa_hw_module *module = gen->hw_modules[i];\n    char *module_name;\n    cJSON *info;\n\n    if (module->is_filter && module->is_buffer)\n    {\n      /* intra_trans */\n      module_name = concat(ctx, module->name, \"intra_trans\");\n      info = extract_design_info_from_module(gen, module, module_name, 0);\n      cJSON_AddItemToObject(modules, module_name, info);\n      free(module_name);\n\n      /* inter_trans */\n      module_name = concat(ctx, module->name, \"inter_trans\");\n      info = extract_design_info_from_module(gen, module, module_name, 0);\n      cJSON_AddItemToObject(modules, module_name, info);\n      free(module_name);\n\n      if 
(module->boundary)\n      {\n        module_name = concat(ctx, module->name, \"inter_trans_boundary\");\n        info = extract_design_info_from_module(gen, module, module_name, 0);\n        cJSON_AddItemToObject(modules, module_name, info);\n        free(module_name);\n      }\n    }\n\n    /* default module */\n    info = extract_design_info_from_module(gen, module, module_name, 1);\n    cJSON_AddItemToObject(modules, module->name, info);\n\n    /* boundary module */\n    if (module->boundary)\n    {\n      module_name = concat(ctx, module->name, \"boundary\");\n      info = extract_design_info_from_module(gen, module, module_name, 1);\n      cJSON_AddItemToObject(modules, module_name, info);\n      free(module_name);\n    }\n\n    if (module->n_pe_dummy_modules > 0)\n    {\n      for (int i = 0; i < module->n_pe_dummy_modules; i++)\n      {\n        struct autosa_pe_dummy_module *dummy_module = module->pe_dummy_modules[i];\n        struct autosa_array_ref_group *group = dummy_module->io_group;\n        char *module_name;\n        /* Generate module name */\n        isl_printer *p_str = isl_printer_to_str(ctx);\n        p_str = isl_printer_print_str(p_str, group->array->name);\n        if (group->group_type == AUTOSA_IO_GROUP)\n        {\n          if (group->local_array->n_io_group > 1)\n          {\n            p_str = isl_printer_print_str(p_str, \"_\");\n            p_str = isl_printer_print_int(p_str, group->nr);\n          }\n        }\n        else if (group->group_type == AUTOSA_DRAIN_GROUP)\n        {\n          p_str = isl_printer_print_str(p_str, \"_\");\n          p_str = isl_printer_print_str(p_str, \"drain\");\n        }\n        p_str = isl_printer_print_str(p_str, \"_PE_dummy\");\n        if (dummy_module->in) \n          p_str = isl_printer_print_str(p_str, \"_in\");\n        else\n          p_str = isl_printer_print_str(p_str, \"_out\");\n        module_name = isl_printer_get_str(p_str);\n        isl_printer_free(p_str);\n        info = 
extract_design_info_from_pe_dummy_module(gen, dummy_module, module_name);\n        cJSON_AddItemToObject(modules, module_name, info);\n        free(module_name);\n      }\n    }\n\n    if (module->is_serialized) {\n      if (module->boundary)\n        module_name = concat(ctx, module->name, \"boundary_serialize\");\n      else\n        module_name = concat(ctx, module->name, \"serialize\");\n      info = extract_design_info_from_serialize_module(gen, module, module_name);\n      cJSON_AddItemToObject(modules, module_name, info);\n      free(module_name);\n    }\n  }\n\n  json_str = cJSON_Print(design_info);\n  p_str = isl_printer_to_str(gen->ctx);\n  p_str = isl_printer_print_str(p_str, gen->options->autosa->output_dir);\n  p_str = isl_printer_print_str(p_str, \"/resource_est/design_info.json\");\n  file_path = isl_printer_get_str(p_str);\n  fp = fopen(file_path, \"w\");\n  if (!fp)\n  {\n    printf(\"[AutoSA] Error: Cannot open file: %s\\n\", file_path);\n    exit(1);\n  }\n  fprintf(fp, \"%s\", json_str);\n  fclose(fp);\n  free(file_path);\n  isl_printer_free(p_str);\n  cJSON_Delete(design_info);\n  free(json_str);\n\n  return isl_stat_ok;\n}\n\n/* The sparse info is provided in the format of \n * kernel[]->block_sparse[n_non_zero_num, vec_len]\n * Extract this information and compute the extra metadata.\n */\nisl_stat autosa_kernel_extract_sparse_info(struct autosa_kernel *kernel, \n  struct autosa_gen *gen)\n{\n  isl_union_map *sparse_info;\n  isl_set *size = NULL;\n  int *ratios;\n  int array_size;\n\n  ratios = isl_alloc_array(kernel->ctx, int, 2);\n  if (!ratios) {\n    return isl_stat_error;\n  }\n\n  sparse_info = extract_sizes_from_str(kernel->ctx, gen->options->autosa->block_sparse_ratio);\n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    isl_set *tmp_size;    \n    tmp_size = extract_sa_sizes(sparse_info, local_array->array->name);\n    if (tmp_size) {\n      local_array->is_sparse = 1;\n      isl_set_free(size);\n      
size = tmp_size;    \n    } else {\n      isl_set_free(tmp_size);\n    }\n  }\n  isl_union_map_free(sparse_info);\n\n  if (!size || isl_set_dim(size, isl_dim_set) < 2) {\n    isl_set_free(size);\n    free(ratios);    \n    return isl_stat_error;\n  }\n\n  if (read_sa_sizes_from_set(size, ratios, 2) < 0) \n    goto error;\n\n  kernel->sparse = 1;\n  kernel->vec_len = ratios[1];\n  kernel->n_nzero = ratios[0];\n  free(ratios);  \n  kernel->compress_ratio = (float)kernel->vec_len / kernel->n_nzero;\n  /* Get the element size. We assume that all arrays have the same precision. */\n  array_size = -1; // in bytes\n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (array_size == -1)\n      array_size = local_array->array->size;\n    else {\n      if (array_size != local_array->array->size) {\n        throw std::runtime_error(\"[AutoSA] Error: Arrays with different data types are not supported for the block sparsity.\");\n      }\n    }\n  }\n  /* Currently we only support vec_len no greater than 8. */\n  if (kernel->vec_len > 8) {\n    throw std::runtime_error(\"[AutoSA] Error: Block size greater than 8 is not supported for the block sparsity.\");\n  }\n\n  /* For Xilinx HLS, data needs to be aligned to a 32/64/128/256/512-bit boundary. 
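\n * Worked example (illustrative values only): for float data (array_size = 4 bytes)\n * with n_nzero = 2, the block needs 4 * 2 * 8 + 8 = 72 bits (the + 8 is the 8-bit\n * offset mask). This exceeds 64, so the block is padded to 128 bits and\n * n_meta_data = (128 / 8 - 4 * 2) / 4 = 2. With vec_len = 8, compress_ratio = 8 / 2 = 4\n * and eff_compress_ratio = 8 / (2 + 2) = 2.\n 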
*/\n  if (array_size * kernel->n_nzero * 8 + 8 <= 32) {\n    kernel->n_meta_data = (32 / 8 - array_size * kernel->n_nzero) / array_size;\n  } else if (array_size * kernel->n_nzero * 8 + 8 <= 64) {\n    kernel->n_meta_data = (64 / 8 - array_size * kernel->n_nzero) / array_size;\n  } else if (array_size * kernel->n_nzero * 8 + 8 <= 128) {\n    kernel->n_meta_data = (128 / 8 - array_size * kernel->n_nzero) / array_size;\n  } else if (array_size * kernel->n_nzero * 8 + 8 <= 256) {\n    kernel->n_meta_data = (256 / 8 - array_size * kernel->n_nzero) / array_size;\n  } else if (array_size * kernel->n_nzero * 8 + 8 <= 512) {\n    kernel->n_meta_data = (512 / 8 - array_size * kernel->n_nzero) / array_size;\n  } else {\n    throw std::runtime_error(\"[AutoSA] Error: The requested aligned sparse data is longer than 512-bit.\");\n  }\n  kernel->eff_compress_ratio = (float)kernel->vec_len / (kernel->n_nzero + kernel->n_meta_data);    \n  /* Update the local array */\n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (local_array->is_sparse) {\n      local_array->vec_len = kernel->vec_len;\n      local_array->n_nzero = kernel->n_nzero;\n      local_array->compress_ratio = kernel->compress_ratio;\n      local_array->n_meta_data = kernel->n_meta_data;\n      local_array->eff_compress_ratio = kernel->eff_compress_ratio;\n    }\n  }\n\n  return isl_stat_ok;\nerror:    \n  free(ratios);\n  return isl_stat_error;\n}"
  },
  {
    "path": "src/autosa_common.h",
    "content": "#ifndef _AUTOSA_COMMON_H\n#define _AUTOSA_COMMON_H\n\n#include <assert.h>\n#include <limits.h>\n#include <string.h>\n#include <iostream>\n#include <vector>\n#include <utility>\n#include <stdexcept>\n\n#include <isl/aff.h>\n#include <isl/aff_type.h>\n#include <isl/id.h>\n#include <isl/ctx.h>\n#include <isl/flow.h>\n#include <isl/map.h>\n#include <isl/map_type.h>\n#include <isl/space.h>\n#include <isl/ast_build.h>\n#include <isl/schedule.h>\n#include <isl/schedule_node.h>\n#include <isl/val.h>\n#include <isl/polynomial.h>\n\n#include <cJSON/cJSON.h>\n\n#include \"ppcg.h\"\n#include \"schedule.h\"\n#include \"util.h\"\n#include \"autosa_tuning.h\"\n\n#ifdef _DEBUG\n#define D(x) x\n#else\n#define D(x)\n#endif\n\n#if defined(__cplusplus)\nextern \"C\" {\n#endif  \n\n//#define min(a, b) (((a) < (b)) ? (a) : (b))\n//#define max(a, b) (((a) > (b)) ? (a) : (b))\n\n/* If enabled, use the default ISL sink API. */\n//#define ISL_SINK\n/* If enabled, the loop tiling factors should be reversed as well. 
\n * The tiled point loops will have a reverse order compared to the original loops.\n */\n//#define REVERSE_ORDER\n\nenum autosa_group_access_type\n{\n  AUTOSA_ACCESS_GLOBAL,\n  AUTOSA_ACCESS_LOCAL,\n  AUTOSA_ACCESS_SHARED,\n  AUTOSA_ACCESS_PRIVATE\n};\n\nenum autosa_kernel_stmt_type\n{\n  AUTOSA_KERNEL_STMT_COPY,\n  AUTOSA_KERNEL_STMT_DOMAIN,\n  AUTOSA_KERNEL_STMT_SYNC,\n  AUTOSA_KERNEL_STMT_IO,\n  AUTOSA_KERNEL_STMT_IO_TRANSFER,\n  AUTOSA_KERNEL_STMT_IO_TRANSFER_BUF,\n  AUTOSA_KERNEL_STMT_IO_DRAM,\n  AUTOSA_KERNEL_STMT_FIFO_DECL,\n  AUTOSA_KERNEL_STMT_MODULE_CALL,\n  AUTOSA_KERNEL_STMT_EXT_MODULE,\n  AUTOSA_KERNEL_STMT_DRAIN_MERGE,\n  AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_TRANS,\n  AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_TRANS,\n  AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_INTRA,\n  AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_INTER,\n  AUTOSA_KERNEL_STMT_IO_MODULE_CALL_STATE_HANDLE,\n  AUTOSA_KERNEL_STMT_HOST_SERIALIZE\n};\n\nenum autosa_dep_type\n{\n  AUTOSA_DEP_RAW,\n  AUTOSA_DEP_RAR,\n  AUTOSA_DEP_WAR,\n  AUTOSA_DEP_WAW,\n  AUTOSA_DEP_UNKNOWN\n};\n\nenum autosa_io_type\n{\n  AUTOSA_INT_IO,\n  AUTOSA_EXT_IO,\n  AUTOSA_UNKNOWN_IO\n};\n\nenum autosa_io_dir\n{\n  IO_IN,\n  IO_OUT,\n  IO_INOUT,\n  IO_NULL,\n  IO_UNKNOWN\n};\n\nenum autosa_module_type\n{\n  PE_MODULE,\n  IO_MODULE,\n  DRAIN_MODULE\n};\n\nenum autosa_group_type\n{\n  AUTOSA_IO_GROUP,\n  AUTOSA_PE_GROUP,\n  AUTOSA_DRAIN_GROUP,\n  AUTOSA_UNKNOWN_GROUP\n};\n\nenum autosa_array_type\n{\n  AUTOSA_EXT_ARRAY,\n  AUTOSA_INT_ARRAY,\n  AUTOSA_UNKNOWN_ARRAY\n};\n\nenum platform\n{\n  INTEL_HW,\n  XILINX_HW,\n  CATAPULT_HW,\n  TAPA_HW\n};\n\nstruct autosa_dep\n{\n  isl_id *src;\n  isl_id *dest;\n  isl_vec *disvec;\n  enum autosa_dep_type type;\n  isl_basic_map *isl_dep;\n\n  /* Iteration domain in scheduling dimensions. 
*/\n  isl_set *src_sched_domain;\n  isl_set *dest_sched_domain;\n};\n\n/* A sequence of \"n\" names of types.\n */\nstruct autosa_types\n{\n  int n;\n  char **name;\n};\n\nstruct autosa_iter\n{\n  char *name;\n  isl_aff *lb;\n  isl_aff *ub;\n  int stride;\n  char *ts_name;\n};\n\n/* Representation of a local variable in a kernel \n */\nstruct autosa_kernel_var\n{\n  struct autosa_array_info *array;\n  enum autosa_group_access_type type;\n  char *name;\n  isl_vec *size;\n  /* Data packing factors */\n  int n_lane;\n  /* Array partition factors */\n  int n_part;\n  /* Needs initialize */\n  int init_required;\n};\n\nstruct autosa_kernel\n{\n  isl_ctx *ctx;\n  isl_schedule *schedule;\n  struct ppcg_scop *scop;\n  struct autosa_prog *prog;\n  struct ppcg_options *options;\n\n  int n_sa_dim;\n  int sa_dim[3];\n  int space_parallel[3];\n  int space_time_id;\n  int array_part_w;\n  int space_w;\n  int time_w;\n  int simd_w;\n  int lat_hide_len;\n\n  int type; // AUTOSA_SA_TYPE_ASYNC | AUTOSA_SA_TYPE_SYNC\n\n  isl_multi_pw_aff *sa_grid_size;\n  /* User specified (array_part/latency_hiding/simd) sizes for each kernel. */\n  isl_union_map *sizes;\n  /* Effectively used (array_part/latency_hiding/simd) sizes for each kernel. */\n  isl_union_map *used_sizes;\n\n  /* Identifier of the kernel. */\n  int id;\n  /* The spaces of the statement domains that form the core computation of the \n   * kernel. \n   */\n  isl_union_set *core;\n  /* The set of possibly accessed outer array elements. */\n  isl_union_set *arrays;\n  /* \"n_array\" is the total number of arrays in the input program and also\n   * the number of elements in the \"array\".\n   * \"array\" contains information about each array that is local to the current\n   * kernel. 
 If an array is not used in a kernel, then the corresponding \n   * entry does not contain any information.\n   */\n  int n_array;\n  struct autosa_local_array_info *array;\n\n  /* \"copy_schedule\" corresponds to the schedule dimensions of the \n   * (tiled) schedule for this kernel that have been taken into account\n   * for computing private/shared memory tiles.\n   * copy_schedule_dim is the dimension of this schedule. \n   */\n  isl_union_pw_multi_aff *copy_schedule;\n  int copy_schedule_dim;\n\n  /* \"space\" is the schedule space of the AST context. That is, it represents\n   * the loops of the generated host code containing the kernel launch. \n   */\n  isl_space *space;\n  isl_ast_node *tree;\n\n  /* Local variables in a kernel. */\n  int n_var;\n  struct autosa_kernel_var *var;\n\n  /* Contains the list of block identifiers for this kernel. */\n  isl_id_list *block_ids;\n  /* Contains the list of thread identifiers for this kernel. */\n  isl_id_list *thread_ids;\n  /* Contains the list of PE identifiers for this kernel. 
*/\n  isl_id_list *pe_ids;\n\n  /* Contains constraints on the domain elements in the kernel\n   * that encode the mapping to PE identifiers, where the PE identifiers\n   * are represented by \"space_w\" parameters whose names are the elements\n   * of \"pe_ids\".\n   */\n  isl_union_set *pe_filter;\n\n  /* The first n_grid elements of grid_dim represent the specified size of \n   * the grid.\n   * The first n_block elements of block_dim represent the specified or \n   * effective size of the block.\n   * Note that in the input file, the sizes of the grid and the blocks \n   * are specified in the order x, y, z, but internally, the sizes \n   * are stored in reverse order, so that the last element always refers\n   * to the x dimension.\n   *\n   * grid_size reflects the effective grid size.\n   * grid_size_expr contains a corresponding access AST expression, built within\n   * the context where the launch appears.\n   */\n  int n_grid;\n  int n_block;\n  int grid_dim[2];\n  int block_dim[3];\n\n  isl_multi_pw_aff *grid_size;\n  isl_ast_expr *grid_size_expr;\n\n  /* Contains the values of the parameters and outer schedule dimensions\n   * for which any statement instance in this kernel needs to be executed.\n   */\n  isl_set *context;\n\n  /* \"contraction\" maps the original statement instances to the statement\n   * instances that are active at the point in the schedule tree where \n   * the kernel is created.\n   */\n  isl_union_pw_multi_aff *contraction;\n  /* Contains the original statement instances,\n   * i.e., those that appear in the domains of access relations, \n   * that are involved in the kernel. 
\n   */\n  isl_union_set *expanded_domain;\n  isl_union_set *domain;\n\n  isl_set *host_domain;\n  int single_statement;  \n\n  /* Data structures for block sparsity.\n   * vec_len is the vector length of each sparse block.\n   * n_nzero is the number of non-zero elements in the block.\n   * compress_ratio is calculated as vec_len / n_nzero.\n   * Each sparse block is stored as [data, ..., data, offset], i.e., the\n   * n_nzero non-zero elements followed by the offset.\n   * The offset is an 8-bit unsigned char that stores a mask \n   * indicating the positions of the non-zero elements.\n   * The block is also padded to a 32/64/128/256/512-bit boundary \n   * as required by Xilinx HLS.\n   * n_meta_data is the combined size of the padding and the offset, \n   * counted in units of the data element size. \n   * eff_compress_ratio is calculated as vec_len / (n_nzero + n_meta_data).\n   */\n  int sparse;\n  int vec_len;\n  int n_nzero;\n  float compress_ratio;\n  int n_meta_data;\n  float eff_compress_ratio;\n\n  /* Tuning program */\n  TuningProgram *tuning_program;\n};\n\nstruct autosa_io_info\n{\n  enum autosa_io_type io_type;\n  struct autosa_dep *dep;\n  isl_vec *dir;\n  /* Old data transfer direction before interior I/O elimination */\n  isl_vec *old_dir;  \n};\n\n/* An access to an outer array element or an iterator.\n * Accesses to iterators have an access relation that maps to an unnamed space.\n * An access may be both read and write.\n * If the access relation is empty, then the output dimension may\n * not be equal to the dimension of the corresponding array.\n */\nstruct autosa_stmt_access\n{\n  /* Access reads elements */\n  int read;\n  /* Access writes elements */\n  int write;\n  /* All writes are definite writes. */\n  int exact_write;\n  /* Is a single, fixed element being accessed? */\n  isl_bool fixed_element;\n  /* The number of index expressions specified in the access. 
*/\n  int n_index;\n\n  /* May access relation */\n  isl_map *access;\n  /* May access relation with as domain a mapping from iteration domain\n\t * to a reference identifier.\n\t */\n  isl_map *tagged_access;\n  /* The reference id of the corresponding pet_expr. */\n  isl_id *ref_id;\n\n  /* AutoSA extended */\n  struct autosa_io_info **io_info;\n  int n_io_info;\n  /* Indicates if layout transformation is required for SIMD */\n  int layout_trans;\n  /* Indicates which array dimension should be permuted innermost for SIMD */\n  int simd_dim;\n  /* Indicates the stride pattern under the SIMD loop.\n   * Default value is -1; 0 if stride-0 and 1 if stride-1 */\n  int simd_stride;\n  /* AutoSA extended */\n\n  struct autosa_stmt_access *next;\n};\n\n/* Internal data structure for extract_access.\n * \"next_access\" points to the end of a linked list that is extended\n * by extract_access.\n * \"single_expression\" is set if the access expressions belong to\n * an expression statement (i.e., a statement without internal control).\n * \"any_to_outer\" maps all intermediate arrays to their outer arrays.\n */\nstruct ppcg_extract_access_data\n{\n  struct autosa_stmt_access **next_access;\n  int single_expression;\n  isl_union_map *any_to_outer;\n};\n\n/* A representation of a user statement.\n * \"stmt\" points to the corresponding pet statement.\n * \"id\" is the identifier of the instance set of the statement.\n * \"accesses\" is a linked list of accesses performed by the statement.\n * If the statement has been killed, i.e., if it will not be scheduled,\n * then this linked list may be empty even if the actual statement does\n * perform accesses.\n */\nstruct autosa_stmt\n{\n  isl_id *id;\n  struct pet_stmt *stmt;\n\n  struct autosa_stmt_access *accesses;\n};\n\n/* Represents an outer array possibly accessed by an autosa_prog.\n */\nstruct autosa_array_info\n{\n  /* The array data space. */\n  isl_space *space;\n  /* Element type. */\n  char *type;\n  /* Element size. 
*/\n  int size;\n  /* Name of the array. */\n  char *name;\n  /* Declared extent of original array. */\n  isl_set *declared_extent;\n  /* AST expression for declared size of original array. */\n  isl_ast_expr *declared_size;\n  /* Extent of the array that needs to be copied. */\n  isl_set *extent;\n  /* Number of indices. */\n  unsigned n_index;\n  /* For each index, a bound on \"extent\" in that direction. */\n  isl_multi_pw_aff *bound;\n  /* The corresponding access AST expression, if the array needs\n\t * to be allocated on the device.\n\t */\n  isl_ast_expr *bound_expr;\n\n  /* All references to this array; point to elements of a linked list. */\n  int n_ref;\n  struct autosa_stmt_access **refs;\n\n  /* Is this array accessed at all by the program? */\n  int accessed;\n\n  /* Is this a scalar that is read-only within the entire program? */\n  int read_only_scalar;\n\n  /* Are the elements of the array structures? */\n  int has_compound_element;\n\n  /* Are the elements only accessed through constant index expressions? */\n  int only_fixed_element;\n\n  /* Is the array local to the scop? */\n  int local;\n  /* Is the array local and should it be declared on the host? */\n  int declare_local;\n\n  /* Is the corresponding global device memory accessed in any way? */\n  int global;\n\n  /* Should the array be linearized? */\n  int linearize;\n\n  /* Order dependences on this array.\n\t * Only used if live_range_reordering option is set.\n\t * It is set to NULL otherwise.\n\t */\n  isl_union_map *dep_order;\n\n  /* AutoSA Extended */\n  int n_lane;\n  /* Since AutoSA generates only a single kernel, \n   * \"local_array\" can safely point to the local array inside the kernel.\n   */\n  struct autosa_local_array_info *local_array;\n  /* Is the array to be copied in to the device memory? */\n  int copy_in;\n  /* Is the array to be copied out from the device memory? 
*/\n  int copy_out;\n  /* Tuning array refs */\n  std::vector<std::shared_ptr<TPArrayRef>> tuning_refs;\n  /* AutoSA Extended */\n};\n\nstruct autosa_io_buffer\n{\n  /* The local buffer tile, NULL if none. */\n  struct autosa_array_tile *tile;\n  /* The buffer is located at io_L\"level\". */\n  int level;\n  /* The data packing factor */\n  int n_lane;\n  /* Is the buffer data serialized at the host side? */\n  int serialize;\n  /* Is the buffer data sparse? */\n  int sparse;\n  int vec_len;\n  /* Tuning array tile */\n  TPArrayTile *tuning_tile;\n  /* Used for hoisting buffer */\n  int hoist_depth;\n  isl_union_set *hoist_domain;\n};\n\n/* A group of array references in a kernel that should be handled together. \n */\nstruct autosa_array_ref_group\n{\n  /* The references in this group access this local array. */\n  struct autosa_local_array_info *local_array;\n  /* This is the corresponding array. */\n  struct autosa_array_info *array;\n  /* Position of this group in the list of reference groups of the array. */\n  int nr;\n\n  /* The following fields are used during the construction of the groups.\n\t * access is the combined access relation relative to the private\n\t * memory tiling.  In particular, the domain of the map corresponds\n\t * to the first thread_depth dimensions of the kernel schedule.\n\t * write is set if any access in the group is a write.\n\t * exact_write is set if all writes are definite writes.\n\t * slice is set if there is at least one access in the group\n\t * that refers to more than one element.\n\t * \"min_depth\" is the minimum of the tile depths and thread_depth.\n\t */\n  isl_map *access;\n  int write;\n  int exact_write;\n  int slice;\n  int min_depth;\n\n  /* The shared memory tile, NULL if none. */\n  struct autosa_array_tile *shared_tile;\n\n  /* The private memory tile, NULL if none. */\n  struct autosa_array_tile *private_tile;\n\n  /* The local memory tile, NULL if none. 
*/\n  struct autosa_array_tile *local_tile;\n\n  /* References in this group; point to elements of a linked list. */\n  int n_ref;\n  struct autosa_stmt_access **refs;\n\n  /* AutoSA Extended */\n  /* The local memory tile inside PEs. This is for internal array with interior I/O */\n  struct autosa_array_tile *pe_tile;\n  /* I/O buffers inserted at each IO level */\n  struct autosa_io_buffer **io_buffers;\n  int n_io_buffer;\n  /* I/O type: interior/exterior I/O */\n  enum autosa_io_type io_type;\n  /* I/O direction at the PE level (after interior I/O elimination) */\n  isl_vec *dir;\n  /* I/O direction at the PE level (before interior I/O elimination) */\n  isl_vec *old_dir;\n  /* Group type: I/O/drain/PE group */\n  enum autosa_group_type group_type;\n  /* I/O direction at the PE level */\n  enum autosa_io_dir pe_io_dir;\n  /* I/O direction at the array level */\n  enum autosa_io_dir array_io_dir;\n  /* Maps PE identifiers to I/O identifiers */\n  isl_multi_aff *io_trans;    /* pe ids -> io ids */\n  isl_multi_aff *io_L1_trans; /* pe ids -> L1 io ids */\n  /* AST expression maps L1 I/O identifiers to PE identifiers */\n  isl_ast_expr *io_pe_expr;    /* io ids -> pe ids */\n  isl_ast_expr *io_L1_pe_expr; /* L1 io ids -> pe ids */\n  isl_ast_expr *io_pe_expr_boundary;\n  isl_ast_expr *io_L1_pe_expr_boundary;\n  /* I/O schedule */\n  isl_schedule *io_schedule;\n  isl_schedule *io_L1_schedule;\n  isl_schedule *io_L1_lower_schedule;\n  /* Number of I/O levels */\n  int io_level;\n  /* Dims of space band */\n  int space_dim;\n  /* Data pack factor inside PEs */\n  int n_lane;\n  /* Copy schedule for PE group */\n  int copy_schedule_dim;\n  isl_union_pw_multi_aff *copy_schedule;\n  /* Number of DRAM ports that this group is connected. */\n  int n_mem_ports;\n  /* The starting offset of external memory port id for this group. */\n  int mem_port_id;\n  /* Does copy-in module exist? */\n  int copy_in;\n  /* Does copy-out module exist? 
*/\n  int copy_out;\n  /* Attached drain group */\n  struct autosa_array_ref_group *attached_drain_group;  \n  /* Tuning array refs */\n  std::vector<std::shared_ptr<TPArrayRef>> tuning_refs;\n  TPArrayTile *tuning_pe_tile;\n  TPArrayTile *tuning_local_tile;\n  /* AutoSA Extended */\n};\n\nstruct autosa_array_ref_group_pair\n{\n  struct autosa_array_ref_group *local_group;\n  struct autosa_array_ref_group *io_group;\n  struct autosa_array_tile *local_tile; /* Compute the local tile */\n  int in_use;\n  isl_map *tagged_access;\n  int simd_depth;\n};\n\n/* Represents an outer array accessed by an autosa_kernel, localized\n * to the context of this kernel.\n *\n * \"array\" points to the corresponding array in the autosa_prog.\n * The \"n_group\" \"groups\" are the reference groups associated with the array.\n * If \"force_private\" is set, then the array (in practice a scalar)\n * must be mapped to a register.\n * \"global\" is set if the global device memory corresponding\n * to this array is accessed by the kernel.\n * \"bound\" is equal to array->bound specialized to the current kernel.\n * \"bound_expr\" is the corresponding access AST expression.\n */\nstruct autosa_local_array_info\n{\n  struct autosa_array_info *array;\n\n  /* PE groups */\n  int n_pe_group;\n  struct autosa_array_ref_group **pe_groups;\n\n  /* IO groups */\n  int n_io_group;\n  struct autosa_array_ref_group **io_groups;\n\n  /* Drain group */\n  struct autosa_array_ref_group *drain_group;\n\n  /* Number of different I/O modules that access the array.\n   * Due to a limitation of Xilinx HLS, we need to \n   * allocate separate pointers for each group. \n   */\n  int n_io_group_refs;\n  /* Number of external memory ports allocated to this array. */\n  int n_mem_ports;\n  /* Map from io_group_ref to mem_port. 
*/  \n  std::vector<int> group_ref_mem_port_map;  \n\n  /* Default groups */\n  int n_group;\n  struct autosa_array_ref_group **groups;\n\n  /* Is array serialized at the host side. */\n  int host_serialize;\n  isl_pw_qpolynomial *serialize_bound;\n\n  enum autosa_array_type array_type;\n  int n_lane;\n\n  int force_private;\n  int global;\n\n  unsigned n_index;\n  isl_multi_pw_aff *bound;\n  isl_ast_expr *bound_expr;\n\n  /* Is this the sparse matrix in the block sparsity */\n  int is_sparse;\n  int vec_len;\n  int n_nzero;\n  float compress_ratio;\n  int n_meta_data;\n  float eff_compress_ratio;\n};\n\n/* \"read\" and \"write\" contain the original access relations, possibly \n * involving member accesses.\n * \n * The elements of \"array\", as well as the ranges of \"copy_in\" and \"copy_out\"\n * only refer to the outer arrays of any possible member accesses.\n */\nstruct autosa_prog\n{\n  isl_ctx *ctx;\n\n  struct ppcg_scop *scop;\n\n  /* Set of parameter values */\n  isl_set *context;\n\n  /* All potential read accesses in the entire program */\n  isl_union_map *read;\n\n  /* All potential write accesses in the entire program */\n  isl_union_map *may_write;\n  /* All definite write accesses in the entire program */\n  isl_union_map *must_write;\n  /* All tagged definite kills in the entire program */\n  isl_union_map *tagged_must_kill;\n\n  /* The set of inner array elements that may be preserved. */\n  isl_union_set *may_persist;\n\n  /* A mapping from all innermost arrays to their outer arrays. */\n  isl_union_map *to_outer;\n  /* A mapping from all the outer arrays to all corresponding inner arrays */\n  isl_union_map *to_inner;\n  /* A mapping from all intermediate arrays to their outer arrays,\n\t * including an identity mapping from the anonymous 1D space to itself.\n\t */\n  isl_union_map *any_to_outer;\n\n  /* Order dependences on non-scalars. 
*/\n  isl_union_map *array_order;\n\n  /* Array of statements */\n  int n_stmts;\n  struct autosa_stmt *stmts;\n\n  int n_array;\n  struct autosa_array_info *array;  \n};\n\nstruct autosa_hw_top_module\n{\n  int n_fifo_decls;\n  int n_module_calls;\n  isl_schedule **fifo_decl_scheds;\n  isl_schedule **module_call_scheds;\n  isl_ast_node **fifo_decl_trees;\n  isl_ast_node **module_call_trees;\n  char **fifo_decl_names;\n\n  /* Wrapped AST */\n  int n_fifo_decl_wrapped;\n  int n_module_call_wrapped;\n  isl_ast_node **fifo_decl_wrapped_trees;\n  isl_ast_node **module_call_wrapped_trees;\n\n  int n_hw_modules;\n  struct autosa_hw_module **hw_modules;\n  struct autosa_kernel *kernel;\n\n  /* For Intel devices */\n  int n_ext_module;\n  isl_schedule **ext_module_scheds;\n  isl_ast_node **ext_module_trees;\n  int n_ext_module_wrapped;\n  isl_ast_node **ext_module_wrapped_trees;\n};\n\nstruct autosa_pe_dummy_module\n{\n  struct autosa_hw_module *module;\n  struct autosa_array_ref_group *io_group;\n  isl_schedule *sched;\n  isl_ast_node *tree;\n  isl_ast_node *device_tree;\n  int in;\n};\n\nstruct autosa_drain_merge_func\n{\n  struct autosa_array_ref_group *group;\n  struct autosa_kernel *kernel;\n  isl_id_list *inst_ids;\n  isl_schedule *sched;\n  isl_ast_node *tree;\n  isl_ast_node *device_tree;\n};\n\nstruct autosa_hw_module\n{\n  struct ppcg_options *options;\n\n  enum autosa_module_type type;\n  /* Module name */\n  char *name;\n\n  isl_id_list *inst_ids;\n  int n_var;\n  struct autosa_kernel_var *var;\n\n  /* Module function schedule */\n  isl_schedule *sched;\n\n  /* Module function AST */\n  isl_ast_node *tree;\n  isl_ast_node *device_tree;\n\n  /* Array reference group for I/O or drain module */\n  struct autosa_array_ref_group **io_groups;\n  int n_io_group;\n\n  /* I/O module level */\n  int level;\n  /* I/O module copy-in/out */\n  int in;\n  /* Connect to external memory */\n  int to_mem;\n  /* Connect to PE */\n  int to_pe;\n  /* Contains buffer */\n  int 
is_buffer;\n  /* Filter module */\n  int is_filter;\n  /* Is the DRAM data serialized */\n  int is_serialized;\n\n  /* Serialization schedule */\n  isl_schedule *serialize_sched;\n  isl_ast_node *serialize_tree;\n\n  /* Module function schedule for buffer_filter modules */\n  isl_schedule *outer_sched; /* Outer loops */\n  isl_schedule *inter_sched; /* Inter transfer */\n  isl_schedule *intra_sched; /* Intra transfer */\n\n  isl_schedule *boundary_outer_sched; /* Outer loops in boundary module */\n  isl_schedule *boundary_inter_sched; /* Inter transfer in boundary module */\n\n  isl_space *inter_space;\n  isl_space *intra_space;\n  isl_space *space;\n\n  isl_ast_node *inter_tree;\n  isl_ast_node *intra_tree;\n\n  isl_ast_node *boundary_outer_tree;\n  isl_ast_node *boundary_inter_tree;\n\n  /* Module function schedule for filter modules at the boundary */\n  isl_schedule *boundary_sched;\n  isl_ast_node *boundary_tree;\n  int boundary;\n\n  /* Dummy modules for collecting data at boundary PEs */\n  int n_pe_dummy_modules;\n  struct autosa_pe_dummy_module **pe_dummy_modules;\n\n  int double_buffer;\n\n  /* Generate credit control */\n  int credit;\n\n  /* Data pack factor */\n  int data_pack_inter;\n  int data_pack_intra;\n  int data_pack_serialize;\n\n  /* For I/O module, local array ref index */  \n  int n_array_ref;\n\n  /* Coalesce bound\n   * Used for I/O module that connects to the DRAM. \n   * Indicates the loop extent of the memory coalesce loop.\n   */\n  int coalesce_bound;\n\n  /* The module uses FF to implement arrays. */\n  int use_FF;\n\n  struct autosa_kernel *kernel;\n\n  /* For Catapult HLS */\n  /* Pipeline the whole function. */\n  int pipeline_at_default_func;\n  int pipeline_at_filter_func[3]; // outer, intra, inter  \n  /* Fifo guards information. 
*/\n  int n_fifo_serialize;\n  char** fifo_names_serialize;\n  isl_pw_qpolynomial **fifo_bounds_serialize;\n  int n_fifo_default;\n  char **fifo_names_default;\n  isl_pw_qpolynomial **fifo_bounds_default;  \n  int n_fifo_inter;\n  char **fifo_names_inter;\n  isl_pw_qpolynomial **fifo_bounds_inter;  \n  int n_fifo_intra;\n  char **fifo_names_intra;\n  isl_pw_qpolynomial **fifo_bounds_intra;  \n\n  /* Tuning purpose */\n  /* Latency */\n  isl_schedule *tuning_sched;\n  isl_schedule *tuning_outer_sched;\n  isl_schedule *tuning_inter_sched;\n  isl_schedule *tuning_intra_sched;  \n\n  isl_ast_node *tuning_tree;\n  isl_ast_node *tuning_device_tree;  \n  isl_ast_node *tuning_intra_tree;\n  isl_ast_node *tuning_inter_tree;  \n  \n  /* Counting module numbers */\n  isl_schedule *tuning_num_sched;\n  isl_schedule *tuning_num_outer_sched;\n  isl_schedule *tuning_num_inter_sched;\n  isl_schedule *tuning_num_intra_sched;  \n\n  isl_ast_node *tuning_num_tree;\n  isl_ast_node *tuning_num_device_tree;  \n  isl_ast_node *tuning_num_intra_tree;\n  isl_ast_node *tuning_num_inter_tree;\n};\n\nstruct autosa_gen\n{\n  isl_ctx *ctx;\n  struct ppcg_options *options;\n\n  /* Callback for printing of AST in appropriate format. 
*/\n  __isl_give isl_printer *(*print)(__isl_take isl_printer *p,\n                                   struct autosa_prog *prog, __isl_keep isl_ast_node *tree,\n                                   struct autosa_hw_module **modules, int n_modules,\n                                   struct autosa_hw_top_module *top_module,\n                                   struct autosa_drain_merge_func **drain_merge_funcs, int n_drain_merge_funcs,\n                                   struct autosa_types *types, void *user);\n  void *print_user;\n\n  struct autosa_prog *prog;\n  struct autosa_kernel *kernel;\n  /* The default AST */\n  isl_ast_node *tree;\n\n  /* The default schedule */\n  isl_schedule *schedule;\n\n  /* The SA module schedule */\n  struct autosa_hw_module **hw_modules;\n  int n_hw_modules;\n  struct autosa_hw_top_module *hw_top_module;\n  struct autosa_drain_merge_func **drain_merge_funcs;\n  int n_drain_merge_funcs;\n\n  /* The sequence of types for which a definition has been printed. */\n  struct autosa_types types;\n\n  /* User specified tile sizes for each kernel. */\n  isl_union_map *sizes;\n\n  /* Effectively used tile sizes for each kernel. */\n  isl_union_map *used_sizes;\n\n  /* Identifier of the next kernel. 
*/\n  int kernel_id;\n\n  /* Tuning configuration */\n  cJSON *tuning_config;\n\n  /* Tuning programs */\n  std::vector<TuningProgram *> tuning_progs;\n};\n\n/* Representation of special statements, in particular copy statements\n * ,__syncthreads statements, and I/O statements, inside a kernel.\n *\n * type represents the kind of statement\n *\n * for autosa_kernel_copy statements we have\n *\n * read is set if the statement should copy data from global memory\n * to shared memory or registers.\n *\n * index expresses an access to the array element that needs to be copied\n * local_index expresses the corresponding element in the tile\n *\n * array refers to the original array being copied\n * local_array is a pointer to the appropriate element in the \"array\"\n *\tarray of the autosa_kernel to which this copy access belongs\n *\n *\n * for autosa_kernel_domain statements we have\n *\n * stmt is the corresponding input statement\n *\n * n_access is the number of accesses in stmt\n * access is an array of local information about the accesses\n *\n * for autosa_kernel_io statements we have\n *\n * in is set if the statement should read data from fifo \n * to local array or registers.\n *\n * local_index expresses the corresponding element in the tile\n *\n * array refers to the original array being transferred\n * local_array is a pointer to the appropriate element in the \"array\"\n *  array of the autosa_kernel to which this copy access belongs\n */\nstruct autosa_kernel_stmt\n{\n  enum autosa_kernel_stmt_type type;\n\n  union {\n    struct\n    {\n      int read;\n      isl_ast_expr *index;\n      isl_ast_expr *local_index;\n      struct autosa_array_info *array;\n      struct autosa_local_array_info *local_array;\n    } c;\n    struct\n    {\n      struct autosa_stmt *stmt;\n      isl_id_to_ast_expr *ref2expr;\n    } d;\n    struct\n    {\n      int in;\n      int buf;\n      //int filter;\n      //int lower;\n      int boundary;\n      int dummy;\n      int 
serialize;\n      int reduce;\n      char *in_fifo_name;\n      char *out_fifo_name;\n      char *fifo_type;\n      char *reduce_op;\n      int filter_sched_depth;      \n      int filter_param_id;\n      int data_pack;\n      int reg;\n      int nxt_data_pack;\n      isl_ast_expr *local_index;\n      isl_ast_expr *index;\n      int coalesce_depth;\n      int coalesce_bound;\n      struct autosa_array_info *array;\n      struct autosa_local_array_info *local_array;\n      struct autosa_array_ref_group *group;\n      struct autosa_hw_module *module;      \n      int simd_depth;\n      int if_depth;\n    } i;\n    struct\n    {\n      struct autosa_hw_module *module;\n      struct autosa_pe_dummy_module *pe_dummy_module;\n      struct autosa_array_ref_group *group;\n      int boundary;\n      int dummy;\n      int upper;\n      int lower;\n      int lower_sched_val;\n      int serialize;\n      char *module_name;\n    } m;\n    struct\n    {\n      struct autosa_hw_module *module;\n      int boundary;\n    } f;\n    struct\n    {\n      struct autosa_drain_merge_func *func;\n      isl_ast_expr *index;\n    } dm;\n    struct\n    {\n      isl_ast_expr *index;\n      struct autosa_array_ref_group *group;\n      int in;\n    } s;\n  } u;\n};\n\nstruct autosa_acc\n{\n  isl_map *tagged_map;\n  isl_map *map;\n  isl_space *id;\n\n  int rw; // 0 - read 1 - write\n};\n\nstruct autosa_node_band_prop\n{\n  int permutable;\n  int *coincident;\n  enum autosa_loop_type *pe_opt;\n  enum autosa_loop_type *space_time;\n  int *sched_pos;\n  void *iter[20];\n  int n_member;\n  isl_multi_union_pw_aff *mupa;\n};\n\nstruct autosa_ast_node_userinfo\n{\n  int is_pipeline;\n  int is_unroll;\n  int is_outermost_for;\n  int is_infinitize_legal;\n  int is_first_infinitizable_loop;\n  int is_dep_free;  \n  int n_coalesce_loop;\n  /* Temporary variable used in AST traversal. */\n  bool visited;\n  /* Variables for Catapult codegen. 
*/\n  int is_guard_start;\n  int is_guard_end;\n  int n_fifo;\n  char **fifo_names;\n  isl_pw_qpolynomial **bounds;\n  int double_buffer;\n  int inter;\n  int read;\n  char *module_name;\n  char *buf_name;\n};\n\n/* The current index is such that if you add \"shift\",\n * then the result is always a multiple of \"stride\",\n * where \"stride\" may be equal to 1.\n * Let D represent the initial tile->depth dimensions of the computed schedule.\n * The spaces of \"lb\" and \"shift\" are of the form\n *\n *\tD -> [b]\n */\nstruct autosa_array_bound\n{\n  isl_val *size;\n  isl_aff *lb;\n\n  isl_val *stride;\n  isl_aff *shift;\n};\n\n/* A tile of an outer array.\n *\n * requires_unroll is set if the schedule dimensions that are mapped\n * to threads need to be unrolled for this (private) tile to be used.\n *\n * \"depth\" reflects the number of schedule dimensions that affect the tile.\n * The copying into and/or out of the tile is performed at that depth.\n *\n * n is the dimension of the array.\n * bound is an array of size \"n\" representing the lower bound\n *\tand size for each index.\n *\n * tiling maps a tile in the global array to the corresponding\n * local memory tile and is of the form\n *\n *\t{ [D[i] -> A[a]] -> T[(a + shift(i))/stride - lb(i)] }\n *\n * where D represents the initial \"depth\" dimensions\n * of the computed schedule.\n */\nstruct autosa_array_tile\n{\n  isl_ctx *ctx;\n  int requires_unroll;\n  int depth;\n  int n;\n  struct autosa_array_bound *bound;\n  isl_multi_aff *tiling;\n};\n\nstruct hls_info\n{\n  FILE *host_c;    /* OpenCL host. */\n  FILE *host_h;    /* OpenCL host header. */\n  FILE *kernel_c;  /* Definition of hardware modules. */\n  FILE *kernel_h;  /* Declaration of hardware modules. */\n  FILE *top_gen_c; /* Prints out the top module that connects the hardware modules. */\n  FILE *top_gen_h;\n  FILE *tcl;       /* Catapult TCL. 
*/\n\n  enum platform target;\n  int hls;          /* Generate HLS host instead of OpenCL host */\n  char *output_dir; /* Output directory */\n  char *kernel_prefix; /* Kernel file prefix */\n  isl_ctx *ctx;  \n  bool hcl; /* Sets to true if the generated code is integrated with HeteroCL. */\n  FILE *hcl_decl;\n};\n\n/* Band node */\n__isl_give isl_multi_val *construct_band_tile_sizes(\n    __isl_keep isl_schedule_node *node, int *tile_size);\nstruct autosa_node_band_prop *extract_node_band_prop(__isl_keep isl_schedule_node *node);\nstruct autosa_node_band_prop *autosa_node_band_prop_free(\n    __isl_take struct autosa_node_band_prop *prop);\nisl_bool is_permutable_node(__isl_keep isl_schedule_node *node);\nisl_bool has_single_permutable_node(__isl_keep isl_schedule *schedule);\nisl_bool is_dep_uniform_at_node(__isl_keep isl_schedule_node *node, void *user);\nisl_bool is_dep_uniform(__isl_keep isl_basic_map *bmap, void *user);\nisl_bool is_dep_uniform_wrap(__isl_keep isl_map *map, void *user);\nisl_bool uniform_dep_check(__isl_keep isl_schedule *schedule, struct ppcg_scop *scop);\n__isl_give isl_vec *get_dep_dis_at_schedule(__isl_keep isl_basic_map *dep,\n                                            __isl_keep isl_schedule *schedule);\n__isl_give isl_vec *get_dep_dis_at_node(__isl_keep isl_basic_map *dep,\n                                        __isl_keep isl_schedule_node *band);\n//__isl_give isl_schedule *loop_interchange_at_node(\n//    __isl_take isl_schedule_node *node, isl_size level1, isl_size level2);\n__isl_give isl_schedule_node *loop_interchange_at_node(\n    __isl_take isl_schedule_node *node, isl_size level1, isl_size level2);\n__isl_give isl_schedule_node *get_outermost_permutable_node(\n    __isl_keep isl_schedule *schedule);\n__isl_give isl_schedule_node *get_innermost_permutable_node(\n    __isl_keep isl_schedule *schedule);\n__isl_give isl_schedule_node *tile_band(\n    __isl_take isl_schedule_node *node, __isl_take isl_multi_val 
*sizes);\n__isl_give isl_schedule_node *autosa_tile_band(\n    __isl_take isl_schedule_node *node, __isl_keep int *sizes);\n__isl_give isl_schedule_node *autosa_node_band_tile_loop(\n    __isl_take isl_schedule_node *node, int tile_size, int pos);\n__isl_give isl_schedule_node *clear_pe_opt_prop(\n    __isl_take isl_schedule_node *node, void *user);\n__isl_give isl_schedule_node *restore_node_band_prop(\n    __isl_take isl_schedule_node *node,\n    __isl_take struct autosa_node_band_prop *prop);\n__isl_give isl_schedule_node *autosa_node_interchange(\n    __isl_take isl_schedule_node *node);\n__isl_give isl_schedule_node *autosa_node_interchange_up(\n    __isl_take isl_schedule_node *node);\nisl_bool no_permutable_node(__isl_keep isl_schedule_node *node, void *user);\nisl_bool all_parallel_node(__isl_keep isl_schedule_node *node, void *user);\n//isl_bool isl_schedule_node_is_io_mark(__isl_keep isl_schedule_node *node, int io_level);\nint is_node_under_simd(__isl_keep isl_schedule_node *node);\nint is_node_under_latency(__isl_keep isl_schedule_node *node);\nint *extract_band_upper_bounds(__isl_keep isl_schedule_node *node);\n__isl_give isl_union_set *set_schedule_eq(\n    __isl_keep isl_schedule_node *node, __isl_keep isl_id_list *names);\n__isl_give isl_union_set *set_schedule_neq(\n    __isl_keep isl_schedule_node *node, __isl_keep isl_id_list *names);    \nisl_bool is_flow_dep_carried_by_array_part_loops(__isl_keep isl_schedule *schedule,\n                                                 struct autosa_array_ref_group *group, struct autosa_kernel *kernel);\n__isl_give isl_schedule_node *reorder_band_by_dep_dis(__isl_take isl_schedule_node *node);\n__isl_give isl_schedule_node *sched_pos_setup(__isl_take isl_schedule_node *node);\nint get_band_single_schedule_val(__isl_keep isl_schedule_node *node);\nint get_last_sched_dim_val(__isl_keep isl_schedule_node *node);\n__isl_give isl_schedule_node *autosa_atomic_ancestors(__isl_take isl_schedule_node *node);\nint 
is_dep_carried_by_node(__isl_keep isl_basic_map *dep, __isl_keep isl_schedule_node *node);\n__isl_give isl_schedule_node *autosa_node_sink_to_depth(__isl_take isl_schedule_node *node, int depth);\n__isl_give isl_schedule_node *autosa_node_sink_to_mark(__isl_take isl_schedule_node *node, const char *name);\nint is_marked(__isl_keep isl_schedule_node *node, const char *name);\n\n/* Schedule */\n__isl_give isl_schedule *compute_schedule(struct autosa_gen *gen);\n__isl_give isl_schedule *get_schedule(struct autosa_gen *gen);\n__isl_give isl_schedule *merge_outer_bands(__isl_give isl_schedule *schedule, struct autosa_gen *gen);\n\n/* AutoSA kernel */\nvoid *autosa_kernel_free(struct autosa_kernel *kernel);\nstruct autosa_kernel *autosa_kernel_copy(struct autosa_kernel *kernel);\nstruct autosa_kernel *autosa_kernel_from_schedule(__isl_take isl_schedule *schedule);\nstruct autosa_kernel *autosa_kernel_alloc(isl_ctx *ctx, struct ppcg_scop *scop);\n\n/* AutoSA access */\nisl_bool access_is_stride_zero(__isl_keep isl_map *access, int pos);\nisl_bool access_is_stride_one(__isl_keep isl_map *access, int pos);\nvoid *autosa_acc_free(struct autosa_acc *acc);\nstruct autosa_io_buffer *autosa_io_buffer_alloc();\n\n/* AutoSA dep */\nvoid *autosa_dep_free(__isl_take struct autosa_dep *dep);\n\n/* AutoSA iterator */\nstruct autosa_iter *autosa_iter_free(struct autosa_iter *iter);\n\n/* AutoSA array */\nisl_stat collect_array_info(struct autosa_prog *prog);\nint autosa_array_is_read_only_scalar(struct autosa_array_info *array);\nint autosa_array_is_scalar(struct autosa_array_info *array);\nint autosa_kernel_requires_array_argument(struct autosa_kernel *kernel, int i);\nstruct autosa_array_ref_group *autosa_array_ref_group_free(\n    struct autosa_array_ref_group *group);\nstruct autosa_array_ref_group *autosa_array_ref_group_init(\n    struct autosa_array_ref_group *group);\nstruct autosa_array_tile *autosa_array_tile_free(struct autosa_array_tile *tile);\nstruct autosa_array_tile 
*autosa_array_tile_create(isl_ctx *ctx, int n_index);\n__isl_give isl_val *autosa_array_tile_size(struct autosa_array_tile *tile);\n\n/* AutoSA statement */\nstruct autosa_stmt *extract_stmts(isl_ctx *ctx, struct ppcg_scop *scop,\n                                  __isl_keep isl_union_map *any_to_outer);\nvoid autosa_kernel_stmt_free(void *user);\nstruct autosa_stmt *find_stmt(struct autosa_prog *prog, __isl_keep isl_id *id);\n\n/* AutoSA prog */\nstruct autosa_prog *autosa_prog_alloc(isl_ctx *ctx, struct ppcg_scop *scop);\nvoid *autosa_prog_free(struct autosa_prog *prog);\n\n/* AutoSA hw module */\nstruct autosa_hw_module *autosa_hw_module_alloc(struct autosa_gen *gen);\nvoid *autosa_hw_module_free(struct autosa_hw_module *module);\nstruct autosa_hw_top_module *autosa_hw_top_module_alloc();\nvoid *autosa_hw_top_module_free(struct autosa_hw_top_module *module);\nstruct autosa_pe_dummy_module *autosa_pe_dummy_module_alloc();\nvoid *autosa_pe_dummy_module_free(struct autosa_pe_dummy_module *module);\nstruct autosa_drain_merge_func *autosa_drain_merge_func_alloc(struct autosa_gen *gen);\nvoid *autosa_drain_merge_func_free(struct autosa_drain_merge_func *func);\n\n/* AutoSA AST node */\nstruct autosa_ast_node_userinfo *alloc_ast_node_userinfo();\nvoid free_ast_node_userinfo(void *ptr);\n\n/* AutoSA PE opt */\n__isl_give isl_set *extract_sa_sizes(__isl_keep isl_union_map *sizes,\n                                     const char *type);\nint *read_hbm_tile_sizes(struct autosa_kernel *kernel, int tile_len, char *name);\nint *read_default_hbm_tile_sizes(struct autosa_kernel *sa, int tile_len);\nint *read_array_part_tile_sizes(struct autosa_kernel *kernel, int tile_len);\nint *read_default_array_part_tile_sizes(struct autosa_kernel *kernel, int tile_len);\nint *read_latency_tile_sizes(struct autosa_kernel *kernel, int tile_len);\nint *read_default_latency_tile_sizes(struct autosa_kernel *kernel, int tile_len);\nint *read_simd_tile_sizes(struct autosa_kernel *kernel, int 
tile_len);\nint *read_default_simd_tile_sizes(struct autosa_kernel *kernel, int tile_len);\nint read_space_time_kernel_id(__isl_keep isl_union_map *sizes);\nint *read_array_part_L2_tile_sizes(struct autosa_kernel *kernel, int tile_len);\nint *read_default_array_part_L2_tile_sizes(struct autosa_kernel *kernel, int tile_len);\nint *read_data_pack_sizes(__isl_keep isl_union_map *sizes, int tile_len);\nint *read_data_pack_sizes_array(__isl_keep isl_union_map *sizes, char *name);\nint read_mem_port_map(__isl_keep isl_union_map *port_map, char *name);\n\n/* AutoSA latency and resource estimation */\nisl_stat sa_extract_loop_info(struct autosa_gen *gen, struct autosa_hw_module *module);\nisl_stat sa_extract_array_info(struct autosa_kernel *kernel);\nint extract_memory_type(struct autosa_hw_module *module,\n                        struct autosa_kernel_var *var, int uram);\nisl_stat sa_extract_design_info(struct autosa_gen *gen);\n\n/* Tuning program */\nisl_stat TP_extract_loop_info(struct autosa_gen *gen, struct autosa_hw_module *module);\nisl_stat TP_extract_resource_info(struct autosa_gen *gen, struct autosa_hw_module *module);\nisl_stat TP_extract_module_attr(struct autosa_gen *gen, struct autosa_hw_module *module);\nisl_stat TP_extract_array_info(struct autosa_gen *gen, struct autosa_kernel *kernel);\nTPArrayTile *TP_infer_tiled_array(\n  struct autosa_gen *gen, struct autosa_kernel *kernel, struct isl_schedule_node *node,\n  struct autosa_array_ref_group *group, int read, int write);\n\n/* AutoSA block sparsity */\nisl_stat autosa_kernel_extract_sparse_info(struct autosa_kernel *kernel, \n  struct autosa_gen *gen);\n\n#if defined(__cplusplus)\n}\n#endif  \n\n#endif\n"
  },
  {
    "path": "src/autosa_cpu.cpp",
    "content": ""
  },
  {
    "path": "src/autosa_cpu.h",
    "content": "#ifndef _AUTOSA_CPU_H\n#define _AUTOSA_CPU_H\n\n#include <isl/ctx.h>\n\n#include \"ppcg.h\"\n\nstruct ppcg_options;\n\nint generate_autosa_cpu(isl_ctx *ctx, struct ppcg_options *options,\n                        const char *input);\n\n#endif"
  },
  {
    "path": "src/autosa_intel_opencl.cpp",
    "content": "#include <vector>\n#include <algorithm>\n\n#include <isl/ctx.h>\n\n#include \"autosa_intel_opencl.h\"\n#include \"autosa_common.h\"\n#include \"autosa_print.h\"\n#include \"autosa_trans.h\"\n#include \"autosa_codegen.h\"\n#include \"autosa_utils.h\"\n#include \"autosa_comm.h\"\n\nstruct print_host_user_data\n{\n  struct hls_info *hls;\n  struct autosa_prog *prog;\n  struct autosa_hw_top_module *top;\n};\n\nstruct print_hw_module_data\n{\n  struct hls_info *hls;\n  struct autosa_prog *prog;\n  struct autosa_hw_module *module;  \n  /* Used for Intel codegen. Modify the printed iterator prefix. */\n  const char *iterator_prefix;\n};\n\nstatic void print_intel_host_header(FILE *fp)\n{\n  fprintf(fp, \"#include <stdio.h>\\n\");\n  fprintf(fp, \"#include <stdlib.h>\\n\");\n  fprintf(fp, \"#include <math.h>\\n\");\n  fprintf(fp, \"#include <cassert>\\n\");\n  fprintf(fp, \"#include <cstdio>\\n\");\n  fprintf(fp, \"#include <cstdlib>\\n\");\n  fprintf(fp, \"#include <cstring>\\n\");\n  fprintf(fp, \"#include <fstream>\\n\");\n  fprintf(fp, \"#include <iomanip>\\n\");\n  fprintf(fp, \"#include <iostream>\\n\");\n  fprintf(fp, \"#include <sstream>\\n\");\n  fprintf(fp, \"#include <string>\\n\");\n  fprintf(fp, \"#ifdef _WIN32\\n\");\n  fprintf(fp, \"#include <time.h>\\n\");\n  fprintf(fp, \"#include <windows.h>\\n\");\n  fprintf(fp, \"#else\\n\");\n  fprintf(fp, \"#include <sys/time.h>\\n\");\n  fprintf(fp, \"#endif\\n\");\n  fprintf(fp, \"#include <CL/opencl.h>\\n\");\n  //fprintf(fp, \"#include <CL/cl_ext_intelfpga.h>\\n\");\n  fprintf(fp, \"#include <chrono>\\n\");\n  fprintf(fp, \"#include \\\"AOCLUtils/aocl_utils.h\\\"\\n\\n\");\n\n  fprintf(fp, \"using namespace aocl_utils;\\n\\n\");\n  //  fprintf(fp, \"using namespace aocl_utils;\\n\\n\");\n  //  fprintf(fp, \"#define AOCX_FIEL \\\"krnl.aocx\\\"\\n\\n\");\n\n  /* Print Intel helper function */\n  fprintf(fp, \"#define HOST\\n\");\n  fprintf(fp, \"#define ACL_ALIGNMENT 64\\n\");\n  fprintf(fp, \"#ifdef 
_WIN32\\n\");\n  fprintf(fp, \"void *acl_aligned_malloc(size_t size) {\\n\");\n  fprintf(fp, \"    return _aligned_malloc(size, ACL_ALIGNMENT);\\n\");\n  fprintf(fp, \"}\\n\");\n  fprintf(fp, \"void acl_aligned_free(void *ptr) {\\n\");\n  fprintf(fp, \"    _aligned_free(ptr);\\n\");\n  fprintf(fp, \"}\\n\");\n  fprintf(fp, \"#else\\n\");\n  fprintf(fp, \"void *acl_aligned_malloc(size_t size) {\\n\");\n  fprintf(fp, \"    void *result = NULL;\\n\");\n  fprintf(fp, \"    if (posix_memalign(&result, ACL_ALIGNMENT, size) != 0)\\n\");\n  fprintf(fp, \"        printf(\\\"acl_aligned_malloc() failed.\\\\n\\\");\\n\");\n  fprintf(fp, \"    return result;\\n\");\n  fprintf(fp, \"}\\n\");\n  fprintf(fp, \"void acl_aligned_free(void *ptr) {\\n\");\n  fprintf(fp, \"    free(ptr);\\n\");\n  fprintf(fp, \"}\\n\");\n  fprintf(fp, \"#endif\\n\\n\");\n\n  //fprintf(fp, \"$define AOCX_FILE \\\"krnl.aocx\\\"\\n\\n\");\n  //fprintf(fp, \"// Function prototypes\\n\");\n  //fprintf(fp, \"void cleanup_host_side_resources();\\n\");\n  //fprintf(fp, \"void cleanup();\\n\\n\");\n\n  fprintf(fp, \"// Check the status returned by the OpenCL API functions\\n\");\n  fprintf(fp, \"#define CHECK(status) \\\\\\n\");\n  fprintf(fp, \"if (status != CL_SUCCESS) { \\\\\\n\");\n  fprintf(fp, \"    fprintf(stderr, \\\"error %%d in line %%d.\\\\n\\\", status, __LINE__); \\\\\\n\");\n  fprintf(fp, \"    exit(1); \\\\\\n\");\n  fprintf(fp, \"}\\n\\n\");\n\n  fprintf(fp, \"// Check the status returned by the OpenCL API functions, don't exit on error\\n\");\n  fprintf(fp, \"#define CHECK_NO_EXIT(status) \\\\\\n\");\n  fprintf(fp, \"if (status != CL_SUCCESS) { \\\\\\n\");\n  fprintf(fp, \"    fprintf(stderr, \\\"error %%d in line %%d.\\\\n\\\", status, __LINE__); \\\\\\n\");\n  fprintf(fp, \"}\\n\\n\");\n\n  fprintf(fp, \"template <typename T>\\n\");\n  fprintf(fp, \"struct aligned_allocator\\n\");\n  fprintf(fp, \"{\\n\");\n  fprintf(fp, \"  using value_type = T;\\n\");\n  fprintf(fp, \"  T* 
allocate(std::size_t num)\\n\");\n  fprintf(fp, \"  {\\n\");\n  fprintf(fp, \"    void* ptr = nullptr;\\n\");\n  fprintf(fp, \"    if (posix_memalign(&ptr, ACL_ALIGNMENT, num*sizeof(T)))\\n\");\n  fprintf(fp, \"      throw std::bad_alloc();\\n\");\n  fprintf(fp, \"    return reinterpret_cast<T*>(ptr);\\n\");\n  fprintf(fp, \"  }\\n\");\n  fprintf(fp, \"  void deallocate(T* p, std::size_t num)\\n\");\n  fprintf(fp, \"  {\\n\");\n  fprintf(fp, \"    free(p);\\n\");\n  fprintf(fp, \"  }\\n\");\n  fprintf(fp, \"};\\n\\n\");\n\n  fprintf(fp, \"void cleanup()\\n\");\n  fprintf(fp, \"{\\n\");\n  fprintf(fp, \"  // Placeholder. Keeps the function from being eliminated.\\n\");\n  fprintf(fp, \"  printf(\\\"Cleanup...\\\\n\\\");\\n\");\n  fprintf(fp, \"}\\n\\n\");\n}\n\n/* Open the host .cpp/.h files, the kernel .cl/.h files, and the top-level\n * generator .cpp/.h files for writing.\n * Add the necessary includes.\n */\nstatic void opencl_open_files(struct hls_info *info, const char *input)\n{\n  char name[PATH_MAX];\n  char dir[PATH_MAX];\n  int len, len_dir;\n  isl_printer *p_str;\n  char *file_path;\n\n  p_str = isl_printer_to_str(info->ctx);\n  p_str = isl_printer_print_str(p_str, info->output_dir);\n  p_str = isl_printer_print_str(p_str, \"/src/\");\n  file_path = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  len = ppcg_extract_base_name(name, input);\n  /* Add the prefix */\n  sprintf(dir, \"%s\", file_path);\n  len_dir = strlen(file_path);\n\n  /* OpenCL host */\n  strcpy(name + len, \"_host.cpp\");\n  strcpy(dir + len_dir, name);\n  info->host_c = fopen(dir, \"w\");\n  if (!info->host_c)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  strcpy(name + len, \"_host.h\");\n  strcpy(dir + len_dir, name);\n  info->host_h = fopen(dir, \"w\");\n  if (!info->host_h)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n  print_intel_host_header(info->host_h);\n  fprintf(info->host_c, \"#include \\\"%s\\\"\\n\", name);\n  strcpy(name + len, \"_kernel.aocx\");\n  //fprintf(info->host_c, \"#define AOCX_FILE \\\"%s\\\"\\n\", name);\n\n  
strcpy(name + len, \"_kernel_modules.cl\");\n  strcpy(dir + len_dir, name);\n  info->kernel_c = fopen(dir, \"w\");\n  if (!info->kernel_c)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  strcpy(name + len, \"_kernel.h\");\n  strcpy(dir + len_dir, name);\n  info->kernel_h = fopen(dir, \"w\");\n  if (!info->kernel_h)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n  fprintf(info->kernel_c, \"#include \\\"%s\\\"\\n\", name);\n  fprintf(info->kernel_c, \"#include \\\"ihc_apint.h\\\"\\n\");\n  //fprintf(info->kernel_c, \"#pragma OPENCL EXTENSION cl_intel_channels : enable\\n\\n\");\n\n  strcpy(name + len, \"_top_gen.cpp\");\n  strcpy(dir + len_dir, name);\n  info->top_gen_c = fopen(dir, \"w\");\n  if (!info->top_gen_c)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  strcpy(name + len, \"_top_gen.h\");\n  strcpy(dir + len_dir, name);\n  info->top_gen_h = fopen(dir, \"w\");\n  if (!info->top_gen_h)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  fprintf(info->top_gen_c, \"#include <isl/printer.h>\\n\");\n  fprintf(info->top_gen_c, \"#include \\\"%s\\\"\\n\", name);\n\n  free(file_path);\n}\n\n/* Close all output files. 
\n */\nstatic void opencl_close_files(struct hls_info *info)\n{\n  isl_printer *p_str;\n  char *complete;\n  FILE *f;\n\n  fclose(info->kernel_c);\n  fclose(info->kernel_h);\n  fclose(info->host_c);\n  if (!info->hls)\n  {\n    fclose(info->host_h);\n  }\n  fclose(info->top_gen_c);\n  fclose(info->top_gen_h);\n\n  p_str = isl_printer_to_str(info->ctx);\n  p_str = isl_printer_print_str(p_str, info->output_dir);\n  p_str = isl_printer_print_str(p_str, \"/src/completed\");\n  complete = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  f = fopen(complete, \"w\");\n  if (f)\n    fclose(f);\n  free(complete);\n}\n\n/* Extract the data pack factors for each I/O buffer allocated for the current\n * I/O group.\n * Only insert the data pack factor that is not found in the current list\n * \"data_pack_factors\".\n * The list is kept in ascending order.\n */\nstatic int *extract_data_pack_factors(int *data_pack_factors,\n                                      int *n_factor, struct autosa_array_ref_group *group)\n{\n  for (int i = 0; i < group->n_io_buffer; i++)\n  {\n    struct autosa_io_buffer *buf = group->io_buffers[i];\n    bool insert = true;\n    int pos = 0;\n    for (pos = 0; pos < *n_factor; pos++)\n    {\n      if (buf->n_lane > data_pack_factors[pos])\n      {\n        if (pos < *n_factor - 1)\n        {\n          if (buf->n_lane < data_pack_factors[pos + 1])\n          {\n            // insert @pos+1\n            pos++;\n            break;\n          }\n        }\n      }\n      else if (buf->n_lane == data_pack_factors[pos])\n      {\n        insert = false;\n        break;\n      }\n      else\n      {\n        // buf->n_lane < data_pack_factors[pos]: insert @pos\n        break;\n      }\n    }\n\n    if (!insert)\n      continue;\n\n    *n_factor = *n_factor + 1;\n    data_pack_factors = (int *)realloc(data_pack_factors,\n                                       sizeof(int) * (*n_factor));\n    for (int j = *n_factor - 1; j > pos; j--)\n    {\n      data_pack_factors[j] = data_pack_factors[j - 1];\n    }\n    data_pack_factors[pos] = buf->n_lane;\n  }\n\n  return data_pack_factors;\n}\n\n/* Examine the I/O buffers of the I/O groups and the drain group of each\n * local array. Extract their data pack factors and build the data types\n * required by the program.\n * For Intel devices, each packed type wraps a vector type (e.g., float4)\n * in a struct.\n */\nstatic isl_stat print_data_types_intel(\n    struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  isl_printer *p;\n  struct autosa_kernel *kernel;\n\n  kernel = top->kernel;\n  p = isl_printer_to_file(kernel->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_str_new_line(p, \"/* Data Type */\");\n\n  /* Print the primitive data type. */\n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"typedef \");\n    p = isl_printer_print_str(p, local->array->type);\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, local->array->name);\n    p = isl_printer_print_str(p, \"_t1;\");\n    p = isl_printer_end_line(p);\n  }\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    int *data_pack_factors = (int *)malloc(sizeof(int));\n    int n_factor = 1;\n    /* First insert the default data pack factor for the array. 
*/\n    data_pack_factors[0] = local->n_lane;\n\n    /* IO group */\n    for (int n = 0; n < local->n_io_group; n++)\n    {\n      struct autosa_array_ref_group *group = local->io_groups[n];\n      data_pack_factors = extract_data_pack_factors(data_pack_factors, &n_factor, group);\n    }\n    /* Drain group */\n    if (local->drain_group)\n      data_pack_factors = extract_data_pack_factors(data_pack_factors, &n_factor, local->drain_group);\n\n    for (int n = 0; n < n_factor; n++)\n    {\n      if (data_pack_factors[n] != 1)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"struct \");\n        p = isl_printer_print_str(p, local->array->name);\n        p = isl_printer_print_str(p, \"_t\");\n        p = isl_printer_print_int(p, data_pack_factors[n]);\n        p = isl_printer_print_str(p, \"_t {\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, 2);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, local->array->type);\n        p = isl_printer_print_int(p, data_pack_factors[n]);\n        p = isl_printer_print_str(p, \" data;\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"};\");\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"typedef struct \");\n        p = isl_printer_print_str(p, local->array->name);\n        p = isl_printer_print_str(p, \"_t\");\n        p = isl_printer_print_int(p, data_pack_factors[n]);\n        p = isl_printer_print_str(p, \"_t \");\n        p = isl_printer_print_str(p, local->array->name);\n        p = isl_printer_print_str(p, \"_t\");\n        p = isl_printer_print_int(p, data_pack_factors[n]);\n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n      }\n    }\n    free(data_pack_factors);\n  }\n  p = print_str_new_line(p, \"/* Data Type */\");\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  
return isl_stat_ok;\n}\n\n/* Print the arguments to a drain merge function declaration or call.\n * If \"types\" is set, then print a declaration (including the types of the arguments).\n *\n * The arguments are printed in the following order:\n * - the module identifiers\n * - the parameters\n * - the host loop iterators\n * - the arrays accessed by the module\n */\n//static __isl_give isl_printer *print_drain_merge_arguments_intel(\n//    __isl_take isl_printer *p,\n//    struct autosa_kernel *kernel,\n//    struct autosa_array_ref_group *group,\n//    struct autosa_drain_merge_func *func,\n//    int types,\n//    int hls)\n//{\n//  int first = 1;\n//  int nparam;\n//  int n;\n//  isl_space *space;\n//  const char *type;\n//  struct autosa_local_array_info *local_array;\n//\n//  type = isl_options_get_ast_iterator_type(kernel->ctx);\n//  /* module identifiers */\n//  const char *dims[] = {\"idx\", \"idy\", \"idz\"};\n//  n = isl_id_list_n_id(func->inst_ids);\n//  for (int i = 0; i < n; ++i)\n//  {\n//    if (!first)\n//      p = isl_printer_print_str(p, \", \");\n//    if (types)\n//    {\n//      p = isl_printer_print_str(p, type);\n//      p = isl_printer_print_str(p, \" \");\n//    }\n//    p = isl_printer_print_str(p, dims[i]);\n//\n//    first = 0;\n//  }\n//\n//  /* params */\n//  space = isl_union_set_get_space(kernel->arrays);\n//  nparam = isl_space_dim(space, isl_dim_param);\n//  for (int i = 0; i < nparam; ++i)\n//  {\n//    const char *name;\n//\n//    name = isl_space_get_dim_name(space, isl_dim_param, i);\n//\n//    if (!first)\n//      p = isl_printer_print_str(p, \", \");\n//    if (types)\n//      p = isl_printer_print_str(p, \"int \");\n//    p = isl_printer_print_str(p, name);\n//\n//    first = 0;\n//  }\n//  isl_space_free(space);\n//\n//  /* Host iters */\n//  n = isl_space_dim(kernel->space, isl_dim_set);\n//  for (int i = 0; i < n; ++i)\n//  {\n//    const char *name;\n//\n//    if (!first)\n//      p = isl_printer_print_str(p, \", 
\");\n//    name = isl_space_get_dim_name(kernel->space, isl_dim_set, i);\n//    if (types)\n//    {\n//      p = isl_printer_print_str(p, type);\n//      p = isl_printer_print_str(p, \" \");\n//    }\n//    p = isl_printer_print_str(p, name);\n//\n//    first = 0;\n//  }\n//\n//  /* Arrays */\n//  local_array = group->local_array;\n//  if (!first)\n//    p = isl_printer_print_str(p, \", \");\n//  if (types)\n//  {\n//    if (hls)\n//    {\n//      p = isl_printer_print_str(p, local_array->array->type);\n//      p = isl_printer_print_str(p, \" *\");\n//    }\n//    else\n//    {\n//      p = isl_printer_print_str(p, \"std::vector<\");\n//      p = isl_printer_print_str(p, local_array->array->type);\n//      p = isl_printer_print_str(p, \", aligned_allocator<\");\n//      p = isl_printer_print_str(p, local_array->array->type);\n//      p = isl_printer_print_str(p, \">> &\");\n//    }\n//    p = isl_printer_print_str(p, local_array->array->name);\n//    p = isl_printer_print_str(p, \"_to\");\n//  }\n//  else\n//  {\n//    p = isl_printer_print_str(p, \"dev_\");\n//    p = isl_printer_print_str(p, local_array->array->name);\n//    p = isl_printer_print_str(p, \"[0]\");\n//  }\n//  first = 0;\n//\n//  if (!first)\n//    p = isl_printer_print_str(p, \", \");\n//  if (types)\n//  {\n//    if (hls)\n//    {\n//      p = isl_printer_print_str(p, local_array->array->type);\n//      p = isl_printer_print_str(p, \" *\");\n//    }\n//    else\n//    {\n//      p = isl_printer_print_str(p, \"std::vector<\");\n//      p = isl_printer_print_str(p, local_array->array->type);\n//      p = isl_printer_print_str(p, \", aligned_allocator<\");\n//      p = isl_printer_print_str(p, local_array->array->type);\n//      p = isl_printer_print_str(p, \">> &\");\n//    }\n//    p = isl_printer_print_str(p, local_array->array->name);\n//    p = isl_printer_print_str(p, \"_from\");\n//  }\n//  else\n//  {\n//    p = isl_printer_print_str(p, \"dev_\");\n//    p = isl_printer_print_str(p, 
local_array->array->name);\n//    p = isl_printer_print_str(p, \"[idx]\");\n//  }\n//  first = 0;\n//\n//  return p;\n//}\n\nstatic __isl_give isl_printer *print_for_with_coalesce(__isl_keep isl_ast_node *node,\n                                                       __isl_take isl_printer *p,\n                                                       __isl_take isl_ast_print_options *print_options,\n                                                       int n_coalesce_loop)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"#pragma loop_coalesce\");\n  if (n_coalesce_loop > 0) {\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_int(p, n_coalesce_loop);\n  }\n  p = isl_printer_end_line(p);\n\n  p = isl_ast_node_for_print(node, p, print_options);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_for_infinitize(__isl_keep isl_ast_node *node,\n                                                    __isl_take isl_printer *p,\n                                                    __isl_take isl_ast_print_options *print_options,\n                                                    int is_first)\n{\n  isl_ast_node *body;\n\n  if (is_first) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"while (1) {\");    \n    p = isl_printer_end_line(p);    \n    p = isl_printer_indent(p, 2);\n  }\n\n  body = isl_ast_node_for_get_body(node);\n  p = isl_ast_node_print(body, p, print_options);\n  isl_ast_node_free(body);\n\n  if (is_first) {    \n    p = isl_printer_indent(p, -2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"}\");\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}                                                  \n\nstatic __isl_give isl_printer *print_module_for(__isl_take isl_printer *p,\n                                                __isl_take isl_ast_print_options *print_options,\n                                                __isl_keep isl_ast_node *node, void 
*user)\n{\n  isl_id *id;\n  int outermost_for;\n  int infinitize, is_first_infinitize;\n  int n_coalesce_loop;\n  int is_dep_free;\n\n  outermost_for = 0;\n  infinitize = 0;\n  is_first_infinitize = 0;\n  n_coalesce_loop = 0;\n  is_dep_free = 0;\n  id = isl_ast_node_get_annotation(node);\n  if (id)\n  {\n    struct autosa_ast_node_userinfo *info;\n    info = (struct autosa_ast_node_userinfo *)isl_id_get_user(id);\n    if (info)\n    {\n      if (info->is_outermost_for)\n        outermost_for = 1;\n      if (info->is_infinitize_legal) {\n        infinitize = 1;\n        is_first_infinitize = info->is_first_infinitizable_loop;\n      }\n      n_coalesce_loop = info->n_coalesce_loop;\n      is_dep_free = info->is_dep_free;\n    }\n  }\n\n  if (infinitize)\n    p = print_for_infinitize(node, p, print_options, is_first_infinitize);\n  else if (outermost_for || n_coalesce_loop > 1) {\n    if (is_dep_free == 1) {\n      p = print_str_new_line(p, \"#pragma ivdep\");\n    }\n    p = print_for_with_coalesce(node, p, print_options, n_coalesce_loop);\n  } else {\n    p = isl_ast_node_for_print(node, p, print_options);\n  }\n\n  isl_id_free(id);\n\n  return p;\n}\n\n//static __isl_give isl_printer *print_module_stmt(__isl_take isl_printer *p,\n//                                                 __isl_take isl_ast_print_options *print_options,\n//                                                 __isl_keep isl_ast_node *node, void *user)\n//{\n//  isl_id *id;\n//  struct autosa_kernel_stmt *stmt;\n//  struct print_hw_module_data *hw_data = (struct print_hw_module_data *)(user);\n//  struct autosa_hw_module *module = hw_data->module;\n//\n//  id = isl_ast_node_get_annotation(node);\n//  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n//  isl_id_free(id);\n//\n//  isl_ast_print_options_free(print_options);\n//\n//  switch (stmt->type)\n//  {\n//    //    case POLYSA_KERNEL_STMT_COPY:\n//    //      return autosa_kernel_print_copy(p, stmt);\n//    //    case POLYSA_KERNEL_STMT_SYNC:\n//    //      return print_sync(p, stmt);\n//  case 
AUTOSA_KERNEL_STMT_DOMAIN:\n//    return autosa_kernel_print_domain(p, stmt);\n//  case AUTOSA_KERNEL_STMT_IO:\n//    return autosa_kernel_print_io(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_TRANSFER:\n//    return autosa_kernel_print_io_transfer(p, stmt, hw_data->hls, \n//              module->options->autosa->double_buffer_style == 0?\n//                hw_data->iterator_prefix : NULL);\n//  case AUTOSA_KERNEL_STMT_IO_DRAM:\n//    return autosa_kernel_print_io_dram(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_TRANS:\n//    return autosa_kernel_print_inter_trans(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_TRANS:\n//    return autosa_kernel_print_intra_trans(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_INTRA:\n//    return autosa_kernel_print_inter_intra(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_INTER:\n//    return autosa_kernel_print_intra_inter(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_STATE_HANDLE:\n//    return autosa_kernel_print_state_handle(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_DRAIN_MERGE:\n//    return autosa_kernel_print_drain_merge(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_HOST_SERIALIZE:\n//    return autosa_kernel_print_host_serialize(p, stmt, hw_data->hls);\n//  }\n//\n//  return p;\n//}\n\n/* Print the host serialization functions.\n */\n//static isl_stat print_host_serialize_funcs(\n//    struct autosa_kernel *kernel,\n//    struct autosa_hw_module **modules,\n//    int n_modules, struct hls_info *hls)\n//{\n//  isl_printer *p;\n//  isl_ctx *ctx;\n//\n//  ctx = kernel->ctx;\n//  if (!hls->hls)\n//    p = isl_printer_to_file(ctx, hls->host_h);\n//  else\n//    p = isl_printer_to_file(ctx, hls->kernel_h);\n//  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n//  for (int i = 0; i < n_modules; i++) {\n//    struct autosa_hw_module *module = modules[i];\n//    
isl_ast_print_options *print_options;\n//    struct print_hw_module_data hw_data = {hls, NULL, NULL, NULL};\n//\n//    if (module->serialize_tree) {\n//      p = print_str_new_line(p, \"/* Helper Function */\");\n//      p = isl_printer_start_line(p);\n//      if (hls->hls)\n//        p = isl_printer_print_str(p, \"inline \");\n//      p = isl_printer_print_str(p, \"void \");\n//      if (module->in) {\n//        p = isl_printer_print_str(p, \"host_serialize_\");\n//      } else {\n//        p = isl_printer_print_str(p, \"host_deserialize_\");\n//      }      \n//      p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n//      p = isl_printer_print_str(p, \"(\");      \n//      p = print_host_serialize_arguments(p, kernel, module->io_groups[0], module, 1, hls->hls);\n//      p = isl_printer_print_str(p, \"){\");\n//      p = isl_printer_end_line(p);\n//      p = isl_printer_indent(p, 2);\n//\n//      p = print_str_new_line(p, \"/* Variable Declaration */\");\n//      p = print_str_new_line(p, \"unsigned int cnt = 0;\");      \n//      p = print_str_new_line(p, \"/* Variable Declaration */\");\n//      p = isl_printer_end_line(p);\n//\n//      print_options = isl_ast_print_options_alloc(ctx);\n//      print_options = isl_ast_print_options_set_print_user(print_options,\n//                                                           &print_module_stmt, &hw_data);\n//      p = isl_ast_node_print(module->serialize_tree, p, print_options);\n//\n//      p = isl_printer_indent(p, -2);\n//      p = print_str_new_line(p, \"}\");\n//      p = print_str_new_line(p, \"/* Helper Function */\");\n//      p = isl_printer_end_line(p);\n//    }    \n//  }\n//  isl_printer_free(p);\n//\n//  return isl_stat_ok;\n//}\n\n/* Print the host code that discovers the Intel FPGA device and creates the\n * OpenCL context and command queues. For each I/O module connected to the\n * external memory, one command queue and one OpenCL kernel are created per\n * memory port of its I/O group.\n */\nstatic __isl_give isl_printer *find_device_intel(__isl_take isl_printer *p,\n                                  
                  struct autosa_hw_top_module *top)\n{\n  int n_cmd_q;\n  int n_kernel;\n  int indent;\n\n  p = print_str_new_line(p, \"// OpenCL host code starts from here\");\n  //p = print_str_new_line(p, \"bool use_emulator = false; // control whether the emulator should be used.\");\n  p = print_str_new_line(p, \"if (argc != 2) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"std::cout << \\\"Usage: \\\" << argv[0] << \\\" <path/to/bitstream.aocx>\\\" << std::endl;\");\n  p = print_str_new_line(p, \"return -1;\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = print_str_new_line(p, \"cl_int status;\");\n  p = print_str_new_line(p, \"cl_platform_id platform = NULL;\");\n  p = print_str_new_line(p, \"cl_device_id *devices = NULL;\");\n  p = print_str_new_line(p, \"cl_context context = NULL;\");\n  p = print_str_new_line(p, \"cl_program program = NULL;\");\n  p = print_str_new_line(p, \"std::string binary_file = argv[1];\");\n\n  int q_id = 0;\n  for (int i = 0; i < top->n_hw_modules; i++)\n  {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->type == PE_MODULE || module->to_mem == 0)\n      continue;\n    struct autosa_array_ref_group *group = module->io_groups[0];\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int ID_\");\n    p = isl_printer_print_str(p, module->name);\n    p = isl_printer_print_str(p, \"_base = \");\n    p = isl_printer_print_int(p, q_id);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n    q_id += group->n_mem_ports;\n  }\n\n  n_cmd_q = q_id;\n  n_kernel = n_cmd_q;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"int NUM_QUEUES_TO_CREATE = \");\n  p = isl_printer_print_int(p, n_cmd_q);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"int NUM_KERNELS_TO_CREATE = \");\n  p = isl_printer_print_int(p, 
n_kernel);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"cl_kernel kernel[NUM_KERNELS_TO_CREATE];\");\n  p = print_str_new_line(p, \"cl_command_queue cmdQueue[NUM_QUEUES_TO_CREATE];\");\n\n  p = isl_printer_end_line(p);\n//  p = print_str_new_line(p, \"// Parse command line arguments\");\n//  p = print_str_new_line(p, \"Options options(argc, argv);\");\n//  p = print_str_new_line(p, \"if (options.has(\\\"emulator\\\")) {\");\n//  p = isl_printer_indent(p, 2);\n//  p = print_str_new_line(p, \"use_emulator = options.get<bool>(\\\"emulator\\\");\");\n//  p = isl_printer_indent(p, -2);\n//  p = print_str_new_line(p, \"}\");\n  p = print_str_new_line(p, \"if (!setCwdToExeDir()) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"return -1;\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"// Get the OpenCL platform\");\n  //p = print_str_new_line(p, \"if (use_emulator) {\");\n  //p = isl_printer_indent(p, 2);\n  //p = print_str_new_line(p, \"platform = findPlatform(\\\"Intel(R) FPGA Emulation Platform for OpenCL(TM)\\\");\");\n  //p = isl_printer_indent(p, -2);\n  //p = print_str_new_line(p, \"} else {\");\n  //p = isl_printer_indent(p, 2);\n  //p = print_str_new_line(p, \"platform = findPlatform(\\\"Intel(R) FPGA SDK for OpenCL(TM)\\\");\");\n  //p = isl_printer_indent(p, -2);\n  //p = print_str_new_line(p, \"}\");\n  p = print_str_new_line(p, \"platform = findPlatform(\\\"Intel\\\");\");\n  p = print_str_new_line(p, \"if (platform == NULL) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"printf(\\\"ERROR: Unable to find Intel(R) FPGA OpenCL platform\\\\n\\\");\");\n  p = print_str_new_line(p, \"return -1;\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"// Discover and initialize the 
devices\");\n  p = print_str_new_line(p, \"cl_uint numDevices = 0;\");\n  p = print_str_new_line(p, \"char buffer[4096];\");\n  p = print_str_new_line(p, \"unsigned int buf_uint;\");\n  p = print_str_new_line(p, \"int device_found = 0;\");\n  p = print_str_new_line(p, \"status = clGetDeviceIDs(platform,\");\n  p = isl_printer_indent(p, strlen(\"status = clGetDeviceIDs(\"));\n  p = print_str_new_line(p, \"CL_DEVICE_TYPE_ALL,\");\n  p = print_str_new_line(p, \"0,\");\n  p = print_str_new_line(p, \"NULL,\");\n  p = print_str_new_line(p, \"&numDevices);\");\n  indent = strlen(\"status = clGetDeviceIDs(\");\n  p = isl_printer_indent(p, -indent);\n  p = print_str_new_line(p, \"if (status == CL_SUCCESS) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"clGetPlatformInfo(platform,\");\n  p = isl_printer_indent(p, strlen(\"clGetPlatformInfo(\"));\n  p = print_str_new_line(p, \"CL_PLATFORM_VENDOR,\");\n  p = print_str_new_line(p, \"4096,\");\n  p = print_str_new_line(p, \"buffer,\");\n  p = print_str_new_line(p, \"NULL);\");\n  indent = strlen(\"clGetPlatformInfo(\");\n  p = isl_printer_indent(p, -indent);\n  p = print_str_new_line(p, \"if (strstr(buffer, \\\"Intel(R)\\\") != NULL) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"device_found = 1;\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = print_str_new_line(p, \"if (device_found) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"devices = (cl_device_id*) acl_aligned_malloc(numDevices * sizeof(cl_device_id));\");\n  p = print_str_new_line(p, \"status = clGetDeviceIDs(platform,\");\n  p = isl_printer_indent(p, strlen(\"status = clGetDeviceIDs(\"));\n  p = print_str_new_line(p, \"CL_DEVICE_TYPE_ALL,\");\n  p = print_str_new_line(p, \"numDevices,\");\n  p = print_str_new_line(p, \"devices,\");\n  p = print_str_new_line(p, \"NULL);\");\n  indent = strlen(\"status = clGetDeviceIDs(\");\n  p = isl_printer_indent(p, -indent);\n  p = 
isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = print_str_new_line(p, \"if (!device_found) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"printf(\\\"failed to find an OpenCL device\\\\n\\\");\");\n  p = print_str_new_line(p, \"exit(1);\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = print_str_new_line(p, \"for (int i = 0; i < numDevices; i++) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"clGetDeviceInfo(devices[i],\");\n  indent = strlen(\"clGetDeviceInfo(\");\n  p = isl_printer_indent(p, indent);\n  p = print_str_new_line(p, \"CL_DEVICE_NAME,\");\n  p = print_str_new_line(p, \"4096,\");\n  p = print_str_new_line(p, \"buffer,\");\n  p = print_str_new_line(p, \"NULL);\");\n  p = isl_printer_indent(p, -indent);\n  p = print_str_new_line(p, \"fprintf(stdout, \\\"\\\\nDevice Name: %s\\\\n\\\", buffer);\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"clGetDeviceInfo(devices[i],\");\n  indent = strlen(\"clGetDeviceInfo(\");\n  p = isl_printer_indent(p, indent);\n  p = print_str_new_line(p, \"CL_DEVICE_VENDOR,\");\n  p = print_str_new_line(p, \"4096,\");\n  p = print_str_new_line(p, \"buffer,\");\n  p = print_str_new_line(p, \"NULL);\");\n  p = isl_printer_indent(p, -indent);\n  p = print_str_new_line(p, \"fprintf(stdout, \\\"Device Vendor: %s\\\\n\\\", buffer);\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"clGetDeviceInfo(devices[i],\");\n  indent = strlen(\"clGetDeviceInfo(\");\n  p = isl_printer_indent(p, indent);\n  p = print_str_new_line(p, \"CL_DEVICE_MAX_COMPUTE_UNITS,\");\n  p = print_str_new_line(p, \"sizeof(buf_uint),\");\n  p = print_str_new_line(p, \"&buf_uint,\");\n  p = print_str_new_line(p, \"NULL);\");\n  p = isl_printer_indent(p, -indent);\n  p = print_str_new_line(p, \"fprintf(stdout, \\\"Device Compute Units: %u\\\\n\\\", 
buf_uint);\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"clGetDeviceInfo(devices[i],\");\n  indent = strlen(\"clGetDeviceInfo(\");\n  p = isl_printer_indent(p, indent);\n  p = print_str_new_line(p, \"CL_DEVICE_GLOBAL_MEM_SIZE,\");\n  p = print_str_new_line(p, \"sizeof(unsigned long),\");\n  p = print_str_new_line(p, \"&buffer,\");\n  p = print_str_new_line(p, \"NULL);\");\n  p = isl_printer_indent(p, -indent);\n  p = print_str_new_line(p, \"fprintf(stdout, \\\"Global Memory Size: %lu\\\\n\\\", *((unsigned long*)buffer));\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"clGetDeviceInfo(devices[i],\");\n  indent = strlen(\"clGetDeviceInfo(\");\n  p = isl_printer_indent(p, indent);\n  p = print_str_new_line(p, \"CL_DEVICE_MAX_MEM_ALLOC_SIZE,\");\n  p = print_str_new_line(p, \"sizeof(unsigned long),\");\n  p = print_str_new_line(p, \"&buffer,\");\n  p = print_str_new_line(p, \"NULL);\");\n  p = isl_printer_indent(p, -indent);\n  p = print_str_new_line(p, \"fprintf(stdout, \\\"Global Memory Allocation Size: %lu\\\\n\\\\n\\\", *((unsigned long*)buffer));\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n\n  /* Context */\n  p = print_str_new_line(p, \"// Create a context\");\n  p = print_str_new_line(p, \"context = clCreateContext(NULL,\");\n  indent = strlen(\"context = clCreateContext(\");\n  p = isl_printer_indent(p, indent);\n  p = print_str_new_line(p, \"1,\");\n  p = print_str_new_line(p, \"devices,\");\n  p = print_str_new_line(p, \"NULL,\");\n  p = print_str_new_line(p, \"NULL,\");\n  p = print_str_new_line(p, \"&status); CHECK(status);\");\n  p = isl_printer_indent(p, -indent);\n  p = isl_printer_end_line(p);\n\n  /* Command Queue */\n  p = print_str_new_line(p, \"// Create command queues\");\n  p = print_str_new_line(p, \"for (int i = 0; i < NUM_QUEUES_TO_CREATE; i++) {\");\n  p = isl_printer_indent(p, 2);\n  p = 
print_str_new_line(p, \"cmdQueue[i] = clCreateCommandQueue(context,\");\n  indent = strlen(\"cmdQueue[i] = clCreateCommandQueue(\");\n  p = isl_printer_indent(p, indent);\n  p = print_str_new_line(p, \"devices[0],\");\n  p = print_str_new_line(p, \"CL_QUEUE_PROFILING_ENABLE,\");\n  p = print_str_new_line(p, \"&status); CHECK(status);\");\n  p = isl_printer_indent(p, -indent);\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n\n  /* Create the program from binaries */\n  p = print_str_new_line(p, \"// Create the program from binaries\");\n  p = print_str_new_line(p, \"size_t binary_length;\");\n  p = print_str_new_line(p, \"const unsigned char *binary;\");\n  p = print_str_new_line(p, \"printf(\\\"\\\\nAOCX file: %s\\\\n\\\\n\\\", binary_file.c_str());\");\n  p = print_str_new_line(p, \"FILE *fp = fopen(binary_file.c_str(), \\\"rb\\\");\");\n  p = print_str_new_line(p, \"if (fp == NULL) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"printf(\\\"Failed to open the AOCX file (fopen).\\\\n\\\");\");\n  p = print_str_new_line(p, \"return -1;\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = print_str_new_line(p, \"fseek(fp, 0, SEEK_END);\");\n  p = print_str_new_line(p, \"long ftell_sz = ftell(fp);\");\n  p = print_str_new_line(p, \"if (ftell_sz < 0) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"printf(\\\"ftell returns a negative value.\\\\n\\\");\");\n  p = print_str_new_line(p, \"fclose(fp);\");\n  p = print_str_new_line(p, \"return -1;\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"} else {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"binary_length = ftell_sz;\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = print_str_new_line(p, \"binary = (unsigned char *)malloc(sizeof(unsigned char) * binary_length);\");\n  p = print_str_new_line(p, \"rewind(fp);\");\n  p = 
print_str_new_line(p, \"size_t fread_sz = fread((void *)binary, binary_length, 1, fp);\");\n  p = print_str_new_line(p, \"if (fread_sz == 0) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"printf(\\\"Failed to read from the AOCX file (fread).\\\\n\\\");\");\n  p = print_str_new_line(p, \"fclose(fp);\");\n  p = print_str_new_line(p, \"free(const_cast<unsigned char *>(binary));\");\n  p = print_str_new_line(p, \"return -1;\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = print_str_new_line(p, \"fclose(fp);\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"program = clCreateProgramWithBinary(context,\");\n  indent = strlen(\"program = clCreateProgramWithBinary(\");\n  p = isl_printer_indent(p, indent);\n  p = print_str_new_line(p, \"1,\");\n  p = print_str_new_line(p, \"devices,\");\n  p = print_str_new_line(p, \"&binary_length,\");\n  p = print_str_new_line(p, \"(const unsigned char **)&binary,\");\n  p = print_str_new_line(p, \"&status,\");\n  p = print_str_new_line(p, \"NULL); CHECK(status);\");\n  p = isl_printer_indent(p, -indent);\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"status = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);\");\n  p = print_str_new_line(p, \"if (status != CL_SUCCESS) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"char log[10000] = {0};\");\n  p = print_str_new_line(p, \"clGetProgramBuildInfo(program, devices[0], CL_PROGRAM_BUILD_LOG, 10000, log, NULL);\");\n  p = print_str_new_line(p, \"printf(\\\"%s\\\\n\\\", log);\");\n  p = print_str_new_line(p, \"CHECK(status);\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n\n  /* Create the kernel */\n  p = print_str_new_line(p, \"// Create the kernel\");\n  int k_id = 0;\n  for (int i = 0; i < top->n_hw_modules; i++)\n  {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->type == PE_MODULE || 
module->to_mem == 0)\n      continue;\n    struct autosa_array_ref_group *group = module->io_groups[0];\n\n    for (int j = 0; j < group->n_mem_ports; j++)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"kernel[\");\n      p = isl_printer_print_str(p, \"ID_\");\n      p = isl_printer_print_str(p, module->name);\n      p = isl_printer_print_str(p, \"_base\");\n      if (group->n_mem_ports > 1)\n      {\n        p = isl_printer_print_str(p, \" + \");\n        p = isl_printer_print_int(p, j);\n      }\n      p = isl_printer_print_str(p, \"] = clCreateKernel(program, \\\"\");\n      p = isl_printer_print_str(p, module->name);\n      if (module->boundary && !module->device_tree)\n        p = isl_printer_print_str(p, \"_boundary\");\n      if (module->is_serialized) \n        p = isl_printer_print_str(p, \"_serialize\");\n      if (group->n_mem_ports > 1)\n      {\n        p = isl_printer_print_str(p, \"_\");\n        p = isl_printer_print_int(p, j);\n      }\n      p = isl_printer_print_str(p, \"\\\", &status);\");\n      p = isl_printer_end_line(p);\n      p = print_str_new_line(p, \"CHECK(status);\");\n      k_id++;\n    }\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *declare_and_allocate_device_arrays_intel(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_kernel *kernel, struct autosa_hw_top_module *top)\n{\n  int indent;\n  p = print_str_new_line(p, \"// Allocate memory in host memory\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    if (local_array->n_mem_ports > 1 && local_array->array->copy_out)\n    {\n      /* Create multiple host buffers. 
*/\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::vector<std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">>> \");\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize) {\n        p = isl_printer_print_str(p, \"_unserialized\");\n      }\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">> \");\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"_tmp\");\n      p = isl_printer_print_str(p, \"(\");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \");\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \".push_back(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"_tmp);\");\n      p = isl_printer_end_line(p);\n\n      p = 
isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n\n      if (local_array->host_serialize) {\n        /* Allocate additional serialize buffer. */\n        /* Create multiple host buffers. */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::vector<std::vector<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \", aligned_allocator<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \">>> \");\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);      \n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        p = isl_printer_print_int(p, local_array->n_mem_ports);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::vector<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \", aligned_allocator<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \">> \");\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_tmp\");\n        p = isl_printer_print_str(p, \"(\");\n        // p = autosa_array_info_print_data_size(p, local_array->array); // TODO\n        //p = isl_printer_print_ast_expr(p, local_array->serialize_bound_expr);\n        p = isl_printer_print_pw_qpolynomial(p, local_array->serialize_bound);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        
p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \".push_back(dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_tmp);\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n      }\n    }\n    else\n    {\n      /* Create a single host buffer. */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">> \");\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \"(\");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \");\");\n      p = isl_printer_end_line(p);\n\n      if (local_array->host_serialize) {\n        /* Create a single host buffer. 
*/\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::vector<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \", aligned_allocator<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \">> \");\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);      \n        p = isl_printer_print_str(p, \"(\");\n        //p = autosa_array_info_print_data_size(p, local_array->array);\n        //p = isl_printer_print_ast_expr(p, local_array->serialize_bound_expr);\n        p = isl_printer_print_pw_qpolynomial(p, local_array->serialize_bound);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n  p = isl_printer_end_line(p);\n\n  /* Initialize buffer. */\n  p = print_str_new_line(p, \"// Initialize host buffers\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    if (local_array->n_mem_ports > 1 && local_array->array->copy_out)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::copy(reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"), reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" 
*>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \") + \");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \", dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \"[i]\");    \n      p = isl_printer_print_str(p, \".begin());\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n    }\n    else\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::copy(reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"), reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \") + \");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \", dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \".begin());\");\n      p = isl_printer_end_line(p);\n    }\n  }  \n\n  /* Perform data serialization if needed. 
*/\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->serialize_tree && module->in) {\n      struct autosa_array_ref_group *group = module->io_groups[0];\n      struct autosa_local_array_info *local_array = group->local_array;\n      if (local_array->n_mem_ports > 1 && local_array->array->copy_out)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        p = isl_printer_print_int(p, local_array->n_mem_ports);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n  \n        p = isl_printer_start_line(p);        \n        p = isl_printer_print_str(p, module->in? \"host_serialize_\" : \"host_deserialize_\");\n        p = isl_printer_print_str(p, local_array->array->name);            \n        p = isl_printer_print_str(p, \"(\");\n        p = print_host_serialize_arguments(p, kernel, group, module, 0, 0);  // TODO: add hbm support later.\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n  \n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n      } else \n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, module->in? 
\"host_serialize_\" : \"host_deserialize_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"(\");\n        p = print_host_serialize_arguments(p, kernel, group, module, 0, 0);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"// Allocate buffers in device memory\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"std::vector<cl_mem> buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  }\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    int indent1, indent2;\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    //for (int j = 0; j < local_array->n_mem_ports; j++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, local_array->n_mem_ports);\n    p = isl_printer_print_str(p, \"; i++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"cl_mem buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_tmp = clCreateBuffer\");\n    p = isl_printer_print_str(p, \"(context,\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, strlen(\"cl_mem buffer_\") +\n                                  strlen(local_array->array->name) + strlen(\"_tmp\") + strlen(\" = clCreateBuffer(\"));\n    p = isl_printer_start_line(p);\n    
if (local_array->array->copy_in && local_array->array->copy_out)\n    {\n      p = isl_printer_print_str(p, \"CL_MEM_READ_WRITE\");\n    }\n    else\n    {\n      if (local_array->array->copy_in)\n        p = isl_printer_print_str(p, \"CL_MEM_READ_ONLY\");\n      else if (local_array->array->copy_out)\n        p = isl_printer_print_str(p, \"CL_MEM_WRITE_ONLY\");\n    }\n    p = isl_printer_print_str(p, \",\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    if (local_array->host_serialize) {\n      p = autosa_array_info_print_serialize_size(p, local_array->array);      \n    } else {\n      p = autosa_array_info_print_size(p, local_array->array);\n    }\n    p = isl_printer_print_str(p, \",\");\n    p = isl_printer_end_line(p);\n    p = print_str_new_line(p, \"NULL,\");\n    //p = isl_printer_start_line(p);\n    //p = isl_printer_print_str(p, \"dev_\");\n    //p = isl_printer_print_str(p, local_array->array->name);\n    //if (local_array->n_mem_ports > 1 && local_array->array->copy_out) {\n    //  p = isl_printer_print_str(p, \"[i]\");\n    //}\n    //p = isl_printer_print_str(p, \".data(),\");\n    //p = isl_printer_end_line(p);\n    p = print_str_new_line(p, \"&status); CHECK(status);\");\n    p = isl_printer_indent(p, -(strlen(\"cl_mem buffer_\") +\n                                strlen(local_array->array->name) + strlen(\"_tmp\") + strlen(\" = clCreateBuffer(\")));\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \".push_back(std::move(buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_tmp));\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n  p = isl_printer_end_line(p);\n\n  /* Insert profiling information. 
*/\n  p = print_str_new_line(p, \"auto host_begin = std::chrono::high_resolution_clock::now();\");\n  p = print_str_new_line(p, \"auto fpga_begin = std::chrono::high_resolution_clock::now();\");\n  p = print_str_new_line(p, \"auto fpga_end = std::chrono::high_resolution_clock::now();\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print code for initializing the device for execution of the transformed\n * code. This includes declaring locally defined variables as well as\n * declaring and allocating the required copies of arrays on the device.\n */\nstatic __isl_give isl_printer *init_device_intel(__isl_take isl_printer *p,\n                                                 struct autosa_prog *prog, \n                                                 struct autosa_kernel *kernel, \n                                                 int hls,\n                                                 struct autosa_hw_top_module *top)\n{\n  p = autosa_print_local_declarations(p, prog);\n\n  p = find_device_intel(p, top);\n  p = declare_and_allocate_device_arrays_intel(p, prog, kernel, top);\n\n  return p;\n}\n\n/* Print code for clearing the device after execution of the transformed code.\n * In particular, free the memory that was allocated on the device.\n */\nstatic __isl_give isl_printer *clear_device_intel(__isl_take isl_printer *p,\n                                                  struct autosa_prog *prog,\n                                                  int hls,\n                                                  struct autosa_hw_top_module *top)\n{\n  /* Profiling results */\n  p = print_str_new_line(p, \"for (int i = 0; i < NUM_QUEUES_TO_CREATE; i++) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"status = clFinish(cmdQueue[i]); CHECK(status);\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = print_str_new_line(p, \"auto host_end = std::chrono::high_resolution_clock::now();\");\n  p = 
isl_printer_end_line(p);\n  p = print_str_new_line(p, \"// Calculate time\");\n  p = print_str_new_line(p, \"std::chrono::duration<double> fpga_duration = fpga_end - fpga_begin;\");\n  p = print_str_new_line(p, \"std::cout << \\\"FPGA Time: \\\" << fpga_duration.count() << \\\" s\\\" << std::endl;\");\n  p = print_str_new_line(p, \"std::chrono::duration<double> host_duration = host_end - host_begin;\");\n  p = print_str_new_line(p, \"std::cout << \\\"Host Time: \\\" << host_duration.count() << \\\" s\\\" << std::endl;\");\n  p = isl_printer_end_line(p);\n\n  /* Deserialize the buffer data if necessary. */\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->serialize_tree && !module->in) {\n      struct autosa_array_ref_group *group = module->io_groups[0];\n      struct autosa_local_array_info *local_array = group->local_array;\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"host_deserialize_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"(\");      \n      p = print_host_serialize_arguments(p, top->kernel, group, module, 0, 0);  // TODO: add hbm support later.\n      p = isl_printer_print_str(p, \");\");      \n      p = isl_printer_end_line(p);\n    }\n  }\n\n  /* Restore buffer */\n  p = print_str_new_line(p, \"// Restore data from host buffers\");\n  for (int i = 0; i < prog->n_array; i++)\n  {\n    struct autosa_array_info *array = &prog->array[i];\n    if (!autosa_array_requires_device_allocation(array))\n      continue;\n\n    if (array->copy_out)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::copy(dev_\");\n      p = isl_printer_print_str(p, array->name);\n      if (array->local_array->host_serialize) {\n        p = isl_printer_print_str(p, \"_unserialized\");\n      }\n      if (array->local_array->n_mem_ports > 1)\n      {\n        p = isl_printer_print_str(p, 
\"[0]\");\n      }\n      p = isl_printer_print_str(p, \".begin(), dev_\");\n      p = isl_printer_print_str(p, array->name);\n      if (array->local_array->host_serialize) {\n        p = isl_printer_print_str(p, \"_unserialized\");\n      }\n      if (array->local_array->n_mem_ports > 1)\n      {\n        p = isl_printer_print_str(p, \"[0]\");\n      }\n      p = isl_printer_print_str(p, \".end(), reinterpret_cast<\");\n      p = isl_printer_print_str(p, array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, array->name);\n      p = isl_printer_print_str(p, \"));\");\n      p = isl_printer_end_line(p);\n    }\n  }\n  p = isl_printer_end_line(p);\n\n  /* Clean up OpenCL resources */\n  p = print_str_new_line(p, \"// Clean up OpenCL resources\");\n  p = print_str_new_line(p, \"for (int i = 0; i < NUM_KERNELS_TO_CREATE; i++) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"clReleaseKernel(kernel[i]);\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n  p = print_str_new_line(p, \"for (int i = 0; i < NUM_QUEUES_TO_CREATE; i++) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"clReleaseCommandQueue(cmdQueue[i]);\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n    \n  p = print_str_new_line(p, \"#ifndef EMULATE\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"clReleaseProgram(program);\");\n  p = print_str_new_line(p, \"clReleaseContext(context);\");\n  p = print_str_new_line(p, \"acl_aligned_free(devices);\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"#endif\");\n\n  return p;\n}\n\nstatic __isl_give isl_printer *drain_merge_intel(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_drain_merge_func *func,\n    int hls)\n{\n  struct autosa_array_ref_group *group = func->group;\n  p = print_str_new_line(p, 
\"// Merge results\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"for (int idx = \");\n  p = isl_printer_print_int(p, group->mem_port_id);\n  p = isl_printer_print_str(p, \"; idx < \");\n  p = isl_printer_print_int(p, group->mem_port_id + group->n_mem_ports);\n  p = isl_printer_print_str(p, \"; idx++) {\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, 2);\n  p = isl_printer_start_line(p);\n  p = autosa_array_ref_group_print_prefix(group, p);\n  p = isl_printer_print_str(p, \"_drain_merge(\");\n  p = print_drain_merge_arguments(p, func->kernel, group, func, 0, hls);\n  p = isl_printer_print_str(p, \");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print code to \"p\" for copying \"array\" from the host to the device\n * in its entirety.  The bounds on the extent of \"array\" have\n * been precomputed in extract_array_info and are used in\n * autosa_array_info_print_size.\n */\nstatic __isl_give isl_printer *copy_array_to_device_intel(__isl_take isl_printer *p,\n                                                          struct autosa_array_info *array)\n{\n  int indent;\n  struct autosa_local_array_info *local_array = array->local_array;\n\n  p = print_str_new_line(p, \"// Write host data to device buffers\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n  p = isl_printer_print_int(p, local_array->n_mem_ports);\n  p = isl_printer_print_str(p, \"; i++) {\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, 2);\n\n  p = print_str_new_line(p, \"status = clEnqueueWriteBuffer(\");\n  indent = strlen(\"status = clEnqueueWriteBuffer(\");\n  p = isl_printer_indent(p, indent);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"cmdQueue[0],\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, 
\"buffer_\");\n  p = isl_printer_print_str(p, array->name);\n  p = isl_printer_print_str(p, \"[i],\");\n  p = isl_printer_end_line(p);\n  p = print_str_new_line(p, \"CL_TRUE,\");\n  p = print_str_new_line(p, \"0,\");\n  p = isl_printer_start_line(p);\n  if (local_array->host_serialize) {\n    p = autosa_array_info_print_serialize_size(p, array);\n  } else {\n    p = autosa_array_info_print_size(p, array);\n  }\n  p = isl_printer_print_str(p, \",\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"dev_\");\n  p = isl_printer_print_str(p, array->name);\n  if (local_array->n_mem_ports > 1 && array->copy_out)\n  {\n    p = isl_printer_print_str(p, \"[i]\");\n  }\n  p = isl_printer_print_str(p, \".data(),\");\n  p = isl_printer_end_line(p);\n  p = print_str_new_line(p, \"0,\");\n  p = print_str_new_line(p, \"NULL,\");\n  p = print_str_new_line(p, \"NULL); CHECK(status);\");\n  p = isl_printer_indent(p, -indent);\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print code to \"p\" for copying \"array\" back from the device to the host\n * in its entirety.  
The bounds on the extent of \"array\" have\n * been precomputed in extract_array_info and are used in\n * autosa_array_info_print_size.\n */\nstatic __isl_give isl_printer *copy_array_from_device_intel(\n    __isl_take isl_printer *p, struct autosa_array_info *array)\n{\n  struct autosa_local_array_info *local_array;\n  int indent;\n\n  local_array = array->local_array;\n  p = print_str_new_line(p, \"// Read the results back from the device\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n  p = isl_printer_print_int(p, local_array->n_io_group_refs);\n  p = isl_printer_print_str(p, \"; i++) {\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, 2);\n\n  p = print_str_new_line(p, \"status = clEnqueueReadBuffer(\");\n  indent = strlen(\"status = clEnqueueReadBuffer(\");\n  p = isl_printer_indent(p, indent);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"cmdQueue[0],\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"buffer_\");\n  p = isl_printer_print_str(p, array->name);\n  p = isl_printer_print_str(p, \"[i],\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"CL_TRUE,\");\n  p = print_str_new_line(p, \"0,\");\n  p = isl_printer_start_line(p);\n  if (local_array->host_serialize) {\n    p = autosa_array_info_print_serialize_size(p, array);\n  } else {\n    p = autosa_array_info_print_size(p, array);\n  }\n  p = isl_printer_print_str(p, \",\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"dev_\");\n  p = isl_printer_print_str(p, array->name);\n  if (local_array->n_mem_ports > 1 && array->copy_out)\n  {\n    p = isl_printer_print_str(p, \"[i]\");\n  }\n  p = isl_printer_print_str(p, \".data(),\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"0,\");\n  p = print_str_new_line(p, \"NULL,\");\n  p = print_str_new_line(p, \"NULL); CHECK(status);\");\n\n  p = 
isl_printer_indent(p, -indent);\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  return p;\n}\n\n/* Print a statement for copying an array to or from the device,\n * for initializing or clearing the device, or for the drain merge step.\n * The statement identifier of a copying node is called\n * \"to_device_<array name>\" or \"from_device_<array name>\" and\n * its user pointer points to the autosa_array_info of the array\n * that needs to be copied.\n * The node for initializing the device is called \"init_device\" and\n * the node for clearing the device is called \"clear_device\"; the user\n * pointer of either node points to the autosa_kernel.\n * The node for the drain merge step is called \"drain_merge\" and its\n * user pointer points to an autosa_drain_merge_func.\n *\n * Extract the kernel, drain merge function or array from the identifier\n * and call init_device_intel, clear_device_intel, drain_merge_intel,\n * copy_array_to_device_intel or copy_array_from_device_intel.\n */\nstatic __isl_give isl_printer *print_device_node_intel(__isl_take isl_printer *p,\n                                                       __isl_keep isl_ast_node *node, \n                                                       struct autosa_prog *prog, \n                                                       int hls,\n                                                       struct autosa_hw_top_module *top)\n{\n  isl_ast_expr *expr, *arg;\n  isl_id *id;\n  const char *name;\n  struct autosa_array_info *array;\n  struct autosa_kernel *kernel;\n  struct autosa_drain_merge_func *func;\n\n  expr = isl_ast_node_user_get_expr(node);\n  arg = isl_ast_expr_get_op_arg(expr, 0);\n  id = isl_ast_expr_get_id(arg);\n  name = isl_id_get_name(id);\n  if (!strcmp(name, \"init_device\") || !strcmp(name, \"clear_device\"))\n    kernel = (struct autosa_kernel *)isl_id_get_user(id);\n  else if (!strcmp(name, \"drain_merge\"))\n    func = (struct autosa_drain_merge_func *)isl_id_get_user(id);\n  else\n    array = (struct autosa_array_info *)isl_id_get_user(id);\n  isl_id_free(id);\n  isl_ast_expr_free(arg);\n  isl_ast_expr_free(expr);\n\n  if (!name)\n    return isl_printer_free(p);\n  if (!strcmp(name, \"init_device\"))\n    return init_device_intel(p, prog, 
kernel, hls, top);\n  if (!strcmp(name, \"clear_device\"))\n    return clear_device_intel(p, prog, hls, top);\n  if (!strcmp(name, \"drain_merge\"))\n    return drain_merge_intel(p, prog, func, hls);\n  if (!array)\n    return isl_printer_free(p);\n\n  if (!prefixcmp(name, \"to_device\"))\n    return copy_array_to_device_intel(p, array);\n  else\n    return copy_array_from_device_intel(p, array);\n}\n\n/* Print out the statements for setting the OpenCL arguments for the io\n * modules connected to the external memory. \n * - set_ext_module_args_upper\n * - set_ext_module_args_lower\n * \n * This function only works for Intel OpenCL.\n * Originally, for each module, we have the following arguments:\n * - the module identifiers\n * - the parameters\n * - the host loop iterators\n * - the array accessed by the module\n * - the fifos\n * - the enable signal\n * \n * The fifos are skipped here since, in Intel OpenCL designs, they are\n * later replaced with channels.\n */\nstatic __isl_give isl_printer *autosa_kernel_print_set_ext_module_args(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct autosa_prog *prog)\n{\n  int upper = stmt->u.m.upper;\n  int lower = stmt->u.m.lower;\n  int complete = (upper == 0 && lower == 0);\n  int dummy = stmt->u.m.dummy;\n  int boundary = stmt->u.m.boundary;\n  char *module_name = stmt->u.m.module_name;\n  struct autosa_hw_module *module = stmt->u.m.module;\n  int n_arg = 0;\n  struct autosa_kernel *kernel = module->kernel;\n\n  isl_space *space;\n  int nparams;\n  int n;\n  const char *type;\n\n  if (!(complete || upper))\n    return p;\n\n  /* Module identifiers */\n  if (!dummy)\n  {\n    for (int i = 0; i < isl_id_list_n_id(module->inst_ids); i++)\n    {\n      p = print_str_new_line(p, \"status = clSetKernelArg(\");\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"kernel[ID_\");\n      p = isl_printer_print_str(p, module_name);\n      p 
= isl_printer_print_str(p, \"_base\");\n      if (module->io_groups[0]->n_mem_ports > 1)\n      {\n        p = isl_printer_print_str(p, \" + c0\");\n      }\n      p = isl_printer_print_str(p, \"],\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_int(p, n_arg);\n      p = isl_printer_print_str(p, \",\");\n      p = isl_printer_end_line(p);\n      n_arg++;\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"sizeof(unsigned int),\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"(void *)&c\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \");\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"CHECK(status);\");\n    }\n  }\n  else\n  {\n    /* Dummy modules are never instantiated in the host code. */\n  }\n\n  /* Params */\n  space = isl_union_set_get_space(module->kernel->arrays);\n  n = isl_space_dim(space, isl_dim_param);\n  for (int i = 0; i < n; i++)\n  {\n    const char *name = isl_space_get_dim_name(space, isl_dim_param, i);\n    p = print_str_new_line(p, \"status = clSetKernelArg(\");\n    p = isl_printer_indent(p, 2);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"kernel[ID_\");\n    p = isl_printer_print_str(p, module_name);\n    p = isl_printer_print_str(p, \"_base\");\n    if (module->io_groups[0]->n_mem_ports > 1)\n    {\n      p = isl_printer_print_str(p, \" + c0\");\n    }\n    p = isl_printer_print_str(p, \"],\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_int(p, n_arg);\n    p = isl_printer_print_str(p, \",\");\n    p = isl_printer_end_line(p);\n    n_arg++;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"sizeof(unsigned int),\");\n    p = isl_printer_end_line(p);\n\n    
p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"(void *)&\");\n    p = isl_printer_print_str(p, name);\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"CHECK(status);\");\n  }\n  isl_space_free(space);\n\n  /* Host iters */\n  n = isl_space_dim(module->kernel->space, isl_dim_set);\n  for (int i = 0; i < n; i++)\n  {\n    const char *name = isl_space_get_dim_name(module->kernel->space, isl_dim_set, i);\n    p = print_str_new_line(p, \"status = clSetKernelArg(\");\n    p = isl_printer_indent(p, 2);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"kernel[ID_\");\n    p = isl_printer_print_str(p, module_name);\n    p = isl_printer_print_str(p, \"_base\");\n    if (module->io_groups[0]->n_mem_ports > 1)\n    {\n      p = isl_printer_print_str(p, \" + c0\");\n    }\n    p = isl_printer_print_str(p, \"],\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_int(p, n_arg);\n    p = isl_printer_print_str(p, \",\");\n    p = isl_printer_end_line(p);\n    n_arg++;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"sizeof(unsigned int),\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"(void *)&\");\n    p = isl_printer_print_str(p, name);\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"CHECK(status);\");\n  }\n\n  /* Scalars and arrays */\n  if (module->type != PE_MODULE && module->to_mem)\n  {\n    struct autosa_local_array_info *local_array = module->io_groups[0]->local_array;\n    /* IO modules will not contain any scalar inputs. 
*/\n    p = print_str_new_line(p, \"status = clSetKernelArg(\");\n    p = isl_printer_indent(p, 2);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"kernel[ID_\");\n    p = isl_printer_print_str(p, module_name);\n    p = isl_printer_print_str(p, \"_base\");\n    if (module->io_groups[0]->n_mem_ports > 1)\n    {\n      p = isl_printer_print_str(p, \" + c0\");\n    }\n    p = isl_printer_print_str(p, \"],\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_int(p, n_arg);\n    p = isl_printer_print_str(p, \",\");\n    p = isl_printer_end_line(p);\n    n_arg++;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"sizeof(cl_mem),\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"(void *)&buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"[\");\n    if (module->io_groups[0]->n_mem_ports == 1)\n    {\n      p = isl_printer_print_int(p, module->n_array_ref);\n    }\n    else\n    {\n      p = isl_printer_print_str(p, \"c0 + \");\n      p = isl_printer_print_int(p, module->n_array_ref);\n    }\n    p = isl_printer_print_str(p, \"]);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"CHECK(status);\");\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_set_ext_module_args_stmt(\n    __isl_take isl_printer *p,\n    __isl_take isl_ast_print_options *print_options,\n    __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *data = (struct print_hw_module_data *)(user);\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n  case AUTOSA_KERNEL_STMT_EXT_MODULE:\n    return 
autosa_kernel_print_set_ext_module_args(p, stmt, data->prog);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *autosa_kernel_print_launch_ext_module_kernels(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct autosa_prog *prog)\n{\n  int upper = stmt->u.m.upper;\n  int lower = stmt->u.m.lower;\n  int complete = (upper == 0 && lower == 0);\n  int dummy = stmt->u.m.dummy;\n  int boundary = stmt->u.m.boundary;\n  char *module_name = stmt->u.m.module_name;\n  struct autosa_hw_module *module = stmt->u.m.module;\n  int n_arg = 0;\n  struct autosa_kernel *kernel = module->kernel;\n\n  isl_space *space;\n  int nparams;\n  int n;\n  const char *type;\n\n  if (!(complete || upper))\n    return p;\n\n  p = print_str_new_line(p, \"status = clEnqueueNDRangeKernel(\");\n  p = isl_printer_indent(p, 2);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"cmdQueue[ID_\");\n  p = isl_printer_print_str(p, module_name);\n  p = isl_printer_print_str(p, \"_base\");\n  if (module->io_groups[0]->n_mem_ports > 1)\n  {\n    p = isl_printer_print_str(p, \" + c0\");\n  }\n  p = isl_printer_print_str(p, \"],\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"kernel[ID_\");\n  p = isl_printer_print_str(p, module_name);\n  p = isl_printer_print_str(p, \"_base\");\n  if (module->io_groups[0]->n_mem_ports > 1)\n  {\n    p = isl_printer_print_str(p, \" + c0\");\n  }\n  p = isl_printer_print_str(p, \"],\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"1,\");\n  p = print_str_new_line(p, \"NULL,\");\n  p = print_str_new_line(p, \"globalWorkSize,\");\n  p = print_str_new_line(p, \"localWorkSize,\");\n  p = print_str_new_line(p, \"0,\");\n  p = print_str_new_line(p, \"NULL,\");\n  p = print_str_new_line(p, \"NULL);\");\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"CHECK(status);\");\n\n  return p;\n}\n\nstatic __isl_give isl_printer 
*print_launch_ext_module_kernels_stmt(\n    __isl_take isl_printer *p,\n    __isl_take isl_ast_print_options *print_options,\n    __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *data = (struct print_hw_module_data *)(user);\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n  case AUTOSA_KERNEL_STMT_EXT_MODULE:\n    return autosa_kernel_print_launch_ext_module_kernels(p, stmt, data->prog);\n  }\n\n  return p;\n}\n\n/* Set kernel arguments:\n * - arrays\n * - parameters\n * - host iterators\n * TODO: We need to filter out the module declaration trees and \n * print them for Intel devices.\n */\nstatic __isl_give isl_printer *print_set_kernel_arguments_intel(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog,\n    struct autosa_kernel *kernel,\n    struct autosa_hw_top_module *top)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = prog->ctx;\n  struct print_hw_module_data hw_data = {NULL, prog, NULL, NULL};\n\n  p = print_str_new_line(p, \"// Set the arguments\");\n  /* Default settings */\n  p = print_str_new_line(p, \"size_t globalWorkSize[1];\");\n  p = print_str_new_line(p, \"size_t localWorkSize[1];\");\n  p = print_str_new_line(p, \"globalWorkSize[0] = 1;\");\n  p = print_str_new_line(p, \"localWorkSize[0] = 1;\");\n  p = isl_printer_end_line(p);\n\n  for (int i = 0; i < top->n_ext_module; i++)\n  {\n    /* Print AST */\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_set_ext_module_args_stmt, &hw_data);\n\n    p = isl_ast_node_print(top->ext_module_wrapped_trees[i],\n                           p, print_options);\n    p = isl_printer_end_line(p);\n  }\n\n  return 
p;\n}\n\n/* Launch the kernels.\n * For each io module connected to the external memory, we will launch a kernel\n * in an independent command queue.\n */\nstatic __isl_give isl_printer *print_launch_kernel_intel(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog,\n    struct autosa_kernel *kernel,\n    struct autosa_hw_top_module *top)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = prog->ctx;\n  struct print_hw_module_data hw_data = {NULL, prog, NULL, NULL};\n\n  p = print_str_new_line(p, \"// Launch the kernels\");\n\n  for (int i = 0; i < top->n_ext_module; i++)\n  {\n    /* Print AST */\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_launch_ext_module_kernels_stmt, &hw_data);\n\n    p = isl_ast_node_print(top->ext_module_wrapped_trees[i],\n                           p, print_options);\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\n/* Print the user statement of the host code to \"p\".\n *\n * The host code may contain original user statements, kernel launches,\n * statements that copy data to/from the device and statements\n * that initialize or clear the device.\n * The original user statements and the kernel launches have\n * an associated annotation, while the other statements do not.\n * The latter are handled by print_device_node_intel.\n * The annotation on the user statements is called \"user\".\n *\n * In case of a kernel launch, print a block of statements that\n * sets the kernel arguments, launches one kernel per I/O module connected\n * to the external memory, and records the FPGA execution time between\n * two rounds of clFinish on all the command queues.\n */\nstatic __isl_give isl_printer *print_host_user_intel(__isl_take isl_printer *p,\n                                                     __isl_take isl_ast_print_options *print_options,\n                                                     __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  int is_user;\n  struct autosa_kernel *kernel;\n  
struct autosa_kernel_stmt *stmt;\n  struct print_host_user_data *data;\n  struct hls_info *hls;\n  struct autosa_hw_top_module *top;\n\n  isl_ast_print_options_free(print_options);\n\n  data = (struct print_host_user_data *)user;\n  hls = data->hls;\n  top = data->top;\n\n  id = isl_ast_node_get_annotation(node);\n  if (!id)\n  {\n    return print_device_node_intel(p, node, data->prog, hls->hls, top);\n  }\n\n  is_user = !strcmp(isl_id_get_name(id), \"user\");\n  kernel = is_user ? NULL : (struct autosa_kernel *)isl_id_get_user(id);\n  stmt = is_user ? (struct autosa_kernel_stmt *)isl_id_get_user(id) : NULL;\n  isl_id_free(id);\n\n  if (is_user)\n    return autosa_kernel_print_domain(p, stmt);\n\n  /* Print OpenCL host. */\n  p = ppcg_start_block(p);\n\n  p = print_set_kernel_arguments_intel(p, data->prog, kernel, top);\n\n  p = print_str_new_line(p, \"for (int i = 0; i < NUM_QUEUES_TO_CREATE; i++) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"status = clFinish(cmdQueue[i]); CHECK(status);\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = print_str_new_line(p, \"fpga_begin = std::chrono::high_resolution_clock::now();\");\n\n  p = print_launch_kernel_intel(p, data->prog, kernel, top);\n\n  p = print_str_new_line(p, \"for (int i = 0; i < NUM_QUEUES_TO_CREATE; i++) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"status = clFinish(cmdQueue[i]); CHECK(status);\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = print_str_new_line(p, \"fpga_end = std::chrono::high_resolution_clock::now();\");\n\n  p = ppcg_end_block(p);\n  p = isl_printer_end_line(p);\n\n  /* Print the top kernel header. 
*/\n  // print_kernel_headers_intel(data->prog, kernel, data->hls); // TODO\n\n  return p;\n}\n\n/* Print the header of the given module.\n */\nstatic __isl_give isl_printer *print_module_header_intel(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    int inter, int boundary, int serialize)\n{\n  p = isl_printer_start_line(p);\n  if (inter == -1)\n    p = isl_printer_print_str(p, \"__kernel void \");\n  else\n    p = isl_printer_print_str(p, \"void \");\n  p = isl_printer_print_str(p, module->name);\n  if (inter == 0)\n    p = isl_printer_print_str(p, \"_intra_trans\");\n  else if (inter == 1)\n    p = isl_printer_print_str(p, \"_inter_trans\");\n  if (boundary)\n    p = isl_printer_print_str(p, \"_boundary\");\n  if (serialize)\n    p = isl_printer_print_str(p, \"_serialize\");\n  p = isl_printer_print_str(p, \"(\");\n  p = print_module_arguments(p, prog, module->kernel, module, 1, INTEL_HW, inter, -1, boundary, serialize);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\n/* Print the header of the given module to hls->kernel_c.\n * If \"inter\" is -1, this is a normal module call.\n * If \"inter\" is 0, this is an intra_trans module call.\n * If \"inter\" is 1, this is an inter_trans module call.\n */\nstatic isl_stat print_module_headers_intel(\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    struct hls_info *hls, int inter, int boundary, int serialize)\n{\n  isl_printer *p;  \n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  if (inter == -1)\n  {\n    p = print_str_new_line(p, \"__attribute__((max_global_work_dim(0)))\");\n    //if (module->to_mem != 1)\n    if ((module->is_serialized && !serialize) || (module->to_mem != 1))\n      p = print_str_new_line(p, \"__attribute__((autorun))\");\n  }\n  p = print_module_header_intel(p, prog, module, inter, boundary, serialize);\n  //p = 
isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\n/* Print out variable declarations on Intel platforms. \n */\nstatic __isl_give isl_printer *print_module_var_intel(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_var *var, int double_buffer,\n    struct autosa_hw_module *module)\n{\n  int j;\n  int use_memory = 0; // 0: FF 1: LUTRAM 2: BRAM 3: URAM\n  use_memory = extract_memory_type(module, var, module->options->autosa->uram);\n\n  p = isl_printer_start_line(p);\n  if (var->n_lane == 1)\n    p = isl_printer_print_str(p, var->array->type);\n  else\n  {\n    p = isl_printer_print_str(p, var->array->name);\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, var->n_lane);\n  }\n  p = isl_printer_print_str(p, \" \");\n  p = isl_printer_print_str(p, var->name);\n  if (double_buffer)\n    p = isl_printer_print_str(p, \"_ping\");\n  for (j = 0; j < isl_vec_size(var->size); ++j)\n  {\n    isl_val *v;\n\n    p = isl_printer_print_str(p, \"[\");\n    v = isl_vec_get_element_val(var->size, j);\n    p = isl_printer_print_val(p, v);\n    isl_val_free(v);\n    p = isl_printer_print_str(p, \"]\");\n  }\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n\n  /* Print pong buffer */\n  if (double_buffer)\n  {\n    p = isl_printer_start_line(p);\n    if (var->n_lane == 1)\n      p = isl_printer_print_str(p, var->array->type);\n    else\n    {\n      p = isl_printer_print_str(p, var->array->name);\n      p = isl_printer_print_str(p, \"_t\");\n      p = isl_printer_print_int(p, var->n_lane);\n    }\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, var->name);\n    if (double_buffer)\n      p = isl_printer_print_str(p, \"_pong\");\n    for (j = 0; j < isl_vec_size(var->size); ++j)\n    {\n      isl_val *v;\n\n      p = isl_printer_print_str(p, \"[\");\n      v = isl_vec_get_element_val(var->size, j);\n      p = isl_printer_print_val(p, v);\n      isl_val_free(v);\n      p = 
isl_printer_print_str(p, \"]\");\n    }\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_module_vars_intel(__isl_take isl_printer *p,\n                                                       struct autosa_hw_module *module, int inter)\n{\n  int i, n;\n  isl_space *space;\n  const char *type;\n\n  if (inter == -1)\n  {\n    for (i = 0; i < module->n_var; ++i)\n      p = print_module_var_intel(p, &module->var[i], module->double_buffer, module);\n  }\n\n  if (module->double_buffer && inter == -1)\n  {\n    type = isl_options_get_ast_iterator_type(module->kernel->ctx);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"bool arb = 0;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, module->in ? \"bool inter_trans_en = 1;\" : \"bool inter_trans_en = 0;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, module->in ? \"bool intra_trans_en = 0;\" : \"bool intra_trans_en = 1;\");\n    p = isl_printer_end_line(p);\n    /* iterators */\n    space = (module->in) ? 
module->intra_space : module->inter_space;\n    n = isl_space_dim(space, isl_dim_set);\n    for (int i = 0; i < n; i++)\n    {\n      const char *name;\n      name = isl_space_get_dim_name(space, isl_dim_set, i);\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, type);\n      p = isl_printer_print_str(p, \" \");\n      p = isl_printer_print_str(p, name);\n      p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, name);\n      p = isl_printer_print_str(p, \"_prev\");\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  return p;\n}\n\n/* Print the intra_trans module.\n */\nstatic __isl_give isl_printer *autosa_print_intra_trans_module(\n    __isl_take isl_printer *p,\n    struct autosa_hw_module *module, struct autosa_prog *prog,\n    struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  print_module_headers_intel(prog, module, hls, 0, boundary, 0);\n  fprintf(hls->kernel_c, \" {\\n\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = print_module_iterators(p, hls->kernel_c, module);\n  p = print_module_vars_intel(p, module, 0);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  if (module->double_buffer)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"if (!intra_trans_en) return;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_end_line(p);\n  }\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options = 
isl_ast_print_options_set_print_for(print_options,\n                                                      &print_module_for, &hw_data);\n\n  //p = print_str_new_line(p, \"#pragma loop_coalesce\");\n  p = isl_ast_node_print(module->intra_tree, p, print_options);\n  p = isl_printer_indent(p, -2);\n\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the inter_trans module.\n */\nstatic __isl_give isl_printer *autosa_print_inter_trans_module(\n    __isl_take isl_printer *p,\n    struct autosa_hw_module *module, struct autosa_prog *prog,\n    struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  if (boundary) {\n    if (!module->boundary_inter_tree)\n      return p;\n  } else {\n    if (!module->inter_tree)\n      return p;\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  print_module_headers_intel(prog, module, hls, 1, boundary, 0);\n  fprintf(hls->kernel_c, \" {\\n\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = print_module_iterators(p, hls->kernel_c, module);\n  p = print_module_vars_intel(p, module, 1);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  if (module->double_buffer)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"if (!inter_trans_en) return;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_end_line(p);\n  }\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, 
&hw_data);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_module_for, &hw_data);\n\n  //p = print_str_new_line(p, \"#pragma loop_coalesce\");\n  p = isl_ast_node_print((boundary == 0) ? module->inter_tree : module->boundary_inter_tree, p, print_options);\n  p = isl_printer_indent(p, -2);\n\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the serialization module that connects the external memory to the \n * top-level I/O module. \n */\nstatic __isl_give isl_printer *autosa_print_serialize_module(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{  \n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);  \n\n  /* Print core. */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  print_module_headers_intel(prog, module, hls, -1, boundary, 1);  \n  fprintf(hls->kernel_c, \" {\\n\");    \n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);    \n  }\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  p = print_module_serialize_body(p, module, hls);\n  p = isl_printer_indent(p, -2);\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n  return p;\n}\n\n/* Print the default module. 
*/\nstatic __isl_give isl_printer *autosa_print_default_module(\n    __isl_take isl_printer *p,\n    struct autosa_hw_module *module, struct autosa_prog *prog,\n    struct hls_info *hls, int boundary)\n{\n  if (!boundary) {\n    if (!module->device_tree)\n      return p;\n  } else {\n    if (!module->boundary_tree)\n      return p;\n  }  \n\n  bool wrapper = 0;\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  /* Print wrapper for PE and L1 IO module */\n  if (module->type == PE_MODULE || (module->type != PE_MODULE && module->level == 1)) \n    wrapper = 1; \n\n  /* Print core. */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  //p = print_module_core_headers_intel(p, prog, module, hls, -1, boundary, 1);\n  print_module_headers_intel(prog, module, hls, -1, boundary, 0);\n  fprintf(hls->kernel_c, \" {\\n\");  \n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);\n  }\n  p = print_module_vars_intel(p, module, -1);  \n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  if (module->credit && !module->in)\n  {\n  }\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_module_for, &hw_data);\n\n  //p = print_str_new_line(p, \"#pragma loop_coalesce\");\n  if (!boundary)\n    p = isl_ast_node_print(module->device_tree, p, print_options);\n  else\n    p = 
isl_ast_node_print(module->boundary_tree, p, print_options);\n\n  if (module->credit && module->in)\n  {\n  }\n\n  p = isl_printer_indent(p, -2);\n\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  /* Print wrapper. */\n  //  if (hls->target == XILINX_HW) {\n  //    p = isl_printer_start_line(p);\n  //    p = isl_printer_print_str(p, \"/* Module Definition */\");\n  //    p = isl_printer_end_line(p);\n  //\n  //    print_module_wrapper_headers_xilinx(prog, module, hls, -1, boundary);\n  //\n  //    fprintf(hls->kernel_c, \"{\\n\");\n  //    p = isl_printer_indent(p, 2);\n  //\n  //    p = print_module_core_headers_xilinx(p, prog, module, hls, -1, boundary, 0);\n  //    p = isl_printer_print_str(p, \";\");\n  //    p = isl_printer_end_line(p);\n  //    p = isl_printer_indent(p, -2);\n  //\n  //    fprintf(hls->kernel_c, \"}\\n\");\n  //    p = isl_printer_start_line(p);\n  //    p = isl_printer_print_str(p, \"/* Module Definition */\");\n  //    p = isl_printer_end_line(p);\n  //\n  //    p = isl_printer_end_line(p);\n  //  }\n\n  /* If host serialization is enabled and the module is connected to the\n   * external memory, print out an extra module for serializing the data. 
*/\n  if (module->to_mem && module->options->autosa->host_serialize) {\n    p = autosa_print_serialize_module(p, module, prog, hls, boundary);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_pe_dummy_module_core_header_intel(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_pe_dummy_module *module, int types)\n{\n  struct autosa_array_ref_group *group = module->io_group;\n\n  p = isl_printer_start_line(p);\n  if (types)\n    p = isl_printer_print_str(p, \"void \");\n  // group_name\n  p = isl_printer_print_str(p, group->array->name);\n  if (group->group_type == AUTOSA_IO_GROUP)\n  {\n    if (group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  }\n  else if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, \"drain\");\n  }\n  p = isl_printer_print_str(p, \"_PE_dummy\");\n  p = isl_printer_print_str(p, module->in? 
\"_in\" : \"_out\");\n  p = isl_printer_print_str(p, \"(\");\n  p = print_pe_dummy_module_arguments(p, prog, module->module->kernel,\n                                      module, types, INTEL_HW);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_pe_dummy_module_core_headers_intel(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_pe_dummy_module *module, struct hls_info *hls, int types)\n{\n  p = print_pe_dummy_module_core_header_intel(p, prog, module, types);\n\n  return p;\n}\n\n/* Print the header of the given module.\n */\nstatic __isl_give isl_printer *print_pe_dummy_module_header_intel(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_pe_dummy_module *module,\n    int inter, int boundary)\n{\n  struct autosa_array_ref_group *group = module->io_group;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"__kernel void \");\n  // group_name\n  p = isl_printer_print_str(p, group->array->name);\n  if (group->group_type == AUTOSA_IO_GROUP)\n  {\n    if (group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  }\n  else if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, \"drain\");\n  }\n  p = isl_printer_print_str(p, \"_PE_dummy\");\n  p = isl_printer_print_str(p, module->in? 
\"_in\" : \"_out\");\n  p = isl_printer_print_str(p, \"(\");\n  p = print_pe_dummy_module_arguments(p, prog, module->module->kernel,\n                                      module, 1, INTEL_HW);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\n/* Print the header of the given PE dummy module to hls->kernel_c\n * (printing the declaration to hls->kernel_h is currently disabled).\n * If \"inter\" is -1, this is a normal module call.\n * If \"inter\" is 0, this is an intra_trans module call.\n * If \"inter\" is 1, this is an inter_trans module call.\n */\nstatic isl_stat print_pe_dummy_module_headers_intel(\n    struct autosa_prog *prog, struct autosa_pe_dummy_module *module,\n    struct hls_info *hls, int inter, int boundary)\n{\n  isl_printer *p;\n\n  //  p = isl_printer_to_file(prog->ctx, hls->kernel_h);\n  //  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  //  p = print_pe_dummy_module_header_intel(p, prog, module, inter, boundary);\n  //  p = isl_printer_print_str(p, \";\");\n  //  p = isl_printer_end_line(p);\n  //  isl_printer_free(p);\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_str_new_line(p, \"__attribute__((max_global_work_dim(0)))\");\n  p = print_str_new_line(p, \"__attribute__((autorun))\");\n  p = print_pe_dummy_module_header_intel(p, prog, module, inter, boundary);\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\nstatic __isl_give isl_printer *autosa_print_default_pe_dummy_module(\n    __isl_take isl_printer *p,\n    struct autosa_pe_dummy_module *pe_dummy_module,\n    struct autosa_prog *prog, struct hls_info *hls, int boundary)\n{\n  struct autosa_hw_module *module = pe_dummy_module->module;\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  /* Print core. 
*/\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  //if (hls->target == XILINX_HW)\n  //    p = print_pe_dummy_module_core_headers_xilinx(p, prog,\n  //pe_dummy_module, hls, 1);\n  print_pe_dummy_module_headers_intel(prog, pe_dummy_module, hls, -1, boundary);\n\n  fprintf(hls->kernel_c, \" {\\n\");\n  p = isl_printer_indent(p, 2);  \n  p = print_str_new_line(p, \"while (1) {\");\n  p = isl_printer_indent(p, 2);\n  \n  /* [type] fifo_data; */\n  struct autosa_array_ref_group *group = pe_dummy_module->io_group;\n  int n_lane = get_io_group_n_lane(NULL, pe_dummy_module, group);\n  p = isl_printer_start_line(p);\n  if (n_lane == 1) {\n    p = isl_printer_print_str(p, group->array->type);\n  } else {\n    p = isl_printer_print_str(p, group->array->name);\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, n_lane);\n  }\n  p = isl_printer_print_str(p, \" fifo_data;\");\n  p = isl_printer_end_line(p);\n\n  /* fifo_data = fifo.read(); */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"fifo_data = read_channel_intel(\");\n  p = autosa_array_ref_group_print_fifo_name(group, p);\n  p = isl_printer_print_str(p, \"_in);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = isl_printer_indent(p, -2);\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\nstruct print_db_module_intel_data {\n  int inter; // -1: outer 0: intra 1: inter  \n  int under_if; \n  int reach_user;\n\n  isl_printer *p_for;\n  isl_printer *p_user;\n  /* Outer */\n  std::vector<char *> outer_for_logic;  \n  std::vector<char *> outer_iterator_name;\n  std::vector<char *> outer_iterator_lb;\n  std::vector<char *> outer_iterator_ub;\n  int outer_for_level;\n 
 /* Inter */\n  std::vector<char *> inter_for_logic;  \n  std::vector<char *> inter_iterator_name;\n  std::vector<char *> inter_iterator_lb;\n  std::vector<char *> inter_iterator_ub;\n  int inter_for_level;\n  /* Intra */\n  std::vector<char *> intra_for_logic;  \n  std::vector<char *> intra_iterator_name;\n  std::vector<char *> intra_iterator_lb;\n  std::vector<char *> intra_iterator_ub;\n  int intra_for_level;\n};\n\nstatic __isl_give isl_printer *print_double_buffer_module_vars_intel(\n  __isl_take isl_printer *p, struct autosa_hw_module *module, struct hls_info *hls,\n  struct print_db_module_intel_data *data)\n{\n  /* Inst ids */\n  p = print_module_iterators(p, hls->kernel_c, module);\n  /* Local buffer */\n  for (int i = 0; i < module->n_var; i++) {\n    struct autosa_kernel_var *var = &module->var[i];\n    p = isl_printer_start_line(p);\n    if (var->n_lane == 1) \n      p = isl_printer_print_str(p, var->array->type);\n    else\n    {\n      p = isl_printer_print_str(p, var->array->name);\n      p = isl_printer_print_str(p, \"_t\");\n      p = isl_printer_print_int(p, var->n_lane);\n    }\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, var->name);\n    p = isl_printer_print_str(p, \"[2]\");\n    for (int j = 0; j < isl_vec_size(var->size); j++) {\n      isl_val *v;\n\n      p = isl_printer_print_str(p, \"[\");\n      v = isl_vec_get_element_val(var->size, j);\n      p = isl_printer_print_val(p, v);\n      isl_val_free(v);\n      p = isl_printer_print_str(p, \"]\");      \n    }\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  }\n\n  /* State handle variables */\n  p = print_str_new_line(p, \"bool arb = 0;\");  \n  p = print_str_new_line(p, module->in? \"bool inter_trans_en = 1;\" : \"bool inter_trans_en = 0;\");\n  p = print_str_new_line(p, module->in? \"bool intra_trans_en = 0;\" : \"bool intra_trans_en = 1;\");\n  p = print_str_new_line(p, module->in? 
\"bool inter_done = 0;\" : \"bool inter_done = 1;\");\n  p = print_str_new_line(p, module->in? \"bool intra_done = 1;\" : \"bool intra_done = 0;\");\n  /* Iterators */\n  for (int i = 0; i < data->outer_iterator_name.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, data->outer_iterator_name[i]);\n    free(data->outer_iterator_name[i]);\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_str(p, data->outer_iterator_lb[i]);\n    free(data->outer_iterator_lb[i]);\n    p = isl_printer_print_str(p, \"; \");\n    p = isl_printer_print_str(p, \"/* UB: \");\n    p = isl_printer_print_str(p, data->outer_iterator_ub[i]);\n    free(data->outer_iterator_ub[i]);\n    p = isl_printer_print_str(p, \" */\");\n    p = isl_printer_end_line(p);\n  }\n  for (int i = 0; i < data->inter_iterator_name.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, data->inter_iterator_name[i]);\n    free(data->inter_iterator_name[i]);\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_str(p, data->inter_iterator_lb[i]);\n    free(data->inter_iterator_lb[i]);\n    p = isl_printer_print_str(p, \"; \");\n    p = isl_printer_print_str(p, \"/* UB: \");\n    p = isl_printer_print_str(p, data->inter_iterator_ub[i]);\n    free(data->inter_iterator_ub[i]);\n    p = isl_printer_print_str(p, \" */\");\n    p = isl_printer_end_line(p);\n  }\n  for (int i = 0; i < data->intra_iterator_name.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, data->intra_iterator_name[i]);\n    free(data->intra_iterator_name[i]);\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_str(p, data->intra_iterator_lb[i]);\n    free(data->intra_iterator_lb[i]);\n    p = isl_printer_print_str(p, \"; \");\n    p = isl_printer_print_str(p, \"/* UB: 
\");\n    p = isl_printer_print_str(p, data->intra_iterator_ub[i]);\n    free(data->intra_iterator_ub[i]);\n    p = isl_printer_print_str(p, \" */\");\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\n/* Count the depth of the for-loop nest.\n */\nstatic __isl_give isl_printer *count_module_for(__isl_take isl_printer *p,\n                                                __isl_take isl_ast_print_options *print_options,\n                                                __isl_keep isl_ast_node *node, void *user)\n{\n  struct print_db_module_intel_data *data = (struct print_db_module_intel_data *)user;\n  isl_ast_node *body;\n\n  if (data->inter == -1)\n    data->outer_for_level++;\n  else if (data->inter == 0)\n    data->intra_for_level++;\n  else if (data->inter == 1)\n    data->inter_for_level++;\n\n  body = isl_ast_node_for_get_body(node);\n  p = isl_ast_node_print(body, p, print_options);\n  isl_ast_node_free(body);\n\n  return p;\n}\n\n/* Count the depth of the for-loop nest (alternative implementation).\n * Currently only used for the inter_trans module.\n * Since the AST may contain if branches, only the loops of one branch\n * are counted. We assume that all branches have the same loop depth.\n */\nstatic isl_bool count_module_for_alt(__isl_keep isl_ast_node *node, void *user) {\n  struct print_db_module_intel_data *data = (struct print_db_module_intel_data *)user;\n  if (isl_ast_node_get_type(node) == isl_ast_node_if) {\n    data->under_if = 1;\n  }\n\n  if (isl_ast_node_get_type(node) == isl_ast_node_for) {\n    if (data->under_if == 0 || (data->under_if == 1 && data->reach_user == 0)) {\n      data->inter_for_level++;\n    }\n  }\n  if (isl_ast_node_get_type(node) == isl_ast_node_user) {\n    data->reach_user = 1;\n  }\n\n  return isl_bool_true;\n}\n\n/* Extract the loop information. 
\n */\nstatic __isl_give isl_printer *extract_module_for(__isl_take isl_printer *p,\n                                                  __isl_take isl_ast_print_options *print_options,\n                                                  __isl_keep isl_ast_node *node, void *user)\n{\n  struct print_db_module_intel_data *data = (struct print_db_module_intel_data *)user;\n  isl_ast_expr *iterator, *init, *cond, *ub;  \n  const char *iterator_suffix;\n  isl_printer *p_local, *p_str;  \n  char *text, *iter_str;\n  std::vector<char *> text_lines;\n  isl_ast_node *body;\n  int iter_exist = 0;\n\n  p_local = data->p_for;  \n\n  /* Extract the lower bound and upper bound. */\n  iterator = isl_ast_node_for_get_iterator(node);\n  init = isl_ast_node_for_get_init(node);\n  cond = isl_ast_node_for_get_cond(node);\n  ub = isl_ast_expr_op_get_arg(cond, 1);\n\n  p_str = isl_printer_to_str(isl_ast_node_get_ctx(node));\n  p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);  \n  p_str = isl_printer_print_ast_expr(p_str, iterator);\n  iter_str = isl_printer_get_str(p_str);\n  if (data->inter == -1) {    \n  } else if (data->inter == 0) {    \n  } else if (data->inter == 1) {\n    for (int i = 0; i < data->inter_iterator_name.size(); i++) {\n      if (!strcmp(data->inter_iterator_name[i], iter_str))\n        iter_exist = 1;\n    }    \n  }  \n  free(iter_str);\n\n  if (iter_exist) {\n    isl_printer_free(p_str);\n\n    isl_ast_expr_free(iterator);\n    isl_ast_expr_free(init);\n    isl_ast_expr_free(cond);\n    isl_ast_expr_free(ub);\n\n    body = isl_ast_node_for_get_body(node);\n    p = isl_ast_node_print(body, p, print_options);\n    isl_ast_node_free(body);\n\n    return p;\n  }\n\n  if (data->inter == -1)\n    data->outer_iterator_name.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 0)\n    data->intra_iterator_name.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 1)\n    data->inter_iterator_name.push_back(isl_printer_get_str(p_str));\n  
isl_printer_flush(p_str);\n\n  p_str = isl_printer_print_ast_expr(p_str, ub);\n  if (data->inter == -1)\n    data->outer_iterator_ub.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 0)\n    data->intra_iterator_ub.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 1)\n    data->inter_iterator_ub.push_back(isl_printer_get_str(p_str));\n  isl_printer_flush(p_str);\n\n  p_str = isl_printer_print_ast_expr(p_str, init);\n  if (data->inter == -1)\n    data->outer_iterator_lb.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 0)\n    data->intra_iterator_lb.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 1)\n    data->inter_iterator_lb.push_back(isl_printer_get_str(p_str));\n  isl_printer_free(p_str);\n\n  p_local = isl_printer_indent(p_local, -2);\n\n  p_local = isl_printer_start_line(p_local);    \n  p_local = isl_printer_print_ast_expr(p_local, iterator);\n  p_local = isl_printer_print_str(p_local, \"++;\");\n  p_local = isl_printer_end_line(p_local);\n  text = isl_printer_get_str(p_local);\n  text_lines.push_back(text);\n  p_local = isl_printer_flush(p_local);\n\n  p_local = isl_printer_start_line(p_local);\n  p_local = isl_printer_print_str(p_local, \"if (\");  \n  p_local = isl_printer_print_ast_expr(p_local, iterator);\n  p_local = isl_printer_print_str(p_local, \" == \"); \n  p_local = isl_printer_print_ast_expr(p_local, ub);\n  p_local = isl_printer_print_str(p_local, \" + 1) {\"); \n  p_local = isl_printer_end_line(p_local);\n  text = isl_printer_get_str(p_local);\n  text_lines.push_back(text);\n  p_local = isl_printer_flush(p_local);\n\n  p_local = isl_printer_indent(p_local, 2);\n  p_local = isl_printer_start_line(p_local);    \n  p_local = isl_printer_print_ast_expr(p_local, iterator);\n  p_local = isl_printer_print_str(p_local, \" = \");\n  p_local = isl_printer_print_ast_expr(p_local, init);\n  p_local = isl_printer_print_str(p_local, \";\");\n  p_local = isl_printer_end_line(p_local);\n  text 
= isl_printer_get_str(p_local);\n  text_lines.push_back(text);\n  p_local = isl_printer_flush(p_local);\n\n  if (data->inter == -1)\n    data->outer_for_logic.insert(data->outer_for_logic.begin(), text_lines.begin(), text_lines.end());\n  else if (data->inter == 0)\n    data->intra_for_logic.insert(data->intra_for_logic.begin(), text_lines.begin(), text_lines.end());\n  else if (data->inter == 1)\n    data->inter_for_logic.insert(data->inter_for_logic.begin(), text_lines.begin(), text_lines.end());\n\n  isl_ast_expr_free(iterator);\n  isl_ast_expr_free(init);\n  isl_ast_expr_free(cond);\n  isl_ast_expr_free(ub);\n\n  p_local = isl_printer_indent(p_local, -2);\n\n  body = isl_ast_node_for_get_body(node);\n  p = isl_ast_node_print(body, p, print_options);\n  isl_ast_node_free(body);\n\n  return p;\n}                                                                                           \n\nstatic void extract_double_buffer_module_intel_data(\n  struct autosa_hw_module *module, int boundary, \n  struct print_db_module_intel_data *data)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = module->kernel->ctx;\n  isl_printer *p_for, *p_user, *p;\n  const char *for_logic, *user_logic;\n\n  /* Outer module */\n  data->inter = -1;  \n  p = isl_printer_to_str(ctx);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p_for = isl_printer_to_str(ctx);\n  p_for = isl_printer_set_output_format(p_for, ISL_FORMAT_C);\n  p_user = isl_printer_to_str(ctx);\n  p_user = isl_printer_set_output_format(p_user, ISL_FORMAT_C);\n  data->p_for = p_for;\n  data->p_user = p_user;\n  data->outer_for_level = 0;\n\n  /* Count the for level first. 
*/\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &count_module_for, data);\n  if (!boundary)\n    p = isl_ast_node_print(module->device_tree, p, print_options);\n  else\n    p = isl_ast_node_print(module->boundary_tree, p, print_options);\n\n  /* Extract the for and user logic. */\n  data->p_for = isl_printer_indent(data->p_for, 2 * data->outer_for_level);\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &extract_module_for, data);\n  if (!boundary)\n    p = isl_ast_node_print(module->device_tree, p, print_options);\n  else\n    p = isl_ast_node_print(module->boundary_tree, p, print_options);\n  isl_printer_free(p);  \n  isl_printer_free(data->p_for);\n  isl_printer_free(data->p_user);\n\n  /* Intra module */\n  data->inter = 0;\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p_for = isl_printer_to_str(ctx);\n  p_for = isl_printer_set_output_format(p_for, ISL_FORMAT_C);\n  p_user = isl_printer_to_str(ctx);\n  p_user = isl_printer_set_output_format(p_user, ISL_FORMAT_C);\n  data->p_for = p_for;\n  data->p_user = p_user;\n  data->intra_for_level = 0;\n\n  /* Count the for level first. */\n  print_options = isl_ast_print_options_alloc(ctx);  \n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &count_module_for, data);\n  p = isl_ast_node_print(module->intra_tree, p, print_options);  \n\n  /* Extract the for logic. 
*/\n  data->p_for = isl_printer_indent(data->p_for, 2 * data->intra_for_level);\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &extract_module_for, data);  \n  p = isl_ast_node_print(module->intra_tree, p, print_options);  \n  isl_printer_free(p);  \n  isl_printer_free(data->p_for);\n  isl_printer_free(data->p_user);\n\n  /* Inter module */\n  data->inter = 1;\n  data->under_if = 0;\n  data->reach_user = 0;\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p_for = isl_printer_to_str(ctx);\n  p_for = isl_printer_set_output_format(p_for, ISL_FORMAT_C);\n  p_user = isl_printer_to_str(ctx);\n  p_user = isl_printer_set_output_format(p_user, ISL_FORMAT_C);\n  data->p_for = p_for;\n  data->p_user = p_user;  \n  data->inter_for_level = 0;\n\n  /* Count the for level first. */  \n  if (!boundary) {\n    isl_ast_node_foreach_descendant_top_down(module->inter_tree, &count_module_for_alt, data);\n  } else {        \n    isl_ast_node_foreach_descendant_top_down(module->boundary_inter_tree, &count_module_for_alt, data);    \n  }\n\n  /* Extract the for logic. 
*/\n  data->p_for = isl_printer_indent(data->p_for, 2 * data->inter_for_level);\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &extract_module_for, data);\n  if (!boundary)\n    p = isl_ast_node_print(module->inter_tree, p, print_options);\n  else {    \n    p = isl_ast_node_print(module->boundary_inter_tree, p, print_options);    \n  }\n  isl_printer_free(p);  \n  isl_printer_free(data->p_for);\n  isl_printer_free(data->p_user);\n}\n\nstatic __isl_give isl_printer *print_null_for(__isl_take isl_printer *p,\n                                              __isl_take isl_ast_print_options *print_options,\n                                              __isl_keep isl_ast_node *node, void *user)\n{\n  isl_ast_node *body;\n  \n  body = isl_ast_node_for_get_body(node);\n  p = isl_ast_node_print(body, p, print_options);\n  isl_ast_node_free(body);\n\n  return p;\n}                                              \n\n/* Print the inter_trans module in double buffer mode. 
\n */\nstatic __isl_give isl_printer *autosa_print_inter_trans_module_double_buffer(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  //printf(\"here\\n\");\n  struct print_hw_module_data hw_data = {hls, prog, module, \"inter_c\"};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_null_for, &hw_data);\n\n  //if (boundary == 1)\n  //  DBGASTNODE(stdout, module->boundary_inter_tree, ctx);\n  p = isl_ast_node_print((boundary == 0) ? module->inter_tree : module->boundary_inter_tree, p, print_options);\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the intra_trans module in double buffer mode. 
\n */\nstatic __isl_give isl_printer *autosa_print_intra_trans_module_double_buffer(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, \"intra_c\"};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_null_for, &hw_data);\n\n  p = isl_ast_node_print(module->intra_tree, p, print_options);\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Double buffer modules on Intel devices need to be handled specially.\n * First, we change the local buffer to\n * local_buffer[2][...][...], since the Intel OpenCL compiler cannot\n * handle separate local_buffer_ping/local_buffer_pong buffers properly.\n * Specifically, given the code structure:\n * [outer for loops]\n * for ...\n *   for ...\n * [outer for loops]\n * {\n *   if (arb) {\n *     ld(local_buffer_ping, ld_en);\n *     st(local_buffer_pong, st_en);\n *   } else {\n *     ld(local_buffer_pong, ld_en);\n *     st(local_buffer_ping, st_en);\n *   }\n *   [state handle logic]\n *   arb = !arb;\n *   [state handle logic]\n * }\n * [last batch]\n * if (arb) {\n *   st(local_buffer_pong, st_en);\n * } else {\n *   st(local_buffer_ping, st_en);\n * }\n * [last batch]\n * we convert it to the following code structure:\n * while (1) {\n *   if (ld_en) {\n *     [inlined logic]\n *     ld(local_buffer[arb][...]);\n *     [inlined logic]\n *   }\n *   if (st_en) {\n *     [inlined logic]\n *     st(local_buffer[!arb][...]);\n *     [inlined logic]\n *   }\n *   [state handle logic]\n *   arb = !arb;\n *   ld_en = 1;\n *   st_en = 1;\n *   
[state handle logic]\n *   [outer for loops]\n *   outer_iter0++;\n *   if (outer_iter0 == ...) {\n *     outer_iter0 = 0;\n *     [last batch]\n *     ld_en = 0;\n *     [last batch]\n *   }\n *   [outer for loops]\n * }\n *\n * Note that this only works if each loop structure is a perfectly\n * nested loop, so that it can be converted to a while loop.\n */\nstatic __isl_give isl_printer *print_double_buffer_module_while(\n  __isl_take isl_printer *p, struct autosa_hw_module *module,\n  struct autosa_prog *prog, struct hls_info *hls, int boundary)\n{\n  if (!boundary) {\n    if (!module->device_tree)\n      return p;\n  } else {\n    if (!module->boundary_tree)\n      return p;\n  }\n\n  struct print_db_module_intel_data print_data;\n\n  /* Extract the code snippets. */\n  extract_double_buffer_module_intel_data(module, boundary, &print_data);\n\n  /* Print header */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  print_module_headers_intel(prog, module, hls, -1, boundary, 0);\n  p = print_str_new_line(p, \" {\");\n  p = isl_printer_indent(p, 2);\n\n  /* Print variables */\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = print_double_buffer_module_vars_intel(p, module, hls, &print_data);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  /* Print content */\n  p = print_str_new_line(p, \"while (1) {\");\n  p = isl_printer_indent(p, 2);\n\n  /* Print inter_trans */\n  p = print_str_new_line(p, \"if (inter_trans_en) {\");\n  p = isl_printer_indent(p, 2);\n  /* Print the module logic */\n  p = autosa_print_inter_trans_module_double_buffer(p, module, prog, hls, boundary);\n  /* Print the loop counter */\n  for (int i = 0; i < print_data.inter_for_logic.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, print_data.inter_for_logic[i]);\n    
free(print_data.inter_for_logic[i]);\n  }\n  p = isl_printer_indent(p, 2 * print_data.inter_for_level);\n  p = print_str_new_line(p, \"inter_done = 1;\");\n  p = print_str_new_line(p, \"inter_trans_en = 0;\");\n  for (int i = 0; i < print_data.inter_for_level; i++) {\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n  \n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  /* Print intra_trans */\n  p = print_str_new_line(p, \"if (intra_trans_en) {\");\n  p = isl_printer_indent(p, 2);\n  /* Print the module logic */\n  p = autosa_print_intra_trans_module_double_buffer(p, module, prog, hls, boundary);\n  /* Print the loop counter */\n  for (int i = 0; i < print_data.intra_for_logic.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, print_data.intra_for_logic[i]);\n    free(print_data.intra_for_logic[i]);\n  }\n  p = isl_printer_indent(p, 2 * print_data.intra_for_level);\n  p = print_str_new_line(p, \"intra_done = 1;\");\n  p = print_str_new_line(p, \"intra_trans_en = 0;\");\n  for (int i = 0; i < print_data.intra_for_level; i++) {\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  /* Print state_handle */\n  p = print_str_new_line(p, \"if (inter_done && intra_done) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"intra_trans_en = 1;\");\n  p = print_str_new_line(p, \"inter_trans_en = 1;\");\n  p = print_str_new_line(p, \"intra_done = 0;\");\n  p = print_str_new_line(p, \"inter_done = 0;\");\n  p = print_str_new_line(p, \"arb = !arb;\");\n  /* Print the loop counter */\n  for (int i = 0; i < print_data.outer_for_logic.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, print_data.outer_for_logic[i]);\n    free(print_data.outer_for_logic[i]);\n  }\n  p = isl_printer_indent(p, 2 * print_data.outer_for_level);\n  p = 
print_str_new_line(p, module->in? \"inter_trans_en = 0;\" : \"intra_trans_en = 0;\");\n  for (int i = 0; i < print_data.outer_for_level; i++) {\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  /* If the module serialization is enabled, we will print out an extra module\n   * for serializing the data. */\n  if (module->to_mem && module->options->autosa->host_serialize) {\n    p = autosa_print_serialize_module(p, module, prog, hls, boundary);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *autosa_print_host_code(__isl_take isl_printer *p,\n                                                      struct autosa_prog *prog, __isl_keep isl_ast_node *tree,\n                                                      struct autosa_hw_module **modules, int n_modules,\n                                                      struct autosa_hw_top_module *top,\n                                                      struct autosa_drain_merge_func **drain_merge_funcs, int n_drain_merge_funcs,\n                                                      struct hls_info *hls)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_ast_node_get_ctx(tree);\n  struct print_host_user_data data = {hls, prog, top};\n  struct print_hw_module_data hw_data = {hls, prog, NULL, NULL};\n  isl_printer *p_module;\n\n  /* Print the data pack types in the program. */\n  print_data_types_intel(top, hls);\n\n  /* Print the helper functions in the program. */\n  print_drain_merge_funcs(top->kernel, drain_merge_funcs, n_drain_merge_funcs, hls);\n\n  /* Print the host data serialization function. 
*/\n  print_host_serialize_funcs(top->kernel, modules, n_modules, hls);\n\n  /* Print the default AST. */\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_host_user_intel, &data);\n\n  /* Print the macros definitions in the program. */\n  p = autosa_print_macros(p, tree);\n  p = isl_ast_node_print(tree, p, print_options);\n\n  /* Print the hw module ASTs. */\n  p_module = isl_printer_to_file(ctx, hls->kernel_c);\n  p_module = isl_printer_set_output_format(p_module, ISL_FORMAT_C);\n\n  for (int i = 0; i < n_modules; i++)\n  {   \n    //std::cout << modules[i]->name << \" \" << module->device_tree << std::endl;\n    if (modules[i]->double_buffer && modules[i]->options->autosa->double_buffer_style == 0) {\n      /* We implement a different codegen for double buffer on Intel devices. */\n      p_module = print_double_buffer_module_while(p_module, modules[i], prog, hls, 0);\n      if (modules[i]->boundary) {\n        p_module = print_double_buffer_module_while(p_module, modules[i], prog, hls, 1);\n      }\n    } else {\n      if (modules[i]->is_filter && modules[i]->is_buffer)\n      {\n        /* Print out the definitions for inter_trans and intra_trans function calls. */\n        /* Intra transfer function */\n        p_module = autosa_print_intra_trans_module(p_module, modules[i], prog, hls, 0);\n  \n        /* Inter transfer function */\n        p_module = autosa_print_inter_trans_module(p_module, modules[i], prog, hls, 0);\n        if (modules[i]->boundary)\n          p_module = autosa_print_inter_trans_module(p_module, modules[i], prog, hls, 1);\n      }\n  \n      p_module = autosa_print_default_module(p_module, modules[i], prog, hls, 0);\n  \n      if (modules[i]->boundary)\n      {\n        /* Print out the definitions for boundary trans function calls. 
*/\n        p_module = autosa_print_default_module(p_module, modules[i], prog, hls, 1);\n      }\n      if (modules[i]->n_pe_dummy_modules > 0)\n      {\n        /* Print out the definitions for pe dummy function calls. */\n        for (int j = 0; j < modules[i]->n_pe_dummy_modules; j++)\n        {\n          p_module = autosa_print_default_pe_dummy_module(\n              p_module, modules[i]->pe_dummy_modules[j], prog, hls, 0);\n        }\n      }\n    }\n  }\n  isl_printer_free(p_module);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_top_module_headers_intel(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  struct autosa_kernel *kernel = top->kernel;\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"void kernel\");\n  p = isl_printer_print_int(p, top->kernel->id);\n  p = isl_printer_print_str(p, \"(\");\n  p = print_kernel_arguments(p, prog, top->kernel, 1, hls);\n  p = isl_printer_print_str(p, \")\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"{\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  return p;\n}\n\nstatic char *extract_fifo_name_from_fifo_decl_name(isl_ctx *ctx, char *fifo_decl_name)\n{\n  int loc = 0;\n  char ch;\n  isl_printer *p_str = isl_printer_to_str(ctx);\n  char *name = NULL;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    if (ch == '.')\n      break;\n    char buf[2];\n    buf[0] = ch;\n    buf[1] = '\\0';\n    p_str = isl_printer_print_str(p_str, buf);\n    loc++;\n  }\n\n  name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return name;\n}\n\nstatic char 
*extract_fifo_width_from_fifo_decl_name(isl_ctx *ctx, char *fifo_decl_name)\n{\n  int loc = 0;\n  char ch;\n  isl_printer *p_str = isl_printer_to_str(ctx);\n  char *name = NULL;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    if (ch == '.')\n      break;\n    loc++;\n  }\n\n  loc++;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    char buf[2];\n    buf[0] = ch;\n    buf[1] = '\\0';\n    p_str = isl_printer_print_str(p_str, buf);\n    loc++;\n  }\n\n  name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return name;\n}\n\nstatic __isl_give isl_printer *print_top_module_fifo_stmt(__isl_take isl_printer *p,\n                                                          __isl_take isl_ast_print_options *print_options,\n                                                          __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *data = (struct print_hw_module_data *)(user);\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n  case AUTOSA_KERNEL_STMT_FIFO_DECL:\n    return autosa_kernel_print_fifo_decl(p, stmt, data->prog, data->hls);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_top_module_call_stmt(\n    __isl_take isl_printer *p,\n    __isl_take isl_ast_print_options *print_options,\n    __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *data = (struct print_hw_module_data *)(user);\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n  case AUTOSA_KERNEL_STMT_MODULE_CALL:\n    return autosa_kernel_print_module_call(p, stmt, data->prog, data->hls->target);\n  }\n\n  
return p;\n}\n\n/* This function prints the code that prints out the top function that \n * calls the hardware modules and declares the fifos.\n */\nstatic void print_top_gen_host_code(\n    struct autosa_prog *prog, __isl_keep isl_ast_node *node,\n    struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_ast_node_get_ctx(node);\n  isl_printer *p;\n  int fifo_depth = prog->scop->options->autosa->fifo_depth;\n  struct print_hw_module_data hw_data = {hls, prog, NULL, NULL};\n\n  /* Print the top module ASTs. */\n  p = isl_printer_to_file(ctx, hls->top_gen_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n\n  print_top_gen_headers(prog, top, hls);\n  fprintf(hls->top_gen_c, \" {\\n\");\n  p = isl_printer_indent(p, 2);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"FILE *fd = fopen(\\\"\");\n  p = isl_printer_print_str(p, hls->output_dir);\n  p = isl_printer_print_str(p, \"/resource_est/design_info.dat\\\", \\\"w\\\");\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"int fifo_cnt;\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_ctx *ctx = isl_ctx_alloc();\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_printer *p = isl_printer_to_file(ctx, f);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_end_line(p);\n\n  p = print_top_module_headers_intel(p, prog, top, hls); // TODO\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_indent(p, 2);\");\n  p = isl_printer_end_line(p);\n\n  /* Print FIFO declarations */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* FIFO Declaration 
*/\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_end_line(p);\n\n  /* Print the serialize fifos if existing. */\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    struct autosa_array_ref_group *group = module->io_groups[0];\n    if (module->is_serialized) {\n      /* Generate fifo decl counter. */\n      char *fifo_name;\n      int fifo_w;  // bytes\n      fifo_w = module->data_pack_inter * group->array->size;\n      isl_printer *p_str;\n      p_str = isl_printer_to_str(ctx);\n      p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n      p_str = isl_printer_print_str(p_str, \"_\");\n      p_str = isl_printer_print_str(p_str, module->name);\n      p_str = isl_printer_print_str(p_str, \"_serialize\");\n      fifo_name = isl_printer_get_str(p_str);\n      isl_printer_free(p_str);\n\n      p = print_str_new_line(p, \"fifo_cnt = 1;\");\n      p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* \");\n      p = isl_printer_print_str(p, module->name);\n      p = isl_printer_print_str(p, \"_serialize fifo */ \");      \n      p = print_fifo_type_intel(p, group, module->data_pack_inter);\n      p = isl_printer_print_str(p, \" \");\n      p = isl_printer_print_str(p, fifo_name);            \n      p = isl_printer_print_str(p, \"\\\");\");      \n      p = isl_printer_end_line(p);      \n\n      /* Resource pragma */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\" __attribute__((depth(\");\n      p = isl_printer_print_int(p, fifo_depth);\n      p = isl_printer_print_str(p, \")));\\\");\");\n      p = isl_printer_end_line(p);\n\n      p = print_str_new_line(p, \"p = 
isl_printer_end_line(p);\");\n\n      /* fifo:fifo_name:fifo_cnt:fifo_width */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fprintf(fd, \\\"fifo:\");\n      p = isl_printer_print_str(p, fifo_name);\n      p = isl_printer_print_str(p, \":\\%d:\");\n      p = isl_printer_print_int(p, fifo_w);\n      p = isl_printer_print_str(p, \"\\\\n\\\", fifo_cnt);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_end_line(p);      \n      free(fifo_name);\n    }\n  }\n\n  for (int i = 0; i < top->n_fifo_decls; i++)\n  {\n    /* Generate fifo decl counter. */\n    char *fifo_decl_name = top->fifo_decl_names[i];\n    char *fifo_name = extract_fifo_name_from_fifo_decl_name(ctx, fifo_decl_name);\n    char *fifo_w = extract_fifo_width_from_fifo_decl_name(ctx, fifo_decl_name);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"fifo_cnt = 0;\");\n    p = isl_printer_end_line(p);\n\n    /* Print AST */\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_top_module_fifo_stmt, &hw_data);\n\n    p = isl_ast_node_print(top->fifo_decl_wrapped_trees[i],\n                           p, print_options);\n\n    /* fifo:fifo_name:fifo_cnt:fifo_width */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"fprintf(fd, \\\"fifo:\");\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \":\\%d:\");\n    p = isl_printer_print_str(p, fifo_w);\n    p = isl_printer_print_str(p, \"\\\\n\\\", fifo_cnt);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_end_line(p);\n\n    free(fifo_name);\n    free(fifo_w);\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, 
\\\"/* FIFO Declaration */\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n  p = isl_printer_end_line(p);\n\n  int n_module_names = 0;\n  char **module_names = NULL;\n  for (int i = 0; i < top->n_hw_modules; i++)\n  {\n    /* Generate module call counter. */\n    struct autosa_hw_module *module = top->hw_modules[i];\n    char *module_name;\n\n    if (module->is_filter && module->is_buffer)\n    {\n      module_name = concat(ctx, module->name, \"intra_trans\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n\n      module_name = concat(ctx, module->name, \"inter_trans\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n\n      if (module->boundary)\n      {\n        module_name = concat(ctx, module->name, \"inter_trans_boundary\");\n\n        n_module_names++;\n        module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n        module_names[n_module_names - 1] = module_name;\n      }\n    }\n\n    module_name = strdup(module->name);\n\n    n_module_names++;\n    module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n    module_names[n_module_names - 1] = module_name;\n\n    if (module->boundary)\n    {\n      module_name = concat(ctx, module->name, \"boundary\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n    }\n\n    if (module->n_pe_dummy_modules > 0)\n    {\n      for (int j = 0; j < module->n_pe_dummy_modules; 
j++)\n      {\n        struct autosa_pe_dummy_module *dummy_module = module->pe_dummy_modules[j];\n        struct autosa_array_ref_group *group = dummy_module->io_group;\n        isl_printer *p_str = isl_printer_to_str(ctx);\n        p_str = autosa_array_ref_group_print_prefix(group, p_str);\n        p_str = isl_printer_print_str(p_str, \"_PE_dummy\");\n        p_str = isl_printer_print_str(p_str, dummy_module->in? \"_in\" : \"_out\");\n        module_name = isl_printer_get_str(p_str);\n        isl_printer_free(p_str);\n\n        n_module_names++;\n        module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n        module_names[n_module_names - 1] = module_name;\n      }\n    }\n\n    if (module->is_serialized) { \n      if (module->boundary)      \n        module_name = concat(ctx, module->name, \"boundary_serialize\");\n      else\n        module_name = concat(ctx, module->name, \"serialize\");\n      \n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n    }\n  }\n  for (int i = 0; i < n_module_names; i++)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \"_cnt = 0;\");\n    p = isl_printer_end_line(p);\n  }\n\n  /* Print module calls. */\n  for (int i = 0; i < top->n_module_calls; i++)\n  {\n    /* Print AST */\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_top_module_call_stmt, &hw_data);\n\n    p = isl_ast_node_print(top->module_call_wrapped_trees[i],\n                           p, print_options);\n  }\n\n  /* module:module_name:module_cnt. 
*/\n  for (int i = 0; i < n_module_names; i++)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"fprintf(fd, \\\"module:\");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \":\\%d\\\\n\\\", \");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \"_cnt);\");\n    p = isl_printer_end_line(p);\n  }\n  p = isl_printer_end_line(p);\n\n  for (int i = 0; i < n_module_names; i++)\n  {\n    free(module_names[i]);\n  }\n  free(module_names);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_indent(p, -2);\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"}\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  if (hls->target == XILINX_HW)\n  {\n    if (!hls->hls)\n    {\n      p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n      p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"}\\\");\");\n      p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n    }\n  }\n\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"fclose(fd);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_printer_free(p);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_ctx_free(ctx);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, -2);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"}\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_end_line(p);\n\n  /* For internal testing only. 
*/\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"int main()\");\n  p = isl_printer_end_line(p);\n\n  p = ppcg_start_block(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"FILE *f = fopen(\\\"\");\n  p = isl_printer_print_str(p, hls->output_dir);\n  p = isl_printer_print_str(p, \"/src/top.cpp\\\", \\\"w\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"top_generate(f);\");\n  p = isl_printer_end_line(p);\n\n  p = ppcg_end_block(p);\n  p = isl_printer_free(p);\n\n  return;\n}\n\n/* Check whether all autorun modules are legal to be used as autorun kernels.\n * Specifically, for Intel OpenCL, we check that each non-external module\n * (i.e., a module that is not connected to the external memory) only takes\n * indices and fifos as its arguments.\n */\nstatic int is_autorun_legal(struct autosa_prog *prog,\n                            struct autosa_hw_module **modules, int n_modules)\n{\n  for (int i = 0; i < n_modules; i++)\n  {\n    struct autosa_hw_module *module = modules[i];\n    if (module->to_mem)\n      continue;\n\n    isl_space *space;\n    int nparam, n;\n\n    /* param */\n    space = isl_union_set_get_space(module->kernel->arrays);\n    nparam = isl_space_dim(space, isl_dim_param);\n    isl_space_free(space);\n    if (nparam > 0)\n      return 0;\n    /* host iter */\n    n = isl_space_dim(module->space, isl_dim_set);\n    if (n > 0)\n      return 0;\n    /* scalar */\n    if (module->type == PE_MODULE)\n    {\n      for (int j = 0; j < prog->n_array; j++)\n      {\n        int required;\n        required = autosa_kernel_requires_array_argument(module->kernel, j);\n        if (required)\n        {\n          if (autosa_array_is_read_only_scalar(&prog->array[j]))\n            return 0;\n        }\n      }\n    }\n  }\n\n  return 1;\n}\n\n/* Given an autosa_prog \"prog\" and the corresponding transformed AST\n * \"tree\", print the entire OpenCL/HLS code to \"p\".\n * \"types\" collects the types for which a definition has already been\n * printed.\n */\nstatic __isl_give isl_printer *print_hw(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, __isl_keep isl_ast_node *tree,\n    struct autosa_hw_module **modules, int n_modules,\n    struct autosa_hw_top_module *top_module,\n    struct autosa_drain_merge_func **drain_merge_funcs, int n_drain_merge_funcs,\n    struct autosa_types *types, void *user)\n{\n  struct hls_info *hls = (struct hls_info *)user;\n  isl_printer *kernel;\n  int legal;\n\n  kernel = isl_printer_to_file(isl_printer_get_ctx(p), hls->kernel_c);\n  kernel = isl_printer_set_output_format(kernel, ISL_FORMAT_C);\n  kernel = autosa_print_types(kernel, types, prog);\n  isl_printer_free(kernel);\n\n  if (!kernel)\n    return isl_printer_free(p);\n\n  /* Check whether the autorun kernels are legal. */\n  legal = is_autorun_legal(prog, modules, n_modules);\n  if (!legal)\n  {\n    printf(\"[AutoSA] Error: Autorun kernels not legal! Abort the code generation.\\n\");\n    return p;\n  }\n\n  /* Print the OpenCL host and kernel functions. */\n  p = autosa_print_host_code(p, prog, tree, modules, n_modules, top_module,\n                             drain_merge_funcs, n_drain_merge_funcs, hls);\n  /* Print the separate top module code generation function. */\n  print_top_gen_host_code(prog, tree, top_module, hls);\n\n  return p;\n}\n\n/* Generate a systolic array for Intel FPGAs.\n */\nint generate_autosa_intel_opencl(isl_ctx *ctx, struct ppcg_options *options,\n                                 const char *input)\n{\n  struct hls_info hls;\n  int r;\n\n  hls.target = INTEL_HW;\n  hls.hls = 0;\n  hls.ctx = ctx;\n  hls.output_dir = options->autosa->output_dir;\n  hls.hcl = options->autosa->hcl;\n  opencl_open_files(&hls, input);\n\n  r = generate_sa(ctx, input, hls.host_c, options, &print_hw, &hls);\n\n  opencl_close_files(&hls);\n\n  return r;\n}\n"
  },
  {
    "path": "src/autosa_intel_opencl.h",
    "content": "#ifndef _AUTOSA_INTEL_OPENCL_H\n#define _AUTOSA_INTEL_OPENCL_H\n\n#include <pet.h>\n#include \"ppcg_options.h\"\n#include \"ppcg.h\"\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\nint generate_autosa_intel_opencl(isl_ctx *ctx, struct ppcg_options *options,\n\tconst char *input);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif"
  },
  {
    "path": "src/autosa_print.cpp",
    "content": "/* Helper functions in codegen */\n#include <assert.h>\n#include <cmath>\n\n#include \"autosa_print.h\"\n#include \"autosa_utils.h\"\n#include \"autosa_comm.h\"\n#include \"print.h\"\n\nconst char *vector_index[] = {\"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\",\n                              \"8\", \"9\", \"a\", \"b\", \"c\", \"d\", \"e\", \"f\"};\n\nenum IO_TRANS_DIR {GLOBAL_BUF, LOCAL_BUF, FIFO};\n\n/* Print the call of an array argument.\n */\n__isl_give isl_printer *autosa_array_info_print_call_argument(\n  __isl_take isl_printer *p, struct autosa_array_info *array, int n_ref, const char *prefix)\n{\n  if (autosa_array_is_read_only_scalar(array))\n    return isl_printer_print_str(p, array->name);\n\n  if (strlen(prefix) > 0) {\n    p = isl_printer_print_str(p, prefix);\n    p = isl_printer_print_str(p, \"_\");\n  }  \n  p = isl_printer_print_str(p, array->name);\n  if (n_ref >= 0)\n  {    \n    //auto ref_port_map = array->local_array->group_ref_mem_port_map.at(n_ref);\n    p = isl_printer_print_str(p, \"[\");\n    //p = isl_printer_print_int(p, ref_port_map.second);    \n    p = isl_printer_print_int(p, array->local_array->group_ref_mem_port_map.at(n_ref * 2 + 1));\n    p = isl_printer_print_str(p, \"]\");\n  }\n\n  return p;\n}\n\n/* Print the array group name prefix.\n * [array_name]_[group_id](optional)_[drain](optional)\n */\n__isl_give isl_printer *autosa_array_ref_group_print_prefix(\n    struct autosa_array_ref_group *group, __isl_take isl_printer *p)\n{\n  p = isl_printer_print_str(p, group->array->name);\n  if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_drain\");\n  }\n  else\n  {\n    if (group->group_type == AUTOSA_IO_GROUP && group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n    else if (group->group_type == AUTOSA_PE_GROUP && group->local_array->n_pe_group > 1)\n    {\n      p = 
isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  }\n\n  return p;\n}\n\n/* Print the name of the local copy of a given group of array references.\n */\n__isl_give isl_printer *autosa_array_ref_group_print_fifo_name(\n    struct autosa_array_ref_group *group, __isl_take isl_printer *p)\n{\n  int global = 0;\n  enum autosa_group_access_type type;\n\n  if (group->group_type == AUTOSA_PE_GROUP)\n    return p;\n  \n  p = isl_printer_print_str(p, \"fifo_\");\n  p = isl_printer_print_str(p, group->array->name);\n  if (group->group_type == AUTOSA_IO_GROUP) {\n    if (group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  } else if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_drain\");\n  }\n\n  return p;\n}\n\n/* Was the definition of \"type\" printed before?\n * That is, does its name appear in the list of printed types \"types\"?\n */\nstatic int already_printed(struct autosa_types *types,\n                           struct pet_type *type)\n{\n  int i;\n\n  for (i = 0; i < types->n; ++i)\n    if (!strcmp(types->name[i], type->name))\n      return 1;\n\n  return 0;\n}\n\n/* Print the definitions of all types prog->scop that have not been\n * printed before (according to \"types\") on \"p\".\n * Extend the list of printed types \"types\" with the newly printed types.\n */\n__isl_give isl_printer *autosa_print_types(__isl_take isl_printer *p,\n                                           struct autosa_types *types, struct autosa_prog *prog)\n{\n  int i, n;\n  isl_ctx *ctx;\n  char **name;\n\n  n = prog->scop->pet->n_type;\n\n  if (n == 0)\n    return p;\n\n  ctx = isl_printer_get_ctx(p);\n  name = isl_realloc_array(ctx, types->name, char *, types->n + n);\n  if (!name)\n    return isl_printer_free(p);\n  types->name = name;\n\n  for (i = 0; i < n; ++i)\n  {\n    struct pet_type *type = 
prog->scop->pet->types[i];\n\n    if (already_printed(types, type))\n      continue;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, type->definition);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n\n    types->name[types->n++] = strdup(type->name);\n  }\n\n  return p;\n}\n\n/* Print declarations to \"p\" for arrays that are local to \"prog\"\n * but that are used on the host and therefore require a declaration.\n */\n__isl_give isl_printer *autosa_print_local_declarations(\n    __isl_take isl_printer *p, struct autosa_prog *prog)\n{\n  int i;\n\n  if (!prog)\n    return isl_printer_free(p);\n\n  for (i = 0; i < prog->n_array; ++i)\n  {\n    struct autosa_array_info *array = &prog->array[i];\n    isl_ast_expr *size;\n\n    if (!array->declare_local)\n      continue;\n    size = array->declared_size;\n    p = ppcg_print_declaration_with_size(p, array->type, size);\n  }\n\n  return p;\n}\n\n__isl_give isl_printer *print_str_new_line(__isl_take isl_printer *p, const char *str)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, str);\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print an expression for the size of \"array\" in data items.\n */\n__isl_give isl_printer *autosa_array_info_print_data_size(\n    __isl_take isl_printer *p, struct autosa_array_info *array)\n{\n  int i;\n  int first = 1;\n\n  for (i = 0; i < array->n_index; ++i)\n  {\n    if (!first)\n      p = isl_printer_print_str(p, \" * \");\n\n    isl_ast_expr *bound;\n\n    p = isl_printer_print_str(p, \"(\");\n    bound = isl_ast_expr_get_op_arg(array->bound_expr, 1 + i);\n    p = isl_printer_print_ast_expr(p, bound);\n    isl_ast_expr_free(bound);\n    p = isl_printer_print_str(p, \")\");\n    first = 0;\n  }\n\n  if (array->local_array->is_sparse) {\n    p = isl_printer_print_str(p, \" / \");\n    p = isl_printer_print_double(p, (double)array->local_array->eff_compress_ratio);\n  }\n\n  return p;\n}\n\n/* Print an 
expression for the size of \"array\" in bytes.\n */\n__isl_give isl_printer *autosa_array_info_print_size(\n    __isl_take isl_printer *p, struct autosa_array_info *array)\n{\n  int i;\n\n  for (i = 0; i < array->n_index; ++i)\n  {\n    isl_ast_expr *bound;\n\n    p = isl_printer_print_str(p, \"(\");\n    bound = isl_ast_expr_get_op_arg(array->bound_expr, 1 + i);\n    p = isl_printer_print_ast_expr(p, bound);\n    isl_ast_expr_free(bound);\n    p = isl_printer_print_str(p, \") * \");\n  }\n  p = isl_printer_print_str(p, \"sizeof(\");\n  p = isl_printer_print_str(p, array->type);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\n/* Print an expression for the size of \"array\" in bytes.\n */\n__isl_give isl_printer *autosa_array_info_print_serialize_data_size(\n    __isl_take isl_printer *p, struct autosa_array_info *array)\n{  \n  p = isl_printer_print_pw_qpolynomial(p, array->local_array->serialize_bound);\n  if (array->local_array->is_sparse) {\n    p = isl_printer_print_str(p, \" / \");\n    p = isl_printer_print_double(p, (double)array->local_array->eff_compress_ratio);\n  }\n\n  return p;\n}\n\n/* Print an expression for the size of \"array\" in bytes.\n */\n__isl_give isl_printer *autosa_array_info_print_serialize_size(\n    __isl_take isl_printer *p, struct autosa_array_info *array)\n{\n  p = isl_printer_print_str(p, \"(\");\n  p = isl_printer_print_pw_qpolynomial(p, array->local_array->serialize_bound);\n  if (array->local_array->is_sparse) {\n    p = isl_printer_print_str(p, \" / \");\n    p = isl_printer_print_double(p, (double)array->local_array->eff_compress_ratio);\n  }\n  p = isl_printer_print_str(p, \") * \");\n  p = isl_printer_print_str(p, \"sizeof(\");\n  p = isl_printer_print_str(p, array->type);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\n__isl_give isl_printer *autosa_print_array_type(__isl_take isl_printer *p,\n                                                struct autosa_array_info *array)\n{\n  int n_lane = 
array->n_lane;\n  if (n_lane == 1)\n    p = isl_printer_print_str(p, array->type);\n  else\n  {\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, n_lane);\n  }\n\n  return p;\n}\n\n__isl_give isl_printer *autosa_print_array_type_with_lane(\n  __isl_take isl_printer *p,\n  struct autosa_array_info *array, int n_lane)\n{\n  //if (n_lane == 1)\n  //  p = isl_printer_print_str(p, array->type);\n  //else {\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, n_lane);\n  //}\n  return p;\n}\n\n__isl_give isl_printer *autosa_print_array_type_with_lane_sparse(\n  __isl_take isl_printer *p,\n  struct autosa_array_info *array, int n_lane)\n{\n  p = isl_printer_print_str(p, array->name);\n  p = isl_printer_print_str(p, \"_s_t\");\n  p = isl_printer_print_int(p, n_lane);\n\n  return p;\n}\n\n__isl_give isl_printer *autosa_kernel_print_domain(__isl_take isl_printer *p,\n                                                   struct autosa_kernel_stmt *stmt)\n{\n  return pet_stmt_print_body(stmt->u.d.stmt->stmt, p, stmt->u.d.ref2expr);\n}\n\n/* Print the declaration of a non-linearized array argument.\n */\nstatic __isl_give isl_printer *print_non_linearized_declaration_argument(\n    __isl_take isl_printer *p, struct autosa_array_info *array, int n_lane)\n{\n  if (n_lane == 1)\n  {\n    p = isl_printer_print_str(p, array->type);\n    p = isl_printer_print_str(p, \" \");\n\n    p = isl_printer_print_ast_expr(p, array->bound_expr);\n  }\n  else\n  {\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, n_lane);\n    p = isl_printer_print_str(p, \" \");\n\n    p = isl_printer_print_ast_expr(p, array->bound_expr);\n  }\n\n  return p;\n}\n\n/* Print the declaration of an array argument.\n * \"memory_space\" allows to specify a memory space prefix.\n */\n__isl_give 
isl_printer *autosa_array_info_print_declaration_argument(\n    __isl_take isl_printer *p, struct autosa_array_info *array, int n_lane,\n    const char *memory_space, int n_ref, char *mem_port_map, enum platform target)\n{\n  int mem_port = -1;\n  if (mem_port_map) {\n    /* This is only used for Intel HBM: each array is assigned to the HBM channel given in the memory port map. */\n    isl_union_map *umap;\n\n    umap = extract_sizes_from_str(isl_printer_get_ctx(p), mem_port_map);\n    mem_port = read_mem_port_map(umap, array->name);\n    isl_union_map_free(umap);\n  }\n\n  if (autosa_array_is_read_only_scalar(array))\n  {\n    p = isl_printer_print_str(p, array->type);\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, array->name);\n    return p;\n  }\n\n  if (memory_space)\n  {\n    p = isl_printer_print_str(p, memory_space);\n    p = isl_printer_print_str(p, \" \");\n  }\n  if (mem_port != -1) {\n    p = isl_printer_print_str(p, \"__attribute__((buffer_location(\\\"HBM\");\n    p = isl_printer_print_int(p, mem_port);\n    p = isl_printer_print_str(p, \"\\\"))) \");\n  }\n\n  if (array->n_index != 0 && !array->linearize)\n    return print_non_linearized_declaration_argument(p, array, n_lane);\n\n  if (target == TAPA_HW) {\n    if (array->copy_in) {\n      if (array->copy_out)\n        p = isl_printer_print_str(p, \"tapa::read_write_mmap<\");\n      else\n        p = isl_printer_print_str(p, \"tapa::read_only_mmap<\");\n    } else if (array->copy_out)\n      p = isl_printer_print_str(p, \"tapa::write_only_mmap<\");\n    else\n      p = isl_printer_print_str(p, \"tapa::placeholder_mmap<\");\n  }\n\n  //if (n_lane == 1)\n  //  p = isl_printer_print_str(p, array->type);\n  //else\n  //{\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, n_lane);\n  //}\n  if (target == TAPA_HW)\n    p = isl_printer_print_str(p, \">\");\n  p = isl_printer_print_str(p, \" \");\n  if (target != 
CATAPULT_HW && target != TAPA_HW)\n    p = isl_printer_print_str(p, \"*\");\n  if (target == INTEL_HW)\n    p = isl_printer_print_str(p, \"restrict \");\n\n  p = isl_printer_print_str(p, array->name);\n  if (n_ref >= 0)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_int(p, n_ref);\n  }\n  if (target == CATAPULT_HW) {\n    if (array->local_array->host_serialize) {\n      p = isl_printer_print_str(p, \"[\");\n      p = isl_printer_print_pw_qpolynomial(p, array->local_array->serialize_bound);\n      p = isl_printer_print_str(p, \" / \");      \n      p = isl_printer_print_int(p, n_lane);\n      p = isl_printer_print_str(p, \"]\");      \n    } else {\n      throw std::runtime_error(\"[AutoSA] Error: Non-serialized array not supported for Catapult HLS yet.\");\n    }\n  }\n\n  return p;\n}\n\n/* Print the arguments to a kernel declaration or call.  If \"types\" is set,\n * then print a declaration (including the types of the arguments).\n *\n * The arguments are printed in the following order\n * - the arrays accessed by the kernel\n * - the parameters\n * - the host loop iterators\n */\n__isl_give isl_printer *print_kernel_arguments(__isl_take isl_printer *p,\n                                               struct autosa_prog *prog, \n                                               struct autosa_kernel *kernel,\n                                               int types, struct hls_info *hls)\n{\n  int i, n;\n  int first = 1;\n  unsigned nparam;\n  isl_space *space;\n  const char *type;\n  int fifo_depth = prog->scop->options->autosa->fifo_depth;\n\n  /* Arrays */\n  for (i = 0; i < kernel->n_array; ++i)\n  {\n    int required;\n    int n_lane;\n\n    required = autosa_kernel_requires_array_argument(kernel, i);\n    if (required < 0)\n      return isl_printer_free(p);\n    if (!required)\n      continue;\n\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    n_lane = local_array->n_lane;\n    if (hls->target == INTEL_HW 
||\n        hls->target == CATAPULT_HW ||\n        (hls->target == TAPA_HW && local_array->n_io_group_refs == 1) ||\n        (hls->target == XILINX_HW && local_array->n_io_group_refs == 1))\n    {\n      if (!first)\n        p = isl_printer_print_str(p, \", \");\n\n      if (types) {\n        if (prog->scop->options->autosa->axi_stream) {\n          p = autosa_fifo_print_declaration_arguments(p, local_array->io_groups[0], n_lane, NULL, hls->target, fifo_depth, NULL);\n        } else {\n          p = autosa_array_info_print_declaration_argument(\n                p, local_array->array, n_lane, NULL, -1, NULL, hls->target);\n        }        \n      } else {                \n        if (prog->scop->options->autosa->axi_stream) {\n          p = autosa_array_info_print_call_argument(p,\n                                                    local_array->array, -1, \"fifo\");\n        } else {\n          p = autosa_array_info_print_call_argument(p,\n                                                    local_array->array, 0, \"buffer\");\n          if (hls->target == TAPA_HW) {\n            p = isl_printer_print_str(p, \".vectorized<\");\n            p = isl_printer_print_int(p, n_lane);\n            p = isl_printer_print_str(p, \">()\");\n          }\n        }\n      }\n\n      first = 0;\n    }\n    else\n    {\n      for (int j = 0; j < local_array->n_io_group_refs; j++)\n      {\n        if (!first)\n          p = isl_printer_print_str(p, \", \");\n\n        if (types)\n          p = autosa_array_info_print_declaration_argument(\n                p, local_array->array, n_lane, NULL, j, NULL, hls->target);\n        else\n        {\n          p = autosa_array_info_print_call_argument(p,\n                                                    local_array->array, j, \"buffer\");\n          if (hls->target == TAPA_HW) {\n            p = isl_printer_print_str(p, \".vectorized<\");\n            p = isl_printer_print_int(p, n_lane);\n            p = isl_printer_print_str(p, 
\">()\");\n          }\n        }\n\n        first = 0;\n      }\n    }\n  }\n\n  /* Parameters */\n  space = isl_union_set_get_space(kernel->arrays);\n  nparam = isl_space_dim(space, isl_dim_param);\n  for (i = 0; i < nparam; ++i)\n  {\n    const char *name;\n\n    name = isl_space_get_dim_name(space, isl_dim_param, i);\n\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    if (types)\n      p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, name);\n\n    first = 0;\n  }\n  isl_space_free(space);\n\n  /* Host loop iterators */\n  n = isl_space_dim(kernel->space, isl_dim_set);\n  type = isl_options_get_ast_iterator_type(prog->ctx);\n  for (i = 0; i < n; ++i)\n  {\n    const char *name;\n\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    name = isl_space_get_dim_name(kernel->space, isl_dim_set, i);\n    if (types)\n    {\n      p = isl_printer_print_str(p, type);\n      p = isl_printer_print_str(p, \" \");\n    }\n    p = isl_printer_print_str(p, name);\n\n    first = 0;\n  }\n\n  return p;\n}\n\n/* Print the header of the given kernel.\n */\n__isl_give isl_printer *print_kernel_header(\n  __isl_take isl_printer *p, struct autosa_prog *prog, \n  struct autosa_kernel *kernel, struct hls_info *hls, int types)\n{\n  p = isl_printer_start_line(p);\n  if (types)\n    p = isl_printer_print_str(p, \"void \");\n  if (hls->hcl) \n    p = isl_printer_print_str(p, \"autosa_func\");\n  else\n    p = isl_printer_print_str(p, \"kernel0\");\n  //p = isl_printer_print_int(p, kernel->id);\n  p = isl_printer_print_str(p, \"(\");\n  p = print_kernel_arguments(p, prog, kernel, types, hls);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\n/* This function is called for each node in an AutoSA AST.\n * In case of a user node, print the macro definitions required\n * for printing the AST expressions in the annotation, if any.\n * For other nodes, return true such that descendants are also\n * visited.\n *\n * In particular, 
for a kernel launch, print the macro definitions\n * needed for the grid size.\n * For a copy statement, print the macro definitions needed\n * for the two index expressions.\n * For an original user statement, print the macro definitions\n * needed for the substitutions.\n */\nstatic isl_bool at_node(__isl_keep isl_ast_node *node, void *user)\n{\n  const char *name;\n  isl_id *id;\n  int is_kernel;\n  struct autosa_kernel *kernel;\n  struct autosa_kernel_stmt *stmt;\n  isl_printer **p = (isl_printer **)user;\n\n  if (isl_ast_node_get_type(node) != isl_ast_node_user)\n    return isl_bool_true;\n\n  id = isl_ast_node_get_annotation(node);\n  if (!id)\n    return isl_bool_false;\n\n  name = isl_id_get_name(id);\n  if (!name)\n    return isl_bool_error;\n  is_kernel = !strcmp(name, \"kernel\");\n  kernel = is_kernel ? (struct autosa_kernel *)isl_id_get_user(id) : NULL;\n  stmt = is_kernel ? NULL : (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  if ((is_kernel && !kernel) || (!is_kernel && !stmt))\n    return isl_bool_error;\n\n  if (is_kernel)\n  {\n    *p = ppcg_ast_expr_print_macros(kernel->grid_size_expr, *p);\n  }\n  else if (stmt->type == AUTOSA_KERNEL_STMT_COPY)\n  {\n    *p = ppcg_ast_expr_print_macros(stmt->u.c.index, *p);\n    *p = ppcg_ast_expr_print_macros(stmt->u.c.local_index, *p);\n  }\n  else if (stmt->type == AUTOSA_KERNEL_STMT_DOMAIN)\n  {\n    *p = ppcg_print_body_macros(*p, stmt->u.d.ref2expr);\n  }\n  if (!*p)\n    return isl_bool_error;\n\n  return isl_bool_false;\n}\n\nstatic void print_indent(FILE *dst, int indent)\n{\n  fprintf(dst, \"%*s\", indent, \"\");\n}\n\n/* Print a declaration of iterators of type \"type\" with names \"ids\" to \"p\".\n * Each iterator is initialized to the corresponding instance identifier in \"dims\".\n * The \"out\" argument is currently unused.\n */\nstatic __isl_give isl_printer *print_iterators(\n  __isl_take isl_printer *p, \n  FILE *out, const char *type,\n  __isl_keep isl_id_list *ids, const char *dims[])\n{\n  int i, n;\n\n  n = 
isl_id_list_n_id(ids);\n  if (n <= 0)\n    return p;\n  //print_indent(out, 2);\n  //fprintf(out, \"%s \", type);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, type);\n  p = isl_printer_print_str(p, \" \");\n  for (i = 0; i < n; ++i)\n  {\n    isl_id *id;\n\n    if (i)\n      p = isl_printer_print_str(p, \", \");\n      //fprintf(out, \", \");\n    id = isl_id_list_get_id(ids, i);\n    //fprintf(out, \"%s = %s\", isl_id_get_name(id),\n    //        dims[i]);\n    p = isl_printer_print_str(p, isl_id_get_name(id));\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_str(p, dims[i]);\n    isl_id_free(id);\n  }\n  //fprintf(out, \"; // module id\\n\");\n  p = isl_printer_print_str(p, \"; // module id\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the required macros for the AutoSA AST \"node\" to \"p\",\n * including those needed for the user statements inside the AST.\n */\n__isl_give isl_printer *autosa_print_macros(__isl_take isl_printer *p,\n                                            __isl_keep isl_ast_node *node)\n{\n  if (isl_ast_node_foreach_descendant_top_down(node, &at_node, &p) < 0)\n    return isl_printer_free(p);\n  p = ppcg_print_macros(p, node);\n  return p;\n}\n\n__isl_give isl_printer *print_module_iterators(\n  __isl_take isl_printer *p, FILE *out, struct autosa_hw_module *module)\n{\n  isl_ctx *ctx;\n  const char *type;\n  const char *dims[] = {\"idx\", \"idy\", \"idz\"};\n\n  ctx = isl_ast_node_get_ctx(module->tree);\n  type = isl_options_get_ast_iterator_type(ctx);\n  p = print_iterators(p, out, type, module->inst_ids, dims);\n\n  return p;\n}\n\n__isl_give isl_printer *print_func_iterators(\n  __isl_take isl_printer *p,\n  FILE *out, struct autosa_drain_merge_func *func)\n{\n  isl_ctx *ctx;\n  const char *type;\n  const char *dims[] = {\"idx\", \"idy\", \"idz\"};\n\n  ctx = isl_ast_node_get_ctx(func->tree);\n  type = isl_options_get_ast_iterator_type(ctx);\n  p = print_iterators(p, out, 
type, func->inst_ids, dims);\n  return p;\n}\n\n__isl_give isl_printer *print_serialize_counter(\n  __isl_take isl_printer *p, struct autosa_hw_module *module)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"unsigned int \");\n  p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n  p = isl_printer_print_str(p, \"_cnt = 0;\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the arguments to a host serialization function declaration or call.\n * If \"types\" is set, then print a declaration (including the types of the arguments).\n * \n * The arguments are printed in the following order:\n * - the module identifiers\n * - the parameters\n * - the host loop iterators\n * - the input array accessed by the module (before serialization/deserialization)\n * - the output array accessed by the module (after serialization/deserialization)\n */\n__isl_give isl_printer *print_host_serialize_arguments(\n  __isl_take isl_printer *p,\n  struct autosa_kernel *kernel,\n  struct autosa_array_ref_group *group,\n  struct autosa_hw_module *module,\n  int types,\n  int hls)\n{\n  int first = 1;\n  int nparam;\n  int n;\n  isl_space *space;\n  const char *type;\n  struct autosa_local_array_info *local_array;\n\n  type = isl_options_get_ast_iterator_type(kernel->ctx);\n  /* module identifiers */\n  const char *dims[] = {\"idx\", \"idy\", \"idz\"};\n  n = isl_id_list_n_id(module->inst_ids);\n  for (int i = 0; i < n; ++i)\n  {\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    if (types)\n    {\n      p = isl_printer_print_str(p, type);\n      p = isl_printer_print_str(p, \" \");\n    }\n    p = isl_printer_print_str(p, dims[i]);\n\n    first = 0;\n  }\n\n  /* params */\n  space = isl_union_set_get_space(kernel->arrays);\n  nparam = isl_space_dim(space, isl_dim_param);\n  for (int i = 0; i < nparam; ++i)\n  {\n    const char *name;\n\n    name = isl_space_get_dim_name(space, isl_dim_param, i);\n\n    if (!first)\n      p = 
isl_printer_print_str(p, \", \");\n    if (types)\n      p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, name);\n\n    first = 0;\n  }\n  isl_space_free(space);\n\n  /* Host iters */\n  n = isl_space_dim(kernel->space, isl_dim_set);\n  for (int i = 0; i < n; ++i)\n  {\n    const char *name;\n\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    name = isl_space_get_dim_name(kernel->space, isl_dim_set, i);\n    if (types)\n    {\n      p = isl_printer_print_str(p, type);\n      p = isl_printer_print_str(p, \" \");\n    }\n    p = isl_printer_print_str(p, name);\n\n    first = 0;\n  }\n\n  /* Arrays */\n  local_array = group->local_array;\n  if (!first)\n    p = isl_printer_print_str(p, \", \");\n  if (types)    \n  {\n    if (hls)\n    {\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *\");\n    }\n    else \n    {\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">> &\");\n    }\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_to\");\n  }\n  else \n  {    \n    p = isl_printer_print_str(p, \"dev_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    if (!module->in) {\n      p = isl_printer_print_str(p, \"_unserialized\");\n    }    \n  }\n  first = 0;\n\n  if (!first)\n    p = isl_printer_print_str(p, \", \");\n  if (types)\n  {\n    if (hls)\n    {\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *\");\n    }\n    else\n    {\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", aligned_allocator<\");\n      p = 
isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">> &\");\n    }\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_from\");\n  }\n  else\n  {    \n    p = isl_printer_print_str(p, \"dev_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    if (module->in) {\n      p = isl_printer_print_str(p, \"_unserialized\");\n    }    \n  }\n  first = 0;\n\n  return p;  \n}\n\n/* Print out\n * \"hls::stream<[type]>\"\n */\n__isl_give isl_printer *print_fifo_type_xilinx(__isl_take isl_printer *p,\n                                               struct autosa_array_ref_group *group, int n_lane)\n{\n  struct autosa_array_info *array = group->array;\n\n  p = isl_printer_print_str(p, \"hls::stream<\");\n  if (group->local_array->is_sparse) {\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"_s_t\");\n    p = isl_printer_print_int(p, n_lane);\n  } else {\n    if (n_lane == 1) {\n      p = isl_printer_print_str(p, group->array->type);\n    } else {    \n      p = isl_printer_print_str(p, array->name);    \n      p = isl_printer_print_str(p, \"_t\");\n      p = isl_printer_print_int(p, n_lane);\n    }\n  }\n  p = isl_printer_print_str(p, \">\");\n\n  return p;\n}\n\n/* Print out\n * \"ac_channel<[type]>\"\n */\n__isl_give isl_printer *print_fifo_type_catapult(__isl_take isl_printer *p,\n                                                 struct autosa_array_ref_group *group, int n_lane)\n{\n  struct autosa_array_info *array = group->array;\n\n  p = isl_printer_print_str(p, \"ac_channel<\");\n  if (group->local_array->is_sparse) {\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"_s_t\");\n    p = isl_printer_print_int(p, n_lane);\n  } else {\n    //if (n_lane == 1) {\n    //  p = isl_printer_print_str(p, group->array->type);\n    //} else {    \n      p = isl_printer_print_str(p, array->name);    \n      p = 
isl_printer_print_str(p, \"_t\");\n      p = isl_printer_print_int(p, n_lane);\n    //}\n  }\n  p = isl_printer_print_str(p, \">\");\n\n  return p;\n}\n\n/* Print out\n * \"channel [type]\"\n */\n__isl_give isl_printer *print_fifo_type_intel(__isl_take isl_printer *p,\n                                              struct autosa_array_ref_group *group, int n_lane)\n{\n  p = isl_printer_print_str(p, \"channel \");\n  if (n_lane == 1)\n    p = isl_printer_print_str(p, group->array->type);\n  else\n  {\n    p = isl_printer_print_str(p, group->array->name);\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, n_lane);\n  }\n\n  return p;\n}\n\n/* Print out\n * \"tapa::stream<[type], [depth]>\" if \"direction\" is NULL, or\n * \"tapa::[i/o]stream<[type]>\" otherwise.\n */\n__isl_give isl_printer *print_fifo_type_tapa(__isl_take isl_printer *p,\n                                             struct autosa_array_ref_group *group,\n                                             int n_lane, int fifo_depth, const char *direction)\n{\n  struct autosa_array_info *array = group->array;\n\n  p = isl_printer_print_str(p, \"tapa::\");\n  if (direction) {\n    p = isl_printer_print_str(p, direction);\n  }\n  p = isl_printer_print_str(p, \"stream<\");\n  if (group->local_array->is_sparse) {\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"_s_t\");\n    p = isl_printer_print_int(p, n_lane);\n  } else {\n    if (n_lane == 1) {\n      p = isl_printer_print_str(p, group->array->type);\n    } else {\n      p = isl_printer_print_str(p, array->name);\n      p = isl_printer_print_str(p, \"_t\");\n      p = isl_printer_print_int(p, n_lane);\n    }\n  }\n  if (!direction) {\n    p = isl_printer_print_str(p, \", \");\n    p = isl_printer_print_int(p, fifo_depth);\n  }\n  p = isl_printer_print_str(p, \">\");\n\n  return p;\n}\n\n\n/* Print the declaration of the FIFO argument of \"group\" for \"target\",\n * i.e., the FIFO type followed by the FIFO name and an optional \"_[suffix]\".
\n */\n__isl_give isl_printer *autosa_fifo_print_declaration_arguments(\n    __isl_take isl_printer *p, struct autosa_array_ref_group *group, int n_lane,\n    const char *suffix, enum platform target, int fifo_depth, const char *direction)\n{\n  if (target == XILINX_HW)\n  {\n    p = print_fifo_type_xilinx(p, group, n_lane);\n    p = isl_printer_print_str(p, \" &\");\n  } else if (target == TAPA_HW)\n  {\n    p = print_fifo_type_tapa(p, group, n_lane, fifo_depth, direction);\n    p = isl_printer_print_str(p, \" &\");\n  } else if (target == INTEL_HW)\n  {\n    p = print_fifo_type_intel(p, group, n_lane);\n    p = isl_printer_print_str(p, \" \");\n  } else if (target == CATAPULT_HW) \n  {\n    p = print_fifo_type_catapult(p, group, n_lane);\n    p = isl_printer_print_str(p, \" &\");\n  }\n  p = autosa_array_ref_group_print_fifo_name(group, p);\n  if (suffix)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, suffix);\n  }\n\n  return p;\n}\n\n__isl_give isl_printer *autosa_fifo_print_call_argument(\n    __isl_take isl_printer *p, struct autosa_array_ref_group *group,\n    const char *suffix, enum platform target)\n{\n  p = autosa_array_ref_group_print_fifo_name(group, p);\n  if (suffix)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, suffix);\n  }\n\n  return p;\n}\n\n/* Print the call of an array argument in the module.\n */\n__isl_give isl_printer *autosa_module_array_info_print_call_argument(\n  __isl_take isl_printer *p, struct autosa_array_info *array)\n{\n  if (autosa_array_is_read_only_scalar(array))\n    return isl_printer_print_str(p, array->name);\n\n  p = isl_printer_print_str(p, array->name);\n\n  return p;\n}\n\n/* Print the variable initialization. 
*/\n__isl_give isl_printer *autosa_print_var_initialization(\n  __isl_take isl_printer *p, struct autosa_kernel_var *var,\n  enum platform target)\n{  \n  for (int i = 0; i < isl_vec_size(var->size); ++i) {\n    isl_val *extent;\n\n    if (target == CATAPULT_HW)\n      p = print_str_new_line(p, \"// hls_pipeline\");    \n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int c\");\n    p = isl_printer_print_int(p, i);\n    p = isl_printer_print_str(p, \" = 0; c\");\n    p = isl_printer_print_int(p, i);\n    p = isl_printer_print_str(p, \" < \");\n    extent = isl_vec_get_element_val(var->size, i);\n    p = isl_printer_print_val(p, extent);\n    isl_val_free(extent);\n    p = isl_printer_print_str(p, \"; c\");\n    p = isl_printer_print_int(p, i);\n    p = isl_printer_print_str(p, \"++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n  }\n  \n  if (target == XILINX_HW || target == TAPA_HW)\n    p = print_str_new_line(p, \"// hls_pipeline\");\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, var->name);\n  for (int i = 0; i < isl_vec_size(var->size); ++i) {\n    p = isl_printer_print_str(p, \"[c\");\n    p = isl_printer_print_int(p, i);\n    p = isl_printer_print_str(p, \"]\");\n  }\n  p = isl_printer_print_str(p, \" = 0;\");\n  p = isl_printer_end_line(p);\n  for (int i = 0; i < isl_vec_size(var->size); ++i) {\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n\n  return p;  \n}\n\n/* Print the arguments to a module declaration or call. 
If \"types\" is set,\n * then print a declaration (including the types of the arguments).\n *\n * The arguments are printed in the following order\n * - the module identifiers\n * - the parameters\n * - the host loop iterators\n * - the arrays accessed by the module\n * - the fifos\n * - the enable signal\n * \n * If module is to_mem with serialize set as 0, we will replace the arrays \n * by a serialize fifo.\n */\n__isl_give isl_printer *print_module_arguments(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog,\n    struct autosa_kernel *kernel,\n    struct autosa_hw_module *module, int types,\n    enum platform target,\n    int inter, int arb, int boundary, int serialize)\n{\n  int first = 1;\n  isl_space *space;\n  int nparam;\n  int n;\n  const char *type;\n  int fifo_depth = prog->scop->options->autosa->fifo_depth;\n\n  type = isl_options_get_ast_iterator_type(prog->ctx);\n  /* Module identifiers */\n  const char *dims[] = {\"idx\", \"idy\", \"idz\"};\n  n = isl_id_list_n_id(module->inst_ids);\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    for (int i = 0; i < n; ++i)\n    {\n      if (!first)\n      {\n        p = isl_printer_print_str(p, \", \");\n        if (!types)\n        {\n          p = isl_printer_end_line(p);\n          p = isl_printer_start_line(p);\n        }\n      }\n      if (types)\n      {\n        p = isl_printer_print_str(p, type);\n        p = isl_printer_print_str(p, \" \");\n      }\n      if (!types)\n      {\n        p = isl_printer_print_str(p, \"/* module id */ \");\n      }\n      p = isl_printer_print_str(p, dims[i]);\n      first = 0;\n    }\n  }\n\n  /* params */\n  space = isl_union_set_get_space(kernel->arrays);\n  nparam = isl_space_dim(space, isl_dim_param);\n  for (int i = 0; i < nparam; ++i)\n  {\n    const char *name;\n\n    name = isl_space_get_dim_name(space, isl_dim_param, i);\n\n    if (!first)\n    {\n      p = isl_printer_print_str(p, \", \");\n      if (!types)\n      {\n        p = 
isl_printer_end_line(p);\n        p = isl_printer_start_line(p);\n      }\n    }\n    if (types)\n      p = isl_printer_print_str(p, \"int \");\n    if (!types)\n      p = isl_printer_print_str(p, \"/* param */ \");\n    p = isl_printer_print_str(p, name);\n\n    first = 0;\n  }\n  isl_space_free(space);\n\n  /* host iters */\n  if (inter == -1)\n    space = module->space;\n  else if (inter == 0)\n    space = module->intra_space;\n  else if (inter == 1)\n    space = module->inter_space;\n\n  /* Skip printing the host iterators for inter/intra modules for Catapult HLS */\n  if (!(inter >= 0 && target == CATAPULT_HW)) {\n    n = isl_space_dim(space, isl_dim_set);\n    for (int i = 0; i < n; ++i)\n    {\n      const char *name;\n\n      if (!first)\n      {\n        p = isl_printer_print_str(p, \", \");\n        if (!types)\n        {\n          p = isl_printer_end_line(p);\n          p = isl_printer_start_line(p);\n        }\n      }\n      name = isl_space_get_dim_name(space, isl_dim_set, i);\n      if (types)\n      {\n        p = isl_printer_print_str(p, type);\n        p = isl_printer_print_str(p, \" \");\n      }\n      if (!types)\n      {\n        p = isl_printer_print_str(p, \"/* host iter */ \");\n      }\n      p = isl_printer_print_str(p, name);\n      if (module->double_buffer && inter != -1 && !types)\n      {\n        if (module->in && inter == 0)\n        {\n          /* intra trans */\n          p = isl_printer_print_str(p, \"_prev\");\n        }\n        else if (!module->in && inter == 1)\n        {\n          /* inter trans */\n          p = isl_printer_print_str(p, \"_prev\");\n        }\n      }\n\n      first = 0;\n    }\n  }\n\n  /* Arrays */\n  if (module->type != PE_MODULE && module->to_mem)\n  {\n    if (!module->is_serialized || \n       (module->is_serialized && serialize && !prog->scop->options->autosa->axi_stream)) {\n      /* If module satisfies any of the following constraints:\n       * 1. not serialized \n       * 2. 
serialized and not using the AXI stream interface\n       * the I/O module will access the external memory through array pointer. */\n      struct autosa_io_buffer *io_buffer =\n          module->io_groups[0]->io_buffers[module->io_groups[0]->io_level - 1];      \n      int n_lane = (module->is_serialized)? module->data_pack_serialize : io_buffer->n_lane;\n      if (!first)\n      {\n        p = isl_printer_print_str(p, \", \");\n        if (!types)\n        {\n          p = isl_printer_end_line(p);\n          p = isl_printer_start_line(p);\n        }\n      }\n      if (types)\n      {\n        p = autosa_array_info_print_declaration_argument(\n              p, module->io_groups[0]->array, n_lane,\n              target == INTEL_HW ? \"__global volatile\" : NULL, -1, prog->scop->options->autosa->mem_port_map, target);\n      }\n      else\n      {\n        p = isl_printer_print_str(p, \"/* array */ \");\n        p = autosa_module_array_info_print_call_argument(p,\n                                                         module->io_groups[0]->array);\n      }\n      first = 0;\n    } else if (module->is_serialized && serialize && prog->scop->options->autosa->axi_stream && inter == -1) {\n      /* The module is serialized and using the AXI stream interface,\n       * the I/O module will access the external memory via a stream fifo. */\n      struct autosa_io_buffer *io_buffer =\n          module->io_groups[0]->io_buffers[module->io_groups[0]->io_level - 1];\n      int n_lane = (module->is_serialized)? 
module->data_pack_serialize : io_buffer->n_lane;\n      if (!first) {\n        p = isl_printer_print_str(p, \", \");\n        if (!types) {\n          p = isl_printer_end_line(p);\n          p = isl_printer_start_line(p);\n        }\n      }\n      if (types) {\n        p = autosa_fifo_print_declaration_arguments(p,\n                                                    module->io_groups[0], n_lane, NULL, target, fifo_depth, NULL);\n      } else {\n        p = isl_printer_print_str(p, \"/* fifo */ \");\n        p = autosa_fifo_print_call_argument(p,  \n                                            module->io_groups[0], NULL, target);\n      }\n      first = 0;\n    } else if (inter == -1) {\n      /* The module is serialized and connected to another stream header module,\n       * print a normal FIFO interface here. */\n      int n_lane = module->data_pack_inter;\n      if (!first) {\n        p = isl_printer_print_str(p, \", \");\n        if (!types) {\n          p = isl_printer_end_line(p);\n          p = isl_printer_start_line(p);\n        }\n      }\n      if (types) {\n        //p = autosa_fifo_print_declaration_arguments(p,\n        //                                            module->io_groups[0], n_lane, \"serialize\", target, fifo_depth);\n        p = autosa_fifo_print_declaration_arguments(p,\n                                                    module->io_groups[0], n_lane, (module->in)? \"in\" : \"out\", target, fifo_depth,\n                                                    (module->in)? \"i\" : \"o\");\n      } else {\n        p = isl_printer_print_str(p, \"/* fifo */ \");\n        //p = autosa_fifo_print_call_argument(p,  \n        //                                    module->io_groups[0], \"serialize\", target);\n        p = autosa_fifo_print_call_argument(p,  \n                                            module->io_groups[0], (module->in)? 
\"in\" : \"out\", target);\n      }\n      first = 0;\n    }\n  } else if (module->type == PE_MODULE) {\n    /* Scalars */\n    for (int i = 0; i < prog->n_array; i++)\n    {\n      int required;\n\n      required = autosa_kernel_requires_array_argument(kernel, i);\n      if (required < 0)\n        return isl_printer_free(p);\n      if (!required)\n        continue;\n\n      if (autosa_array_is_read_only_scalar(&prog->array[i]))\n      {\n        if (!first)\n        {\n          p = isl_printer_print_str(p, \", \");\n          if (!types)\n          {\n            p = isl_printer_end_line(p);\n            p = isl_printer_start_line(p);\n          }\n        }\n        if (types)\n          p = autosa_array_info_print_declaration_argument(\n                p, &prog->array[i], 1, NULL, -1, NULL, target);\n        else\n        {\n          p = isl_printer_print_str(p, \"/* scalar */ \");\n          p = autosa_array_info_print_call_argument(p,\n                                                    &prog->array[i], -1, \"buffer\");\n        }\n        first = 0;\n      }\n    }\n  }\n\n  /* Local buffer */\n  if (inter != -1)\n  {\n    for (int i = 0; i < module->n_var; i++)\n    {\n      struct autosa_kernel_var *var;\n\n      var = (struct autosa_kernel_var *)&module->var[i];\n      if (!first)\n      {\n        p = isl_printer_print_str(p, \", \");\n        if (!types)\n        {\n          p = isl_printer_end_line(p);\n          p = isl_printer_start_line(p);\n        }\n      }\n      if (types)\n      {\n        if (target == CATAPULT_HW) {\n          p = isl_printer_print_str(p, \"ac_channel<\");\n          p = isl_printer_print_str(p, module->name);\n          p = isl_printer_print_str(p, \"_\");\n          p = isl_printer_print_str(p, var->name);\n          p = isl_printer_print_str(p, \"> &\");\n          p = isl_printer_print_str(p, var->name);          \n        } else {\n          if (module->data_pack_inter == 1 && 
module->io_groups[0]->local_array->is_sparse == 0) {\n            p = isl_printer_print_str(p, var->array->type);\n          }\n          else {\n            p = isl_printer_print_str(p, var->array->name);\n            if (var->array->local_array->is_sparse)\n              p = isl_printer_print_str(p, \"_s\");\n            p = isl_printer_print_str(p, \"_t\");\n            p = isl_printer_print_int(p, module->data_pack_inter);\n          }\n          p = isl_printer_print_str(p, \" \");\n          p = isl_printer_print_str(p, var->name);\n          for (int j = 0; j < isl_vec_size(var->size); j++) {\n            isl_val *v;\n            p = isl_printer_print_str(p, \"[\");\n            v = isl_vec_get_element_val(var->size, j);\n            p = isl_printer_print_val(p, v);\n            isl_val_free(v);\n            p = isl_printer_print_str(p, \"]\");\n          }\n        }\n      }\n      else\n      {\n        p = isl_printer_print_str(p, \"/* array */ \");\n        if (target == CATAPULT_HW) {\n          p = isl_printer_print_str(p, module->name);\n          p = isl_printer_print_str(p, \"_\");\n          p = isl_printer_print_str(p, var->name);\n          p = isl_printer_print_str(p, \"_inst\");\n        } else {\n          if (!module->double_buffer)\n          {\n            p = isl_printer_print_str(p, var->name);\n          }\n          else\n          {\n            if (arb == 0)\n            {\n              p = isl_printer_print_str(p, var->name);\n              p = isl_printer_print_str(p, inter == 0 ? \"_ping\" : \"_pong\");\n            }\n            else\n            {\n              p = isl_printer_print_str(p, var->name);\n              p = isl_printer_print_str(p, inter == 0 ? 
\"_pong\" : \"_ping\");\n            }\n          }\n        }\n      }\n\n      first = 0;\n    }\n  }\n\n  /* fifos */\n  if (module->type == PE_MODULE)\n  {\n    for (int i = 0; i < module->n_io_group; i++)\n    {\n      struct autosa_array_ref_group *group = module->io_groups[i];\n      //if (!(group->copy_in || group->copy_out))\n      //  continue;\n      int n_lane = get_io_group_n_lane(module, NULL, group);\n      if (module->io_groups[i]->pe_io_dir == IO_IN ||\n          module->io_groups[i]->pe_io_dir == IO_INOUT)\n      {\n        if (!first)\n        {\n          p = isl_printer_print_str(p, \", \");\n          if (!types)\n          {\n            p = isl_printer_end_line(p);\n            p = isl_printer_start_line(p);\n          }\n        }\n        if (types)\n        {\n          p = autosa_fifo_print_declaration_arguments(p,\n                                                      module->io_groups[i], n_lane, \"in\", target, fifo_depth, \"i\");\n        }\n        else\n        {\n          p = isl_printer_print_str(p, \"/* fifo */ \");\n          p = autosa_fifo_print_call_argument(p,\n                                              module->io_groups[i], \"in\", target);\n        }\n        first = 0;\n      }\n      if (module->io_groups[i]->pe_io_dir == IO_OUT ||\n          module->io_groups[i]->pe_io_dir == IO_INOUT)\n      {\n        if (!first)\n        {\n          p = isl_printer_print_str(p, \", \");\n          if (!types)\n          {\n            p = isl_printer_end_line(p);\n            p = isl_printer_start_line(p);\n          }\n        }\n        if (types)\n          p = autosa_fifo_print_declaration_arguments(p,\n                                                      module->io_groups[i], n_lane, \"out\", target, fifo_depth, \"o\");\n        else\n        {\n          p = isl_printer_print_str(p, \"/* fifo */ \");\n          p = autosa_fifo_print_call_argument(p,\n                                              module->io_groups[i], 
\"out\", target);\n        }\n        first = 0;\n      }\n    }\n  }\n  else {\n    for (int i = 0; i < module->n_io_group; i++) {      \n      if (inter == 1 || (inter == -1 && !module->to_mem)) {\n      //if (!module->to_mem && (inter == 1 || inter == -1)) {\n        /* inter trans or outer module or default module. */\n        if (!(!module->in && boundary)) {\n          /* Print in fifo. */\n          if (!first) {\n            p = isl_printer_print_str(p, \", \");\n            if (!types) {\n              p = isl_printer_end_line(p);\n              p = isl_printer_start_line(p);\n            }\n          }\n          /* in */\n          if (types)\n            p = autosa_fifo_print_declaration_arguments(p,\n                                                        module->io_groups[i], module->data_pack_inter, \"in\", target, fifo_depth, \"i\");\n          else {\n            p = isl_printer_print_str(p, \"/* fifo */ \");\n            p = autosa_fifo_print_call_argument(p,\n                                                module->io_groups[i], \"in\", target);\n          }\n          first = 0;\n        }\n\n        if (!(module->in && boundary)) {\n          /* Print out fifo. 
*/\n          /* out */\n          if (!first) {\n            p = isl_printer_print_str(p, \", \");\n            if (!types) {\n              p = isl_printer_end_line(p);\n              p = isl_printer_start_line(p);\n            }\n          }\n          if (types)\n            p = autosa_fifo_print_declaration_arguments(p,\n                                                        module->io_groups[i], module->data_pack_inter, \"out\", target, fifo_depth, \"o\");\n          else {\n            p = isl_printer_print_str(p, \"/* fifo */ \");\n            p = autosa_fifo_print_call_argument(p,\n                                                module->io_groups[i], \"out\", target);\n          }\n          first = 0;\n        }\n      }\n\n      if (inter != 1) {\n        if (!first) {\n          p = isl_printer_print_str(p, \", \");\n          if (!types) {\n            p = isl_printer_end_line(p);\n            p = isl_printer_start_line(p);\n          }\n        }\n        /* local */\n        if (types) {\n          p = autosa_fifo_print_declaration_arguments(p,\n                                                      module->io_groups[i], \n                                                      (module->is_serialized && serialize)? module->data_pack_inter : module->data_pack_intra,                                                      \n                                                      module->in ? \"local_out\" : \"local_in\", target, fifo_depth,\n                                                      module->in ? \"o\" : \"i\");\n        } else {\n          p = isl_printer_print_str(p, \"/* fifo */ \");\n          p = autosa_fifo_print_call_argument(p,\n                                              module->io_groups[i], module->in ? 
\"local_out\" : \"local_in\", target);\n        }\n        first = 0;\n      }\n    }\n  }\n\n  /* credit fifo */\n  if (module->credit)\n  {\n    if (!first)\n    {\n      p = isl_printer_print_str(p, \", \");\n      if (!types)\n      {\n        p = isl_printer_end_line(p);\n        p = isl_printer_start_line(p);\n      }\n    }\n    if (types)\n    {\n      if (target == XILINX_HW)\n      {\n        p = isl_printer_print_str(p, \"hls::stream<int> &credit\");\n      }\n      else if (target == TAPA_HW)\n      {\n        p = isl_printer_print_str(p, \"tapa::stream<int> &credit\");\n      }\n      else\n      {\n        p = isl_printer_print_str(p, \"channel int credit\");\n      }\n    }\n    else\n    {\n      p = isl_printer_print_str(p, \"/* credit */ \");\n      p = isl_printer_print_str(p, \"credit\");\n    }\n\n    first = 0;\n  }\n\n  /* enable signal */\n  if (module->double_buffer && inter != -1 && target != CATAPULT_HW)\n  {\n    if (!first)\n    {\n      p = isl_printer_print_str(p, \", \");\n      if (!types)\n      {\n        p = isl_printer_end_line(p);\n        p = isl_printer_start_line(p);\n      }\n    }\n    if (types)\n    {\n      p = isl_printer_print_str(p, inter == 0 ? \"bool intra_trans_en\" : \"bool inter_trans_en\");\n    }\n    else\n    {\n      p = isl_printer_print_str(p, \"/* enable */ \");\n      p = isl_printer_print_str(p, inter == 0 ? \"intra_trans_en\" : \"inter_trans_en\");\n    }\n\n    first = 0;\n  }\n\n  return p;\n}\n\n/* Print the arguments to a pe dummy module declaration or call. 
If \"types\" is set,\n * then print a declaration (including the types of the arguments).\n *\n * The arguments are printed in the following order\n * - the module identifiers\n * - the parameters\n * - the host loop iterators \n * - the arrays accessed by the module\n * - the fifos\n */\n__isl_give isl_printer *print_pe_dummy_module_arguments(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog,\n    struct autosa_kernel *kernel,\n    struct autosa_pe_dummy_module *pe_dummy_module,\n    int types,\n    enum platform target)\n{\n  int first = 1;\n  isl_space *space;\n  int nparam;\n  int n;\n  const char *type;\n  struct autosa_hw_module *module = pe_dummy_module->module;\n  int fifo_depth = prog->scop->options->autosa->fifo_depth;\n\n  type = isl_options_get_ast_iterator_type(prog->ctx);\n  /* module identifiers */\n  const char *dims[] = {\"idx\", \"idy\", \"idz\"};\n  n = isl_id_list_n_id(module->inst_ids);\n  for (int i = 0; i < n; ++i)\n  {\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    if (types)\n    {\n      p = isl_printer_print_str(p, type);\n      p = isl_printer_print_str(p, \" \");\n    }\n    p = isl_printer_print_str(p, dims[i]);\n\n    first = 0;\n  }\n\n  /* params */\n  space = isl_union_set_get_space(kernel->arrays);\n  nparam = isl_space_dim(space, isl_dim_param);\n  for (int i = 0; i < nparam; ++i)\n  {\n    const char *name;\n\n    name = isl_space_get_dim_name(space, isl_dim_param, i);\n\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    if (types)\n      p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, name);\n\n    first = 0;\n  }\n  isl_space_free(space);\n\n  /* host iters */\n  space = module->space;\n\n  n = isl_space_dim(space, isl_dim_set);\n  for (int i = 0; i < n; ++i)\n  {\n    const char *name;\n\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    name = isl_space_get_dim_name(space, isl_dim_set, i);\n    if (types)\n    {\n      p = 
isl_printer_print_str(p, type);\n      p = isl_printer_print_str(p, \" \");\n    }\n    p = isl_printer_print_str(p, name);\n\n    first = 0;\n  }\n\n  /* Arrays */\n  /* Scalars */\n  for (int i = 0; i < prog->n_array; i++)\n  {\n    int required;\n\n    required = autosa_kernel_requires_array_argument(kernel, i);\n    if (required < 0)\n      return isl_printer_free(p);\n    if (!required)\n      continue;\n\n    if (autosa_array_is_read_only_scalar(&prog->array[i]))\n    {\n      if (!first)\n      {\n        p = isl_printer_print_str(p, \", \");\n      }\n      if (types)\n        p = autosa_array_info_print_declaration_argument(\n              p, &prog->array[i], 1, NULL, -1, NULL, target);\n      else\n        p = autosa_module_array_info_print_call_argument(p,\n                                                         &prog->array[i]);\n      first = 0;\n    }\n  }\n\n  /* fifos */\n  struct autosa_array_ref_group *group = pe_dummy_module->io_group;\n  int n_lane = get_io_group_n_lane(NULL, pe_dummy_module, group);  \n\n  if (!first)\n  {\n    p = isl_printer_print_str(p, \", \");\n  }\n  if (types)\n  {\n    p = autosa_fifo_print_declaration_arguments(p,\n                                                group, n_lane, pe_dummy_module->in? \"in\" : \"out\", target, fifo_depth,\n                                                pe_dummy_module->in? \"i\" : \"o\");\n  }\n  else\n    p = autosa_fifo_print_call_argument(p,\n                                        group, pe_dummy_module->in? 
\"in\" : \"out\", target);\n  first = 0;\n\n  return p;\n}\n\n/* Print the arguments of the top_gen function:\n * - parameters\n * - host loop iterators\n * - file descriptor\n */\n__isl_give isl_printer *print_top_gen_arguments(__isl_take isl_printer *p,\n                                                struct autosa_prog *prog, struct autosa_kernel *kernel, int types)\n{\n  int i, n;\n  int first = 1;\n  unsigned nparam;\n  isl_space *space;\n  const char *type;\n\n  /* Parameters */\n  space = isl_union_set_get_space(kernel->arrays);\n  nparam = isl_space_dim(space, isl_dim_param);\n  for (i = 0; i < nparam; ++i)\n  {\n    const char *name;\n\n    name = isl_space_get_dim_name(space, isl_dim_param, i);\n\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    if (types)\n      p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, name);\n\n    first = 0;\n  }\n  isl_space_free(space);\n\n  /* Host iterators */\n  n = isl_space_dim(kernel->space, isl_dim_set);\n  type = isl_options_get_ast_iterator_type(prog->ctx);\n  for (i = 0; i < n; ++i)\n  {\n    const char *name;\n\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    name = isl_space_get_dim_name(kernel->space, isl_dim_set, i);\n    if (types)\n    {\n      p = isl_printer_print_str(p, type);\n      p = isl_printer_print_str(p, \" \");\n    }\n    p = isl_printer_print_str(p, name);\n\n    first = 0;\n  }\n\n  /* File description */\n  if (!first)\n    p = isl_printer_print_str(p, \", \");\n  if (types)\n  {\n    p = isl_printer_print_str(p, \"FILE *\");\n  }\n  p = isl_printer_print_str(p, \"f\");\n\n  first = 0;\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_top_gen_header(__isl_take isl_printer *p,\n                                                    struct autosa_prog *prog, struct autosa_hw_top_module *top)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"void \");\n  p = isl_printer_print_str(p, \"top_generate\");\n  p = 
isl_printer_print_str(p, \"(\");\n  p = print_top_gen_arguments(p, prog, top->kernel, 1);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\nvoid print_top_gen_headers(\n    struct autosa_prog *prog, struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  isl_printer *p;\n\n  p = isl_printer_to_file(prog->ctx, hls->top_gen_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_top_gen_header(p, prog, top);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  p = isl_printer_to_file(prog->ctx, hls->top_gen_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_top_gen_header(p, prog, top);\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n}\n\n/* Print out\n * \"\\/* [module_name] FIFO *\\/\"\n */\nstatic __isl_give isl_printer *print_fifo_comment(\n    __isl_take isl_printer *p, struct autosa_hw_module *module)\n{\n  p = isl_printer_print_str(p, \"/* \");\n  p = isl_printer_print_str(p, module->name);\n  p = isl_printer_print_str(p, \" fifo */\");\n\n  return p;\n}\n\n/* Print out\n * \"_[c0 + val]\"\n * Increase the \"pos\"th index by the value of \"val\"\n */\nstatic __isl_give isl_printer *print_inst_ids_inc_suffix(\n    __isl_take isl_printer *p, int n, int pos, int val)\n{\n  for (int i = 0; i < n; i++)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"_\\\");\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_int(p, c\");\n    p = isl_printer_print_int(p, i);\n    if (i == pos)\n    {\n      if (val != 0)\n      {\n        p = isl_printer_print_str(p, \" + \");\n        p = isl_printer_print_int(p, val);\n      }\n    }\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\n/* Print out\n * \"_c0_c1\"\n */\nstatic __isl_give isl_printer *print_inst_ids_suffix(\n    
__isl_take isl_printer *p, int n, __isl_keep isl_vec *offset)\n{\n  for (int i = 0; i < n; i++)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"_\\\");\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_int(p, c\");\n    p = isl_printer_print_int(p, i);\n    if (offset)\n    {\n      isl_val *val = isl_vec_get_element_val(offset, i);\n      if (!isl_val_is_zero(val))\n      {\n        p = isl_printer_print_str(p, \" + \");\n        p = isl_printer_print_val(p, val);\n      }\n      isl_val_free(val);\n    }\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\n/* This function prints the inst ids described by \"expr\".\n * If the \"offset\" is set, it is added to the inst ids.\n */\nstatic __isl_give isl_printer *print_pretrans_inst_ids_suffix(\n    __isl_take isl_printer *p, int n_id,\n    __isl_keep isl_ast_expr *expr, __isl_keep isl_vec *offset)\n{\n  isl_ctx *ctx = isl_ast_expr_get_ctx(expr);\n  int n;\n\n  n = isl_ast_expr_op_get_n_arg(expr);\n  for (int i = 0; i < n_id; i++)\n  {\n    isl_ast_expr *expr_i = isl_ast_expr_get_op_arg(expr, i + 1);\n    int format;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"_\\\");\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_int(p, \");\n    format = isl_printer_get_output_format(p);\n    p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n    p = isl_printer_print_ast_expr(p, expr_i);\n    p = isl_printer_set_output_format(p, format);\n    if (offset)\n    {\n      isl_val *val = isl_vec_get_element_val(offset, i);\n      if (!isl_val_is_zero(val))\n      {\n        p = isl_printer_print_str(p, \" + \");\n        p = isl_printer_print_val(p, val);\n      }\n      isl_val_free(val);\n    
}\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n\n    isl_ast_expr_free(expr_i);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_fifo_decl_single(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct autosa_prog *prog,\n    struct hls_info *hls, int pe_inout, const char *suffix)\n{\n  struct autosa_hw_module *module = stmt->u.m.module;\n  struct autosa_array_ref_group *group = stmt->u.m.group;\n  int boundary = stmt->u.m.boundary;\n  int n;\n  int n_lane;\n  int fifo_depth = prog->scop->options->autosa->fifo_depth;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"// Count channel number\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"fifo_cnt++;\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"// Print channel declarations of module: \");\n  p = isl_printer_print_str(p, module->name);\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"\");\n  p = print_fifo_comment(p, module);\n  p = isl_printer_print_str(p, \" \");\n  n_lane = get_io_group_n_lane(module, NULL, group);\n  if (hls->target == XILINX_HW)\n    p = print_fifo_type_xilinx(p, group, n_lane);\n  else if (hls->target == TAPA_HW)\n    p = print_fifo_type_tapa(p, group, n_lane, fifo_depth, NULL);\n  else if (hls->target == INTEL_HW)\n    p = print_fifo_type_intel(p, group, n_lane);\n  else if (hls->target == CATAPULT_HW)\n    p = print_fifo_type_catapult(p, group, n_lane);\n  p = isl_printer_print_str(p, \" \");\n  p = autosa_array_ref_group_print_fifo_name(group, p);\n  p = isl_printer_print_str(p, \"_\");\n  p = isl_printer_print_str(p, module->name);\n  if (pe_inout)\n  {\n    p = 
isl_printer_print_str(p, suffix);\n  }\n  p = isl_printer_print_str(p, \"\\\");\");\n  p = isl_printer_end_line(p);\n\n  n = isl_id_list_n_id(module->inst_ids);\n  if (module->type == IO_MODULE || module->type == DRAIN_MODULE)\n  {\n    if (boundary)\n    {\n      p = print_inst_ids_inc_suffix(p, n, n - 1, 1);\n    }\n    else\n    {\n      p = print_inst_ids_suffix(p, n, NULL);\n    }\n  }\n  else if (module->type == PE_MODULE)\n  {\n    if (boundary)\n      p = print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, group->dir);\n    else\n      p = print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, NULL);\n  }\n  if (hls->target == INTEL_HW)\n  {\n    /* Print fifo attribute */\n    //p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\" __attribute__((depth(2)))\\\");\");\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\" __attribute__((depth(\");\n    p = isl_printer_print_int(p, fifo_depth);\n    p = isl_printer_print_str(p, \")))\\\");\");\n    p = isl_printer_end_line(p);\n  }\n  if (hls->target == TAPA_HW) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"(\\\\\\\"\");\n    p = autosa_array_ref_group_print_fifo_name(group, p);\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, module->name);\n    if (pe_inout)\n    {\n      p = isl_printer_print_str(p, suffix);\n    }\n    p = isl_printer_print_str(p, \"\\\");\");\n    p = isl_printer_end_line(p);\n\n    if (module->type == IO_MODULE || module->type == DRAIN_MODULE)\n    {\n      if (boundary)\n      {\n        p = print_inst_ids_inc_suffix(p, n, n - 1, 1);\n      }\n      else\n      {\n        p = print_inst_ids_suffix(p, n, NULL);\n      }\n    }\n    else if (module->type == PE_MODULE)\n    {\n      if (boundary)\n      {\n        p = print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, group->dir);\n      }\n      else\n      {\n        p = 
print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, NULL);\n      }\n    }\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"\\\\\\\")\\\");\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\";\\\");\");  \n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  if (hls->target == XILINX_HW)\n  {\n    /* Print fifo pragma */\n    p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS STREAM variable=\");\n    p = autosa_array_ref_group_print_fifo_name(group, p);\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, module->name);\n    if (pe_inout)\n    {\n      p = isl_printer_print_str(p, suffix);\n    }\n    p = isl_printer_print_str(p, \"\\\");\");\n    p = isl_printer_end_line(p);\n\n    if (module->type == IO_MODULE || module->type == DRAIN_MODULE)\n    {\n      if (boundary)\n      {\n        p = print_inst_ids_inc_suffix(p, n, n - 1, 1);\n      }\n      else\n      {\n        p = print_inst_ids_suffix(p, n, NULL);\n      }\n    }\n    else if (module->type == PE_MODULE)\n    {\n      if (boundary)\n      {\n        p = print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, group->dir);\n      }\n      else\n      {\n        p = print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, NULL);\n      }\n    }\n    //p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\" depth=2\\\");\");\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\" depth=\");\n    p = isl_printer_print_int(p, fifo_depth);\n    p = isl_printer_print_str(p, \"\\\");\");\n    p = isl_printer_end_line(p);\n    \n    p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n    /* If depth * width > 512 bits, HLS 
will use BRAM to implement FIFOs.\n     * Instead, we insert pragmas to implement them with SRLs.\n     * Modified: SRLs are now used unconditionally.\n     */\n    /* Print fifo resource pragma. */\n    //if (n_lane * group->array->size >= 32)\n    {\n      p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS RESOURCE variable=\");\n      p = autosa_array_ref_group_print_fifo_name(group, p);\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_str(p, module->name);\n      if (pe_inout)\n      {\n        p = isl_printer_print_str(p, suffix);\n      }\n      p = isl_printer_print_str(p, \"\\\");\");\n      p = isl_printer_end_line(p);\n\n      if (module->type == IO_MODULE || module->type == DRAIN_MODULE)\n      {\n        if (boundary)\n        {\n          p = print_inst_ids_inc_suffix(p, n, n - 1, 1);\n        }\n        else\n        {\n          p = print_inst_ids_suffix(p, n, NULL);\n        }\n      }\n      else if (module->type == PE_MODULE)\n      {\n        if (boundary)\n        {\n          p = print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, group->dir);\n        }\n        else\n        {\n          p = print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, NULL);\n        }\n      }      \n      p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\" core=FIFO_SRL\\\");\");      \n      p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n    }\n\n    /* For sparse structure, we will need to perform data pack. 
*/\n    if (group->local_array->is_sparse) {\n      p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS DATA_PACK variable=\");\n      p = autosa_array_ref_group_print_fifo_name(group, p);\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_str(p, module->name);\n      if (pe_inout)\n      {\n        p = isl_printer_print_str(p, suffix);\n      }\n      p = isl_printer_print_str(p, \"\\\");\");\n      p = isl_printer_end_line(p);\n\n      if (module->type == IO_MODULE || module->type == DRAIN_MODULE)\n      {\n        if (boundary)\n        {\n          p = print_inst_ids_inc_suffix(p, n, n - 1, 1);\n        }\n        else\n        {\n          p = print_inst_ids_suffix(p, n, NULL);\n        }\n      }\n      else if (module->type == PE_MODULE)\n      {\n        if (boundary)\n        {\n          p = print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, group->dir);\n        }\n        else\n        {\n          p = print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, NULL);\n        }\n      }                  \n      p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n    }    \n  }\n\n  return p;\n}\n\n/* if module->type == PE_MODULE\n *   if boundary == 0:\n *     new_inst_id = io_trans(inst_id)\n *     print [fifo_name]_[module_name]_[new_inst_id]\n *   else if boundary == 1:\n *     new_inst_id = io_trans(inst_id)\n *     print [fifo_name]_[module_name]_[new_inst_id + dep_dir]\n * if module->type == IO_MODULE:\n *     print [fifo_name]_[module_name]_[inst_id]\n */\nstatic __isl_give isl_printer *print_fifo_decl(__isl_take isl_printer *p,\n                                               struct autosa_kernel_stmt *stmt, struct autosa_prog *prog, struct hls_info *hls)\n{\n  struct autosa_hw_module *module = stmt->u.m.module;\n  struct autosa_array_ref_group *group = stmt->u.m.group;\n  int 
pe_inout;\n\n  if (group->io_type == AUTOSA_INT_IO && module->type == PE_MODULE && group->pe_io_dir == IO_INOUT)\n  {\n    pe_inout = 1;\n  }\n  else\n  {\n    pe_inout = 0;\n  }\n\n  if (pe_inout)\n  {\n    p = print_fifo_decl_single(p, stmt, prog, hls, 1, \"_in\");\n    p = print_fifo_decl_single(p, stmt, prog, hls, 1, \"_out\");\n  }\n  else\n  {\n    p = print_fifo_decl_single(p, stmt, prog, hls, 0, NULL);\n  }\n\n  return p;\n}\n\n__isl_give isl_printer *autosa_kernel_print_fifo_decl(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct autosa_prog *prog, struct hls_info *hls)\n{\n  p = ppcg_start_block(p);\n\n  /* Build the fifo_decl. */\n  p = print_fifo_decl(p, stmt, prog, hls);\n\n  p = ppcg_end_block(p);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_delimiter(__isl_take isl_printer *p,\n                                               int *first)\n{\n  if (!(*first))\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\",\\\");\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n    p = isl_printer_end_line(p);\n  }\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n  p = isl_printer_end_line(p);\n\n  *first = 0;\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_fifo_annotation(__isl_take isl_printer *p)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* fifo */ \\\");\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print out\n * [fifo_name]_[module_name]\n */\nstatic __isl_give isl_printer *print_fifo_prefix(__isl_take isl_printer *p,\n                                                 struct autosa_hw_module *module, struct autosa_array_ref_group *group)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = 
isl_printer_print_str(p, \\\"\");\n  p = autosa_array_ref_group_print_fifo_name(group, p);\n  p = isl_printer_print_str(p, \"_\");\n  p = isl_printer_print_str(p, module->name);\n  p = isl_printer_print_str(p, \"\\\");\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the upper body of the module call, including:\n * - module identifier\n * - parameters\n * - host loop iterators\n * - arrays\n * - inter-module fifos\n */\n__isl_give isl_printer *print_module_call_upper(__isl_take isl_printer *p,\n                                                struct autosa_kernel_stmt *stmt, struct autosa_prog *prog,\n                                                enum platform target)\n{\n  struct autosa_hw_module *module = stmt->u.m.module;\n  struct autosa_pe_dummy_module *pe_dummy_module = stmt->u.m.pe_dummy_module;\n  int lower = stmt->u.m.lower;\n  int upper = stmt->u.m.upper;\n  int boundary = stmt->u.m.boundary;\n  int serialize = stmt->u.m.serialize;\n  int dummy = stmt->u.m.dummy;\n  int first = 1;\n  int n;\n  char *module_name = stmt->u.m.module_name;\n  isl_space *space;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"// Print calls of module: \");\n  p = isl_printer_print_str(p, module_name);\n  if (boundary) {\n    p = isl_printer_print_str(p, \"_boundary\");\n  }\n  if (serialize) {\n    p = isl_printer_print_str(p, \"_serialize\");\n  }\n  p = isl_printer_end_line(p);\n\n  if (dummy && stmt->u.m.lower_sched_val != -1) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int c\");\n    p = isl_printer_print_int(p, isl_id_list_n_id(module->inst_ids) - 1);\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_int(p, stmt->u.m.lower_sched_val);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n  p = isl_printer_end_line(p);\n\n  if (target == TAPA_HW)\n    p 
= print_str_new_line(p, \"p = isl_printer_print_str(p, \\\".invoke(\\\");\");\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"\");\n  p = isl_printer_print_str(p, module_name);\n  if (boundary) {\n    p = isl_printer_print_str(p, \"_boundary\");\n  }\n  if (serialize) {\n    p = isl_printer_print_str(p, \"_serialize\");\n  }  \n\n  if (target == XILINX_HW || target == TAPA_HW) {\n    if (!dummy && module->type == PE_MODULE)\n      p = isl_printer_print_str(p, \"_wrapper\");\n    else if (module->type != PE_MODULE && module->level == 1)\n      p = isl_printer_print_str(p, \"_wrapper\");\n  }\n  if (target == CATAPULT_HW) {\n    p = isl_printer_print_str(p, \"_inst\\\");\");\n    /* Print module ids if any */\n    if (isl_id_list_n_id(module->inst_ids) > 0) {\n      for (int i = 0; i < isl_id_list_n_id(module->inst_ids); i++)\n      {\n        p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"_\\\");\");\n        p = isl_printer_start_line(p);        \n        p = isl_printer_print_str(p, \"p = isl_printer_print_int(p, c\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\".run\");    \n  }\n  p = isl_printer_print_str(p, \"\\\");\");\n  p = isl_printer_end_line(p);\n\n  if (isl_id_list_n_id(module->inst_ids) > 0 && prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"<\\\");\");    \n    if (!dummy) {\n      for (int i = 0; i < isl_id_list_n_id(module->inst_ids); i++) {\n        if (i > 0) {          \n          p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\", \\\");\");\n        }\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_int(p, c\");\n        p = 
isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    } else {\n      isl_ast_expr *expr = pe_dummy_module->io_group->io_L1_pe_expr;\n      int n_arg = isl_ast_expr_op_get_n_arg(expr);\n      for (int i = 0; i < isl_id_list_n_id(module->inst_ids); i++) {\n        if (i > 0) {          \n          p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\", \\\");\");\n        }\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_int(p, \");\n        isl_ast_expr *expr_i = isl_ast_expr_get_op_arg(expr, i + 1);\n        p = isl_printer_print_ast_expr(p, expr_i);\n        isl_ast_expr_free(expr_i);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n    p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\">\\\");\");\n  }\n\n  if (target != TAPA_HW)\n    p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"(\\\");\");  \n  else\n    p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\",\\\");\");\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_indent(p, 2);\");\n  p = isl_printer_end_line(p);\n\n  /* module identifiers */\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    if (!dummy)\n    {\n      for (int i = 0; i < isl_id_list_n_id(module->inst_ids); i++)\n      {\n        p = print_delimiter(p, &first);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* module id */ \\\");\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_int(p, c\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \");\");\n    
    p = isl_printer_end_line(p);\n      }\n    }\n    else\n    {\n      isl_ast_expr *expr = pe_dummy_module->io_group->io_L1_pe_expr;\n      int n_arg = isl_ast_expr_op_get_n_arg(expr);\n      for (int i = 0; i < isl_id_list_n_id(module->inst_ids); i++)\n      {\n        int format;\n        p = print_delimiter(p, &first);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* module id */ \\\");\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_int(p, \");\n        \n        isl_ast_expr *expr_i = isl_ast_expr_get_op_arg(expr, i + 1);\n        p = isl_printer_print_ast_expr(p, expr_i);\n        isl_ast_expr_free(expr_i);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n\n  /* params */\n  space = isl_union_set_get_space(module->kernel->arrays);\n  n = isl_space_dim(space, isl_dim_param);\n  for (int i = 0; i < n; i++)\n  {\n    p = print_delimiter(p, &first);\n\n    const char *name = isl_space_get_dim_name(space, isl_dim_param, i);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* param */ \");\n    p = isl_printer_print_str(p, name);\n    p = isl_printer_print_str(p, \"\\\");\");\n    p = isl_printer_end_line(p);\n  }\n  isl_space_free(space);\n\n  /* host iterators */\n  n = isl_space_dim(module->kernel->space, isl_dim_set);\n  for (int i = 0; i < n; i++)\n  {\n    p = print_delimiter(p, &first);\n\n    const char *name = isl_space_get_dim_name(module->kernel->space, isl_dim_set, i);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* host iter */ \");\n    p = isl_printer_print_str(p, name);\n    p = isl_printer_print_str(p, \"\\\");\");\n    p = isl_printer_end_line(p);\n  }\n\n  /* scalar and arrays */\n  if (module->type != PE_MODULE && 
module->to_mem && \n      ((module->is_serialized && serialize) || !module->is_serialized))\n  {\n    p = print_delimiter(p, &first);\n\n    p = isl_printer_start_line(p);\n    if (prog->scop->options->autosa->axi_stream) {\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* fifo */ \");    \n      p = isl_printer_print_str(p, \"fifo_\");    \n      p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n    } else {\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* array */ \");    \n      p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n    }\n    if (module->io_groups[0]->local_array->n_io_group_refs > 1)\n    {\n      if (module->io_groups[0]->n_mem_ports == 1)\n      {\n        /* Print A_[module_n_array_ref] */\n        p = isl_printer_print_str(p, \"_\");\n        p = isl_printer_print_int(p, module->n_array_ref);\n        p = isl_printer_print_str(p, \"\\\");\");\n        p = isl_printer_end_line(p);\n      }\n      else\n      {\n        /* Print A_[module_n_array_ref + c0] */\n        p = isl_printer_print_str(p, \"_\\\");\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_int(p, c0 + \");\n        p = isl_printer_print_int(p, module->n_array_ref);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n    else\n    {\n      p = isl_printer_print_str(p, \"\\\");\");\n      p = isl_printer_end_line(p);\n    }\n  }\n  else if (module->type == PE_MODULE)\n  {\n    for (int i = 0; i < prog->n_array; i++)\n    {\n      int required;\n\n      required = autosa_kernel_requires_array_argument(module->kernel, i);\n      if (required < 0)\n        return isl_printer_free(p);\n      if (!required)\n        continue;\n\n      if (autosa_array_is_read_only_scalar(&prog->array[i]))\n      {\n        p = print_delimiter(p, &first);\n\n        p = 
isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* scalar */ \");\n        p = isl_printer_print_str(p, prog->array[i].name);\n        p = isl_printer_print_str(p, \"\\\");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n\n  /* FIFO */\n  n = isl_id_list_n_id(module->inst_ids);\n  if (module->type == PE_MODULE)\n  {\n    if (dummy)\n    {\n      struct autosa_array_ref_group *group = pe_dummy_module->io_group;\n      p = print_delimiter(p, &first);\n      p = print_fifo_annotation(p);\n      p = print_fifo_prefix(p, module, group);\n      if (isl_vec_is_zero(group->dir))\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"_in\\\");\");\n        p = isl_printer_end_line(p);\n      }\n      if (pe_dummy_module->in)\n        p = print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, group->dir);\n      else\n        p = print_pretrans_inst_ids_suffix(p, n, group->io_L1_pe_expr, NULL);\n    }\n    else\n    {\n      for (int i = 0; i < module->n_io_group; i++)\n      {\n        struct autosa_array_ref_group *group = module->io_groups[i];\n        if (group->pe_io_dir == IO_NULL)\n          continue;\n        if (group->pe_io_dir == IO_INOUT)\n        {\n          p = print_delimiter(p, &first);\n          p = print_fifo_annotation(p);\n          p = print_fifo_prefix(p, module, group);          \n          if (group->io_type == AUTOSA_INT_IO)\n          {\n            p = isl_printer_start_line(p);\n            p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"_in\\\");\");\n            p = isl_printer_end_line(p);\n          }\n          p = print_inst_ids_suffix(p, n, NULL);\n\n          p = print_delimiter(p, &first);\n          p = print_fifo_annotation(p);\n          p = print_fifo_prefix(p, module, group);          \n          if (group->io_type == AUTOSA_INT_IO)\n          {\n            p = 
isl_printer_start_line(p);\n            p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"_out\\\");\");\n            p = isl_printer_end_line(p);\n          }          \n          if (group->io_type == AUTOSA_INT_IO)\n          {\n            p = print_inst_ids_suffix(p, n, NULL);\n          }\n          else\n          {\n            p = print_inst_ids_suffix(p, n, group->dir);\n          }\n        }\n        else\n        {\n          p = print_delimiter(p, &first);\n          p = print_fifo_annotation(p);\n          p = print_fifo_prefix(p, module, group);\n          p = print_inst_ids_suffix(p, n, NULL);\n        }\n      }\n    }\n  }\n  else\n  {\n    if (!module->to_mem)\n    {\n      for (int i = 0; i < module->n_io_group; i++)\n      {\n        struct autosa_array_ref_group *group = module->io_groups[i];\n        if (module->in)\n        {\n          p = print_delimiter(p, &first);\n          p = print_fifo_annotation(p);\n          p = print_fifo_prefix(p, module, group);\n          p = print_inst_ids_suffix(p, n, NULL);\n\n          if (!boundary)\n          {\n            p = print_delimiter(p, &first);\n            p = print_fifo_annotation(p);\n            p = print_fifo_prefix(p, module, group);\n            p = print_inst_ids_inc_suffix(p, n, n - 1, 1);\n          }\n        }\n        else\n        {\n          if (!boundary)\n          {\n            p = print_delimiter(p, &first);\n            p = print_fifo_annotation(p);\n            p = print_fifo_prefix(p, module, group);\n            p = print_inst_ids_inc_suffix(p, n, n - 1, 1);\n          }\n\n          p = print_delimiter(p, &first);\n          p = print_fifo_annotation(p);\n          p = print_fifo_prefix(p, module, group);\n          p = print_inst_ids_suffix(p, n, NULL);\n        }\n      }\n    } else {\n      if (module->is_serialized && !serialize) {\n        struct autosa_array_ref_group *group = module->io_groups[0];\n        p = print_delimiter(p, &first);\n       
 p = print_fifo_annotation(p);\n        p = print_fifo_prefix(p, module, group);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"_serialize\\\");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n\n  return p;\n}\n\n/* Build the lower-level module name to the current \"module\".\n */\nstatic char *build_io_module_lower_name(struct autosa_hw_module *module)\n{\n  struct autosa_array_ref_group *group = module->io_groups[0];\n\n  isl_printer *p = isl_printer_to_str(module->kernel->ctx);\n  p = isl_printer_print_str(p, group->array->name);\n  if (group->group_type == AUTOSA_IO_GROUP)\n  {\n    if (group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  }\n  else if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, \"drain\");\n  }\n  p = isl_printer_print_str(p, \"_IO_L\");\n  p = isl_printer_print_int(p, module->level - 1);\n  if (module->in)\n    p = isl_printer_print_str(p, \"_in\");\n  else\n    p = isl_printer_print_str(p, \"_out\");\n\n  char *name = isl_printer_get_str(p);\n  isl_printer_free(p);\n\n  return name;\n}\n\n/* Print the prefix of fifos to the lower-level modules. 
\n */\nstatic __isl_give isl_printer *print_fifo_prefix_lower(\n    __isl_take isl_printer *p,\n    struct autosa_hw_module *module, struct autosa_array_ref_group *group)\n{\n  int lower_is_PE;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"\");\n  p = autosa_array_ref_group_print_fifo_name(group, p);\n  p = isl_printer_print_str(p, \"_\");\n  assert(module->type != PE_MODULE);\n\n  if (module->to_pe)\n    lower_is_PE = 1;\n  else\n    lower_is_PE = 0;\n\n  if (!lower_is_PE)\n  {\n    char *name = build_io_module_lower_name(module);\n    p = isl_printer_print_str(p, name);\n    free(name);\n  }\n  else\n  {\n    p = isl_printer_print_str(p, \"PE\");\n  }\n  p = isl_printer_print_str(p, \"\\\");\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the lower body of the module call, including the \n * fifos to the lower-level modules.\n */\nstatic __isl_give isl_printer *print_module_call_lower(__isl_take isl_printer *p,\n                                                       struct autosa_kernel_stmt *stmt, struct autosa_prog *prog, enum platform target)\n{\n  struct autosa_hw_module *module = stmt->u.m.module;\n  int lower = stmt->u.m.lower;\n  int first = 0;\n  int n = isl_id_list_n_id(module->inst_ids);\n  int lower_is_PE;\n  int boundary = stmt->u.m.boundary;\n  int serialize = stmt->u.m.serialize;\n\n  if (lower)\n  {\n    struct autosa_array_ref_group *group = module->io_groups[0];\n\n    p = print_delimiter(p, &first);\n    p = print_fifo_annotation(p);\n    if (serialize) {\n      p = print_fifo_prefix(p, module, group);\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"_serialize\\\");\");      \n      p = isl_printer_end_line(p);\n    } else {\n      p = print_fifo_prefix_lower(p, module, group);\n  \n      if (module->to_pe)\n        lower_is_PE = 1;\n      else\n        lower_is_PE = 0;\n  \n      if (group->io_type == 
AUTOSA_INT_IO && lower_is_PE && group->pe_io_dir == IO_INOUT)\n      {\n        /* Add in/out suffix. */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"\");\n        p = isl_printer_print_str(p, module->in ? \"_in\" : \"_out\");\n        p = isl_printer_print_str(p, \"\\\");\");\n        p = isl_printer_end_line(p);\n      }\n  \n      if (lower_is_PE) {\n        p = print_pretrans_inst_ids_suffix(p, module->kernel->n_sa_dim,\n                                           boundary ? group->io_pe_expr_boundary : group->io_pe_expr, \n                                           module->in || group->pe_io_dir != IO_INOUT? NULL : group->dir\n                                           );\n      } else {\n        if (stmt->u.m.lower_sched_val != -1) {\n          p = print_inst_ids_suffix(p, n, NULL);\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"_\");\n          p = isl_printer_print_int(p, stmt->u.m.lower_sched_val);\n          p = isl_printer_print_str(p, \"\\\");\");\n          p = isl_printer_end_line(p);        \n        } else {\n          p = print_inst_ids_suffix(p, n + 1, NULL);\n        }\n      }\n    }\n  }\n\n  if (target != TAPA_HW) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_indent(p, -2);\");\n  p = isl_printer_end_line(p);\n\n  if (target != TAPA_HW) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_start_line(p);\n  if (target != TAPA_HW)\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\");\\\");\");\n  else\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\")\\\");\");\n  p = 
isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print out the module call instantiation in the private class fields for \n * Catapult HLS.\n */\n__isl_give isl_printer *autosa_kernel_print_module_call_inst(\n  __isl_take isl_printer *p,\n  struct autosa_kernel_stmt *stmt, struct autosa_prog *prog,\n  enum platform target)\n{\n  int upper = stmt->u.m.upper;\n  int lower = stmt->u.m.lower;\n  int complete = (upper == 0 && lower == 0);\n  int dummy = stmt->u.m.dummy;\n  int boundary = stmt->u.m.boundary;\n  int serialize = stmt->u.m.serialize;\n  char *module_name = stmt->u.m.module_name;\n  struct autosa_hw_module *module = stmt->u.m.module;\n\n  if (dummy)\n    return p;\n\n  p = ppcg_start_block(p);\n\n  if (complete || upper) {\n    p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n    \n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"\");\n    p = isl_printer_print_str(p, module->name);\n    if (boundary)\n      p = isl_printer_print_str(p, \"_boundary\");    \n    if (serialize)\n      p = isl_printer_print_str(p, \"_serialize\");    \n    p = isl_printer_print_str(p, \"\\\");\");\n    p = isl_printer_end_line(p);\n    \n    p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\" \\\");\");\n    \n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"\");\n    p = isl_printer_print_str(p, module->name);\n    if (boundary)\n      p = isl_printer_print_str(p, \"_boundary\");    \n    if (serialize)\n      p = isl_printer_print_str(p, \"_serialize\");    \n    p = isl_printer_print_str(p, \"_inst\");\n    p = isl_printer_print_str(p, \"\\\");\");\n    p = isl_printer_end_line(p);    \n\n    /* Print the module ids if any */\n    if (!dummy)\n    {\n      for (int i = 0; i < 
isl_id_list_n_id(module->inst_ids); i++)\n      {                 \n        p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"_\\\");\");\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_int(p, c\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \");\");        \n        p = isl_printer_end_line(p);\n      }\n    }\n    \n    p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\";\\\");\");\n    p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  } \n\n  p = ppcg_end_block(p);\n\n  return p;\n}\n\n/* Print out the module calls:\n * - module_call_upper\n * - module_call_lower\n */\n__isl_give isl_printer *autosa_kernel_print_module_call(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct autosa_prog *prog,\n    enum platform target)\n{\n  int upper = stmt->u.m.upper;\n  int lower = stmt->u.m.lower;\n  int complete = (upper == 0 && lower == 0);\n  int dummy = stmt->u.m.dummy;\n  int boundary = stmt->u.m.boundary;\n  int serialize = stmt->u.m.serialize;\n  char *module_name = stmt->u.m.module_name;\n  struct autosa_hw_module *module = stmt->u.m.module;\n  p = ppcg_start_block(p);\n\n  /* Build the module name. */\n  if (complete)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"// Count module number\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, module_name);\n    if (boundary)\n      p = isl_printer_print_str(p, \"_boundary\");\n    p = isl_printer_print_str(p, \"_cnt++;\");\n    p = isl_printer_end_line(p);\n    if (module->is_filter && module->is_buffer)\n    {\n      /* Print counter for inter_trans and intra_trans module. 
*/\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, module_name);\n      p = isl_printer_print_str(p, \"_intra_trans_cnt++;\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, module_name);\n      if (boundary)\n        p = isl_printer_print_str(p, \"_inter_trans_boundary_cnt++;\");\n      else\n        p = isl_printer_print_str(p, \"_inter_trans_cnt++;\");\n      p = isl_printer_end_line(p);\n    }\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* Module Call */\\\");\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n    p = isl_printer_end_line(p);\n\n    p = print_module_call_upper(p, stmt, prog, target);\n    p = print_module_call_lower(p, stmt, prog, target);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* Module Call */\\\");\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n    p = isl_printer_end_line(p);\n  }\n  else\n  {\n    if (upper)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"// Count module number\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, module_name);\n      if (boundary)\n        p = isl_printer_print_str(p, 
\"_boundary\");\n      if (serialize)        \n        p = isl_printer_print_str(p, \"_serialize\");\n      p = isl_printer_print_str(p, \"_cnt++;\");\n      p = isl_printer_end_line(p);\n      if (module->is_filter && module->is_buffer && !serialize)\n      {\n        /* Print counter for inter_trans and intra_trans module */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, module_name);\n        p = isl_printer_print_str(p, \"_intra_trans_cnt++;\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, module_name);\n        if (boundary)\n          p = isl_printer_print_str(p, \"_inter_trans_boundary_cnt++;\");\n        else\n          p = isl_printer_print_str(p, \"_inter_trans_cnt++;\");\n        p = isl_printer_end_line(p);\n      }\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* Module Call */\\\");\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n      p = isl_printer_end_line(p);\n\n      p = print_module_call_upper(p, stmt, prog, target);\n    }\n    else\n    {\n      p = print_module_call_lower(p, stmt, prog, target);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* Module Call */\\\");\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = 
isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  p = ppcg_end_block(p);\n\n  return p;\n}\n\n/* If read, print:\n *   \"[fifo_name].read()\"\n * else, print:\n *   \"[fifo_name].write(\"\n */\n__isl_give isl_printer *print_fifo_rw_xilinx(__isl_take isl_printer *p,\n                                             const char *fifo_name, int read)\n{\n  if (read)\n  {\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \".read()\");\n  }\n  else\n  {\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \".write(\");\n  }\n  return p;\n}\n\n__isl_give isl_printer *print_fifo_rw_catapult(\n  __isl_take isl_printer *p, const char *fifo_name, int read)\n{\n  if (read) {\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \".read()\");\n  } else {\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \".write(\");\n  }\n  return p;\n}\n\n/* If read, print:\n *   \"read_channel_intel([fifo_name])\"\n * else, print:\n *   \"write_channel_intel([fifo_name], \"\n */\n__isl_give isl_printer *print_fifo_rw_intel(__isl_take isl_printer *p,\n                                            const char *fifo_name, int read)\n{\n  if (read)\n  {\n    p = isl_printer_print_str(p, \"read_channel_intel(\");\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \")\");\n  }\n  else\n  {\n    p = isl_printer_print_str(p, \"write_channel_intel(\");\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \", \");\n  }\n  return p;\n}\n\n__isl_give isl_printer *print_fifo_rw_tapa(\n  __isl_take isl_printer *p, const char *fifo_name, int read)\n{\n  if (read) {\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \".read()\");\n  } else {\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \".write(\");\n  }\n  return 
p;\n}\n\n/* Print an I/O statement.\n *\n * An in I/O statement is printed as\n *\n *  local[] = fifo.read(); \n *\n * while an out I/O statement is printed as\n *\n *  fifo.write(local);\n */\n__isl_give isl_printer *autosa_kernel_print_io(__isl_take isl_printer *p,\n                                               struct autosa_kernel_stmt *stmt, struct hls_info *hls)\n{\n  struct autosa_hw_module *module = stmt->u.i.module;\n  struct autosa_array_ref_group *group = stmt->u.i.group;\n  struct autosa_kernel *kernel = module->kernel;\n  char *fifo_name;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n  int is_dummy = stmt->u.i.dummy;\n  fifo_name = concat(ctx, stmt->u.i.in_fifo_name, stmt->u.i.in == 1 ? \"in\" : \"out\");\n  int data_pack = stmt->u.i.data_pack;  \n\n  /* Extract the sparse data. */\n  int is_sparse = group->local_array->is_sparse;\n  int vec_len = stmt->u.i.local_array->vec_len;\n  int n_nzero = stmt->u.i.local_array->n_nzero;\n  float compress_ratio = stmt->u.i.local_array->compress_ratio;\n  int n_meta_data = stmt->u.i.local_array->n_meta_data;\n  float eff_compress_ratio = stmt->u.i.local_array->eff_compress_ratio;\n\n  if (is_dummy)  \n  {\n    if (stmt->u.i.in) {\n      /* [type] fifo_data; */\n      p = isl_printer_start_line(p);\n      if (is_sparse) {\n        p = isl_printer_print_str(p, group->array->name);\n        p = isl_printer_print_str(p, \"_s_t\");\n        p = isl_printer_print_int(p, data_pack);\n      } else {        \n        p = isl_printer_print_str(p, group->array->name);\n        p = isl_printer_print_str(p, \"_t\");\n        p = isl_printer_print_int(p, data_pack);        \n      }\n      p = isl_printer_print_str(p, \" fifo_data;\");\n      p = isl_printer_end_line(p);\n\n      /* fifo_data = fifo.read(); */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fifo_data = \");\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, fifo_name, 1);\n      else if (hls->target == TAPA_HW)\n  
      p = print_fifo_rw_tapa(p, fifo_name, 1);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, fifo_name, 1);\n      else if (hls->target == CATAPULT_HW)  \n        p = print_fifo_rw_catapult(p, fifo_name, 1);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n\n      free(fifo_name);\n      return p;\n    } else {\n      /* Send zeros by default, might be buggy. */      \n      /* [type] fifo_data = 0; */\n      p = isl_printer_start_line(p);      \n      p = isl_printer_print_str(p, group->array->name);\n      p = isl_printer_print_str(p, \"_t\");\n      p = isl_printer_print_int(p, data_pack);      \n      p = isl_printer_print_str(p, \" fifo_data = 0;\");\n      p = isl_printer_end_line(p);\n      \n      /* fifo.write(fifo_data); */\n      p = isl_printer_start_line(p);\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, fifo_name, 0);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, fifo_name, 0);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, fifo_name, 0);\n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, fifo_name, 0);\n      p = isl_printer_print_str(p, \"fifo_data);\");\n      p = isl_printer_end_line(p);\n\n      free(fifo_name);\n      return p;      \n    }\n  }\n\n  int nxt_data_pack = stmt->u.i.nxt_data_pack;\n  isl_ast_expr *local_index_packed;\n  isl_ast_expr *arg, *div;\n  int n_arg;\n  local_index_packed = isl_ast_expr_copy(stmt->u.i.local_index);\n  /* Modify the local index. 
*/\n  if (data_pack > 1)\n  {\n    n_arg = isl_ast_expr_get_op_n_arg(local_index_packed);\n    arg = isl_ast_expr_get_op_arg(local_index_packed, n_arg - 1);\n    div = isl_ast_expr_from_val(isl_val_int_from_si(ctx, data_pack));\n    arg = isl_ast_expr_div(arg, div);\n    local_index_packed = isl_ast_expr_set_op_arg(local_index_packed, n_arg - 1, arg);\n  }\n\n  if (data_pack == nxt_data_pack && !group->local_array->is_sparse)\n  {\n    // TODO: modify the sparse\n\n    /* local[] = fifo.read() */\n    p = isl_printer_start_line(p);\n    if (stmt->u.i.in)\n    {\n      p = isl_printer_print_ast_expr(p, local_index_packed);\n      p = isl_printer_print_str(p, \" = \");\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, fifo_name, 1);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, fifo_name, 1);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, fifo_name, 1);\n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, fifo_name, 1);\n    }\n    else\n    {\n      /* fifo.write(local[]) */\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, fifo_name, 0);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, fifo_name, 0);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, fifo_name, 0);\n      else if (hls->target == CATAPULT_HW)  \n        p = print_fifo_rw_catapult(p, fifo_name, 0);\n      p = isl_printer_print_ast_expr(p, local_index_packed);\n      p = isl_printer_print_str(p, \")\");\n    }\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  } \n  else\n  {\n    p = ppcg_start_block(p);\n    if (!kernel->sparse) {\n      /* [type] fifo_data; */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, group->array->name);    \n      p = isl_printer_print_str(p, \"_t\");\n      p = isl_printer_print_int(p, data_pack);\n      p = isl_printer_print_str(p, 
\" fifo_data;\");\n      p = isl_printer_end_line(p);\n    }    \n\n    if (kernel->sparse && is_sparse == 0 && stmt->u.i.in) {\n      /* [type] tmp_X[]; */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, group->array->type);      \n      p = isl_printer_print_str(p, \" \");\n      p = isl_printer_print_str(p, group->array->name);\n      p = isl_printer_print_str(p, \"_tmp[1][\");\n      p = isl_printer_print_int(p, group->n_lane);\n      p = isl_printer_print_str(p, \"];\");\n      p = isl_printer_end_line(p);\n\n      if (hls->target == XILINX_HW || hls->target == TAPA_HW) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=\");\n        p = isl_printer_print_str(p, group->array->name);\n        p = isl_printer_print_str(p, \"_tmp dim=0 complete\");\n        p = isl_printer_end_line(p);\n      }\n    }\n\n    if (stmt->u.i.in)\n    {\n      /* fifo_data = fifo.read(); */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fifo_data\");\n      if (kernel->sparse) {\n        p = isl_printer_print_str(p, \"_\");\n        p = isl_printer_print_str(p, group->array->name);    \n      }\n      p = isl_printer_print_str(p, \" = \");\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, fifo_name, 1);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, fifo_name, 1);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, fifo_name, 1);\n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, fifo_name, 1);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n      if (kernel->sparse) {        \n        /* [type] fifo_data = fifo_data_X; */\n        if (is_sparse) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \"_s_t\");\n          p = 
isl_printer_print_int(p, group->n_lane);\n          p = isl_printer_print_str(p, \" fifo_data = fifo_data_\");\n          p = isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);          \n        } else {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, group->n_lane);\n          p = isl_printer_print_str(p, \" fifo_data = fifo_data_\");\n          p = isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);          \n        }\n      }\n\n      if (hls->target == XILINX_HW)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int n = 0; n < \");\n        if (is_sparse)\n          p = isl_printer_print_int(p, group->n_lane * n_nzero);  \n        else\n          p = isl_printer_print_int(p, data_pack / nxt_data_pack);\n        p = isl_printer_print_str(p, \"; n++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"#pragma HLS UNROLL\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n        isl_ast_expr *op;\n        isl_ast_expr *expr = stmt->u.i.local_index;\n        int n_arg = isl_ast_expr_op_get_n_arg(expr);\n        /* Union */\n        if (nxt_data_pack == 1)\n        {\n          /* union {unsigned int ui; float ut;} u; */\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"union {unsigned int ui; \");\n          p = isl_printer_print_str(p, group->array->type);\n          p = isl_printer_print_str(p, \" ut;} u;\");\n          p = isl_printer_end_line(p);\n          /* u.ui = (unsigned int)fifo_data(32*next_data_pack - 1, 0); */\n          p = isl_printer_start_line(p);\n          if 
(kernel->sparse) {\n            if (is_sparse) \n              p = isl_printer_print_str(p, \"u.ui = (unsigned int)fifo_data.d(\");\n            else\n              p = isl_printer_print_str(p, \"u.ui = (unsigned int)fifo_data(\");\n          } else\n            p = isl_printer_print_str(p, \"u.ui = (unsigned int)fifo_data(\");\n          p = isl_printer_print_int(p, group->array->size * 8 * nxt_data_pack - 1);\n          p = isl_printer_print_str(p, \", 0);\");\n          p = isl_printer_end_line(p);\n        }\n        /* local[][n] = u.ut; or \n         * local[][n] = fifo_data(32*nxt_data_pack - 1, 0);\n         */\n        p = isl_printer_start_line(p);\n        op = isl_ast_expr_op_get_arg(expr, 0);        \n        if (kernel->sparse && group->local_array->is_sparse == 0 && group->local_array->array_type == AUTOSA_EXT_ARRAY) {\n          p = isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \"_tmp\");\n        } else {\n          p = isl_printer_print_ast_expr(p, op); // array_name\n        }\n\n        isl_ast_expr_free(op);\n        for (int i = 0; i < n_arg - 1; i++)\n        {\n          op = isl_ast_expr_op_get_arg(expr, 1 + i);\n          p = isl_printer_print_str(p, \"[\");\n          if (i == n_arg - 2)\n          {\n            if (stmt->u.i.simd_depth != -1) {\n              //DBGASTEXPR(stdout, op, ctx);\n              p = isl_printer_print_ast_expr(p, op);\n              p = isl_printer_print_str(p, \" + n\");\n            } else {\n              p = isl_printer_print_str(p, \"n\");\n            }\n          }\n          else\n          {\n            p = isl_printer_print_ast_expr(p, op);\n          }\n          p = isl_printer_print_str(p, \"]\");\n          isl_ast_expr_free(op);\n        }\n        p = isl_printer_print_str(p, \" = \");\n        if (nxt_data_pack == 1)\n        {\n          p = isl_printer_print_str(p, \"u.ut;\");\n          p = isl_printer_end_line(p);\n        }\n        else\n        {\n 
         p = isl_printer_print_str(p, \"fifo_data(\");\n          p = isl_printer_print_int(p, group->array->size * 8 * nxt_data_pack - 1);\n          p = isl_printer_print_str(p, \", 0)\");\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n        }\n        /* fifo_data = fifo_data >> 32*nxt_data_pack; */\n        p = isl_printer_start_line(p);\n        if (is_sparse)\n          p = isl_printer_print_str(p, \"fifo_data.d = fifo_data.d >> \");\n        else\n          p = isl_printer_print_str(p, \"fifo_data = fifo_data >> \");            \n        p = isl_printer_print_int(p, group->array->size * 8 * nxt_data_pack);\n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, -2);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"}\");\n        p = isl_printer_end_line(p);\n      }\n      else if (hls->target == INTEL_HW)\n      {\n        isl_ast_expr *op;\n        isl_ast_expr *expr = stmt->u.i.local_index;\n        int n_arg = isl_ast_expr_op_get_n_arg(expr);\n        for (int i = 0; i < data_pack / nxt_data_pack; i++)\n        {\n          /* local[][n] = fifo_data.sxxxx; */\n          p = isl_printer_start_line(p);\n          op = isl_ast_expr_op_get_arg(expr, 0);\n          p = isl_printer_print_ast_expr(p, op); // array_name\n          isl_ast_expr_free(op);\n          for (int j = 0; j < n_arg - 1; j++)\n          {\n            op = isl_ast_expr_op_get_arg(expr, 1 + j);\n            p = isl_printer_print_str(p, \"[\");\n            if (j == n_arg - 2)\n            {\n              p = isl_printer_print_int(p, i);\n            }\n            else\n            {\n              p = isl_printer_print_ast_expr(p, op);\n            }\n            p = isl_printer_print_str(p, \"]\");\n            isl_ast_expr_free(op);\n          }\n          if (nxt_data_pack > 1)\n            p = isl_printer_print_str(p, \".data\");\n          p = 
isl_printer_print_str(p, \" = fifo_data.data.s\");\n          for (int j = 0; j < nxt_data_pack; j++)\n          {\n            p = isl_printer_print_str(p, vector_index[j + i * nxt_data_pack]);\n          }\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n        }\n      } else if (hls->target == CATAPULT_HW) {\n        p = print_str_new_line(p, \"#pragma unroll yes\");\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int n = 0; n < \");\n        if (is_sparse)\n          p = isl_printer_print_int(p, group->n_lane * n_nzero);  \n        else\n          p = isl_printer_print_int(p, data_pack / nxt_data_pack);\n        p = isl_printer_print_str(p, \"; n++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n        isl_ast_expr *op;\n        isl_ast_expr *expr = stmt->u.i.local_index;\n        int n_arg = isl_ast_expr_op_get_n_arg(expr);\n        /* local[][n] = fifo_data.slc(); */\n        p = isl_printer_start_line(p);\n        op = isl_ast_expr_op_get_arg(expr, 0);        \n        if (kernel->sparse && group->local_array->is_sparse == 0 && group->local_array->array_type == AUTOSA_EXT_ARRAY) {\n          p = isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \"_tmp\");\n        } else {\n          p = isl_printer_print_ast_expr(p, op); // array_name\n        }\n        isl_ast_expr_free(op);\n        for (int i = 0; i < n_arg - 1; i++) {\n          op = isl_ast_expr_op_get_arg(expr, 1 + i);\n          p = isl_printer_print_str(p, \"[\");\n          if (i == n_arg - 2)\n          {\n            if (stmt->u.i.simd_depth != -1) {\n              //DBGASTEXPR(stdout, op, ctx);\n              p = isl_printer_print_ast_expr(p, op);\n              p = isl_printer_print_str(p, \" + n\");\n            } else {\n              p = isl_printer_print_str(p, \"n\");\n            }\n          }\n          else\n          {\n            p 
= isl_printer_print_ast_expr(p, op);\n          }\n          p = isl_printer_print_str(p, \"]\");\n          isl_ast_expr_free(op);\n        }\n        p = isl_printer_print_str(p, \" = (\");\n        p = isl_printer_print_str(p, group->array->name);\n        p = isl_printer_print_str(p, \"_t\");\n        p = isl_printer_print_int(p, nxt_data_pack);\n        p = isl_printer_print_str(p, \")fifo_data.slc<\");\n        p = isl_printer_print_int(p, group->array->size * 8 * nxt_data_pack);\n        p = isl_printer_print_str(p, \">(0);\");        \n        p = isl_printer_end_line(p);\n\n        /* fifo_data = fifo_data >> xx * nxt_data_pack; */\n        p = isl_printer_start_line(p);\n        if (is_sparse)\n          p = isl_printer_print_str(p, \"fifo_data.d = fifo_data.d >> \");\n        else\n          p = isl_printer_print_str(p, \"fifo_data = fifo_data >> \");      \n        p = isl_printer_print_int(p, group->array->size * 8 * nxt_data_pack);\n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, -2);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"}\");\n        p = isl_printer_end_line(p);  \n      }\n      else if (hls->target == TAPA_HW)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int n = 0; n < \");\n        if (is_sparse)\n          p = isl_printer_print_int(p, group->n_lane * n_nzero);\n        else\n          p = isl_printer_print_int(p, data_pack / nxt_data_pack);\n        p = isl_printer_print_str(p, \"; n++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"#pragma HLS UNROLL\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n        isl_ast_expr *op;\n        isl_ast_expr *expr = stmt->u.i.local_index;\n        int n_arg = isl_ast_expr_op_get_n_arg(expr);\n        /* local[][n] = fifo_data[n]; */\n       
 p = isl_printer_start_line(p);\n        op = isl_ast_expr_op_get_arg(expr, 0);\n        if (kernel->sparse && group->local_array->is_sparse == 0 && group->local_array->array_type == AUTOSA_EXT_ARRAY) {\n          p = isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \"_tmp\");\n        } else {\n          p = isl_printer_print_ast_expr(p, op); // array_name\n        }\n\n        isl_ast_expr_free(op);\n        for (int i = 0; i < n_arg - 1; i++) {\n          op = isl_ast_expr_op_get_arg(expr, 1 + i);\n          p = isl_printer_print_str(p, \"[\");\n          if (i == n_arg - 2) {\n            if (stmt->u.i.simd_depth != -1) {\n              //DBGASTEXPR(stdout, op, ctx);\n              p = isl_printer_print_ast_expr(p, op);\n              p = isl_printer_print_str(p, \" + n\");\n            } else {\n              p = isl_printer_print_str(p, \"n\");\n            }\n          } else {\n            p = isl_printer_print_ast_expr(p, op);\n          }\n          p = isl_printer_print_str(p, \"]\");\n          isl_ast_expr_free(op);\n        }\n        p = isl_printer_print_str(p, \" = \");\n        if (nxt_data_pack == 1)\n          p = isl_printer_print_str(p, \"fifo_data[n];\");\n        else {\n          p = isl_printer_print_str(p, \"tapa::truncated<\");\n          p = isl_printer_print_int(p, nxt_data_pack);\n          p = isl_printer_print_str(p, \">(fifo_data, \");\n          p = isl_printer_print_int(p, nxt_data_pack);\n          p = isl_printer_print_str(p, \"* n);\");\n        }\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, -2);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"}\");\n        p = isl_printer_end_line(p);\n      }\n\n      if (kernel->sparse && group->local_array->is_sparse == 0) {\n        /* Print the extra data selection code. 
*/        \n        int index_s, index_w;\n        int pos_w;\n\n        p = isl_printer_start_line(p);\n        index_w = (int)log2f((float)group->n_lane);\n        if (hls->target == XILINX_HW || hls->target == TAPA_HW) {\n          p = isl_printer_print_str(p, \"ap_uint<\");\n          p = isl_printer_print_int(p, index_w);\n        } else if (hls->target == CATAPULT_HW) {\n          p = isl_printer_print_str(p, \"ac_int<\");\n          p = isl_printer_print_int(p, index_w);\n          p = isl_printer_print_str(p, \", false\");\n        }\n        p = isl_printer_print_str(p, \"> index[\");\n        index_s = group->n_lane / kernel->vec_len * kernel->n_nzero;\n        p = isl_printer_print_int(p, index_s);\n        p = isl_printer_print_str(p, \"];\");\n        p = isl_printer_end_line(p);\n\n        if (hls->target == XILINX_HW || hls->target == TAPA_HW) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=index dim=0 complete\");\n          p = isl_printer_end_line(p);\n        }\n\n        //p = print_str_new_line(p, \"unsigned char index = 0;\");\n        \n        p = isl_printer_start_line(p);\n        struct autosa_local_array_info *sparse_array;\n        for (int i = 0; i < kernel->n_array; i++) {\n          sparse_array = &kernel->array[i];\n          if (sparse_array->is_sparse)\n            break;\n        }\n        p = isl_printer_print_str(p, sparse_array->array->name);\n        p = isl_printer_print_str(p, \"_s_t\");\n        p = isl_printer_print_int(p, group->n_lane / kernel->vec_len);\n        p = isl_printer_print_str(p, \" \");        \n        p = isl_printer_print_str(p, \"s_tmp = fifo_data_\");\n        p = isl_printer_print_str(p, sparse_array->array->name);\n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n\n        pos_w = (int)log2f((float)index_s);\n        p = isl_printer_start_line(p);\n        if (hls->target == XILINX_HW || 
hls->target == TAPA_HW) {\n          p = isl_printer_print_str(p, \"ap_uint<\");\n          p = isl_printer_print_int(p, pos_w);\n        } else if (hls->target == CATAPULT_HW) {\n          p = isl_printer_print_str(p, \"ac_int<\");\n          p = isl_printer_print_int(p, pos_w);          \n          p = isl_printer_print_str(p, \", false\");\n        }\n        p = isl_printer_print_str(p, \"> pos = 0;\");\n        p = isl_printer_end_line(p);\n\n        if (hls->target == CATAPULT_HW) {\n          p = print_str_new_line(p, \"#pragma unroll yes\");\n        }\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int n = 0; n < \");\n        p = isl_printer_print_int(p, group->n_lane / kernel->vec_len);\n        p = isl_printer_print_str(p, \"; n++) {\");\n        p = isl_printer_end_line(p);\n\n        if (hls->target == XILINX_HW || hls->target == TAPA_HW) {\n          p = print_str_new_line(p, \"#pragma HLS UNROLL\");\n        }\n\n        p = isl_printer_indent(p, 2);        \n        p = print_str_new_line(p, \"unsigned char offset = s_tmp.i(7, 0);\");\n        p = print_str_new_line(p, \"s_tmp.i = s_tmp.i >> 8;\");\n        \n        if (hls->target == CATAPULT_HW) {\n          p = print_str_new_line(p, \"#pragma unroll yes\");\n        }\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int m = 0; m < \");        \n        p = isl_printer_print_int(p, kernel->vec_len);\n        p = isl_printer_print_str(p, \"; m++) {\");\n        p = isl_printer_end_line(p);\n        \n        if (hls->target == XILINX_HW || hls->target == TAPA_HW) {\n          p = print_str_new_line(p, \"#pragma HLS UNROLL\");\n        }\n        \n        p = isl_printer_indent(p, 2);\n        if (hls->target == XILINX_HW || hls->target == TAPA_HW) {\n          p = print_str_new_line(p, \"if ((ap_uint<1>)(offset & 1) == (ap_uint<1>)1) {\");\n        } else if (hls->target == CATAPULT_HW) {\n          p = 
print_str_new_line(p, \"if ((ac_int<1, false>)(offset & 1) == (ac_int<1, false>)1) {\");\n        }\n        p = isl_printer_indent(p, 2);\n        \n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"index[pos] = n * \");\n        p = isl_printer_print_int(p, kernel->vec_len);\n        p = isl_printer_print_str(p, \" + m;\");        \n        p = isl_printer_end_line(p);\n\n        p = print_str_new_line(p, \"pos++;\");\n\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n        p = print_str_new_line(p, \"offset = offset >> 1;\");\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n\n        if (hls->target == CATAPULT_HW) {\n          p = print_str_new_line(p, \"#pragma unroll yes\");\n        }\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int n = 0; n < \");\n        p = isl_printer_print_int(p, group->n_lane / kernel->vec_len * kernel->n_nzero);\n        p = isl_printer_print_str(p, \"; n++) {\");\n        p = isl_printer_end_line(p);\n\n        if (hls->target == XILINX_HW || hls->target == TAPA_HW) {\n          p = print_str_new_line(p, \"#pragma HLS UNROLL\");\n        }\n\n        p = isl_printer_indent(p, 2);\n        p = isl_printer_start_line(p);\n        isl_ast_expr *op;\n        isl_ast_expr *expr = stmt->u.i.local_index;\n        int n_arg = isl_ast_expr_op_get_n_arg(expr);\n        op = isl_ast_expr_op_get_arg(expr, 0);\n        p = isl_printer_print_ast_expr(p, op); // array_name;\n        isl_ast_expr_free(op);\n        for (int i = 0; i < n_arg - 1; i++) {\n          op = isl_ast_expr_op_get_arg(expr, 1 + i);\n          p = isl_printer_print_str(p, \"[\");\n          if (i == n_arg - 2) {\n            if (stmt->u.i.simd_depth != -1) {\n              p = isl_printer_print_ast_expr(p, op);\n              p = isl_printer_print_str(p, 
\" + n\");\n            } else {\n              p = isl_printer_print_str(p, \"n\");\n            }\n          } else {\n            p = isl_printer_print_ast_expr(p, op);\n          }\n          p = isl_printer_print_str(p, \"]\");\n          isl_ast_expr_free(op);\n        }\n        p = isl_printer_print_str(p, \" = \");\n        p = isl_printer_print_str(p, group->array->name);\n        p = isl_printer_print_str(p, \"_tmp[0][index[n]];\");\n        p = isl_printer_end_line(p);\n        \n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n      }\n    }\n    else\n    {\n      if (hls->target == XILINX_HW)\n      {\n        if (kernel->sparse) {\n          p = isl_printer_start_line(p);\n          p = print_fifo_rw_xilinx(p, fifo_name, 0);\n          p = isl_printer_print_str(p, \"fifo_data_\");\n          p = isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \");\");\n          p = isl_printer_end_line(p);\n        } else {\n          if (nxt_data_pack == 1)\n          {\n            /* union {unsigned int ui; float ut;} u1, u0; */\n            p = isl_printer_start_line(p);\n            p = isl_printer_print_str(p, \"union {unsigned int ui; \");\n            p = isl_printer_print_str(p, group->array->type);\n            p = isl_printer_print_str(p, \" ut;} \");\n            int first = 1;\n            for (int i = data_pack / nxt_data_pack - 1; i >= 0; i--)\n            {\n              if (!first)\n                p = isl_printer_print_str(p, \", \");\n              p = isl_printer_print_str(p, \"u\");\n              p = isl_printer_print_int(p, i);\n              first = 0;\n            }\n            p = isl_printer_print_str(p, \";\");\n            p = isl_printer_end_line(p);\n            /* u1 = local[][1];\n             * u0 = local[][0];\n             */\n            for (int i = data_pack / nxt_data_pack - 1; i >= 0; i--)\n            {\n              isl_ast_expr *expr = 
stmt->u.i.local_index;\n              isl_ast_expr *op;\n              int n_arg = isl_ast_expr_op_get_n_arg(expr);\n              p = isl_printer_start_line(p);\n              p = isl_printer_print_str(p, \"u\");\n              p = isl_printer_print_int(p, i);\n              p = isl_printer_print_str(p, \".ut = \");\n              op = isl_ast_expr_op_get_arg(expr, 0);\n              p = isl_printer_print_ast_expr(p, op);\n              isl_ast_expr_free(op);\n              for (int j = 0; j < n_arg - 1; j++)\n              {\n                op = isl_ast_expr_op_get_arg(expr, 1 + j);\n                p = isl_printer_print_str(p, \"[\");\n                if (j == n_arg - 2)\n                {\n                  if (stmt->u.i.simd_depth != -1) {\n                    p = isl_printer_print_ast_expr(p, op);\n                    p = isl_printer_print_str(p, \" + \");\n                  }\n                  p = isl_printer_print_int(p, i);\n                }\n                else\n                {\n                  p = isl_printer_print_ast_expr(p, op);\n                }\n                p = isl_printer_print_str(p, \"]\");\n                isl_ast_expr_free(op);\n              }\n              p = isl_printer_print_str(p, \";\");\n              p = isl_printer_end_line(p);\n            }\n          }\n          /* fifo_data = (ap_uint<32*nxt_data_pack>(u1.ui), \n           *              ap_uint<32*nxt_data_pack>(u0.ui)); */\n          int first = 1;\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"fifo_data = (\");\n          for (int i = data_pack / nxt_data_pack - 1; i >= 0; i--)\n          {\n            isl_ast_expr *expr = stmt->u.i.local_index;\n            isl_ast_expr *op;\n            int n_arg = isl_ast_expr_op_get_n_arg(expr);\n            if (!first)\n              p = isl_printer_print_str(p, \", \");\n            if (nxt_data_pack == 1)\n            {\n              p = isl_printer_print_str(p, \"ap_uint<\");\n       
       p = isl_printer_print_int(p, group->array->size * 8 * nxt_data_pack);\n              p = isl_printer_print_str(p, \">(u\");\n              p = isl_printer_print_int(p, i);\n              p = isl_printer_print_str(p, \".ui)\");\n            }\n            else\n            {\n              op = isl_ast_expr_op_get_arg(expr, 0);\n              p = isl_printer_print_ast_expr(p, op);\n              isl_ast_expr_free(op);\n              for (int j = 0; j < n_arg - 1; j++)\n              {\n                op = isl_ast_expr_op_get_arg(expr, 1 + j);\n                p = isl_printer_print_str(p, \"[\");\n                if (j == n_arg - 2)\n                {\n                  p = isl_printer_print_int(p, i);\n                }\n                else\n                {\n                  p = isl_printer_print_ast_expr(p, op);\n                }\n                p = isl_printer_print_str(p, \"]\");\n                isl_ast_expr_free(op);\n              }\n            }\n            first = 0;\n          }\n          p = isl_printer_print_str(p, \");\");\n          p = isl_printer_end_line(p);\n          p = isl_printer_start_line(p);\n          p = print_fifo_rw_xilinx(p, fifo_name, 0);\n          p = isl_printer_print_str(p, \"fifo_data);\");\n          p = isl_printer_end_line(p);\n        }\n      }\n      else if (hls->target == INTEL_HW)\n      {\n        /* fifo_data = (float4)((float2)local[][1], (float2)local[][0]); */\n        int first = 1;\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"fifo_data.data = (\");\n        if (data_pack == 1)\n        {\n          p = isl_printer_print_str(p, group->array->type);\n        }\n        else\n        {\n          //p = isl_printer_print_str(p, group->array->name);\n          //p = isl_printer_print_str(p, \"_t\");\n          //p = isl_printer_print_int(p, data_pack);\n          p = isl_printer_print_str(p, group->array->type);\n          p = isl_printer_print_int(p, data_pack);\n      
  }\n        p = isl_printer_print_str(p, \")(\");\n        //for (int i = data_pack / nxt_data_pack - 1; i >= 0; i--)\n        for (int i = 0; i < data_pack / nxt_data_pack; i++)\n        {\n          isl_ast_expr *expr = stmt->u.i.local_index;\n          isl_ast_expr *op;\n          int n_arg = isl_ast_expr_op_get_n_arg(expr);\n          if (!first)\n            p = isl_printer_print_str(p, \", \");\n          p = isl_printer_print_str(p, \"(\");\n          if (nxt_data_pack == 1)\n          {\n            p = isl_printer_print_str(p, group->array->type);\n          }\n          else\n          {\n            p = isl_printer_print_str(p, group->array->name);\n            p = isl_printer_print_str(p, \"_t\");\n            p = isl_printer_print_int(p, nxt_data_pack);\n          }\n          p = isl_printer_print_str(p, \")\");\n          op = isl_ast_expr_op_get_arg(expr, 0);\n          p = isl_printer_print_ast_expr(p, op);\n          isl_ast_expr_free(op);\n          for (int j = 0; j < n_arg - 1; j++)\n          {\n            op = isl_ast_expr_op_get_arg(expr, 1 + j);\n            p = isl_printer_print_str(p, \"[\");\n            if (j == n_arg - 2)\n            {\n              p = isl_printer_print_int(p, i);\n            }\n            else\n            {\n              p = isl_printer_print_ast_expr(p, op);\n            }\n            p = isl_printer_print_str(p, \"]\");\n            isl_ast_expr_free(op);\n            if (nxt_data_pack > 1)\n              p = isl_printer_print_str(p, \".data\");\n          }\n          first = 0;\n        }\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n        /* write_channel_intel(fifo, fifo_data); */\n        p = isl_printer_start_line(p);\n        p = print_fifo_rw_intel(p, fifo_name, 0);\n        p = isl_printer_print_str(p, \"fifo_data);\");\n        p = isl_printer_end_line(p);\n      } else if (hls->target == CATAPULT_HW) {\n        if (kernel->sparse) {\n          p = 
isl_printer_start_line(p);\n          p = print_fifo_rw_catapult(p, fifo_name, 0);          \n          p = isl_printer_print_str(p, \"fifo_data_\");\n          p = isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \");\");\n          p = isl_printer_end_line(p);\n        } else {          \n          for (int i = data_pack / nxt_data_pack - 1; i >= 0; i--) {\n            p = isl_printer_start_line(p);\n            p = isl_printer_print_str(p, \"fifo_data.set_slc(\");\n            p = isl_printer_print_int(p, group->array->size * 8 * nxt_data_pack * i);\n            p = isl_printer_print_str(p, \", \");\n\n            isl_ast_expr *expr = stmt->u.i.local_index;\n            isl_ast_expr *op;\n            int n_arg = isl_ast_expr_op_get_n_arg(expr);\n            op = isl_ast_expr_op_get_arg(expr, 0);\n            p = isl_printer_print_ast_expr(p, op);\n            isl_ast_expr_free(op);\n            for (int j = 0; j < n_arg - 1; j++)\n            {\n              op = isl_ast_expr_op_get_arg(expr, 1 + j);\n              p = isl_printer_print_str(p, \"[\");\n              if (j == n_arg - 2)\n              {\n                p = isl_printer_print_int(p, i);\n              }\n              else\n              {\n                p = isl_printer_print_ast_expr(p, op);\n              }\n              p = isl_printer_print_str(p, \"]\");\n              isl_ast_expr_free(op);\n            }\n            p = isl_printer_print_str(p, \");\");\n            p = isl_printer_end_line(p);\n          }\n\n          p = isl_printer_start_line(p);\n          p = print_fifo_rw_catapult(p, fifo_name, 0);\n          p = isl_printer_print_str(p, \"fifo_data);\");\n          p = isl_printer_end_line(p);\n        }\n      } else if (hls->target == TAPA_HW) {\n        if (kernel->sparse) {\n          p = isl_printer_start_line(p);\n          p = print_fifo_rw_tapa(p, fifo_name, 0);\n          p = isl_printer_print_str(p, \"fifo_data_\");\n          p = 
isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \");\");\n          p = isl_printer_end_line(p);\n        } else {\n          if (nxt_data_pack == 1)\n          {\n            /* float f1, f0; */\n            p = isl_printer_start_line(p);\n            p = isl_printer_print_str(p, group->array->type);\n            p = isl_printer_print_str(p, \" \");\n            int first = 1;\n            for (int i = data_pack / nxt_data_pack - 1; i >= 0; i--) {\n              if (!first)\n                p = isl_printer_print_str(p, \", \");\n              p = isl_printer_print_str(p, \"f\");\n              p = isl_printer_print_int(p, i);\n              first = 0;\n            }\n            p = isl_printer_print_str(p, \";\");\n            p = isl_printer_end_line(p);\n            /* f1 = local[][1];\n             * f0 = local[][0]; */\n            for (int i = data_pack / nxt_data_pack - 1; i >= 0; i--)\n            {\n              isl_ast_expr *expr = stmt->u.i.local_index;\n              isl_ast_expr *op;\n              int n_arg = isl_ast_expr_op_get_n_arg(expr);\n              p = isl_printer_start_line(p);\n              p = isl_printer_print_str(p, \"f\");\n              p = isl_printer_print_int(p, i);\n              p = isl_printer_print_str(p, \" = \");\n              op = isl_ast_expr_op_get_arg(expr, 0);\n              p = isl_printer_print_ast_expr(p, op);\n              isl_ast_expr_free(op);\n              for (int j = 0; j < n_arg - 1; j++) {\n                op = isl_ast_expr_op_get_arg(expr, 1 + j);\n                p = isl_printer_print_str(p, \"[\");\n                if (j == n_arg - 2) {\n                  if (stmt->u.i.simd_depth != -1) {\n                    p = isl_printer_print_ast_expr(p, op);\n                    p = isl_printer_print_str(p, \" + \");\n                  }\n                  p = isl_printer_print_int(p, i);\n                } else {\n                  p = isl_printer_print_ast_expr(p, op);\n     
           }\n                p = isl_printer_print_str(p, \"]\");\n                isl_ast_expr_free(op);\n              }\n              p = isl_printer_print_str(p, \";\");\n              p = isl_printer_end_line(p);\n            }\n          }\n          /* fifo_data = [f1, f0]; */\n          for (int i = data_pack / nxt_data_pack - 1; i >= 0; i--) {\n            isl_ast_expr *expr = stmt->u.i.local_index;\n            isl_ast_expr *op;\n            int n_arg = isl_ast_expr_op_get_n_arg(expr);\n            if (nxt_data_pack == 1) {\n              p = isl_printer_start_line(p);\n              p = isl_printer_print_str(p, \"fifo_data.set(\");\n              p = isl_printer_print_int(p, i);\n              p = isl_printer_print_str(p, \", f\");\n              p = isl_printer_print_int(p, i);\n              p = isl_printer_print_str(p, \");\");\n              p = isl_printer_end_line(p);\n            } else {\n              for (int j = 0; j < nxt_data_pack; j++) {\n                p = isl_printer_start_line(p);\n                p = isl_printer_print_str(p, \"fifo_data.set(\");\n                p = isl_printer_print_int(p, i * nxt_data_pack + j);\n                p = isl_printer_print_str(p, \", \");\n\n                op = isl_ast_expr_op_get_arg(expr, 0);\n                p = isl_printer_print_ast_expr(p, op);\n                isl_ast_expr_free(op);\n\n                for (int k = 0; k < n_arg - 1; k++) {\n                  op = isl_ast_expr_op_get_arg(expr, 1 + k);\n                  p = isl_printer_print_str(p, \"[\");\n                  if (k == n_arg - 2) {\n                    p = isl_printer_print_int(p, i);\n                  } else {\n                    p = isl_printer_print_ast_expr(p, op);\n                  }\n                  p = isl_printer_print_str(p, \"]\");\n                  isl_ast_expr_free(op);\n                }\n\n                p = isl_printer_print_str(p, \"[\");\n                p = isl_printer_print_int(p, j);\n                p = 
isl_printer_print_str(p, \"]);\");\n                p = isl_printer_end_line(p);\n              }\n            }\n          }\n          p = isl_printer_start_line(p);\n          p = print_fifo_rw_tapa(p, fifo_name, 0);\n          p = isl_printer_print_str(p, \"fifo_data);\");\n          p = isl_printer_end_line(p);\n        }\n      }\n    }\n    p = ppcg_end_block(p);\n  }\n  \n  free(fifo_name);\n  isl_ast_expr_free(local_index_packed);  \n  return p;\n}\n\n__isl_give isl_printer *autosa_print_reduce_data_pack(\n  __isl_take isl_printer *p,\n  struct autosa_kernel_stmt *stmt,\n  int data_pack_in,\n  int data_pack_out,\n  struct autosa_array_ref_group *group,\n  enum platform target\n  )\n{  \n  p = print_str_new_line(p, \"/* Local Reduction */\");\n\n  if (target == XILINX_HW) {\n    /* union {unsigned int ui; data_t uf;} uin_0, uin_1, ... uout_0, uout_1, ...; */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"union {unsigned int ui; \");\n    p = isl_printer_print_str(p, group->array->type);\n    p = isl_printer_print_str(p, \" ut;} \");\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_print_str(p, \"uin_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \", \");\n    }\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_print_str(p, \"uout_\");\n      p = isl_printer_print_int(p, i);\n      if (i == data_pack_in - 1) {\n        p = isl_printer_print_str(p, \";\");\n      } else {\n        p = isl_printer_print_str(p, \", \");\n      }\n    }\n    p = isl_printer_end_line(p);\n\n    /* assign the fifo_data and buf_data_split[split_i] to union vars. 
*/\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"uin_\");\n      p = isl_printer_print_int(p, i);\n      if (data_pack_in == 1) {\n        p = isl_printer_print_str(p, \".ut = in_data;\");\n      } else {\n        p = isl_printer_print_str(p, \".ui = (unsigned int)in_data(\");\n        p = isl_printer_print_int(p, group->array->size * 8 * (i + 1) - 1);\n        p = isl_printer_print_str(p, \", \");\n        p = isl_printer_print_int(p, group->array->size * 8 * i);\n        p = isl_printer_print_str(p, \");\");\n      }\n      p = isl_printer_end_line(p);\n    }\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"uout_\");\n      p = isl_printer_print_int(p, i);    \n      p = isl_printer_print_str(p, \".ui = (unsigned int)data_split[split_idx](\");\n      p = isl_printer_print_int(p, group->array->size * 8 * (i + 1) - 1);\n      p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_int(p, group->array->size * 8 * i);\n      p = isl_printer_print_str(p, \");\");    \n      p = isl_printer_end_line(p);\n    }\n\n    /* perform reduction. */\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"uout_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \".ut \");\n      p = isl_printer_print_str(p, stmt->u.i.reduce_op);\n      p = isl_printer_print_str(p, \"= \");\n      p = isl_printer_print_str(p, \"uin_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \".ut;\");\n      p = isl_printer_end_line(p);\n    }\n\n    /* re-assign the reduced values to the buf_data_split[i]. 
*/\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"data_split[split_idx] = \");\n    p = isl_printer_print_str(p, \"(\");\n    for (int i = data_pack_in - 1; i >= 0; i--) {    \n      if (i != data_pack_in - 1)\n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"(ap_uint<\");\n      p = isl_printer_print_int(p, group->array->size * 8);\n      p = isl_printer_print_str(p, \">)\");\n      p = isl_printer_print_str(p, \"uout_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \".ui\");\n    }\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n  } else if (target == CATAPULT_HW) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, group->array->name);\n    p = isl_printer_print_str(p, \"_t1 \");\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_print_str(p, \"uin_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \", \");\n    }\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_print_str(p, \"uout_\");\n      p = isl_printer_print_int(p, i);\n      if (i == data_pack_in - 1) {\n        p = isl_printer_print_str(p, \";\");\n      } else {\n        p = isl_printer_print_str(p, \", \");\n      }\n    }\n    p = isl_printer_end_line(p);\n\n    /* assign the fifo_data and buf_data_split[split_i] to vars. 
*/\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"uin_\");\n      p = isl_printer_print_int(p, i);      \n      if (data_pack_in == 1) {\n        p = isl_printer_print_str(p, \" = in_data\");      \n      } else {\n        p = isl_printer_print_str(p, \" = in_data.slc<\");\n        p = isl_printer_print_int(p, group->array->size * 8);\n        p = isl_printer_print_str(p, \">(\");\n        p = isl_printer_print_int(p, group->array->size * 8 * i);\n        p = isl_printer_print_str(p, \");\");\n      }\n      p = isl_printer_end_line(p);\n    }\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"uout_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \" = data_split[split_idx].slc<\");\n      p = isl_printer_print_int(p, group->array->size * 8);\n      p = isl_printer_print_str(p, \">(\");\n      p = isl_printer_print_int(p, group->array->size * 8 * i);\n      p = isl_printer_print_str(p, \");\");\n      p = isl_printer_end_line(p);\n    }\n\n    /* perform reduction */\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"uout_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \" \");\n      p = isl_printer_print_str(p, stmt->u.i.reduce_op);\n      p = isl_printer_print_str(p, \"= \");\n      p = isl_printer_print_str(p, \"uin_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n\n    /* re-assign the reduced values to the buf_data_split[i]. 
*/\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"data_split[split_idx].set_slc(\");\n      p = isl_printer_print_int(p, group->array->size * 8 * i);\n      p = isl_printer_print_str(p, \", uout_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \")\");\n    }    \n  } else if (target == TAPA_HW) {\n    /* data_t din_0, din_1, ... dout_0, dout_1, ...; */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, group->array->type);\n    p = isl_printer_print_str(p, \" \");\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_print_str(p, \"din_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \", \");\n    }\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_print_str(p, \"dout_\");\n      p = isl_printer_print_int(p, i);\n      if (i == data_pack_in - 1) {\n        p = isl_printer_print_str(p, \";\");\n      } else {\n        p = isl_printer_print_str(p, \", \");\n      }\n    }\n    p = isl_printer_end_line(p);\n\n    /* assign the fifo_data and buf_data_split[split_i] into vars. 
*/\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"din_\");\n      p = isl_printer_print_int(p, i);\n      if (data_pack_in == 1) {\n        p = isl_printer_print_str(p, \" = in_data;\");\n      } else {\n        p = isl_printer_print_str(p, \" = in_data[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"];\");\n      }\n      p = isl_printer_end_line(p);\n    }\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"dout_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \" = data_split[split_idx]\");\n      if (data_pack_in > 1) {\n        p = isl_printer_print_str(p, \"[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"]\");\n      }\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n\n    /* perform reduction. */\n    for (int i = 0; i < data_pack_in; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"dout_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \" \");\n      p = isl_printer_print_str(p, stmt->u.i.reduce_op);\n      p = isl_printer_print_str(p, \"= \");\n      p = isl_printer_print_str(p, \"din_\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n\n    /* re-assign the reduced values to the buf_data_split[i]. 
*/\n    for (int i = data_pack_in - 1; i >= 0; i--) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"data_split[split_idx]\");\n      if (data_pack_in > 1) {\n        p = isl_printer_print_str(p, \".set(\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \", \");\n      } else {\n        p = isl_printer_print_str(p, \" = \");\n      }\n      p = isl_printer_print_str(p, \"dout_\");\n      p = isl_printer_print_int(p, i);\n      if (data_pack_in > 1) {\n        p = isl_printer_print_str(p, \")\");\n      }\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  p = print_str_new_line(p, \"/* Local Reduction */\");\n\n  return p;\n}\n\n__isl_give isl_printer *autosa_print_reduce_default(\n  __isl_take isl_printer *p,\n  struct autosa_kernel_stmt *stmt,\n  int data_pack,\n  isl_ast_expr *index,\n  struct autosa_array_ref_group *group)\n{\n  p = print_str_new_line(p, \"/* Local Reduction */\");\n\n  /* union {unsigned int ui; data_t ut;} u... */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"union {unsigned int ui; \");\n  p = isl_printer_print_str(p, group->array->type);\n  p = isl_printer_print_str(p, \" ut;} \");\n  for (int i = 0; i < data_pack; i++) {\n    p = isl_printer_print_str(p, \"uin_\");\n    p = isl_printer_print_int(p, i);\n    p = isl_printer_print_str(p, \", \");\n  }\n  for (int i = 0; i < data_pack; i++) {\n    p = isl_printer_print_str(p, \"uout_\");\n    p = isl_printer_print_int(p, i);\n    if (i == data_pack - 1) {\n      p = isl_printer_print_str(p, \";\");\n    } else {\n      p = isl_printer_print_str(p, \", \");\n    }\n  }\n  p = isl_printer_end_line(p);\n\n  /* assign fifo_data to uxx, assign local_data to uxx. 
*/\n  for (int i = 0; i < data_pack; i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"uin_\");\n    p = isl_printer_print_int(p, i);\n    if (data_pack == 1) {\n      p = isl_printer_print_str(p, \".ut = in_data;\");\n    } else {\n      p = isl_printer_print_str(p, \".ui = (unsigned int)in_data(\");\n      p = isl_printer_print_int(p, group->array->size * 8 * (i + 1) - 1);\n      p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_int(p, group->array->size * 8 * i);\n      p = isl_printer_print_str(p, \");\");\n    }\n    p = isl_printer_end_line(p);\n  }\n  for (int i = 0; i < data_pack; i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"uout_\");\n    p = isl_printer_print_int(p, i);    \n    if (data_pack == 1) {\n      p = isl_printer_print_str(p, \".ut = \");\n      if (stmt->u.i.module->double_buffer &&\n          stmt->u.i.module->options->autosa->double_buffer_style == 0)\n        throw std::runtime_error(\"[AutoSA] Error: Local reduce for double buffer style 0 is not supported!\");\n      else {        \n        p = isl_printer_print_ast_expr(p, index);\n      }\n      p = isl_printer_print_str(p, \";\");      \n    } else {\n      p = isl_printer_print_str(p, \".ui = (unsigned int)\");\n      p = isl_printer_print_ast_expr(p, index);\n      p = isl_printer_print_str(p, \"(\");\n      p = isl_printer_print_int(p, group->array->size * 8 * (i + 1) - 1);\n      p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_int(p, group->array->size * 8 * i);\n      p = isl_printer_print_str(p, \");\");\n    }\n    p = isl_printer_end_line(p);\n  }\n\n  /* perform reduction. 
*/\n  for (int i = 0; i < data_pack; i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"uout_\");\n    p = isl_printer_print_int(p, i);\n    p = isl_printer_print_str(p, \".ut \");\n    p = isl_printer_print_str(p, stmt->u.i.reduce_op);\n    p = isl_printer_print_str(p, \"= \");\n    p = isl_printer_print_str(p, \"uin_\");\n    p = isl_printer_print_int(p, i);\n    p = isl_printer_print_str(p, \".ut;\");\n    p = isl_printer_end_line(p);\n  }\n\n  /* reassign uxx to local[][] */\n  p = isl_printer_start_line(p);\n  //p = isl_printer_print_ast_expr(p, index);\n  p = isl_printer_print_str(p, \"out_data\");\n  p = isl_printer_print_str(p, \" = \");\n  if (data_pack == 1) {\n    p = isl_printer_print_str(p, \"uout_0.ut;\");    \n  } else {\n    p = isl_printer_print_str(p, \"(\");\n    int is_first = 1;\n    for (int i = data_pack - 1; i >= 0; i--) {\n      if (!is_first)\n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"(ap_uint<\");\n      p = isl_printer_print_int(p, group->array->size * 8);\n      p = isl_printer_print_str(p, \">)uout_\");\n      p = isl_printer_print_int(p, i);   \n      p = isl_printer_print_str(p, \".ui\");\n      is_first = 0;\n    }\n    p = isl_printer_print_str(p, \");\");\n  }\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"/* Local Reduction */\");\n\n  return p;\n}\n\n/* Print an I/O transfer statement.\n *\n * An in I/O statement is printed as\n *\n *  [type] fifo_data;\n *  fifo_data = fifo.read();\n *  if (filter_condition) {\n *    local[] = fifo_data; // if buf == 1\n *    fifo_local.write(fifo_data); // if buf == 0\n *  } else {\n *    fifo.write(fifo_data);\n *  }\n *\n * if filter_depth < 0\n *\n *  [type] fifo_data;\n *  fifo_data = fifo.read();\n *  local = fifo_data; // if buf == 1\n *  fifo_local.write(fifo_data); // if buf == 0\n *\n * An out I/O statement is printed as \n *\n *  [type] fifo_data;\n *  fifo_data = fifo.read();\n *  if 
(filter_condition) {\n *    fifo_data = local[]; // if buf == 1\n *    fifo_data = fifo_local.read(); // if buf == 0\n *  } else {\n *    fifo_data = fifo.read();\n *  }\n *  fifo.write(fifo_data);\n */\nstatic __isl_give isl_printer *autosa_kernel_print_io_transfer_default(\n    __isl_take isl_printer *p, struct autosa_kernel_stmt *stmt,\n    struct autosa_array_ref_group *group, int n_lane, struct hls_info *hls,\n    const char *iterator_prefix)\n{\n  isl_ctx *ctx;\n  char *fifo_name;\n  ctx = isl_printer_get_ctx(p);\n  int boundary = stmt->u.i.boundary;\n  /* If the statement is a boundary statement, \n   * then ignore the filter condition by setting filter_sched_depth as -1\n   */\n  if (boundary)\n    stmt->u.i.filter_sched_depth = -1;\n\n  isl_ast_expr *local_index_packed;\n  isl_ast_expr *arg, *div;\n  local_index_packed = isl_ast_expr_copy(stmt->u.i.local_index);\n  int n_arg;\n  /* Extract the sparse data */\n  int is_sparse = group->local_array->is_sparse;\n  int vec_len = stmt->u.i.local_array->vec_len;\n  int n_nzero = stmt->u.i.local_array->n_nzero;\n  float compress_ratio = stmt->u.i.local_array->compress_ratio;\n  int n_meta_data = stmt->u.i.local_array->n_meta_data;\n  float eff_compress_ratio = stmt->u.i.local_array->eff_compress_ratio;\n\n  /* Modify the local index. 
*/\n  if (is_sparse) {\n    n_arg = isl_ast_expr_get_op_n_arg(local_index_packed);\n    arg = isl_ast_expr_get_op_arg(local_index_packed, n_arg - 1);\n    div = isl_ast_expr_from_val(isl_val_int_from_si(ctx, vec_len * n_lane));\n    arg = isl_ast_expr_div(arg, div);\n    local_index_packed = isl_ast_expr_set_op_arg(local_index_packed, n_arg - 1, arg);\n  } else {\n    if (n_lane > 1)\n    {\n      n_arg = isl_ast_expr_get_op_n_arg(local_index_packed);\n      arg = isl_ast_expr_get_op_arg(local_index_packed, n_arg - 1);\n      div = isl_ast_expr_from_val(isl_val_int_from_si(ctx, n_lane));\n      arg = isl_ast_expr_div(arg, div);\n      local_index_packed = isl_ast_expr_set_op_arg(local_index_packed, n_arg - 1, arg);\n    }\n  }\n\n  /* Declare the fifo data variable. */\n  p = isl_printer_start_line(p);\n  if (is_sparse) {\n    p = autosa_print_array_type_with_lane_sparse(p, group->array, n_lane);\n  } else {    \n    p = isl_printer_print_str(p, stmt->u.i.array->name);\n    if (group->local_array->is_sparse)\n      p = isl_printer_print_str(p, \"_s\");\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, n_lane);    \n  }\n  p = isl_printer_print_str(p, \" fifo_data;\");\n  p = isl_printer_end_line(p);\n\n  if (stmt->u.i.in)\n  {            \n    fifo_name = concat(ctx, stmt->u.i.in_fifo_name, \"in\");\n    /* fifo_data = fifo.read(); */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"fifo_data\");\n    p = isl_printer_print_str(p, \" = \");\n    if (hls->target == XILINX_HW)\n      p = print_fifo_rw_xilinx(p, fifo_name, 1);\n    else if (hls->target == TAPA_HW)\n      p = print_fifo_rw_tapa(p, fifo_name, 1);\n    else if (hls->target == INTEL_HW)\n      p = print_fifo_rw_intel(p, fifo_name, 1);\n    else if (hls->target == CATAPULT_HW)\n      p = print_fifo_rw_catapult(p, fifo_name, 1);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n    free(fifo_name);\n\n    if (stmt->u.i.buf)\n    {\n 
     /* local[][] = fifo_data; */\n      if (stmt->u.i.reduce) {\n        p = autosa_print_reduce_default(p, stmt, n_lane, local_index_packed, group);\n      } else {\n        p = isl_printer_start_line(p);\n        //p = isl_printer_print_ast_expr(p, local_index_packed);\n        if (stmt->u.i.module->double_buffer && \n            stmt->u.i.module->options->autosa->double_buffer_style == 0)\n        {\n          isl_ast_expr *op;\n          op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n          p = isl_printer_print_ast_expr(p, op);\n          isl_ast_expr_free(op);\n          p = isl_printer_print_str(p, \"[arb]\");\n          for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n            op = isl_ast_expr_op_get_arg(local_index_packed, n);\n            p = isl_printer_print_str(p, \"[\");\n            p = isl_printer_print_ast_expr(p, op);\n            p = isl_printer_print_str(p, \"]\");\n            isl_ast_expr_free(op);\n          }\n        } \n        else \n        {\n          if (hls->target == CATAPULT_HW && stmt->u.i.module->is_filter) {\n            isl_ast_expr *op;\n            op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n            p = isl_printer_print_ast_expr(p, op);    \n            isl_ast_expr_free(op);\n            p = isl_printer_print_str(p, \"_tmp.data\");\n            for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n              op = isl_ast_expr_op_get_arg(local_index_packed, n);\n              p = isl_printer_print_str(p, \"[\");\n              p = isl_printer_print_ast_expr(p, op);\n              p = isl_printer_print_str(p, \"]\");\n              isl_ast_expr_free(op);\n            }\n          } else {\n            p = isl_printer_print_ast_expr(p, local_index_packed);\n          }\n        }\n        p = isl_printer_print_str(p, \" \");\n        if (stmt->u.i.reduce) {        \n          p = isl_printer_print_str(p, stmt->u.i.reduce_op);\n          // TODO: 
what if the data pack factor is greater than 1?\n        }         \n        p = isl_printer_print_str(p, \"= fifo_data;\");\n        p = isl_printer_end_line(p);\n      }\n    }\n    else\n    {\n      /* fifo.write(fifo_data); */          \n      fifo_name = concat(ctx, stmt->u.i.out_fifo_name, \"out\");      \n      p = isl_printer_start_line(p);\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, fifo_name, 0);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, fifo_name, 0);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, fifo_name, 0);\n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, fifo_name, 0);\n      p = isl_printer_print_str(p, \"fifo_data);\");\n      p = isl_printer_end_line(p);\n      free(fifo_name);\n    }\n  }\n  else\n  {    \n    if (stmt->u.i.buf)\n    {\n      /* fifo_data = local[][]; */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fifo_data = \");\n      if (stmt->u.i.module->double_buffer && \n          stmt->u.i.module->options->autosa->double_buffer_style == 0) {      \n        isl_ast_expr *op;\n        op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n        p = isl_printer_print_ast_expr(p, op);\n        isl_ast_expr_free(op);\n        p = isl_printer_print_str(p, \"[!arb]\");\n        for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n          op = isl_ast_expr_op_get_arg(local_index_packed, n);\n          p = isl_printer_print_str(p, \"[\");\n          p = isl_printer_print_ast_expr(p, op);\n          p = isl_printer_print_str(p, \"]\");\n          isl_ast_expr_free(op);\n        }\n      } else {\n        if (hls->target == CATAPULT_HW && stmt->u.i.module->is_filter) {\n          isl_ast_expr *op;\n          op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n          p = isl_printer_print_ast_expr(p, op);    \n          isl_ast_expr_free(op);\n          p = 
isl_printer_print_str(p, \"_tmp.data\");\n          for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n            op = isl_ast_expr_op_get_arg(local_index_packed, n);\n            p = isl_printer_print_str(p, \"[\");\n            p = isl_printer_print_ast_expr(p, op);\n            p = isl_printer_print_str(p, \"]\");\n            isl_ast_expr_free(op);\n          }\n        } else {\n          p = isl_printer_print_ast_expr(p, local_index_packed);\n        }\n      }\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n    else\n    {\n      /* fifo_data = fifo.read(); */            \n      fifo_name = concat(ctx, stmt->u.i.in_fifo_name, \"in\");      \n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fifo_data = \");\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, fifo_name, 1);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, fifo_name, 1);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, fifo_name, 1);\n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, fifo_name, 1);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n      free(fifo_name);\n    }\n\n    /* fifo.write(fifo_data); */\n    fifo_name = concat(ctx, stmt->u.i.out_fifo_name, \"out\");\n    p = isl_printer_start_line(p);\n    if (hls->target == XILINX_HW)\n      p = print_fifo_rw_xilinx(p, fifo_name, 0);\n    else if (hls->target == TAPA_HW)\n      p = print_fifo_rw_tapa(p, fifo_name, 0);\n    else if (hls->target == INTEL_HW)\n      p = print_fifo_rw_intel(p, fifo_name, 0);\n    else if (hls->target == CATAPULT_HW)\n      p = print_fifo_rw_catapult(p, fifo_name, 0);\n    p = isl_printer_print_str(p, \"fifo_data);\");\n    p = isl_printer_end_line(p);\n    free(fifo_name);\n  }\n\n  isl_ast_expr_free(local_index_packed);\n\n  return p;\n}\n\n/* Print an access to the element in 
the global memory copy\n * described by \"stmt\".  The index of the copy is recorded in\n * stmt->index as an access to the array.\n * If \"serialize\" is set, we will simply print array[i++];\n */\nstatic __isl_give isl_printer *io_stmt_print_global_index(\n  __isl_take isl_printer *p, struct autosa_kernel_stmt *stmt, int serialize)\n{\n  struct autosa_array_info *array = stmt->u.i.array;\n  isl_ast_expr *index;\n\n  if (autosa_array_is_scalar(array))\n  {\n    if (!autosa_array_is_read_only_scalar(array))\n      p = isl_printer_print_str(p, \"*\");\n    p = isl_printer_print_str(p, array->name);\n    return p;\n  }\n\n  index = isl_ast_expr_copy(stmt->u.i.index);\n  if (!serialize) {    \n    p = isl_printer_print_ast_expr(p, index);\n  } else {    \n    isl_ast_expr *array_name;\n    array_name = isl_ast_expr_op_get_arg(index, 0);\n    p = isl_printer_print_ast_expr(p, array_name);\n    p = isl_printer_print_str(p, \"[i]\");    \n    isl_ast_expr_free(array_name);\n  }\n  isl_ast_expr_free(index);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *io_stmt_print_index_last_dim(\n  __isl_take isl_printer *p, struct autosa_kernel_stmt *stmt, \n  int serialize, int global, int n_lane, int nxt_n_lane, int is_sparse, int vec_len)\n{\n  struct autosa_array_info *array = stmt->u.i.array;\n  isl_ast_expr *index;\n\n  if (autosa_array_is_scalar(array))\n  {\n    if (!autosa_array_is_read_only_scalar(array))\n      p = isl_printer_print_str(p, \"0\");    \n    return p;\n  }\n\n  if (global)\n    index = isl_ast_expr_copy(stmt->u.i.index);\n  else \n    index = isl_ast_expr_copy(stmt->u.i.local_index);\n\n  if (!serialize) {    \n    isl_ast_expr *op;\n    int n_arg, r;\n    isl_val *val;\n    isl_ctx *ctx = isl_printer_get_ctx(p);\n\n    n_arg = isl_ast_expr_op_get_n_arg(index);\n    op = isl_ast_expr_op_get_arg(index, n_arg - 1);\n    r = n_lane / nxt_n_lane;    \n    if (is_sparse) \n      val = isl_val_int_from_si(ctx, vec_len * nxt_n_lane);\n    else\n      val = 
isl_val_int_from_si(ctx, nxt_n_lane);        \n    op = isl_ast_expr_div(op, isl_ast_expr_from_val(val));        \n    if (global) {\n      op = isl_ast_expr_mul(op, isl_ast_expr_from_val(isl_val_int_from_si(ctx, n_lane)));\n    }\n    p = isl_printer_print_ast_expr(p, op);\n\n    isl_ast_expr_free(op);    \n  } else {        \n    p = isl_printer_print_str(p, \"i\");        \n  }\n  isl_ast_expr_free(index);\n\n  return p;  \n}\n\n/* A list of helper functions for autosa_kernel_print_io_transfer */\n/* update_data_split: data_split[split_i] = in_data; */\nstatic __isl_give isl_printer *io_transfer_update_data_split(\n  __isl_take isl_printer *p, struct autosa_kernel_stmt *stmt,  \n  struct hls_info *hls, const char *iterator_prefix)\n{\n  struct autosa_array_ref_group *group = stmt->u.i.group;\n  struct autosa_hw_module *module = stmt->u.i.module;  \n  int n_lane = stmt->u.i.data_pack;\n  int nxt_n_lane = stmt->u.i.nxt_data_pack;\n\n  if (hls->target == XILINX_HW ||\n      hls->target == CATAPULT_HW ||\n      hls->target == TAPA_HW ||\n    (hls->target == INTEL_HW && nxt_n_lane > 1)) {\n    if (stmt->u.i.reduce) {\n      //if (n_lane == nxt_n_lane)\n      //  p = autosa_print_reduce_default(p, stmt, n_lane, local_index_packed, group);\n      //else\n      p = autosa_print_reduce_data_pack(p, stmt, nxt_n_lane, n_lane, group, hls->target); // TODO\n    } else {\n      if (hls->target == XILINX_HW) {\n        if (nxt_n_lane == 1) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"union {unsigned int ui; \");\n          p = isl_printer_print_str(p, group->array->type);\n          p = isl_printer_print_str(p, \" ut;} u;\");\n          p = isl_printer_end_line(p);\n\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"u.ut = in_data;\");\n          p = isl_printer_end_line(p);\n        }\n      }\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"data_split[split_idx] \");\n     
 if (stmt->u.i.reduce) {\n        p = isl_printer_print_str(p, stmt->u.i.reduce_op);\n      }\n      p = isl_printer_print_str(p, \"= \");\n\n      if (hls->target == XILINX_HW) {\n        if (nxt_n_lane == 1) {\n          p = isl_printer_print_str(p, \"ap_uint<\");\n          p = isl_printer_print_int(p, group->array->size * 8);\n          p = isl_printer_print_str(p, \">(u.ui);\");\n        } else {\n          p = isl_printer_print_str(p, \"in_data;\");\n        }\n      } else {\n        p = isl_printer_print_str(p, \"in_data;\");\n      }\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *io_transfer_pack_out_data(\n  __isl_take isl_printer *p, struct autosa_kernel_stmt *stmt,  \n  struct hls_info *hls, const char *iterator_prefix) \n{\n  struct autosa_array_ref_group *group = stmt->u.i.group;\n  struct autosa_hw_module *module = stmt->u.i.module;  \n  int n_lane = stmt->u.i.data_pack;\n  int nxt_n_lane = stmt->u.i.nxt_data_pack;\n\n  if (hls->target == XILINX_HW) {\n    int first = 1;\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"out_data = (\");\n    for (int i = n_lane / nxt_n_lane - 1; i >= 0; i--) {\n      if (!first)\n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"data_split[\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \"]\");\n        first = 0;\n    }\n    p = isl_printer_print_str(p, \");\");\n  } else if (hls->target == INTEL_HW) {\n    p = isl_printer_start_line(p);\n    if (nxt_n_lane == 1) {\n      p = isl_printer_print_str(p, \"out_data.data[split_idx] = in_data;\");\n    } else {\n      int first = 1;\n      p = isl_printer_print_str(p, \"out_data.data = \");\n      p = isl_printer_print_str(p, \"(\");\n      p = isl_printer_print_str(p, group->array->type);\n      p = isl_printer_print_int(p, n_lane);\n      p = isl_printer_print_str(p, \")(\");\n      for (int i = 0; i < n_lane / nxt_n_lane; i++) 
{\n        if (!first)\n          p = isl_printer_print_str(p, \", \");\n        if (nxt_n_lane > 1) {\n          p = isl_printer_print_str(p, \"(\");\n          p = isl_printer_print_str(p, group->array->type);\n          p = isl_printer_print_int(p, nxt_n_lane);\n          p = isl_printer_print_str(p, \")\");\n        }\n        p = isl_printer_print_str(p, \"data_split[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"]\");\n        if (nxt_n_lane > 1) {\n          p = isl_printer_print_str(p, \".data\");\n        }\n        first = 0;\n      }\n      p = isl_printer_print_str(p, \");\");\n    }\n  } else if (hls->target == CATAPULT_HW) {\n    for (int i = 0; i < n_lane / nxt_n_lane; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"out_data.set_slc(\");\n      p = isl_printer_print_int(p, i * group->array->size * 8 * nxt_n_lane);\n      p = isl_printer_print_str(p, \", data_split[\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \"]);\");\n      p = isl_printer_end_line(p);  \n    }\n  } else if (hls->target == TAPA_HW) {\n    for (int i = 0; i < n_lane / nxt_n_lane; i++) {\n      if (nxt_n_lane == 1) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"out_data.set(\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \", data_split[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"]);\");\n        p = isl_printer_end_line(p);\n      } else {\n        for (int j = 0; j < nxt_n_lane; j++) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"out_data.set(\");\n          p = isl_printer_print_int(p, i * nxt_n_lane + j);\n          p = isl_printer_print_str(p, \", data_split[\");\n          p = isl_printer_print_int(p, i);\n          p = isl_printer_print_str(p, \"][\");\n          p = isl_printer_print_int(p, j);\n          p = 
isl_printer_print_str(p, \"]);\");\n          p = isl_printer_end_line(p);\n        }\n      }\n    }\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *io_transfer_read_local_buf(\n  __isl_take isl_printer *p, struct autosa_kernel_stmt *stmt,  \n  struct hls_info *hls, const char *iterator_prefix, isl_ast_expr *local_index_packed) \n{\n  struct autosa_array_ref_group *group = stmt->u.i.group;\n  struct autosa_hw_module *module = stmt->u.i.module;  \n  int n_lane = stmt->u.i.data_pack;\n  int nxt_n_lane = stmt->u.i.nxt_data_pack;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"out_data = \");\n  if (stmt->u.i.module->double_buffer && \n    stmt->u.i.module->options->autosa->double_buffer_style == 0) {\n    isl_ast_expr *op;\n    op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n    p = isl_printer_print_ast_expr(p, op);    \n    isl_ast_expr_free(op);\n    p = isl_printer_print_str(p, stmt->u.i.in? \"[arb]\" : \"[!arb]\");\n    for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n      op = isl_ast_expr_op_get_arg(local_index_packed, n);\n      p = isl_printer_print_str(p, \"[\");\n      p = isl_printer_print_ast_expr(p, op);\n      p = isl_printer_print_str(p, \"]\");\n      isl_ast_expr_free(op);\n    }\n  } else {\n    if (hls->target == CATAPULT_HW && stmt->u.i.module->is_filter) {\n      isl_ast_expr *op;\n      op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n      p = isl_printer_print_ast_expr(p, op);    \n      isl_ast_expr_free(op);\n      p = isl_printer_print_str(p, \"_tmp.data\");\n      for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n        op = isl_ast_expr_op_get_arg(local_index_packed, n);\n        p = isl_printer_print_str(p, \"[\");\n        p = isl_printer_print_ast_expr(p, op);\n        p = isl_printer_print_str(p, \"]\");\n        isl_ast_expr_free(op);\n      }\n    } else {\n      p = isl_printer_print_ast_expr(p, local_index_packed);\n    }\n  }\n  p = 
isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n\n  return p;  \n}\n\nstatic __isl_give isl_printer *io_transfer_parse_sparse_data(\n  __isl_take isl_printer *p, struct autosa_kernel_stmt *stmt,  \n  struct hls_info *hls, const char *iterator_prefix) \n{\n  struct autosa_array_ref_group *group = stmt->u.i.group;\n  struct autosa_hw_module *module = stmt->u.i.module;  \n  int n_lane = stmt->u.i.data_pack;\n  int nxt_n_lane = stmt->u.i.nxt_data_pack;\n\n  /* Extract the sparse data. */\n  int is_sparse = group->local_array->is_sparse;\n  int vec_len = stmt->u.i.local_array->vec_len;\n  int n_nzero = stmt->u.i.local_array->n_nzero;\n  float compress_ratio = stmt->u.i.local_array->compress_ratio;\n  int n_meta_data = stmt->u.i.local_array->n_meta_data;\n  float eff_compress_ratio = stmt->u.i.local_array->eff_compress_ratio;\n\n  /* [type_n_lane] buf_data_d = buf_data.d; */\n  p = isl_printer_start_line(p);\n  p = autosa_print_array_type_with_lane(p, group->array, n_lane * n_nzero);\n  p = isl_printer_print_str(p, \" out_data_d = out_data.d;\");\n  p = isl_printer_end_line(p);\n\n  /* [type_n_lane] buf_data_i = buf_data.i; */\n  p = isl_printer_start_line(p);\n  if (hls->target == XILINX_HW || hls->target == TAPA_HW) {\n    p = isl_printer_print_str(p, \"ap_uint<\");\n    p = isl_printer_print_int(p, 8 * n_lane);\n  } else if (hls->target == CATAPULT_HW) {\n    p = isl_printer_print_str(p, \"ac_int<\");\n    p = isl_printer_print_int(p, 8 * n_lane);\n    p = isl_printer_print_str(p, \", false\");\n  }\n  p = isl_printer_print_str(p, \"> out_data_i = out_data.i;\");\n  p = isl_printer_end_line(p); \n\n  return p;\n}\n\nstatic __isl_give isl_printer *io_transfer_write_data_split(\n  __isl_take isl_printer *p, struct autosa_kernel_stmt *stmt,  \n  struct hls_info *hls, const char *iterator_prefix, const char *data_str) \n{\n  struct autosa_array_ref_group *group = stmt->u.i.group;\n  struct autosa_hw_module *module = stmt->u.i.module;  \n  int n_lane = 
stmt->u.i.data_pack;\n  int nxt_n_lane = stmt->u.i.nxt_data_pack;\n\n  /* Extract the sparse data. */\n  int is_sparse = group->local_array->is_sparse;\n  int vec_len = stmt->u.i.local_array->vec_len;\n  int n_nzero = stmt->u.i.local_array->n_nzero;\n  float compress_ratio = stmt->u.i.local_array->compress_ratio;\n  int n_meta_data = stmt->u.i.local_array->n_meta_data;\n  float eff_compress_ratio = stmt->u.i.local_array->eff_compress_ratio;\n\n  if (hls->target == XILINX_HW) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int n = 0; n < \");\n    p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n    p = isl_printer_print_str(p, \"; n++) {\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma HLS UNROLL\");\n    p = isl_printer_end_line(p);    \n    p = isl_printer_indent(p, 2);\n\n    if (is_sparse) {\n      /* data_split[n] = {out_data_d(), ...} */    \n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"data_split[n] = (\");\n      p = isl_printer_print_str(p, group->array->name);\n      p = isl_printer_print_str(p, \"_s_t\");\n      p = isl_printer_print_int(p, nxt_n_lane);\n      p = isl_printer_print_str(p, \"){\");\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \"_d(\");\n      p = isl_printer_print_int(p, group->array->size * 8 * nxt_n_lane * n_nzero - 1);\n      p = isl_printer_print_str(p, \", 0), \");\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \"_i(\");\n      p = isl_printer_print_int(p, 8 * nxt_n_lane - 1);\n      p = isl_printer_print_str(p, \", 0)};\");\n      p = isl_printer_end_line(p);      \n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \"_d = \");\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \"_d >> \");\n      p = isl_printer_print_int(p, 
group->array->size * 8 * nxt_n_lane * n_nzero);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \"_i = \");\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \"_i >> \");\n      p = isl_printer_print_int(p, 8 * nxt_n_lane);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    } else {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"data_split[n] = \");\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \"(\");\n      p = isl_printer_print_int(p, group->array->size * 8 * nxt_n_lane - 1);\n      p = isl_printer_print_str(p, \", 0);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \" = \");\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \" >> \");\n      p = isl_printer_print_int(p, group->array->size * 8 * nxt_n_lane);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n\n    p = isl_printer_indent(p, -2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"}\");\n    p = isl_printer_end_line(p);\n  }\n  else if (hls->target == INTEL_HW && nxt_n_lane > 1) {    \n    for (int i = 0; i < n_lane / nxt_n_lane; i++) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"data_split[\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \"]\");\n      if (nxt_n_lane > 1)\n        p = isl_printer_print_str(p, \".data\");\n      p = isl_printer_print_str(p, \" = \");\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \".data.s\");\n      for (int j = 0; j < nxt_n_lane; j++) {\n        p = isl_printer_print_str(p, 
vector_index[j + i * nxt_n_lane]);\n      }\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }    \n  }\n  else if (hls->target == CATAPULT_HW) {\n    for (int i = 0; i < n_lane / nxt_n_lane; i++) {\n      if (is_sparse) {\n        /* data_split[].set_slc(0, out_data_i.slc<>()); */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"data_split[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"].set_slc(0, \");\n        p = isl_printer_print_str(p, data_str);\n        p = isl_printer_print_str(p, \"_i.slc<\");\n        p = isl_printer_print_int(p, 8 * nxt_n_lane);\n        p = isl_printer_print_str(p, \">(\");\n        p = isl_printer_print_int(p, i * 8 * nxt_n_lane);\n        p = isl_printer_print_str(p, \"));\");\n        p = isl_printer_end_line(p);\n\n        /* data_split[].set_slc(xx, out_data_d.slc<>()); */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"data_split[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"].set_slc(\");\n        p = isl_printer_print_int(p, 8 * nxt_n_lane);\n        p = isl_printer_print_str(p, \", \");\n        p = isl_printer_print_str(p, data_str);\n        p = isl_printer_print_str(p, \"_d.slc<\");        \n        p = isl_printer_print_int(p, group->array->size * 8 * nxt_n_lane * n_nzero);\n        p = isl_printer_print_str(p, \">(\");\n        p = isl_printer_print_int(p, i * group->array->size * 8 * nxt_n_lane * n_nzero);\n        p = isl_printer_print_str(p, \"));\");\n        p = isl_printer_end_line(p);\n      } else {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"data_split[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"] = \");\n        p = isl_printer_print_str(p, data_str);\n        p = isl_printer_print_str(p, \".slc<\");        \n        p = isl_printer_print_int(p, 
group->array->size * 8 * nxt_n_lane);\n        p = isl_printer_print_str(p, \">(\");\n        p = isl_printer_print_int(p, i * group->array->size * 8 * nxt_n_lane);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  } else if (hls->target == TAPA_HW) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int n = 0; n < \");\n    p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n    p = isl_printer_print_str(p, \"; n++) {\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma HLS UNROLL\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    if (is_sparse) {\n      /* data_split[n] = {out_data_d(), ...} */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"data_split[n] = (\");\n      p = isl_printer_print_str(p, group->array->name);\n      p = isl_printer_print_str(p, \"_s_t\");\n      p = isl_printer_print_int(p, nxt_n_lane);\n      if (nxt_n_lane == 1) {\n        p = isl_printer_print_str(p, \"){\");\n        p = isl_printer_print_str(p, data_str);\n        p = isl_printer_print_str(p, \"_d[n], \");\n      } else {\n        p = isl_printer_print_str(p, \"){tapa::truncated<\");\n        p = isl_printer_print_int(p, nxt_n_lane);\n        p = isl_printer_print_str(p, \">(\");\n        p = isl_printer_print_str(p, data_str);\n        p = isl_printer_print_str(p, \"_d, \");\n        p = isl_printer_print_int(p, nxt_n_lane);\n        p = isl_printer_print_str(p, \"* n), \");\n      }\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \"_i(\");\n      p = isl_printer_print_int(p, 8 * nxt_n_lane - 1);\n      p = isl_printer_print_str(p, \", 0)};\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \"_i = \");\n      p = 
isl_printer_print_str(p, data_str);\n      p = isl_printer_print_str(p, \"_i >> \");\n      p = isl_printer_print_int(p, 8 * nxt_n_lane);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    } else {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"data_split[n] = \");\n      if (nxt_n_lane == 1) {\n        p = isl_printer_print_str(p, data_str);\n        p = isl_printer_print_str(p, \"[n];\");\n      } else {\n        p = isl_printer_print_str(p, \"tapa::truncated<\");\n        p = isl_printer_print_int(p, nxt_n_lane);\n        p = isl_printer_print_str(p, \">(\");\n        p = isl_printer_print_str(p, data_str);\n        p = isl_printer_print_str(p, \", \");\n        p = isl_printer_print_int(p, nxt_n_lane);\n        p = isl_printer_print_str(p, \" * n);\");\n      }\n      p = isl_printer_end_line(p);\n    }\n\n    p = isl_printer_indent(p, -2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"}\");\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *io_transfer_read_data_split(\n  __isl_take isl_printer *p, struct autosa_kernel_stmt *stmt,  \n  struct hls_info *hls, const char *iterator_prefix) \n{\n  struct autosa_array_ref_group *group = stmt->u.i.group;\n  struct autosa_hw_module *module = stmt->u.i.module;  \n  int n_lane = stmt->u.i.data_pack;\n  int nxt_n_lane = stmt->u.i.nxt_data_pack;\n\n  /* Extract the sparse data. 
*/\n  int is_sparse = group->local_array->is_sparse;\n  int vec_len = stmt->u.i.local_array->vec_len;\n  int n_nzero = stmt->u.i.local_array->n_nzero;\n  float compress_ratio = stmt->u.i.local_array->compress_ratio;\n  int n_meta_data = stmt->u.i.local_array->n_meta_data;\n  float eff_compress_ratio = stmt->u.i.local_array->eff_compress_ratio;\n\n  if (is_sparse) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"out_data = data_split[split_idx];\");\n    p = isl_printer_end_line(p);\n  } else {\n    if (hls->target == XILINX_HW) {\n      if (nxt_n_lane == 1) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"union {unsigned int ui; \");\n        p = isl_printer_print_str(p, group->array->type);\n        p = isl_printer_print_str(p, \" ut;} u;\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"u.ui = (unsigned int)data_split[split_idx];\");\n        p = isl_printer_end_line(p);\n      }\n    }\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"out_data = \");\n    if (hls->target == XILINX_HW) {\n      if (nxt_n_lane == 1) {\n        p = isl_printer_print_str(p, \"u.ut\");\n      } else {\n        p = isl_printer_print_str(p, \"data_split[split_idx]\");\n      }\n    } else if (hls->target == INTEL_HW) {\n      if (nxt_n_lane > 1)\n        p = isl_printer_print_str(p, \"data_split[split_idx]\");\n      else      \n        p = isl_printer_print_str(p, \"in_data.data[split_idx]\");\n    } else if (hls->target == CATAPULT_HW) {\n      p = isl_printer_print_str(p, \"data_split[split_idx]\");\n    } else if (hls->target == TAPA_HW) {\n      p = isl_printer_print_str(p, \"data_split[split_idx]\");\n    }\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);    \n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *autosa_kernel_print_io_transfer(\n  __isl_take isl_printer *p, struct autosa_kernel_stmt 
*stmt,  \n  struct hls_info *hls, const char *iterator_prefix, \n  char *in_fifo_suffix, char *out_fifo_suffix,\n  enum IO_TRANS_DIR in, enum IO_TRANS_DIR out) \n{\n  struct autosa_array_ref_group *group = stmt->u.i.group;\n  struct autosa_hw_module *module = stmt->u.i.module;  \n  int n_lane = stmt->u.i.data_pack;\n  int nxt_n_lane = stmt->u.i.nxt_data_pack;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n  isl_ast_expr *local_index_packed = isl_ast_expr_copy(stmt->u.i.local_index);  \n  int n_arg;\n  int boundary = stmt->u.i.boundary;\n  /* If the statement is a boundary statement, \n   * then ignore the filter condition by setting filter_sched_depth as -1\n   */\n  if (boundary)\n    stmt->u.i.filter_sched_depth = -1;\n\n  /* Extract the sparse data. */\n  int is_sparse = group->local_array->is_sparse;\n  int vec_len = stmt->u.i.local_array->vec_len;\n  int n_nzero = stmt->u.i.local_array->n_nzero;\n  float compress_ratio = stmt->u.i.local_array->compress_ratio;\n  int n_meta_data = stmt->u.i.local_array->n_meta_data;\n  float eff_compress_ratio = stmt->u.i.local_array->eff_compress_ratio;\n\n  /* Pre-process the local index. 
*/\n  if (group->local_array->is_sparse) {\n    isl_ast_expr *arg, *div;\n    n_arg = isl_ast_expr_get_op_n_arg(local_index_packed);\n    arg = isl_ast_expr_get_op_arg(local_index_packed, n_arg - 1);\n    div = isl_ast_expr_from_val(isl_val_int_from_si(ctx, vec_len * n_lane));\n    arg = isl_ast_expr_div(arg, div);\n    local_index_packed = isl_ast_expr_set_op_arg(local_index_packed, n_arg - 1, arg);\n  } else {\n    if (n_lane > 1)\n    {\n      isl_ast_expr *arg, *div;\n      n_arg = isl_ast_expr_get_op_n_arg(local_index_packed);\n      arg = isl_ast_expr_get_op_arg(local_index_packed, n_arg - 1);\n      div = isl_ast_expr_from_val(isl_val_int_from_si(ctx, n_lane));\n      arg = isl_ast_expr_div(arg, div);\n      local_index_packed = isl_ast_expr_set_op_arg(local_index_packed, n_arg - 1, arg);\n    }\n  }\n\n  p = ppcg_start_block(p);  \n\n  /* Declare some common variables here. */  \n  int in_n_lane, out_n_lane;\n  if (module->in) {    \n    in_n_lane = n_lane;\n    out_n_lane = nxt_n_lane;    \n  } else {\n    in_n_lane = nxt_n_lane;\n    out_n_lane = n_lane;\n  }\n\n  /* [type_in] in_data; */\n  p = isl_printer_start_line(p);\n  if (group->local_array->is_sparse) {\n    p = autosa_print_array_type_with_lane_sparse(p, group->array, in_n_lane);\n  } else {    \n    p = isl_printer_print_str(p, stmt->u.i.array->name);    \n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, in_n_lane);\n  } \n  p = isl_printer_print_str(p, \" in_data;\");\n  p = isl_printer_end_line(p);\n  \n  /* [type_out] out_data; */\n  p = isl_printer_start_line(p);\n  if (group->local_array->is_sparse) {\n    p = autosa_print_array_type_with_lane_sparse(p, group->array, out_n_lane);\n  } else {    \n    p = isl_printer_print_str(p, stmt->u.i.array->name);\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, out_n_lane);    \n  }\n  p = isl_printer_print_str(p, \" out_data;\");\n  p = isl_printer_end_line(p);  \n\n  if (n_lane != nxt_n_lane) 
{\n    /* [type_nxt_n_lane] data_split[]; */\n    if (hls->target == XILINX_HW ||\n        hls->target == CATAPULT_HW ||\n        hls->target == TAPA_HW ||\n      (hls->target == INTEL_HW && nxt_n_lane > 1)) {\n      p = isl_printer_start_line(p);\n      if (is_sparse) {\n        p = autosa_print_array_type_with_lane_sparse(p, group->array, nxt_n_lane);\n      } else {\n        if (nxt_n_lane == 1) {\n          if (hls->target == XILINX_HW) {\n            p = isl_printer_print_str(p, \"ap_uint<\");\n            p = isl_printer_print_int(p, group->array->size * 8);\n            p = isl_printer_print_str(p, \">\");\n          } else if (hls->target == TAPA_HW) {\n            p = isl_printer_print_str(p, group->array->type);\n          } else if (hls->target == INTEL_HW) {\n            p = isl_printer_print_str(p, group->array->type);\n          } else if (hls->target == CATAPULT_HW) {\n            p = isl_printer_print_str(p, group->array->name);\n            p = isl_printer_print_str(p, \"_t\");\n            p = isl_printer_print_int(p, nxt_n_lane);\n          }       \n        } else {\n          p = isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, nxt_n_lane);\n        }\n      }\n      p = isl_printer_print_str(p, \" data_split[\");\n      p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n      p = isl_printer_print_str(p, \"];\");\n      p = isl_printer_end_line(p);\n\n      if (hls->target == XILINX_HW || hls->target == TAPA_HW)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=data_split complete\");\n        p = isl_printer_end_line(p);\n      }\n    }     \n  }\n  \n  if ((in == GLOBAL_BUF || in == LOCAL_BUF) && (n_lane != nxt_n_lane)) {\n    /* Insert guards. 
*/\n    /* if (cx % xx == 0) { */\n    if (stmt->u.i.coalesce_depth >= 0) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"if (\");\n      if (iterator_prefix != NULL) {\n        p = isl_printer_print_str(p, iterator_prefix);\n      } else {\n        p = isl_printer_print_str(p, \"c\");\n      }    \n      p = isl_printer_print_int(p, stmt->u.i.coalesce_depth);\n      p = isl_printer_print_str(p, \" % \");\n      p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n      p = isl_printer_print_str(p, \" == 0) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n    }\n  }\n\n  /* Read in data */\n  if (in == GLOBAL_BUF) {\n    /* in_data = global_buf[]; */\n    p = isl_printer_start_line(p);        \n    p = isl_printer_print_str(p, \"in_data = \");\n    p = io_stmt_print_global_index(p, stmt, stmt->u.i.serialize);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  } else if (in == LOCAL_BUF) {    \n    /* in_data = local_buf[]; */\n    p = isl_printer_start_line(p);   \n    p = isl_printer_print_str(p, \"in_data = \");\n\n    if (stmt->u.i.module->double_buffer && \n          stmt->u.i.module->options->autosa->double_buffer_style == 0) {  \n      isl_ast_expr *op;\n\n      op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n      p = isl_printer_print_ast_expr(p, op);\n      isl_ast_expr_free(op);\n      p = isl_printer_print_str(p, stmt->u.i.in? 
\"[arb]\" : \"[!arb]\");\n      for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n        op = isl_ast_expr_op_get_arg(local_index_packed, n);\n        p = isl_printer_print_str(p, \"[\");\n        p = isl_printer_print_ast_expr(p, op);\n        p = isl_printer_print_str(p, \"]\");\n        isl_ast_expr_free(op);\n      }\n    } else if (hls->target == CATAPULT_HW && stmt->u.i.module->is_filter) {\n      isl_ast_expr *op;\n\n      op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n      p = isl_printer_print_ast_expr(p, op);    \n      isl_ast_expr_free(op);\n      p = isl_printer_print_str(p, \"_tmp.data\");\n      for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n        op = isl_ast_expr_op_get_arg(local_index_packed, n);\n        p = isl_printer_print_str(p, \"[\");\n        p = isl_printer_print_ast_expr(p, op);\n        p = isl_printer_print_str(p, \"]\");\n        isl_ast_expr_free(op);\n      }      \n    } else {\n      p = isl_printer_print_ast_expr(p, local_index_packed);\n    }\n    \n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  } else if (in == FIFO) {\n    char *fifo_in_name;\n    fifo_in_name = concat(ctx, stmt->u.i.in_fifo_name, in_fifo_suffix);        \n    \n    /* in_data = fifo_in.read(); */\n    p = isl_printer_start_line(p);  \n    p = isl_printer_print_str(p, \"in_data = \");\n    if (hls->target == XILINX_HW)\n      p = print_fifo_rw_xilinx(p, fifo_in_name, 1);\n    else if (hls->target == TAPA_HW)\n      p = print_fifo_rw_tapa(p, fifo_in_name, 1);\n    else if (hls->target == INTEL_HW)\n      p = print_fifo_rw_intel(p, fifo_in_name, 1);      \n    else if (hls->target == CATAPULT_HW)\n      p = print_fifo_rw_catapult(p, fifo_in_name, 1);  \n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);  \n\n    free(fifo_in_name);\n  }\n\n  /* Re-pack data in the middle. 
*/\n  if (n_lane == nxt_n_lane) {\n    if (stmt->u.i.reduce) {\n      p = autosa_print_reduce_default(p, stmt, n_lane, local_index_packed, group);\n    } else {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"out_data = in_data;\");\n      p = isl_printer_end_line(p);\n    }\n  } else {\n    if (out == FIFO) {\n      /* write_data_split: data_split[] = in_data... */\n      p = io_transfer_write_data_split(p, stmt, hls, iterator_prefix, \"in_data\");\n    }    \n\n    if ((in == GLOBAL_BUF || in == LOCAL_BUF) && (n_lane != nxt_n_lane)) {\n      /* Insert guards. */\n      /* if (cx % xx == 0) { */\n      if (stmt->u.i.coalesce_depth >= 0) {\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n      }\n    }\n\n    /* calculate_split_idx: split_idx = ... */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int split_idx = (\");    \n    p = io_stmt_print_index_last_dim(\n          p, stmt, stmt->u.i.serialize, ((in == GLOBAL_BUF) || (out == GLOBAL_BUF))? 1 : 0,\n          n_lane, nxt_n_lane, is_sparse, vec_len);\n    p = isl_printer_print_str(p, \") % \");\n    p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n\n    if (out == GLOBAL_BUF) {\n      /* update_data_split: data_split[split_i] = in_data; */\n      p = io_transfer_update_data_split(p, stmt, hls, iterator_prefix);\n\n      /* pack_out_data: out_data = (data_split[], ...); */\n      p = io_transfer_pack_out_data(p, stmt, hls, iterator_prefix);\n    } else if (out == LOCAL_BUF) {\n      /* read_local_buf: out_data = local_buf[...]; */\n      p = io_transfer_read_local_buf(p, stmt, hls, iterator_prefix, local_index_packed);\n\n      /* parse_sparse_data */\n      if (is_sparse) {\n        p = io_transfer_parse_sparse_data(p, stmt, hls, iterator_prefix);\n      }\n\n      /* write_data_split: data_split[] = out_data... 
*/\n      p = io_transfer_write_data_split(p, stmt, hls, iterator_prefix, \"out_data\");\n\n      /* update_data_split: data_split[split_i] = in_data; */\n      p = io_transfer_update_data_split(p, stmt, hls, iterator_prefix);\n\n      /* pack_out_data: out_data = (data_split[], ...) */\n      p = io_transfer_pack_out_data(p, stmt, hls, iterator_prefix);\n    } else if (out == FIFO) {\n      /* read_data_split: out_data = data_split[split_i]; */\n      p = io_transfer_read_data_split(p, stmt, hls, iterator_prefix);\n    }\n  }\n\n  if ((out == GLOBAL_BUF || out == LOCAL_BUF) && (n_lane != nxt_n_lane)) {\n    if (stmt->u.i.coalesce_depth >= 0) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"if (\");\n      if (iterator_prefix != NULL) {\n        p = isl_printer_print_str(p, iterator_prefix);\n      } else {\n        p = isl_printer_print_str(p, \"c\");\n      }            \n      p = isl_printer_print_int(p, stmt->u.i.coalesce_depth);\n      p = isl_printer_print_str(p, \" % \");\n      p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n      p = isl_printer_print_str(p, \" == \");\n      p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n      p = isl_printer_print_str(p, \" - 1 || \");\n      /* Use the same iterator prefix as the first operand instead of a hard-coded \"c\". */\n      if (iterator_prefix != NULL) {\n        p = isl_printer_print_str(p, iterator_prefix);\n      } else {\n        p = isl_printer_print_str(p, \"c\");\n      }\n      p = isl_printer_print_int(p, stmt->u.i.coalesce_depth);\n      p = isl_printer_print_str(p, \" == \");\n      p = isl_printer_print_int(p, stmt->u.i.coalesce_bound - 1);\n      p = isl_printer_print_str(p, \") {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n    }\n  }\n\n  /* Write out data. 
*/\n  if (out == GLOBAL_BUF) {\n    /* global_buf[] = in_data; */\n    p = isl_printer_start_line(p);   \n    p = io_stmt_print_global_index(p, stmt, stmt->u.i.serialize);\n    p = isl_printer_print_str(p, \" = out_data;\");\n    p = isl_printer_end_line(p);\n  } else if (out == LOCAL_BUF) {      \n    /* local_buf[] = fifo_data; */\n    //if (stmt->u.i.reduce) {\n    //  p = autosa_print_reduce_default(p, stmt, n_lane, local_index_packed, group);\n    //} else {\n      p = isl_printer_start_line(p);\n\n      if (stmt->u.i.module->double_buffer && \n            stmt->u.i.module->options->autosa->double_buffer_style == 0) {\n        isl_ast_expr *op;\n              \n        op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n        p = isl_printer_print_ast_expr(p, op);\n        isl_ast_expr_free(op);\n        p = isl_printer_print_str(p, stmt->u.i.in? \"[arb]\" : \"[!arb]\");\n        for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n            op = isl_ast_expr_op_get_arg(local_index_packed, n);\n            p = isl_printer_print_str(p, \"[\");\n            p = isl_printer_print_ast_expr(p, op);\n            p = isl_printer_print_str(p, \"]\");\n            isl_ast_expr_free(op);\n        }        \n      } else if (hls->target == CATAPULT_HW && stmt->u.i.module->is_filter) {\n        isl_ast_expr *op;\n\n        op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n        p = isl_printer_print_ast_expr(p, op);    \n        isl_ast_expr_free(op);\n        p = isl_printer_print_str(p, \"_tmp.data\");\n        for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n          op = isl_ast_expr_op_get_arg(local_index_packed, n);\n          p = isl_printer_print_str(p, \"[\");\n          p = isl_printer_print_ast_expr(p, op);\n          p = isl_printer_print_str(p, \"]\");\n          isl_ast_expr_free(op);\n        }        \n      } else {        \n        p = isl_printer_print_ast_expr(p, local_index_packed);        
\n      }\n\n      p = isl_printer_print_str(p, \" \");\n      //if (stmt->u.i.reduce) {        \n      //  p = isl_printer_print_str(p, stmt->u.i.reduce_op);        \n      //}               \n      p = isl_printer_print_str(p, \"= out_data;\");\n      p = isl_printer_end_line(p);\n    //}    \n  } else if (out == FIFO) {      \n    char *fifo_out_name;\n    fifo_out_name = concat(ctx, stmt->u.i.out_fifo_name, out_fifo_suffix);      \n\n    /* fifo_out.write(fifo_data); */          \n    p = isl_printer_start_line(p);\n    if (hls->target == XILINX_HW)\n      p = print_fifo_rw_xilinx(p, fifo_out_name, 0);\n    else if (hls->target == TAPA_HW)\n      p = print_fifo_rw_tapa(p, fifo_out_name, 0);\n    else if (hls->target == INTEL_HW)\n      p = print_fifo_rw_intel(p, fifo_out_name, 0);\n    else if (hls->target == CATAPULT_HW)\n      p = print_fifo_rw_catapult(p, fifo_out_name, 0);\n    p = isl_printer_print_str(p, \"out_data);\");\n    p = isl_printer_end_line(p);\n   \n    free(fifo_out_name);    \n  }\n\n  if ((out == GLOBAL_BUF || out == LOCAL_BUF) && (n_lane != nxt_n_lane)) {\n    if (stmt->u.i.coalesce_depth >= 0) {\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n    }\n  }\n\n  p = ppcg_end_block(p);\n\n  isl_ast_expr_free(local_index_packed);\n\n  return p;\n}\n\n/* This function extracts the necessary information for generating I/O transfer statements and \n * calls the final function to generate the statements.\n */\nstatic __isl_give isl_printer *autosa_kernel_print_io_transfer_wrapper(\n  __isl_take isl_printer *p, struct autosa_kernel_stmt *stmt,  \n  struct hls_info *hls, const char *iterator_prefix\n) {\n  int n_lane, nxt_n_lane;\n  enum IO_TRANS_DIR in, out;\n  char in_fifo_suffix[100], out_fifo_suffix[100];\n\n  struct autosa_array_ref_group *group = stmt->u.i.group;\n  struct autosa_hw_module *module = stmt->u.i.module;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  if (stmt->type == AUTOSA_KERNEL_STMT_IO_DRAM) {\n   
 if (stmt->u.i.in) {\n      if (module->is_serialized) {\n        in = FIFO;\n        //sprintf(in_fifo_suffix, \"serialize\");\n        sprintf(in_fifo_suffix, \"in\");\n      } else {\n        in = GLOBAL_BUF;\n      }\n\n      if (stmt->u.i.buf) {\n        out = LOCAL_BUF;\n      } else {\n        out = FIFO;\n        sprintf(out_fifo_suffix, \"out\");\n      }      \n    } else {\n      if (stmt->u.i.buf) {\n        in = LOCAL_BUF;\n      } else {\n        in = FIFO;\n        sprintf(in_fifo_suffix, \"in\");\n      }\n\n      if (module->is_serialized) {\n        out = FIFO;\n        //sprintf(out_fifo_suffix, \"serialize\");\n        sprintf(out_fifo_suffix, \"out\");\n      } else {\n        out = GLOBAL_BUF;\n      }\n    }\n  } else if (stmt->type == AUTOSA_KERNEL_STMT_IO_TRANSFER) {\n    if (stmt->u.i.in) {\n      in = FIFO;\n      sprintf(in_fifo_suffix, \"in\");\n\n      if (stmt->u.i.buf) {\n        out = LOCAL_BUF;\n      } else {\n        out = FIFO;\n        sprintf(out_fifo_suffix, \"out\");\n      }\n    } else {\n      if (stmt->u.i.buf) {\n        in = LOCAL_BUF;\n      } else {\n        in = FIFO;\n        sprintf(in_fifo_suffix, \"in\");\n      }\n\n      out = FIFO;\n      sprintf(out_fifo_suffix, \"out\");\n    }    \n  }\n\n  p = autosa_kernel_print_io_transfer(\n    p, stmt, hls, iterator_prefix, in_fifo_suffix, out_fifo_suffix, in, out);\n\n  return p;\n}\n\n/* Print an I/O transfer statement.\n * is_filter = 0\n * is_buf = 1\n * An in I/O statement is printed as\n *\n *  [type] fifo_data;\n *  [type2] buf_data;\n *  [type] buf_data_split[];\n *  buf_data = local_buf[...];\n *  fifo_data = fifo.read();\n *  for (int n = 0; n < n_lane / nxt_n_lane; n++) {\n *    buf_data_split[n] = buf_data();\n *    buf_data = buf_data >> DW;\n *  }\n *  buf_data_split[...] = Reinterpret<>(fifo_data);\n *  buf_data = (buf_data_split[1], ...);\n *  local_buf[...] 
= buf_data;\n *\n * An out I/O statement is printed as \n *\n *  [type] fifo_data;\n *  [type2] buf_data;\n *  [type] buf_data_split[];\n *  buf_data = local_buf[...];\n *  for (int n = 0; n < n_lane / nxt_n_lane; n++) {\n *    buf_data_split[n] = buf_data();\n *    buf_data = buf_data >> DW;\n *  }\n *  fifo_data = Reinterpret<>(buf_data_split[...]);\n *  fifo.write(fifo_data);\n */\nstatic __isl_give isl_printer *autosa_kernel_print_io_transfer_data_pack(\n  __isl_take isl_printer *p, struct autosa_kernel_stmt *stmt,\n  struct autosa_array_ref_group *group, int n_lane, int nxt_n_lane,\n  struct hls_info *hls, const char *iterator_prefix, int global, int buffer)\n{\n  isl_ctx *ctx;\n  ctx = isl_printer_get_ctx(p);\n  int boundary = stmt->u.i.boundary;\n\n  char *fifo_name;\n  isl_ast_expr *expr, *op;\n  int n_arg;\n  int r;\n  isl_val *val;\n  isl_ast_expr *local_index_packed;\n  isl_ast_expr *arg, *div;\n  local_index_packed = isl_ast_expr_copy(stmt->u.i.local_index);\n  /* Extract the sparse data */\n  int is_sparse = group->local_array->is_sparse;\n  int vec_len = stmt->u.i.local_array->vec_len;\n  int n_nzero = stmt->u.i.local_array->n_nzero;\n  float compress_ratio = stmt->u.i.local_array->compress_ratio;\n  int n_meta_data = stmt->u.i.local_array->n_meta_data;\n  float eff_compress_ratio = stmt->u.i.local_array->eff_compress_ratio;\n\n  /* Modify the local index. 
*/\n  if (is_sparse) {\n    n_arg = isl_ast_expr_get_op_n_arg(local_index_packed);\n    arg = isl_ast_expr_get_op_arg(local_index_packed, n_arg - 1);\n    div = isl_ast_expr_from_val(isl_val_int_from_si(ctx, vec_len * n_lane));\n    arg = isl_ast_expr_div(arg, div);\n    local_index_packed = isl_ast_expr_set_op_arg(local_index_packed, n_arg - 1, arg);\n  } else {\n    if (n_lane > 1)\n    {\n      n_arg = isl_ast_expr_get_op_n_arg(local_index_packed);\n      arg = isl_ast_expr_get_op_arg(local_index_packed, n_arg - 1);\n      div = isl_ast_expr_from_val(isl_val_int_from_si(ctx, n_lane));\n      arg = isl_ast_expr_div(arg, div);\n      local_index_packed = isl_ast_expr_set_op_arg(local_index_packed, n_arg - 1, arg);\n    }\n  }\n\n  /* [type] fifo_data; */\n  p = isl_printer_start_line(p);\n  if (is_sparse) \n    p = autosa_print_array_type_with_lane_sparse(p, group->array, nxt_n_lane);\n  else\n    p = autosa_print_array_type_with_lane(p, group->array, nxt_n_lane);  \n  p = isl_printer_print_str(p, \" \");\n  p = isl_printer_print_str(p, \"fifo_data;\");\n  p = isl_printer_end_line(p);\n\n  /* [type2] buf_data; */\n  p = isl_printer_start_line(p);\n  if (is_sparse) {\n    p = autosa_print_array_type_with_lane_sparse(p, group->array, n_lane);\n  } else {\n    p = isl_printer_print_str(p, group->array->name);\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, n_lane);\n  }\n  p = isl_printer_print_str(p, \" \");\n  p = isl_printer_print_str(p, \"buf_data;\");\n  p = isl_printer_end_line(p);\n\n  /* [type] buf_data_split[]; */  \n  if (hls->target == XILINX_HW ||\n      hls->target == CATAPULT_HW ||\n      hls->target == TAPA_HW ||\n      (hls->target == INTEL_HW && nxt_n_lane > 1)) {\n    p = isl_printer_start_line(p);\n    if (is_sparse) {\n      p = autosa_print_array_type_with_lane_sparse(p, group->array, nxt_n_lane);\n    } else {\n      if (nxt_n_lane == 1)\n      {\n        if (hls->target == XILINX_HW)\n        {\n          p = 
isl_printer_print_str(p, \"ap_uint<\");\n          p = isl_printer_print_int(p, group->array->size * 8);\n          p = isl_printer_print_str(p, \">\");\n        }\n        else if (hls->target == TAPA_HW)\n        {\n          p = isl_printer_print_str(p, group->array->type);\n        }\n        else if (hls->target == INTEL_HW)\n        {\n          p = isl_printer_print_str(p, group->array->type);\n        }\n        else if (hls->target == CATAPULT_HW)\n        {\n          p = isl_printer_print_str(p, group->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, nxt_n_lane);\n        }\n      }\n      else\n      {\n        p = isl_printer_print_str(p, group->array->name);\n        p = isl_printer_print_str(p, \"_t\");\n        p = isl_printer_print_int(p, nxt_n_lane);\n      }\n    }\n    p = isl_printer_print_str(p, \" buf_data_split[\");\n    p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n    p = isl_printer_print_str(p, \"];\");\n    p = isl_printer_end_line(p);\n    if (hls->target == XILINX_HW || hls->target == TAPA_HW)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=buf_data_split complete\");\n      p = isl_printer_end_line(p);\n    }\n  }\n  \n  if (stmt->u.i.in && stmt->u.i.coalesce_depth >= 0)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"if (\");\n    if (iterator_prefix != NULL) {\n      p = isl_printer_print_str(p, iterator_prefix);\n    } else {\n      p = isl_printer_print_str(p, \"c\");\n    }    \n    p = isl_printer_print_int(p, stmt->u.i.coalesce_depth);\n    p = isl_printer_print_str(p, \" % \");\n    p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n    p = isl_printer_print_str(p, \" == 0) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n  }\n  /* buf_data = local[]; */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"buf_data = \");\n  
if (stmt->u.i.module->double_buffer && \n      stmt->u.i.module->options->autosa->double_buffer_style == 0) {\n    isl_ast_expr *op;\n    op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n    p = isl_printer_print_ast_expr(p, op);    \n    isl_ast_expr_free(op);\n    p = isl_printer_print_str(p, stmt->u.i.in? \"[arb]\" : \"[!arb]\");\n    for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n      op = isl_ast_expr_op_get_arg(local_index_packed, n);\n      p = isl_printer_print_str(p, \"[\");\n      p = isl_printer_print_ast_expr(p, op);\n      p = isl_printer_print_str(p, \"]\");\n      isl_ast_expr_free(op);\n    }\n  } else {\n    if (hls->target == CATAPULT_HW && stmt->u.i.module->is_filter) {\n      isl_ast_expr *op;\n      op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n      p = isl_printer_print_ast_expr(p, op);    \n      isl_ast_expr_free(op);\n      p = isl_printer_print_str(p, \"_tmp.data\");\n      for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n        op = isl_ast_expr_op_get_arg(local_index_packed, n);\n        p = isl_printer_print_str(p, \"[\");\n        p = isl_printer_print_ast_expr(p, op);\n        p = isl_printer_print_str(p, \"]\");\n        isl_ast_expr_free(op);\n      }\n    } else {\n      p = isl_printer_print_ast_expr(p, local_index_packed);\n    }\n  }\n\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n\n  if (is_sparse) {\n    /* [type] buf_data_d = buf_data.d; */\n    p = isl_printer_start_line(p);\n    p = autosa_print_array_type_with_lane(p, group->array, n_lane * n_nzero);\n    p = isl_printer_print_str(p, \" buf_data_d = buf_data.d;\");\n    p = isl_printer_end_line(p);\n\n    /* [type] buf_data_i = buf_data.i; */\n    p = isl_printer_start_line(p);\n    if (hls->target == XILINX_HW || hls->target == TAPA_HW) {\n      p = isl_printer_print_str(p, \"ap_uint<\");\n      p = isl_printer_print_int(p, 8 * n_lane);\n    } else if (hls->target == 
CATAPULT_HW) {\n      p = isl_printer_print_str(p, \"ac_int<\");\n      p = isl_printer_print_int(p, 8 * n_lane);\n      p = isl_printer_print_str(p, \", false\");\n    }\n    p = isl_printer_print_str(p, \"> buf_data_i = buf_data.i;\");\n    p = isl_printer_end_line(p);      \n  }\n\n  if (hls->target == XILINX_HW)\n  {    \n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int n = 0; n < \");\n    p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n    p = isl_printer_print_str(p, \"; n++) {\");\n    p = isl_printer_end_line(p);\n        \n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma HLS UNROLL\");\n    p = isl_printer_end_line(p);    \n    p = isl_printer_indent(p, 2);\n\n    if (is_sparse) {\n      /* buf_data_split[n] = {buf_data_d(), ...} */    \n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"buf_data_split[n] = (\");\n      p = isl_printer_print_str(p, group->array->name);\n      p = isl_printer_print_str(p, \"_s_t\");\n      p = isl_printer_print_int(p, nxt_n_lane);\n      p = isl_printer_print_str(p, \"){buf_data_d(\");\n      p = isl_printer_print_int(p, group->array->size * 8 * nxt_n_lane * n_nzero - 1);\n      p = isl_printer_print_str(p, \", 0), buf_data_i(\");\n      p = isl_printer_print_int(p, 8 * nxt_n_lane - 1);\n      p = isl_printer_print_str(p, \", 0)};\");\n      p = isl_printer_end_line(p);      \n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"buf_data_d = buf_data_d >> \");\n      p = isl_printer_print_int(p, group->array->size * 8 * nxt_n_lane * n_nzero);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"buf_data_i = buf_data_i >> \");\n      p = isl_printer_print_int(p, 8 * nxt_n_lane);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    } else {\n      p = isl_printer_start_line(p);\n 
     p = isl_printer_print_str(p, \"buf_data_split[n] = buf_data(\");\n      p = isl_printer_print_int(p, group->array->size * 8 * nxt_n_lane - 1);\n      p = isl_printer_print_str(p, \", 0);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"buf_data = buf_data >> \");\n      p = isl_printer_print_int(p, group->array->size * 8 * nxt_n_lane);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n\n    p = isl_printer_indent(p, -2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"}\");\n    p = isl_printer_end_line(p);\n  }\n  else if (hls->target == INTEL_HW && nxt_n_lane > 1) \n  {    \n    for (int i = 0; i < n_lane / nxt_n_lane; i++)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"buf_data_split[\");\n      p = isl_printer_print_int(p, i);\n      p = isl_printer_print_str(p, \"]\");\n      if (nxt_n_lane > 1)\n        p = isl_printer_print_str(p, \".data\");\n      p = isl_printer_print_str(p, \" = buf_data.data.s\");\n      for (int j = 0; j < nxt_n_lane; j++)\n      {\n        p = isl_printer_print_str(p, vector_index[j + i * nxt_n_lane]);\n      }\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }    \n  }\n  else if (hls->target == CATAPULT_HW) {\n    for (int i = 0; i < n_lane / nxt_n_lane; i++) {\n      if (is_sparse) {\n        /* buf_data_split[].set_slc(0, buf_data_i.slc<>()); */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"buf_data_split[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"].set_slc(0, \");\n        p = isl_printer_print_str(p, \"buf_data_i.slc<\");\n        p = isl_printer_print_int(p, 8 * nxt_n_lane);\n        p = isl_printer_print_str(p, \">(\");\n        p = isl_printer_print_int(p, i * 8 * nxt_n_lane);\n        p = isl_printer_print_str(p, \"));\");\n        p = 
isl_printer_end_line(p);\n\n        /* buf_data_split[].set_slc(xx, buf_data_d.slc<>()); */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"buf_data_split[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"].set_slc(\");\n        p = isl_printer_print_int(p, 8 * nxt_n_lane);\n        p = isl_printer_print_str(p, \", buf_data_d.slc<\");\n        p = isl_printer_print_int(p, group->array->size * 8 * nxt_n_lane * n_nzero);\n        p = isl_printer_print_str(p, \">(\");\n        p = isl_printer_print_int(p, i * group->array->size * 8 * nxt_n_lane * n_nzero);\n        p = isl_printer_print_str(p, \"));\");\n        p = isl_printer_end_line(p);\n      } else {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"buf_data_split[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"] = buf_data.slc<\");\n        p = isl_printer_print_int(p, group->array->size * 8 * nxt_n_lane);\n        p = isl_printer_print_str(p, \">(\");\n        p = isl_printer_print_int(p, i * group->array->size * 8 * nxt_n_lane);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  } else if (hls->target == TAPA_HW) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int n = 0; n < \");\n    p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n    p = isl_printer_print_str(p, \"; n++) {\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma HLS UNROLL\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    if (is_sparse) {\n      /* buf_data_split[n] = {buf_data_d(), ...} */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"buf_data_split[n] = (\");\n      p = isl_printer_print_str(p, group->array->name);\n      p = isl_printer_print_str(p, \"_s_t\");\n      p = 
isl_printer_print_int(p, nxt_n_lane);\n      p = isl_printer_print_str(p, \"){\");\n      if (nxt_n_lane == 1)\n        p = isl_printer_print_str(p, \"buf_data_d[n]\");\n      else {\n        p = isl_printer_print_str(p, \"tapa::truncated<\");\n        p = isl_printer_print_int(p, nxt_n_lane);\n        p = isl_printer_print_str(p, \">(buf_data_d, \");\n        p = isl_printer_print_int(p, nxt_n_lane);\n        p = isl_printer_print_str(p, \"* n)\");\n      }\n      p = isl_printer_print_str(p, \", buf_data_i(\");\n      p = isl_printer_print_int(p, 8 * nxt_n_lane - 1);\n      p = isl_printer_print_str(p, \", 0)};\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"buf_data_i = buf_data_i >> \");\n      p = isl_printer_print_int(p, 8 * nxt_n_lane);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    } else {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"buf_data_split[n] = \");\n      if (nxt_n_lane == 1)\n        p = isl_printer_print_str(p, \"buf_data[n]\");\n      else {\n        p = isl_printer_print_str(p, \"tapa::truncated<\");\n        p = isl_printer_print_int(p, nxt_n_lane);\n        p = isl_printer_print_str(p, \">(buf_data, \");\n        p = isl_printer_print_int(p, nxt_n_lane);\n        p = isl_printer_print_str(p, \"* n)\");\n      }\n      p = isl_printer_end_line(p);\n    }\n\n    p = isl_printer_indent(p, -2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"}\");\n    p = isl_printer_end_line(p);\n  }\n\n  if (stmt->u.i.in && stmt->u.i.coalesce_depth >= 0)\n  {\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n\n  /* split_i = ... 
*/\n  expr = isl_ast_expr_copy(stmt->u.i.local_index);\n  n_arg = isl_ast_expr_op_get_n_arg(expr);\n  op = isl_ast_expr_op_get_arg(expr, n_arg - 1);\n  r = n_lane / nxt_n_lane;\n  if (is_sparse) \n    val = isl_val_int_from_si(ctx, vec_len * nxt_n_lane);\n  else\n    val = isl_val_int_from_si(ctx, nxt_n_lane);\n  op = isl_ast_expr_div(op, isl_ast_expr_from_val(val));\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"int split_i = (\");\n  p = isl_printer_print_ast_expr(p, op);\n  p = isl_printer_print_str(p, \") % \");\n  p = isl_printer_print_int(p, r);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  isl_ast_expr_free(op);\n  isl_ast_expr_free(expr);\n  if (stmt->u.i.in)\n  {\n    fifo_name = concat(ctx, stmt->u.i.in_fifo_name, \"in\");\n    /* fifo_data = fifo.read(); */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"fifo_data = \");\n    if (hls->target == XILINX_HW)\n      p = print_fifo_rw_xilinx(p, fifo_name, 1);\n    else if (hls->target == TAPA_HW)\n      p = print_fifo_rw_tapa(p, fifo_name, 1);\n    else if (hls->target == INTEL_HW)\n      p = print_fifo_rw_intel(p, fifo_name, 1);\n    else if (hls->target == CATAPULT_HW)\n      p = print_fifo_rw_catapult(p, fifo_name, 1);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n      /* buf_data_split[...] 
= Reinterpret<>(fifo_data); */\n    if (hls->target == XILINX_HW ||\n        hls->target == TAPA_HW ||\n        hls->target == CATAPULT_HW || \n        (hls->target == INTEL_HW && nxt_n_lane > 1)) {\n      if (stmt->u.i.reduce) {\n        p = autosa_print_reduce_data_pack(p, stmt, nxt_n_lane, n_lane, group, hls->target); // TODO\n      } else {      \n        if (hls->target == XILINX_HW)\n        {\n          if (nxt_n_lane == 1)\n          {\n            p = isl_printer_start_line(p);\n            p = isl_printer_print_str(p, \"union {unsigned int ui; \");\n            p = isl_printer_print_str(p, group->array->type);\n            p = isl_printer_print_str(p, \" ut;} u;\");\n            p = isl_printer_end_line(p);\n  \n            p = isl_printer_start_line(p);\n            p = isl_printer_print_str(p, \"u.ut = fifo_data;\");\n            p = isl_printer_end_line(p);\n          }\n        }\n  \n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"buf_data_split[split_i] \");\n        if (stmt->u.i.reduce) {\n          p = isl_printer_print_str(p, stmt->u.i.reduce_op);\n        }\n        p = isl_printer_print_str(p, \"= \");\n  \n        if (hls->target == XILINX_HW)\n        {\n          if (nxt_n_lane == 1)\n          {\n            p = isl_printer_print_str(p, \"ap_uint<\");\n            p = isl_printer_print_int(p, group->array->size * 8);\n            p = isl_printer_print_str(p, \">(u.ui);\");\n          }\n          else\n          {\n            p = isl_printer_print_str(p, \"fifo_data;\");\n          }\n        }\n        else \n        {\n          p = isl_printer_print_str(p, \"fifo_data;\");\n        }\n        p = isl_printer_end_line(p);      \n      }\n  \n      if (stmt->u.i.coalesce_depth >= 0)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"if (\");\n        if (iterator_prefix != NULL) {\n          p = isl_printer_print_str(p, iterator_prefix);\n        } else {\n          p = 
isl_printer_print_str(p, \"c\");\n        }            \n        p = isl_printer_print_int(p, stmt->u.i.coalesce_depth);\n        p = isl_printer_print_str(p, \" % \");\n        p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n        p = isl_printer_print_str(p, \" == \");\n        p = isl_printer_print_int(p, n_lane / nxt_n_lane);\n        p = isl_printer_print_str(p, \" - 1 || c\");\n        p = isl_printer_print_int(p, stmt->u.i.coalesce_depth);\n        p = isl_printer_print_str(p, \" == \");\n        p = isl_printer_print_int(p, stmt->u.i.coalesce_bound - 1);\n        p = isl_printer_print_str(p, \") {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n      }\n    }\n    /* buf_data = (buf_data_split[1], ...); */\n    p = isl_printer_start_line(p);\n    if (hls->target == XILINX_HW)\n    {\n      int first = 1;\n      p = isl_printer_print_str(p, \"buf_data = (\");\n      for (int i = n_lane / nxt_n_lane - 1; i >= 0; i--)\n      {\n        if (!first)\n          p = isl_printer_print_str(p, \", \");\n        p = isl_printer_print_str(p, \"buf_data_split[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"]\");\n          first = 0;\n      }\n      p = isl_printer_print_str(p, \");\");\n    } else if (hls->target == INTEL_HW)\n    {\n      if (nxt_n_lane == 1) {\n        p = isl_printer_print_str(p, \"buf_data.data[split_i] = fifo_data;\");\n      } else {\n        int first = 1;\n        p = isl_printer_print_str(p, \"buf_data.data = \");\n        p = isl_printer_print_str(p, \"(\");\n        p = isl_printer_print_str(p, group->array->type);\n        p = isl_printer_print_int(p, n_lane);\n        p = isl_printer_print_str(p, \")(\");\n          for (int i = 0; i < n_lane / nxt_n_lane; i++)\n        {\n          if (!first)\n            p = isl_printer_print_str(p, \", \");\n            if (nxt_n_lane > 1)\n          {\n            p = isl_printer_print_str(p, \"(\");\n            p = 
isl_printer_print_str(p, group->array->type);\n            p = isl_printer_print_int(p, nxt_n_lane);\n            p = isl_printer_print_str(p, \")\");\n          }\n          p = isl_printer_print_str(p, \"buf_data_split[\");\n          p = isl_printer_print_int(p, i);\n          p = isl_printer_print_str(p, \"]\");\n          if (nxt_n_lane > 1)\n          {\n            p = isl_printer_print_str(p, \".data\");\n          }\n            first = 0;\n        }\n        p = isl_printer_print_str(p, \");\");\n      }\n    } else if (hls->target == CATAPULT_HW) {\n      for (int i = 0; i < n_lane / nxt_n_lane; i++) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"buf_data.set_slc(\");\n        p = isl_printer_print_int(p, i * group->array->size * 8 * nxt_n_lane);\n        p = isl_printer_print_str(p, \", buf_data_split[\");\n        p = isl_printer_print_int(p, i);\n        p = isl_printer_print_str(p, \"]);\");\n        p = isl_printer_end_line(p);  \n      }\n    } else if (hls->target == TAPA_HW) {\n      for (int i = 0; i < n_lane / nxt_n_lane; i++) {\n        if (nxt_n_lane == 1) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"buf_data.set(\");\n          p = isl_printer_print_int(p, i);\n          p = isl_printer_print_str(p, \", buf_data_split[\");\n          p = isl_printer_print_int(p, i);\n          p = isl_printer_print_str(p, \"]);\");\n          p = isl_printer_end_line(p);\n        } else {\n          for (int j = 0; j < nxt_n_lane; j++) {\n            p = isl_printer_start_line(p);\n            p = isl_printer_print_str(p, \"buf_data.set(\");\n            p = isl_printer_print_int(p, i * nxt_n_lane + j);\n            p = isl_printer_print_str(p, \", buf_data_split[\");\n            p = isl_printer_print_int(p, i);\n            p = isl_printer_print_str(p, \"][\");\n            p = isl_printer_print_int(p, j);\n            p = isl_printer_print_str(p, \"]);\");\n            p = 
isl_printer_end_line(p);\n          }\n        }\n      }\n    }\n\n      p = isl_printer_end_line(p);\n      /* local_buf[...] = buf_data; */\n    p = isl_printer_start_line(p);    \n    if (stmt->u.i.module->double_buffer && \n        stmt->u.i.module->options->autosa->double_buffer_style == 0) {\n      isl_ast_expr *op;\n      op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n      p = isl_printer_print_ast_expr(p, op);\n      isl_ast_expr_free(op);\n      p = isl_printer_print_str(p, \"[arb]\");\n      for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n        op = isl_ast_expr_op_get_arg(local_index_packed, n);\n        p = isl_printer_print_str(p, \"[\");\n        p = isl_printer_print_ast_expr(p, op);\n        p = isl_printer_print_str(p, \"]\");\n        isl_ast_expr_free(op);\n      }\n    } else {\n      if (hls->target == CATAPULT_HW && stmt->u.i.module->is_filter) {\n        isl_ast_expr *op;\n        op = isl_ast_expr_op_get_arg(local_index_packed, 0);\n        p = isl_printer_print_ast_expr(p, op);    \n        isl_ast_expr_free(op);\n        p = isl_printer_print_str(p, \"_tmp.data\");\n        for (int n = 1; n < isl_ast_expr_op_get_n_arg(local_index_packed); n++) {\n          op = isl_ast_expr_op_get_arg(local_index_packed, n);\n          p = isl_printer_print_str(p, \"[\");\n          p = isl_printer_print_ast_expr(p, op);\n          p = isl_printer_print_str(p, \"]\");\n          isl_ast_expr_free(op);\n        }        \n      } else {\n        p = isl_printer_print_ast_expr(p, local_index_packed);\n      }\n    }\n    p = isl_printer_print_str(p, \" = buf_data;\");\n    p = isl_printer_end_line(p);\n      if (stmt->u.i.coalesce_depth >= 0)\n    {\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n    }\n      free(fifo_name);\n  } else {\n    if (is_sparse) {\n      /* fifo_data = buf_data_split[...]; */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fifo_data 
= buf_data_split[split_i];\");\n      p = isl_printer_end_line(p);\n      /* fifo.write(fifo_data); */\n      fifo_name = concat(ctx, stmt->u.i.out_fifo_name, \"out\");\n      p = isl_printer_start_line(p);\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, fifo_name, 0);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, fifo_name, 0);\n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, fifo_name, 0);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, fifo_name, 0);\n      p = isl_printer_print_str(p, \"fifo_data);\");\n      p = isl_printer_end_line(p);\n      free(fifo_name);\n    } else {\n      fifo_name = concat(ctx, stmt->u.i.out_fifo_name, \"out\");\n      if (hls->target == XILINX_HW)\n      {\n        if (nxt_n_lane == 1)\n        {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"union {unsigned int ui; \");\n          p = isl_printer_print_str(p, group->array->type);\n          p = isl_printer_print_str(p, \" ut;} u;\");\n          p = isl_printer_end_line(p);\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"u.ui = (unsigned int)buf_data_split[split_i];\");\n          p = isl_printer_end_line(p);\n        }\n      }\n      /* fifo_data = Reinterpret<>(buf_data_split[...]); */    \n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fifo_data = \");\n      if (hls->target == XILINX_HW)\n      {\n        if (nxt_n_lane == 1)\n        {\n          p = isl_printer_print_str(p, \"u.ut\");\n        }\n        else\n        {\n          p = isl_printer_print_str(p, \"buf_data_split[split_i]\");\n        }\n      }\n      else if (hls->target == INTEL_HW)\n      {\n        if (nxt_n_lane > 1)\n          p = isl_printer_print_str(p, \"buf_data_split[split_i]\");\n        else      \n          p = isl_printer_print_str(p, \"buf_data.data[split_i]\");\n      }\n      
else if (hls->target == CATAPULT_HW) \n      {\n        p = isl_printer_print_str(p, \"buf_data_split[split_i]\");\n      }\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);    \n        /* fifo.write(fifo_data); */\n      p = isl_printer_start_line(p);\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, fifo_name, 0);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, fifo_name, 0);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, fifo_name, 0);\n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, fifo_name, 0);\n      p = isl_printer_print_str(p, \"fifo_data);\");\n      p = isl_printer_end_line(p);\n        free(fifo_name);\n    }\n  }\n\n  isl_ast_expr_free(local_index_packed);\n\n  return p;\n}\n\n///* Print an I/O transfer statement.\n// */\n//__isl_give isl_printer *autosa_kernel_print_io_transfer(\n//    __isl_take isl_printer *p,\n//    struct autosa_kernel_stmt *stmt, struct hls_info *hls, const char *iterator_prefix)\n//{\n//  struct autosa_hw_module *module = stmt->u.i.module;\n//  struct autosa_array_ref_group *group = stmt->u.i.group;\n//  int n_lane = stmt->u.i.data_pack;\n//  int nxt_n_lane = stmt->u.i.nxt_data_pack;\n//  //int is_filter = stmt->u.i.filter;\n//  int is_buf = stmt->u.i.buf;\n//  isl_ctx *ctx = isl_printer_get_ctx(p);\n//\n//  //  p = ppcg_start_block(p);\n//  if (n_lane == nxt_n_lane) {    \n//    p = autosa_kernel_print_io_transfer_default(p, stmt, group, n_lane, hls, iterator_prefix);\n//  } else {    \n//    p = autosa_kernel_print_io_transfer_data_pack(\n//          p, stmt, group, n_lane, nxt_n_lane, hls, iterator_prefix, 0, 1);\n//  }\n//  //  p = ppcg_end_block(p);\n//\n//  return p;\n//}\n\n/* Print a serialization/deserialization statement.\n * Serialization:\n * X_to[X_cnt++] = X_from[...]\n * Deserizalition:\n * X_to[...] 
= X_from[X_cnt++]\n */\n__isl_give isl_printer *autosa_kernel_print_host_serialize(\n  __isl_take isl_printer *p,\n  struct autosa_kernel_stmt *stmt,\n  struct hls_info *hls)\n{\n  isl_ast_expr *index, *arg;\n  isl_id *id;\n  const char *array_name;\n\n  index = stmt->u.s.index;\n  p = isl_printer_start_line(p);\n  arg = isl_ast_expr_get_op_arg(index, 0);\n  id = isl_ast_expr_id_get_id(arg);\n  array_name = isl_id_get_name(id);\n  isl_id_free(id);\n  isl_ast_expr_free(arg);\n\n  arg = isl_ast_expr_get_op_arg(index, 1);\n\n  if (stmt->u.s.in) {\n    p = isl_printer_print_str(p, array_name);\n    p = isl_printer_print_str(p, \"_to[cnt++] = \");    \n    p = isl_printer_print_str(p, array_name);\n    p = isl_printer_print_str(p, \"_from[\");\n    if (stmt->u.s.group->local_array->is_sparse)\n      p = isl_printer_print_str(p, \"(\");\n    p = isl_printer_print_ast_expr(p, arg);\n    if (stmt->u.s.group->local_array->is_sparse)\n      p = isl_printer_print_str(p, \") / EFF_COMPRESS_RATIO\");\n    p = isl_printer_print_str(p, \"];\");\n  } else {\n    p = isl_printer_print_str(p, array_name);\n    p = isl_printer_print_str(p, \"_to[\");\n    p = isl_printer_print_ast_expr(p, arg);\n    p = isl_printer_print_str(p, \"] = \");\n    p = isl_printer_print_str(p, array_name);\n    p = isl_printer_print_str(p, \"_from[cnt++];\");    \n  }\n  p = isl_printer_end_line(p);\n  isl_ast_expr_free(arg);\n\n  return p;\n}\n\n/* Print a drain merge statement.\n *\n * [group_array_prefix]_to[...] 
= [group_array_prefix]_from[...]\n */\n__isl_give isl_printer *autosa_kernel_print_drain_merge(__isl_take isl_printer *p,\n                                                        struct autosa_kernel_stmt *stmt, struct hls_info *hls)\n{\n  isl_ast_expr *index_to, *index_from, *arg;\n  isl_ctx *ctx = hls->ctx;\n  struct autosa_drain_merge_func *func = stmt->u.dm.func;\n  isl_ast_expr *index = stmt->u.dm.index;\n  int n_arg;\n  isl_id *id;\n  const char *array_name;\n  char *new_array_name;\n  isl_printer *p_str;\n\n  p = isl_printer_start_line(p);\n  // TODO\n  n_arg = isl_ast_expr_get_op_n_arg(index);\n  /* Modify the index. */\n  arg = isl_ast_expr_get_op_arg(index, 0);\n  id = isl_ast_expr_id_get_id(arg);\n  array_name = isl_id_get_name(id);\n  isl_id_free(id);\n  isl_ast_expr_free(arg);\n  p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_print_str(p_str, array_name);\n  p_str = isl_printer_print_str(p_str, \"_to\");\n  new_array_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  id = isl_id_alloc(ctx, new_array_name, NULL);\n  arg = isl_ast_expr_from_id(id);\n  free(new_array_name);\n  index_to = isl_ast_expr_set_op_arg(isl_ast_expr_copy(index), 0, arg);\n\n  arg = isl_ast_expr_get_op_arg(index, 0);\n  id = isl_ast_expr_id_get_id(arg);\n  array_name = isl_id_get_name(id);\n  isl_id_free(id);\n  isl_ast_expr_free(arg);\n  p_str = isl_printer_to_str(ctx);\n  p_str = isl_printer_print_str(p_str, array_name);\n  p_str = isl_printer_print_str(p_str, \"_from\");\n  new_array_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  id = isl_id_alloc(ctx, new_array_name, NULL);\n  arg = isl_ast_expr_from_id(id);\n  free(new_array_name);\n  index_from = isl_ast_expr_set_op_arg(isl_ast_expr_copy(index), 0, arg);\n\n  p = isl_printer_print_ast_expr(p, index_to);\n  p = isl_printer_print_str(p, \" = \");\n  p = isl_printer_print_ast_expr(p, index_from);\n  p = isl_printer_print_str(p, \";\");\n\n  isl_ast_expr_free(index_to);\n  
isl_ast_expr_free(index_from);\n\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print an I/O dram statement.\n *\n * An in I/O statement is printed as \n *\n *  [type] fifo_data;\n *  fifo_data = global;\n *  or \n *  fifo_data = fifo_[arr].read() // when serialize is enabled\n *  fifo.write(fifo_data);\n *\n * while an out I/O statement is printed as\n *\n *  [type] fifo_data;\n *  fifo_data = fifo.read();\n *  global = fifo_data;\n *  or \n *  fifo_[arr].write(fifo_data); // when serialize is enabled\n */\n__isl_give isl_printer *autosa_kernel_print_io_dram(\n  __isl_take isl_printer *p,\n  struct autosa_kernel_stmt *stmt, struct hls_info *hls,\n  const char *iterator_prefix)\n{\n  // TODO: add when data packing factors are different.\n  struct autosa_array_ref_group *group = stmt->u.i.group;\n  struct autosa_hw_module *module = stmt->u.i.module;\n  char *fifo_name;\n  int n_lane = stmt->u.i.data_pack;\n  int nxt_n_lane = stmt->u.i.nxt_data_pack;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n  int buf = stmt->u.i.buf;\n  isl_ast_expr *local_index_packed;  \n  int n_arg;  \n  /* Extract the sparse data. 
*/\n  int is_sparse = group->local_array->is_sparse;\n  int vec_len = stmt->u.i.local_array->vec_len;\n  int n_nzero = stmt->u.i.local_array->n_nzero;\n  float compress_ratio = stmt->u.i.local_array->compress_ratio;\n  int n_meta_data = stmt->u.i.local_array->n_meta_data;\n  float eff_compress_ratio = stmt->u.i.local_array->eff_compress_ratio;\n\n  local_index_packed = isl_ast_expr_copy(stmt->u.i.local_index);\n  /* Modify the local index; */\n  if (group->local_array->is_sparse) {\n    isl_ast_expr *arg, *div;\n    n_arg = isl_ast_expr_get_op_n_arg(local_index_packed);\n    arg = isl_ast_expr_get_op_arg(local_index_packed, n_arg - 1);\n    div = isl_ast_expr_from_val(isl_val_int_from_si(ctx, vec_len * n_lane));\n    arg = isl_ast_expr_div(arg, div);\n    local_index_packed = isl_ast_expr_set_op_arg(local_index_packed, n_arg - 1, arg);\n  } else {\n    if (n_lane > 1)\n    {\n      isl_ast_expr *arg, *div;\n      n_arg = isl_ast_expr_get_op_n_arg(local_index_packed);\n      arg = isl_ast_expr_get_op_arg(local_index_packed, n_arg - 1);\n      div = isl_ast_expr_from_val(isl_val_int_from_si(ctx, n_lane));\n      arg = isl_ast_expr_div(arg, div);\n      local_index_packed = isl_ast_expr_set_op_arg(local_index_packed, n_arg - 1, arg);\n    }\n  }\n\n  p = isl_printer_indent(p, -2);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"{\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, 2);\n\n  /* Declare the fifo data variable. 
*/\n  p = isl_printer_start_line(p);\n  if (group->local_array->is_sparse) {\n    p = autosa_print_array_type_with_lane_sparse(p, group->array, nxt_n_lane);\n  } else {    \n    p = isl_printer_print_str(p, stmt->u.i.array->name);\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, nxt_n_lane);    \n  }\n  p = isl_printer_print_str(p, \" fifo_data;\");\n  p = isl_printer_end_line(p);\n\n  if (stmt->u.i.in)\n  {\n    /* Generate the serialize fifo name */\n    isl_printer *p_str;\n    char *serialize_fifo_name;\n    p_str = isl_printer_to_str(ctx);\n    p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n    p_str = isl_printer_print_str(p_str, \"_serialize\");\n    serialize_fifo_name = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n\n    p = isl_printer_start_line(p);    \n    p = isl_printer_print_str(p, \"fifo_data = \");        \n    if (module->is_serialized) {\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, serialize_fifo_name, 1);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, serialize_fifo_name, 1);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, serialize_fifo_name, 1);      \n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, serialize_fifo_name, 1);      \n    } else {\n      p = io_stmt_print_global_index(p, stmt, stmt->u.i.serialize);    \n    }\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n\n    free(serialize_fifo_name);\n\n    if (!buf) {            \n      fifo_name = concat(ctx, stmt->u.i.out_fifo_name, \"out\");      \n      p = isl_printer_start_line(p);\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, fifo_name, 0);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, fifo_name, 0);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, fifo_name, 0);\n      else if (hls->target == 
CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, fifo_name, 0);\n      p = isl_printer_print_str(p, \"fifo_data);\");\n      p = isl_printer_end_line(p);\n      free(fifo_name);      \n    }\n    else\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_ast_expr(p, local_index_packed);\n      p = isl_printer_print_str(p, \" = fifo_data;\");\n      p = isl_printer_end_line(p);\n    }\n  }\n  else\n  {\n    if (!buf)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fifo_data = \");      \n      fifo_name = concat(ctx, stmt->u.i.in_fifo_name, \"in\");      \n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, fifo_name, 1);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, fifo_name, 1);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, fifo_name, 1);\n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, fifo_name, 1);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n      free(fifo_name);\n    }\n    else\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fifo_data = \");\n      p = isl_printer_print_ast_expr(p, local_index_packed);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n\n    p = isl_printer_start_line(p);    \n    if (module->is_serialized) {\n      /* Generate serialize fifo name */\n      isl_printer *p_str;\n      char *serialize_fifo_name;\n      p_str = isl_printer_to_str(ctx);\n      p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n      p_str = isl_printer_print_str(p_str, \"_serialize\");\n      serialize_fifo_name = isl_printer_get_str(p_str);\n      isl_printer_free(p_str);\n\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, serialize_fifo_name, 0);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, serialize_fifo_name, 0);\n  
    else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, serialize_fifo_name, 0);\n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, serialize_fifo_name, 0);\n      p = isl_printer_print_str(p, \"fifo_data);\");\n\n      free(serialize_fifo_name);\n    } else {\n      p = io_stmt_print_global_index(p, stmt, stmt->u.i.serialize);\n      p = isl_printer_print_str(p, \" = fifo_data;\");\n    }\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_indent(p, -2);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"}\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, 2);\n\n  isl_ast_expr_free(local_index_packed);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_inter_trans_module_call(\n    __isl_take isl_printer *p,\n    struct autosa_hw_module *module, struct autosa_prog *prog,\n    struct autosa_kernel *kernel, struct hls_info *hls, int arb, int boundary)\n{\n  int n = isl_id_list_n_id(module->inst_ids);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, module->name);\n  p = isl_printer_print_str(p, \"_inter_trans\");\n  if (boundary)\n    p = isl_printer_print_str(p, \"_boundary\");\n  if (prog->scop->options->autosa->use_cplusplus_template) {\n    p = isl_printer_print_str(p, \"<\");\n    for (int i = 0; i < n; i++) {\n      if (i > 0) \n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"p\");\n      p = isl_printer_print_int(p, i);\n    }\n    p = isl_printer_print_str(p, \">\");\n  }\n  if (hls->target == CATAPULT_HW) {\n    p = isl_printer_print_str(p, \"_inst.run\");\n  }\n  p = isl_printer_print_str(p, \"(\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, 2);\n  p = isl_printer_start_line(p);\n  p = print_module_arguments(p, prog, kernel, module, 0,\n                             hls->target, 1, arb, boundary, 0);\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, -2);\n  p = 
isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \");\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the function call for inter_transfer module. */\n__isl_give isl_printer *autosa_kernel_print_inter_trans(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls)\n{\n  struct autosa_hw_module *module = stmt->u.f.module;\n  struct autosa_kernel *kernel = module->kernel;\n  struct autosa_prog *prog = kernel->prog;\n  int boundary = stmt->u.f.boundary;\n\n  if (hls->target == CATAPULT_HW) {    \n    p = print_inter_trans_module_call(p, module, prog, kernel, hls, 0, boundary);\n  } else {\n    if (module->double_buffer)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"if (arb == 0) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n    }\n\n    p = print_inter_trans_module_call(p, module, prog, kernel, hls, 0, boundary);\n\n    if (module->double_buffer)\n    {\n      p = isl_printer_indent(p, -2);\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"} else {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = print_inter_trans_module_call(p, module, prog, kernel, hls, 1, boundary);\n\n      p = isl_printer_indent(p, -2);\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"}\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_intra_trans_module_call(\n    __isl_take isl_printer *p,\n    struct autosa_hw_module *module, struct autosa_prog *prog,\n    struct autosa_kernel *kernel,\n    struct hls_info *hls, int arb)\n{\n  int n = isl_id_list_n_id(module->inst_ids);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, module->name);\n  p = isl_printer_print_str(p, \"_intra_trans\");\n  if (prog->scop->options->autosa->use_cplusplus_template) {\n    p = isl_printer_print_str(p, \"<\");\n    
for (int i = 0; i < n; i++) {\n      if (i > 0) \n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"p\");\n      p = isl_printer_print_int(p, i);\n    }\n    p = isl_printer_print_str(p, \">\");\n  }\n  if (hls->target == CATAPULT_HW) {\n    p = isl_printer_print_str(p, \"_inst.run\");\n  }\n  p = isl_printer_print_str(p, \"(\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, 2);\n  p = isl_printer_start_line(p);\n  p = print_module_arguments(p, prog, kernel, module, 0, \n                             hls->target, 0, arb, 0, 0);\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, -2);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \");\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the function call for intra_transfer module. */\n__isl_give isl_printer *autosa_kernel_print_intra_trans(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls)\n{\n  struct autosa_hw_module *module = stmt->u.f.module;\n  struct autosa_kernel *kernel = module->kernel;\n  struct autosa_prog *prog = kernel->prog;\n\n  if (hls->target == CATAPULT_HW) {\n    p = print_intra_trans_module_call(p, module, prog, kernel, hls, 1);\n  } else {\n    if (module->double_buffer)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"if (arb == 0) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n    }\n\n    p = print_intra_trans_module_call(p, module, prog, kernel, hls, 0);\n\n    if (module->double_buffer)\n    {\n      p = isl_printer_indent(p, -2);\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"} else {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = print_intra_trans_module_call(p, module, prog, kernel, hls, 1);\n\n      p = isl_printer_indent(p, -2);\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"}\");\n      p = 
isl_printer_end_line(p);\n    }\n  }\n\n  return p;\n}\n\n/* Print the function calls for inter_transfer and intra_transfer modules. */\n__isl_give isl_printer *autosa_kernel_print_inter_intra(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls)\n{\n  struct autosa_hw_module *module = stmt->u.f.module;\n  struct autosa_kernel *kernel = module->kernel;\n  struct autosa_prog *prog = kernel->prog;\n  int boundary = stmt->u.f.boundary;\n\n  if (module->double_buffer && hls->target != CATAPULT_HW)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"if (arb == 0) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n  }\n\n  /* inter_trans */\n  p = print_inter_trans_module_call(p, module, prog, kernel, hls, 0, boundary);\n  /* intra_trans */\n  p = print_intra_trans_module_call(p, module, prog, kernel, hls, 0);\n\n  if (module->double_buffer && hls->target != CATAPULT_HW)\n  {\n    p = isl_printer_indent(p, -2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"} else {\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, 2);\n\n    /* inter_trans */\n    p = print_inter_trans_module_call(p, module, prog, kernel, hls, 1, boundary);\n    /* intra_trans */\n    p = print_intra_trans_module_call(p, module, prog, kernel, hls, 1);\n\n    p = isl_printer_indent(p, -2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"}\");\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\n/* Print the function calls for intra_transfer and inter_transfer modules. 
*/\n__isl_give isl_printer *autosa_kernel_print_intra_inter(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls)\n{\n  struct autosa_hw_module *module = stmt->u.f.module;\n  struct autosa_kernel *kernel = module->kernel;\n  struct autosa_prog *prog = kernel->prog;\n  int boundary = stmt->u.f.boundary;\n\n  if (module->double_buffer && hls->target != CATAPULT_HW)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"if (arb == 0) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n  }\n\n  /* intra_trans */\n  p = print_intra_trans_module_call(p, module, prog, kernel, hls, 0);\n  /* inter_trans */\n  p = print_inter_trans_module_call(p, module, prog, kernel, hls, 0, boundary);\n\n  if (module->double_buffer && hls->target != CATAPULT_HW)\n  {\n    p = isl_printer_indent(p, -2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"} else {\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, 2);\n\n    /* intra_trans */\n    p = print_intra_trans_module_call(p, module, prog, kernel, hls, 1);\n    /* inter_trans */\n    p = print_inter_trans_module_call(p, module, prog, kernel, hls, 1, boundary);\n\n    p = isl_printer_indent(p, -2);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"}\");\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\n/* Print the state transfer for double buffers. 
*/\n__isl_give isl_printer *autosa_kernel_print_state_handle(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls)\n{\n  struct autosa_hw_module *module = stmt->u.f.module;\n  isl_space *space;\n  int n;\n\n  if (hls->target == CATAPULT_HW)\n    return p;\n\n  if (module->in)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"intra_trans_en = 1;\");\n    p = isl_printer_end_line(p);\n  }\n  else\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"inter_trans_en = 1;\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"arb = !arb;\");\n  p = isl_printer_end_line(p);\n\n  if (module->in)\n  {\n    /* intra trans */\n    space = module->intra_space;\n  }\n  else\n  {\n    /* inter trans */\n    space = module->inter_space;\n  }\n  n = isl_space_dim(space, isl_dim_set);\n  for (int i = 0; i < n; i++)\n  {\n    const char *name;\n    name = isl_space_get_dim_name(space, isl_dim_set, i);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, name);\n    p = isl_printer_print_str(p, \"_prev = \");\n    p = isl_printer_print_str(p, name);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\n/* Print the body for a module that connects to the DRAM with serialized data. 
\n */\n__isl_give isl_printer *print_module_serialize_body(\n  __isl_take isl_printer *p, struct autosa_hw_module *module, struct hls_info *hls)\n{\n  isl_pw_qpolynomial *total_bound_pwq = module->io_groups[0]->array->local_array->serialize_bound;\n  long int total_bound = -1;  \n  int ele_size = module->io_groups[0]->array->size; // bytes\n  total_bound = convert_pwqpoly_to_int(total_bound_pwq);\n  int data_pack_in = module->data_pack_serialize;\n  int data_pack_out = module->data_pack_inter;  \n  char *fifo_name;\n  isl_printer *p_str;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n  /* Extract the sparse information */\n  int is_sparse = module->io_groups[0]->local_array->is_sparse;\n  int vec_len = module->io_groups[0]->local_array->vec_len;\n  int n_nzero = module->io_groups[0]->local_array->n_nzero;\n  float compress_ratio = module->io_groups[0]->local_array->compress_ratio;\n  int n_meta_data = module->io_groups[0]->local_array->n_meta_data;\n  float eff_compress_ratio = module->io_groups[0]->local_array->eff_compress_ratio;\n\n  int axi_stream = module->options->autosa->axi_stream;\n\n  p_str = isl_printer_to_str(ctx);\n  p_str = autosa_array_ref_group_print_fifo_name(module->io_groups[0], p_str);  \n  fifo_name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  \n  if (data_pack_in == data_pack_out) {    \n    if (module->in) { \n      char *new_fifo_name;\n\n      if (hls->target == INTEL_HW)\n        p = print_str_new_line(p, \"#pragma loop_coalesce\");\n      else if (hls->target == CATAPULT_HW)\n        p = print_str_new_line(p, \"#pragma hls_pipeline_init_interval 1\");\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");      \n      if (is_sparse)\n        p = isl_printer_print_int(p, total_bound / eff_compress_ratio / data_pack_in);\n      else\n        p = isl_printer_print_int(p, total_bound / data_pack_out);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = 
isl_printer_end_line(p);\n          \n      if (hls->target == XILINX_HW || hls->target == TAPA_HW)\n        p = print_str_new_line(p, \"#pragma HLS PIPELINE II=1\");\n\n      p = isl_printer_indent(p, 2);\n      p = isl_printer_start_line(p);\n      p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_out);\n      p = isl_printer_print_str(p, \" fifo_data;\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fifo_data = \");\n      if (axi_stream) {\n        //char *fifo_name;\n        //isl_printer *p_str;\n        //p_str = isl_printer_to_str(ctx);\n        //p_str = isl_printer_print_str(p_str,\"fifo_\");\n        //p_str = isl_printer_print_str(p_str, module->io_groups[0]->array->name);\n        //fifo_name = isl_printer_get_str(p_str);\n        //isl_printer_free(p_str);\n\n        if (hls->target == XILINX_HW)\n          p = print_fifo_rw_xilinx(p, fifo_name, 1);\n        else if (hls->target == TAPA_HW)\n          p = print_fifo_rw_tapa(p, fifo_name, 1);\n        else if (hls->target == INTEL_HW)\n          p = print_fifo_rw_intel(p, fifo_name, 1);\n        else if (hls->target == CATAPULT_HW)\n          p = print_fifo_rw_catapult(p, fifo_name, 1);\n        p = isl_printer_print_str(p, \";\");\n\n        //free(fifo_name);\n      } else {\n        p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n        p = isl_printer_print_str(p, \"[i];\");\n      }\n      p = isl_printer_end_line(p);\n\n      new_fifo_name = concat(ctx, fifo_name, \"local_out\");\n      p = isl_printer_start_line(p);\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, new_fifo_name, 0);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, new_fifo_name, 0);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, new_fifo_name, 0);          \n      else if (hls->target == CATAPULT_HW)\n        p = 
print_fifo_rw_catapult(p, new_fifo_name, 0);          \n\n      p = isl_printer_print_str(p, \"fifo_data);\");      \n      p = isl_printer_end_line(p);\n      free(new_fifo_name);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");            \n    } else {\n      char *new_fifo_name;\n\n      if (hls->target == INTEL_HW)\n        p = print_str_new_line(p, \"#pragma loop_coalesce\");\n      else if (hls->target == CATAPULT_HW)\n        p = print_str_new_line(p, \"#pragma hls_pipeline_init_interval 1\");\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, total_bound / data_pack_out);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n\n      if (hls->target == XILINX_HW || hls->target == TAPA_HW)\n        p = print_str_new_line(p, \"#pragma HLS PIPELINE II=1\");\n\n      p = isl_printer_indent(p, 2);\n      p = isl_printer_start_line(p);\n      p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_in);      \n      p = isl_printer_print_str(p, \" fifo_data;\");\n      p = isl_printer_end_line(p);\n\n      new_fifo_name = concat(ctx, fifo_name, \"local_in\");\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fifo_data = \");\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, new_fifo_name, 1);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, new_fifo_name, 1);\n      else if (hls->target == INTEL_HW)\n        p = print_fifo_rw_intel(p, new_fifo_name, 1);\n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, new_fifo_name, 1);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      if (axi_stream) {\n        //char *fifo_name;\n        //isl_printer *p_str;\n        //p_str = isl_printer_to_str(ctx);\n        //p_str = 
isl_printer_print_str(p_str,\"fifo_\");\n        //p_str = isl_printer_print_str(p_str, module->io_groups[0]->array->name);\n        //fifo_name = isl_printer_get_str(p_str);\n        //isl_printer_free(p_str);\n        \n        if (hls->target == XILINX_HW)\n          p = print_fifo_rw_xilinx(p, fifo_name, 0);\n        else if (hls->target == TAPA_HW)\n          p = print_fifo_rw_tapa(p, fifo_name, 0);\n        else if (hls->target == INTEL_HW)\n          p = print_fifo_rw_intel(p, fifo_name, 0);\n        else if (hls->target == CATAPULT_HW)\n          p = print_fifo_rw_catapult(p, fifo_name, 0);\n        /* \"fifo_data);\" already terminates the statement; no extra \";\" */\n        p = isl_printer_print_str(p, \"fifo_data);\");\n\n        //free(fifo_name);        \n      } else {\n        p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n        p = isl_printer_print_str(p, \"[i] = fifo_data;\");\n      }\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n\n      free(new_fifo_name);\n    }\n  } else {    \n    if (module->in) {\n      char *new_fifo_name;\n\n      /* [type] fifo_data; */\n      p = isl_printer_start_line(p);      \n      if (is_sparse)\n        p = autosa_print_array_type_with_lane_sparse(p, module->io_groups[0]->array, data_pack_out);\n      else\n        p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_out);\n      p = isl_printer_print_str(p, \" fifo_data;\");\n      p = isl_printer_end_line(p);\n\n      /* [type2] mem_data; */\n      p = isl_printer_start_line(p);\n      p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_in);\n      p = isl_printer_print_str(p, \" mem_data;\");\n      p = isl_printer_end_line(p);\n      \n      if (hls->target == XILINX_HW) {\n        if (data_pack_out == 1 && !is_sparse) {\n          /* union {unsigned int ui; [type] ut;} u; */\n          p = isl_printer_start_line(p);        \n          p = 
isl_printer_print_str(p, \"union {unsigned int ui; \");\n          p = isl_printer_print_str(p, module->io_groups[0]->array->type);\n          p = isl_printer_print_str(p, \" ut;} u;\");        \n          p = isl_printer_end_line(p);\n        }        \n          \n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        if (is_sparse)\n          p = isl_printer_print_int(p, total_bound / eff_compress_ratio / data_pack_in);\n        else\n          p = isl_printer_print_int(p, total_bound / data_pack_in);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n            \n        p = print_str_new_line(p, \"#pragma HLS PIPELINE II=1\");            \n        p = isl_printer_indent(p, 2);\n  \n        /* mem_data = array[]; */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"mem_data = \");\n        if (axi_stream) {\n          //char *fifo_name;\n          //isl_printer *p_str;\n          //p_str = isl_printer_to_str(ctx);\n          //p_str = isl_printer_print_str(p_str,\"fifo_\");\n          //p_str = isl_printer_print_str(p_str, module->io_groups[0]->array->name);\n          //fifo_name = isl_printer_get_str(p_str);\n          //isl_printer_free(p_str);\n\n          if (hls->target == XILINX_HW)\n            p = print_fifo_rw_xilinx(p, fifo_name, 1);\n          else if (hls->target == TAPA_HW)\n            p = print_fifo_rw_tapa(p, fifo_name, 1);\n          else if (hls->target == INTEL_HW)\n            p = print_fifo_rw_intel(p, fifo_name, 1);\n          else if (hls->target == CATAPULT_HW)\n            p = print_fifo_rw_catapult(p, fifo_name, 1);\n          p = isl_printer_print_str(p, \";\");\n\n          //free(fifo_name);\n        } else {\n          p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n          p = isl_printer_print_str(p, \"[i];\");\n        }\n        p = isl_printer_end_line(p);\n  \n        p = 
isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int p = 0; p < \");\n        if (is_sparse)\n          p = isl_printer_print_int(p, data_pack_in / (n_nzero + n_meta_data) / data_pack_out);\n        else\n          p = isl_printer_print_int(p, data_pack_in / data_pack_out);\n        p = isl_printer_print_str(p, \"; p++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n  \n        if (is_sparse) {\n          /* ap_uint<...> mem_data_tmp = mem_data(...); */\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"ap_uint<\");\n          p = isl_printer_print_int(p, ele_size * (n_nzero + n_meta_data) * 8 * data_pack_out);\n          p = isl_printer_print_str(p, \"> mem_data_tmp = mem_data(\");\n          p = isl_printer_print_int(p, ele_size * (n_nzero + n_meta_data) * 8 * data_pack_out - 1);\n          p = isl_printer_print_str(p, \", 0);\");\n          p = isl_printer_end_line(p);\n\n          /* mem_data = mem_data >> ...; */\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"mem_data = mem_data >> \");\n          p = isl_printer_print_int(p, ele_size * (n_nzero + n_meta_data) * 8 * data_pack_out);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n\n          /* fifo_data.d = ... 
*/\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"fifo_data.d = (\");\n          for (int n = data_pack_out - 1; n >= 0; n--) {\n            p = isl_printer_print_str(p, \"(ap_uint<\");\n            p = isl_printer_print_int(p, ele_size * 8 * n_nzero);\n            p = isl_printer_print_str(p, \">)\");\n            p = isl_printer_print_str(p, \"mem_data_tmp(\");\n            p = isl_printer_print_int(p, n * ele_size * 8 * (n_nzero + n_meta_data) + ele_size * 8 * n_nzero - 1);\n            p = isl_printer_print_str(p, \", \");\n            p = isl_printer_print_int(p, n * ele_size * 8 * (n_nzero + n_meta_data));\n            p = isl_printer_print_str(p, \")\");\n            if (n > 0) \n              p = isl_printer_print_str(p, \", \");\n          }\n          p = isl_printer_print_str(p, \");\");\n          p = isl_printer_end_line(p);\n\n          /* fifo_data.i = ... */\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"fifo_data.i = (\");\n          for (int n = data_pack_out - 1; n >= 0; n--) {\n            p = isl_printer_print_str(p, \"(ap_uint<8>)mem_data_tmp(\");\n            p = isl_printer_print_int(p, n * ele_size * 8 * (n_nzero + n_meta_data) + ele_size * 8 * n_nzero + 8 - 1);\n            p = isl_printer_print_str(p, \", \");\n            p = isl_printer_print_int(p, n * ele_size * 8 * (n_nzero + n_meta_data) + ele_size * 8 * n_nzero);\n            p = isl_printer_print_str(p, \")\");\n            if (n > 0) \n              p = isl_printer_print_str(p, \", \");\n          }\n          p = isl_printer_print_str(p, \");\");\n          p = isl_printer_end_line(p);\n        } else {\n          /* fifo_data = mem_data(..,..); */\n          p = isl_printer_start_line(p);\n          if (data_pack_out == 1) {\n            p = isl_printer_print_str(p, \"u.ui = (unsigned int)mem_data(\");\n            p = isl_printer_print_int(p, ele_size * data_pack_out * 8 - 1);\n            p = 
isl_printer_print_str(p, \", 0);\");\n            p = isl_printer_end_line(p);\n\n            p = print_str_new_line(p, \"fifo_data = u.ut;\");\n          } else {\n            p = isl_printer_print_str(p, \"fifo_data = mem_data(\");\n            p = isl_printer_print_int(p, ele_size * data_pack_out * 8 - 1);\n            p = isl_printer_print_str(p, \", 0);\");\n            /* Only this branch leaves an open line; the branch above already ended its lines. */\n            p = isl_printer_end_line(p);\n          }\n\n          /* mem_data = mem_data >> .. */\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"mem_data = mem_data >> \");\n          p = isl_printer_print_int(p, ele_size * data_pack_out * 8);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n        }\n  \n        new_fifo_name = concat(ctx, fifo_name, \"local_out\");\n        p = isl_printer_start_line(p);        \n        p = print_fifo_rw_xilinx(p, new_fifo_name, 0);\n        p = isl_printer_print_str(p, \"fifo_data);\");\n        p = isl_printer_end_line(p);\n  \n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n  \n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n\n        free(new_fifo_name);\n      } else if (hls->target == INTEL_HW) {                  \n        p = print_str_new_line(p, \"#pragma loop_coalesce\");\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        p = isl_printer_print_int(p, total_bound / data_pack_in);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n                  \n        p = isl_printer_indent(p, 2);\n  \n        /* mem_data = array[]; */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"mem_data = __burst_coalesced_load(&\");\n        p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n        p = isl_printer_print_str(p, \"[i]);\");\n        p = isl_printer_end_line(p);\n          
\n        /* [type] mem_data_split[n] */\n        p = isl_printer_start_line(p);\n        p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_out);\n        p = isl_printer_print_str(p, \" mem_data_split[\");\n        p = isl_printer_print_int(p, data_pack_in / data_pack_out);\n        p = isl_printer_print_str(p, \"];\");\n        p = isl_printer_end_line(p);\n\n        for (int i = 0; i < data_pack_in / data_pack_out; i++) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"mem_data_split[\");\n          p = isl_printer_print_int(p, i);\n          p = isl_printer_print_str(p, \"].data = mem_data.data.s\");\n          for (int j = i * data_pack_out; j < i * data_pack_out + data_pack_out; j++) {\n            p = isl_printer_print_str(p, vector_index[j]);\n          }\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n        }\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int p = 0; p < \");\n        p = isl_printer_print_int(p, data_pack_in / data_pack_out);\n        p = isl_printer_print_str(p, \"; p++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n  \n        /* fifo_data = mem_data(..,..); */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"fifo_data = mem_data_split[p];\");                \n        p = isl_printer_end_line(p);\n          \n        new_fifo_name = concat(ctx, fifo_name, \"local_out\");\n        p = isl_printer_start_line(p);\n        p = print_fifo_rw_intel(p, new_fifo_name, 0);\n        p = isl_printer_print_str(p, \"fifo_data);\");\n        p = isl_printer_end_line(p);\n  \n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n  \n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n\n        free(new_fifo_name);        \n      } else if (hls->target == CATAPULT_HW) {\n        p 
= print_str_new_line(p, \"#pragma hls_pipeline_init_interval 1\");\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        if (is_sparse)\n          p = isl_printer_print_int(p, total_bound / eff_compress_ratio / data_pack_in);\n        else\n          p = isl_printer_print_int(p, total_bound / data_pack_in);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, 2);\n\n        /* mem_data = array[]; */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"mem_data = \");\n        p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n        p = isl_printer_print_str(p, \"[i];\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int p = 0; p < \");\n        if (is_sparse)\n          p = isl_printer_print_int(p, data_pack_in / (n_nzero + n_meta_data) / data_pack_out);\n        else\n          p = isl_printer_print_int(p, data_pack_in / data_pack_out);\n        p = isl_printer_print_str(p, \"; p++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n\n        if (is_sparse) {\n          /* ac_int<W, false> mem_data_tmp = mem_data.slc<W>(0); the slice width must equal W */\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"ac_int<\");\n          p = isl_printer_print_int(p, ele_size * (n_nzero + n_meta_data) * 8 * data_pack_out);\n          p = isl_printer_print_str(p, \", false> mem_data_tmp = mem_data.slc<\");\n          p = isl_printer_print_int(p, ele_size * (n_nzero + n_meta_data) * 8 * data_pack_out);\n          p = isl_printer_print_str(p, \">(0);\");\n          p = isl_printer_end_line(p);\n\n          /* mem_data = mem_data >> ...; */\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"mem_data = mem_data >> \");\n          p = isl_printer_print_int(p, 
ele_size * (n_nzero + n_meta_data) * 8 * data_pack_out);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n\n          /* fifo_data.d = ... */\n          for (int n = 0; n < data_pack_out; n++) {\n            p = isl_printer_start_line(p);\n\n            p = isl_printer_print_str(p, \"fifo_data.d.set_slc(\");\n            p = isl_printer_print_int(p, n * ele_size * 8 * n_nzero);\n            p = isl_printer_print_str(p, \", \");\n\n            /* Each lane holds n_nzero elements, so the slice width is ele_size * 8 * n_nzero bits. */\n            p = isl_printer_print_str(p, \"mem_data_tmp.slc<\");\n            p = isl_printer_print_int(p, ele_size * 8 * n_nzero);\n            p = isl_printer_print_str(p, \">(\");\n            p = isl_printer_print_int(p, n * ele_size * 8 * (n_nzero + n_meta_data));\n            p = isl_printer_print_str(p, \"));\");\n\n            p = isl_printer_end_line(p);\n          }          \n\n          /* fifo_data.i = ... */\n          for (int n = 0; n < data_pack_out; n++) {\n            p = isl_printer_start_line(p);\n            \n            p = isl_printer_print_str(p, \"fifo_data.i.set_slc(\");\n            p = isl_printer_print_int(p, 8 * n);\n            p = isl_printer_print_str(p, \", \");\n\n            p = isl_printer_print_str(p, \"mem_data_tmp.slc<8>(\");\n            p = isl_printer_print_int(p, n * ele_size * 8 * (n_nzero + n_meta_data) + ele_size * 8 * n_nzero);\n            p = isl_printer_print_str(p, \"));\");\n\n            p = isl_printer_end_line(p);\n          }          \n        } else {\n          /* fifo_data = mem_data(..,..); */\n          //p = isl_printer_start_line(p);\n          //if (data_pack_out == 1) {\n          //  p = isl_printer_print_str(p, \"u.ui = (unsigned int)mem_data(\");\n          //  p = isl_printer_print_int(p, ele_size * data_pack_out * 8 - 1);\n          //  p = isl_printer_print_str(p, \", 0);\");\n          //  p = isl_printer_end_line(p);\n\n          //  p = print_str_new_line(p, \"fifo_data = u.ut;\");\n          //} else {\n          //  
p = isl_printer_print_str(p, \"fifo_data = mem_data(\");\n          //  p = isl_printer_print_int(p, ele_size * data_pack_out * 8 - 1);\n          //  p = isl_printer_print_str(p, \", 0);\");\n          //}\n          //p = isl_printer_end_line(p);\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"fifo_data = mem_data.slc<\");\n          p = isl_printer_print_int(p, ele_size * data_pack_out * 8);\n          p = isl_printer_print_str(p, \">(0);\");\n          p = isl_printer_end_line(p);\n\n          /* mem_data = mem_data >> .. */\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"mem_data = mem_data >> \");\n          p = isl_printer_print_int(p, ele_size * data_pack_out * 8);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n        }\n  \n        new_fifo_name = concat(ctx, fifo_name, \"local_out\");\n        p = isl_printer_start_line(p);        \n        p = print_fifo_rw_catapult(p, new_fifo_name, 0);\n        p = isl_printer_print_str(p, \"fifo_data);\");\n        p = isl_printer_end_line(p);\n  \n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n  \n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n\n        free(new_fifo_name);\n      } else if (hls->target == TAPA_HW) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        if (is_sparse)\n          p = isl_printer_print_int(p, total_bound / eff_compress_ratio / data_pack_in);\n        else\n          p = isl_printer_print_int(p, total_bound / data_pack_in);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n\n        p = print_str_new_line(p, \"#pragma HLS PIPELINE II=1\");\n        p = isl_printer_indent(p, 2);\n\n        /* mem_data = array[]; */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"mem_data = 
\");\n        if (axi_stream) {\n          p = print_fifo_rw_tapa(p, fifo_name, 1);\n          p = isl_printer_print_str(p, \";\");\n        } else {\n          p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n          p = isl_printer_print_str(p, \"[i];\");\n        }\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int p = 0; p < \");\n        if (is_sparse)\n          p = isl_printer_print_int(p, data_pack_in / (n_nzero + n_meta_data) / data_pack_out);\n        else\n          p = isl_printer_print_int(p, data_pack_in / data_pack_out);\n        p = isl_printer_print_str(p, \"; p++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n\n        if (is_sparse) {\n          /* tapa::vec_t<T, size> mem_data_tmp = tapa::truncated<begin, end>(mem_data); */\n          p = isl_printer_start_line(p);\n          p = autosa_print_array_type_with_lane_sparse(p, module->io_groups[0]->array, data_pack_out);\n          p = isl_printer_print_str(p, \" mem_data_tmp = \");\n          if (data_pack_out == 1) {\n            p = isl_printer_print_str(p, \"mem_data[p];\");\n          } else {\n            p = isl_printer_print_str(p, \"tapa::truncated<\");\n            p = isl_printer_print_int(p, data_pack_out);\n            p = isl_printer_print_str(p, \">(mem_data, \");\n            p = isl_printer_print_int(p, data_pack_out);\n            p = isl_printer_print_str(p, \" * p);\");\n          }\n          /* Terminate the mem_data_tmp declaration before starting the next statement. */\n          p = isl_printer_end_line(p);\n\n          /* fifo_data.d = ... 
*/\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"fifo_data.d = \");\n          for (int n = 1; n < data_pack_out; n++)\n            p = isl_printer_print_str(p, \"tapa::cat(\");\n          for (int n = 0; n < data_pack_out; n++) {\n            if (n > 1) p = isl_printer_print_str(p, \")\");\n            if (n > 0) p = isl_printer_print_str(p, \", \");\n            p = isl_printer_print_str(p, \"mem_data_tmp[\");\n            p = isl_printer_print_int(p, n);\n            p = isl_printer_print_str(p, \"].d\");\n          }\n          p = isl_printer_print_str(p, data_pack_out > 1 ? \");\" : \";\");\n          p = isl_printer_end_line(p);\n\n          /* fifo_data.i = ... */\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"fifo_data.i = \");\n          for (int n = 1; n < data_pack_out; n++)\n            p = isl_printer_print_str(p, \"tapa::cat(\");\n          for (int n = 0; n < data_pack_out; n++) {\n            if (n > 1) p = isl_printer_print_str(p, \")\");\n            if (n > 0) p = isl_printer_print_str(p, \", \");\n            p = isl_printer_print_str(p, \"mem_data_tmp[\");\n            p = isl_printer_print_int(p, n);\n            p = isl_printer_print_str(p, \"].i\");\n          }\n          p = isl_printer_print_str(p, data_pack_out > 1 ? \");\" : \";\");\n          p = isl_printer_end_line(p);\n        } else {\n          /* fifo_data = tapa::truncated<data_pack_out>(mem_data, data_pack_out * p); */\n          p = isl_printer_start_line(p);\n          if (data_pack_out == 1) {\n            p = isl_printer_print_str(p, \"fifo_data = mem_data[p];\");\n          } else {\n            p = isl_printer_print_str(p, \"fifo_data = tapa::truncated<\");\n            p = isl_printer_print_int(p, data_pack_out);\n            p = isl_printer_print_str(p, \">(mem_data, \");\n            p = isl_printer_print_int(p, data_pack_out);\n            p = isl_printer_print_str(p, \" * p);\");\n          }\n          p = isl_printer_end_line(p);\n        }\n\n        new_fifo_name = 
concat(ctx, fifo_name, \"local_out\");\n        p = isl_printer_start_line(p);\n        p = print_fifo_rw_xilinx(p, new_fifo_name, 0);\n        p = isl_printer_print_str(p, \"fifo_data);\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n\n        free(new_fifo_name);\n      }\n    } else {\n      char *new_fifo_name;\n      if (hls->target == INTEL_HW)\n        p = print_str_new_line(p, \"#pragma loop_coalesce\");\n      else if (hls->target == CATAPULT_HW)\n        p = print_str_new_line(p, \"#pragma hls_pipeline_init_interval 1\");\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, total_bound / data_pack_in);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n          \n      if (hls->target == XILINX_HW || hls->target == TAPA_HW)\n        p = print_str_new_line(p, \"#pragma HLS PIPELINE II=1\");\n      p = isl_printer_indent(p, 2);\n\n      /* [type] fifo_data; */\n      p = isl_printer_start_line(p);\n      //if (data_pack_out == 1) {\n      //  p = isl_printer_print_str(p, module->io_groups[0]->array->type);\n      //} else {\n      p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_out);\n      //}\n      p = isl_printer_print_str(p, \" fifo_data;\");\n      p = isl_printer_end_line(p);      \n\n      /* [type2] mem_data; */\n      p = isl_printer_start_line(p);\n      p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_in);      \n      p = isl_printer_print_str(p, \" mem_data;\");\n      p = isl_printer_end_line(p);            \n      \n      p = isl_printer_start_line(p);      \n      if (data_pack_out == 1) {\n        if (hls->target == XILINX_HW) {\n          p = isl_printer_print_str(p, \"ap_uint<\");\n          
p = isl_printer_print_int(p, module->io_groups[0]->array->size * 8);\n          p = isl_printer_print_str(p, \">\");\n        } else if (hls->target == INTEL_HW) {\n          p = isl_printer_print_str(p, module->io_groups[0]->array->type);\n        } else if (hls->target == TAPA_HW) {\n          p = isl_printer_print_str(p, module->io_groups[0]->array->type);\n        } else if (hls->target == CATAPULT_HW) {\n          p = isl_printer_print_str(p, \"ac_int<\");\n          p = isl_printer_print_int(p, module->io_groups[0]->array->size * 8);\n          p = isl_printer_print_str(p, \", false>\");\n        }\n      } else {\n        p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_out);\n      }\n      p = isl_printer_print_str(p, \" mem_data_split[\");\n      p = isl_printer_print_int(p, data_pack_in / data_pack_out);\n      p = isl_printer_print_str(p, \"];\");\n      p = isl_printer_end_line(p);\n\n      if (hls->target == XILINX_HW || hls->target == TAPA_HW)\n        p = print_str_new_line(p, \"#pragma HLS ARRAY_PARTITION variable=mem_data_split complete\");\n      \n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int p = 0; p < \");\n      p = isl_printer_print_int(p, data_pack_in / data_pack_out);\n      p = isl_printer_print_str(p, \"; p++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      new_fifo_name = concat(ctx, fifo_name, \"local_in\");\n      p = isl_printer_print_str(p, \"fifo_data = \");\n      if (hls->target == XILINX_HW)\n        p = print_fifo_rw_xilinx(p, new_fifo_name, 1);\n      else if (hls->target == TAPA_HW)\n        p = print_fifo_rw_tapa(p, new_fifo_name, 1);\n      else if (hls->target == INTEL_HW) \n        p = print_fifo_rw_intel(p, new_fifo_name, 1);\n      else if (hls->target == CATAPULT_HW)\n        p = print_fifo_rw_catapult(p, new_fifo_name, 1);\n      p = isl_printer_print_str(p, \";\");\n      
p = isl_printer_end_line(p);\n\n      if (hls->target == XILINX_HW) {\n        if (data_pack_out == 1) {\n          /* union {unsigned int ui; [type] ut;} u; */\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"union {unsigned int ui; \");\n          p = isl_printer_print_str(p, module->io_groups[0]->array->type);\n          p = isl_printer_print_str(p, \" ut;} u;\");        \n          p = isl_printer_end_line(p);\n\n          p = print_str_new_line(p, \"u.ut = fifo_data;\");\n\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"mem_data_split[p] = ap_uint<\");\n          p = isl_printer_print_int(p, module->io_groups[0]->array->size * 8);\n          p = isl_printer_print_str(p, \">(u.ui);\");\n          p = isl_printer_end_line(p);\n        } else {\n          p = print_str_new_line(p, \"mem_data_split[p] = fifo_data;\");\n        }\n      } else if (hls->target == INTEL_HW) {\n        p = print_str_new_line(p, \"mem_data_split[p] = fifo_data;\");\n      } else if (hls->target == TAPA_HW) {\n        p = print_str_new_line(p, \"mem_data_split[p] = fifo_data;\");\n      } else if (hls->target == CATAPULT_HW) {\n        p = print_str_new_line(p, \"mem_data_split[p] = fifo_data;\");\n      }\n      \n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n\n      if (hls->target == XILINX_HW) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"mem_data = (\");\n        for (int i = data_pack_in / data_pack_out - 1; i >= 0; i--) {\n          if (i < data_pack_in / data_pack_out - 1)\n            p = isl_printer_print_str(p, \", \");\n          p = isl_printer_print_str(p, \"mem_data_split[\");\n          p = isl_printer_print_int(p, i);\n          p = isl_printer_print_str(p, \"]\");\n        }\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      } else if (hls->target == INTEL_HW) {\n        int first = 1;\n     
   p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"mem_data.data = \");\n        p = isl_printer_print_str(p, \"(\");\n        p = isl_printer_print_str(p, module->io_groups[0]->array->type);\n        p = isl_printer_print_int(p, data_pack_in);\n        p = isl_printer_print_str(p, \")(\");\n\n        for (int i = 0; i < data_pack_in / data_pack_out; i++) {\n          if (!first)\n            p = isl_printer_print_str(p, \", \");\n          if (data_pack_out > 1) {\n            p = isl_printer_print_str(p, \"(\");\n            p = isl_printer_print_str(p, module->io_groups[0]->array->type);\n            p = isl_printer_print_int(p, data_pack_out);\n            p = isl_printer_print_str(p, \")\");\n          }\n          p = isl_printer_print_str(p, \"mem_data_split[\");\n          p = isl_printer_print_int(p, i);\n          p = isl_printer_print_str(p, \"]\");\n          if (data_pack_out > 1)  {\n            p = isl_printer_print_str(p, \".data\");\n          }\n          first = 0;\n        }\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      } else if (hls->target == TAPA_HW) {\n        for (int n = 0; n < data_pack_in / data_pack_out; n++) {\n          if (data_pack_out == 1) {\n            p = isl_printer_start_line(p);\n            p = isl_printer_print_str(p, \"mem_data.set(\");\n            p = isl_printer_print_int(p, n);\n            p = isl_printer_print_str(p, \", mem_data_split[\");\n            p = isl_printer_print_int(p, n);\n            p = isl_printer_print_str(p, \"]);\");\n            p = isl_printer_end_line(p);\n          } else {\n            for (int j = 0; j < data_pack_out; j++) {\n              p = isl_printer_start_line(p);\n              p = isl_printer_print_str(p, \"mem_data.set(\");\n              p = isl_printer_print_int(p, n * data_pack_out + j);\n              p = isl_printer_print_str(p, \", mem_data_split[\");\n              p = isl_printer_print_int(p, n);\n       
       p = isl_printer_print_str(p, \"][\");\n              p = isl_printer_print_int(p, j);\n              p = isl_printer_print_str(p, \"]);\");\n              p = isl_printer_end_line(p);\n            }\n          }\n        }\n      } else if (hls->target == CATAPULT_HW) {\n        for (int i = 0; i < data_pack_in / data_pack_out; i++) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"mem_data.set_slc(\");\n          p = isl_printer_print_int(p, i * data_pack_out * module->io_groups[0]->array->size * 8);\n          p = isl_printer_print_str(p, \", mem_data_split[\");\n          p = isl_printer_print_int(p, i);\n          p = isl_printer_print_str(p, \"]);\");\n          p = isl_printer_end_line(p);\n        }\n      }\n\n      if (hls->target == XILINX_HW ||\n          hls->target == TAPA_HW ||\n          hls->target == CATAPULT_HW) {\n        p = isl_printer_start_line(p);\n        if (axi_stream) {\n          //char *fifo_name;\n          //isl_printer *p_str;\n          //p_str = isl_printer_to_str(ctx);\n          //p_str = isl_printer_print_str(p_str,\"fifo_\");\n          //p_str = isl_printer_print_str(p_str, module->io_groups[0]->array->name);\n          //fifo_name = isl_printer_get_str(p_str);\n          //isl_printer_free(p_str);\n\n          if (hls->target == XILINX_HW)\n            p = print_fifo_rw_xilinx(p, fifo_name, 0);\n          else if (hls->target == TAPA_HW)\n            p = print_fifo_rw_tapa(p, fifo_name, 0);\n          else if (hls->target == INTEL_HW)\n            p = print_fifo_rw_intel(p, fifo_name, 0);\n          else if (hls->target == CATAPULT_HW)\n            p = print_fifo_rw_catapult(p, fifo_name, 0);\n          p = isl_printer_print_str(p, \"mem_data);\");\n          p = isl_printer_print_str(p, \";\");\n\n          //free(fifo_name);  \n        } else {\n          p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n          p = isl_printer_print_str(p, \"[i] = 
mem_data;\");\n        }\n        p = isl_printer_end_line(p);\n      } else {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"__burst_coalesced_store(&\");\n        p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n        p = isl_printer_print_str(p, \"[i], mem_data);\");\n        p = isl_printer_end_line(p);\n      }\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n\n      free(new_fifo_name);\n    }\n  }\n\n  free(fifo_name);\n  return p;\n}\n\n/* Print the macros for the sparse data structure. \n */\nisl_stat print_sparse_macros(struct autosa_kernel *kernel, struct hls_info *hls)\n{\n  isl_printer *p;\n\n  p = isl_printer_to_file(kernel->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_str_new_line(p, \"/* Sparse Macros */\");\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"#define VEC_LEN \");\n  p = isl_printer_print_int(p, kernel->vec_len);\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"#define NON_ZERO_NUM \");\n  p = isl_printer_print_int(p, kernel->n_nzero);\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\");\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"#define META_DATA_NUM \");\n  p = isl_printer_print_int(p, kernel->n_meta_data);\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\");\n\n  p = print_str_new_line(p, \"/* Sparse Macros */\");\n  p = isl_printer_end_line(p);  \n\n  isl_printer_free(p);\n\n  if (hls->hls == 0) {\n    p = isl_printer_to_file(kernel->ctx, hls->host_h);\n    p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n    p = print_str_new_line(p, \"/* Sparse Macros */\");\n  \n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#define VEC_LEN \");\n    p 
= isl_printer_print_int(p, kernel->vec_len);\n    p = isl_printer_end_line(p);\n  \n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#define NON_ZERO_NUM \");\n    p = isl_printer_print_int(p, kernel->n_nzero);\n    p = isl_printer_end_line(p);\n  \n    p = print_str_new_line(p, \"#define COMPRESS_RATIO (VEC_LEN/NON_ZERO_NUM)\");\n  \n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#define META_DATA_NUM \");\n    p = isl_printer_print_int(p, kernel->n_meta_data);\n    p = isl_printer_end_line(p);\n  \n    p = print_str_new_line(p, \"#define EFF_COMPRESS_RATIO (VEC_LEN/(NON_ZERO_NUM+META_DATA_NUM))\");\n  \n    p = print_str_new_line(p, \"/* Sparse Macros */\");\n    p = isl_printer_end_line(p);  \n  \n    isl_printer_free(p);    \n  }\n\n  return isl_stat_ok;\n}\n\n/* Print the arguments to a drain merge function declaration or call.\n * If \"types\" is set, then print a declaration (including the types of the arguments).\n * \n * The arguments are printed in the following order:\n * - the module identifiers\n * - the parameters\n * - the host loop iterators\n * - the arrays accessed by the module\n */\n__isl_give isl_printer *print_drain_merge_arguments(\n    __isl_take isl_printer *p,\n    struct autosa_kernel *kernel,\n    struct autosa_array_ref_group *group,\n    struct autosa_drain_merge_func *func,\n    int types,\n    int hls)\n{\n  int first = 1;\n  int nparam;\n  int n;\n  isl_space *space;\n  const char *type;\n  struct autosa_local_array_info *local_array;\n\n  type = isl_options_get_ast_iterator_type(kernel->ctx);\n  /* module identifiers */\n  const char *dims[] = {\"idx\", \"idy\", \"idz\"};\n  n = isl_id_list_n_id(func->inst_ids);\n  for (int i = 0; i < n; ++i)\n  {\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    if (types)\n    {\n      p = isl_printer_print_str(p, type);\n      p = isl_printer_print_str(p, \" \");\n    }\n    p = isl_printer_print_str(p, dims[i]);\n\n    first = 0;\n  
}\n\n  /* params */\n  space = isl_union_set_get_space(kernel->arrays);\n  nparam = isl_space_dim(space, isl_dim_param);\n  for (int i = 0; i < nparam; ++i)\n  {\n    const char *name;\n\n    name = isl_space_get_dim_name(space, isl_dim_param, i);\n\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    if (types)\n      p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, name);\n\n    first = 0;\n  }\n  isl_space_free(space);\n\n  /* Host iters */\n  n = isl_space_dim(kernel->space, isl_dim_set);\n  for (int i = 0; i < n; ++i)\n  {\n    const char *name;\n\n    if (!first)\n      p = isl_printer_print_str(p, \", \");\n    name = isl_space_get_dim_name(kernel->space, isl_dim_set, i);\n    if (types)\n    {\n      p = isl_printer_print_str(p, type);\n      p = isl_printer_print_str(p, \" \");\n    }\n    p = isl_printer_print_str(p, name);\n\n    first = 0;\n  }\n\n  /* Arrays */\n  local_array = group->local_array;\n  if (!first)\n    p = isl_printer_print_str(p, \", \");\n  if (types)\n  {\n    if (hls)\n    {\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *\");\n    }\n    else\n    {\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">> &\");\n    }\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_to\");\n  }\n  else\n  {\n    p = isl_printer_print_str(p, \"dev_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"[0]\");\n  }\n  first = 0;\n\n  if (!first)\n    p = isl_printer_print_str(p, \", \");\n  if (types)\n  {\n    if (hls)\n    {\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *\");\n    }\n 
   else\n    {\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">> &\");\n    }\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_from\");\n  }\n  else\n  {\n    p = isl_printer_print_str(p, \"dev_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"[idx]\");\n  }\n  first = 0;\n\n  return p;\n}\n\nstruct print_hw_module_data\n{\n  struct hls_info *hls;\n  struct autosa_prog *prog;\n  struct autosa_hw_module *module;\n  /* Used for double buffer codegen. Modify the printed iterator prefix. */\n  const char *iterator_prefix;\n};\n\n/* Print the drained data merge functions. \n */\nisl_stat print_drain_merge_funcs(\n    struct autosa_kernel *kernel,\n    struct autosa_drain_merge_func **funcs, int n_funcs,\n    struct hls_info *hls)\n{\n  isl_printer *p;\n  isl_ctx *ctx;\n\n  if (n_funcs == 0)\n    return isl_stat_ok;\n\n  ctx = kernel->ctx;\n  if (!hls->hls)\n    p = isl_printer_to_file(kernel->ctx, hls->host_h);\n  else\n    p = isl_printer_to_file(kernel->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  for (int i = 0; i < n_funcs; i++)\n  {\n    struct autosa_array_ref_group *group = funcs[i]->group;\n    isl_ast_print_options *print_options;\n    struct print_hw_module_data hw_data = {hls, NULL, NULL, NULL};\n\n    p = print_str_new_line(p, \"/* Helper Function */\");\n    p = isl_printer_start_line(p);\n    if (hls->hls)\n      p = isl_printer_print_str(p, \"inline \");\n    p = isl_printer_print_str(p, \"void \");\n    p = autosa_array_ref_group_print_prefix(group, p);\n    p = isl_printer_print_str(p, \"_drain_merge(\");\n    p = print_drain_merge_arguments(p, kernel, group, funcs[i], 1, hls->hls);\n    p 
= isl_printer_print_str(p, \"){\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    p = print_str_new_line(p, \"/* Variable Declaration */\");\n    if (!hls->hls)\n      p = print_func_iterators(p, hls->host_h, funcs[i]);\n    else\n      p = print_func_iterators(p, hls->kernel_h, funcs[i]);\n    p = print_str_new_line(p, \"/* Variable Declaration */\");\n    p = isl_printer_end_line(p);\n\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_module_stmt, &hw_data);\n    p = isl_ast_node_print(funcs[i]->device_tree, p, print_options);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n    p = print_str_new_line(p, \"/* Helper Function */\");\n    p = isl_printer_end_line(p);\n  }  \n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\n__isl_give isl_printer *print_module_stmt(__isl_take isl_printer *p,\n                                          __isl_take isl_ast_print_options *print_options,\n                                          __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *hw_data = (struct print_hw_module_data *)(user);\n  struct autosa_hw_module *module = hw_data->module;\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n    case AUTOSA_KERNEL_STMT_DOMAIN:\n      return autosa_kernel_print_domain(p, stmt);\n    case AUTOSA_KERNEL_STMT_IO:\n      return autosa_kernel_print_io(p, stmt, hw_data->hls);\n    case AUTOSA_KERNEL_STMT_IO_TRANSFER:\n    case AUTOSA_KERNEL_STMT_IO_DRAM:\n      return autosa_kernel_print_io_transfer_wrapper(\n        p, stmt, hw_data->hls, module->options->autosa->double_buffer_style == 0? 
hw_data->iterator_prefix : NULL);\n    //case AUTOSA_KERNEL_STMT_IO_TRANSFER:\n    //  return autosa_kernel_print_io_transfer(p, stmt, hw_data->hls, \n    //            module->options->autosa->double_buffer_style == 0?\n    //              hw_data->iterator_prefix : NULL);\n    //case AUTOSA_KERNEL_STMT_IO_DRAM:\n    //  return autosa_kernel_print_io_dram(p, stmt, hw_data->hls,\n    //      module->options->autosa->double_buffer_style == 0? hw_data->iterator_prefix : NULL);\n    case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_TRANS:\n      return autosa_kernel_print_inter_trans(p, stmt, hw_data->hls);\n    case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_TRANS:\n      return autosa_kernel_print_intra_trans(p, stmt, hw_data->hls);\n    case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_INTRA:\n      return autosa_kernel_print_inter_intra(p, stmt, hw_data->hls);\n    case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_INTER:\n      return autosa_kernel_print_intra_inter(p, stmt, hw_data->hls);\n    case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_STATE_HANDLE:\n      return autosa_kernel_print_state_handle(p, stmt, hw_data->hls);\n    case AUTOSA_KERNEL_STMT_DRAIN_MERGE:\n      return autosa_kernel_print_drain_merge(p, stmt, hw_data->hls);\n    case AUTOSA_KERNEL_STMT_HOST_SERIALIZE:\n      return autosa_kernel_print_host_serialize(p, stmt, hw_data->hls);\n  }\n\n  return p;\n}\n\n/* Print the host serialization functions.\n */\nisl_stat print_host_serialize_funcs(\n    struct autosa_kernel *kernel,\n    struct autosa_hw_module **modules,\n    int n_modules, struct hls_info *hls)\n{\n  isl_printer *p;\n  isl_ctx *ctx;\n\n  ctx = kernel->ctx;\n  if (!hls->hls)\n    p = isl_printer_to_file(ctx, hls->host_h);\n  else\n    p = isl_printer_to_file(ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  for (int i = 0; i < n_modules; i++) {\n    struct autosa_hw_module *module = modules[i];\n    isl_ast_print_options *print_options;\n    struct print_hw_module_data hw_data = 
{hls, NULL, NULL, NULL};\n\n    if (module->serialize_tree) {\n      p = print_str_new_line(p, \"/* Helper Function */\");\n      p = isl_printer_start_line(p);\n      if (hls->hls)\n        p = isl_printer_print_str(p, \"inline \");\n      p = isl_printer_print_str(p, \"void \");\n      if (module->in) {\n        p = isl_printer_print_str(p, \"host_serialize_\");\n      } else {\n        p = isl_printer_print_str(p, \"host_deserialize_\");\n      }      \n      p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n      p = isl_printer_print_str(p, \"(\");      \n      p = print_host_serialize_arguments(p, kernel, module->io_groups[0], module, 1, hls->hls);\n      p = isl_printer_print_str(p, \"){\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = print_str_new_line(p, \"/* Variable Declaration */\");\n      p = print_str_new_line(p, \"unsigned int cnt = 0;\");      \n      p = print_str_new_line(p, \"/* Variable Declaration */\");\n      p = isl_printer_end_line(p);\n\n      print_options = isl_ast_print_options_alloc(ctx);\n      print_options = isl_ast_print_options_set_print_user(print_options,\n                                                           &print_module_stmt, &hw_data);\n            \n      p = isl_ast_node_print(module->serialize_tree, p, print_options);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n      p = print_str_new_line(p, \"/* Helper Function */\");\n      p = isl_printer_end_line(p);\n    }    \n  }\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\n/* Print a user statement in the generated AST.\n * The ppcg_stmt has been attached to the node in at_each_domain.\n */\n__isl_give isl_printer *print_cpu_user(__isl_take isl_printer *p,\n\t__isl_take isl_ast_print_options *print_options,\n\t__isl_keep isl_ast_node *node, void *user)\n{\n\tstruct autosa_kernel_stmt *stmt;\n\tisl_id *id;\n\n\tid = isl_ast_node_get_annotation(node);\n\tstmt = (struct 
autosa_kernel_stmt *)isl_id_get_user(id);\n\tisl_id_free(id);\n\n\tp = pet_stmt_print_body(stmt->u.d.stmt->stmt, p, stmt->u.d.ref2expr);\n\n\tisl_ast_print_options_free(print_options);\n\n\treturn p;\n}\n"
  },
  {
    "path": "src/autosa_print.h",
    "content": "#ifndef _AUTOSA_PRINT_H\n#define _AUTOSA_PRINT_H\n\n#include <isl/printer.h>\n\n#include \"autosa_common.h\"\n\n/* Arrays */\n__isl_give isl_printer *autosa_array_info_print_call_argument(\n    __isl_take isl_printer *p, struct autosa_array_info *array, int n_ref, const char *prefix);\n__isl_give isl_printer *autosa_array_ref_group_print_prefix(\n    struct autosa_array_ref_group *group, __isl_take isl_printer *p);\n__isl_give isl_printer *autosa_array_ref_group_print_fifo_name(\n    struct autosa_array_ref_group *group, __isl_take isl_printer *p);\n__isl_give isl_printer *autosa_print_types(__isl_take isl_printer *p,\n                                           struct autosa_types *types, struct autosa_prog *prog);\n__isl_give isl_printer *autosa_print_local_declarations(\n    __isl_take isl_printer *p, struct autosa_prog *prog);\n__isl_give isl_printer *autosa_array_info_print_data_size(\n    __isl_take isl_printer *p, struct autosa_array_info *array);\n__isl_give isl_printer *autosa_array_info_print_size(\n    __isl_take isl_printer *p, struct autosa_array_info *array);\n__isl_give isl_printer *autosa_array_info_print_serialize_data_size(\n    __isl_take isl_printer *p, struct autosa_array_info *array);    \n__isl_give isl_printer *autosa_array_info_print_serialize_size(\n    __isl_take isl_printer *p, struct autosa_array_info *array);    \n__isl_give isl_printer *autosa_print_array_type(__isl_take isl_printer *p,\n                                                struct autosa_array_info *array);\n__isl_give isl_printer *autosa_print_array_type_with_lane(\n    __isl_take isl_printer *p,\n    struct autosa_array_info *array, int n_lane);\n__isl_give isl_printer *autosa_print_array_type_with_lane_sparse(\n    __isl_take isl_printer *p,\n    struct autosa_array_info *array, int n_lane);\n__isl_give isl_printer *autosa_array_info_print_declaration_argument(\n    __isl_take isl_printer *p, struct autosa_array_info *array, int n_lane,\n    const char 
*memory_space, int n_ref);\n__isl_give isl_printer *autosa_module_array_info_print_call_argument(\n    __isl_take isl_printer *p, struct polysa_array_info *array);\n__isl_give isl_printer *autosa_print_var_initialization(\n    __isl_take isl_printer *p, struct autosa_kernel_var *var, enum platform target);\n\n/* Utils */\n__isl_give isl_printer *print_str_new_line(__isl_take isl_printer *p, const char *str);\n__isl_give isl_printer *autosa_print_macros(__isl_take isl_printer *p,\n                                            __isl_keep isl_ast_node *node);\n\n/* Kernel */\n__isl_give isl_printer *print_kernel_arguments(__isl_take isl_printer *p,\n                                               struct autosa_prog *prog, struct autosa_kernel *kernel,\n                                               int types, struct hls_info *hls);\n__isl_give isl_printer *print_kernel_header(\n    __isl_take isl_printer *p, struct autosa_prog *prog, \n    struct autosa_kernel *kernel, struct hls_info *hls, int types);\n\n/* HW modules */\n__isl_give isl_printer *print_module_iterators(\n    __isl_take isl_printer *p, FILE *out, struct autosa_hw_module *module);\n__isl_give isl_printer *print_module_arguments(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog,\n    struct autosa_kernel *kernel,\n    struct autosa_hw_module *module, int types,\n    enum platform target,\n    int inter, int arb, int boundary, int serialize);\n__isl_give isl_printer *print_pe_dummy_module_arguments(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog,\n    struct autosa_kernel *kernel,\n    struct autosa_pe_dummy_module *pe_dummy_module,\n    int types,\n    enum platform target);\nvoid print_top_gen_headers(\n    struct autosa_prog *prog, struct autosa_hw_top_module *top, struct hls_info *hls);\n__isl_give isl_printer *print_top_gen_arguments(__isl_take isl_printer *p,\n                                                struct autosa_prog *prog, struct autosa_kernel *kernel, int 
types);\n__isl_give isl_printer *autosa_kernel_print_module_call(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct autosa_prog *prog,\n    enum platform target);\n__isl_give isl_printer *autosa_kernel_print_module_call_inst(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct autosa_prog *prog,\n    enum platform target);    \n__isl_give isl_printer *print_func_iterators(\n    __isl_take isl_printer *p,\n    FILE *out,\n    struct autosa_drain_merge_func *func);\n__isl_give isl_printer *print_serialize_counter(\n    __isl_take isl_printer *p, \n    struct autosa_hw_module *module);\n__isl_give isl_printer *print_host_serialize_arguments(\n    __isl_take isl_printer *p,\n    struct autosa_kernel *kernel,\n    struct autosa_array_ref_group *group,\n    struct autosa_hw_module *module,\n    int types,\n    int hls);    \n\n/* FIFOs */\n__isl_give isl_printer *autosa_fifo_print_declaration_arguments(\n    __isl_take isl_printer *p, struct autosa_array_ref_group *group, int n_lane,\n    const char *suffix, enum platform target, int fifo_depth, const char *direction);\n__isl_give isl_printer *autosa_fifo_print_call_argument(\n    __isl_take isl_printer *p, struct autosa_array_ref_group *group,\n    const char *suffix, enum platform target);\n__isl_give isl_printer *autosa_kernel_print_fifo_decl(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct autosa_prog *prog, struct hls_info *hls);\n\n/* Statements */\n__isl_give isl_printer *autosa_kernel_print_domain(__isl_take isl_printer *p,\n                                                   struct autosa_kernel_stmt *stmt);\n__isl_give isl_printer *autosa_kernel_print_io(__isl_take isl_printer *p,\n                                               struct autosa_kernel_stmt *stmt, struct hls_info *hls);\n__isl_give isl_printer *autosa_kernel_print_io_transfer(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls, 
const char *iterator_prefix);\n__isl_give isl_printer *autosa_kernel_print_io_dram(__isl_take isl_printer *p,\n                                                    struct autosa_kernel_stmt *stmt, struct hls_info *hls);\n__isl_give isl_printer *autosa_kernel_print_inter_trans(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls);\n__isl_give isl_printer *autosa_kernel_print_intra_trans(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls);\n__isl_give isl_printer *autosa_kernel_print_intra_inter(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls);\n__isl_give isl_printer *autosa_kernel_print_inter_intra(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls);\n__isl_give isl_printer *autosa_kernel_print_state_handle(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls);\n__isl_give isl_printer *autosa_kernel_print_drain_merge(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt, struct hls_info *hls);\n__isl_give isl_printer *autosa_kernel_print_host_serialize(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_stmt *stmt,\n    struct hls_info *hls);    \n__isl_give isl_printer *print_module_serialize_body(\n    __isl_take isl_printer *p, struct autosa_hw_module *module, struct hls_info *hls);    \n__isl_give isl_printer *print_module_stmt(__isl_take isl_printer *p,\n                                          __isl_take isl_ast_print_options *print_options,\n                                          __isl_keep isl_ast_node *node, void *user);\n__isl_give isl_printer *print_cpu_user(\n    __isl_take isl_printer *p,\n\t__isl_take isl_ast_print_options *print_options,\n\t__isl_keep isl_ast_node *node, void *user);\n\n/* Xilinx-specific */\n__isl_give isl_printer *print_fifo_type_xilinx(__isl_take isl_printer *p,\n                                           
    struct autosa_array_ref_group *group, int n_lane);\n__isl_give isl_printer *print_fifo_rw_xilinx(__isl_take isl_printer *p,\n                                             const char *fifo_name, int read);\n\n/* Intel-specific */\n__isl_give isl_printer *print_fifo_type_intel(__isl_take isl_printer *p,\n                                              struct autosa_array_ref_group *group, int n_lane);\n__isl_give isl_printer *print_fifo_rw_intel(__isl_take isl_printer *p,\n                                            const char *fifo_name, int read);\n\n/* Catapult-specific */\n__isl_give isl_printer *print_fifo_type_catapult(__isl_take isl_printer *p,\n                                                 struct autosa_array_ref_group *group, int n_lane);\n__isl_give isl_printer *print_fifo_rw_catapult(__isl_take isl_printer *p,\n                                               const char *fifo_name, int read);                                                 \n\n/* TAPA-specific */\n__isl_give isl_printer *print_fifo_type_tapa(__isl_take isl_printer *p,\n                                             struct autosa_array_ref_group *group,\n                                             int n_lane, int fifo_depth, const char *suffix);\n__isl_give isl_printer *print_fifo_rw_tapa(__isl_take isl_printer *p,\n                                           const char *fifo_name, int read);\n\n/* Sparse */\nisl_stat print_sparse_macros(struct autosa_kernel *kernel, struct hls_info *hls);\n\n/* Host functions */\n__isl_give isl_printer *print_drain_merge_arguments(\n    __isl_take isl_printer *p,\n    struct autosa_kernel *kernel,\n    struct autosa_array_ref_group *group,\n    struct autosa_drain_merge_func *func,\n    int types,\n    int hls);\nisl_stat print_drain_merge_funcs(\n    struct autosa_kernel *kernel,\n    struct autosa_drain_merge_func **funcs, int n_funcs,\n    struct hls_info *hls);\nisl_stat print_host_serialize_funcs(\n    struct autosa_kernel *kernel,\n    struct 
autosa_hw_module **modules,\n    int n_modules, struct hls_info *hls);\n\n#endif\n"
  },
  {
    "path": "src/autosa_schedule_tree.cpp",
    "content": "/* This file defines functions used to manipulate the schedule trees in AutoSA.\n */\n#include <isl/ctx.h>\n#include <isl/schedule_node.h>\n\n#include \"autosa_common.h\"\n#include \"autosa_utils.h\"\n#include \"autosa_schedule_tree.h\"\n\n/* Is \"node\" a mark node with an identifier called \"name\"?\n */\nint is_marked(__isl_keep isl_schedule_node *node, const char *name)\n{\n  isl_id *mark;\n  int has_name;\n\n  if (!node)\n    return -1;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_mark)\n    return 0;\n\n  mark = isl_schedule_node_mark_get_id(node);\n  if (!mark)\n    return -1;\n\n  has_name = !strcmp(isl_id_get_name(mark), name);\n  isl_id_free(mark);\n\n  return has_name;\n}\n\n/* Construct a multi_val in \"space\" whose i-th element is list[i].\n */\nstatic __isl_give isl_multi_val *multi_val_from_int_list(\n    __isl_take isl_space *space, int *list)\n{\n  int i, n;\n  isl_ctx *ctx;\n  isl_multi_val *mv;\n\n  if (!space)\n    return NULL;\n\n  ctx = isl_space_get_ctx(space);\n  n = isl_space_dim(space, isl_dim_set);\n  mv = isl_multi_val_zero(space);\n  for (i = 0; i < n; ++i)\n  {\n    isl_val *v;\n\n    v = isl_val_int_from_si(ctx, list[i]);\n    mv = isl_multi_val_set_val(mv, i, v);\n  }\n\n  return mv;\n}\n\n/* Construct the tile sizes from int array \"tile_size\".\n */\n__isl_give isl_multi_val *construct_band_tile_sizes(\n    __isl_keep isl_schedule_node *node, int *tile_size)\n{\n  isl_space *space;\n\n  if (!node)\n    return NULL;\n\n  space = isl_schedule_node_band_get_space(node);\n  return multi_val_from_int_list(space, tile_size);\n}\n\n/* Extract the band properties (partial schedule, coincident, permutable,\n * space_time, pe_opt, sched_pos and iter) from the band node.\n */\nstruct autosa_node_band_prop *extract_node_band_prop(__isl_keep isl_schedule_node *node)\n{\n  struct autosa_node_band_prop *prop = isl_calloc_type(\n      isl_schedule_node_get_ctx(node), struct autosa_node_band_prop);\n  prop->mupa = isl_schedule_node_band_get_partial_schedule(node);\n  prop->n_member = isl_schedule_node_band_n_member(node);\n  prop->coincident = 
isl_calloc_array(isl_schedule_node_get_ctx(node), int,\n                                      prop->n_member);\n  for (int i = 0; i < prop->n_member; i++)\n  {\n    prop->coincident[i] = isl_schedule_node_band_member_get_coincident(node, i);\n  }\n  prop->permutable = isl_schedule_node_band_get_permutable(node);\n  prop->space_time = isl_calloc_array(isl_schedule_node_get_ctx(node),\n                                      enum autosa_loop_type, prop->n_member);\n  prop->pe_opt = isl_calloc_array(isl_schedule_node_get_ctx(node),\n                                  enum autosa_loop_type, prop->n_member);\n  prop->sched_pos = isl_calloc_array(isl_schedule_node_get_ctx(node),\n                                     int, prop->n_member);  \n  for (int i = 0; i < prop->n_member; i++)\n  {\n    prop->space_time[i] = isl_schedule_node_band_member_get_space_time(node, i);\n    prop->pe_opt[i] = isl_schedule_node_band_member_get_pe_opt(node, i);\n    prop->sched_pos[i] = isl_schedule_node_band_member_get_sched_pos(node, i);\n    prop->iter[i] = isl_schedule_node_band_member_get_iter(node, i);\n  }  \n\n  return prop;\n}\n\nstruct autosa_node_band_prop *autosa_node_band_prop_free(\n    __isl_take struct autosa_node_band_prop *prop)\n{\n  isl_multi_union_pw_aff_free(prop->mupa);\n  free(prop->coincident);\n  free(prop->space_time);\n  free(prop->pe_opt);\n  free(prop->sched_pos);  \n\n  free(prop);\n\n  return NULL;\n}\n\n/* Examines if the \"node\" is a permutable band node. */\nisl_bool is_permutable_node(__isl_keep isl_schedule_node *node)\n{\n  if (!node)\n    return isl_bool_error;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    return isl_bool_false;\n  if (!isl_schedule_node_band_get_permutable(node))\n    return isl_bool_false;\n  if (isl_schedule_node_band_n_member(node) < 1)\n    return isl_bool_false;\n\n  return isl_bool_true;\n}\n\n/* Examines if the node is a permutable band node. 
 If so,\n * increment the count of permutable band nodes pointed to by \"user\".\n */\nstatic isl_bool is_permutable_node_cnt(\n    __isl_keep isl_schedule_node *node, void *user)\n{\n  int *n_permutable_node = (int *)(user);\n  if (!node)\n    return isl_bool_error;\n\n  if (is_permutable_node(node) == isl_bool_true)\n    (*n_permutable_node)++;\n\n  return isl_bool_true;\n}\n\n/* Examines if the schedule tree contains exactly one permutable band node.\n */\nisl_bool has_single_permutable_node(__isl_keep isl_schedule *schedule)\n{\n  isl_schedule_node *root;\n  int n_permutable_node = 0;\n  root = isl_schedule_get_root(schedule);\n  isl_bool all_visited = isl_schedule_node_every_descendant(root,\n                                                            &is_permutable_node_cnt, &n_permutable_node);\n  isl_schedule_node_free(root);\n\n  if (all_visited == isl_bool_true && n_permutable_node == 1)\n    return isl_bool_true;\n  else\n    return isl_bool_false;\n}\n\n/* Examines if the dependence is uniform based on the partial schedule\n * in the node. We will calculate the dependence vector and examine \n * if each dimension is a constant.\n */\nisl_bool is_dep_uniform_at_node(__isl_keep isl_schedule_node *node, void *user)\n{\n  isl_basic_map *dep = (isl_basic_map *)(user);\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    return isl_bool_true;\n\n  /* By this stage we know that if a node is a band node, it is a \n   * permutable band node to be analyzed. 
 \n   */\n  isl_multi_union_pw_aff *p_sc = isl_schedule_node_band_get_partial_schedule(node);\n  isl_union_pw_multi_aff *contraction = isl_schedule_node_get_subtree_contraction(node);\n  p_sc = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(p_sc, contraction);\n\n  isl_bool is_uniform = isl_bool_true;\n  for (int i = 0; i < isl_schedule_node_band_n_member(node); i++)\n  {\n    isl_union_pw_aff *p_sc_hyp = isl_multi_union_pw_aff_get_union_pw_aff(p_sc, i);\n    /* Obtain the schedule for the src statement. */\n    isl_space *space = isl_basic_map_get_space(dep);\n    isl_space *src_space = isl_space_domain(isl_space_copy(space));\n    isl_space *dest_space = isl_space_range(space);\n\n    isl_pw_aff *src_sc = NULL;\n    isl_pw_aff_list *p_sc_hyp_list = isl_union_pw_aff_get_pw_aff_list(p_sc_hyp);\n    for (int j = 0; j < isl_union_pw_aff_n_pw_aff(p_sc_hyp); j++)\n    {\n      isl_pw_aff *single_sc = isl_pw_aff_list_get_pw_aff(p_sc_hyp_list, j);\n      isl_space *single_sc_stmt = isl_space_domain(isl_pw_aff_get_space(single_sc));\n      if (isl_space_is_equal(src_space, single_sc_stmt))\n      {\n        isl_space_free(single_sc_stmt);\n        src_sc = single_sc;\n        break;\n      }\n      isl_pw_aff_free(single_sc);\n      isl_space_free(single_sc_stmt);\n    }\n    isl_pw_aff_list_free(p_sc_hyp_list);\n    isl_space_free(src_space);\n\n    /* Obtain the schedule for the dest statement. 
*/\n    isl_pw_aff *dest_sc = NULL;\n    p_sc_hyp_list = isl_union_pw_aff_get_pw_aff_list(p_sc_hyp);\n    for (int j = 0; j < isl_union_pw_aff_n_pw_aff(p_sc_hyp); j++)\n    {\n      isl_pw_aff *single_sc = isl_pw_aff_list_get_pw_aff(p_sc_hyp_list, j);\n      isl_space *single_sc_stmt = isl_space_domain(isl_pw_aff_get_space(single_sc));\n      if (isl_space_is_equal(dest_space, single_sc_stmt))\n      {\n        isl_space_free(single_sc_stmt);\n        dest_sc = single_sc;\n        break;\n      }\n      isl_pw_aff_free(single_sc);\n      isl_space_free(single_sc_stmt);\n    }\n    isl_pw_aff_list_free(p_sc_hyp_list);\n    isl_space_free(dest_space);\n\n    /* Compute the dependence distance at the current hyperplane. */\n    /* Step 1: Extend the scheduling function. */\n    isl_size src_sc_dim = isl_pw_aff_dim(src_sc, isl_dim_in);\n    isl_size dest_sc_dim = isl_pw_aff_dim(dest_sc, isl_dim_in);\n    src_sc = isl_pw_aff_insert_dims(src_sc, isl_dim_in, src_sc_dim, dest_sc_dim);\n    dest_sc = isl_pw_aff_insert_dims(dest_sc, isl_dim_in, 0, src_sc_dim);\n    for (int j = 0; j < dest_sc_dim; j++)\n    {\n      src_sc = isl_pw_aff_set_dim_id(src_sc, isl_dim_in, src_sc_dim + j, isl_pw_aff_get_dim_id(dest_sc, isl_dim_in, src_sc_dim + j));\n    }\n    for (int j = 0; j < src_sc_dim; j++)\n    {\n      dest_sc = isl_pw_aff_set_dim_id(dest_sc, isl_dim_in, j, isl_pw_aff_get_dim_id(src_sc, isl_dim_in, j));\n    }\n\n    isl_pw_aff *dis_sc = isl_pw_aff_sub(dest_sc, src_sc);\n\n    /* Step 2: Convert the basic_map into basic_set. 
*/\n    isl_mat *eq_mat = isl_basic_map_equalities_matrix(dep,\n                                                      isl_dim_in, isl_dim_out, isl_dim_div, isl_dim_param, isl_dim_cst);\n    isl_mat *ieq_mat = isl_basic_map_inequalities_matrix(dep,\n                                                         isl_dim_in, isl_dim_out, isl_dim_div, isl_dim_param, isl_dim_cst);\n\n    isl_basic_set *dep_set = isl_basic_set_from_constraint_matrices(\n        isl_space_domain(isl_pw_aff_get_space(dis_sc)),\n        eq_mat, ieq_mat,\n        isl_dim_set, isl_dim_div, isl_dim_param, isl_dim_cst);\n\n    /* Step 3: Intersect the scheduling function with the domain. */\n    isl_pw_aff *dis = isl_pw_aff_intersect_domain(dis_sc,\n                                                  isl_set_from_basic_set(isl_basic_set_copy(dep_set)));\n\n    isl_union_pw_aff_free(p_sc_hyp);\n    isl_basic_set_free(dep_set);\n\n    /* Examine if the dependence distance is constant. */\n    if (!isl_pw_aff_is_cst(dis))\n    {\n      is_uniform = isl_bool_false;\n      isl_pw_aff_free(dis);\n      break;\n    }\n\n    isl_pw_aff_free(dis);\n  }\n\n  isl_multi_union_pw_aff_free(p_sc);\n  return is_uniform;\n}\n\n/* Apply the schedule to the dependence and check if every dimension of\n * the dependence distance is a constant. The dependence has the form S1[] -> S2[].\n */\nisl_bool is_dep_uniform(__isl_take isl_basic_map *bmap, void *user)\n{\n  isl_bool is_uniform;\n  isl_schedule *schedule = (isl_schedule *)(user);\n  isl_schedule_node *root = isl_schedule_get_root(schedule);\n  isl_ctx *ctx = isl_basic_map_get_ctx(bmap);\n\n  /* Get the full schedule and apply the schedule to both the domain and range \n   * of the dependence. Generate the set from this map, and apply a map that \n   * calculates the difference at each dimension to get the dependence vector. 
\n   * At last, check if the dependence vector is a constant vector.\n   */\n  isl_union_map *full_sched = isl_schedule_node_get_subtree_schedule_union_map(root);\n  isl_union_map *dep_tmp = isl_union_map_apply_domain(\n      isl_union_map_from_map(isl_map_from_basic_map(bmap)),\n      isl_union_map_copy(full_sched));\n  isl_union_map *dep = isl_union_map_apply_range(dep_tmp, full_sched);\n\n  isl_schedule_node_free(root);\n\n  isl_map *dep_map = isl_map_from_union_map(dep);\n  isl_basic_map *dep_bmap = isl_basic_map_from_map(isl_map_copy(dep_map)); // TODO\n\n  isl_set *src_dep_domain = isl_map_domain(isl_map_copy(dep_map));\n  isl_map *src_dep_domain_map = isl_set_identity(src_dep_domain);\n  isl_multi_pw_aff *src_mpa = isl_multi_pw_aff_identity(isl_map_get_space(src_dep_domain_map));\n  isl_map_free(src_dep_domain_map);\n\n  isl_set *dest_dep_domain = isl_map_range(dep_map);\n  isl_map *dest_dep_domain_map = isl_set_identity(dest_dep_domain);\n  isl_multi_pw_aff *dest_mpa = isl_multi_pw_aff_identity(isl_map_get_space(dest_dep_domain_map));\n  isl_map_free(dest_dep_domain_map);\n\n  /* Add dims */\n  isl_size src_dim = isl_multi_pw_aff_dim(src_mpa, isl_dim_in);\n  isl_size dest_dim = isl_multi_pw_aff_dim(dest_mpa, isl_dim_in);\n  src_mpa = isl_multi_pw_aff_insert_dims(src_mpa, isl_dim_in, src_dim, dest_dim);\n  dest_mpa = isl_multi_pw_aff_insert_dims(dest_mpa, isl_dim_in, 0, src_dim);\n\n  isl_multi_pw_aff *dep_dis_mpa = isl_multi_pw_aff_sub(dest_mpa, src_mpa);\n\n  /* Convert the basic map to basic_set */\n  isl_mat *eq_mat = isl_basic_map_equalities_matrix(dep_bmap,\n                                                    isl_dim_in, isl_dim_out, isl_dim_div, isl_dim_param, isl_dim_cst);\n  isl_mat *ieq_mat = isl_basic_map_inequalities_matrix(dep_bmap,\n                                                       isl_dim_in, isl_dim_out, isl_dim_div, isl_dim_param, isl_dim_cst);\n  isl_basic_set *dep_bset = isl_basic_set_from_constraint_matrices(\n      
isl_space_domain(isl_multi_pw_aff_get_space(dep_dis_mpa)),\n      eq_mat, ieq_mat,\n      isl_dim_set, isl_dim_div, isl_dim_param, isl_dim_cst);\n\n  dep_dis_mpa = isl_multi_pw_aff_intersect_domain(dep_dis_mpa,\n                                                  isl_set_from_basic_set(dep_bset));\n\n  is_uniform = isl_multi_pw_aff_is_cst(dep_dis_mpa);\n\n  isl_multi_pw_aff_free(dep_dis_mpa);\n  isl_basic_map_free(dep_bmap);\n  return is_uniform;\n}\n\n/* Examine the dependences in the \"map\". If any of the dependence is non-uniform,\n * print out the detailed information.\n * Return true if all dependences are uniform.\n */\nisl_bool is_dep_uniform_wrap(__isl_keep isl_map *map, void *user)\n{\n  isl_bool is_uniform;\n  isl_basic_map_list *bmap_list = isl_map_get_basic_map_list(map);\n  for (int i = 0; i < isl_map_n_basic_map(map); i++)\n  {\n    is_uniform = is_dep_uniform(isl_basic_map_list_get_basic_map(bmap_list, i), user);\n    if (is_uniform != isl_bool_true)\n    {\n      isl_basic_map *dep_i = isl_basic_map_list_get_basic_map(bmap_list, i);\n      /* Print out the non-uniform dependence. */\n      isl_printer *p = isl_printer_to_file(isl_map_get_ctx(map), stdout);\n      p = isl_printer_print_basic_map(p, dep_i);\n      printf(\"\\n\");\n      isl_printer_free(p);\n      isl_basic_map_free(dep_i);\n\n      isl_basic_map_list_free(bmap_list);\n      return isl_bool_false;\n    }\n  }\n  isl_basic_map_list_free(bmap_list);\n  return isl_bool_true;\n}\n\n/* Examine if all flow and RAR dependences are uniform in the program. 
*/\nisl_bool uniform_dep_check(__isl_keep isl_schedule *schedule, struct ppcg_scop *scop)\n{\n  isl_union_map *dep_rar = scop->dep_rar;\n  //DBGUMAP(stdout, dep_rar, isl_schedule_get_ctx(schedule));\n\n  isl_union_map *dep_flow = scop->dep_flow;\n\n  isl_bool all_flow_dep_uniform = isl_union_map_every_map(dep_flow, &is_dep_uniform_wrap, schedule);\n  if (all_flow_dep_uniform != isl_bool_true)\n    return isl_bool_false;\n\n  isl_bool all_rar_dep_uniform = isl_union_map_every_map(dep_rar, &is_dep_uniform_wrap, schedule);\n  if (all_rar_dep_uniform != isl_bool_true)\n    return isl_bool_false;\n\n  return isl_bool_true;\n}\n\n/* Set *depth (initialized to 0 by the caller) to the maximum\n * of the schedule depths of the leaf nodes for which this function is called.\n */\nstatic isl_bool update_depth(__isl_keep isl_schedule_node *node, void *user)\n{\n  int *depth = (int *)user;\n  int node_depth;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n    return isl_bool_true;\n  node_depth = isl_schedule_node_get_schedule_depth(node);\n  if (node_depth > *depth)\n    *depth = node_depth;\n\n  return isl_bool_false;\n}\n\n/* Compute the dependence distance of dependence \"dep\" under the schedule \"schedule\".\n */\n__isl_give isl_vec *get_dep_dis_at_schedule(__isl_keep isl_basic_map *dep,\n                                            __isl_keep isl_schedule *schedule)\n{\n  isl_schedule_node *root = isl_schedule_get_root(schedule);\n  isl_ctx *ctx = isl_basic_map_get_ctx(dep);\n  isl_union_map *full_sched = isl_schedule_node_get_subtree_schedule_union_map(root);\n  isl_schedule_node_free(root);\n\n  /* Extract the iterator num. 
*/\n  int iter_num = 0;\n  isl_schedule_foreach_schedule_node_top_down(schedule, &update_depth, &iter_num);\n\n  isl_union_map *dep_sched = isl_union_map_apply_domain(isl_union_map_from_map(isl_map_from_basic_map(isl_basic_map_copy(dep))),\n                                                        isl_union_map_copy(full_sched));\n  dep_sched = isl_union_map_apply_range(dep_sched, full_sched);\n\n  isl_map *dep_map = isl_map_from_union_map(dep_sched);\n  isl_basic_map *dep_bmap = isl_basic_map_from_map(isl_map_copy(dep_map));\n\n  isl_set *src_dep_domain = isl_map_domain(isl_map_copy(dep_map));\n  isl_map *src_dep_domain_map = isl_set_identity(src_dep_domain);\n  isl_multi_pw_aff *src_mpa = isl_multi_pw_aff_identity(isl_map_get_space(src_dep_domain_map));\n  isl_map_free(src_dep_domain_map);\n\n  isl_set *dest_dep_domain = isl_map_range(dep_map);\n  isl_map *dest_dep_domain_map = isl_set_identity(dest_dep_domain);\n  isl_multi_pw_aff *dest_mpa = isl_multi_pw_aff_identity(isl_map_get_space(dest_dep_domain_map));\n  isl_map_free(dest_dep_domain_map);\n\n  /* Add dims. */\n  isl_size src_dim = isl_multi_pw_aff_dim(src_mpa, isl_dim_in);\n  isl_size dest_dim = isl_multi_pw_aff_dim(dest_mpa, isl_dim_in);\n  src_mpa = isl_multi_pw_aff_insert_dims(src_mpa, isl_dim_in, src_dim, dest_dim);\n  dest_mpa = isl_multi_pw_aff_insert_dims(dest_mpa, isl_dim_in, 0, src_dim);\n\n  isl_multi_pw_aff *dep_dis_mpa = isl_multi_pw_aff_sub(dest_mpa, src_mpa);\n\n  /* Convert the basic map to basic_set. 
*/\n  isl_mat *eq_mat = isl_basic_map_equalities_matrix(dep_bmap,\n                                                    isl_dim_in, isl_dim_out, isl_dim_div, isl_dim_param, isl_dim_cst);\n  isl_mat *ieq_mat = isl_basic_map_inequalities_matrix(dep_bmap,\n                                                       isl_dim_in, isl_dim_out, isl_dim_div, isl_dim_param, isl_dim_cst);\n  isl_basic_set *dep_bset = isl_basic_set_from_constraint_matrices(\n      isl_space_domain(isl_multi_pw_aff_get_space(dep_dis_mpa)),\n      eq_mat, ieq_mat,\n      isl_dim_set, isl_dim_div, isl_dim_param, isl_dim_cst);\n\n  dep_dis_mpa = isl_multi_pw_aff_intersect_domain(dep_dis_mpa,\n                                                  isl_set_from_basic_set(isl_basic_set_copy(dep_bset)));\n  isl_space *space = isl_multi_pw_aff_get_space(dep_dis_mpa);\n  isl_vec *dep_dis = isl_vec_zero(ctx, isl_space_dim(space, isl_dim_out));\n  for (int i = 0; i < isl_vec_size(dep_dis); i++)\n  {\n    isl_pw_aff *pa = isl_multi_pw_aff_get_pw_aff(dep_dis_mpa, i);\n    isl_val *val = isl_pw_aff_eval(pa, isl_basic_set_sample_point(isl_basic_set_copy(dep_bset)));\n    dep_dis = isl_vec_set_element_val(dep_dis, i, val);\n  }\n\n  isl_space_free(space);\n  isl_basic_set_free(dep_bset);\n  isl_basic_map_free(dep_bmap);\n  isl_multi_pw_aff_free(dep_dis_mpa);\n\n  return dep_dis;\n}\n\n/* Compute the dependence distance vector of the dependence under the \n * partial schedule of the band node. 
The dependence \"dep\" is untagged.\n */\n__isl_give isl_vec *get_dep_dis_at_node(__isl_keep isl_basic_map *dep, __isl_keep isl_schedule_node *band)\n{\n  if (isl_schedule_node_get_type(band) != isl_schedule_node_band)\n    return NULL;\n\n  isl_multi_union_pw_aff *p_sc = isl_schedule_node_band_get_partial_schedule(band);\n  isl_union_pw_multi_aff *contraction = isl_schedule_node_get_subtree_contraction(band);\n  p_sc = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(p_sc, contraction);\n\n  int band_w = isl_schedule_node_band_n_member(band);\n  isl_vec *dep_dis = isl_vec_zero(isl_basic_map_get_ctx(dep), band_w);\n  for (int i = 0; i < band_w; i++)\n  {\n    isl_union_pw_aff *p_sc_hyp = isl_multi_union_pw_aff_get_union_pw_aff(p_sc, i);\n    /* Obtain the schedule for the src statement. */\n    isl_space *space = isl_basic_map_get_space(dep);\n    isl_space *src_space = isl_space_domain(isl_space_copy(space));\n    isl_space *dest_space = isl_space_range(space);\n\n    isl_pw_aff *src_sc = NULL;\n    isl_pw_aff_list *p_sc_hyp_list = isl_union_pw_aff_get_pw_aff_list(p_sc_hyp);\n    for (int j = 0; j < isl_union_pw_aff_n_pw_aff(p_sc_hyp); j++)\n    {\n      isl_pw_aff *single_sc = isl_pw_aff_list_get_pw_aff(p_sc_hyp_list, j);\n      isl_space *single_sc_stmt = isl_space_domain(isl_pw_aff_get_space(single_sc));\n\n      if (isl_space_is_equal(src_space, single_sc_stmt))\n      {\n        isl_space_free(single_sc_stmt);\n        src_sc = single_sc;\n        break;\n      }\n      isl_pw_aff_free(single_sc);\n      isl_space_free(single_sc_stmt);\n    }\n    isl_pw_aff_list_free(p_sc_hyp_list);\n    isl_space_free(src_space);\n\n    /* Obtain the schedule for the dest statement. 
*/\n    isl_pw_aff *dest_sc = NULL;\n    p_sc_hyp_list = isl_union_pw_aff_get_pw_aff_list(p_sc_hyp);\n    for (int j = 0; j < isl_union_pw_aff_n_pw_aff(p_sc_hyp); j++)\n    {\n      isl_pw_aff *single_sc = isl_pw_aff_list_get_pw_aff(p_sc_hyp_list, j);\n      isl_space *single_sc_stmt = isl_space_domain(isl_pw_aff_get_space(single_sc));\n\n      if (isl_space_is_equal(dest_space, single_sc_stmt))\n      {\n        isl_space_free(single_sc_stmt);\n        dest_sc = single_sc;\n        break;\n      }\n      isl_pw_aff_free(single_sc);\n      isl_space_free(single_sc_stmt);\n    }\n    isl_pw_aff_list_free(p_sc_hyp_list);\n    isl_space_free(dest_space);\n\n    /* Compute the dependence distance at the current hyperplane. */\n    /* Step 1: Extend the scheduling function. */\n    isl_size src_sc_dim = isl_pw_aff_dim(src_sc, isl_dim_in);\n    isl_size dest_sc_dim = isl_pw_aff_dim(dest_sc, isl_dim_in);\n    src_sc = isl_pw_aff_insert_dims(src_sc, isl_dim_in, src_sc_dim, dest_sc_dim);\n    dest_sc = isl_pw_aff_insert_dims(dest_sc, isl_dim_in, 0, src_sc_dim);\n    for (int j = 0; j < dest_sc_dim; j++)\n    {\n      src_sc = isl_pw_aff_set_dim_id(src_sc, isl_dim_in, src_sc_dim + j, isl_pw_aff_get_dim_id(dest_sc, isl_dim_in, src_sc_dim + j));\n    }\n    for (int j = 0; j < src_sc_dim; j++)\n    {\n      dest_sc = isl_pw_aff_set_dim_id(dest_sc, isl_dim_in, j, isl_pw_aff_get_dim_id(src_sc, isl_dim_in, j));\n    }\n\n    isl_pw_aff *dis_sc = isl_pw_aff_sub(dest_sc, src_sc);\n\n    /* Step 2: Convert the basic_map into basic_set. 
*/\n    isl_mat *eq_mat = isl_basic_map_equalities_matrix(dep,\n                                                      isl_dim_in, isl_dim_out, isl_dim_div, isl_dim_param, isl_dim_cst);\n    isl_mat *ieq_mat = isl_basic_map_inequalities_matrix(dep,\n                                                         isl_dim_in, isl_dim_out, isl_dim_div, isl_dim_param, isl_dim_cst);\n\n    isl_basic_set *dep_set = isl_basic_set_from_constraint_matrices(\n        isl_space_domain(isl_pw_aff_get_space(dis_sc)),\n        eq_mat, ieq_mat,\n        isl_dim_set, isl_dim_div, isl_dim_param, isl_dim_cst);\n\n    /* Step 3: Intersect the scheduling function with the domain. */\n    isl_pw_aff *dis = isl_pw_aff_intersect_domain(dis_sc, isl_set_from_basic_set(isl_basic_set_copy(dep_set)));\n    isl_val *val = isl_pw_aff_eval(dis, isl_basic_set_sample_point(dep_set));\n    dep_dis = isl_vec_set_element_val(dep_dis, i, val);\n\n    isl_union_pw_aff_free(p_sc_hyp);\n  }\n\n  isl_multi_union_pw_aff_free(p_sc);\n  return dep_dis;\n}\n\n/* Interchange the loops at \"level1\" and \"level2\" in the band node \"node\" and\n * return a pointer to the updated band node. */\n__isl_give isl_schedule_node *loop_interchange_at_node(\n  __isl_take isl_schedule_node *node, isl_size level1, isl_size level2)\n{\n  /* Obtain the partial schedule of the node. */\n  isl_multi_union_pw_aff *sc = isl_schedule_node_band_get_partial_schedule(node);\n\n  /* Exchange the schedule at level1 and level2. */\n  isl_multi_union_pw_aff *new_sc = isl_multi_union_pw_aff_copy(sc);\n  new_sc = isl_multi_union_pw_aff_set_union_pw_aff(new_sc, level1, isl_multi_union_pw_aff_get_union_pw_aff(sc, level2));\n  new_sc = isl_multi_union_pw_aff_set_union_pw_aff(new_sc, level2, isl_multi_union_pw_aff_get_union_pw_aff(sc, level1));\n\n  /* Insert a new schedule node with the new schedule. 
*/\n  struct autosa_node_band_prop *prop = extract_node_band_prop(node);\n  node = isl_schedule_node_insert_partial_schedule(node, new_sc);\n\n  /* Update the properties of the new node. */\n  node = isl_schedule_node_band_set_permutable(node, 1);\n  for (int i = 0; i < isl_schedule_node_band_n_member(node); i++)\n  {\n    node = isl_schedule_node_band_member_set_coincident(node, i, prop->coincident[i]);\n  }\n  node = isl_schedule_node_band_member_set_coincident(node, level1, prop->coincident[level2]);\n  node = isl_schedule_node_band_member_set_coincident(node, level2, prop->coincident[level1]);\n  for (int i = 0; i < isl_schedule_node_band_n_member(node); i++)\n  {\n    node = isl_schedule_node_band_member_set_pe_opt(node, i, prop->pe_opt[i]);\n  }\n  node = isl_schedule_node_band_member_set_pe_opt(node, level1, prop->pe_opt[level2]);\n  node = isl_schedule_node_band_member_set_pe_opt(node, level2, prop->pe_opt[level1]);\n\n  for (int i = 0; i < isl_schedule_node_band_n_member(node); i++)\n  {\n    node = isl_schedule_node_band_member_set_space_time(node, i, prop->space_time[i]);\n  }\n  node = isl_schedule_node_band_member_set_space_time(node, level1, prop->space_time[level2]);\n  node = isl_schedule_node_band_member_set_space_time(node, level2, prop->space_time[level1]);\n\n  for (int i = 0; i < isl_schedule_node_band_n_member(node); i++)\n  {\n    node = isl_schedule_node_band_member_set_sched_pos(node, i, prop->sched_pos[i]);\n  }\n  node = isl_schedule_node_band_member_set_sched_pos(node, level1, prop->sched_pos[level2]);\n  node = isl_schedule_node_band_member_set_sched_pos(node, level2, prop->sched_pos[level1]);\n  for (int i = 0; i < isl_schedule_node_band_n_member(node); i++) \n  {\n    node = isl_schedule_node_band_member_set_iter(node, i, prop->iter[i]);    \n  }\n  node = isl_schedule_node_band_member_set_iter(node, level1, prop->iter[level2]);\n  node = isl_schedule_node_band_member_set_iter(node, level2, prop->iter[level1]);\n\n  
autosa_node_band_prop_free(prop);\n\n  /* Delete the old node after the current node */\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_delete(node);\n\n  node = isl_schedule_node_parent(node);\n  isl_multi_union_pw_aff_free(sc);\n  \n  return node;\n\n//  /* Obtain the schedule from the schedule node. */\n//  isl_schedule *schedule = isl_schedule_node_get_schedule(node);\n//\n//  isl_schedule_node_free(node);\n//  isl_multi_union_pw_aff_free(sc);\n//\n//  return schedule;\n}\n\n/* Examine if the node is a permutable band node. If so,\n * since the schedule tree is visited top-down,\n * return such a node immediately.\n */\nstatic isl_bool is_outermost_permutable_node_update(\n    __isl_keep isl_schedule_node *node, void *user)\n{\n  isl_schedule_node **t_node = (isl_schedule_node **)(user);\n  if (!node)\n    return isl_bool_error;\n\n  if (is_permutable_node(node) == isl_bool_true)\n  {\n    *t_node = isl_schedule_node_copy(node);\n    return isl_bool_false;\n  }\n  else\n  {\n    return isl_bool_true;\n  }\n\n  return isl_bool_true;\n}\n\n/* Extract the outermost permutable band node from the schedule tree.\n * When there are multiple nodes at the same level, extract the first one.\n */\n__isl_give isl_schedule_node *get_outermost_permutable_node(\n    __isl_keep isl_schedule *schedule)\n{\n  isl_schedule_node *root = isl_schedule_get_root(schedule);\n  isl_schedule_node *t_node = NULL;\n  isl_schedule_node_foreach_descendant_top_down(root,\n                                                &is_outermost_permutable_node_update, &t_node);\n\n  isl_schedule_node_free(root);\n  return t_node;\n}\n\n/* Examines if the node is a permutable band node. 
 If so,\n * and if no other permutable band node lies below it, record it in \"user\"\n * as the innermost permutable node. Since the schedule tree is visited\n * top-down, only the first such node found is kept.\n */\nstatic isl_bool is_innermost_permutable_node_update(__isl_keep isl_schedule_node *node, void *user)\n{\n  isl_schedule_node **t_node = (isl_schedule_node **)(user);\n  if (!node)\n    return isl_bool_error;\n\n  if (is_permutable_node(node) == isl_bool_true)\n  {\n    /* Check if there is any other band below it. */\n    isl_schedule_node *new_node = isl_schedule_node_get_child(node, 0);\n    isl_bool no_inner_band = isl_schedule_node_every_descendant(new_node,\n                                                                &no_permutable_node, NULL);\n    if (no_inner_band == isl_bool_true)\n    {\n      if (*t_node == NULL)\n        *t_node = isl_schedule_node_copy(node);\n    }\n    isl_schedule_node_free(new_node);\n  }\n\n  return isl_bool_true;\n}\n\n/* Extract the innermost permutable band node from the schedule tree.\n * When there are multiple nodes at the same level, extract the first one.\n */\n__isl_give isl_schedule_node *get_innermost_permutable_node(__isl_keep isl_schedule *schedule)\n{\n  isl_schedule_node *root = isl_schedule_get_root(schedule);\n  isl_schedule_node *t_node = NULL;\n  isl_schedule_node_foreach_descendant_top_down(root,\n                                                &is_innermost_permutable_node_update, &t_node);\n\n  isl_schedule_node_free(root);\n  return t_node;\n}\n\n/* Tile \"band\" with tile size specified by \"sizes\".\n */\n__isl_give isl_schedule_node *tile_band(\n    __isl_take isl_schedule_node *node, __isl_take isl_multi_val *sizes)\n{\n  isl_ctx *ctx = isl_schedule_node_get_ctx(node);\n  int scale_tile;\n  int shift_point;\n\n  scale_tile = isl_options_get_tile_scale_tile_loops(ctx);\n  isl_options_set_tile_scale_tile_loops(ctx, 0);\n  shift_point = isl_options_get_tile_shift_point_loops(ctx);\n  isl_options_set_tile_shift_point_loops(ctx, 1);\n\n  node = isl_schedule_node_band_tile(node, sizes);\n\n  
isl_options_set_tile_scale_tile_loops(ctx, scale_tile);\n  isl_options_set_tile_shift_point_loops(ctx, shift_point);\n\n  return node;\n}\n\n/* Tile \"band\" with tile size specified by \"sizes\".\n *\n * If the tile size at a given position is \"-1\", the loop\n * is not tiled (this on-demand tiling is not supported yet).\n * Two band nodes are generated. The first band\n * contains the tile loops and the untiled loops. The second band\n * contains the point loops.\n */\n__isl_give isl_schedule_node *autosa_tile_band(\n    __isl_take isl_schedule_node *node, __isl_keep int *sizes)\n{\n  int full_tile = 1;\n  int n;\n\n  /* Check whether every loop in the band is to be tiled. */\n  n = isl_schedule_node_band_n_member(node);\n  for (int i = 0; i < n; i++)\n  {\n    if (sizes[i] == -1)\n    {\n      full_tile = 0;\n      break;\n    }\n  }\n\n  if (full_tile)\n  {\n    isl_multi_val *tile_sizes;\n    tile_sizes = construct_band_tile_sizes(node, sizes);\n    node = tile_band(node, isl_multi_val_copy(tile_sizes));\n    /* Reset the space_time in the tile band */\n    for (int i = 0; i < n; i++)\n    {\n      node = isl_schedule_node_band_member_set_space_time(node, i, autosa_loop_time);\n    }\n    isl_multi_val_free(tile_sizes);\n  }\n  else\n  {\n    // TODO: tile on demand\n    isl_die(isl_schedule_node_get_ctx(node), isl_error_unsupported,\n            \"on-demand tiling not supported\", return node);\n  }\n\n  return node;\n}\n\n/* Given two nested nodes,\n * N1\n * |\n * N2\n * Merge them into one node.\n * N\n * The input \"node\" points to N1.\n * Return a pointer to N.\n */\nstatic __isl_give isl_schedule_node *autosa_node_merge(\n    __isl_take isl_schedule_node *node)\n{\n  if (isl_schedule_node_n_children(node) == 0 || isl_schedule_node_n_children(node) > 1)\n    return node;\n\n  isl_schedule_node *parent = node;\n  isl_schedule_node *child = isl_schedule_node_child(isl_schedule_node_copy(node), 0);\n  if (isl_schedule_node_get_type(parent) != isl_schedule_node_band ||\n      
isl_schedule_node_get_type(child) != isl_schedule_node_band)\n  {\n    /* Release the copy taken above before bailing out. */\n    isl_schedule_node_free(child);\n    return node;\n  }\n\n  /* Save the node properties. */\n  struct autosa_node_band_prop *parent_prop = extract_node_band_prop(parent);\n  struct autosa_node_band_prop *child_prop = extract_node_band_prop(child);\n\n  /* Merge the partial schedules of the two nodes. */\n  isl_union_pw_aff_list *upa_list = isl_union_pw_aff_list_alloc(\n      isl_schedule_node_get_ctx(node), 0);\n  isl_space *parent_space = isl_multi_union_pw_aff_get_space(parent_prop->mupa);\n  isl_space *child_space = isl_multi_union_pw_aff_get_space(child_prop->mupa);\n\n  for (int i = 0; i < parent_prop->n_member; i++)\n  {\n    isl_union_pw_aff *upa = isl_multi_union_pw_aff_get_union_pw_aff(parent_prop->mupa, i);\n    upa_list = isl_union_pw_aff_list_add(\n        upa_list, upa);\n  }\n  for (int i = 0; i < child_prop->n_member; i++)\n  {\n    isl_union_pw_aff *upa = isl_multi_union_pw_aff_get_union_pw_aff(child_prop->mupa, i);\n    upa_list = isl_union_pw_aff_list_add(\n        upa_list, upa);\n  }\n\n  isl_space *mupa_space = isl_space_add_dims(parent_space, isl_dim_set, isl_space_dim(child_space, isl_dim_set));\n  isl_space_free(child_space);\n\n  isl_multi_union_pw_aff *mupa = isl_multi_union_pw_aff_from_union_pw_aff_list(\n      mupa_space,\n      upa_list);\n\n  /* Insert one new node. */\n  node = isl_schedule_node_insert_partial_schedule(node, mupa);\n\n  /* Restore the node properties. 
*/\n  node = isl_schedule_node_band_set_permutable(node, 1);\n  for (int i = 0; i < parent_prop->n_member; i++)\n  {\n    node = isl_schedule_node_band_member_set_coincident(\n        node, i, parent_prop->coincident[i]);\n  }\n  for (int i = 0; i < parent_prop->n_member; i++)\n  {\n    node = isl_schedule_node_band_member_set_space_time(\n        node, i, parent_prop->space_time[i]);\n    node = isl_schedule_node_band_member_set_pe_opt(\n        node, i, parent_prop->pe_opt[i]);\n    node = isl_schedule_node_band_member_set_sched_pos(\n        node, i, parent_prop->sched_pos[i]);\n    node = isl_schedule_node_band_member_set_iter(\n        node, i, parent_prop->iter[i]);\n  }\n  for (int i = 0; i < child_prop->n_member; i++)\n  {\n    node = isl_schedule_node_band_member_set_coincident(\n        node, i + parent_prop->n_member, child_prop->coincident[i]);\n  }\n  for (int i = 0; i < child_prop->n_member; i++)\n  {\n    node = isl_schedule_node_band_member_set_space_time(\n        node, i + parent_prop->n_member, child_prop->space_time[i]);\n    node = isl_schedule_node_band_member_set_pe_opt(\n        node, i + parent_prop->n_member, child_prop->pe_opt[i]);\n    node = isl_schedule_node_band_member_set_sched_pos(\n        node, i + parent_prop->n_member, child_prop->sched_pos[i]);\n    node = isl_schedule_node_band_member_set_iter(\n        node, i + parent_prop->n_member, child_prop->iter[i]);\n  }\n\n  /* Delete the old nodes. 
*/\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_delete(node);\n  node = isl_schedule_node_delete(node);\n  node = isl_schedule_node_parent(node);\n\n  free(parent_prop->coincident);\n  free(parent_prop->pe_opt);\n  free(parent_prop->space_time);\n  free(parent_prop->sched_pos);  \n  isl_multi_union_pw_aff_free(parent_prop->mupa);\n  free(parent_prop);\n  free(child_prop->coincident);\n  free(child_prop->pe_opt);\n  free(child_prop->space_time);  \n  free(child_prop->sched_pos);  \n  isl_multi_union_pw_aff_free(child_prop->mupa);\n  free(child_prop);\n  isl_schedule_node_free(child);\n\n  return node;\n}\n\n/* Tile the loop at the \"pos\" position of the band with the size \"tile_size\".\n * The original band\n * B\n * is first split into\n * B1\n * |\n * p\n * |\n * B2\n * The loop p is then tiled into p_tile and p_point, and p_point is\n * interchanged with B2, which gives four band nodes.\n * B1\n * |\n * p_tile\n * |\n * B2\n * |\n * p_point\n * The first three bands are then merged together.\n * B'\n * |\n * p_point\n * A pointer to B' is returned.\n */\n__isl_give isl_schedule_node *autosa_node_band_tile_loop(\n    __isl_take isl_schedule_node *node, int tile_size, int pos)\n{\n  isl_multi_val *tile_sizes;\n  int n = isl_schedule_node_band_n_member(node);\n  int size[1];\n\n  size[0] = tile_size;\n  node = isl_schedule_node_band_split(node, pos);\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_band_split(node, 1);\n\n  tile_sizes = construct_band_tile_sizes(node, size);\n  node = tile_band(node, isl_multi_val_copy(tile_sizes));\n  isl_multi_val_free(tile_sizes);\n\n  /* Swap the order of the point band and the next band. */\n  node = isl_schedule_node_child(node, 0);\n  node = autosa_node_interchange(node);\n\n  /* Merge the first three bands. 
*/\n  node = isl_schedule_node_parent(node);\n  node = autosa_node_merge(node);\n  node = isl_schedule_node_parent(node);\n  node = autosa_node_merge(node);\n\n  return node;\n}\n\n/* Reset the pe_opt properties of all the band nodes back to default. */\n__isl_give isl_schedule_node *clear_pe_opt_prop(\n    __isl_take isl_schedule_node *node, void *user)\n{\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n  {\n    for (int i = 0; i < isl_schedule_node_band_n_member(node); i++)\n    {\n      node = isl_schedule_node_band_member_set_pe_opt(node, i,\n                                                      autosa_loop_default);\n    }\n  }\n\n  return node;\n}\n\n/* Restore the band node properties other than the partial schedule\n * (which the caller has already inserted) from \"prop\", then free \"prop\".\n */\n__isl_give isl_schedule_node *restore_node_band_prop(\n    __isl_take isl_schedule_node *node,\n    __isl_take struct autosa_node_band_prop *prop)\n{\n  node = isl_schedule_node_band_set_permutable(node, prop->permutable);\n  for (int i = 0; i < prop->n_member; i++)\n  {\n    node = isl_schedule_node_band_member_set_coincident(node, i, prop->coincident[i]);\n  }\n  for (int i = 0; i < prop->n_member; i++)\n  {\n    node = isl_schedule_node_band_member_set_space_time(node, i, prop->space_time[i]);\n    node = isl_schedule_node_band_member_set_pe_opt(node, i, prop->pe_opt[i]);\n    node = isl_schedule_node_band_member_set_sched_pos(node, i, prop->sched_pos[i]);\n    node = isl_schedule_node_band_member_set_iter(node, i, prop->iter[i]);\n  }\n\n  free(prop->coincident);\n  free(prop->pe_opt);\n  free(prop->space_time);\n  free(prop->sched_pos);  \n  isl_multi_union_pw_aff_free(prop->mupa);\n  free(prop);\n\n  return node;\n}\n\n/* Given two nested nodes,\n * N1\n * |\n * N2\n * Interchange the two nodes to\n * N2\n * |\n * N1\n * The input \"node\" points to N1.\n * Return a pointer to node N2.\n */\n__isl_give isl_schedule_node *autosa_node_interchange(\n    __isl_take isl_schedule_node *node)\n{\n  if 
(isl_schedule_node_n_children(node) == 0 || isl_schedule_node_n_children(node) > 1)\n  {\n    return node;\n  }\n\n  /* Save the current node. */\n  struct autosa_node_band_prop *prop = extract_node_band_prop(node);\n\n  /* Delete the current node. */\n  node = isl_schedule_node_delete(node);\n\n  /* Insert the old node. */\n  node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_insert_partial_schedule(node,\n                                                   isl_multi_union_pw_aff_copy(prop->mupa));\n\n  /* Restore the node properties. */\n  node = restore_node_band_prop(node, prop);\n  node = isl_schedule_node_parent(node);\n\n  return node;\n}\n\n/* Given two nested nodes,\n * N2\n * |\n * N1\n * Interchange the two nodes to\n * N1\n * |\n * N2\n * The input \"node\" points to N1.\n * Return a pointer to node N1.\n * Currently, only band and mark nodes can be interchanged.\n */\n__isl_give isl_schedule_node *autosa_node_interchange_up(\n    __isl_take isl_schedule_node *node)\n{\n  enum isl_schedule_node_type t;\n  enum isl_schedule_node_type parent_t;\n  isl_schedule_node *parent_node;\n  struct autosa_node_band_prop *prop;\n  isl_id *id;\n\n  if (!isl_schedule_node_has_parent(node))\n  {\n    return node;\n  }\n  t = isl_schedule_node_get_type(node);\n  if (!(t == isl_schedule_node_band || t == isl_schedule_node_mark))\n  {\n    isl_die(isl_schedule_node_get_ctx(node), isl_error_unsupported,\n            \"only band and mark nodes are supported\", return node);\n  }\n  parent_node = isl_schedule_node_parent(isl_schedule_node_copy(node));\n  parent_t = isl_schedule_node_get_type(parent_node);\n  if (!(parent_t == isl_schedule_node_band || parent_t == isl_schedule_node_mark))\n  {\n    /* Release the copy taken above before reporting the error. */\n    isl_schedule_node_free(parent_node);\n    isl_die(isl_schedule_node_get_ctx(node), isl_error_unsupported,\n            \"only band and mark nodes are supported\", return node);\n  }\n  isl_schedule_node_free(parent_node);\n\n  /* Save the current node. 
*/\n  if (t == isl_schedule_node_band)\n  {\n    prop = extract_node_band_prop(node);\n  }\n  else if (t == isl_schedule_node_mark)\n  {\n    id = isl_schedule_node_mark_get_id(node);\n  }\n\n  /* Delete the current node. */\n  node = isl_schedule_node_delete(node);\n\n  /* Insert the old node. */\n  node = isl_schedule_node_parent(node);\n  if (t == isl_schedule_node_band)\n  {\n    node = isl_schedule_node_insert_partial_schedule(node,\n                                                     isl_multi_union_pw_aff_copy(prop->mupa));\n    node = restore_node_band_prop(node, prop);\n  }\n  else if (t == isl_schedule_node_mark)\n  {\n    node = isl_schedule_node_insert_mark(node, id);\n  }\n\n  return node;\n}\n\n/* Return false if \"node\" is a band node (permutable or not),\n * and true otherwise.\n */\nisl_bool no_permutable_node(__isl_keep isl_schedule_node *node, void *user)\n{\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    return isl_bool_false;\n  else\n    return isl_bool_true;\n}\n\n/* If any band member is non-parallel, return false.
\n */\nisl_bool all_parallel_node(__isl_keep isl_schedule_node *node, void *user)\n{\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n  {\n    int n = isl_schedule_node_band_n_member(node);\n    for (int i = 0; i < n; i++)\n    {\n      if (!isl_schedule_node_band_member_get_coincident(node, i))\n        return isl_bool_false;\n    }\n  }\n  return isl_bool_true;\n}\n\n/* This function tests if the loops above the \"array\" mark carry any flow\n * dependence that is associated with the I/O group \"group\".\n */\nisl_bool is_flow_dep_carried_by_array_part_loops(__isl_keep isl_schedule *schedule,\n                                                 struct autosa_array_ref_group *group, struct autosa_kernel *kernel)\n{\n  isl_bool carried = isl_bool_false;\n  isl_schedule_node *node;\n  isl_union_map *umap;\n\n  /* Only internal arrays carry flow dependences. (The previous test,\n   * \"!array_type == AUTOSA_INT_ARRAY\", negated array_type before comparing.)\n   */\n  if (group->local_array->array_type != AUTOSA_INT_ARRAY)\n    return carried;\n  node = isl_schedule_get_root(schedule);\n  node = autosa_tree_move_down_to_array(node, kernel->core);\n  while (node && isl_schedule_node_has_parent(node))\n  {\n    if (autosa_tree_node_is_kernel(node))\n      break;\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n      umap = isl_schedule_node_band_get_partial_schedule_union_map(node);\n      for (int i = 0; i < group->n_ref; i++)\n      {\n        struct autosa_stmt_access *ref = group->refs[i];\n        for (int j = 0; j < ref->n_io_info; j++)\n        {\n          struct autosa_io_info *io_info = ref->io_info[j];\n          if (io_info->io_type == group->io_type &&\n              !isl_vec_cmp(io_info->dir, group->dir))\n          {\n            isl_map *test;\n            isl_map *schedule_dep;\n            int dim;\n            int is_parallel;\n\n            isl_union_map *dep = isl_union_map_from_map(\n                isl_map_factor_domain(\n                    isl_map_from_basic_map(isl_basic_map_copy(io_info->dep->isl_dep))));\n            dep = isl_union_map_apply_range(dep, 
isl_union_map_copy(umap));\n            dep = isl_union_map_apply_domain(dep, isl_union_map_copy(umap));\n            if (isl_union_map_is_empty(dep))\n            {\n              isl_union_map_free(dep);\n              break;\n            }\n            schedule_dep = isl_map_from_union_map(dep);\n            test = isl_map_universe(isl_map_get_space(schedule_dep));\n            dim = isl_schedule_node_band_n_member(node);\n            for (int n = 0; n < dim; n++)\n            {\n              test = isl_map_equate(test, isl_dim_in, n, isl_dim_out, n);\n            }\n            is_parallel = isl_map_is_subset(schedule_dep, test);\n            isl_map_free(schedule_dep);\n            isl_map_free(test);\n\n            if (!is_parallel)\n            {\n              /* Dependence is carried by the array part loops. */\n              carried = isl_bool_true;\n              break;\n            }\n          }\n        }\n      }\n      isl_union_map_free(umap);\n    }\n    node = isl_schedule_node_parent(node);\n  }\n\n  isl_schedule_node_free(node);\n  return carried;\n}\n\n/* Test if the dependence is carried by the current schedule node. 
*/\nint is_dep_carried_by_node(__isl_keep isl_basic_map *dep, __isl_keep isl_schedule_node *node)\n{\n  if (!node || isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    return -1;\n  if (isl_schedule_node_band_n_member(node) != 1)\n    return -1;\n  if (!dep)\n    return -1;\n\n  isl_union_map *umap, *umap_dep;\n  isl_map *map_dep, *test;\n  int is_carried;\n\n  umap = isl_schedule_node_band_get_partial_schedule_union_map(node);\n  umap_dep = isl_union_map_from_map(isl_map_factor_domain(isl_map_from_basic_map(isl_basic_map_copy(dep))));\n  umap_dep = isl_union_map_apply_range(umap_dep, isl_union_map_copy(umap));\n  umap_dep = isl_union_map_apply_domain(umap_dep, umap);\n  if (isl_union_map_is_empty(umap_dep)) {\n    isl_union_map_free(umap_dep);\n    return -1;\n  }\n  map_dep = isl_map_from_union_map(umap_dep);\n  test = isl_map_universe(isl_map_get_space(map_dep));\n  test = isl_map_equate(test, isl_dim_in, 0, isl_dim_out, 0);\n  is_carried = !isl_map_is_subset(map_dep, test);\n  isl_map_free(map_dep);\n  isl_map_free(test);\n  \n  return is_carried;\n}\n\nstruct insert_node_at_depth_data {\n  isl_multi_union_pw_aff *mupa;\n  struct autosa_node_band_prop *prop;\n  int depth;\n};\n\nstatic isl_bool has_inserted_mark(__isl_keep isl_schedule_node *node, void *user)\n{\n  if (is_marked(node, \"inserted\"))\n    return isl_bool_false;\n  \n  return isl_bool_true;\n}\n\nstatic __isl_give isl_schedule_node *delete_inserted_mark(__isl_take isl_schedule_node *node, void *user)\n{\n  if (is_marked(node, \"inserted\"))\n    node = isl_schedule_node_delete(node);\n  \n  return node;\n}\n\nstatic isl_bool has_band_node(__isl_keep isl_schedule_node *node, void *user)\n{\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_band)    \n    return isl_bool_false;\n  \n  return isl_bool_true;\n}\n\n/* Insert the node at the \"depth\" position. 
To prevent inserting the node\n * multiple times, an \"inserted\" mark is inserted before the node.\n * After the insertion, this \"inserted\" mark is deleted.\n * This function is incomplete and may have bugs.\n */\nstatic __isl_give isl_schedule_node *insert_node_at_depth(\n  __isl_take isl_schedule_node *node, void *user)\n{\n  struct insert_node_at_depth_data *data = (struct insert_node_at_depth_data *)user;\n  isl_id *id;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    return node;\n\n  /* Skip the node if its subtree already contains an \"inserted\" mark. */\n  if (!isl_schedule_node_every_descendant(node, &has_inserted_mark, NULL)) {    \n    return node;\n  }\n\n  if (isl_schedule_node_get_schedule_depth(node) < data->depth) {\n    /* Ideally, the node should be split so that the new band is inserted\n     * at the exact depth. Currently, the band is simply put below the\n     * current node.\n     * TODO: fix it\n     */\n    node = isl_schedule_node_child(node, 0);\n  }\n\n  if (isl_schedule_node_get_schedule_depth(node) != data->depth) {\n//#ifdef _DEBUG\n//    DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif        \n    return node;\n  }\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n\n  /* If the node is right under a \"latency\" mark (and then a \"simd\" mark),\n   * move up to that mark so that the new band is inserted above it.\n   */\n  node = isl_schedule_node_parent(node);\n  if (!is_marked(node, \"latency\"))\n    node = isl_schedule_node_child(node, 0);\n  node = isl_schedule_node_parent(node);\n  if (!is_marked(node, \"simd\"))\n    node = isl_schedule_node_child(node, 0);\n\n  /* Insert the node at current position */\n  node = isl_schedule_node_insert_partial_schedule(node, isl_multi_union_pw_aff_copy(data->mupa));\n  node = isl_schedule_node_band_set_permutable(node, data->prop->permutable);\n  for (int i = 0; i < isl_schedule_node_band_n_member(node); i++) {\n    node = isl_schedule_node_band_member_set_coincident(node, i, 
data->prop->coincident[i]);\n    node = isl_schedule_node_band_member_set_pe_opt(node, i, data->prop->pe_opt[i]);\n    node = isl_schedule_node_band_member_set_space_time(node, i, data->prop->space_time[i]);\n    node = isl_schedule_node_band_member_set_sched_pos(node, i, data->prop->sched_pos[i]);\n    node = isl_schedule_node_band_member_set_iter(node, i, data->prop->iter[i]);\n  }\n\n  /* Insert an \"inserted\" mark */\n  id = isl_id_alloc(isl_schedule_node_get_ctx(node), \"inserted\", NULL);\n  node = isl_schedule_node_insert_mark(node, id);\n\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n\n  return node;\n}\n\n/* This function sinks the node to the schedule depth \"depth\". */\n__isl_give isl_schedule_node *autosa_node_sink_to_depth(\n  __isl_take isl_schedule_node *node, int depth)\n{\n  isl_multi_union_pw_aff *mupa;\n  struct autosa_node_band_prop *prop;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    return node;\n  \n  mupa = isl_schedule_node_band_get_partial_schedule(node);\n  prop = extract_node_band_prop(node);\n  /* Delete the current node */\n  node = isl_schedule_node_delete(node);\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif  \n  struct insert_node_at_depth_data data = {mupa, prop, depth};\n  node = isl_schedule_node_map_descendant_bottom_up(node, &insert_node_at_depth, &data);\n//#ifdef _DEBUG\n//  DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n//#endif\n  /* Delete the inserted mark */\n  node = isl_schedule_node_map_descendant_bottom_up(node, &delete_inserted_mark, NULL);\n\n  autosa_node_band_prop_free(prop);\n  isl_multi_union_pw_aff_free(mupa);\n\n  return node;\n}\n\nstruct sink_node_to_mark_data {\n  isl_multi_union_pw_aff *mupa;\n  struct autosa_node_band_prop *prop;\n  const char *name;  \n  bool inserted;\n};\n\nstatic __isl_give isl_schedule_node *sink_node_to_mark(\n  __isl_take isl_schedule_node *node, 
void *user)\n{\n  struct sink_node_to_mark_data *data = (struct sink_node_to_mark_data *)user;\n  isl_id *id;\n  isl_schedule_node *node_tmp;  \n\n  //if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n  //  return node;\n  \n  /* Skip the node if its subtree already contains an \"inserted\" mark. */\n  if (!isl_schedule_node_every_descendant(node, &has_inserted_mark, NULL)) {    \n    return node;\n  }\n\n  //DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node));\n\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_band) {\n    /* If this is a band node, then insert it under the band node. */\n    node = isl_schedule_node_child(node, 0);\n  } else if (isl_schedule_node_get_type(node) == isl_schedule_node_leaf) {\n    /* If this is a leaf node, walk up to the \"stop\" mark and check whether:\n     * 1. There is a band node among the ancestors, or\n     * 2. There is an ancestor sequence node none of whose children\n     *    contains a band node.\n     * If either condition holds, skip this node, since the band will be\n     * inserted at other positions.\n     */    \n    bool insert = 1;\n    node_tmp = isl_schedule_node_copy(node);\n    //DBGSCHDNODE(stdout, node_tmp, isl_schedule_node_get_ctx(node_tmp));\n    while (!autosa_tree_node_is_mark(node_tmp, \"stop\") && isl_schedule_node_has_parent(node_tmp)) {\n      node_tmp = isl_schedule_node_parent(node_tmp);\n      if (isl_schedule_node_get_type(node_tmp) == isl_schedule_node_band) {\n        insert = 0;        \n        break;\n      }\n      if (isl_schedule_node_get_type(node_tmp) == isl_schedule_node_sequence) {\n        // TODO: We haven't considered other nodes such as set yet.\n        int n_child = 0;\n        for (n_child = 0; n_child < isl_schedule_node_n_children(node_tmp); n_child++) {\n          isl_schedule_node *node_child = isl_schedule_node_child(isl_schedule_node_copy(node_tmp), n_child);\n          /* Check if there is any band node under this child node. 
*/\n          if (!isl_schedule_node_every_descendant(node_child, &has_band_node, NULL)) {                        \n            isl_schedule_node_free(node_child);\n            break;\n          }          \n          isl_schedule_node_free(node_child);\n        }\n        if (n_child == isl_schedule_node_n_children(node_tmp)) {\n          insert = 0;          \n          break;\n        }        \n      } \n    }    \n    isl_schedule_node_free(node_tmp);\n    if (insert == 0)\n      return node;\n  } else {\n    return node;\n  }\n\n  //node = isl_schedule_node_child(node, 0);\n  /* Check whether the node is under any existing \"name\" marks.\n   * If so, move up above the outermost of these marks.\n   */\n  int mark_cnt = 0;\n  node_tmp = isl_schedule_node_copy(node);\n  while (isl_schedule_node_has_parent(node_tmp)) {\n    node_tmp = isl_schedule_node_parent(node_tmp);\n    if (is_marked(node_tmp, data->name))\n      mark_cnt++;\n  }\n  isl_schedule_node_free(node_tmp);\n  \n  while (mark_cnt > 0) {\n    node = isl_schedule_node_parent(node);\n    if (is_marked(node, data->name))\n      mark_cnt--;\n  }\n\n  /* Insert the node at current position */\n  node = isl_schedule_node_insert_partial_schedule(node, isl_multi_union_pw_aff_copy(data->mupa));\n  node = isl_schedule_node_band_set_permutable(node, data->prop->permutable);\n  for (int i = 0; i < isl_schedule_node_band_n_member(node); i++) {\n    node = isl_schedule_node_band_member_set_coincident(node, i, data->prop->coincident[i]);\n    node = isl_schedule_node_band_member_set_pe_opt(node, i, data->prop->pe_opt[i]);\n    node = isl_schedule_node_band_member_set_space_time(node, i, data->prop->space_time[i]);\n    node = isl_schedule_node_band_member_set_sched_pos(node, i, data->prop->sched_pos[i]);\n    node = isl_schedule_node_band_member_set_iter(node, i, data->prop->iter[i]);\n  }\n\n  /* Insert a \"name\" mark */\n  id = isl_id_alloc(isl_schedule_node_get_ctx(node), data->name, NULL);\n  node = 
isl_schedule_node_insert_mark(node, id);\n\n  /* Insert an \"inserted\" mark */\n  id = isl_id_alloc(isl_schedule_node_get_ctx(node), \"inserted\", NULL);\n  node = isl_schedule_node_insert_mark(node, id);\n  \n  data->inserted = true;\n\n  return node;\n}\n\n/* Sink the band node to the innermost position, but above any existing\n * marks named \"name\".\n */\n__isl_give isl_schedule_node *autosa_node_sink_to_mark(\n  __isl_take isl_schedule_node *node, const char *name)\n{\n  isl_multi_union_pw_aff *mupa;\n  struct autosa_node_band_prop *prop;\n  isl_id *id;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    return node;\n\n  /* Insert a stop mark. */\n  id = isl_id_alloc(isl_schedule_node_get_ctx(node), \"stop\", NULL);\n  node = isl_schedule_node_insert_mark(node, id);\n  node = isl_schedule_node_child(node, 0);\n\n  mupa = isl_schedule_node_band_get_partial_schedule(node);\n  prop = extract_node_band_prop(node);\n  /* Delete the current node */\n  node = isl_schedule_node_delete(node);\n\n  struct sink_node_to_mark_data data = {mupa, prop, name, false};\n  node = isl_schedule_node_map_descendant_bottom_up(node, &sink_node_to_mark, &data);\n  if (!data.inserted) {\n    \n    /* Insert the node at current position */\n    node = isl_schedule_node_insert_partial_schedule(node, isl_multi_union_pw_aff_copy(data.mupa));\n    node = isl_schedule_node_band_set_permutable(node, data.prop->permutable);\n    for (int i = 0; i < isl_schedule_node_band_n_member(node); i++) {\n      node = isl_schedule_node_band_member_set_coincident(node, i, data.prop->coincident[i]);\n      node = isl_schedule_node_band_member_set_pe_opt(node, i, data.prop->pe_opt[i]);\n      node = isl_schedule_node_band_member_set_space_time(node, i, data.prop->space_time[i]);\n      node = isl_schedule_node_band_member_set_sched_pos(node, i, data.prop->sched_pos[i]);\n      node = isl_schedule_node_band_member_set_iter(node, i, data.prop->iter[i]);\n    }\n\n    /* Insert a \"name\" mark */\n    id = 
isl_id_alloc(isl_schedule_node_get_ctx(node), data.name, NULL);\n    node = isl_schedule_node_insert_mark(node, id);\n  }\n  /* Delete the \"inserted\" mark */\n  node = isl_schedule_node_map_descendant_bottom_up(node, &delete_inserted_mark, NULL);\n  \n  /* Delete the stop mark */\n  node = isl_schedule_node_parent(node);\n  node = isl_schedule_node_delete(node);\n\n  autosa_node_band_prop_free(prop);\n  isl_multi_union_pw_aff_free(mupa);\n\n  return node;\n}\n\n/* Reorder the schedule dims in the band based on the dependence distance.\n */\n__isl_give isl_schedule_node *reorder_band_by_dep_dis(__isl_take isl_schedule_node *node)\n{\n  int n = isl_schedule_node_band_n_member(node);\n  for (int i = 0; i < n; i++) {\n    for (int j = 0; j < n; j++) {\n      int sched_pos = isl_schedule_node_band_member_get_sched_pos(node, j);\n      if (sched_pos == i) {\n        /* Permute the j-th dim to i-th dim */\n        node = loop_interchange_at_node(node, j, i);\n      }\n    }\n  }\n\n  return node;\n}\n\nstatic __isl_give isl_schedule_node *band_sched_pos_setup(\n  __isl_take isl_schedule_node *node, void *user)\n{\n  if (!node)\n    return NULL;\n\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n  {\n    int n = isl_schedule_node_band_n_member(node);\n    for (int i = 0; i < n; i++) {\n      node = isl_schedule_node_band_member_set_sched_pos(node, i, i);\n    }\n  }\n\n  return node;\n}\n\n/* Set up the sched_pos properties.\n */\n__isl_give isl_schedule_node *sched_pos_setup(__isl_take isl_schedule_node *node)\n{\n    node = isl_schedule_node_map_descendant_bottom_up(node,\n                                                      &band_sched_pos_setup, NULL);\n\n//#ifdef _DEBUG\n//    DBGSCHDNODE(stdout, node, isl_schedule_node_get_ctx(node))    \n//#endif\n    return node;\n}\n\n/* Check if the band is single dimension and the schedule value is a constant.\n * Return the constant value, or -1.\n */\nint get_band_single_schedule_val(__isl_keep 
isl_schedule_node *node)\n{\n  isl_union_map *umap;\n  isl_union_set *domain;\n  isl_set *set;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    return -1;\n  if (isl_schedule_node_band_n_member(node) != 1)\n    return -1;\n  \n  umap = isl_schedule_node_band_get_partial_schedule_union_map(node);\n  domain = isl_schedule_node_get_domain(node);\n  umap = isl_union_map_intersect_domain(umap, domain);\n  domain = isl_union_map_range(umap);\n  set = isl_set_from_union_set(domain);\n  if (isl_set_is_singleton(set)) {\n    isl_val *val;    \n    int ret;\n    val = isl_set_plain_get_val_if_fixed(set, isl_dim_set, 0);    \n    /* The plain check may fail to detect the fixed value; treat NaN as unknown. */\n    if (isl_val_is_nan(val)) {\n      isl_set_free(set);\n      isl_val_free(val);\n      return -1;\n    }\n    ret = isl_val_get_num_si(val);    \n    isl_set_free(set);\n    isl_val_free(val);\n    return ret;\n  } else {\n    isl_set_free(set);\n    return -1;\n  }\n}\n\n/* Compute the prefix schedule of the current node and check if the last\n * schedule dimension contains a single value. If so, return the value.\n */\nint get_last_sched_dim_val(__isl_keep isl_schedule_node *node)\n{\n  isl_union_map *prefix;\n  isl_set *range;\n\n  prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n  range = isl_set_from_union_set(isl_union_map_range(prefix));  \n\n  if (isl_set_dim(range, isl_dim_set) > 1)\n    range = isl_set_project_out(range, isl_dim_set, 0, isl_set_dim(range, isl_dim_set) - 1);  \n\n  range = isl_set_coalesce(range);\n  if (isl_set_is_singleton(range)) {\n    isl_val *val;\n    int ret;\n    val = isl_set_plain_get_val_if_fixed(range, isl_dim_set, 0);\n    if (isl_val_is_nan(val)) {\n      isl_set_free(range);\n      isl_val_free(val);\n      return -1;\n    }    \n    ret = isl_val_get_num_si(val);    \n    isl_set_free(range);\n    isl_val_free(val);\n    return ret;\n  } else {\n    isl_set_free(range);\n    return -1;\n  }\n}\n\n/* Mark all dimensions in the current band node atomic.\n */\nstatic __isl_give isl_schedule_node *atomic(__isl_take isl_schedule_node *node)\n{\n  return 
ppcg_set_schedule_node_type(node, isl_ast_loop_atomic);\n}\n\n/* Mark \"node\" atomic, if it is a band node.\n * Do the same for all ancestors.\n * Return a pointer to \"node\" (in the updated schedule tree).\n */\n__isl_give isl_schedule_node *autosa_atomic_ancestors(\n  __isl_take isl_schedule_node *node)\n{\n  int pos;\n\n  if (!node)\n    return NULL;\n  if (!isl_schedule_node_has_parent(node))\n    return node;\n\n  pos = isl_schedule_node_get_child_position(node);\n  node = isl_schedule_node_parent(node);\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    node = atomic(node);\n  node = autosa_atomic_ancestors(node);\n  node = isl_schedule_node_child(node, pos);\n\n  return node;\n}\n\n/* Check whether the current schedule node is an I/O mark at level \"io_level\".\n * The I/O mark at level \"io_level\" is named \"io_L[io_level]\".\n */\nisl_bool isl_schedule_node_is_io_mark(__isl_keep isl_schedule_node *node, int io_level)\n{\n  isl_id *mark;\n  const char *name;\n  isl_printer *p;\n  char *io_mark;\n  isl_bool is_io_mark;\n\n  if (!node)\n    return isl_bool_error;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_mark)\n    return isl_bool_false;\n\n  mark = isl_schedule_node_mark_get_id(node);\n  if (!mark)\n    return isl_bool_error;\n\n  name = isl_id_get_name(mark);\n  p = isl_printer_to_str(isl_schedule_node_get_ctx(node));\n  p = isl_printer_print_str(p, \"io_L\");\n  p = isl_printer_print_int(p, io_level);\n  io_mark = isl_printer_get_str(p);\n  p = isl_printer_free(p);\n  /* Compare before releasing \"mark\", since \"name\" is owned by it. */\n  is_io_mark = (name && !strcmp(name, io_mark)) ? isl_bool_true : isl_bool_false;\n  free(io_mark);\n  isl_id_free(mark);\n\n  return is_io_mark;\n}\n\n/* Examine if the \"node\" is under the \"simd\" mark.
\n */\nint is_node_under_simd(__isl_keep isl_schedule_node *node)\n{\n  isl_schedule_node *cur_node;\n\n  cur_node = isl_schedule_node_copy(node);\n  while (isl_schedule_node_has_parent(cur_node))\n  {\n    if (isl_schedule_node_get_type(cur_node) == isl_schedule_node_mark)\n    {\n      isl_id *id = isl_schedule_node_mark_get_id(cur_node);\n      if (!strcmp(isl_id_get_name(id), \"simd\"))\n      {\n        isl_id_free(id);\n        isl_schedule_node_free(cur_node);\n        return 1;\n      }\n      isl_id_free(id);\n    }\n    cur_node = isl_schedule_node_parent(cur_node);\n  }\n\n  isl_schedule_node_free(cur_node);\n\n  return 0;\n}\n\n/* Examine if the \"node\" is under the \"latency\" mark. */\nint is_node_under_latency(__isl_keep isl_schedule_node *node)\n{\n  isl_schedule_node *cur_node;\n\n  cur_node = isl_schedule_node_copy(node);\n  while (isl_schedule_node_has_parent(cur_node))\n  {\n    if (isl_schedule_node_get_type(cur_node) == isl_schedule_node_mark)\n    {\n      isl_id *id = isl_schedule_node_mark_get_id(cur_node);\n      if (!strcmp(isl_id_get_name(id), \"latency\"))\n      {\n        isl_id_free(id);\n        isl_schedule_node_free(cur_node);\n        return 1;\n      }\n      isl_id_free(id);\n    }\n    cur_node = isl_schedule_node_parent(cur_node);\n  }\n\n  isl_schedule_node_free(cur_node);\n\n  return 0;\n}\n\n/* Compute a box hull of the time domain of the schedule node, and return the \n * box dimensions in an array.\n */\nint *extract_band_upper_bounds(__isl_keep isl_schedule_node *node)\n{\n  isl_union_map *umap;\n  isl_union_set *uset;\n  isl_map *map;  \n  isl_set *set;\n  int *ubs;\n  int n;\n\n  umap = isl_schedule_node_band_get_partial_schedule_union_map(node);\n  uset = isl_schedule_node_get_domain(node);\n  umap = isl_union_map_intersect_domain(umap, uset);\n  uset = isl_union_map_range(umap);\n  set = isl_set_from_union_set(uset);\n\n  n = isl_schedule_node_band_n_member(node);\n  ubs = (int *)malloc(n * sizeof(int));\n  for 
(int i = 0; i < n; i++) {\n    ubs[i] = compute_set_max(set, i) + 1;\n  }\n  isl_set_free(set);\n\n  return ubs;\n}\n\n/* Return an isl_multi_aff, with as elements the parameters in \"space\"\n * that have the names specified by the elements in \"names\".\n * If (some of) these parameters do not already appear in \"space\",\n * then they are added first.\n */\nstatic __isl_give isl_multi_aff *parameter_vector(__isl_take isl_space *space,\n                                                  __isl_keep isl_id_list *names)\n{\n  int i, n;\n  isl_local_space *ls;\n  isl_multi_aff *ma;\n\n  if (!names)\n    space = isl_space_free(space);\n\n  n = isl_id_list_n_id(names);\n  for (i = 0; i < n; ++i)\n  {\n    int pos;\n    isl_id *id;\n\n    id = isl_id_list_get_id(names, i);\n    pos = isl_space_find_dim_by_id(space, isl_dim_param, id);\n    if (pos >= 0)\n    {\n      isl_id_free(id);\n      continue;\n    }\n    pos = isl_space_dim(space, isl_dim_param);\n    space = isl_space_add_dims(space, isl_dim_param, 1);\n    space = isl_space_set_dim_id(space, isl_dim_param, pos, id);\n  }\n  ma = isl_multi_aff_zero(isl_space_copy(space));\n  ls = isl_local_space_from_space(isl_space_domain(space));\n  for (i = 0; i < n; ++i)\n  {\n    int pos;\n    isl_id *id;\n    isl_aff *aff;\n\n    id = isl_id_list_get_id(names, i);\n    pos = isl_space_find_dim_by_id(space, isl_dim_param, id);\n    isl_id_free(id);\n    aff = isl_aff_var_on_domain(isl_local_space_copy(ls),\n                                isl_dim_param, pos);\n    ma = isl_multi_aff_set_aff(ma, i, aff);\n  }\n  isl_local_space_free(ls);\n\n  return ma;\n}\n\n/* Return constraints on the domain elements that equate a sequence of\n * parameters called \"names\", to the partial schedule of \"node\".\n * The number of members of the band node \"node\" should be smaller\n * than or equal to the number of elements in \"names\". 
\n * If it is smaller, then the first elements of \"names\" are equated to zero.\n */\n__isl_give isl_union_set *set_schedule_eq(\n    __isl_keep isl_schedule_node *node, __isl_keep isl_id_list *names)\n{\n  int n, n_zero;\n  isl_multi_union_pw_aff *mupa, *mupa2;\n  isl_multi_aff *ma;\n  isl_space *space;\n  isl_union_set *domain;\n\n  if (!node)\n    return NULL;\n  n = isl_id_list_n_id(names);\n  if (n == 0)\n    return isl_schedule_node_get_universe_domain(node);\n  n_zero = n - isl_schedule_node_band_n_member(node);\n\n  mupa = isl_schedule_node_band_get_partial_schedule(node);\n  space = isl_multi_union_pw_aff_get_space(mupa);\n  space = isl_space_params(space);\n  space = isl_space_set_from_params(space);\n  space = isl_space_add_dims(space, isl_dim_set, n_zero);\n  ma = isl_multi_aff_zero(space);\n\n  domain = isl_schedule_node_get_universe_domain(node);\n  /* Map the domain elements to \"n_zero\" zeros. */\n  mupa2 = isl_multi_union_pw_aff_multi_aff_on_domain(\n      isl_union_set_copy(domain), ma);\n  /* Build a new mupa that mupa2 -> mupa */\n  mupa = isl_multi_union_pw_aff_range_product(mupa2, mupa);\n  space = isl_multi_union_pw_aff_get_space(mupa);\n  ma = parameter_vector(space, names);\n  mupa2 = isl_multi_union_pw_aff_multi_aff_on_domain(domain, ma);\n  mupa = isl_multi_union_pw_aff_sub(mupa, mupa2);\n\n  return isl_multi_union_pw_aff_zero_union_set(mupa);\n}\n\n__isl_give isl_union_set *set_schedule_neq(\n    __isl_keep isl_schedule_node *node, __isl_keep isl_id_list *names)\n{\n  isl_union_set *uset, *domain;\n  isl_union_map *umap;\n\n  if (!node)\n    return NULL;\n  \n  uset = set_schedule_eq(node, names);\n  umap = isl_schedule_node_band_get_partial_schedule_union_map(node);\n  domain = isl_union_map_domain(umap);\n  uset = isl_union_set_subtract(domain, uset);\n\n  return uset;\n}\n\n/* Construct schedule constraints from the dependences in prog->scop and\n * the array order dependences in prog->array_order.\n *\n * If live range reordering 
is allowed, then we need to make sure\n * that live ranges on arrays are not run in parallel since doing\n * so would require array expansion.  We therefore add the array\n * order dependences to the coincidence dependences.  Non-zero array\n * order dependences will then prevent a schedule dimension from being\n * considered parallel.\n * Live ranges derived from scalars are allowed to be run in parallel\n * since we force the scalars to be mapped to private memory in\n * check_scalar_live_ranges.\n * If live range reordering is allowed, then the false dependences\n * are not added to the validity constraints as that would prevent\n * reordering.  Instead, the external false dependences that enforce that reads\n * from potentially live-in data precede any later write and\n * that writes of potentially live-out data follow any other earlier write\n * are added to the validity and the coincidence constraints.\n * The false dependences are still added to the proximity constraints\n * for consistency with the case where live range reordering is not allowed.\n * The coincidence constraints then consist of flow dependences,\n * external false dependences and array order dependences.\n * The independences can be filtered out from the first two sets.\n * They have already been filtered out from the array order dependences\n * on a per array basis in collect_order_dependences.\n * There is no need for a per array handling of the other two sets\n * as there should be no flow or external false dependence on local\n * variables that can be filtered out.\n */\nstatic __isl_give isl_schedule_constraints *construct_schedule_constraints(\n    struct autosa_prog *prog)\n{\n  isl_union_set *domain;\n  isl_union_map *dep_raw, *dep;\n  isl_union_map *validity, *proximity, *coincidence;\n  isl_schedule_constraints *sc;\n\n  domain = isl_union_set_copy(prog->scop->domain);\n  sc = isl_schedule_constraints_on_domain(domain);\n  sc = isl_schedule_constraints_set_context(sc,\n             
                               isl_set_copy(prog->scop->context));\n  if (prog->scop->options->live_range_reordering)\n  {\n    sc = isl_schedule_constraints_set_conditional_validity(sc,\n                                                           isl_union_map_copy(prog->scop->tagged_dep_flow),\n                                                           isl_union_map_copy(prog->scop->tagged_dep_order));\n    proximity = isl_union_map_copy(prog->scop->dep_flow);\n    validity = isl_union_map_copy(proximity);\n    validity = isl_union_map_union(validity,\n                                   isl_union_map_copy(prog->scop->dep_forced));\n    proximity = isl_union_map_union(proximity,\n                                    isl_union_map_copy(prog->scop->dep_false));\n    coincidence = isl_union_map_copy(validity);\n    coincidence = isl_union_map_subtract(coincidence,\n                                         isl_union_map_copy(prog->scop->independence));\n    coincidence = isl_union_map_union(coincidence,\n                                      isl_union_map_copy(prog->array_order));\n    /* Add the RAR into the validity constraints for AutoSA. 
*/\n    if (prog->scop->options->autosa->autosa)\n    {\n      validity = isl_union_map_union(validity,\n                                     isl_union_map_copy(prog->scop->dep_rar));\n    }\n  }\n  else\n  {\n//#ifdef _DEBUG\n//    std::cout << \"FLOW DEPs\" << std::endl;\n//    DBGUMAP(stdout, prog->scop->dep_flow, isl_union_map_get_ctx(prog->scop->dep_flow));    \n//    std::cout << \"FALSE DEPs\" << std::endl;\n//    DBGUMAP(stdout, prog->scop->dep_false, isl_union_map_get_ctx(prog->scop->dep_false));\n//    std::cout << \"RAR DEPs\" << std::endl;\n//    DBGUMAP(stdout, prog->scop->dep_rar, isl_union_map_get_ctx(prog->scop->dep_rar));\n//#endif\n    dep_raw = isl_union_map_copy(prog->scop->dep_flow);\n    dep = isl_union_map_copy(prog->scop->dep_false);\n    dep = isl_union_map_union(dep, dep_raw);    \n    dep = isl_union_map_coalesce(dep);\n    proximity = isl_union_map_copy(dep);\n    coincidence = isl_union_map_copy(dep);\n    validity = dep;\n    /* Add the RAR into the validity constraints for AutoSA. 
*/\n    if (prog->scop->options->autosa->autosa)\n    {\n      validity = isl_union_map_union(validity,\n                                     isl_union_map_copy(prog->scop->dep_rar));\n    }\n  }\n  sc = isl_schedule_constraints_set_validity(sc, validity);\n  sc = isl_schedule_constraints_set_coincidence(sc, coincidence);\n  sc = isl_schedule_constraints_set_proximity(sc, proximity);\n\n  return sc;\n}\n\n/* Compute an appropriate schedule based on the accesses in\n * gen->read and gen->write.\n *\n * We derive schedule constraints from the dependences in gen->prog->scop\n * and then use isl to compute a schedule that has a parallel loop\n * in each tilable band.\n * During the schedule construction, some statement instances\n * may be grouped first based on the input schedule.\n */\n__isl_give isl_schedule *compute_schedule(struct autosa_gen *gen)\n{\n  isl_schedule_constraints *sc;\n  isl_schedule *schedule;\n\n  sc = construct_schedule_constraints(gen->prog);\n  schedule = gen->prog->scop->schedule;\n  schedule = ppcg_compute_schedule(sc, schedule, gen->options);\n\n  return schedule;\n}\n\n/* If the band node \"node\" has exactly one member then mark it permutable.\n */\nstatic __isl_give isl_schedule_node *band_set_permutable(\n    __isl_take isl_schedule_node *node,\n    __isl_keep isl_schedule_constraints *sc)\n{\n  if (isl_schedule_node_band_n_member(node) == 1)\n    node = isl_schedule_node_band_set_permutable(node, 1);\n\n  return node;\n}\n\n/* Return the coincidence constraints between pairs of instances\n * that are scheduled together by the ancestors of \"node\".\n * That is, select those coincidence constraints that relate\n * pairs of instances that have the same value for the prefix schedule.\n * If the schedule depth is zero, then the prefix schedule does not\n * contain any information, so we intersect domain and range\n * of the schedule constraints with the reaching domain elements instead.\n */\nstatic __isl_give isl_union_map 
*get_local_coincidence(\n    __isl_keep isl_schedule_node *node,\n    __isl_keep isl_schedule_constraints *sc)\n{\n  isl_union_map *coincidence;\n  isl_multi_union_pw_aff *prefix;\n  isl_union_pw_multi_aff *contraction;\n\n  coincidence = isl_schedule_constraints_get_coincidence(sc);\n  contraction = isl_schedule_node_get_subtree_contraction(node);\n  if (isl_schedule_node_get_schedule_depth(node) == 0)\n  {\n    isl_union_set *domain;\n\n    domain = isl_schedule_node_get_domain(node);\n    domain = isl_union_set_preimage_union_pw_multi_aff(domain,\n                                                       contraction);\n    coincidence = isl_union_map_intersect_domain(coincidence,\n                                                 isl_union_set_copy(domain));\n    coincidence = isl_union_map_intersect_range(coincidence,\n                                                domain);\n    return coincidence;\n  }\n\n  prefix = isl_schedule_node_get_prefix_schedule_multi_union_pw_aff(node);\n  prefix = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(prefix,\n                                                              contraction);\n  return isl_union_map_eq_at_multi_union_pw_aff(coincidence, prefix);\n}\n\n/* For each member in the band node \"node\", determine whether\n * it is coincident with respect to the outer nodes and mark\n * it accordingly.\n *\n * That is, for each coincidence constraint between pairs\n * of instances that are scheduled together by the outer nodes,\n * check that domain and range are assigned the same value\n * by the band member.  
This test is performed by checking\n * that imposing the same value for the band member does not\n * remove any elements from the set of coincidence constraints.\n */\nstatic __isl_give isl_schedule_node *band_set_coincident(\n    __isl_take isl_schedule_node *node,\n    __isl_keep isl_schedule_constraints *sc)\n{\n  isl_union_map *coincidence;\n  isl_union_pw_multi_aff *contraction;\n  isl_multi_union_pw_aff *partial;\n  int i, n;\n\n  coincidence = get_local_coincidence(node, sc);\n\n  partial = isl_schedule_node_band_get_partial_schedule(node);\n  contraction = isl_schedule_node_get_subtree_contraction(node);\n  partial = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(partial,\n                                                               contraction);\n  n = isl_schedule_node_band_n_member(node);\n  for (i = 0; i < n; ++i)\n  {\n    isl_union_map *coincidence_i;\n    isl_union_pw_aff *upa;\n    isl_multi_union_pw_aff *partial_i;\n    int subset;\n\n    upa = isl_multi_union_pw_aff_get_union_pw_aff(partial, i);\n    partial_i = isl_multi_union_pw_aff_from_union_pw_aff(upa);\n    coincidence_i = isl_union_map_copy(coincidence);\n    coincidence_i = isl_union_map_eq_at_multi_union_pw_aff(\n        coincidence_i, partial_i);\n    subset = isl_union_map_is_subset(coincidence, coincidence_i);\n    isl_union_map_free(coincidence_i);\n\n    if (subset < 0)\n      break;\n    node = isl_schedule_node_band_member_set_coincident(node, i,\n                                                        subset);\n  }\n  if (i < n)\n    node = isl_schedule_node_free(node);\n  isl_multi_union_pw_aff_free(partial);\n  isl_union_map_free(coincidence);\n\n  return node;\n}\n\n/* If \"node\" is a band, then set its properties.\n *\n * In particular, if the band has exactly one member, then mark it permutable.\n * Mark the band members coincident based on the coincidence constraints\n * of \"sc\".\n */\nstatic __isl_give isl_schedule_node *set_band_properties(\n    __isl_take 
isl_schedule_node *node, void *user)\n{\n  isl_schedule_constraints *sc = (isl_schedule_constraints *)user;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    return node;\n  if (isl_schedule_node_band_n_member(node) == 0)\n    return node;\n\n  node = band_set_permutable(node, sc);\n  node = band_set_coincident(node, sc);\n\n  return node;\n}\n\n/* Return the original schedule with all bands marked permutable and\n * all band members marked coincident based on the coincidence constraints.\n * The bands are explicitly marked permutable so that they will be considered\n * by mark_outer_permutable.\n */\nstatic __isl_give isl_schedule *determine_properties_original_schedule(\n    struct autosa_gen *gen)\n{\n  isl_schedule *schedule;\n  isl_schedule_constraints *sc;\n\n  schedule = isl_schedule_copy(gen->prog->scop->schedule);\n  sc = construct_schedule_constraints(gen->prog);\n  schedule = isl_schedule_map_schedule_node_bottom_up(schedule,\n                                                      &set_band_properties, sc);\n  isl_schedule_constraints_free(sc);\n\n  return schedule;\n}\n\n/* Compute a schedule or determine the properties of the original schedule\n * depending on the value of the \"reschedule\" option.\n */\nstatic __isl_give isl_schedule *compute_or_set_properties(void *user)\n{\n  struct autosa_gen *gen = (struct autosa_gen *)user;\n\n  if (gen->options->reschedule)\n    return compute_schedule(gen);\n  else\n    return determine_properties_original_schedule(gen);\n}\n\n/* Obtain a schedule for the scop, by reading it from\n * a file, by computing one or by determining the properties\n * of the original schedule. 
\n */\n__isl_give isl_schedule *get_schedule(struct autosa_gen *gen)\n{\n  return ppcg_get_schedule(gen->ctx, gen->options,\n                           &compute_or_set_properties, gen);\n}\n\n/* Since we are merging the outermost band nodes,\n * check whether, for each validity constraint, every member of the band\n * \"node\" schedules the domain (source) at the same value as or before the\n * range (sink), i.e., whether all dependence distances are non-negative.\n * Note that this function only considers the band \"node\" itself; the prefix\n * schedule of any outer node is not taken into account.\n */\nstatic isl_bool is_dep_non_neg_at_node(\n  __isl_keep isl_schedule_node *node, __isl_keep isl_schedule_constraints *sc)\n{\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    return isl_bool_false;\n  if (isl_schedule_node_band_n_member(node) == 0)\n    return isl_bool_false;\n\n  isl_union_map *validity;\n  isl_union_pw_multi_aff *contraction;\n  isl_multi_union_pw_aff *partial;\n  isl_union_set *domain;\n  int i, n;\n\n  validity = isl_schedule_constraints_get_validity(sc);\n  contraction = isl_schedule_node_get_subtree_contraction(node);\n  domain = isl_schedule_node_get_domain(node);\n  domain = isl_union_set_preimage_union_pw_multi_aff(domain, contraction);\n  validity = isl_union_map_intersect_domain(validity, isl_union_set_copy(domain));\n  validity = isl_union_map_intersect_range(validity, domain);\n  //DBGUMAP(stdout, validity, isl_schedule_node_get_ctx(node));\n\n  partial = isl_schedule_node_band_get_partial_schedule(node);\n  contraction = isl_schedule_node_get_subtree_contraction(node);\n  partial = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(partial,\n                                                               contraction);\n  n = isl_schedule_node_band_n_member(node);\n  for (i = 0; i < n; i++)\n  {\n    isl_union_map *validity_i, *validity_i_eq, *validity_i_lt;\n    isl_union_pw_aff *upa;\n    isl_multi_union_pw_aff *partial_i;\n    int subset;\n\n    upa = isl_multi_union_pw_aff_get_union_pw_aff(partial, i);\n    partial_i = isl_multi_union_pw_aff_from_union_pw_aff(upa);    \n    validity_i_eq 
= isl_union_map_eq_at_multi_union_pw_aff(\n      isl_union_map_copy(validity), isl_multi_union_pw_aff_copy(partial_i));\n    validity_i_lt = isl_union_map_lex_lt_at_multi_union_pw_aff(\n      isl_union_map_copy(validity), partial_i);\n    validity_i = isl_union_map_union(validity_i_eq, validity_i_lt);\n    subset = isl_union_map_is_subset(validity, validity_i);\n    isl_union_map_free(validity_i);\n\n    if (subset <= 0)\n      break;    \n  }\n\n  isl_multi_union_pw_aff_free(partial);\n  isl_union_map_free(validity);\n\n  return (i == n) ? isl_bool_true : isl_bool_false;\n}\n\n/* Try to merge the outer bands of the schedule as much as possible, as\n * long as they can form a permutable band.\n * Start from the outermost band: if all dependence distances on the current\n * band are non-negative, merge it with its parent band node (the first such\n * band is kept as the starting point).\n * The process stops when a non-band node is encountered.\n */\n__isl_give isl_schedule *merge_outer_bands(__isl_take isl_schedule *schedule, struct autosa_gen *gen)\n{\n  isl_schedule_node *node;\n  isl_schedule_constraints *sc;\n  isl_bool is_first_band = isl_bool_true;\n\n  node = isl_schedule_get_root(schedule); // points to the domain node\n  isl_schedule_free(schedule);\n  sc = construct_schedule_constraints(gen->prog);\n\n  node = isl_schedule_node_child(node, 0); // points to the first band node\n  while (isl_schedule_node_get_type(node) == isl_schedule_node_band) {\n    /* Examine whether all dependence distances at this band are non-negative. */\n    isl_bool nneg = is_dep_non_neg_at_node(node, sc);\n    //std::cout << nneg << std::endl;\n    if (nneg) {\n      if (is_first_band)\n        is_first_band = isl_bool_false;\n      else {\n        /* Merge the node with the parent band node. */\n        node = isl_schedule_node_parent(node);\n        node = autosa_node_merge(node); // TODO: delete the partial schedule space name\n      }\n    }\n    node = isl_schedule_node_child(node, 0);\n  }\n\n  /* Set the coincidence. 
*/\n  node = isl_schedule_node_parent(node);\n  if (isl_schedule_node_get_type(node) == isl_schedule_node_band) {\n    node = band_set_coincident(node, sc);\n  }\n\n  schedule = isl_schedule_node_get_schedule(node);\n  isl_schedule_node_free(node);\n  isl_schedule_constraints_free(sc);\n\n  return schedule;\n}\n\n/* Is \"node\" a mark node with an identifier called \"array\"?\n */\nstatic int node_is_array(__isl_keep isl_schedule_node *node)\n{\n  return is_marked(node, \"array\");\n}\n\n/* Is \"node\" a mark node with an identifier called \"anchor\"?\n */\nstatic int node_is_anchor(__isl_keep isl_schedule_node *node)\n{\n  return is_marked(node, \"anchor\");\n}\n\n/* Is \"node\" a mark node with an identifier called \"local\"?\n */\nstatic int node_is_local(__isl_keep isl_schedule_node *node)\n{\n  return is_marked(node, \"local\");\n}\n\n/* Is \"node\" a mark node with an identifier called \"pe\"?\n */\nstatic int node_is_pe(__isl_keep isl_schedule_node *node)\n{\n  return is_marked(node, \"pe\");\n}\n\n/* Is \"node\" a mark node with an identifier called \"kernel\"?\n */\nstatic int node_is_kernel(__isl_keep isl_schedule_node *node)\n{\n  return is_marked(node, \"kernel\");\n}\n\n/* Is \"node\" a mark node with an identifier called \"mark\"?\n */\nstatic int node_is_mark(__isl_keep isl_schedule_node *node, const char *mark)\n{\n  return is_marked(node, mark);\n}\n\n/* Is \"node\" a mark node with an identifier called \"io_L[x]\"?\n */\nstatic int node_is_io_mark(__isl_keep isl_schedule_node *node)\n{\n  isl_id *mark;\n  const char *name;\n  int has_name;\n\n  if (!node)\n    return -1;\n\n  if (isl_schedule_node_get_type(node) != isl_schedule_node_mark)\n    return 0;\n\n  mark = isl_schedule_node_mark_get_id(node);\n  if (!mark)\n    return -1;\n\n  name = isl_id_get_name(mark);\n  /* strncmp() returns 0 when the prefix matches. */\n  has_name = name && !strncmp(name, \"io_L\", strlen(\"io_L\"));\n\n  isl_id_free(mark);\n\n  return has_name;\n}\n\n/* Assuming \"node\" is a filter node, does it correspond to the branch\n * 
that contains the \"array\" mark, i.e., does it contain any elements in\n * \"core\"?\n */\nstatic int node_is_core(__isl_keep isl_schedule_node *node,\n                        __isl_keep isl_union_set *core)\n{\n  int disjoint;\n  isl_union_set *filter;\n\n  filter = isl_schedule_node_filter_get_filter(node);\n  disjoint = isl_union_set_is_disjoint(filter, core);\n  isl_union_set_free(filter);\n  if (disjoint < 0)\n    return -1;\n\n  return !disjoint;\n}\n\n/* Move to the only child of \"node\" that lies on the branch containing\n * the domain elements in \"core\".\n *\n * If \"node\" is not a sequence, then it only has one child and we move\n * to that single child.\n * Otherwise, we check each of the filters in the children, pick\n * the one that corresponds to \"core\" and return a pointer to the child\n * of the filter node.\n */\nstatic __isl_give isl_schedule_node *core_child(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core)\n{\n  int i, n;\n  \n  if (isl_schedule_node_get_type(node) != isl_schedule_node_sequence)\n    return isl_schedule_node_child(node, 0);\n  \n  n = isl_schedule_node_n_children(node);\n  for (i = 0; i < n; ++i)\n  {\n    int is_core;\n\n    node = isl_schedule_node_child(node, i);\n    is_core = node_is_core(node, core);\n\n    if (is_core < 0)\n      return isl_schedule_node_free(node);\n    if (is_core)\n      return isl_schedule_node_child(node, 0);\n\n    node = isl_schedule_node_parent(node);\n  }  \n\n  isl_die(isl_schedule_node_get_ctx(node), isl_error_internal,\n          \"core child not found\", return isl_schedule_node_free(node));\n}\n\n/* Move down from the \"kernel\" mark (or at least a node with schedule\n * depth smaller than or equal to \"depth\") to a band node at schedule\n * depth \"depth\".  The \"array\" mark is assumed to have a schedule\n * depth greater than or equal to \"depth\".  
The branch containing the\n * \"array\" mark is identified by the domain elements in \"core\".\n *\n * If the desired schedule depth is in the middle of band node,\n * then the band node is split into two pieces, the second piece\n * at the desired schedule depth.\n */\n__isl_give isl_schedule_node *autosa_tree_move_down_to_depth(\n    __isl_take isl_schedule_node *node, int depth,\n    __isl_keep isl_union_set *core)\n{\n  int is_local;\n  int is_array = 0;\n\n  while (node && isl_schedule_node_get_schedule_depth(node) < depth)\n  {\n    if (isl_schedule_node_get_type(node) ==\n        isl_schedule_node_band)\n    {\n      int node_depth, node_dim;\n      node_depth = isl_schedule_node_get_schedule_depth(node);\n      node_dim = isl_schedule_node_band_n_member(node);\n      if (node_depth + node_dim > depth)\n        node = isl_schedule_node_band_split(node,\n                                            depth - node_depth);\n    }\n    node = core_child(node, core);\n  }\n  while ((is_local = node_is_local(node)) == 0 &&\n         (is_array = node_is_array(node)) == 0 &&\n         isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    node = core_child(node, core);\n  if (is_local < 0 || is_array < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Move down the branch until the \"array\" mark is reached,\n * where the branch containing the \"array\" mark is \n * identified by the domain elements in \"core\".\n */\n__isl_give isl_schedule_node *autosa_tree_move_down_to_array(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core)\n{\n  int is_array;\n\n  while ((is_array = node_is_array(node)) == 0)\n    node = core_child(node, core);\n\n  if (is_array < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Move up the tree underneath the \"array\" mark until the \"array\" mark is reached. 
\n */\n__isl_give isl_schedule_node *autosa_tree_move_up_to_array(\n    __isl_take isl_schedule_node *node)\n{\n  int is_array;\n\n  while ((is_array = node_is_array(node)) == 0)\n    node = isl_schedule_node_parent(node);\n\n  if (is_array < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Move down the branch between \"kernel\" and \"local\" until\n * the \"local\" mark is reached, where the branch containing the \"local\"\n * mark is identified by the domain elements in \"core\".\n */\n__isl_give isl_schedule_node *autosa_tree_move_down_to_local(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core)\n{\n  int is_local;\n\n  while ((is_local = node_is_local(node)) == 0)\n    node = core_child(node, core);\n\n  if (is_local < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Move down the branch until the \"kernel\" mark is reached. \n * In AutoSA, only one single kernel is identified, and it lies on the \n * linear branch below the domain node. 
Therefore, we can safely\n * traverse down the branch until the \"kernel\" mark is found.\n */\n__isl_give isl_schedule_node *autosa_tree_move_down_to_kernel(\n    __isl_take isl_schedule_node *node)\n{\n  int is_kernel;\n\n  while ((is_kernel = node_is_kernel(node)) == 0)\n    node = isl_schedule_node_child(node, 0);\n\n  if (is_kernel < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Move up the tree underneath the \"kernel\" mark until\n * the \"kernel\" mark is reached.\n */\n__isl_give isl_schedule_node *autosa_tree_move_up_to_kernel(\n    __isl_take isl_schedule_node *node)\n{\n  int is_kernel;\n\n  while ((is_kernel = autosa_tree_node_is_kernel(node)) == 0)\n  {\n    node = isl_schedule_node_parent(node);\n  }\n  if (is_kernel < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Move down the branch between \"kernel\" and \"pe\" until\n * the \"pe\" mark is reached, where the branch containing the \"pe\"\n * mark is identified by the domain elements in \"core\".\n */\n__isl_give isl_schedule_node *autosa_tree_move_down_to_pe(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core)\n{\n  int is_pe;\n\n  while ((is_pe = node_is_pe(node)) == 0)\n    node = core_child(node, core);\n\n  if (is_pe < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Move up the tree underneath the \"pe\" mark until the \"pe\" mark is reached. 
\n */\n__isl_give isl_schedule_node *autosa_tree_move_up_to_pe(\n    __isl_take isl_schedule_node *node)\n{\n  int is_pe;\n\n  while ((is_pe = node_is_pe(node)) == 0)\n    node = isl_schedule_node_parent(node);\n\n  if (is_pe < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Move down the branch between \"kernel\" and \"mark\" until\n * the \"mark\" mark is reached, where the branch containing the \"mark\"\n * mark is identified by the domain elements in \"core\".\n */\n__isl_give isl_schedule_node *autosa_tree_move_down_to_mark(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core, const char *mark)\n{\n  int is_mark;\n\n  while ((is_mark = node_is_mark(node, mark)) == 0)\n    node = core_child(node, core);\n\n  if (is_mark < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Move up the tree underneath the \"mark\" mark until the \"mark\" mark is reached. \n */\n__isl_give isl_schedule_node *autosa_tree_move_up_to_mark(\n    __isl_take isl_schedule_node *node, const char *mark)\n{\n  int is_mark;\n\n  while ((is_mark = node_is_mark(node, mark)) == 0)\n    node = isl_schedule_node_parent(node);\n\n  if (is_mark < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Move down the branch between \"kernel\" and \"pe\" until\n * the first \"io_L[x]\" mark is reached, where the branch containing the \"io_L[x]\"\n * mark is identified by the domain elements in \"core\".\n */\n__isl_give isl_schedule_node *autosa_tree_move_down_to_first_io_mark(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core)\n{\n  int is_io_mark;\n\n  while ((is_io_mark = node_is_io_mark(node)) == 0)\n    node = core_child(node, core);\n\n  if (is_io_mark < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Move down the branch between \"kernel\" and \"pe\" until\n * the \"io_L[io_level]\" mark is reached, where the branch containing the io\n * mark is identified by the domain 
elements in \"core\".\n */\n__isl_give isl_schedule_node *autosa_tree_move_down_to_io_mark(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core, int io_level)\n{\n  int is_mark;\n  isl_printer *p;\n  char *mark;  \n\n  p = isl_printer_to_str(isl_schedule_node_get_ctx(node));\n  p = isl_printer_print_str(p, \"io_L\");\n  p = isl_printer_print_int(p, io_level);\n  mark = isl_printer_get_str(p);\n  p = isl_printer_free(p);\n\n\n  while ((is_mark = node_is_mark(node, mark)) == 0) {\n    if (!isl_schedule_node_has_children(node))\n      break;\n    node = core_child(node, core);\n  }\n\n  if (is_mark <= 0)\n    node = isl_schedule_node_free(node);  \n  free(mark);\n\n  return node;\n}\n\n/* Move up the tree underneath the \"anchor\" mark until the \"anchor\" mark is reached. \n */\n__isl_give isl_schedule_node *autosa_tree_move_up_to_anchor(\n    __isl_take isl_schedule_node *node)\n{\n  int is_anchor;\n\n  while ((is_anchor = node_is_anchor(node)) == 0)\n    node = isl_schedule_node_parent(node);\n\n  if (is_anchor < 0)\n    node = isl_schedule_node_free(node);\n\n  return node;\n}\n\n/* Is \"node\" a mark node with an identifier called \"kernel\"?\n */\nint autosa_tree_node_is_kernel(__isl_keep isl_schedule_node *node)\n{\n  return is_marked(node, \"kernel\");\n}\n\n/* Is \"node\" a mark node with an identifier called \"mark\"?\n */\nint autosa_tree_node_is_mark(__isl_keep isl_schedule_node *node, const char *mark)\n{\n  if (mark == NULL)\n    return (isl_schedule_node_get_type(node) == isl_schedule_node_mark);\n\n  return is_marked(node, mark);\n}\n\n/* Insert a mark node with identifier \"local\" in front of \"node\".\n */\nstatic __isl_give isl_schedule_node *insert_local(\n    __isl_take isl_schedule_node *node)\n{\n  isl_ctx *ctx;\n  isl_id *id;\n\n  ctx = isl_schedule_node_get_ctx(node);\n  id = isl_id_alloc(ctx, \"local\", NULL);\n  node = isl_schedule_node_insert_mark(node, id);\n\n  return node;\n}\n\n/* Insert a \"local\" mark in front of 
the \"array\" mark\n * provided the linear branch between \"node\" and the \"array\" mark\n * does not contain such a \"local\" mark already.\n *\n * As a side effect, this function checks that the subtree at \"node\"\n * actually contains an \"array\" mark and that there is no branching\n * between \"node\" and this \"array\" mark.\n * The new node at the original position of \"node\" is returned.\n */\n__isl_give isl_schedule_node *autosa_tree_insert_local_before_array(\n    __isl_take isl_schedule_node *node)\n{\n  int depth0, depth;\n  int any_local = 0;\n\n  if (!node)\n    return NULL;\n\n  depth0 = isl_schedule_node_get_tree_depth(node);\n\n  for (;;)\n  {\n    int is_array;\n    int n;\n\n    if (!any_local)\n    {\n      any_local = node_is_local(node);\n      if (any_local < 0)\n        return isl_schedule_node_free(node);\n    }\n    is_array = node_is_array(node);\n    if (is_array < 0)\n      return isl_schedule_node_free(node);\n    if (is_array)\n      break;\n    n = isl_schedule_node_n_children(node);\n    if (n == 0)\n      isl_die(isl_schedule_node_get_ctx(node),\n              isl_error_invalid,\n              \"no array marker found\",\n              return isl_schedule_node_free(node));\n    if (n > 1)\n      isl_die(isl_schedule_node_get_ctx(node),\n              isl_error_invalid,\n              \"expecting single array marker\",\n              return isl_schedule_node_free(node));\n\n    node = isl_schedule_node_child(node, 0);\n  }\n\n  if (!any_local)\n    node = insert_local(node);\n  depth = isl_schedule_node_get_tree_depth(node);\n  node = isl_schedule_node_ancestor(node, depth - depth0);\n\n  return node;\n}\n"
  },
  {
    "path": "src/autosa_schedule_tree.h",
    "content": "#ifndef _AUTOSA_SCHEDULE_TREE_H\n#define _AUTOSA_SCHEDULE_TREE_H\n\n#include <isl/schedule_node.h>\n\nint autosa_tree_node_is_kernel(__isl_keep isl_schedule_node *node);\nint autosa_tree_node_is_mark(__isl_keep isl_schedule_node *node, const char *mark);\nisl_bool isl_schedule_node_is_io_mark(__isl_keep isl_schedule_node *node, int io_level);\n\n__isl_give isl_schedule_node *autosa_tree_move_down_to_depth(\n    __isl_take isl_schedule_node *node, int depth,\n    __isl_keep isl_union_set *core);\n__isl_give isl_schedule_node *autosa_tree_move_down_to_array(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core);\n__isl_give isl_schedule_node *autosa_tree_move_up_to_array(\n    __isl_take isl_schedule_node *node);\n__isl_give isl_schedule_node *autosa_tree_move_down_to_local(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core);\n__isl_give isl_schedule_node *autosa_tree_move_down_to_kernel(\n    __isl_take isl_schedule_node *node);\n__isl_give isl_schedule_node *autosa_tree_move_up_to_kernel(\n    __isl_take isl_schedule_node *node);\n__isl_give isl_schedule_node *autosa_tree_move_down_to_pe(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core);\n__isl_give isl_schedule_node *autosa_tree_move_up_to_pe(\n    __isl_take isl_schedule_node *node);\n__isl_give isl_schedule_node *autosa_tree_move_down_to_mark(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core, const char *mark);\n__isl_give isl_schedule_node *autosa_tree_move_up_to_mark(\n    __isl_take isl_schedule_node *node, const char *mark);\n__isl_give isl_schedule_node *autosa_tree_move_down_to_first_io_mark(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core);\n__isl_give isl_schedule_node *autosa_tree_move_down_to_io_mark(\n    __isl_take isl_schedule_node *node, __isl_keep isl_union_set *core, int io_level);\n__isl_give isl_schedule_node *autosa_tree_move_up_to_anchor(\n    __isl_take 
isl_schedule_node *node);\n\n__isl_give isl_schedule_node *autosa_tree_insert_local_before_array(\n    __isl_take isl_schedule_node *node);\n\n#endif\n"
  },
  {
    "path": "src/autosa_t2s.cpp",
    "content": ""
  },
  {
    "path": "src/autosa_tapa_cpp.cpp",
    "content": "#include <isl/ctx.h>\n\n#include \"autosa_tapa_cpp.h\"\n#include \"autosa_common.h\"\n#include \"autosa_comm.h\"\n#include \"autosa_print.h\"\n#include \"autosa_trans.h\"\n#include \"autosa_codegen.h\"\n#include \"autosa_utils.h\"\n\n#include <set>\n\nstruct print_host_user_data\n{\n  struct hls_info *hls;\n  struct autosa_prog *prog;\n  struct autosa_hw_top_module *top;\n};\n\nstruct print_hw_module_data\n{\n  struct hls_info *hls;\n  struct autosa_prog *prog;\n  struct autosa_hw_module *module;\n  /* Used for double buffer codegen. Modify the printed iterator prefix. */\n  const char *iterator_prefix;\n};\n\n/* Print the includes for TAPA host.\n */\nstatic void print_tapa_host_header(FILE *fp)\n{\n  fprintf(fp, \"#include <tapa.h>\\n\");\n  fprintf(fp, \"using tapa::aligned_allocator;\\n\");\n}\n\n/* Open the host .cpp and .h files, the kernel .h and .cpp files, and the\n * top-module generator .cpp and .h files for writing.\n * Add the necessary includes.\n */\nstatic void hls_open_files(struct hls_info *info, const char *input)\n{\n  char name[PATH_MAX];\n  char dir[PATH_MAX];\n  int len, len_dir;\n  isl_printer *p_str;\n  char *file_path;\n\n  p_str = isl_printer_to_str(info->ctx);\n  p_str = isl_printer_print_str(p_str, info->output_dir);\n  p_str = isl_printer_print_str(p_str, \"/src/\");\n  file_path = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  len = ppcg_extract_base_name(name, input);\n  /* Add the prefix */\n  sprintf(dir, \"%s\", file_path);\n  len_dir = strlen(file_path);\n\n  strcpy(name + len, \"_host.cpp\");\n  strcpy(dir + len_dir, name);\n  info->host_c = fopen(dir, \"w\");\n  if (!info->host_c)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  strcpy(name + len, \"_host.h\");\n  strcpy(dir + len_dir, name);\n  info->host_h = fopen(dir, \"w\");\n  if (!info->host_h)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  fprintf(info->host_h, \"template <typename T1, typename T2> \"\n          \"inline T1 min(T1 x, T2 y) { return (x < T1(y)) ? x : T1(y); }\\n\");\n  fprintf(info->host_h, \"template <typename T1, typename T2> \"\n          \"inline T1 max(T1 x, T2 y) { return (x > T1(y)) ? x : T1(y); }\\n\");\n  fprintf(info->host_h, \"\\n\");\n  print_tapa_host_header(info->host_h);\n  fprintf(info->host_c, \"#include \\\"%s\\\"\\n\", name);\n\n  strcpy(name + len, \"_kernel_modules.cpp\");\n  strcpy(dir + len_dir, name);\n  info->kernel_c = fopen(dir, \"w\");\n  if (!info->kernel_c)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  strcpy(name + len, \"_kernel.h\");\n  strcpy(dir + len_dir, name);\n  info->kernel_h = fopen(dir, \"w\");\n  if (!info->kernel_h)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  fprintf(info->host_c, \"#include <assert.h>\\n\");\n  fprintf(info->host_c, \"#include <stdio.h>\\n\");\n  fprintf(info->host_c, \"#include \\\"%s\\\"\\n\\n\", name);\n  fprintf(info->kernel_c, \"#include \\\"%s\\\"\\n\", name);\n\n  strcpy(name + len, \"_top_gen.cpp\");\n  strcpy(dir + len_dir, name);\n  info->top_gen_c = fopen(dir, \"w\");\n  if (!info->top_gen_c)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  strcpy(name + len, \"_top_gen.h\");\n  strcpy(dir + len_dir, name);\n  info->top_gen_h = fopen(dir, \"w\");\n  if (!info->top_gen_h)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  fprintf(info->top_gen_c, \"#include <isl/printer.h>\\n\");\n  fprintf(info->top_gen_c, \"#include \\\"%s\\\"\\n\", name);\n\n  fprintf(info->kernel_h, \"#include <tapa.h>\\n\");\n  fprintf(info->kernel_h, \"#include <ap_int.h>\\n\");\n  fprintf(info->kernel_h, \"\\n\");\n\n  fprintf(info->kernel_c, \"template <typename T1, typename T2> \"\n          \"inline T1 min(T1 x, T2 y) { return (x < T1(y)) ? x : T1(y); }\\n\");\n  fprintf(info->kernel_c, \"template <typename T1, typename T2> \"\n          \"inline T1 max(T1 x, T2 y) { return (x > T1(y)) ? 
x : T1(y); }\\n\");\n  fprintf(info->kernel_c, \"\\n\");\n\n  free(file_path);\n}\n\n/* Close all output files.\n */\nstatic void hls_close_files(struct hls_info *info)\n{\n  isl_printer *p_str;\n  char *complete;\n  FILE *f;\n\n  fclose(info->kernel_c);\n  fclose(info->kernel_h);\n  fclose(info->host_c);\n  fclose(info->host_h);\n  fclose(info->top_gen_c);\n  fclose(info->top_gen_h);\n\n  p_str = isl_printer_to_str(info->ctx);\n  p_str = isl_printer_print_str(p_str, info->output_dir);\n  p_str = isl_printer_print_str(p_str, \"/src/completed\");\n  complete = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  f = fopen(complete, \"w\");\n  fclose(f);\n  free(complete);\n}\n\n/* Extract the data pack factors for each I/O buffer allocated for the current\n * I/O group.\n * Only insert the data pack factor that is not found in the current list\n * \"data_pack_factors\".\n * The list is in ascending order.\n */\nstatic int *extract_data_pack_factors(int *data_pack_factors,\n                                      int *n_factor, struct autosa_array_ref_group *group)\n{\n  /* Test if the group default packing factor needs to be inserted */\n  if (group->n_lane > 1)\n  {\n    int n_lane = group->n_lane;\n    bool insert = true;\n    int pos = 0;\n    for (pos = 0; pos < *n_factor; pos++)\n    {\n      if (n_lane > data_pack_factors[pos])\n      {\n        if (pos < *n_factor - 1)\n        {\n          if (n_lane < data_pack_factors[pos + 1])\n          {\n            // insert @pos+1\n            pos++;\n            break;\n          }\n        }\n      }\n      else if (n_lane == data_pack_factors[pos])\n      {\n        insert = false;\n        break;\n      }\n    }\n\n    if (insert) {\n      *n_factor = *n_factor + 1;\n      data_pack_factors = (int *)realloc(data_pack_factors,\n                                         sizeof(int) * (*n_factor));\n      for (int j = *n_factor - 1; j > pos; j--)\n      {\n        data_pack_factors[j] = data_pack_factors[j - 
1];\n      }\n      data_pack_factors[pos] = n_lane;\n    }\n  }\n\n  for (int i = 0; i < group->n_io_buffer; i++)\n  {\n    struct autosa_io_buffer *buf = group->io_buffers[i];\n    bool insert = true;\n    int pos = 0;\n    for (pos = 0; pos < *n_factor; pos++)\n    {\n      if (buf->n_lane > data_pack_factors[pos])\n      {\n        if (pos < *n_factor - 1)\n        {\n          if (buf->n_lane < data_pack_factors[pos + 1])\n          {\n            // insert @pos+1\n            pos++;\n            break;\n          }\n        }\n      }\n      else if (buf->n_lane == data_pack_factors[pos])\n      {\n        insert = false;\n        break;\n      }\n    }\n\n    if (!insert)\n      continue;\n\n    *n_factor = *n_factor + 1;\n    data_pack_factors = (int *)realloc(data_pack_factors,\n                                       sizeof(int) * (*n_factor));\n    for (int j = *n_factor - 1; j > pos; j--)\n    {\n      data_pack_factors[j] = data_pack_factors[j - 1];\n    }\n    data_pack_factors[pos] = buf->n_lane;\n  }\n\n  return data_pack_factors;\n}\n\n/* Examine the local buffers of each array group.\n * Extract the data pack factors and build the data types\n * required by the program.\n */\nstatic isl_stat print_data_types_tapa(\n    struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  isl_printer *p;\n  struct autosa_kernel *kernel;\n\n  kernel = top->kernel;\n  p = isl_printer_to_file(kernel->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_str_new_line(p, \"/* Data Type */\");\n\n  /* Print the primitive data type. 
*/\n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"typedef \");\n    p = isl_printer_print_str(p, local->array->type);\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, local->array->name);\n    p = isl_printer_print_str(p, \"_t1;\");\n    p = isl_printer_end_line(p);\n  }\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    int *data_pack_factors = (int *)malloc(sizeof(int));\n    int n_factor = 1;\n    /* First insert the default data pack factor for the array. */\n    data_pack_factors[0] = local->n_lane;\n\n    /* IO group */\n    for (int n = 0; n < local->n_io_group; n++)\n    {\n      struct autosa_array_ref_group *group = local->io_groups[n];\n      data_pack_factors = extract_data_pack_factors(data_pack_factors, &n_factor, group);\n    }\n    /* Drain group */\n    if (local->drain_group)\n      data_pack_factors = extract_data_pack_factors(data_pack_factors, &n_factor, local->drain_group);\n\n    if (local->is_sparse) {\n      std::set<int> tmp_lanes;\n      for (int n = 0; n < n_factor; n++) {\n        tmp_lanes.insert(data_pack_factors[n] * kernel->n_nzero);\n        tmp_lanes.insert(data_pack_factors[n]);\n      }\n      for (auto it = tmp_lanes.begin(); it != tmp_lanes.end(); ++it) {\n        int f = *it;\n        if (local->array->size * 8 * f > 1024) {\n          printf(\"[AutoSA] Warning: The data width %d is greater than 1024-bit. 
The type definition is not generated.\\n\", local->array->size * 8 * f);\n          continue;\n        }\n        if (f > 1) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"typedef vec_t<\");\n          p = isl_printer_print_str(p, local->array->type);\n          p = isl_printer_print_str(p, \", \");\n          p = isl_printer_print_int(p, f);\n          p = isl_printer_print_str(p, \"> \");\n          p = isl_printer_print_str(p, local->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, f);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n        }\n      }\n\n      for (int n = 0; n < n_factor; n++) {\n        if (data_pack_factors[n] * kernel->n_nzero * local->array->size * 8 > 1024)\n          continue;\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"typedef struct \");\n        p = isl_printer_print_str(p, local->array->name);\n        p = isl_printer_print_str(p, \"_s_t\");\n        p = isl_printer_print_int(p, data_pack_factors[n]);\n        p = isl_printer_print_str(p, \" {\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, 2);\n\n        p = isl_printer_start_line(p);\n        if (data_pack_factors[n] == 1 && kernel->n_nzero == 1) {\n          p = isl_printer_print_str(p, local->array->type);\n        } else {\n          p = isl_printer_print_str(p, local->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, data_pack_factors[n] * kernel->n_nzero);\n        }\n        p = isl_printer_print_str(p, \" d;\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        if (data_pack_factors[n] == 1 && kernel->n_nzero == 1) {\n          p = isl_printer_print_str(p, \"unsigned char\");\n        } else {\n          p = isl_printer_print_str(p, \"tapa::vec_t<\");\n          p = 
isl_printer_print_int(p, 8 * data_pack_factors[n]);\n          p = isl_printer_print_str(p, \">\");\n        }\n        p = isl_printer_print_str(p, \" i;\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, -2);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"} \");\n        p = isl_printer_print_str(p, local->array->name);\n        p = isl_printer_print_str(p, \"_s_t\");\n        p = isl_printer_print_int(p, data_pack_factors[n]);\n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n      }\n    } else {\n      for (int n = 0; n < n_factor; n++)\n      {\n        if (data_pack_factors[n] != 1)\n        {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"typedef tapa::vec_t<\");\n          p = isl_printer_print_str(p, local->array->type);\n          p = isl_printer_print_str(p, \", \");\n          p = isl_printer_print_int(p, data_pack_factors[n]);\n          p = isl_printer_print_str(p, \"> \");\n          p = isl_printer_print_str(p, local->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, data_pack_factors[n]);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n        }\n      }\n    }\n    free(data_pack_factors);\n  }\n  p = print_str_new_line(p, \"/* Data Type */\");\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\nstatic __isl_give isl_printer *declare_and_allocate_arrays(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_kernel *kernel, struct autosa_hw_top_module *top)\n{\n  p = print_str_new_line(p, \"// Allocate memory in host memory\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    if (local_array->n_mem_ports 
> 1)\n    {\n      /* Create multiple host buffers. */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::vector<std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", tapa::aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">>> \");\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", tapa::aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">> \");\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"_tmp\");\n      p = isl_printer_print_str(p, \"(\");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \");\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \".push_back(dev_\");\n      p = 
isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"_tmp);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n\n      if (local_array->host_serialize) {\n        /* Allocate additional serialize buffer. */\n        /* Create multiple host buffers. */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::vector<std::vector<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \", tapa::aligned_allocator<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \">>> \");\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n\n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        p = isl_printer_print_int(p, local_array->n_mem_ports);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::vector<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \", tapa::aligned_allocator<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \">> \");\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_tmp\");\n        p = isl_printer_print_str(p, \"(\");\n        p = isl_printer_print_pw_qpolynomial(p, local_array->serialize_bound);\n        if (local_array->is_sparse) {\n          p = isl_printer_print_str(p, \" / \");\n          p = isl_printer_print_double(p, 
(double)local_array->eff_compress_ratio);\n        }\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \".push_back(dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_tmp);\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n      }\n    }\n    else\n    {\n      /* Create a single host buffer. */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", tapa::aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">> \");\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \"(\");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \");\");\n      p = isl_printer_end_line(p);\n\n      if (local_array->host_serialize) {\n        /* Create a single host buffer. 
*/\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::vector<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \", tapa::aligned_allocator<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \">> \");\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"(\");\n        p = isl_printer_print_pw_qpolynomial(p, local_array->serialize_bound);\n        if (local_array->is_sparse) {\n          p = isl_printer_print_str(p, \" / \");\n          p = isl_printer_print_double(p, (double)local_array->eff_compress_ratio);\n        }\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n  p = isl_printer_end_line(p);\n\n  /* Initialize buffer. */\n  p = print_str_new_line(p, \"// Initialize host buffers\");\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    if (local_array->n_mem_ports > 1 && local_array->array->copy_in)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::copy(reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->is_sparse)\n        p = isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \"), 
reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->is_sparse)\n        p = isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \") + \");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \", dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \"[i]\");\n      p = isl_printer_print_str(p, \".begin());\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n    }\n    else if (local_array->array->copy_in)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::copy(reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->is_sparse)\n        p = isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \"), reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->is_sparse)\n        p = isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \") + \");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \", dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \".begin());\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  /* Perform 
data serialization if needed. */\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->serialize_tree && module->in) {\n      struct autosa_array_ref_group *group = module->io_groups[0];\n      struct autosa_local_array_info *local_array = group->local_array;\n      if (local_array->n_mem_ports > 1 && local_array->array->copy_in)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        p = isl_printer_print_int(p, local_array->n_mem_ports);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, module->in? \"host_serialize_\" : \"host_deserialize_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"(\");\n        p = print_host_serialize_arguments(p, kernel, group, module, 0, 0);  // TODO: add hbm support later.\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n      } else\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, module->in? 
\"host_serialize_\" : \"host_deserialize_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"(\");\n        p = print_host_serialize_arguments(p, kernel, group, module, 0, 0);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"// Explicitly create TAPA mmap objects\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"std::vector<\");\n    if (local_array->array->copy_in) {\n      if (local_array->array->copy_out)\n        p = isl_printer_print_str(p, \"tapa::read_write_mmap<\");\n      else\n        p = isl_printer_print_str(p, \"tapa::read_only_mmap<\");\n    } else if (local_array->array->copy_out)\n      p = isl_printer_print_str(p, \"tapa::write_only_mmap<\");\n    else\n      p = isl_printer_print_str(p, \"tapa::placeholder_mmap<\");\n    p = isl_printer_print_str(p, local_array->array->type);\n    p = isl_printer_print_str(p, \">> buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = print_str_new_line(p, \"// Set the direction of the TAPA mmap objects\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    //for (int j = 0; j < local_array->n_mem_ports; j++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, local_array->n_mem_ports);\n    p = isl_printer_print_str(p, \"; i++) {\");\n    p = 
isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    p = isl_printer_start_line(p);\n    if (local_array->array->copy_in && local_array->array->copy_out) {\n      p = isl_printer_print_str(p, \"tapa::read_write_mmap<\");\n    } else {\n      if (local_array->array->copy_in)\n        p = isl_printer_print_str(p, \"tapa::read_only_mmap<\");\n      else if (local_array->array->copy_out)\n        p = isl_printer_print_str(p, \"tapa::write_only_mmap<\");\n    }\n    p = isl_printer_print_str(p, local_array->array->type);\n    p = isl_printer_print_str(p, \"> buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_tmp(\");\n    p = isl_printer_print_str(p, \"dev_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    if (local_array->n_mem_ports > 1) {\n      p = isl_printer_print_str(p, \"[i]\");\n    }\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \".push_back(std::move(buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_tmp));\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n\n/* Print code for initializing the device for execution of the transformed\n * code. 
This includes declaring locally defined variables as well as\n * declaring and allocating the required copies of arrays on the device.\n */\nstatic __isl_give isl_printer *init_device_tapa(__isl_take isl_printer *p,\n                                                struct autosa_prog *prog,\n                                                struct autosa_kernel *kernel,\n                                                int hls,\n                                                struct autosa_hw_top_module *top)\n{\n  p = autosa_print_local_declarations(p, prog);\n  p = declare_and_allocate_arrays(p, prog, kernel, top);\n\n  return p;\n}\n\n/* Print code for clearing the device after execution of the transformed code.\n * In particular, free the memory that was allocated on the device.\n */\nstatic __isl_give isl_printer *clear_device_tapa(__isl_take isl_printer *p,\n                                                   struct autosa_prog *prog,\n                                                   struct autosa_kernel *kernel,\n                                                   int hls,\n                                                   struct autosa_hw_top_module *top)\n{\n  /* Deserialize the buffer data if necessary. 
*/\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->serialize_tree && !module->in) {\n      struct autosa_array_ref_group *group = module->io_groups[0];\n      struct autosa_local_array_info *local_array = group->local_array;\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"host_deserialize_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"(\");\n      p = print_host_serialize_arguments(p, top->kernel, group, module, 0, 0);  // TODO: add hbm support later.\n      p = isl_printer_print_str(p, \");\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  /* Restore buffer */\n  p = print_str_new_line(p, \"// Restore data from host buffers\");\n  for (int i = 0; i < prog->n_array; i++)\n  {\n    struct autosa_array_info *array = &prog->array[i];\n    if (!autosa_array_requires_device_allocation(array))\n      continue;\n\n    if (array->copy_out)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::copy(dev_\");\n      p = isl_printer_print_str(p, array->name);\n      if (array->local_array->host_serialize) {\n        p = isl_printer_print_str(p, \"_unserialized\");\n      }\n      if (array->local_array->n_mem_ports > 1)\n      {\n        p = isl_printer_print_str(p, \"[0]\");\n      }\n      p = isl_printer_print_str(p, \".begin(), dev_\");\n      p = isl_printer_print_str(p, array->name);\n      if (array->local_array->host_serialize) {\n        p = isl_printer_print_str(p, \"_unserialized\");\n      }\n      if (array->local_array->n_mem_ports > 1)\n      {\n        p = isl_printer_print_str(p, \"[0]\");\n      }\n      p = isl_printer_print_str(p, \".end(), reinterpret_cast<\");\n      p = isl_printer_print_str(p, array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, array->name);\n      p = isl_printer_print_str(p, \"));\");\n    
  p = isl_printer_end_line(p);\n    }\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *drain_merge_tapa(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_drain_merge_func *func,\n    int hls)\n{\n  struct autosa_array_ref_group *group = func->group;\n  p = print_str_new_line(p, \"// Merge results\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"for (int idx = \");\n  p = isl_printer_print_int(p, group->mem_port_id);\n  p = isl_printer_print_str(p, \"; idx < \");\n  p = isl_printer_print_int(p, group->mem_port_id + group->n_mem_ports);\n  p = isl_printer_print_str(p, \"; idx++) {\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, 2);\n  p = isl_printer_start_line(p);\n  p = autosa_array_ref_group_print_prefix(group, p);\n  p = isl_printer_print_str(p, \"_drain_merge(\");\n  p = print_drain_merge_arguments(p, func->kernel, group, func, 0, hls);\n  p = isl_printer_print_str(p, \");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n  return p;\n}\n\n/* Print a statement for copying an array to or from the device,\n * for initializing or clearing the device, or for merging drained results.\n * The statement identifier of a copying node is called\n * \"to_device_<array name>\" or \"from_device_<array name>\" and\n * its user pointer points to the autosa_array_info of the array\n * that needs to be copied.\n * The node for initializing the device is called \"init_device\".\n * The node for clearing the device is called \"clear_device\".\n * The node for merging drained results is called \"drain_merge\".\n *\n * Extract the kernel, the drain-merge function or the array from the\n * identifier and call init_device_tapa, clear_device_tapa or\n * drain_merge_tapa. Copying nodes produce no output here.\n */\nstatic __isl_give isl_printer *print_device_node_tapa(__isl_take isl_printer *p,\n                                                        __isl_keep isl_ast_node *node,\n                                                        struct autosa_prog *prog,\n      
                                                  int hls,\n                                                        struct autosa_hw_top_module *top)\n{\n  isl_ast_expr *expr, *arg;\n  isl_id *id;\n  const char *name;\n  struct autosa_array_info *array = NULL;\n  struct autosa_kernel *kernel = NULL;\n  struct autosa_drain_merge_func *func = NULL;\n\n  expr = isl_ast_node_user_get_expr(node);\n  arg = isl_ast_expr_get_op_arg(expr, 0);\n  id = isl_ast_expr_get_id(arg);\n  name = isl_id_get_name(id);\n  /* Guard against an unnamed identifier before calling strcmp(). */\n  if (name && (!strcmp(name, \"init_device\") || !strcmp(name, \"clear_device\")))\n    kernel = (struct autosa_kernel *)isl_id_get_user(id);\n  else if (name && !strcmp(name, \"drain_merge\"))\n    func = (struct autosa_drain_merge_func *)isl_id_get_user(id);\n  else\n    array = (struct autosa_array_info *)isl_id_get_user(id);\n  isl_id_free(id);\n  isl_ast_expr_free(arg);\n  isl_ast_expr_free(expr);\n\n  if (!name)\n    return isl_printer_free(p);\n  if (!strcmp(name, \"init_device\"))\n    return init_device_tapa(p, prog, kernel, hls, top);\n  if (!strcmp(name, \"clear_device\"))\n    return clear_device_tapa(p, prog, kernel, hls, top);\n  if (!strcmp(name, \"drain_merge\"))\n    return drain_merge_tapa(p, prog, func, hls);\n  if (!array)\n    return isl_printer_free(p);\n\n  return p;\n}\n\n/* Print the header of the given kernel to hls->kernel_h as a function\n * declaration.\n */\nstatic void print_kernel_headers_tapa(struct autosa_prog *prog,\n                                        struct autosa_kernel *kernel, struct hls_info *hls)\n{\n  isl_printer *p;\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_kernel_header(p, prog, kernel, hls, 1);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n\n  isl_printer_free(p);\n}\n\n/* Print the user statement of the host code to \"p\".\n *\n * The host code may contain original user statements, kernel launches,\n * statements that copy data to/from the device 
and statements\n * that initialize or clear the device.\n * The original user statements and the kernel launches have\n * an associated annotation, while the other statements do not.\n * The latter are handled by print_device_node_tapa.\n * The annotation on the user statements is called \"user\".\n *\n * In case of a kernel launch, print a tapa::task().invoke() call that\n * launches the kernel, and print the top kernel header to hls->kernel_h.\n */\nstatic __isl_give isl_printer *print_host_user_tapa(__isl_take isl_printer *p,\n                                                      __isl_take isl_ast_print_options *print_options,\n                                                      __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  int is_user;\n  struct autosa_kernel *kernel;\n  struct autosa_kernel_stmt *stmt;\n  struct print_host_user_data *data;\n  struct hls_info *hls;\n  struct autosa_hw_top_module *top;\n\n  isl_ast_print_options_free(print_options);\n\n  data = (struct print_host_user_data *)user;\n  hls = data->hls;\n  top = data->top;\n\n  id = isl_ast_node_get_annotation(node);\n  if (!id)\n  {\n    return print_device_node_tapa(p, node, data->prog, hls->hls, top);\n  }\n\n  is_user = !strcmp(isl_id_get_name(id), \"user\");\n  kernel = is_user ? NULL : (struct autosa_kernel *)isl_id_get_user(id);\n  stmt = is_user ? (struct autosa_kernel_stmt *)isl_id_get_user(id) : NULL;\n  isl_id_free(id);\n\n  if (is_user)\n    return autosa_kernel_print_domain(p, stmt);\n\n  p = print_str_new_line(p, \"// Launch the kernel\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"tapa::task().invoke(kernel0, \");\n  p = print_kernel_arguments(p, data->prog, kernel, 0, hls);\n  p = isl_printer_print_str(p, \");\");\n  p = isl_printer_end_line(p);\n\n  /* Print the top kernel header. 
*/\n  print_kernel_headers_tapa(data->prog, kernel, data->hls);\n\n  return p;\n}\n\n/* Print the header of the given module.\n */\nstatic __isl_give isl_printer *print_module_header_tapa(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    int inter, int boundary)\n{\n  int n = isl_id_list_n_id(module->inst_ids);\n  int first = 1;\n\n  if (n > 0 && prog->scop->options->autosa->use_cplusplus_template) {\n    /* Print the index template */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"template<\");\n    for (int i = 0; i < n; i++) {\n      if (!first)\n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"int p\");\n      p = isl_printer_print_int(p, i);\n      first = 0;\n    }\n    p = isl_printer_print_str(p, \">\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"void \");\n  p = isl_printer_print_str(p, module->name);\n  if (inter == 0)\n    p = isl_printer_print_str(p, \"_intra_trans\");\n  else if (inter == 1)\n    p = isl_printer_print_str(p, \"_inter_trans\");\n  if (boundary)\n    p = isl_printer_print_str(p, \"_boundary\");\n  p = isl_printer_print_str(p, \"(\");\n  p = print_module_arguments(p, prog, module->kernel, module, 1, TAPA_HW, inter, -1, boundary, 0);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\n/* Print the header of the given module to both hls->kernel_h (as a\n * declaration) and hls->kernel_c.\n * If \"inter\" is -1, this is a normal module call.\n * If \"inter\" is 0, this is an intra_trans module call.\n * If \"inter\" is 1, this is an inter_trans module call.\n */\nstatic isl_stat print_module_headers_tapa(\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    struct hls_info *hls, int inter, int boundary)\n{\n  isl_printer *p;\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = 
print_module_header_tapa(p, prog, module, inter, boundary);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_module_header_tapa(p, prog, module, inter, boundary);\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\n/* Print out variable declarations\n * The local variable can be mapped to different memory resources:\n * FF, LUTRAM, BRAM, URAM.\n */\nstatic __isl_give isl_printer *print_module_var_tapa(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_var *var, int double_buffer,\n    struct autosa_hw_module *module)\n{\n  int j;\n  int use_memory = 0; // 0: FF 1: LUTRAM 2: BRAM 3: URAM\n  use_memory = extract_memory_type(module, var, module->options->autosa->uram);\n\n  p = isl_printer_start_line(p);\n  if (var->array->local_array->is_sparse && module->type != PE_MODULE) {\n    p = isl_printer_print_str(p, var->array->name);\n    p = isl_printer_print_str(p, \"_s_t\");\n    p = isl_printer_print_int(p, var->n_lane);\n  } else {\n    p = isl_printer_print_str(p, var->array->name);\n    p = isl_printer_print_str(p, \"_t\");\n    p = isl_printer_print_int(p, var->n_lane);\n  }\n  p = isl_printer_print_str(p, \" \");\n  p = isl_printer_print_str(p, var->name);\n  if (double_buffer)\n    p = isl_printer_print_str(p, \"_ping\");\n  for (j = 0; j < isl_vec_size(var->size); ++j)\n  {\n    isl_val *v;\n\n    p = isl_printer_print_str(p, \"[\");\n    v = isl_vec_get_element_val(var->size, j);\n    p = isl_printer_print_val(p, v);\n    isl_val_free(v);\n    p = isl_printer_print_str(p, \"]\");\n  }\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  if (use_memory && var->n_part != 1)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=\");\n    p = isl_printer_print_str(p, var->name);\n    if (double_buffer)\n   
   p = isl_printer_print_str(p, \"_ping\");\n    p = isl_printer_print_str(p, \" dim=\");\n    p = isl_printer_print_int(p, isl_vec_size(var->size));\n    p = isl_printer_print_str(p, \" factor=\");\n    p = isl_printer_print_int(p, var->n_part);\n    p = isl_printer_print_str(p, \" cyclic\");\n    p = isl_printer_end_line(p);\n  } else if (use_memory == 0) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=\");\n    p = isl_printer_print_str(p, var->name);\n    if (double_buffer)\n      p = isl_printer_print_str(p, \"_ping\");\n    p = isl_printer_print_str(p, \" dim=0 complete\");\n    p = isl_printer_end_line(p);\n  }\n\n  if (use_memory)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma HLS RESOURCE variable=\");\n    p = isl_printer_print_str(p, var->name);\n    if (double_buffer)\n      p = isl_printer_print_str(p, \"_ping\");\n    if (module->type == IO_MODULE && module->data_pack_inter == module->data_pack_intra)\n      p = isl_printer_print_str(p, use_memory == 1 ? \" core=RAM_1P_LUTRAM\" : (use_memory == 2 ? \" core=RAM_1P_BRAM\" : \" core=RAM_1P_URAM\"));\n    else\n      p = isl_printer_print_str(p, use_memory == 1 ? \" core=RAM_2P_LUTRAM\" : (use_memory == 2 ? 
\" core=RAM_2P_BRAM\" : \" core=RAM_2P_URAM\"));\n    p = isl_printer_end_line(p);\n\n    if (var->array->local_array->is_sparse) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"#pragma HLS DATA_PACK variable=\");\n      p = isl_printer_print_str(p, var->name);\n      if (double_buffer)\n        p = isl_printer_print_str(p, \"_ping\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  /* Print pong buffer */\n  if (double_buffer)\n  {\n    p = isl_printer_start_line(p);\n    if (var->array->local_array->is_sparse) {\n      p = isl_printer_print_str(p, var->array->name);\n      p = isl_printer_print_str(p, \"_s_t\");\n      p = isl_printer_print_int(p, var->n_lane);\n    } else {\n      if (var->n_lane == 1)\n        p = isl_printer_print_str(p, var->array->type);\n      else {\n        p = isl_printer_print_str(p, var->array->name);\n        p = isl_printer_print_str(p, \"_t\");\n        p = isl_printer_print_int(p, var->n_lane);\n      }\n    }\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, var->name);\n    if (double_buffer)\n      p = isl_printer_print_str(p, \"_pong\");\n    for (j = 0; j < isl_vec_size(var->size); ++j)\n    {\n      isl_val *v;\n\n      p = isl_printer_print_str(p, \"[\");\n      v = isl_vec_get_element_val(var->size, j);\n      p = isl_printer_print_val(p, v);\n      isl_val_free(v);\n      p = isl_printer_print_str(p, \"]\");\n    }\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n    if (use_memory && var->n_part != 1)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=\");\n      p = isl_printer_print_str(p, var->name);\n      if (double_buffer)\n        p = isl_printer_print_str(p, \"_pong\");\n      p = isl_printer_print_str(p, \" dim=\");\n      p = isl_printer_print_int(p, isl_vec_size(var->size));\n      p = isl_printer_print_str(p, \" factor=\");\n      p = 
isl_printer_print_int(p, var->n_part);\n      p = isl_printer_print_str(p, \" cyclic\");\n      p = isl_printer_end_line(p);\n    } else if (use_memory == 0) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=\");\n      p = isl_printer_print_str(p, var->name);\n      if (double_buffer)\n        p = isl_printer_print_str(p, \"_pong\");\n      p = isl_printer_print_str(p, \" dim=0 complete\");\n      p = isl_printer_end_line(p);\n    }\n\n    if (use_memory)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"#pragma HLS RESOURCE variable=\");\n      p = isl_printer_print_str(p, var->name);\n      p = isl_printer_print_str(p, \"_pong\");\n      if (module->type == IO_MODULE && module->data_pack_inter == module->data_pack_intra)\n        p = isl_printer_print_str(p, use_memory == 1 ? \" core=RAM_1P_LUTRAM\" : (use_memory == 2 ? \" core=RAM_1P_BRAM\" : \" core=RAM_1P_URAM\"));\n      else\n        p = isl_printer_print_str(p, use_memory == 1 ? \" core=RAM_2P_LUTRAM\" : (use_memory == 2 ? 
\" core=RAM_2P_BRAM\" : \" core=RAM_2P_URAM\"));\n      p = isl_printer_end_line(p);\n\n      if (var->array->local_array->is_sparse) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"#pragma HLS DATA_PACK variable=\");\n        p = isl_printer_print_str(p, var->name);\n        p = isl_printer_print_str(p, \"_pong\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_module_vars_tapa(__isl_take isl_printer *p,\n                                                        struct autosa_hw_module *module, int inter)\n{\n  int i, n;\n  isl_space *space;\n  const char *type;\n\n  if (inter == -1)\n  {\n    for (i = 0; i < module->n_var; ++i)\n      p = print_module_var_tapa(p, &module->var[i], module->double_buffer, module);\n  }\n\n  if (module->double_buffer && inter == -1)\n  {\n    type = isl_options_get_ast_iterator_type(module->kernel->ctx);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"bool arb = 0;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, module->in ? \"bool inter_trans_en = 1;\" : \"bool inter_trans_en = 0;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, module->in ? \"bool intra_trans_en = 0;\" : \"bool intra_trans_en = 1;\");\n    p = isl_printer_end_line(p);\n    /* iterators */\n    space = (module->in) ? 
module->intra_space : module->inter_space;\n    n = isl_space_dim(space, isl_dim_set);\n    for (int i = 0; i < n; i++)\n    {\n      const char *name;\n      name = isl_space_get_dim_name(space, isl_dim_set, i);\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, type);\n      p = isl_printer_print_str(p, \" \");\n      p = isl_printer_print_str(p, name);\n      p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, name);\n      p = isl_printer_print_str(p, \"_prev\");\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_for_with_pipeline(\n    __isl_keep isl_ast_node *node, __isl_take isl_printer *p,\n    __isl_take isl_ast_print_options *print_options)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"#pragma HLS PIPELINE II=1\");\n  p = isl_printer_end_line(p);\n\n  p = isl_ast_node_for_print(node, p, print_options);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_for_with_unroll(\n    __isl_keep isl_ast_node *node, __isl_take isl_printer *p,\n    __isl_take isl_ast_print_options *print_options)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"#pragma HLS UNROLL\");\n  p = isl_printer_end_line(p);\n\n  p = isl_ast_node_for_print(node, p, print_options);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_for_tapa(__isl_take isl_printer *p,\n                                                __isl_take isl_ast_print_options *print_options,\n                                                __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  int pipeline;\n  int unroll;\n\n  pipeline = 0;\n  unroll = 0;\n  id = isl_ast_node_get_annotation(node);\n\n  if (id)\n  {\n    struct autosa_ast_node_userinfo *info;\n\n    info = (struct autosa_ast_node_userinfo *)isl_id_get_user(id);\n    if (info && info->is_pipeline)\n      pipeline = 1;\n    if (info && 
info->is_unroll)\n      unroll = 1;\n  }\n\n  if (pipeline)\n    p = print_for_with_pipeline(node, p, print_options);\n  else if (unroll)\n    p = print_for_with_unroll(node, p, print_options);\n  else\n    p = isl_ast_node_for_print(node, p, print_options);\n\n  isl_id_free(id);\n\n  return p;\n}\n\n/* Print the intra_trans module.\n */\nstatic __isl_give isl_printer *autosa_print_intra_trans_module(\n    __isl_take isl_printer *p,\n    struct autosa_hw_module *module, struct autosa_prog *prog,\n    struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  if (!module->intra_tree)\n    return p;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  print_module_headers_tapa(prog, module, hls, 0, boundary);\n  fprintf(hls->kernel_c, \" {\\n\");\n  /* If double buffering is disabled, the module is inlined to reduce\n   * overhead. Double-buffered modules are not inlined, since inlining them\n   * might cause deadlocks.\n   */\n  if (module->double_buffer)\n    fprintf(hls->kernel_c, \"#pragma HLS INLINE OFF\\n\");\n  else\n    fprintf(hls->kernel_c, \"#pragma HLS INLINE\\n\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);\n  }\n  p = print_module_vars_tapa(p, module, 0);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  if (module->double_buffer)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"if (!intra_trans_en) return;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_end_line(p);\n  }\n  /* For local reduce, print the buffer initialization. 
*/\n  for (int i = 0; i < module->n_var; i++) {\n    if (module->var[i].init_required) {\n      p = autosa_print_var_initialization(p, &module->var[i], hls->target);\n    }\n  }\n  p = isl_printer_end_line(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_for_tapa, &hw_data);\n\n  p = isl_ast_node_print(module->intra_tree, p, print_options);\n  p = isl_printer_indent(p, -2);\n\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the inter_trans module.\n */\nstatic __isl_give isl_printer *autosa_print_inter_trans_module(\n    __isl_take isl_printer *p,\n    struct autosa_hw_module *module, struct autosa_prog *prog,\n    struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  if (boundary) {\n    if (!module->boundary_inter_tree)\n      return p;\n  } else {\n    if (!module->inter_tree)\n      return p;\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  print_module_headers_tapa(prog, module, hls, 1, boundary);\n  fprintf(hls->kernel_c, \" {\\n\");\n  if (module->double_buffer)\n    fprintf(hls->kernel_c, \"#pragma HLS INLINE OFF\\n\");\n  else\n    fprintf(hls->kernel_c, \"#pragma HLS INLINE\\n\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = 
print_module_iterators(p, hls->kernel_c, module);\n  }\n  p = print_module_vars_tapa(p, module, 1);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  if (module->double_buffer)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"if (!inter_trans_en) return;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_end_line(p);\n  }\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_for_tapa, &hw_data);\n\n  p = isl_ast_node_print((boundary == 0) ? module->inter_tree : module->boundary_inter_tree, p, print_options);\n  p = isl_printer_indent(p, -2);\n\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_module_core_header_tapa(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    int inter, int boundary, int serialize, int types)\n{\n  int n = isl_id_list_n_id(module->inst_ids);\n  if (types && n > 0 && prog->scop->options->autosa->use_cplusplus_template) {\n    /* Print the template */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"template<\");\n    for (int i = 0; i < n; i++) {\n      if (i > 0)\n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"int p\");\n      p = isl_printer_print_int(p, i);\n    }\n    p = isl_printer_print_str(p, \">\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_start_line(p);\n  if (types)\n    p = isl_printer_print_str(p, \"void \");\n  p = 
isl_printer_print_str(p, module->name);\n  if (inter == 0)\n    p = isl_printer_print_str(p, \"_intra_trans\");\n  else if (inter == 1)\n    p = isl_printer_print_str(p, \"_inter_trans\");\n  if (boundary)\n    p = isl_printer_print_str(p, \"_boundary\");\n  if (serialize)\n    p = isl_printer_print_str(p, \"_serialize\");\n  if (!types && n > 0 && prog->scop->options->autosa->use_cplusplus_template) {\n    p = isl_printer_print_str(p, \"<\");\n    for (int i = 0; i < n; i++) {\n      if (i > 0)\n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"p\");\n      p = isl_printer_print_int(p, i);\n    }\n    p = isl_printer_print_str(p, \">\");\n  }\n  p = isl_printer_print_str(p, \"(\");\n  if (!types) {\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n    p = isl_printer_start_line(p);\n  }\n  p = print_module_arguments(p, prog, module->kernel, module, types,\n                             TAPA_HW, inter, -1, boundary, serialize);\n  p = isl_printer_print_str(p, \")\");\n  if (!types) {\n    p = isl_printer_indent(p, -2);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_module_core_headers_tapa(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_hw_module *module, struct hls_info *hls,\n    int inter, int boundary, int serialize, int types)\n{\n  p = print_module_core_header_tapa(p, prog, module, inter, boundary, serialize, types);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_module_wrapper_header_tapa(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    int inter, int boundary)\n{\n  int n = isl_id_list_n_id(module->inst_ids);\n  if (n > 0 && prog->scop->options->autosa->use_cplusplus_template) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"template<\");\n    for (int i = 0; i < n; i++) {\n      if (i > 0)\n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, 
\"int p\");\n      p = isl_printer_print_int(p, i);\n    }\n    p = isl_printer_print_str(p, \">\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"void \");\n  p = isl_printer_print_str(p, module->name);\n  if (inter == 0)\n    p = isl_printer_print_str(p, \"_intra_trans\");\n  else if (inter == 1)\n    p = isl_printer_print_str(p, \"_inter_trans\");\n  if (boundary)\n    p = isl_printer_print_str(p, \"_boundary\");\n  p = isl_printer_print_str(p, \"_wrapper\");\n  p = isl_printer_print_str(p, \"(\");\n  p = print_module_arguments(p, prog, module->kernel, module, 1,\n                             TAPA_HW, inter, -1, boundary, 0);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\nstatic isl_stat print_module_wrapper_headers_tapa(\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    struct hls_info *hls, int inter, int boundary)\n{\n  isl_printer *p;\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_module_wrapper_header_tapa(p, prog, module, inter, boundary);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_module_wrapper_header_tapa(p, prog, module, inter, boundary);\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\n/* Print the serialization module that connects the external memory to the\n * top-level I/O module.\n */\nstatic __isl_give isl_printer *autosa_print_serialize_module(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  /* Print core. 
*/\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = print_module_core_headers_tapa(p, prog, module, hls, -1, boundary, 1, 1); // TODO\n  fprintf(hls->kernel_c, \" {\\n\");\n  fprintf(hls->kernel_c, \"#pragma HLS INLINE OFF\\n\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);\n  }\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  p = print_module_serialize_body(p, module, hls);\n  p = isl_printer_indent(p, -2);\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n  return p;\n}\n\n/* Print the default module.\n * For PE modules and level-1 I/O modules, we also print a wrapper function\n * to speed up HLS synthesis.\n * For the remaining modules, the wrapper is disabled.\n */\nstatic __isl_give isl_printer *autosa_print_default_module(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  if (!boundary) {\n    if (!module->device_tree)\n      return p;\n  } else {\n    if (!module->boundary_tree)\n      return p;\n  }\n\n  bool wrapper = 0;\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  /* Print a wrapper for PE modules and L1 I/O modules. */\n  if (module->type == PE_MODULE || (module->type != PE_MODULE && module->level == 1))\n    wrapper = 1;\n\n  /* Print core. 
*/\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = print_module_core_headers_tapa(p, prog, module, hls, -1, boundary, 0, 1);\n  fprintf(hls->kernel_c, \" {\\n\");\n  if (!boundary || !wrapper)\n    fprintf(hls->kernel_c, \"#pragma HLS INLINE OFF\\n\");\n  else\n    fprintf(hls->kernel_c, \"#pragma HLS INLINE\\n\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);\n  }\n  if (prog->scop->options->autosa->block_sparse) {\n    for (int i = 0; i < module->n_io_group; i++) {\n      struct autosa_array_ref_group *group = module->io_groups[i];\n      if (group->local_array->array_type == AUTOSA_EXT_ARRAY) {\n        int n_lane = get_io_group_n_lane(module, NULL, group);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, group->array->name);\n        if (group->local_array->is_sparse)\n          p = isl_printer_print_str(p, \"_s_t\");\n        else\n          p = isl_printer_print_str(p, \"_t\");\n        p = isl_printer_print_int(p, n_lane);\n        p = isl_printer_print_str(p, \" fifo_data_\");\n        p = isl_printer_print_str(p, group->array->name);\n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n  p = print_module_vars_tapa(p, module, -1);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  if (module->credit && !module->in)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"credit.write(1);\");\n    p = isl_printer_end_line(p);\n  }\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options 
= isl_ast_print_options_set_print_for(print_options,\n                                                      &print_for_tapa, &hw_data);\n\n  if (!boundary)\n    p = isl_ast_node_print(module->device_tree, p, print_options);\n  else\n    p = isl_ast_node_print(module->boundary_tree, p, print_options);\n\n  if (module->credit && module->in)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int token = credit.read();\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_indent(p, -2);\n\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  if (wrapper) {\n    /* Print wrapper. */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"/* Module Definition */\");\n    p = isl_printer_end_line(p);\n\n    print_module_wrapper_headers_tapa(prog, module, hls, -1, boundary);\n\n    fprintf(hls->kernel_c, \" {\\n\");\n    p = isl_printer_indent(p, 2);\n\n    p = print_module_core_headers_tapa(p, prog, module, hls, -1, boundary, 0, 0);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    fprintf(hls->kernel_c, \"}\\n\");\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"/* Module Definition */\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_end_line(p);\n  }\n\n  /* If the module serialization is enabled, we will print out an extra module\n   * for serializing the data. 
*/\n  if (module->to_mem && module->options->autosa->host_serialize) {\n    p = autosa_print_serialize_module(p, module, prog, hls, boundary);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_pe_dummy_module_core_header_tapa(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_pe_dummy_module *module, int types)\n{\n  struct autosa_array_ref_group *group = module->io_group;\n\n  p = isl_printer_start_line(p);\n  if (types)\n    p = isl_printer_print_str(p, \"void \");\n  // group_name\n  p = isl_printer_print_str(p, group->array->name);\n  if (group->group_type == AUTOSA_IO_GROUP)\n  {\n    if (group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  }\n  else if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, \"drain\");\n  }\n  p = isl_printer_print_str(p, \"_PE_dummy\");\n  p = isl_printer_print_str(p, module->in? 
\"_in\" : \"_out\");\n  p = isl_printer_print_str(p, \"(\");\n  p = print_pe_dummy_module_arguments(p, prog, module->module->kernel,\n                                      module, types, TAPA_HW);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_pe_dummy_module_core_headers_tapa(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_pe_dummy_module *module, struct hls_info *hls, int types)\n{\n  p = print_pe_dummy_module_core_header_tapa(p, prog, module, types);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_pe_dummy_module_wrapper_header_tapa(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_pe_dummy_module *module)\n{\n  struct autosa_array_ref_group *group = module->io_group;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"void \");\n  // group_name\n  p = isl_printer_print_str(p, group->array->name);\n  if (group->group_type == AUTOSA_IO_GROUP)\n  {\n    if (group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  }\n  else if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, \"drain\");\n  }\n  p = isl_printer_print_str(p, \"_PE_dummy\");\n  p = isl_printer_print_str(p, module->in? 
\"_in\": \"_out\");\n  p = isl_printer_print_str(p, \"_wrapper\");\n  p = isl_printer_print_str(p, \"(\");\n  p = print_pe_dummy_module_arguments(p, prog, module->module->kernel,\n                                      module, 1, TAPA_HW);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\nstatic isl_stat print_pe_dummy_module_wrapper_headers_tapa(\n    struct autosa_prog *prog, struct autosa_pe_dummy_module *module,\n    struct hls_info *hls)\n{\n  isl_printer *p;\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_pe_dummy_module_wrapper_header_tapa(p, prog, module);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_pe_dummy_module_wrapper_header_tapa(p, prog, module);\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\nstatic __isl_give isl_printer *autosa_print_default_pe_dummy_module(\n    __isl_take isl_printer *p,\n    struct autosa_pe_dummy_module *pe_dummy_module,\n    struct autosa_prog *prog, struct hls_info *hls, int boundary)\n{\n  /* For dummy modules, the wrapper is disabled by default because of its\n   * relatively high overhead.\n   */\n  bool wrapper = 0;\n  struct autosa_hw_module *module = pe_dummy_module->module;\n  struct print_hw_module_data hw_data = {hls, prog, module};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  /* Print core. 
*/\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = print_pe_dummy_module_core_headers_tapa(p, prog,\n                                              pe_dummy_module, hls, 1);\n\n  fprintf(hls->kernel_c, \" {\\n\");\n  if (wrapper)\n    fprintf(hls->kernel_c, \"#pragma HLS INLINE\\n\");\n\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);\n  }\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n\n  p = isl_printer_end_line(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_for_tapa, &hw_data);\n\n  p = isl_ast_node_print(pe_dummy_module->device_tree, p, print_options);\n\n  p = isl_printer_indent(p, -2);\n\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  /* Print wrapper. 
*/\n  if (wrapper) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"/* Module Definition */\");\n    p = isl_printer_end_line(p);\n\n    print_pe_dummy_module_wrapper_headers_tapa(prog, pe_dummy_module, hls);\n\n    fprintf(hls->kernel_c, \" {\\n\");\n    p = isl_printer_indent(p, 2);\n    p = print_pe_dummy_module_core_headers_tapa(p, prog, pe_dummy_module, hls, 0);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, -2);\n    fprintf(hls->kernel_c, \"}\\n\");\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"/* Module Definition */\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\nstruct print_db_module_while_data {\n  int inter; // -1: outer 0: intra 1: inter\n  int under_if;\n  int reach_user;\n\n  isl_printer *p_for;\n  isl_printer *p_user;\n  /* Outer */\n  std::vector<char *> outer_for_logic;\n  std::vector<char *> outer_iterator_name;\n  std::vector<char *> outer_iterator_lb;\n  std::vector<char *> outer_iterator_ub;\n  int outer_for_level;\n  /* Inter */\n  std::vector<char *> inter_for_logic;\n  std::vector<char *> inter_iterator_name;\n  std::vector<char *> inter_iterator_lb;\n  std::vector<char *> inter_iterator_ub;\n  int inter_for_level;\n  /* Intra */\n  std::vector<char *> intra_for_logic;\n  std::vector<char *> intra_iterator_name;\n  std::vector<char *> intra_iterator_lb;\n  std::vector<char *> intra_iterator_ub;\n  int intra_for_level;\n};\n\nstatic __isl_give isl_printer *print_double_buffer_module_vars_while(\n  __isl_take isl_printer *p, struct autosa_hw_module *module,\n  struct hls_info *hls,\n  struct print_db_module_while_data *data)\n{\n  /* Inst ids */\n  if (!module->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);\n  }\n  /* Local buffer */\n  for (int i = 0; i < module->n_var; i++) {\n    struct autosa_kernel_var *var = 
&module->var[i];\n    p = isl_printer_start_line(p);\n    if (var->n_lane == 1)\n      p = isl_printer_print_str(p, var->array->type);\n    else\n    {\n      p = isl_printer_print_str(p, var->array->name);\n      p = isl_printer_print_str(p, \"_t\");\n      p = isl_printer_print_int(p, var->n_lane);\n    }\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, var->name);\n    p = isl_printer_print_str(p, \"[2]\");\n    for (int j = 0; j < isl_vec_size(var->size); j++) {\n      isl_val *v;\n\n      p = isl_printer_print_str(p, \"[\");\n      v = isl_vec_get_element_val(var->size, j);\n      p = isl_printer_print_val(p, v);\n      isl_val_free(v);\n      p = isl_printer_print_str(p, \"]\");\n    }\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  }\n\n  /* State handle variables */\n  p = print_str_new_line(p, \"bool arb = 0;\");\n  p = print_str_new_line(p, module->in? \"bool inter_trans_en = 1;\" : \"bool inter_trans_en = 0;\");\n  p = print_str_new_line(p, module->in? \"bool intra_trans_en = 0;\" : \"bool intra_trans_en = 1;\");\n  p = print_str_new_line(p, module->in? \"bool inter_done = 0;\" : \"bool inter_done = 1;\");\n  p = print_str_new_line(p, module->in? 
\"bool intra_done = 1;\" : \"bool intra_done = 0;\");\n  /* Iterators */\n  for (int i = 0; i < data->outer_iterator_name.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, data->outer_iterator_name[i]);\n    free(data->outer_iterator_name[i]);\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_str(p, data->outer_iterator_lb[i]);\n    free(data->outer_iterator_lb[i]);\n    p = isl_printer_print_str(p, \"; \");\n    p = isl_printer_print_str(p, \"/* UB: \");\n    p = isl_printer_print_str(p, data->outer_iterator_ub[i]);\n    free(data->outer_iterator_ub[i]);\n    p = isl_printer_print_str(p, \" */\");\n    p = isl_printer_end_line(p);\n  }\n  for (int i = 0; i < data->inter_iterator_name.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, data->inter_iterator_name[i]);\n    free(data->inter_iterator_name[i]);\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_str(p, data->inter_iterator_lb[i]);\n    free(data->inter_iterator_lb[i]);\n    p = isl_printer_print_str(p, \"; \");\n    p = isl_printer_print_str(p, \"/* UB: \");\n    p = isl_printer_print_str(p, data->inter_iterator_ub[i]);\n    free(data->inter_iterator_ub[i]);\n    p = isl_printer_print_str(p, \" */\");\n    p = isl_printer_end_line(p);\n  }\n  for (int i = 0; i < data->intra_iterator_name.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, data->intra_iterator_name[i]);\n    free(data->intra_iterator_name[i]);\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_str(p, data->intra_iterator_lb[i]);\n    free(data->intra_iterator_lb[i]);\n    p = isl_printer_print_str(p, \"; \");\n    p = isl_printer_print_str(p, \"/* UB: \");\n    p = isl_printer_print_str(p, data->intra_iterator_ub[i]);\n    
free(data->intra_iterator_ub[i]);\n    p = isl_printer_print_str(p, \" */\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = print_str_new_line(p, \"bool last_run = false;\");\n\n  return p;\n}\n\n/* Count the for level.\n */\nstatic __isl_give isl_printer *count_module_for(__isl_take isl_printer *p,\n                                                __isl_take isl_ast_print_options *print_options,\n                                                __isl_keep isl_ast_node *node, void *user)\n{\n  struct print_db_module_while_data *data = (struct print_db_module_while_data *)user;\n  isl_ast_node *body;\n\n  if (data->inter == -1)\n    data->outer_for_level++;\n  else if (data->inter == 0)\n    data->intra_for_level++;\n  else if (data->inter == 1)\n    data->inter_for_level++;\n\n  body = isl_ast_node_for_get_body(node);\n  p = isl_ast_node_print(body, p, print_options);\n  isl_ast_node_free(body);\n\n  return p;\n}\n\n/* Count the for level (alternative implementation).\n * Currently only used for the inter_trans module.\n * Since the AST may contain if branches, only one branch is counted;\n * we assume that both branches have equal loop depth.\n */\nstatic isl_bool count_module_for_alt(__isl_keep isl_ast_node *node, void *user) {\n  struct print_db_module_while_data *data = (struct print_db_module_while_data *)user;\n  if (isl_ast_node_get_type(node) == isl_ast_node_if) {\n    data->under_if = 1;\n  }\n\n  if (isl_ast_node_get_type(node) == isl_ast_node_for) {\n    if (data->under_if == 0 || (data->under_if == 1 && data->reach_user == 0)) {\n      data->inter_for_level++;\n    }\n  }\n  if (isl_ast_node_get_type(node) == isl_ast_node_user) {\n    data->reach_user = 1;\n  }\n\n  return isl_bool_true;\n}\n\n/* Extract the loop information.\n */\nstatic __isl_give isl_printer *extract_module_for(__isl_take isl_printer *p,\n                                                  __isl_take isl_ast_print_options *print_options,\n                                                  
__isl_keep isl_ast_node *node, void *user)\n{\n  struct print_db_module_while_data *data = (struct print_db_module_while_data *)user;\n  isl_ast_expr *iterator, *init, *cond, *ub;\n  const char *iterator_suffix;\n  isl_printer *p_local, *p_str;\n  char *text;\n  std::vector<char *> text_lines;\n  isl_ast_node *body;\n\n  p_local = data->p_for;\n\n  /* Extract the lower bound and upper bound. */\n  iterator = isl_ast_node_for_get_iterator(node);\n  init = isl_ast_node_for_get_init(node);\n  cond = isl_ast_node_for_get_cond(node);\n  ub = isl_ast_expr_op_get_arg(cond, 1);\n\n  p_str = isl_printer_to_str(isl_ast_node_get_ctx(node));\n  p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);\n  //p_str = isl_printer_print_str(p_str, iterator_suffix);\n  p_str = isl_printer_print_ast_expr(p_str, iterator);\n  if (data->inter == -1)\n    data->outer_iterator_name.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 0)\n    data->intra_iterator_name.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 1)\n    data->inter_iterator_name.push_back(isl_printer_get_str(p_str));\n  isl_printer_flush(p_str);\n\n  p_str = isl_printer_print_ast_expr(p_str, ub);\n  if (data->inter == -1)\n    data->outer_iterator_ub.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 0)\n    data->intra_iterator_ub.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 1)\n    data->inter_iterator_ub.push_back(isl_printer_get_str(p_str));\n  isl_printer_flush(p_str);\n\n  p_str = isl_printer_print_ast_expr(p_str, init);\n  if (data->inter == -1)\n    data->outer_iterator_lb.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 0)\n    data->intra_iterator_lb.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 1)\n    data->inter_iterator_lb.push_back(isl_printer_get_str(p_str));\n  isl_printer_free(p_str);\n\n  p_local = isl_printer_indent(p_local, -4);\n\n  p_local = isl_printer_start_line(p_local);\n  p_local = 
isl_printer_print_ast_expr(p_local, iterator);\n  p_local = isl_printer_print_str(p_local, \"++;\");\n  p_local = isl_printer_end_line(p_local);\n  text = isl_printer_get_str(p_local);\n  text_lines.push_back(text);\n  p_local = isl_printer_flush(p_local);\n\n  p_local = isl_printer_start_line(p_local);\n  p_local = isl_printer_print_str(p_local, \"if (\");\n  p_local = isl_printer_print_ast_expr(p_local, iterator);\n  p_local = isl_printer_print_str(p_local, \" == \");\n  p_local = isl_printer_print_ast_expr(p_local, ub);\n  p_local = isl_printer_print_str(p_local, \" + 1) {\");\n  p_local = isl_printer_end_line(p_local);\n  text = isl_printer_get_str(p_local);\n  text_lines.push_back(text);\n  p_local = isl_printer_flush(p_local);\n\n  p_local = isl_printer_indent(p_local, 4);\n  p_local = isl_printer_start_line(p_local);\n  p_local = isl_printer_print_ast_expr(p_local, iterator);\n  p_local = isl_printer_print_str(p_local, \" = \");\n  p_local = isl_printer_print_ast_expr(p_local, init);\n  p_local = isl_printer_print_str(p_local, \";\");\n  p_local = isl_printer_end_line(p_local);\n  text = isl_printer_get_str(p_local);\n  text_lines.push_back(text);\n  p_local = isl_printer_flush(p_local);\n\n  if (data->inter == -1)\n    data->outer_for_logic.insert(data->outer_for_logic.begin(), text_lines.begin(), text_lines.end());\n  else if (data->inter == 0)\n    data->intra_for_logic.insert(data->intra_for_logic.begin(), text_lines.begin(), text_lines.end());\n  else if (data->inter == 1)\n    data->inter_for_logic.insert(data->inter_for_logic.begin(), text_lines.begin(), text_lines.end());\n\n  isl_ast_expr_free(iterator);\n  isl_ast_expr_free(init);\n  isl_ast_expr_free(cond);\n  isl_ast_expr_free(ub);\n\n  p_local = isl_printer_indent(p_local, -4);\n\n  body = isl_ast_node_for_get_body(node);\n  p = isl_ast_node_print(body, p, print_options);\n  isl_ast_node_free(body);\n\n  return p;\n}\n\nstatic void extract_double_buffer_module_while_data(\n  struct 
autosa_hw_module *module, int boundary,\n  struct print_db_module_while_data *data)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = module->kernel->ctx;\n  isl_printer *p_for, *p_user, *p;\n  const char *for_logic, *user_logic;\n\n  /* Outer module */\n  data->inter = -1;\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p_for = isl_printer_to_str(ctx);\n  p_for = isl_printer_set_output_format(p_for, ISL_FORMAT_C);\n  p_user = isl_printer_to_str(ctx);\n  p_user = isl_printer_set_output_format(p_user, ISL_FORMAT_C);\n  data->p_for = p_for;\n  data->p_user = p_user;\n  data->outer_for_level = 0;\n\n  /* Count the for level first. */\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &count_module_for, data);\n  if (!boundary)\n    p = isl_ast_node_print(module->device_tree, p, print_options);\n  else\n    p = isl_ast_node_print(module->boundary_tree, p, print_options);\n\n  /* Extract the for and user logic. 
*/\n  data->p_for = isl_printer_indent(data->p_for, 4 * data->outer_for_level);\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &extract_module_for, data);\n  if (!boundary)\n    p = isl_ast_node_print(module->device_tree, p, print_options);\n  else\n    p = isl_ast_node_print(module->boundary_tree, p, print_options);\n  isl_printer_free(p);\n  isl_printer_free(data->p_for);\n  isl_printer_free(data->p_user);\n\n  /* Intra module */\n  data->inter = 0;\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p_for = isl_printer_to_str(ctx);\n  p_for = isl_printer_set_output_format(p_for, ISL_FORMAT_C);\n  p_user = isl_printer_to_str(ctx);\n  p_user = isl_printer_set_output_format(p_user, ISL_FORMAT_C);\n  data->p_for = p_for;\n  data->p_user = p_user;\n  data->intra_for_level = 0;\n\n  /* Count the for level first. */\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &count_module_for, data);\n  p = isl_ast_node_print(module->intra_tree, p, print_options);\n\n  /* Extract the for logic. 
*/\n  data->p_for = isl_printer_indent(data->p_for, 4 * data->intra_for_level);\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &extract_module_for, data);\n  p = isl_ast_node_print(module->intra_tree, p, print_options);\n  isl_printer_free(p);\n  isl_printer_free(data->p_for);\n  isl_printer_free(data->p_user);\n\n  /* Inter module */\n  data->inter = 1;\n  data->under_if = 0;\n  data->reach_user = 0;\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p_for = isl_printer_to_str(ctx);\n  p_for = isl_printer_set_output_format(p_for, ISL_FORMAT_C);\n  p_user = isl_printer_to_str(ctx);\n  p_user = isl_printer_set_output_format(p_user, ISL_FORMAT_C);\n  data->p_for = p_for;\n  data->p_user = p_user;\n  data->inter_for_level = 0;\n\n  /* Count the for level first. */\n  if (!boundary) {\n    isl_ast_node_foreach_descendant_top_down(module->inter_tree, &count_module_for_alt, data);\n  } else {\n    isl_ast_node_foreach_descendant_top_down(module->boundary_inter_tree, &count_module_for_alt, data);\n  }\n\n  /* Extract the for logic. 
*/\n  data->p_for = isl_printer_indent(data->p_for, 4 * data->inter_for_level);\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &extract_module_for, data);\n  if (!boundary)\n    p = isl_ast_node_print(module->inter_tree, p, print_options);\n  else\n    p = isl_ast_node_print(module->boundary_inter_tree, p, print_options);\n  isl_printer_free(p);\n  isl_printer_free(data->p_for);\n  isl_printer_free(data->p_user);\n}\n\nstatic __isl_give isl_printer *print_null_for(__isl_take isl_printer *p,\n                                              __isl_take isl_ast_print_options *print_options,\n                                              __isl_keep isl_ast_node *node, void *user)\n{\n  isl_ast_node *body;\n\n  body = isl_ast_node_for_get_body(node);\n  p = isl_ast_node_print(body, p, print_options);\n  isl_ast_node_free(body);\n\n  return p;\n}\n\n/* Print the inter_trans module in double buffer mode.\n */\nstatic __isl_give isl_printer *autosa_print_inter_trans_module_double_buffer(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, \"inter_c\"};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_null_for, &hw_data);\n\n  p = isl_ast_node_print((boundary == 0) ? 
module->inter_tree : module->boundary_inter_tree, p, print_options);\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the intra_trans module in double buffer mode.\n */\nstatic __isl_give isl_printer *autosa_print_intra_trans_module_double_buffer(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, \"intra_c\"};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_null_for, &hw_data);\n\n  p = isl_ast_node_print(module->intra_tree, p, print_options);\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the double buffer module using while loops instead of for loops.\n * First, the local buffer is changed to\n * local_buffer[2][...][...].\n *\n * Specifically, given a code structure of the form:\n * [outer for loops]\n * for ...\n *   for ...\n * [outer for loops]\n * {\n *   if (arb) {\n *     ld(local_buffer_ping, ld_en);\n *     st(local_buffer_pong, st_en);\n *   } else {\n *     ld(local_buffer_pong, ld_en);\n *     st(local_buffer_ping, st_en);\n *   }\n *   [state handle logic]\n *   arb = !arb;\n *   [state handle logic]\n * }\n * [last batch]\n * if (arb) {\n *   st(local_buffer_pong, st_en);\n * } else {\n *   st(local_buffer_ping, st_en);\n * }\n * [last batch]\n * we convert it into the following code structure:\n * while (1) {\n *   if (ld_en) {\n *     [inlined logic]\n *     ld(local_buffer[arb][...]);\n *     [inlined logic]\n *   }\n *   if (st_en) {\n *     [inlined logic]\n *     st(local_buffer[!arb][...]);\n *     [inlined 
logic]\n *   }\n *   [state handle logic]\n *   arb = !arb;\n *   ld_en = 1;\n *   st_en = 1;\n *   [state handle logic]\n *   [outer for loops]\n *   outer_iter0++;\n *   if (outer_iter0 == ...) {\n *     outer_iter0 = 0;\n *     [last batch]\n *     ld_en = 0;\n *     [last batch]\n *   }\n *   [outer for loops]\n * }\n *\n * Note that this only works if each for loop structure is a perfectly\n * nested loop, so that it can be converted into a while loop.\n */\nstatic __isl_give isl_printer *print_double_buffer_module_while(\n  __isl_take isl_printer *p, struct autosa_hw_module *module,\n  struct autosa_prog *prog, struct hls_info *hls, int boundary)\n{\n  if (!boundary) {\n    if (!module->device_tree)\n      return p;\n  } else {\n    if (!module->boundary_tree)\n      return p;\n  }\n\n  struct print_db_module_while_data print_data;\n\n  /* Extract the code snippets. */\n  extract_double_buffer_module_while_data(module, boundary, &print_data);\n\n  /* Print header */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  print_module_headers_tapa(prog, module, hls, -1, boundary);\n  p = print_str_new_line(p, \"{\");\n  p = isl_printer_indent(p, 2);\n\n  /* Print variables */\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = print_double_buffer_module_vars_while(p, module, hls, &print_data);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  /* Print content */\n  p = print_str_new_line(p, \"while (1) {\");\n  p = print_str_new_line(p, \"#pragma HLS PIPELINE II=1\");\n  p = isl_printer_indent(p, 2);\n\n  /* Print inter_trans */\n  p = print_str_new_line(p, \"if (inter_trans_en) {\");\n  p = isl_printer_indent(p, 2);\n  /* Print the module logic */\n  p = autosa_print_inter_trans_module_double_buffer(p, module, prog, hls, boundary);\n  /* Print the loop counter */\n  for (int i = 0; i < print_data.inter_for_logic.size(); i++) 
{\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, print_data.inter_for_logic[i]);\n    free(print_data.inter_for_logic[i]);\n  }\n  p = isl_printer_indent(p, 4 * print_data.inter_for_level);\n  p = print_str_new_line(p, \"inter_done = 1;\");\n  p = print_str_new_line(p, \"inter_trans_en = 0;\");\n  for (int i = 0; i < print_data.inter_for_level; i++) {\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  /* Print intra_trans */\n  p = print_str_new_line(p, \"if (intra_trans_en) {\");\n  p = isl_printer_indent(p, 2);\n  /* Print the module logic */\n  p = autosa_print_intra_trans_module_double_buffer(p, module, prog, hls, boundary);\n  /* Print the loop counter */\n  for (int i = 0; i < print_data.intra_for_logic.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, print_data.intra_for_logic[i]);\n    free(print_data.intra_for_logic[i]);\n  }\n  p = isl_printer_indent(p, 4 * print_data.intra_for_level);\n  p = print_str_new_line(p, \"intra_done = 1;\");\n  p = print_str_new_line(p, \"intra_trans_en = 0;\");\n  for (int i = 0; i < print_data.intra_for_level; i++) {\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  /* Print state_handle */\n  p = print_str_new_line(p, \"if (inter_done && intra_done) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"if (last_run) break;\");\n  p = print_str_new_line(p, \"intra_trans_en = 1;\");\n  p = print_str_new_line(p, \"inter_trans_en = 1;\");\n  p = print_str_new_line(p, \"intra_done = 0;\");\n  p = print_str_new_line(p, \"inter_done = 0;\");\n  p = print_str_new_line(p, \"arb = !arb;\");\n  /* Print the loop counter */\n  for (int i = 0; i < print_data.outer_for_logic.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = 
isl_printer_print_str(p, print_data.outer_for_logic[i]);\n    free(print_data.outer_for_logic[i]);\n  }\n  p = isl_printer_indent(p, 4 * print_data.outer_for_level);\n  p = print_str_new_line(p, module->in? \"inter_trans_en = 0;\" : \"intra_trans_en = 0;\");\n  p = print_str_new_line(p, module->in? \"inter_done = 1;\" : \"intra_done = 1;\");\n  p = print_str_new_line(p, \"last_run = true;\");\n  for (int i = 0; i < print_data.outer_for_level; i++) {\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  /* If the module serialization is enabled, we will print out an extra module\n   * for serializing the data. */\n  if (module->to_mem && module->options->autosa->host_serialize) {\n    p = autosa_print_serialize_module(p, module, prog, hls, boundary);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *autosa_print_host_code(__isl_take isl_printer *p,\n                                                      struct autosa_prog *prog, __isl_keep isl_ast_node *tree,\n                                                      struct autosa_hw_module **modules, int n_modules,\n                                                      struct autosa_hw_top_module *top,\n                                                      struct autosa_drain_merge_func **drain_merge_funcs, int n_drain_merge_funcs,\n                                                      struct hls_info *hls)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_ast_node_get_ctx(tree);\n  struct print_host_user_data data = {hls, prog, top};\n  struct print_hw_module_data hw_data = {hls, prog, NULL};\n  isl_printer *p_module;\n\n  /* 
Print the data pack types in the program. */\n  print_data_types_tapa(top, hls);\n\n  /* Print the macros for sparse data structure */\n  if (prog->scop->options->autosa->block_sparse) {\n    print_sparse_macros(top->kernel, hls);\n  }\n\n  /* Print the helper functions in the program. */\n  print_drain_merge_funcs(top->kernel, drain_merge_funcs, n_drain_merge_funcs, hls);\n\n  /* Print the host data serialization function. */\n  print_host_serialize_funcs(top->kernel, modules, n_modules, hls); // TODO\n\n  /* Print the default AST. */\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_host_user_tapa, &data);\n\n  /* Print the macros definitions in the program. */\n  p = autosa_print_macros(p, tree);\n  p = isl_ast_node_print(tree, p, print_options);\n\n  /* Print the hw module ASTs. */\n  p_module = isl_printer_to_file(ctx, hls->kernel_c);\n  p_module = isl_printer_set_output_format(p_module, ISL_FORMAT_C);\n\n  for (int i = 0; i < n_modules; i++)\n  {\n    if (modules[i]->double_buffer && modules[i]->options->autosa->double_buffer_style == 0)\n    {\n      p_module = print_double_buffer_module_while(p_module, modules[i], prog, hls, 0);\n      if (modules[i]->boundary) {\n        p_module = print_double_buffer_module_while(p_module, modules[i], prog, hls, 1);\n      }\n    } else {\n      if (modules[i]->is_filter && modules[i]->is_buffer)\n      {\n        /* Print out the definitions for inter_trans and intra_trans function calls. 
*/\n        /* Intra transfer function */\n        p_module = autosa_print_intra_trans_module(p_module, modules[i], prog, hls, 0);\n\n        /* Inter transfer function */\n        p_module = autosa_print_inter_trans_module(p_module, modules[i], prog, hls, 0);\n        if (modules[i]->boundary)\n          p_module = autosa_print_inter_trans_module(p_module, modules[i], prog, hls, 1);\n      }\n\n      p_module = autosa_print_default_module(p_module, modules[i], prog, hls, 0);\n\n      if (modules[i]->boundary)\n      {\n        /* Print out the definitions for boundary trans function calls. */\n        p_module = autosa_print_default_module(p_module, modules[i], prog, hls, 1);\n      }\n\n      if (modules[i]->n_pe_dummy_modules > 0)\n      {\n        /* Print out the definitions for pe dummy function calls. */\n        for (int j = 0; j < modules[i]->n_pe_dummy_modules; j++)\n        {\n          p_module = autosa_print_default_pe_dummy_module(\n              p_module, modules[i]->pe_dummy_modules[j], prog, hls, 0);\n        }\n      }\n    }\n  }\n  isl_printer_free(p_module);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_top_module_headers_tapa(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  struct autosa_kernel *kernel = top->kernel;\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"void kernel\");\n  p = isl_printer_print_int(p, 0);\n  p = isl_printer_print_str(p, \"(\");\n  p = print_kernel_arguments(p, prog, top->kernel, 1, hls);\n  p = isl_printer_print_str(p, \")\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"{\\\");\");\n  p = print_str_new_line(p, \"p = 
isl_printer_end_line(p);\");\n\n  return p;\n}\n\nstatic char *extract_fifo_name_from_fifo_decl_name(isl_ctx *ctx, char *fifo_decl_name)\n{\n  int loc = 0;\n  char ch;\n  isl_printer *p_str = isl_printer_to_str(ctx);\n  char *name = NULL;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    if (ch == '.')\n      break;\n    char buf[2];\n    buf[0] = ch;\n    buf[1] = '\\0';\n    p_str = isl_printer_print_str(p_str, buf);\n    loc++;\n  }\n\n  name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return name;\n}\n\nstatic char *extract_fifo_width_from_fifo_decl_name(isl_ctx *ctx, char *fifo_decl_name)\n{\n  int loc = 0;\n  char ch;\n  isl_printer *p_str = isl_printer_to_str(ctx);\n  char *name = NULL;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    if (ch == '.')\n      break;\n    loc++;\n  }\n\n  loc++;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    char buf[2];\n    buf[0] = ch;\n    buf[1] = '\\0';\n    p_str = isl_printer_print_str(p_str, buf);\n    loc++;\n  }\n\n  name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return name;\n}\n\nstatic __isl_give isl_printer *print_top_module_fifo_stmt(__isl_take isl_printer *p,\n                                                          __isl_take isl_ast_print_options *print_options,\n                                                          __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *data = (struct print_hw_module_data *)(user);\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n  case AUTOSA_KERNEL_STMT_FIFO_DECL:\n    return autosa_kernel_print_fifo_decl(p, stmt, data->prog, data->hls);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_top_module_call_stmt(\n  __isl_take isl_printer *p,\n  __isl_take 
isl_ast_print_options *print_options,\n  __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *data = (struct print_hw_module_data *)(user);\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n  case AUTOSA_KERNEL_STMT_MODULE_CALL:\n    return autosa_kernel_print_module_call(p, stmt, data->prog, data->hls->target);\n  }\n\n  return p;\n}\n\n/* This function prints the code that prints out the top function that\n * calls the hardware modules and declares the fifos.\n */\nstatic void print_top_gen_host_code(\n    struct autosa_prog *prog, __isl_keep isl_ast_node *node,\n    struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_ast_node_get_ctx(node);\n  isl_printer *p;\n  int fifo_depth = prog->scop->options->autosa->fifo_depth;\n  struct print_hw_module_data hw_data = {hls, prog, NULL};\n\n  /* Print the top module ASTs. 
*/\n  p = isl_printer_to_file(ctx, hls->top_gen_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n\n  print_top_gen_headers(prog, top, hls);\n  fprintf(hls->top_gen_c, \" {\\n\");\n  p = isl_printer_indent(p, 2);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"FILE *fd = fopen(\\\"\");\n  p = isl_printer_print_str(p, hls->output_dir);\n  p = isl_printer_print_str(p, \"/resource_est/design_info.dat\\\", \\\"w\\\");\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"int fifo_cnt;\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_ctx *ctx = isl_ctx_alloc();\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_printer *p = isl_printer_to_file(ctx, f);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_end_line(p);\n\n  p = print_top_module_headers_tapa(p, prog, top, hls);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_indent(p, 2);\");\n  p = isl_printer_end_line(p);\n\n  /* Print FIFO declarations */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* FIFO Declaration */\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_end_line(p);\n\n  /* Print the serialization FIFOs, if any. */\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    struct autosa_array_ref_group *group = module->io_groups[0];\n    if (module->is_serialized) {\n      /* Generate fifo decl counter. 
*/\n      char *fifo_name;\n      int fifo_w;  // bytes\n      fifo_w = module->data_pack_inter * group->array->size;\n      isl_printer *p_str;\n      p_str = isl_printer_to_str(ctx);\n      p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n      p_str = isl_printer_print_str(p_str, \"_\");\n      p_str = isl_printer_print_str(p_str, module->name);\n      p_str = isl_printer_print_str(p_str, \"_serialize\");\n      fifo_name = isl_printer_get_str(p_str);\n      isl_printer_free(p_str);\n\n      p = print_str_new_line(p, \"fifo_cnt = 1;\");\n      p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* \");\n      p = isl_printer_print_str(p, module->name);\n      p = isl_printer_print_str(p, \"_serialize fifo */ \");\n      p = print_fifo_type_tapa(p, group, module->data_pack_inter, fifo_depth, NULL);\n      p = isl_printer_print_str(p, \" \");\n      p = isl_printer_print_str(p, fifo_name);\n      p = isl_printer_print_str(p, \";\\\");\");\n      p = isl_printer_end_line(p);\n      p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n      if (group->local_array->is_sparse) {\n        p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS DATA_PACK variable=\");\n        p = isl_printer_print_str(p, fifo_name);\n        p = isl_printer_print_str(p, \"\\\");\");\n        p = isl_printer_end_line(p);\n        p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n      }\n\n      /* fifo:fifo_name:fifo_cnt:fifo_width */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fprintf(fd, \\\"fifo:\");\n      p = isl_printer_print_str(p, fifo_name);\n      p = isl_printer_print_str(p, \":\\%d:\");\n      p = isl_printer_print_int(p, fifo_w);\n      p = 
isl_printer_print_str(p, \"\\\\n\\\", fifo_cnt);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_end_line(p);\n      free(fifo_name);\n    }\n  }\n\n  for (int i = 0; i < top->n_fifo_decls; i++) {\n    /* Generate fifo decl counter. */\n    char *fifo_decl_name = top->fifo_decl_names[i];\n    char *fifo_name = extract_fifo_name_from_fifo_decl_name(ctx, fifo_decl_name);\n    char *fifo_w = extract_fifo_width_from_fifo_decl_name(ctx, fifo_decl_name);\n    p = print_str_new_line(p, \"fifo_cnt = 0;\");\n\n    /* Print AST */\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_top_module_fifo_stmt, &hw_data);\n\n    p = isl_ast_node_print(top->fifo_decl_wrapped_trees[i],\n                           p, print_options);\n\n    /* fifo:fifo_name:fifo_cnt:fifo_width */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"fprintf(fd, \\\"fifo:\");\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \":\\%d:\");\n    p = isl_printer_print_str(p, fifo_w);\n    p = isl_printer_print_str(p, \"\\\\n\\\", fifo_cnt);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_end_line(p);\n\n    free(fifo_name);\n    free(fifo_w);\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* FIFO Declaration */\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n  p = isl_printer_end_line(p);\n\n  int n_module_names = 0;\n  char **module_names = NULL;\n  for (int i = 0; i < 
top->n_hw_modules; i++)\n  {\n    /* Generate module call counter. */\n    struct autosa_hw_module *module = top->hw_modules[i];\n    char *module_name;\n\n    if (module->is_filter && module->is_buffer)\n    {\n      module_name = concat(ctx, module->name, \"intra_trans\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n\n      module_name = concat(ctx, module->name, \"inter_trans\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n\n      if (module->boundary)\n      {\n        module_name = concat(ctx, module->name, \"inter_trans_boundary\");\n\n        n_module_names++;\n        module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n        module_names[n_module_names - 1] = module_name;\n      }\n    }\n\n    module_name = strdup(module->name);\n\n    n_module_names++;\n    module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n    module_names[n_module_names - 1] = module_name;\n\n    if (module->boundary)\n    {\n      module_name = concat(ctx, module->name, \"boundary\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n    }\n\n    if (module->n_pe_dummy_modules > 0)\n    {\n      for (int j = 0; j < module->n_pe_dummy_modules; j++)\n      {\n        struct autosa_pe_dummy_module *dummy_module = module->pe_dummy_modules[j];\n        struct autosa_array_ref_group *group = dummy_module->io_group;\n        isl_printer *p_str = isl_printer_to_str(ctx);\n        p_str = autosa_array_ref_group_print_prefix(group, p_str);\n        p_str = isl_printer_print_str(p_str, \"_PE_dummy\");\n        p_str = isl_printer_print_str(p_str, dummy_module->in? 
\"_in\" : \"_out\");\n        module_name = isl_printer_get_str(p_str);\n        isl_printer_free(p_str);\n\n        n_module_names++;\n        module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n        module_names[n_module_names - 1] = module_name;\n      }\n    }\n\n    if (module->is_serialized) {\n      if (module->boundary)\n        module_name = concat(ctx, module->name, \"boundary_serialize\");\n      else\n        module_name = concat(ctx, module->name, \"serialize\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n    }\n  }\n  for (int i = 0; i < n_module_names; i++)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \"_cnt = 0;\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"  tapa::task()\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  /* Print module calls. */\n  for (int i = 0; i < top->n_module_calls; i++)\n  {\n    /* Print AST */\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_top_module_call_stmt, &hw_data);\n\n    p = isl_ast_node_print(top->module_call_wrapped_trees[i],\n                           p, print_options);\n  }\n\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"  ;\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  /* module:module_name:module_cnt. 
*/\n  for (int i = 0; i < n_module_names; i++)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"fprintf(fd, \\\"module:\");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \":\\%d\\\\n\\\", \");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \"_cnt);\");\n    p = isl_printer_end_line(p);\n  }\n  p = isl_printer_end_line(p);\n\n  for (int i = 0; i < n_module_names; i++)\n  {\n    free(module_names[i]);\n  }\n  free(module_names);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_indent(p, -2);\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"}\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"fclose(fd);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_printer_free(p);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_ctx_free(ctx);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, -2);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"}\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_end_line(p);\n\n  /* For internal testing only. 
*/\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"int main()\");\n  p = isl_printer_end_line(p);\n\n  p = ppcg_start_block(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"FILE *f = fopen(\\\"\");\n  p = isl_printer_print_str(p, hls->output_dir);\n  p = isl_printer_print_str(p, \"/src/top.cpp\\\", \\\"w\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"top_generate(f);\");\n  p = isl_printer_end_line(p);\n\n  p = ppcg_end_block(p);\n  p = isl_printer_free(p);\n\n  return;\n}\n\n/* Given an autosa_prog \"prog\" and the corresponding transformed AST\n * \"tree\", print the TAPA C++ host code to \"p\" and the kernel code to the\n * output files in \"user\" (a struct hls_info).\n * \"types\" collects the types for which a definition has already been\n * printed.\n */\nstatic __isl_give isl_printer *print_hw(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, __isl_keep isl_ast_node *tree,\n    struct autosa_hw_module **modules, int n_modules,\n    struct autosa_hw_top_module *top_module,\n    struct autosa_drain_merge_func **drain_merge_funcs, int n_drain_merge_funcs,\n    struct autosa_types *types, void *user)\n{\n  struct hls_info *hls = (struct hls_info *)user;\n  isl_printer *p_tmp;\n\n  p_tmp = isl_printer_to_file(isl_printer_get_ctx(p), hls->kernel_c);\n  p_tmp = isl_printer_set_output_format(p_tmp, ISL_FORMAT_C);\n  p_tmp = autosa_print_types(p_tmp, types, prog);\n  p_tmp = isl_printer_free(p_tmp);\n\n  /* Print the host and kernel functions. */\n  p = autosa_print_host_code(p, prog, tree, modules, n_modules, top_module,\n                             drain_merge_funcs, n_drain_merge_funcs, hls);\n  /* Print the separate top-module code generation function. 
*/\n  print_top_gen_host_code(prog, tree, top_module, hls);\n\n  return p;\n}\n\n/* Generate systolic arrays for TAPA C++\n */\nint generate_autosa_tapa_cpp(isl_ctx *ctx, struct ppcg_options *options,\n                                 const char *input)\n{\n  struct hls_info hls;\n  int r;\n\n  hls.target = TAPA_HW;\n  hls.hls = false;\n  hls.hcl = false;\n  hls.ctx = ctx;\n  hls.output_dir = options->autosa->output_dir;\n  hls_open_files(&hls, input);\n\n  r = generate_sa(ctx, input, hls.host_c, options, &print_hw, &hls);\n\n  hls_close_files(&hls);\n\n  return r;\n}\n"
  },
  {
    "path": "src/autosa_tapa_cpp.h",
    "content": "#ifndef _AUTOSA_TAPA_CPP_H\n#define _AUTOSA_TAPA_CPP_H\n\n#include <pet.h>\n#include \"ppcg_options.h\"\n#include \"ppcg.h\"\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\nint generate_autosa_tapa_cpp(isl_ctx *ctx, struct ppcg_options *options,\n        const char *input);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "src/autosa_trans.cpp",
    "content": "#include <string>\n#include <exception>\n//#include <chrono>\n//using namespace std::chrono;\n\n#include \"autosa_trans.h\"\n#include \"autosa_utils.h\"\n#include \"autosa_schedule_tree.h\"\n#include \"autosa_comm.h\"\n#include \"autosa_codegen.h\"\n#include \"autosa_print.h\"\n#include \"cpu.h\"\n\n/* A program can be legally transformed to a systolic array if and only if\n * it satisfies the following constraints:\n * - a single, fully permutable outermost band\n * - uniform dependences\n */\nisl_bool sa_legality_check(__isl_keep isl_schedule *schedule, struct ppcg_scop *scop)\n{\n    isl_bool single_band;\n    enum isl_schedule_node_type type;\n\n    /* Check if the child of the root node is a band node. */\n    isl_schedule_node *node = isl_schedule_get_root(schedule);\n    node = isl_schedule_node_child(node, 0);\n    type = isl_schedule_node_get_type(node);\n    single_band = (type == isl_schedule_node_band) ? isl_bool_true : isl_bool_false;\n    isl_schedule_node_free(node);\n    if (!single_band)\n    {\n        throw std::runtime_error(\"[AutoSA] Error: Single outermost permutable band not found.\");\n    }\n\n    //DBGSCHD(stdout, schedule, isl_schedule_get_ctx(schedule))\n\n    /* Check if all flow and RAR dependences are uniform. */\n    isl_bool all_uniform_dep = uniform_dep_check(schedule, scop);\n    if (all_uniform_dep < 1)\n    {\n        throw std::runtime_error(\"[AutoSA] Error: Non-uniform dependence detected.\");\n    }\n\n    return isl_bool_true;\n}\n\n/* Load the tuning configuration file.
\n */\nstatic cJSON *load_tuning_config(char *config_file)\n{\n    FILE *f;\n    char *buffer = NULL;\n    cJSON *config = NULL;\n    long length;\n\n    f = fopen(config_file, \"rb\");\n    if (f)\n    {\n        fseek(f, 0, SEEK_END);\n        length = ftell(f);\n        fseek(f, 0, SEEK_SET);\n        buffer = (char *)malloc(length + 1);\n        if (buffer)\n        {\n            buffer[length] = '\\0';\n            int r = fread(buffer, 1, length, f);\n        }\n        fclose(f);\n    }\n    else\n    {\n        printf(\"[AutoSA] Error: Can't open configuration file: %s\\n\", config_file);\n        exit(1);\n    }\n\n    if (buffer)\n    {\n        config = cJSON_Parse(buffer);\n        free(buffer);\n    }\n\n    return config;\n}\n\n/* Generate asynchronous systolic arrays with the given dimension.\n * For async arrays, time loops are placed inside the space loops.\n * We will first select space loop candidates from the outermost loop band\n * which carry dependences with distance less than or equal to 1.
\n * Then we will enumerate different space loop combinations by picking \"dim\"\n * space loops from the candidate pool.\n */\nstruct autosa_kernel **sa_space_time_transform_at_dim_async(\n    __isl_keep isl_schedule *schedule, struct ppcg_scop *scop,\n    isl_size dim, isl_size *num_sa, isl_size num_sa_offset)\n{\n    struct autosa_kernel **sas = NULL;\n\n    /* Select space loop candidates.\n     * Space loops carry dependences with distance less than or equal to 1.\n     */\n    isl_schedule_node *band = get_outermost_permutable_node(schedule);\n    isl_size band_w = isl_schedule_node_band_n_member(band);\n    isl_size *is_space_loop = (isl_size *)malloc(band_w * sizeof(isl_size));\n    isl_union_map *dep_flow = scop->dep_flow;\n    isl_union_map *dep_rar = scop->dep_rar;\n    isl_union_map *dep_total = isl_union_map_union(isl_union_map_copy(dep_flow),\n                                                   isl_union_map_copy(dep_rar));\n    isl_basic_map_list *deps = isl_union_map_get_basic_map_list(dep_total);\n    isl_size ndeps = isl_union_map_n_basic_map(dep_total);\n\n    for (int h = 0; h < band_w; h++)\n    {\n        int n;\n        for (n = 0; n < ndeps; n++)\n        {\n            isl_basic_map *dep = isl_basic_map_list_get_basic_map(deps, n);\n            isl_vec *dep_dis = get_dep_dis_at_node(dep, band);\n            isl_val *val = isl_vec_get_element_val(dep_dis, h);\n            if (!(isl_val_is_one(val) || isl_val_is_zero(val)))\n            {\n                isl_vec_free(dep_dis);\n                isl_val_free(val);\n                isl_basic_map_free(dep);\n                break;\n            }\n\n            isl_val_free(val);\n            isl_vec_free(dep_dis);\n            isl_basic_map_free(dep);\n        }\n        is_space_loop[h] = (n == ndeps);\n    }\n\n    /* Perform loop permutation to generate all candidates. 
*/\n    if (dim == 1)\n    {\n        for (int i = 0; i < band_w; i++)\n        {\n            if (is_space_loop[i])\n            {                  \n                TuningProgram *tuning_program = new TuningProgram;      \n                tuning_program->id = *num_sa + num_sa_offset;\n                tuning_program->load_param_names(scop->options->autosa->param_names);\n                isl_schedule *new_schedule = isl_schedule_dup(schedule);\n                new_schedule = tuning_program->init_from_schedule(new_schedule);\n                isl_schedule_node *band = get_outermost_permutable_node(new_schedule);\n                isl_schedule_free(new_schedule);\n\n                /* Make the loop i the outermost loop. */\n                for (int d = i; d > 0; d--)\n                {                    \n                    band = loop_interchange_at_node(band, d, d - 1);\n                }\n                new_schedule = isl_schedule_node_get_schedule(band);\n                isl_schedule_node_free(band);\n\n                /* Update the hyperplane types. */\n                struct autosa_kernel *sa = autosa_kernel_from_schedule(new_schedule);\n                sa->scop = scop;\n                sa->type = AUTOSA_SA_TYPE_ASYNC;\n\n                /* Update the array dimension. */\n                sa->n_sa_dim = dim;\n                sa->array_part_w = 0;\n                sa->space_w = dim;\n                // TODO: incorrect, to fix.\n                sa->time_w = band_w - dim;\n                sa->tuning_program = tuning_program;                                \n\n                /* Add the new variant into the list. 
*/\n                sas = (struct autosa_kernel **)realloc(sas, (*num_sa + 1) *\n                                                                sizeof(struct autosa_kernel *));\n                sas[*num_sa] = sa;\n                *num_sa = *num_sa + 1;\n            }\n        }\n    }\n    else if (dim == 2)\n    {\n        for (int i = 0; i < band_w; i++)\n        {\n            if (is_space_loop[i])\n            {\n                for (int j = i + 1; j < band_w; j++)\n                {\n                    if (is_space_loop[j])\n                    {\n                        TuningProgram *tuning_program = new TuningProgram;                        \n                        tuning_program->id = *num_sa + num_sa_offset;\n                        tuning_program->load_param_names(scop->options->autosa->param_names);                        \n                        isl_schedule *new_schedule = isl_schedule_dup(schedule);\n                        new_schedule = tuning_program->init_from_schedule(new_schedule);\n                        isl_schedule_node *band = get_outermost_permutable_node(new_schedule);                        \n                        isl_schedule_free(new_schedule);\n\n                        /* Make the loop i, j the outermost loops. */\n                        for (int d = j; d > 0; d--)\n                        {                            \n                            band = loop_interchange_at_node(band, d, d - 1);\n                        }\n                        for (int d = i + 1; d > 0; d--)\n                        {                         \n                            band = loop_interchange_at_node(band, d, d - 1);\n                        }\n                        new_schedule = isl_schedule_node_get_schedule(band);\n                        isl_schedule_node_free(band);\n\n                        /* Update the hyperplane types. 
*/\n                        struct autosa_kernel *sa = autosa_kernel_from_schedule(new_schedule);\n                        sa->scop = scop;\n                        sa->type = AUTOSA_SA_TYPE_ASYNC;\n\n                        /* Update the array dimension. */\n                        sa->n_sa_dim = dim;\n                        sa->array_part_w = 0;\n                        sa->space_w = dim;\n                        // TODO: incorrect, to fix.\n                        sa->time_w = band_w - dim;\n                        sa->tuning_program = tuning_program;\n\n                        /* Add the new variant into the list. */\n                        sas = (struct autosa_kernel **)realloc(sas, (*num_sa + 1) *\n                                                                        sizeof(struct autosa_kernel *));\n                        sas[*num_sa] = sa;\n                        *num_sa = *num_sa + 1;\n                    }\n                }\n            }\n        }\n    }\n    else if (dim == 3)\n    {\n        for (int i = 0; i < band_w; i++)\n        {\n            if (is_space_loop[i])\n            {\n                for (int j = i + 1; j < band_w; j++)\n                {\n                    if (is_space_loop[j])\n                    {\n                        for (int k = j + 1; k < band_w; k++)\n                        {\n                            if (is_space_loop[k])\n                            {\n                                TuningProgram *tuning_program = new TuningProgram;                                \n                                tuning_program->id = *num_sa + num_sa_offset;\n                                tuning_program->load_param_names(scop->options->autosa->param_names);                                \n                                isl_schedule *new_schedule = isl_schedule_dup(schedule);\n                                new_schedule = tuning_program->init_from_schedule(new_schedule);\n                                
isl_schedule_node *band = get_outermost_permutable_node(new_schedule);\n                                isl_schedule_free(new_schedule);\n\n                                /* Make the loop i, j, k the outermost loops. */\n                                for (int d = k; d > 0; d--)\n                                {                                    \n                                    band = loop_interchange_at_node(band, d, d - 1);\n                                }\n                                for (int d = j + 1; d > 0; d--)\n                                {                                    \n                                    band = loop_interchange_at_node(band, d, d - 1);\n                                }\n                                for (int d = i + 2; d > 0; d--)\n                                {                                 \n                                    band = loop_interchange_at_node(band, d, d - 1);\n                                }\n                                new_schedule = isl_schedule_node_get_schedule(band);\n                                isl_schedule_node_free(band);\n\n                                /* Update the hyperplane types. */\n                                struct autosa_kernel *sa = autosa_kernel_from_schedule(new_schedule);\n                                sa->scop = scop;\n                                sa->type = AUTOSA_SA_TYPE_ASYNC;\n\n                                /* Update the array dimension. */\n                                sa->n_sa_dim = dim;\n                                sa->array_part_w = 0;\n                                sa->space_w = dim;\n                                // TODO: incorrect, to fix.\n                                sa->time_w = band_w - dim;\n                                sa->tuning_program = tuning_program;\n\n                                /* Add the new variant into the list. 
*/\n                                sas = (struct autosa_kernel **)realloc(sas, (*num_sa + 1) *\n                                                                                sizeof(struct autosa_kernel *));\n                                sas[*num_sa] = sa;\n                                *num_sa = *num_sa + 1;\n                            }\n                        }\n                    }\n                }\n            }\n        }\n    }\n\n    isl_basic_map_list_free(deps);\n    isl_union_map_free(dep_total);\n    isl_schedule_node_free(band);\n    free(is_space_loop);\n\n    return sas;\n}\n\n/* Generate synchronous systolic arrays with the given dimension.\n * For sync arrays, time loops are placed outside the space loops.\n * We will first select space loop candidates from the innermost loop band\n * which carry dependences with distance less than or equal to 1.\n * Then we will enumerate different space loop combinations by picking \"dim\"\n * space loops from the candidate pool.\n */\nstruct autosa_kernel **sa_space_time_transform_at_dim_sync(\n    __isl_keep isl_schedule *schedule, struct ppcg_scop *scop,\n    isl_size dim, isl_size *num_sa)\n{\n    struct autosa_kernel **sas = NULL;\n\n    /* Select space loop candidates.\n     * Space loops carry dependences with distance less than or equal to 1.\n     */\n    isl_schedule_node *band = get_innermost_permutable_node(schedule);\n    isl_size band_w = isl_schedule_node_band_n_member(band);\n    isl_size *is_space_loop = (isl_size *)malloc(band_w * sizeof(isl_size));\n    isl_union_map *dep_flow = scop->dep_flow;\n    isl_union_map *dep_rar = scop->dep_rar;\n    isl_union_map *dep_total = isl_union_map_union(isl_union_map_copy(dep_flow),\n                                                   isl_union_map_copy(dep_rar));\n    isl_basic_map_list *deps = isl_union_map_get_basic_map_list(dep_total);\n    isl_size ndeps = isl_union_map_n_basic_map(dep_total);\n\n    for (int h = 0; h < band_w; h++)\n    {\n    
    int n;\n        for (n = 0; n < ndeps; n++)\n        {\n            isl_basic_map *dep = isl_basic_map_list_get_basic_map(deps, n);\n            isl_vec *dep_dis = get_dep_dis_at_node(dep, band);\n            isl_val *val = isl_vec_get_element_val(dep_dis, h);\n            if (!(isl_val_is_one(val) || isl_val_is_zero(val)))\n            {\n                isl_vec_free(dep_dis);\n                isl_val_free(val);\n                isl_basic_map_free(dep);\n                break;\n            }\n\n            isl_val_free(val);\n            isl_vec_free(dep_dis);\n            isl_basic_map_free(dep);\n        }\n        is_space_loop[h] = (n == ndeps);\n    }\n\n    /* Perform loop permutation to generate all candidates. */\n    if (dim == 1)\n    {\n        for (int i = 0; i < band_w; i++)\n        {\n            if (is_space_loop[i])\n            {\n                isl_schedule *new_schedule = isl_schedule_dup(schedule);\n                isl_schedule_node *band = get_innermost_permutable_node(new_schedule);\n                isl_schedule_free(new_schedule);\n\n                /* Make the loop i the innermost loop. */\n                for (int d = i; d < band_w - 1; d++)\n                {\n                    //isl_schedule_node *band = get_innermost_permutable_node(new_schedule);\n                    //isl_schedule_free(new_schedule);\n                    //new_schedule = loop_interchange_at_node(band, d, d + 1);\n                    band = loop_interchange_at_node(band, d, d + 1);\n                }\n                new_schedule = isl_schedule_node_get_schedule(band);\n                isl_schedule_node_free(band);\n\n                /* Update the hyperplane types. */\n                struct autosa_kernel *sa = autosa_kernel_from_schedule(new_schedule);\n                sa->scop = scop;\n                sa->type = AUTOSA_SA_TYPE_SYNC;\n\n                /* Update the array dimension. 
*/\n                sa->n_sa_dim = dim;\n                sa->array_part_w = 0;\n                sa->space_w = dim;\n                // TODO: this is incorrect, we need to consider other loop bands.\n                sa->time_w = band_w - dim;\n\n                /* Add the new variant into the list. */\n                sas = (struct autosa_kernel **)realloc(sas, (*num_sa + 1) *\n                                                                sizeof(struct autosa_kernel *));\n                sas[*num_sa] = sa;\n                *num_sa = *num_sa + 1;\n            }\n        }\n    }\n    else if (dim == 2)\n    {\n        for (int i = 0; i < band_w; i++)\n        {\n            if (is_space_loop[i])\n            {\n                for (int j = i + 1; j < band_w; j++)\n                {\n                    if (is_space_loop[j])\n                    {\n                        isl_schedule *new_schedule = isl_schedule_dup(schedule);\n                        isl_schedule_node *band = get_innermost_permutable_node(new_schedule);\n                        isl_schedule_free(new_schedule);\n\n                        /* Make the loop i, j the innermost loops. 
*/\n                        for (int d = i; d < band_w - 1; d++)\n                        {\n                            //isl_schedule_node *band = get_innermost_permutable_node(new_schedule);\n                            //isl_schedule_free(new_schedule);\n                            //new_schedule = loop_interchange_at_node(band, d, d + 1);\n                            band = loop_interchange_at_node(band, d, d + 1);\n                        }\n                        for (int d = j - 1; d < band_w - 1; d++)\n                        {\n                            //isl_schedule_node *band = get_innermost_permutable_node(new_schedule);\n                            //isl_schedule_free(new_schedule);\n                            //new_schedule = loop_interchange_at_node(band, d, d + 1);\n                            band = loop_interchange_at_node(band, d, d + 1);\n                        }\n                        new_schedule = isl_schedule_node_get_schedule(band);\n                        isl_schedule_node_free(band);\n\n                        /* Update the hyperplane types. */\n                        struct autosa_kernel *sa = autosa_kernel_from_schedule(new_schedule);\n                        sa->scop = scop;\n                        sa->type = AUTOSA_SA_TYPE_SYNC;\n\n                        /* Update the array dimension. */\n                        sa->n_sa_dim = dim;\n                        sa->array_part_w = 0;\n                        sa->space_w = dim;\n                        // TODO: incorrect, to fix.\n                        sa->time_w = band_w - dim;\n\n                        /* Add the new variant into the list. 
*/\n                        sas = (struct autosa_kernel **)realloc(sas, (*num_sa + 1) *\n                                                                        sizeof(struct autosa_kernel *));\n                        sas[*num_sa] = sa;\n                        *num_sa = *num_sa + 1;\n                    }\n                }\n            }\n        }\n    }\n    else if (dim == 3)\n    {\n        for (int i = 0; i < band_w; i++)\n        {\n            if (is_space_loop[i])\n            {\n                for (int j = i + 1; j < band_w; j++)\n                {\n                    if (is_space_loop[j])\n                    {\n                        for (int k = j + 1; k < band_w; k++)\n                        {\n                            if (is_space_loop[k])\n                            {\n                                isl_schedule *new_schedule = isl_schedule_dup(schedule);\n                                isl_schedule_node *band = get_innermost_permutable_node(new_schedule);\n                                isl_schedule_free(new_schedule);\n\n                                /* Make the loop i, j, k the innermost loops. 
*/\n                                for (int d = i; d < band_w - 1; d++)\n                                {\n                                    //isl_schedule_node *band = get_innermost_permutable_node(new_schedule);\n                                    //isl_schedule_free(new_schedule);\n                                    //new_schedule = loop_interchange_at_node(band, d, d + 1);\n                                    band = loop_interchange_at_node(band, d, d + 1);\n                                }\n                                for (int d = j - 1; d < band_w - 1; d++)\n                                {\n                                    //isl_schedule_node *band = get_innermost_permutable_node(new_schedule);\n                                    //isl_schedule_free(new_schedule);\n                                    //new_schedule = loop_interchange_at_node(band, d, d + 1);\n                                    band = loop_interchange_at_node(band, d, d + 1);\n                                }\n                                for (int d = k - 2; d < band_w - 1; d++)\n                                {\n                                    //isl_schedule_node *band = get_innermost_permutable_node(new_schedule);\n                                    //isl_schedule_free(new_schedule);\n                                    //new_schedule = loop_interchange_at_node(band, d, d + 1);\n                                    band = loop_interchange_at_node(band, d, d + 1);\n                                }\n                                new_schedule = isl_schedule_node_get_schedule(band);\n                                isl_schedule_node_free(band);\n\n                                /* Update the hyperplane types. 
*/\n                                struct autosa_kernel *sa = autosa_kernel_from_schedule(new_schedule);\n                                sa->scop = scop;\n                                sa->type = AUTOSA_SA_TYPE_SYNC;\n\n                                /* Update the array dimension. */\n                                sa->n_sa_dim = dim;\n                                sa->array_part_w = 0;\n                                sa->space_w = dim;\n                                sa->time_w = band_w - dim;\n\n                                /* Add the new variant into the list. */\n                                sas = (struct autosa_kernel **)realloc(sas, (*num_sa + 1) *\n                                                                                sizeof(struct autosa_kernel *));\n                                sas[*num_sa] = sa;\n                                *num_sa = *num_sa + 1;\n                            }\n                        }\n                    }\n                }\n            }\n        }\n    }\n\n    isl_basic_map_list_free(deps);\n    isl_union_map_free(dep_total);\n    isl_schedule_node_free(band);\n    free(is_space_loop);\n\n    return sas;\n}\n\n/* Generate systolic array with \"dim\" space dimensions. \n * Depending on the systolic array type set by users, we will generate \n * async or sync arrays.\n */\nstruct autosa_kernel **sa_space_time_transform_at_dim(\n    __isl_keep isl_schedule *schedule, struct ppcg_scop *scop,\n    isl_size dim, isl_size *num_sa, isl_size num_sa_offset)\n{\n    if (scop->options->autosa->sa_type == AUTOSA_SA_TYPE_ASYNC)\n    {\n        return sa_space_time_transform_at_dim_async(schedule, scop, dim, num_sa, num_sa_offset);\n    }\n    else if (scop->options->autosa->sa_type == AUTOSA_SA_TYPE_SYNC)\n    {\n        return sa_space_time_transform_at_dim_sync(schedule, scop, dim, num_sa);\n    }\n\n    return NULL;\n}\n\n/* Apply space-time transformation to generate different systolic array candidates. 
*/\nstruct autosa_kernel **sa_space_time_transform(__isl_take isl_schedule *schedule,\n                                               struct ppcg_scop *scop, isl_size *num_sa)\n{\n    struct autosa_kernel **sa_list = NULL;\n    isl_size n_sa = 0;\n    isl_schedule_node *band = get_outermost_permutable_node(schedule);\n    isl_size band_w = isl_schedule_node_band_n_member(band);\n    if (band_w <= 0) {\n        isl_schedule_node_free(band);\n        isl_schedule_free(schedule);\n        *num_sa = 0;\n        return NULL;\n    }\n\n    /* Explore 1D systolic array */\n    if (scop->options->autosa->max_sa_dim >= 1 && band_w >= 1)\n    {\n        if (scop->options->autosa->verbose)\n        {\n            printf(\"[AutoSA] Explore 1D systolic array.\\n\");\n        }\n        isl_size n_sa_dim = 0;\n        struct autosa_kernel **sa_dim_list = sa_space_time_transform_at_dim(\n            schedule, scop, 1, &n_sa_dim, n_sa);\n        if (scop->options->autosa->verbose)\n        {\n            printf(\"[AutoSA] %d candidates generated.\\n\", n_sa_dim);\n        }\n        sa_list = (struct autosa_kernel **)realloc(sa_list,\n                                                   (n_sa + n_sa_dim) * sizeof(struct autosa_kernel *));\n        for (int i = 0; i < n_sa_dim; i++)\n        {\n            sa_list[n_sa + i] = sa_dim_list[i];\n            sa_list[n_sa + i]->space_time_id = n_sa + i;            \n        }\n        free(sa_dim_list);\n        n_sa += n_sa_dim;\n    }\n    /* Explore 2D systolic array */\n    if (scop->options->autosa->max_sa_dim >= 2 && band_w >= 2)\n    {\n        if (scop->options->autosa->verbose)\n        {\n            printf(\"[AutoSA] Explore 2D systolic array.\\n\");\n        }\n        isl_size n_sa_dim = 0;\n        struct autosa_kernel **sa_dim_list = sa_space_time_transform_at_dim(\n            schedule, scop, 2, &n_sa_dim, n_sa);\n        if (scop->options->autosa->verbose)\n        {\n            printf(\"[AutoSA] %d candidates generated.\\n\", n_sa_dim);\n        }\n        
sa_list = (struct autosa_kernel **)realloc(sa_list,\n                                                   (n_sa + n_sa_dim) * sizeof(struct autosa_kernel *));\n        for (int i = 0; i < n_sa_dim; i++)\n        {\n            sa_list[n_sa + i] = sa_dim_list[i];\n            sa_list[n_sa + i]->space_time_id = n_sa + i;            \n        }\n        free(sa_dim_list);\n        n_sa += n_sa_dim;\n    }\n    /* Explore 3D systolic array */\n    if (scop->options->autosa->max_sa_dim >= 3 && band_w >= 3)\n    {\n        if (scop->options->autosa->verbose)\n        {\n            printf(\"[AutoSA] Explore 3D systolic array.\\n\");\n        }\n        isl_size n_sa_dim = 0;\n        struct autosa_kernel **sa_dim_list = sa_space_time_transform_at_dim(\n            schedule, scop, 3, &n_sa_dim, n_sa);\n        if (scop->options->autosa->verbose)\n        {\n            printf(\"[AutoSA] %d candidates generated.\\n\", n_sa_dim);\n        }\n        sa_list = (struct autosa_kernel **)realloc(sa_list,\n                                                   (n_sa + n_sa_dim) * sizeof(struct autosa_kernel *));\n        for (int i = 0; i < n_sa_dim; i++)\n        {\n            sa_list[n_sa + i] = sa_dim_list[i];\n            sa_list[n_sa + i]->space_time_id = n_sa + i;            \n        }\n        free(sa_dim_list);\n        n_sa += n_sa_dim;\n    }\n\n    isl_schedule_free(schedule);\n    isl_schedule_node_free(band);\n    *num_sa = n_sa;\n    /* Assign the kernel id */\n    for (int i = 0; i < n_sa; i++)\n    {\n        sa_list[i]->id = i;\n    }\n\n    return sa_list;\n}\n\n/* Initialize the space_time to autosa_loop_time, \n * and pe_opt to autosa_loop_default for all band nodes. 
*/\nstatic __isl_give isl_schedule_node *init_band_node_sa_properties(\n    __isl_take isl_schedule_node *node, void *user)\n{\n    if (!node)\n        return NULL;\n\n    struct autosa_kernel *sa = (struct autosa_kernel *)(user);\n\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n        int band_w = isl_schedule_node_band_n_member(node);\n        /* Initialize the SA properties. */\n        for (int i = 0; i < band_w; i++)\n        {\n            node = isl_schedule_node_band_member_set_space_time(node, i, autosa_loop_time);\n            node = isl_schedule_node_band_member_set_pe_opt(node, i, autosa_loop_default);\n            //node = isl_schedule_node_band_member_set_sched_pos(node, i, -1);\n        }\n    }\n\n    return node;\n}\n\n/* Initialize the space_time and pe_opt fields for each band node in the \n * schedule tree. */\nisl_stat sa_loop_init(struct autosa_kernel *sa)\n{\n    isl_schedule *schedule = sa->schedule;\n    isl_schedule_node *root = isl_schedule_get_root(schedule);\n    root = isl_schedule_node_map_descendant_bottom_up(root,\n                                                      &init_band_node_sa_properties, sa);\n\n    schedule = isl_schedule_node_get_schedule(root);\n    isl_schedule_node_free(root);\n    isl_schedule_free(sa->schedule);\n    sa->schedule = schedule;\n\n    return isl_stat_ok;\n}\n\n/* Set up the space_time properties. 
\n * As all the loops are initialized to be the time loop in the sa_loop_init(),\n * only the space loops are to be set.\n */\nisl_stat sa_space_time_loop_setup(struct autosa_kernel *sa)\n{\n    isl_schedule_node *node;\n    if (sa->type == AUTOSA_SA_TYPE_SYNC)\n    {\n        node = get_innermost_permutable_node(sa->schedule);\n        int dim = 0;\n        for (int i = isl_schedule_node_band_n_member(node) - sa->space_w;\n             i < isl_schedule_node_band_n_member(node); i++)\n        {\n            node = isl_schedule_node_band_member_set_space_time(node, i, autosa_loop_space);\n            sa->space_parallel[dim] = isl_schedule_node_band_member_get_coincident(node, i);\n            dim++;\n        }\n    }\n    else if (sa->type == AUTOSA_SA_TYPE_ASYNC)\n    {\n        node = get_outermost_permutable_node(sa->schedule);\n        int dim = 0;\n        for (int i = 0; i < sa->space_w; i++)\n        {\n            node = isl_schedule_node_band_member_set_space_time(node, i, autosa_loop_space);\n            sa->space_parallel[dim] = isl_schedule_node_band_member_get_coincident(node, i);\n            dim++;\n        }\n    }\n\n    isl_schedule *schedule = isl_schedule_node_get_schedule(node);\n    isl_schedule_node_free(node);\n    isl_schedule_free(sa->schedule);\n    sa->schedule = schedule;\n\n    return isl_stat_ok;\n}\n\n/* Internal struct used for sa_candidates_smart_pick. */\nstruct sa_candidates_smart_pick_update_data\n{\n    int score;\n    struct autosa_kernel *sa;\n    enum autosa_dep_type dep_type;\n};\n\n/* Internal struct used for not_carried_at_space. */\nstruct dep_space_test_internal_data\n{\n    isl_vec *dirvec;\n    isl_basic_map *dep;\n};\n\n/* This function tests if the current node contains any space loop.\n * If so, test if the dependence is carried by the space loops, and update the \n * dependence distance vector. 
\n * If the dependence is carried at the space loop, return false,\n * else return true.\n */\nstatic isl_bool not_carried_at_space(__isl_keep isl_schedule_node *node, void *user)\n{\n    struct dep_space_test_internal_data *data =\n        (struct dep_space_test_internal_data *)user;\n    isl_basic_map *dep = data->dep;\n    isl_basic_map *untagged_dep = isl_basic_map_from_map(\n        isl_map_factor_domain(isl_map_from_basic_map(isl_basic_map_copy(dep))));\n    if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n    {\n        isl_basic_map_free(untagged_dep);\n        return isl_bool_true;\n    }\n\n    /* Examine if there is any space loop in the current loop band. */\n    int n_dim = isl_schedule_node_band_n_member(node);\n    int n_space_dim, space_dim_start;\n    n_space_dim = 0;\n    for (int i = 0; i < n_dim; i++)\n    {\n        if (isl_schedule_node_band_member_get_space_time(node, i) == autosa_loop_space)\n        {\n            if (n_space_dim == 0)\n                space_dim_start = i;\n            n_space_dim++;\n        }\n    }\n\n    if (n_space_dim > 0)\n    {\n        isl_vec *disvec = get_dep_dis_at_node(untagged_dep, node);\n        isl_vec *dirvec = isl_vec_zero(isl_schedule_node_get_ctx(node), n_space_dim);\n        int carried = 0;\n        for (int i = 0; i < n_space_dim; i++)\n        {\n            isl_val *val = isl_vec_get_element_val(disvec, space_dim_start + i);\n            dirvec = isl_vec_set_element_si(dirvec, i, isl_val_get_num_si(val));\n            if (isl_val_get_num_si(val) > 0)\n                carried = 1;\n            isl_val_free(val);\n        }\n        data->dirvec = dirvec;\n        isl_vec_free(disvec);\n        isl_basic_map_free(untagged_dep);\n        if (carried)\n            return isl_bool_false;\n        else\n            return isl_bool_true;\n    }\n    isl_basic_map_free(untagged_dep);\n    return isl_bool_true;\n}\n\n/* Update the score for the array. 
\n * Specifically, add one credit if RAR is carried by space loops or \n * RAW is carried by time loops.\n */\nstatic isl_bool sa_candidates_smart_pick_update(__isl_keep isl_map *map, void *user)\n{\n    isl_basic_map_list *bmap_list = isl_map_get_basic_map_list(map);\n    struct sa_candidates_smart_pick_update_data *data =\n        (struct sa_candidates_smart_pick_update_data *)user;\n    struct autosa_kernel *sa = data->sa;\n    isl_schedule_node *node = isl_schedule_get_root(sa->schedule);\n\n    for (int i = 0; i < isl_map_n_basic_map(map); i++)\n    {\n        isl_basic_map *dep = isl_basic_map_list_get_basic_map(bmap_list, i);\n        struct dep_space_test_internal_data internal_data = {NULL, dep};\n        int is_carried_at_space = !isl_schedule_node_every_descendant(node,\n                                                                      not_carried_at_space, &internal_data);\n        if (is_carried_at_space && data->dep_type == AUTOSA_DEP_RAR)\n            data->score += 1;\n        else if (!is_carried_at_space && data->dep_type == AUTOSA_DEP_RAW)\n            data->score += 1;\n\n        isl_vec_free(internal_data.dirvec);\n        isl_basic_map_free(dep);\n    }\n    isl_schedule_node_free(node);\n    isl_basic_map_list_free(bmap_list);\n    return isl_bool_true;\n}\n\n/* Select one systolic array design based on heuristics. \n * Heuristic:\n * We favor designs with the following features:\n * - RAR carried by space loops. \n * - RAW carried by time loops. 
\n * We compute a score for each design and select the one with the highest score.\n * The score is computed as:\n * score = 1 * (RAR carried by space loops || RAW carried by time loops)\n * Namely, for each dependence, if it is a RAR carried by space loops or a RAW carried by \n * time loops, it contributes one credit to the total score.\n * Besides, between 1D and 2D systolic arrays, we prefer 2D systolic arrays for now.\n */\nstruct autosa_kernel *sa_candidates_smart_pick(\n    struct autosa_kernel **sa_list, __isl_keep isl_size num_sa)\n{\n    assert(num_sa > 0);\n    int max_score = -1;\n    struct autosa_kernel *sa_opt;\n    int opt_id = 0;\n    isl_union_map *dep_rar, *dep_flow;\n\n    for (int i = 0; i < num_sa; i++)\n    {\n        struct autosa_kernel *sa = sa_list[i];\n        struct sa_candidates_smart_pick_update_data data;\n        data.score = 0;\n        data.sa = sa;\n        /* Initialize the autosa_loop_types. */\n        sa_loop_init(sa);\n        /* Set up the space_time properties. */\n        sa_space_time_loop_setup(sa);\n\n        dep_rar = sa->scop->tagged_dep_rar;\n        dep_flow = sa->scop->tagged_dep_flow;\n\n        data.dep_type = AUTOSA_DEP_RAR;\n        isl_union_map_every_map(dep_rar, &sa_candidates_smart_pick_update, &data);\n        data.dep_type = AUTOSA_DEP_RAW;\n        isl_union_map_every_map(dep_flow, &sa_candidates_smart_pick_update, &data);\n        /* Add one more credit for 2D arrays. */\n        if (sa->n_sa_dim == 2)\n            data.score += 1;\n        if (data.score > max_score)\n        {\n            opt_id = i;\n            max_score = data.score;\n        }        \n    }\n\n    //sa_opt = autosa_kernel_copy(sa_list[opt_id]);\n    sa_opt = sa_list[opt_id];\n\n    for (int i = 0; i < num_sa; i++) {\n        if (i == opt_id)\n            continue;\n        else\n            autosa_kernel_free(sa_list[i]);\n    }\n    free(sa_list);\n\n    return sa_opt;\n}\n\n/* Return the selected systolic array design and free the rest. 
*/\nstruct autosa_kernel *sa_candidates_manual_pick(struct autosa_kernel **sa_list,\n                                                isl_size num_sa, int sa_id)\n{\n    struct autosa_kernel *sa_opt = sa_list[sa_id];\n\n    for (int i = 0; i < num_sa; i++) {        \n        if (sa_id == i)\n            continue;\n        else\n            autosa_kernel_free(sa_list[i]);\n    }\n    free(sa_list);\n\n    return sa_opt;\n}\n\n/* Create the array of autosa_local_array_info structures \"array\"\n * inside \"kernel\". The number of elements in this array is \n * the same as the number of arrays in \"prog\".\n * Initialize the \"array\" field of each local array to point \n * to the corresponding array in \"prog\".\n */\nstatic struct autosa_kernel *autosa_kernel_create_local_arrays(\n    struct autosa_kernel *kernel, struct autosa_prog *prog)\n{\n    int i;\n    isl_ctx *ctx;\n\n    if (!kernel)\n        return NULL;\n\n    ctx = isl_set_get_ctx(prog->context);\n    //kernel->array = isl_calloc_array(ctx,\n    //                                 struct autosa_local_array_info, prog->n_array);\n    /* Initialize local_array_info */\n    kernel->array = new autosa_local_array_info[prog->n_array];\n    if (!kernel->array)\n        return (struct autosa_kernel *)autosa_kernel_free(kernel);\n    kernel->n_array = prog->n_array;\n\n    for (i = 0; i < prog->n_array; i++)\n    {\n        kernel->array[i].array = &prog->array[i];\n        prog->array[i].local_array = &kernel->array[i];\n        /* Initialize the fields. 
*/\n        kernel->array[i].n_io_group_refs = 0;\n        kernel->array[i].n_mem_ports = 0;\n        kernel->array[i].host_serialize = 0;\n        kernel->array[i].serialize_bound = NULL;\n        /* Initialize the sparse information */\n        kernel->array[i].is_sparse = 0;\n        kernel->array[i].vec_len = 0;\n        kernel->array[i].n_nzero = 0;\n        kernel->array[i].compress_ratio = 0.0f;\n        kernel->array[i].n_meta_data = 0;\n        kernel->array[i].eff_compress_ratio = 0.0f;\n        kernel->array[i].global = 0;\n    }\n\n    return kernel;\n}\n\n/* Internal data struct used for sa_io_update. */\nstruct data_transfer_opt_data\n{\n    struct autosa_stmt_access *access;\n    struct autosa_kernel *kernel;\n    enum autosa_dep_type dep_type;\n    isl_bool is_update;\n};\n\n/* If the dependence is carried by a space loop, mark the associated access \n * as exterior I/O; otherwise, mark it as interior I/O.\n * In addition, update the dependence vector.\n */\nisl_stat data_transfer_update(__isl_keep isl_basic_map *dep, struct data_transfer_opt_data *data)\n{\n    struct autosa_stmt_access *access = data->access;\n    struct autosa_kernel *kernel = data->kernel;\n    isl_id *src_id, *dest_id;\n    isl_space *space;\n    isl_space *src_space, *dest_space;\n    isl_schedule_node *node;\n\n    /* Test if the access is associated with the current dep. 
*/\n    space = isl_basic_map_get_space(dep);\n    src_space = isl_space_unwrap(isl_space_domain(isl_space_copy(space)));\n    dest_space = isl_space_unwrap(isl_space_range(space));\n    src_id = isl_space_get_tuple_id(src_space, isl_dim_out);\n    dest_id = isl_space_get_tuple_id(dest_space, isl_dim_out);\n    isl_space_free(src_space);\n    isl_space_free(dest_space);\n\n    if (src_id != access->ref_id && dest_id != access->ref_id)\n    {\n        isl_id_free(src_id);\n        isl_id_free(dest_id);\n        return isl_stat_ok;\n    }\n    isl_id_free(src_id);\n    isl_id_free(dest_id);\n\n    /* Test if the dependence is carried at the space loop. */\n    struct dep_space_test_internal_data internal_data = {NULL, dep};\n    node = isl_schedule_get_root(kernel->schedule);\n    int is_carried_at_space = !isl_schedule_node_every_descendant(\n        node, not_carried_at_space, &internal_data);\n    if (is_carried_at_space)\n    {\n        access->io_info = (struct autosa_io_info **)realloc(\n            access->io_info, sizeof(struct autosa_io_info *) * (++access->n_io_info));\n        access->io_info[access->n_io_info - 1] =\n            (struct autosa_io_info *)malloc(sizeof(struct autosa_io_info));\n        access->io_info[access->n_io_info - 1]->io_type = AUTOSA_EXT_IO;\n        access->io_info[access->n_io_info - 1]->dep =\n            (struct autosa_dep *)calloc(1, sizeof(struct autosa_dep));\n        access->io_info[access->n_io_info - 1]->dep->isl_dep = isl_basic_map_copy(dep);\n        access->io_info[access->n_io_info - 1]->dep->type = data->dep_type;\n        access->io_info[access->n_io_info - 1]->dir = internal_data.dirvec;\n        access->io_info[access->n_io_info - 1]->old_dir = isl_vec_dup(internal_data.dirvec);        \n    }\n    else\n    {\n        access->io_info = (struct autosa_io_info **)realloc(\n            access->io_info, sizeof(struct autosa_io_info *) * (++access->n_io_info));\n        access->io_info[access->n_io_info - 1] =\n        
    (struct autosa_io_info *)malloc(sizeof(struct autosa_io_info));\n        access->io_info[access->n_io_info - 1]->io_type = AUTOSA_INT_IO;\n        access->io_info[access->n_io_info - 1]->dep =\n            (struct autosa_dep *)calloc(1, sizeof(struct autosa_dep));\n        access->io_info[access->n_io_info - 1]->dep->isl_dep = isl_basic_map_copy(dep);\n        access->io_info[access->n_io_info - 1]->dep->type = data->dep_type;\n        access->io_info[access->n_io_info - 1]->dir = internal_data.dirvec;\n        access->io_info[access->n_io_info - 1]->old_dir = isl_vec_dup(internal_data.dirvec);        \n    }\n\n    isl_schedule_node_free(node);\n    data->is_update = isl_bool_true;\n\n    return isl_stat_ok;\n}\n\n/* Examine each dependence as basic maps in the \"map\".\n */\nstatic isl_bool data_transfer_update_wrap(__isl_keep isl_map *map, void *user)\n{\n    isl_basic_map_list *bmap_list = isl_map_get_basic_map_list(map);\n    for (int i = 0; i < isl_map_n_basic_map(map); i++)\n    {\n        isl_basic_map *dep = isl_basic_map_list_get_basic_map(bmap_list, i);\n        struct data_transfer_opt_data *opt_data = (struct data_transfer_opt_data *)user;\n        data_transfer_update(dep, opt_data);\n        isl_basic_map_free(dep);\n    }\n    isl_basic_map_list_free(bmap_list);\n    return isl_bool_true;\n}\n\n/* This function extracts the communication pairs from the kernel.\n * Each access is paired with the dependence it is associated with.\n * We consider three types of deps: RAR, RAW, WAW.\n * For each comm pair <access, dep>, we update two properties:\n * - I/O type: exterior I/O or interior I/O.\n * - I/O direction: the dependence vector on the space loops.\n */\nstatic isl_stat sa_io_update(struct autosa_kernel *sa)\n{\n    struct autosa_local_array_info *local_array;\n    /* Initialize the IO info */\n    for (int i = 0; i < sa->n_array; i++)\n    {\n        local_array = &sa->array[i];\n        for (int j = 0; j < sa->array[i].array->n_ref; j++)\n     
   {\n            struct autosa_stmt_access *access = sa->array[i].array->refs[j];\n            access->n_io_info = 0;\n            access->io_info = NULL;\n        }\n        local_array->n_lane = 0;\n        local_array->array->n_lane = 0;\n    }\n\n    /* Update the IO information */\n    for (int i = 0; i < sa->n_array; i++)\n    {\n        local_array = &sa->array[i];\n        local_array->array_type = AUTOSA_UNKNOWN_ARRAY;\n        for (int j = 0; j < local_array->array->n_ref; j++)\n        {\n            struct autosa_stmt_access *access = local_array->array->refs[j];\n            isl_union_map *dep_rar = sa->scop->tagged_dep_rar;\n            isl_union_map *dep_flow = sa->scop->tagged_dep_flow;\n            isl_union_map *dep_waw = sa->scop->tagged_dep_waw;\n            struct data_transfer_opt_data opt_data =\n                {access, sa, AUTOSA_DEP_UNKNOWN, isl_bool_false};\n\n            opt_data.dep_type = AUTOSA_DEP_RAR;\n            isl_union_map_every_map(dep_rar, &data_transfer_update_wrap, &opt_data);\n            if (opt_data.is_update == isl_bool_true)\n            {\n                local_array->array_type = AUTOSA_EXT_ARRAY;\n                opt_data.is_update = isl_bool_false;\n            }\n            opt_data.dep_type = AUTOSA_DEP_RAW;\n            isl_union_map_every_map(dep_flow, &data_transfer_update_wrap, &opt_data);\n            if (opt_data.is_update == isl_bool_true)\n            {\n                local_array->array_type = AUTOSA_INT_ARRAY;\n                opt_data.is_update = isl_bool_false;\n            }\n            opt_data.dep_type = AUTOSA_DEP_WAW;\n            isl_union_map_every_map(dep_waw, &data_transfer_update_wrap, &opt_data);\n        }\n    }\n\n    return isl_stat_ok;\n}\n\nvoid extract_sa_dims_from_node(__isl_keep isl_schedule_node *node, int *sa_dims, int n_sa_dim)\n{\n    int *ubs;\n    ubs = extract_band_upper_bounds(node);\n    for (int i = 0; i < n_sa_dim; i++) {\n        sa_dims[i] = ubs[i];\n    }\n    
free(ubs);    \n}\n\n/* Apply array partitioning.\n * Apply loop tiling on the band that contains the space loops.\n * In addition, if L2 array partitioning is enabled, we will tile the tile loops\n * from the previous array partitioning again to generate two-level tiling.\n * TODO: Reorganize the array partitioning loops and place them following the\n * ascending order of the dependence distances. \n * \n * en: enable signal for array partitioning.\n * mode: optimization mode for array partitioning.\n * L2_en: enable signal for L2 array partitioning.\n * L2_mode: optimization mode for L2 array partitioning.\n */\nisl_stat sa_array_partitioning_optimize(struct autosa_kernel *sa,\n                                        bool en, char *mode, bool L2_en, char *L2_mode)\n{\n    int tile_len;\n    isl_schedule *schedule;\n    int *tile_size;\n    isl_id *id;\n\n    /* Fetch the band that contains the space loops. */\n    isl_schedule_node *node;\n    if (sa->type == AUTOSA_SA_TYPE_SYNC)\n    {\n        node = get_innermost_permutable_node(sa->schedule);\n    }\n    else if (sa->type == AUTOSA_SA_TYPE_ASYNC)\n    {\n        node = get_outermost_permutable_node(sa->schedule);\n    }\n    else\n    {\n        isl_die(sa->ctx, isl_error_invalid,\n                \"systolic array type not supported\", return isl_stat_error);\n    }\n\n    if (!en)\n    {\n        /* Array partitioning is disabled, we will simply add an \"array\" mark before\n         * the space band and return.\n         */\n        id = isl_id_alloc(sa->ctx, \"array\", NULL);\n        node = isl_schedule_node_insert_mark(node, id);\n\n        isl_schedule_free(sa->schedule);\n        sa->schedule = isl_schedule_node_get_schedule(node);\n        isl_schedule_node_free(node);\n        return isl_stat_ok;\n    }\n\n    printf(\"[AutoSA] Apply array partitioning.\\n\");\n\n    /* Mark the loop properties. 
*/\n    for (int i = 0; i < isl_schedule_node_band_n_member(node); i++)\n    {\n        node = isl_schedule_node_band_member_set_pe_opt(node, i, autosa_loop_array_part);\n    }\n    schedule = isl_schedule_node_get_schedule(node);\n\n    if (sa->scop->options->autosa->verbose)\n    {\n        /* Display the candidate loops. */\n        isl_printer *p = isl_printer_to_file(sa->ctx, stdout);\n        p = isl_printer_set_yaml_style(p, ISL_YAML_STYLE_BLOCK);\n        p = isl_printer_print_schedule(p, schedule);\n        printf(\"\\n\");\n        isl_printer_free(p);\n    }\n    isl_schedule_free(schedule);\n\n    tile_len = isl_schedule_node_band_n_member(node);\n    if (sa->scop->options->autosa->tuning_method == 1) {\n        /* Select one tiling factor in between (1, ub).\n         * Avoid 1, as such a tiling factor will eliminate the optimization opportunities for the \n         * later stages. \n         * Avoid ub, as it will generate a loop with a single iteration that will be eliminated.\n         */\n        tile_size = extract_band_upper_bounds(node);\n        for (int i = 0; i < tile_len; i++) {\n            int size = tile_size[i];\n            std::vector<int> factors = get_factors(size);\n            if (factors.size() < 3) {\n                printf(\"[AutoSA] Error: Cannot find legal tiling factors for auto-tuning template!\\n\");\n                exit(1);\n            }\n            tile_size[i] = factors[factors.size() - 2];\n        }\n    } else {\n        if (!strcmp(mode, \"manual\"))\n        {\n            /* Manual mode */\n            tile_size = read_array_part_tile_sizes(sa, tile_len);\n            if (!tile_size)\n            {\n                /* User hasn't specified the tiling factors for array partitioning yet,\n                 * we will dump out the number and upper bounds of array_part loops \n                 * and exit the program.
*/\n                int *ubs = extract_band_upper_bounds(node);\n                FILE *fp;\n                char *content;\n                cJSON *tuning, *array_part_json, *loops_json, *n_sa_dim_json;\n                isl_printer *p_str;\n                char *tuning_path;\n\n                tuning = cJSON_CreateObject();\n                array_part_json = cJSON_CreateObject();\n                cJSON_AddItemToObject(tuning, \"array_part\", array_part_json);\n                loops_json = cJSON_CreateArray();\n                cJSON_AddItemToObject(array_part_json, \"tilable_loops\", loops_json);\n                for (int i = 0; i < tile_len; i++)\n                {\n                    cJSON *loop = cJSON_CreateNumber(ubs[i]);\n                    cJSON_AddItemToArray(loops_json, loop);\n                }\n                /* Add the sa_dim */\n                n_sa_dim_json = cJSON_CreateNumber(sa->n_sa_dim);\n                cJSON_AddItemToObject(array_part_json, \"n_sa_dim\", n_sa_dim_json);\n                p_str = isl_printer_to_str(sa->ctx);\n                p_str = isl_printer_print_str(p_str, sa->options->autosa->output_dir);\n                p_str = isl_printer_print_str(p_str, \"/tuning.json\");\n                tuning_path = isl_printer_get_str(p_str);\n                fp = fopen(tuning_path, \"w\");\n                content = cJSON_Print(tuning);\n                fprintf(fp, \"%s\", content);\n                cJSON_Delete(tuning);\n                isl_printer_free(p_str);\n                free(tuning_path);\n                exit(0);\n            }\n        }\n        else\n        {\n            /* Auto mode.\n             * Perform the array partitioning following the default policy. */\n            tile_size = read_default_array_part_tile_sizes(sa, tile_len);\n        }\n    }\n\n    /* Tile the band. 
*/\n    if (!tile_size)\n    {\n        isl_schedule_node_free(node);\n        return isl_stat_error;\n    }    \n    /* Examine if all tiling factors are -1, in that case, we will skip array \n     * partitioning. \n     */\n    int c;\n    for (c = 0; c < tile_len; c++) {\n        if (tile_size[c] != -1)\n            break;\n    }\n    if (c == tile_len) {\n        id = isl_id_alloc(sa->ctx, \"array\", NULL);\n        node = isl_schedule_node_insert_mark(node, id);\n        node = isl_schedule_node_child(node, 0);\n        extract_sa_dims_from_node(node, sa->sa_dim, sa->n_sa_dim);        \n\n        free(tile_size);\n        isl_schedule_free(sa->schedule);\n        sa->schedule = isl_schedule_node_get_schedule(node);\n        isl_schedule_node_free(node);\n        return isl_stat_ok;\n    }\n    /* For now, our codegen doesn't support arrays with size one at any dim. \n     * We will examine if array size is one at any dimension, and return if found. \n     */\n    for (int i = 0; i < sa->n_sa_dim; i++)\n    {\n        if ((sa->type == AUTOSA_SA_TYPE_SYNC && tile_size[tile_len - sa->n_sa_dim + i] == 1) ||\n           (sa->type == AUTOSA_SA_TYPE_ASYNC && tile_size[i] == 1)) {            \n            printf(\"[AutoSA] Tiling factor 1 for array partitioning is not supported. Array partitioning is skipped.\\n\");\n            /* Skip the array partition. 
*/\n            id = isl_id_alloc(sa->ctx, \"array\", NULL);\n            node = isl_schedule_node_insert_mark(node, id);\n            node = isl_schedule_node_child(node, 0);\n            extract_sa_dims_from_node(node, sa->sa_dim, sa->n_sa_dim);\n\n            free(tile_size);\n            isl_schedule_free(sa->schedule);\n            sa->schedule = isl_schedule_node_get_schedule(node);\n            isl_schedule_node_free(node);\n            return isl_stat_ok;\n        }\n    }\n        \n    sa->array_part_w = tile_len;\n    node = autosa_tile_band(node, tile_size);\n    if (sa->scop->options->autosa->tuning_method == 1)\n        node = sa->tuning_program->tile(node, 0, \"array_part\");\n\n    free(tile_size);\n    node = isl_schedule_node_child(node, 0);\n    extract_sa_dims_from_node(node, sa->sa_dim, sa->n_sa_dim);    \n    node = isl_schedule_node_parent(node);\n\n    /* Reorder the array part loops based on the dependence distance. */    \n    node = reorder_band_by_dep_dis(node);\n\n    /* Add the array marker */\n    node = isl_schedule_node_child(node, 0);\n    id = isl_id_alloc(sa->ctx, \"array\", NULL);\n    node = isl_schedule_node_insert_mark(node, id);\n    node = isl_schedule_node_parent(node);\n\n    /* Examine if there is any flow dep carried in the array_part band. \n     * For this case, we need to implement a credit-based dependence queue to \n     * force the possible data dependence between two array partitions. \n     * TODO: implement this feature. 
\n     */\n    //if (!sa->options->autosa->credit_control)\n    //{\n    //    for (int i = 0; i < isl_schedule_node_band_n_member(node); i++)\n    //    {\n    //        if (!isl_schedule_node_band_member_get_coincident(node, i))\n    //        {\n    //            printf(\"[AutoSA] Warning: Flow deps carried in the array partitioning band.\\n\");\n    //            printf(\"[AutoSA] Warning: Using simple task pipelining could lead to potential data hazards.\\n\");\n    //            printf(\"[AutoSA] Warning: The program will proceed as usual. You could consider enabling credit control.\\n\");\n    //            break;\n    //        }\n    //    }\n    //}\n    //else\n    //{\n    //    printf(\"[AutoSA] Error: Credit control is not supported yet!\\n\");\n    //    exit(1);\n    //    // TODO: modify the schedule to add credit rd/wr for I/O modules\n    //    // TODO: modify the module decls and fifo decls for credit fifos\n    //    // TODO: disable double buffering.\n    //    //    /* Disable double-buffering */\n    //    //    sa->options->autosa->double_buffer = 0;\n    //}\n\n    /* If two-level buffering is enabled, we will need to apply a second-level tiling\n   * on the tile band from the previous array partitioning. \n   * Namely, after array partitioning, we get two bands:\n   * T\n   * |\n   * P\n   * To support two-level buffering, we will tile the band T again:\n   * T1\n   * |\n   * T2\n   * |\n   * P\n   */\n    if (sa->options->autosa->two_level_buffer)\n    {\n        if (L2_en)\n        {\n            /* Tile the band again */\n            printf(\"[AutoSA] Two-level buffering is set. 
Apply second-level array partitioning.\\n\");\n            tile_len = isl_schedule_node_band_n_member(node);\n            if (!strcmp(mode, \"manual\"))\n            {\n                tile_size = read_array_part_L2_tile_sizes(sa, tile_len);\n                if (!tile_size)\n                {\n                    /* Dump out the number and upper bounds of array_part loops and exit the program. */\n                    int *ubs = extract_band_upper_bounds(node);\n                    int *loop_coincident = (int *)malloc(sizeof(int) * tile_len);\n                    FILE *fp;\n                    char *content;\n                    cJSON *tuning, *array_part_json, *loops_json;\n                    isl_printer *p_str;\n                    char *tuning_path;\n\n                    for (int i = 0; i < tile_len; i++)\n                    {\n                        loop_coincident[i] = isl_schedule_node_band_member_get_coincident(node, i);\n                    }\n\n                    tuning = cJSON_CreateObject();\n                    array_part_json = cJSON_CreateObject();\n                    cJSON_AddItemToObject(tuning, \"array_part_L2\", array_part_json);\n                    loops_json = cJSON_CreateArray();\n                    cJSON_AddItemToObject(array_part_json, \"tilable_loops\", loops_json);\n                    for (int i = 0; i < tile_len; i++)\n                    {\n                        cJSON *loop = cJSON_CreateNumber(ubs[i]);\n                        cJSON_AddItemToArray(loops_json, loop);\n                    }\n                    loops_json = cJSON_CreateArray();\n                    cJSON_AddItemToObject(array_part_json, \"coincident\", loops_json);\n                    for (int i = 0; i < tile_len; i++)\n                    {\n                        cJSON *loop = cJSON_CreateNumber(loop_coincident[i]);\n                        cJSON_AddItemToArray(loops_json, loop);\n                    }\n                    p_str = 
isl_printer_to_str(sa->ctx);\n                    p_str = isl_printer_print_str(p_str, sa->options->autosa->output_dir);\n                    p_str = isl_printer_print_str(p_str, \"/tuning.json\");\n                    tuning_path = isl_printer_get_str(p_str);\n                    fp = fopen(tuning_path, \"w\");\n                    content = cJSON_Print(tuning);\n                    fprintf(fp, \"%s\", content);\n                    cJSON_Delete(tuning);\n                    free(tuning_path);\n                    free(loop_coincident);\n                    isl_printer_free(p_str);\n                    free(ubs);\n                    exit(0);\n                }\n            }\n            else\n            {\n                /* Perform second-level array partitioning following the default policy. */\n                // tile_size = read_default_array_part_L2_tile_sizes(sa, tile_len);\n                int *ubs = extract_band_upper_bounds(node);\n                tile_size = isl_alloc_array(sa->ctx, int, tile_len);\n                for (int i = 0; i < tile_len; i++)\n                {\n                    tile_size[i] = ubs[i];\n                }\n                free(ubs);\n            }\n\n            if (!tile_size)\n            {\n                isl_schedule_node_free(node);\n                return isl_stat_error;\n            }\n            node = autosa_tile_band(node, tile_size);\n            free(tile_size);\n\n            /* Add the second-level array mark */\n            node = isl_schedule_node_child(node, 0);\n            id = isl_id_alloc(sa->ctx, \"array_L2\", NULL);\n            node = isl_schedule_node_insert_mark(node, id);\n            node = isl_schedule_node_parent(node);\n        }\n        else\n        {\n            /* Disable the L2 array partitioning */\n            sa->options->autosa->two_level_buffer = 0;\n        }\n    }\n\n    /* Clean up the band pe_opt properties. 
*/\n    schedule = isl_schedule_node_get_schedule(node);\n    isl_schedule_node_free(node);\n    schedule = isl_schedule_map_schedule_node_bottom_up(\n        schedule, &clear_pe_opt_prop, NULL);\n\n    isl_schedule_free(sa->schedule);\n    sa->schedule = schedule;\n\n    return isl_stat_ok;\n}\n\n/* Insert an \"hls_pipeline\" mark under the last time loop */\nstatic __isl_give isl_schedule_node *add_hls_pipeline(\n    __isl_take isl_schedule_node *node, void *user)\n{\n    struct autosa_kernel *sa = (struct autosa_kernel *)user;\n    isl_ctx *ctx = sa->ctx;\n\n    if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n        return node;\n\n    /* Examine if the node is innermost */\n    node = isl_schedule_node_child(node, 0);\n    isl_bool no_inner_band = isl_schedule_node_every_descendant(node,\n                                                                &no_permutable_node, NULL);\n    node = isl_schedule_node_parent(node);\n    if (!no_inner_band)\n        return node;\n\n    int n = isl_schedule_node_band_n_member(node);\n\n    if (sa->type == AUTOSA_SA_TYPE_ASYNC)\n    {\n        if (isl_schedule_node_band_member_get_space_time(node, n - 1) == autosa_loop_time)\n        {\n            isl_id *id;\n            id = isl_id_alloc(ctx, \"hls_pipeline\", NULL);\n            node = isl_schedule_node_child(node, 0);\n            node = isl_schedule_node_insert_mark(node, id);\n            node = isl_schedule_node_parent(node);\n        }\n    }\n    else if (sa->type == AUTOSA_SA_TYPE_SYNC)\n    {\n        /* Go to the innermost band with time loops. 
*/\n        if (isl_schedule_node_band_member_get_space_time(node, 0) != autosa_loop_time)\n        {\n            node = isl_schedule_node_parent(node);\n            while (isl_schedule_node_get_type(node) != isl_schedule_node_band &&\n                   isl_schedule_node_has_parent(node))\n            {\n                node = isl_schedule_node_parent(node);\n            }\n        }\n        if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n        {\n            n = isl_schedule_node_band_n_member(node);\n            for (int i = n - 1; i >= 0; i--)\n            {\n                if (isl_schedule_node_band_member_get_space_time(node, i) == autosa_loop_time)\n                {\n                    isl_id *id = isl_id_alloc(ctx, \"hls_pipeline\", NULL);\n                    if (i != n - 1)\n                    {\n                        node = isl_schedule_node_band_split(node, i + 1);\n                    }\n                    node = isl_schedule_node_child(node, 0);\n                    node = isl_schedule_node_insert_mark(node, id);\n                    node = isl_schedule_node_parent(node);\n                    break;\n                }\n            }\n        }\n    }\n\n    return node;\n}\n\n/* Internal struct used for latency_opt_check */\nstruct latency_opt_check_data\n{\n    struct autosa_kernel *kernel;\n    int is_required;\n};\n\n/* Check if the innermost time loop is parallel.\n * If this loop is parallel, it can be used for latency hiding and \n * there is no need for further optimization.\n * We will split off this loop from the band, and attach a \"latency\"\n * marker above it.\n */\nstatic __isl_give isl_schedule_node *latency_opt_check(\n    __isl_take isl_schedule_node *node, void *user)\n{\n    struct latency_opt_check_data *data = (struct latency_opt_check_data *)user;\n    struct autosa_kernel *sa = data->kernel;\n    isl_ctx *ctx = sa->ctx;\n\n    if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n        
return node;\n\n    /* Examine if the node is innermost */\n    node = isl_schedule_node_child(node, 0);\n    isl_bool no_inner_band = isl_schedule_node_every_descendant(node,\n                                                                &no_permutable_node, NULL);\n    node = isl_schedule_node_parent(node);\n    if (!no_inner_band)\n        return node;\n\n    int n = isl_schedule_node_band_n_member(node);\n\n    if (sa->type == AUTOSA_SA_TYPE_ASYNC)\n    {\n        if (isl_schedule_node_band_member_get_coincident(node, n - 1) &&\n            isl_schedule_node_band_member_get_space_time(node, n - 1) == autosa_loop_time)\n        {\n            //isl_id *id;\n            data->is_required = 0;\n            ///* Split off the loop and attach a \"latency\" mark */\n            //if (n > 1)\n            //{\n            //    node = isl_schedule_node_band_split(node, n - 1);\n            //    node = isl_schedule_node_child(node, 0);\n            //}\n            //id = isl_id_alloc(ctx, \"latency\", NULL);\n            //node = isl_schedule_node_insert_mark(node, id);\n            //node = isl_schedule_node_parent(node);\n        }\n    }\n    else if (sa->type == AUTOSA_SA_TYPE_SYNC)\n    {        \n        if (isl_schedule_node_band_member_get_space_time(node, 0) != autosa_loop_time)\n        {\n            node = isl_schedule_node_parent(node);\n            while (isl_schedule_node_get_type(node) != isl_schedule_node_band &&\n                   isl_schedule_node_has_parent(node))\n            {\n                node = isl_schedule_node_parent(node);\n            }\n        }\n        if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n        {\n            n = isl_schedule_node_band_n_member(node);\n            for (int i = n - 1; i >= 0; i--)\n            {\n                if (isl_schedule_node_band_member_get_space_time(node, i) == autosa_loop_time)\n                {\n                    if (isl_schedule_node_band_member_get_coincident(node, 
i))\n                    {\n                        //isl_id *id;\n                        data->is_required = 0;\n                        ///* Split off the time loop */\n                        //if (i > 1)\n                        //{\n                        //    node = isl_schedule_node_band_split(node, i);\n                        //    node = isl_schedule_node_child(node, 0);\n                        //}\n                        //if (n - i - 1 > 0)\n                        //{\n                        //    node = isl_schedule_node_band_split(node, 1);\n                        //}\n                        //id = isl_id_alloc(ctx, \"latency\", NULL);\n                        //node = isl_schedule_node_insert_mark(node, id);\n                        //node = isl_schedule_node_parent(node);\n                    }\n                    break;\n                }\n            }\n        }\n    }\n\n    return node;\n}\n\n/* Mark parallel loop as latency_hiding candidate loop. \n */\nstatic isl_schedule_node *detect_latency_hiding_loop(__isl_take isl_schedule_node *node, void *user)\n{\n    struct autosa_kernel *sa = (struct autosa_kernel *)user;\n\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n        for (int i = 0; i < isl_schedule_node_band_n_member(node); i++)\n        {\n            if (isl_schedule_node_band_member_get_coincident(node, i))\n            {\n                node = isl_schedule_node_band_member_set_pe_opt(node, i, autosa_loop_latency);\n            }\n        }\n    }\n\n    return node;\n}\n\n/* Examine if the node is the last band node.\n * If so, add a \"latency\" mark before the node. 
\n */\nstatic __isl_give isl_schedule_node *add_latency_mark(\n    __isl_take isl_schedule_node *node, void *user)\n{\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n        node = isl_schedule_node_child(node, 0);\n        isl_bool no_inner_band = isl_schedule_node_every_descendant(node,\n                                                                    &no_permutable_node, NULL);\n        node = isl_schedule_node_parent(node);\n        if (no_inner_band)\n        {\n            /* Insert the \"latency\" mark. */\n            isl_id *id = isl_id_alloc(isl_schedule_node_get_ctx(node), \"latency\", NULL);\n            node = isl_schedule_node_insert_mark(node, id);\n        }\n    }\n\n    return node;\n}\n\n/* Sink the current node (latency hiding loop) as the last time loop. \n * If the array is async, then sink the node to the bottom.\n * If the array is sync, then lift it up and insert it as the last loop \n * in the time band.\n */\n__isl_give isl_schedule_node *autosa_latency_node_band_sink_time(\n    __isl_take isl_schedule_node *node, struct autosa_kernel *sa)\n{\n    if (sa->type == AUTOSA_SA_TYPE_ASYNC)\n    {\n        if (sa->options->autosa->isl_sink) {\n            node = isl_schedule_node_band_sink(node);\n            /* Add the \"latency\" mark. */\n            node = isl_schedule_node_map_descendant_bottom_up(\n                node, &add_latency_mark, NULL);\n\n        } \n        else {         \n            node = autosa_node_sink_to_mark(node, \"latency\");            \n        }\n    }\n    else if (sa->type == AUTOSA_SA_TYPE_SYNC)\n    {\n        /* Move up to the node that contains the space loop.\n         * The current node should be right below the space band.\n         */\n        node = isl_schedule_node_parent(node);\n\n        /* Find the position of the first space loop. 
*/\n        int n_member = isl_schedule_node_band_n_member(node);\n        int space_pos;\n        for (int i = 0; i < n_member; i++)\n        {\n            if (isl_schedule_node_band_member_get_space_time(node, i) == autosa_loop_space)\n            {\n                space_pos = i;\n                break;\n            }\n        }\n        if (space_pos == 0)\n        {\n            /* Interchange the current node with the child node. */\n            node = autosa_node_interchange(node);\n            /* Insert the \"latency\" mark. */\n            isl_id *id = isl_id_alloc(sa->ctx, \"latency\", NULL);\n            node = isl_schedule_node_insert_mark(node, id);\n            node = isl_schedule_node_child(node, 0);\n            node = isl_schedule_node_child(node, 0);\n        }\n        else\n        {\n            node = isl_schedule_node_band_split(node, space_pos);\n            node = isl_schedule_node_child(node, 0);\n            /* Interchange the current node with the child node. */\n            node = autosa_node_interchange(node);\n            /* Insert the \"latency\" mark. */\n            isl_id *id = isl_id_alloc(sa->ctx, \"latency\", NULL);\n            node = isl_schedule_node_insert_mark(node, id);\n            node = isl_schedule_node_child(node, 0);\n            node = isl_schedule_node_child(node, 0);\n        }\n    }\n\n    return node;\n}\n\n/* Given each node band, tile the candidate loop and permute it innermost in the time\n * loop band. 
\n * If the tile size is no greater than 1, the candidate loop is skipped.\n * For each point loop, a \"latency\" mark is added.\n */\nstatic __isl_give isl_schedule_node *autosa_latency_tile_band_loop(\n    __isl_take isl_schedule_node *node, void *user)\n{\n    struct autosa_pe_opt_tile_data *data = (struct autosa_pe_opt_tile_data *)user;\n    if (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n        return node;\n\n    int n;\n    isl_id *id;\n    n = isl_schedule_node_band_n_member(node);\n    int i;\n    int reverse_visit = 0;\n    \n    if (data->sa->options->autosa->reverse_order) {        \n        if (data->sa->options->autosa->isl_sink) {\n            i = n - 1;\n            reverse_visit = 1;            \n        } else {\n            i = 0;\n            reverse_visit = 0;    \n        }\n    } else {\n        if (data->sa->options->autosa->isl_sink) {            \n            i = 0;\n            reverse_visit = 0;   \n        } else {            \n            i = n - 1;\n            reverse_visit = 1;            \n        }\n    }\n\n    while (1)\n    {        \n        if (isl_schedule_node_band_member_get_pe_opt(node, i) == autosa_loop_latency)\n        {\n            int loop_tile_size;            \n            loop_tile_size = data->tile_size[data->n_touched_loop];\n            (data->n_touched_loop)++;\n            /* If latency hiding is applied on the space loops, we need to update\n             * the SA dimensions. \n             */\n            if (isl_schedule_node_band_member_get_space_time(node, i) == autosa_loop_space)\n            {\n                /* Figure out the dim position. 
*/\n                int touched_space_loop = 0;\n                for (int j = 0; j < i; j++)\n                {\n                    if (isl_schedule_node_band_member_get_space_time(node, j) == autosa_loop_space)\n                        touched_space_loop++;\n                }\n                //std::cout << \"space: \" << data->sa->sa_dim[touched_space_loop] << \", \" << loop_tile_size << std::endl;\n                data->sa->sa_dim[touched_space_loop] /= loop_tile_size;                \n                if (data->sa->sa_dim[touched_space_loop] == 1) {\n                    throw std::runtime_error(\"[AutoSA] Error: Array dimension as 1 is not supported!\");\n                }\n            }\n\n            /* Skip loop tile size as 1 */\n            if (loop_tile_size > 1)\n            {                \n                /* Tile the current loop and permute it to be the innermost time loop.\n                 * Specifically, tile the loop in the band at \"i\"th position with the \n                 * size \"loop_tile_size\".\n                 * The returned node points at the tile loop. */\n                node = autosa_node_band_tile_loop(node, loop_tile_size, i);                \n                /* Reset the candidate loop in the tile loop the pe_opt property to default. */\n                node = isl_schedule_node_band_member_set_pe_opt(node, i, autosa_loop_default);\n                /* Reset the point loop space_time property to time loop. 
*/\n                node = isl_schedule_node_child(node, 0);\n                node = isl_schedule_node_band_member_set_space_time(node, 0, autosa_loop_time);                \n                /* Reset the point loop pe_opt property to default. */\n                node = isl_schedule_node_band_member_set_pe_opt(node, 0, autosa_loop_default);\n                if (data->sa->scop->options->autosa->tuning_method == 1) {\n                    node = isl_schedule_node_parent(node);\n                    node = data->sa->tuning_program->tile(node, i, 1, \"latency\", {}, -1);\n                    node = isl_schedule_node_child(node, 0);\n                }\n                /* Move the single loop node to the bottom of the time band. */\n                node = autosa_latency_node_band_sink_time(node, data->sa);                \n                (data->n_tiled_loop)++;\n                return node;\n            }\n            else\n            {\n                /* Reset the pe_opt property */\n                node = isl_schedule_node_band_member_set_pe_opt(node, i, autosa_loop_default);\n            }\n        }\n        if (reverse_visit) {\n            if (i == 0)\n                break;\n            i--;\n        } else {\n            if (i == n - 1)\n                break;\n            i++;\n        }\n    }\n\n    return node;\n}\n\n/* Internal struct for count_latency_hiding_loop.
*/\nstruct count_latency_hiding_loop_data\n{\n    int tile_len;\n    int *ubs;\n    struct autosa_kernel *kernel;\n};\n\n/* Count the number of latency hiding candidate loops.\n * Extract the loop upper bounds of the candidate loops.\n */\n//static isl_bool count_latency_hiding_loop(\n//    __isl_keep isl_schedule_node *node, void *user)\n//{\n//    struct count_latency_hiding_loop_data *data =\n//        (struct count_latency_hiding_loop_data *)user;\n//    isl_schedule_node *node_copy;\n//\n//    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n//    {\n//        int n = isl_schedule_node_band_n_member(node);\n//        for (int i = 0; i < n; i++)\n//        {\n//            if (isl_schedule_node_band_member_get_pe_opt(node, i) == autosa_loop_latency)\n//            {\n//                data->tile_len = data->tile_len + 1;\n//                /* Extract the loop upper bound */\n//                node_copy = isl_schedule_node_copy(node);\n//                if (i > 0)\n//                {\n//                    node_copy = isl_schedule_node_band_split(node_copy, i);\n//                    node_copy = isl_schedule_node_child(node_copy, 0);\n//                }\n//                if (n - i - 1 > 0)\n//                {\n//                    node_copy = isl_schedule_node_band_split(node_copy, 1);\n//                }\n//                int *ubs = extract_band_upper_bounds(node_copy);\n//                data->ubs = (int *)realloc(data->ubs, sizeof(int) * data->tile_len);\n//                data->ubs[data->tile_len - 1] = ubs[0];\n//                isl_schedule_node_free(node_copy);\n//                free(ubs);\n//            }\n//        }\n//    }\n//\n//    return isl_bool_true;\n//}\n\nstatic __isl_give isl_schedule_node *count_latency_hiding_loop(\n    __isl_take isl_schedule_node *node, void *user)\n{\n    struct count_latency_hiding_loop_data *data =\n        (struct count_latency_hiding_loop_data *)user;    \n    if 
(isl_schedule_node_get_type(node) != isl_schedule_node_band)\n        return node;\n    \n    int n = isl_schedule_node_band_n_member(node);\n    int i;\n    int reverse_visit = 0;\n    if ((data->kernel->options->autosa->reverse_order && !data->kernel->options->autosa->isl_sink) ||\n       (!data->kernel->options->autosa->reverse_order && data->kernel->options->autosa->isl_sink)) {\n        i = 0;\n        reverse_visit = 0;\n    } else {\n        i = n - 1;\n        reverse_visit = 1;\n    }\n    while (1) {\n        if (isl_schedule_node_band_member_get_pe_opt(node, i) == autosa_loop_latency) {\n            data->tile_len = data->tile_len + 1;\n            /* Extract the loop upper bound */\n            isl_schedule_node *node_copy = isl_schedule_node_copy(node);\n            if (i > 0)\n            {\n                node_copy = isl_schedule_node_band_split(node_copy, i);\n                node_copy = isl_schedule_node_child(node_copy, 0);\n            }\n            if (n - i - 1 > 0)\n            {\n                node_copy = isl_schedule_node_band_split(node_copy, 1);\n            }\n            int *ubs = extract_band_upper_bounds(node_copy);\n            data->ubs = (int *)realloc(data->ubs, sizeof(int) * data->tile_len);\n            data->ubs[data->tile_len - 1] = ubs[0];\n            isl_schedule_node_free(node_copy);\n            free(ubs);            \n        }        \n        if (reverse_visit) {\n            if (i == 0)\n                break;\n            i--;\n        } else {\n            if (i == n - 1)\n                break;\n            i++;\n        }\n    }\n\n    return node;\n}\n\n/* Perform the latency hiding in either \"Manual\" or \"Auto\" mode.\n * We will tile each loop with a tiling factor greater than one, and place\n * the point loop as the innermost time loop. 
\n * A \"latency\" mark is placed before this loop.\n * A \"hls_pipeline\" mark is placed under this loop.\n */\nstatic __isl_give isl_schedule_node *autosa_latency_tile_loop(\n    __isl_take isl_schedule_node *node, struct autosa_kernel *sa, char *mode)\n{\n    int tile_len;\n    int *tile_size;\n    struct count_latency_hiding_loop_data data;\n    data.tile_len = 0;\n    data.ubs = NULL;\n    data.kernel = sa;\n    int i;\n\n    /* Count the candidate loop number and extract the loop upper bounds. */\n    //isl_schedule_node_foreach_descendant_top_down(\n    //    node, &count_latency_hiding_loop, &data);\n    node = isl_schedule_node_map_descendant_bottom_up(node, &count_latency_hiding_loop, &data);\n    tile_len = data.tile_len;\n\n    if (sa->scop->options->autosa->tuning_method == 1) {\n        /* Select one tiling factor in between (1, ub).\n         * Avoid 1 as such a tiling factor will be skipped and the AST loop will \n         * be degenerated.\n         * Avoid ub as generating space dim with 1 is not supported. \n         */        \n        tile_size = data.ubs;\n        for (int i = 0; i < tile_len; i++) {\n            int size = tile_size[i];\n            std::vector<int> factors = get_factors(size);\n            if (factors.size() < 3) {\n                printf(\"[AutoSA] Error: Cannot find legal tiling factors for auto-tuning template!\\n\");\n                exit(1);\n            }\n            tile_size[i] = factors[1];\n        }\n    } else {\n        if (!strcmp(mode, \"manual\"))\n        {\n            tile_size = read_latency_tile_sizes(sa, tile_len);\n            if (!tile_size)\n            {\n                /* Dump out the number and upper bounds of latency loops and exit the program. 
*/\n                int *ubs = data.ubs;\n                FILE *fp;\n                char *content;\n                cJSON *tuning, *latency_json, *loops_json;\n                char *tuning_path;\n                isl_printer *p_str;\n\n                tuning = cJSON_CreateObject();\n                latency_json = cJSON_CreateObject();\n                cJSON_AddItemToObject(tuning, \"latency\", latency_json);\n                loops_json = cJSON_CreateArray();\n                cJSON_AddItemToObject(latency_json, \"tilable_loops\", loops_json);\n                for (int i = 0; i < tile_len; i++)\n                {\n                    cJSON *loop = cJSON_CreateNumber(ubs[i]);\n                    cJSON_AddItemToArray(loops_json, loop);\n                }\n                p_str = isl_printer_to_str(sa->ctx);\n                p_str = isl_printer_print_str(p_str, sa->options->autosa->output_dir);\n                p_str = isl_printer_print_str(p_str, \"/tuning.json\");\n                tuning_path = isl_printer_get_str(p_str);\n                fp = fopen(tuning_path, \"w\");\n                if (!fp)\n                {\n                    printf(\"[AutoSA] Error: Can't open tuning file: %s\\n\", tuning_path);\n                    exit(1);\n                }\n                content = cJSON_Print(tuning);\n                fprintf(fp, \"%s\", content);\n                fclose(fp);\n                free(content);\n                cJSON_Delete(tuning);\n                isl_printer_free(p_str);\n                free(tuning_path);\n                exit(0);\n            }\n        }\n        else\n        {\n            /* Perform the latency hiding following the default policy. */\n            tile_size = read_default_latency_tile_sizes(sa, tile_len);\n        }\n        free(data.ubs);\n    }    \n\n    if (!tile_size)\n    {\n        isl_schedule_node_free(node);        \n        return NULL;\n    }\n\n    /* Check whether all the tiling factors are 1. 
If so, skip the tiling\n     * and split off the last time dimension to add an hls_pipeline\n     * mark. 
*/\n    for (i = 0; i < tile_len; i++)\n    {\n        if (tile_size[i] != -1)\n            sa->lat_hide_len *= tile_size[i];\n    }\n    for (i = 0; i < tile_len; i++)\n    {\n        if (tile_size[i] > 1)\n            break;\n    }\n    if (i == tile_len)\n    {\n        node = isl_schedule_node_map_descendant_bottom_up(node,\n                                                          &add_hls_pipeline, sa);\n    }\n    else\n    {\n        /* Tile the candidate loops. */\n        struct autosa_pe_opt_tile_data tile_data = {0, 0, tile_len, tile_size, sa};\n        while (tile_data.n_touched_loop != tile_len)\n        {\n            node = isl_schedule_node_map_descendant_bottom_up(\n                node, &autosa_latency_tile_band_loop, &tile_data);\n        }\n    }\n    \n    free(tile_size);\n    return node;\n}\n\n/* Apply latency hiding.\n * Go through all the loops; every parallel loop (considering only RAW\n * dependences) is identified as a latency hiding loop candidate.\n * The candidate loops are tiled, and the point loops are permuted to be\n * the innermost time loops.\n * \n * en: enable signal for the current stage.\n * mode: manual/auto\n */\nisl_stat sa_latency_hiding_optimize(struct autosa_kernel *sa, bool en, char *mode)\n{\n    isl_bool opt_required;\n    isl_schedule *schedule = sa->schedule;\n    isl_schedule_node *node = isl_schedule_get_root(schedule);\n\n    if (!en)\n    {\n        /* This stage is disabled.\n         * Peel off the last time loop and add an hls_pipeline mark, since\n         * the innermost time loops are expected to be pipelined on hardware. 
\n         */\n        node = isl_schedule_node_map_descendant_bottom_up(node,\n                                                          &add_hls_pipeline, sa);\n\n        isl_schedule_free(sa->schedule);\n        sa->schedule = isl_schedule_node_get_schedule(node);\n        isl_schedule_node_free(node);\n        return isl_stat_ok;\n    }\n\n    printf(\"[AutoSA] Apply latency hiding.\\n\");\n    sa->lat_hide_len = 1;\n\n    /* Move down to the array marker. */\n    node = autosa_tree_move_down_to_array(node, sa->core);\n\n    /* Check whether the innermost time loop is a parallel loop.\n     * If so, latency hiding is not strictly required; we report that it is\n     * optional and continue.\n     */\n    struct latency_opt_check_data data;\n    data.kernel = sa;\n    data.is_required = 1;    \n    node = isl_schedule_node_map_descendant_bottom_up(node,\n                                                      &latency_opt_check, &data);\n    if (!data.is_required)\n    {             \n        printf(\"[AutoSA] The innermost time loop is parallel. Latency hiding is optional.\\n\");\n    }\n\n    /* Detect all candidate loops. */\n    node = isl_schedule_node_map_descendant_bottom_up(\n        node, &detect_latency_hiding_loop, sa);\n\n    /* Display the candidate loops. */\n    isl_schedule_free(schedule);\n    schedule = isl_schedule_node_get_schedule(node);\n    if (sa->scop->options->autosa->verbose)\n    {\n        isl_printer *p = isl_printer_to_file(sa->ctx, stdout);\n        p = isl_printer_set_yaml_style(p, ISL_YAML_STYLE_BLOCK);\n        p = isl_printer_print_schedule(p, schedule);\n        printf(\"\\n\");\n        isl_printer_free(p);\n    }\n    isl_schedule_free(schedule);\n\n    /* Tile the candidate loops.\n     * Each candidate loop used for latency hiding is tiled, and its point\n     * loop is permuted to the innermost position of the time loop band.\n     * A latency hiding marker is added. 
*/\n    node = autosa_latency_tile_loop(node, sa, mode);\n\n    /* Clean up the band pe_opt properties. */\n    schedule = isl_schedule_node_get_schedule(node);\n    isl_schedule_node_free(node);\n    schedule = isl_schedule_map_schedule_node_bottom_up(\n        schedule, &clear_pe_opt_prop, NULL);\n\n    sa->schedule = schedule;    \n\n    return isl_stat_ok;\n}\n\n/* Internal struct used in SIMD vectorization. */\nstruct simd_vectorization_data\n{\n    struct autosa_kernel *kernel;\n    float *scores;\n    int *legal;\n    float best_score;\n    int layout_trans;\n    int n_loops;\n    int loop_cnt;\n    char *mode;\n    int *ubs;\n    int *tile_size;\n    char *buffer;\n    int buffer_offset;\n    int has_space_candidate;\n    int n_legal_loops;\n};\n\n/* Internal struct used in is_stride_coalesced. */\nstruct stride_coalesced_data\n{\n    struct autosa_kernel *kernel;\n    isl_union_map *prefix;\n    float score;\n    float num_accs;\n    float num_layout_trans;\n};\n\n/* Check whether all the array references of the statement with the domain \"set\"\n * have stride-0/stride-1 access.\n */\nstatic isl_bool is_stride_coalesced_stmt(__isl_keep isl_set *set, void *user)\n{\n    isl_space *space;\n    isl_id *id;\n    struct autosa_stmt *stmt;\n    struct stride_coalesced_data *data = (struct stride_coalesced_data *)user;\n    struct autosa_stmt_access *accesses, *access;\n    isl_map *prefix;\n\n    space = isl_set_get_space(set);\n    id = isl_space_get_tuple_id(space, isl_dim_set);\n    isl_space_free(space);\n    prefix = isl_map_from_union_map(isl_union_map_intersect_domain(\n        isl_union_map_copy(data->prefix), isl_union_set_from_set(isl_set_copy(set))));\n    stmt = find_stmt(data->kernel->prog, id);\n    isl_id_free(id);\n    accesses = stmt->accesses;\n    for (access = accesses; access; access = access->next)\n    {\n        isl_map *acc;\n        int n;\n        isl_bool is_zero = isl_bool_false, is_one = isl_bool_false;\n        isl_pw_multi_aff 
*pma;\n        int i;\n\n        /* Skip scalar accesses. */\n        if (access->n_index == 0)\n            continue;\n\n        /* Transform the domain of the access function to the schedule domain. */\n        acc = isl_map_copy(access->access);\n        acc = isl_map_apply_domain(acc, isl_map_copy(prefix));\n\n        /* Try each dimension of the array, starting from the innermost one. */\n        for (i = access->n_index - 1; i >= 0; i--)\n        {\n            is_zero = access_is_stride_zero(acc, i);\n            if (is_zero)\n                break;\n        }\n        if (!is_zero)\n        {\n            for (i = access->n_index - 1; i >= 0; i--)\n            {\n                is_one = access_is_stride_one(acc, i);\n                if (is_one)\n                    break;\n            }\n        }\n\n        isl_map_free(acc);\n\n        if (!(is_zero || is_one))\n        {\n            isl_map_free(prefix);\n            return isl_bool_false;\n        }\n        else\n        {\n            /* Record whether layout transformation is required, i.e., whether\n             * dimension \"i\" must be permuted innermost. */\n            access->layout_trans = (i == access->n_index - 1) ? 0 : 1;\n            access->simd_dim = i;\n            /* Update the score. 
*/\n            data->score = data->score + (1 - access->layout_trans);\n            data->num_accs = data->num_accs + 1;\n            data->num_layout_trans = data->num_layout_trans + access->layout_trans;\n        }\n    }\n\n    isl_map_free(prefix);\n    return isl_bool_true;\n}\n\n/* This function examines whether the access functions of the statements under\n * the current \"node\" have only stride-0/1 access.\n */\nstatic isl_bool is_stride_coalesced_at_node(__isl_keep isl_schedule_node *node,\n                                            void *user)\n{\n    struct stride_coalesced_data *data = (struct stride_coalesced_data *)user;\n    struct autosa_kernel *kernel = data->kernel;\n    isl_union_set *domain;\n    isl_union_map *prefix;\n    isl_bool one_or_zero;\n\n    if (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n        return isl_bool_true;\n\n    domain = isl_schedule_node_get_domain(node);\n    prefix = isl_schedule_node_get_prefix_schedule_union_map(node);\n    data->prefix = prefix;\n\n    /* Examine each statement under the loop. */\n    one_or_zero = isl_union_set_every_set(domain, &is_stride_coalesced_stmt, data);\n\n    isl_union_map_free(data->prefix);\n    isl_union_set_free(domain);\n\n    return one_or_zero;\n}\n\n/* This function examines whether all the array references under the current \"node\"\n * are stride-0/stride-1.\n * We also give the loop a score, calculated as:\n * score = Sum_{all_array_references_under_the_loop}{\n *           is_access_stride-0/1 * (1 - is_layout_transformation_required)}\n *       + (num_of_required_layout_transforms == 0 ?\n *            num_of_accs + 1 : num_of_accs / num_of_required_layout_transforms)\n * When examining each array reference, we try different layouts by\n * permuting each array dimension innermost to make sure we don't miss any\n * opportunity. 
\n * When layout transformation is required, we log the dimension to be\n * permuted innermost.\n * The calculated score is returned, or -1 if the references are not\n * stride-0/stride-1 or require inconsistent layout transformations.\n */\nstatic float is_stride_coalesced(__isl_keep isl_schedule_node *node,\n                                 struct autosa_kernel *kernel, int *layout_transform)\n{\n    float score = 0;\n    struct stride_coalesced_data data;\n    isl_bool coalesced;\n\n    data.kernel = kernel;\n    data.score = score;\n    data.num_accs = 0;\n    data.num_layout_trans = 0;\n    coalesced = isl_schedule_node_every_descendant(node,\n                                                   &is_stride_coalesced_at_node, &data);\n\n    /* Penalize loops that require more layout transformations. */\n    if (data.num_layout_trans == 0)\n    {\n        data.score += (data.num_accs + 1);\n    }\n    else\n    {\n        data.score += (data.num_accs / data.num_layout_trans);\n    }\n\n    /* Make sure that all the array references to the same array\n     * permute the same dimension for layout transformation.\n     */\n    if (coalesced)\n    {\n        struct autosa_kernel *kernel = data.kernel;\n        for (int i = 0; i < kernel->n_array; i++)\n        {\n            struct autosa_local_array_info *local_array;\n            int simd_dim = -1;\n            local_array = &kernel->array[i];\n            for (int j = 0; j < local_array->array->n_ref; j++)\n            {\n                struct autosa_stmt_access *acc = local_array->array->refs[j];\n                if (acc->layout_trans == 1)\n                {\n                    if (simd_dim == -1)\n                        simd_dim = acc->simd_dim;\n                    else\n                    {\n                        if (simd_dim != acc->simd_dim)\n                        {\n                            coalesced = isl_bool_false;\n                            return coalesced ? 
data.score : -1;\n                        }\n                    }\n                }\n            }\n        }\n    }\n\n    /* Print out the layout transform information. */\n    if (coalesced)\n    {\n        struct autosa_kernel *kernel = data.kernel;\n        isl_printer *p;\n\n        p = isl_printer_to_file(kernel->ctx, stdout);\n        for (int i = 0; i < kernel->n_array; i++)\n        {\n            struct autosa_local_array_info *local_array;\n            local_array = &kernel->array[i];\n            for (int j = 0; j < local_array->array->n_ref; j++)\n            {\n                struct autosa_stmt_access *acc = local_array->array->refs[j];\n\n                if (acc->layout_trans != -1)\n                {\n                    if (acc->layout_trans == 1)\n                    {\n                        printf(\"[AutoSA] Array reference \");\n                        if (acc->read)\n                            printf(\"(R): \");\n                        else\n                            printf(\"(W): \");\n                        p = isl_printer_print_map(p, acc->access);\n                        printf(\"\\n\");\n                        printf(\"[AutoSA] Layout transform: Permute dim (%d) to the innermost\\n\", acc->simd_dim);\n                        *layout_transform = 1;\n                    }\n                    acc->layout_trans = -1;\n                    acc->simd_dim = -1;\n                }\n            }\n        }\n        isl_printer_free(p);\n    }\n\n    return coalesced ? 
data.score : -1;\n}\n\n/* A loop is identified as vectorizable if it is:\n * - a parallel or reduction loop, and\n * - all the array references under it have stride-0/1 access.\n * Only time loops are considered, unless the simd_touch_space option is set,\n * in which case space loops are considered as well.\n * For each candidate loop, we compute the score:\n * score = 2 * is_loop_parallel + 4 * is_loop_reduction\n *       + Sum_{all_array_references_under_the_loop}{\n *           is_access_stride-0/1 * (1 - is_layout_transformation_required)}\n *       + (num_of_required_layout_transforms == 0 ?\n *            num_of_accs + 1 : num_of_accs / num_of_required_layout_transforms)\n * The heuristics are:\n * - We prefer reduction loops to parallel loops.\n * - We prefer array references that do not require layout transformation.\n */\nstatic isl_schedule_node *detect_simd_vectorization_loop(\n    __isl_take isl_schedule_node *node, void *user)\n{\n    struct simd_vectorization_data *data = (struct simd_vectorization_data *)user;\n    struct autosa_kernel *sa = data->kernel;\n    isl_ctx *ctx = isl_schedule_node_get_ctx(node);\n    float score;\n    isl_schedule_node *cur_node;\n    int is_latency;\n    int n_member;\n    int simd_touch_space;\n\n    /* If the current node is under the latency mark, return,\n     * as latency hiding loops are not used as candidates. 
\n     */\n    is_latency = is_node_under_latency(node);\n    if (is_latency)\n        return node;\n\n    simd_touch_space = sa->options->autosa->simd_touch_space;    \n\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n        n_member = isl_schedule_node_band_n_member(node);\n        for (int i = 0; i < n_member; i++)\n        {\n            if (!simd_touch_space && isl_schedule_node_band_member_get_space_time(node, i) != autosa_loop_time) {\n                /* Space loops are skipped unless simd_touch_space is set. */\n                continue;\n            } else {\n                /* A time loop, or a space loop when simd_touch_space is set. */\n                /* There are two types of loops that we are interested in:\n                 * - Parallel loops.\n                 * - Reduction loops in the innermost loop band.\n                 *   This restriction is currently relaxed: we look at all loop bands\n                 *   for reduction loops, as the current isl dependence analysis can't\n                 *   differentiate reduction dependences and might separate one\n                 *   permutable loop band into two loop bands.\n                 */\n                int is_parallel = 0;\n                int is_reduction = 0;\n                int layout_transform = 0;\n                float score_i;\n\n                if (!isl_schedule_node_band_member_get_coincident(node, i) && !strcmp(data->mode, \"manual\"))\n                {\n                    /* At present, AutoSA can't detect reduction loops automatically.\n                     * We print each node and follow the user's guidance.\n                     * Reduction loops are only examined in the manual mode;\n                     * in the auto mode, only parallel loops are examined.\n                     */\n                    size_t bufsize = 100;\n                    size_t characters;\n                    printf(\"[AutoSA] Detecting the reduction loop.\\n\");\n                    printf(\"[AutoSA] Band member 
position: %d\\n\", i);\n                    /* If the SIMD info is pre-loaded, we don't ask for user input. */\n                    if (data->buffer == NULL)\n                    {\n                        isl_printer *p;\n                        p = isl_printer_to_file(ctx, stdout);\n                        p = isl_printer_end_line(p);\n                        p = isl_printer_set_yaml_style(p, ISL_YAML_STYLE_BLOCK);\n                        p = isl_printer_print_schedule_node(p, node);\n                        isl_printer_free(p);\n                        printf(\"[AutoSA] Please input if the current loop is a reduction loop [y/n]: \");\n                    }\n                    if (data->buffer == NULL)\n                    {\n                        char *buffer = (char *)malloc(bufsize * sizeof(char));\n                        /* getline() may reallocate the buffer, so record the\n                         * pointer only after the call returns. */\n                        characters = getline(&buffer, &bufsize, stdin);\n                        data->buffer = buffer;\n                        data->buffer_offset = 0;\n                    }\n                    printf(\"[AutoSA] Reduction property: %c\\n\", data->buffer[data->buffer_offset]);\n                    is_reduction = (data->buffer[data->buffer_offset] == 'y') ? 1 : 0;\n                    if (data->buffer[data->buffer_offset + 1] == 'y' ||\n                        data->buffer[data->buffer_offset + 1] == 'n')\n                    {\n                        data->buffer_offset += 1;\n                    }\n                    else\n                    {\n                        free(data->buffer);\n                        data->buffer = NULL;\n                        data->buffer_offset = 0;\n                    }\n                }\n                else\n                {\n                    is_parallel = isl_schedule_node_band_member_get_coincident(node, i);\n                }\n\n                /* Test whether all the array references under the current loop\n                 * have only stride-0/1 access. 
\n                 */\n                if (is_parallel || is_reduction)\n                {\n                    cur_node = node;\n                    node = isl_schedule_node_dup(cur_node);\n\n                    if (i > 0)\n                    {\n                        node = isl_schedule_node_band_split(node, i);\n                        node = isl_schedule_node_child(node, 0);\n                    }\n                    if (n_member - i - 1 > 0)\n                    {\n                        node = isl_schedule_node_band_split(node, 1);\n                    }\n\n                    /* Sink the band innermost. */\n                    node = isl_schedule_node_band_sink(node);\n                    score = 2 * is_parallel + 4 * is_reduction;\n                    printf(\"[AutoSA] -----------------------------------------------\\n\");\n                    printf(\"[AutoSA] Current band member position: %d\\n\", i);\n                    printf(\"[AutoSA] -----------------------------------------------\\n\");\n                    score_i = is_stride_coalesced(node, sa, &layout_transform);\n                    isl_schedule_node_free(node);\n                    node = cur_node;\n                    if (score_i < 0)\n                    {\n                        /* The array references are not coalesced. 
*/\n                        score = -1;\n                        continue;\n                    }\n                    else\n                    {\n                        score += score_i;\n                        printf(\"[AutoSA] -----------------------------------------------\\n\");\n                        printf(\"[AutoSA] The loop is legal to be vectorized with score: %f\\n\",\n                               score);\n                        if (layout_transform)\n                            printf(\"[AutoSA] Layout transformation is required to proceed.\\n\");\n                        printf(\"[AutoSA] -----------------------------------------------\\n\");\n                        node = isl_schedule_node_band_member_set_pe_opt(node, i, autosa_loop_simd);\n                        if (isl_schedule_node_band_member_get_space_time(node, i) == autosa_loop_space)\n                            data->has_space_candidate = 1;\n\n                        if (score >= data->best_score)\n                        {\n                            data->best_score = score;\n                            data->layout_trans = layout_transform;\n                        }\n                        data->n_loops = data->n_loops + 1;\n                        data->scores = (float *)realloc(data->scores, sizeof(float) * data->n_loops);\n                        data->scores[data->n_loops - 1] = score;\n                        data->legal = (int *)realloc(data->legal, sizeof(int) * data->n_loops);\n                        data->legal[data->n_loops - 1] = !layout_transform;\n                        if (!layout_transform) \n                            data->n_legal_loops++;\n\n                        /* Extract the loop upper bounds */\n                        int *ubs = extract_band_upper_bounds(node);\n                        data->ubs = (int *)realloc(data->ubs, sizeof(int) * data->n_loops);\n                        data->ubs[data->n_loops - 1] = ubs[i];\n                        
free(ubs);\n                    }\n                }\n            }\n        }\n    }\n\n    return node;\n}\n\n/* Check whether the node is the last band node.\n * If so, add a \"simd\" mark before the node. */\nstatic __isl_give isl_schedule_node *add_simd_mark(\n    __isl_take isl_schedule_node *node, void *user)\n{\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n        node = isl_schedule_node_child(node, 0);\n        isl_bool no_inner_band = isl_schedule_node_every_descendant(node,\n                                                                    &no_permutable_node, NULL);\n        node = isl_schedule_node_parent(node);\n        if (no_inner_band)\n        {\n            /* Insert the \"simd\" mark. */\n            isl_id *id = isl_id_alloc(isl_schedule_node_get_ctx(node), \"simd\", NULL);\n            node = isl_schedule_node_insert_mark(node, id);\n        }\n    }\n\n    return node;\n}\n\n/* Update the stride information for the array accesses under the SIMD loop.\n */\nstatic isl_bool update_simd_acc_stmt(__isl_keep isl_set *set, void *user)\n{\n    struct stride_coalesced_data *data = (struct stride_coalesced_data *)user;\n    struct autosa_stmt *stmt;\n    isl_space *space;\n    isl_id *id;\n    struct autosa_stmt_access *accesses, *access;\n    isl_map *prefix;\n\n    space = isl_set_get_space(set);\n    id = isl_space_get_tuple_id(space, isl_dim_set);\n    isl_space_free(space);\n    stmt = find_stmt(data->kernel->prog, id);\n    isl_id_free(id);\n    accesses = stmt->accesses;\n    prefix = isl_map_from_union_map(isl_union_map_intersect_domain(\n        isl_union_map_copy(data->prefix), isl_union_set_from_set(isl_set_copy(set))));\n\n    for (access = accesses; access; access = access->next)\n    {\n        isl_map *acc;\n        int n;\n        isl_bool is_zero = isl_bool_false, is_one = isl_bool_false;\n        isl_pw_multi_aff *pma;\n        int i;\n\n        if (access->n_index == 0)\n            continue;\n\n      
  acc = isl_map_copy(access->access);\n        acc = isl_map_apply_domain(acc, isl_map_copy(prefix));\n\n        for (i = access->n_index - 1; i >= 0; i--)\n        {\n            is_zero = access_is_stride_zero(acc, i);\n            if (is_zero)\n                break;\n        }\n        if (!is_zero)\n        {\n            /* The SIMD loop has already been checked to have only stride-0/1\n             * accesses, so an access that is not stride-0 is treated as stride-1. */\n            is_one = isl_bool_true;\n        }\n\n        isl_map_free(acc);\n        access->simd_stride = is_zero ? 0 : (is_one ? 1 : -1);\n    }\n\n    isl_map_free(prefix);\n    return isl_bool_true;\n}\n\n/* Update the stride information for the array accesses under the SIMD loop.\n */\nstatic isl_bool update_simd_acc(__isl_keep isl_schedule_node *node, void *user)\n{\n    isl_union_set *domain;\n    isl_union_map *prefix;\n    struct stride_coalesced_data *data = (struct stride_coalesced_data *)user;\n\n    if (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n        return isl_bool_true;\n\n    domain = isl_schedule_node_get_domain(node);\n    prefix = isl_schedule_node_get_prefix_schedule_union_map(node);\n    data->prefix = prefix;\n\n    isl_union_set_every_set(domain, &update_simd_acc_stmt, data);\n\n    isl_union_set_free(domain);\n    isl_union_map_free(prefix);\n\n    return isl_bool_true;\n}\n\n/* This function tiles the SIMD loop.\n * In the auto mode, it selects the loop with the highest score.\n * Otherwise, it selects the loops with positive tiling factors.\n * Loops with a tiling factor of one, or that require layout transformation,\n * are skipped.\n * Finally, it updates the stride information for the array accesses\n * under the SIMD loop.\n */\nstatic __isl_give isl_schedule_node *autosa_simd_tile_loop(\n    __isl_take isl_schedule_node *node, void *user)\n{\n    struct simd_vectorization_data *data = (struct simd_vectorization_data *)user;\n    struct autosa_kernel *kernel = 
data->kernel;\n    struct stride_coalesced_data stride_data;\n    stride_data.kernel = data->kernel;\n\n    if 
(isl_schedule_node_get_type(node) == isl_schedule_node_band)\n    {\n        for (int i = 0; i < isl_schedule_node_band_n_member(node); i++)\n        {\n            if (isl_schedule_node_band_member_get_pe_opt(node, i) == autosa_loop_simd)\n            {\n                if (!strcmp(data->mode, \"auto\"))\n                {\n                    /* Perform tiling on the loop with the highest score. */\n                    if (data->scores[data->loop_cnt] != data->best_score)\n                    {\n                        node = isl_schedule_node_band_member_set_pe_opt(node, i,\n                                                                        autosa_loop_default);\n                        data->loop_cnt++;\n                        continue;\n                    }\n                }\n                else\n                {\n                    /* Perform tiling on the loops with positive tiling factors. */\n                    if (data->tile_size[data->loop_cnt] <= 0)\n                    {\n                        node = isl_schedule_node_band_member_set_pe_opt(node, i,\n                                                                        autosa_loop_default);\n                        data->loop_cnt++;\n                        continue;\n                    }\n                }\n                if (data->tile_size[data->loop_cnt] == 1)\n                {\n                    /* Skip if the tiling factor is one. */\n                    node = isl_schedule_node_band_member_set_pe_opt(node, i,\n                                                                    autosa_loop_default);\n                    data->loop_cnt++;\n                    continue;\n                }\n                if (data->legal[data->loop_cnt] == 0)\n                {\n                    /* Layout transformation is needed to proceed.\n                     * We will skip this loop. 
\n                     */\n                    node = isl_schedule_node_band_member_set_pe_opt(node, i,\n                                                                    autosa_loop_default);\n                    data->loop_cnt++;\n                    continue;\n                }\n                \n                int tile_size = data->tile_size[data->loop_cnt];\n                \n                /* If SIMD vectorization is applied to a space loop, we need to update\n                 * the SA dimensions.\n                 */\n                if (isl_schedule_node_band_member_get_space_time(node, i) == autosa_loop_space) {\n                    /* Find the position of this loop among the space loops. */\n                    int touched_space_loop = 0;\n                    for (int j = 0; j < i; j++) {\n                        if (isl_schedule_node_band_member_get_space_time(node, j) == autosa_loop_space)\n                            touched_space_loop++;\n                    }                                        \n                    data->kernel->sa_dim[touched_space_loop] /= tile_size;\n                    if (data->kernel->sa_dim[touched_space_loop] == 1) {\n                        throw std::runtime_error(\"[AutoSA] Error: Array dimension as 1 is not supported!\");\n                    }\n                }                \n                /* Tile the loop */\n                node = autosa_node_band_tile_loop(node, tile_size, i);                \n                /* Reset the pe_opt property of the tile loop to default. */\n                node = isl_schedule_node_band_member_set_pe_opt(node, i, autosa_loop_default);\n                /* Reset the space_time property of the point loop to time loop. */\n                node = isl_schedule_node_child(node, 0);\n                node = isl_schedule_node_band_member_set_space_time(node, 0, autosa_loop_time);\n                /* Reset the point loop pe_opt property to default. 
*/\n                node = isl_schedule_node_band_member_set_pe_opt(node, 0, autosa_loop_default);                \n                if (data->kernel->scop->options->autosa->tuning_method == 1) {\n                    node = isl_schedule_node_parent(node);\n                    node = data->kernel->tuning_program->tile(node, i, 1, \"SIMD\", {\"power_of_two\"}, 32/data->kernel->array[0].array->size);\n                    node = isl_schedule_node_child(node, 0);\n                }\n                /* Sink the point loop innermost */\n                if (kernel->options->autosa->isl_sink) {\n                    node = isl_schedule_node_band_sink(node);\n                    /* Add the simd marker */\n                    node = isl_schedule_node_map_descendant_bottom_up(node, &add_simd_mark, NULL);\n                }\n                else {\n                    /* Sink the point loop innermost and add the simd marker */\n                    node = autosa_node_sink_to_mark(node, \"simd\");\n                }\n                /* Update the stride information for array references under the SIMD loop. */\n                isl_schedule_node_every_descendant(node, &update_simd_acc, &stride_data);                \n\n                node = isl_schedule_node_parent(node);\n                kernel->simd_w = tile_size;\n                data->loop_cnt++;\n                printf(\"[AutoSA] SIMD vectorization successfully applied.\\n\");\n            }\n        }\n    }\n\n    return node;\n}\n\n/* Load the SIMD information for the kernel. 
\n */\nstatic __isl_give char *load_simd_info(struct autosa_kernel *sa)\n{\n    cJSON *simd_info;\n    FILE *f;\n    char *buffer = NULL;\n    long length;\n\n    if (sa->options->autosa->simd_info)\n    {\n        f = fopen(sa->options->autosa->simd_info, \"rb\");\n        if (f)\n        {\n            fseek(f, 0, SEEK_END);\n            length = ftell(f);\n            fseek(f, 0, SEEK_SET);\n            buffer = (char *)malloc(length + 1);\n            if (buffer)\n            {\n                buffer[length] = '\\0';\n                int r = fread(buffer, 1, length, f);\n            }\n            fclose(f);\n        }\n        else\n        {\n            printf(\"[AutoSA] Error: Can't open SIMD information file: %s\\n\",\n                   sa->options->autosa->simd_info);\n            exit(1);\n        }\n    }\n\n    if (buffer)\n    {\n        simd_info = cJSON_Parse(buffer);\n        free(buffer);\n        /* Load the SIMD info into a string. */\n        cJSON *reduction = NULL;\n        cJSON *reductions = NULL;\n        int info_id = 0;\n        char kernel_name[20];\n        sprintf(kernel_name, \"kernel%d\", sa->space_time_id);        \n        reductions = cJSON_GetObjectItemCaseSensitive(simd_info, kernel_name);\n        if (reductions)\n        {\n            /* Zero-initialize the string so that it is NUL-terminated even when\n             * the \"reduction\" list is empty, and stop before overflowing it. */\n            char *info = (char *)calloc(100, sizeof(char));\n            reductions = cJSON_GetObjectItemCaseSensitive(reductions, \"reduction\");\n            cJSON_ArrayForEach(reduction, reductions)\n            {\n                char *info_i = reduction->valuestring;\n                if (info_id >= 99)\n                    break;\n                sprintf(info + info_id, \"%c\", info_i[0]);\n                info_id++;\n            }\n            cJSON_Delete(simd_info);\n            return info;\n        }\n        else\n        {\n            cJSON_Delete(simd_info);\n            return NULL;\n        }\n    }\n    return NULL;\n}\n\n/* Apply SIMD vectorization. 
\n * We go through all the loops. Every vectorizable loop \n * (a parallel or reduction loop with stride-0/1 accesses) is identified \n * as a SIMD loop candidate. We rank the candidates by heuristics \n * and pick the loop with the highest score to be tiled. \n * The point loop is permuted to the innermost position.\n * Finally, this loop will be unrolled by the HLS tools.\n */\nisl_stat sa_simd_vectorization_optimize(struct autosa_kernel *sa, char *mode)\n{\n    float *scores = NULL;\n    int n_loops = 0;\n    struct simd_vectorization_data data;\n    data.best_score = 0;\n    data.mode = mode;\n    data.ubs = NULL;\n    int *tile_size = NULL;\n\n    printf(\"[AutoSA] Apply SIMD vectorization.\\n\");\n    isl_schedule *schedule = sa->schedule;\n    isl_schedule_node *node = isl_schedule_get_root(schedule);\n    sa->simd_w = 1;\n\n    /* Move down to the array marker */\n    node = autosa_tree_move_down_to_array(node, sa->core);\n\n    /* Detect all candidate loops */\n    data.kernel = sa;\n    data.scores = scores;\n    data.legal = NULL;\n    data.buffer = NULL;\n    data.buffer_offset = 0;\n    data.n_loops = n_loops;\n    data.n_legal_loops = 0;\n    data.has_space_candidate = 0;\n    /* Load the SIMD information. */\n    data.buffer = load_simd_info(sa);\n    node = isl_schedule_node_map_descendant_bottom_up(\n        node, &detect_simd_vectorization_loop, &data);\n\n    if (data.n_loops == 0)\n    {\n        printf(\"[AutoSA] No candidate loops found!\\n\");\n        isl_schedule_node_free(node);\n        return isl_stat_ok;\n    }\n\n    /* Display the candidate loops. 
*/\n    isl_schedule_free(schedule);\n    schedule = isl_schedule_node_get_schedule(node);\n    if (sa->scop->options->autosa->verbose)\n    {\n        isl_printer *p = isl_printer_to_file(sa->ctx, stdout);\n        p = isl_printer_set_yaml_style(p, ISL_YAML_STYLE_BLOCK);\n        p = isl_printer_print_schedule(p, schedule);\n        printf(\"\\n\");\n        isl_printer_free(p);\n    }\n    isl_schedule_free(schedule);\n    \n    if (data.n_legal_loops == 0) {\n        printf(\"[AutoSA] No legal SIMD loop is found. SIMD vectorization is skipped.\\n\");\n    }\n    else {\n        /* Select the candidate loop with the highest score.\n         * Tile the candidate loop and permute the point loop innermost. \n         * A SIMD vectorization marker is added. \n         */\n        if (sa->scop->options->autosa->tuning_method == 1) {\n            /* Select one tiling factor strictly between 1 and ub.\n             * Avoid 1, as such a tiling factor would be skipped and the AST loop\n             * would degenerate.\n             * Avoid ub, as generating a space dimension of size 1 is not supported.\n             */\n            tile_size = data.ubs;\n            for (int i = 0; i < data.n_loops; i++) {\n                if (data.scores[i] == data.best_score) {\n                    std::vector<int> factors = get_factors(tile_size[i]);\n                    if (factors.size() < 3) {\n                        printf(\"[AutoSA] Error: Cannot find legal tiling factors for auto-tuning template!\\n\");\n                        exit(1);\n                    }\n                    tile_size[i] = factors[1];\n                } else {\n                    tile_size[i] = 1;\n                }\n            }            \n        } else {\n            if (!strcmp(mode, \"manual\"))\n            {\n                tile_size = read_simd_tile_sizes(sa, data.n_loops);\n                if (!tile_size)\n                {\n                    /* Dump out the number, score and upper bounds of simd loops \n     
                * and exit the program. \n                     */\n                    int *ubs = data.ubs;\n                    FILE *fp;\n                    char *content;\n                    cJSON *tuning, *simd_json, *loops_json, *scores_json, *legal_json;\n                    isl_printer *p_str;\n                    char *tuning_path;\n\n                    tuning = cJSON_CreateObject();\n                    simd_json = cJSON_CreateObject();\n                    cJSON_AddItemToObject(tuning, \"simd\", simd_json);\n                    loops_json = cJSON_CreateArray();\n                    cJSON_AddItemToObject(simd_json, \"tilable_loops\", loops_json);\n                    for (int i = 0; i < data.n_loops; i++)\n                    {\n                        cJSON *loop = cJSON_CreateNumber(ubs[i]);\n                        cJSON_AddItemToArray(loops_json, loop);\n                    }\n                    scores_json = cJSON_CreateArray();\n                    cJSON_AddItemToObject(simd_json, \"scores\", scores_json);\n                    for (int i = 0; i < data.n_loops; i++)\n                    {\n                        cJSON *loop = cJSON_CreateNumber(data.scores[i]);\n                        cJSON_AddItemToArray(scores_json, loop);\n                    }\n                    legal_json = cJSON_CreateArray();\n                    cJSON_AddItemToObject(simd_json, \"legal\", legal_json);\n                    for (int i = 0; i < data.n_loops; i++)\n                    {\n                        cJSON *loop = cJSON_CreateNumber(data.legal[i]);\n                        cJSON_AddItemToArray(legal_json, loop);\n                    }\n                    if (data.has_space_candidate == 0) {\n                        loops_json = cJSON_CreateArray();\n                        cJSON_AddItemToObject(simd_json, \"sa_dims\", loops_json);\n                        for (int i = 0; i < sa->n_sa_dim; i++)\n                        {\n                            cJSON *loop 
= cJSON_CreateNumber(sa->sa_dim[i]);\n                            cJSON_AddItemToArray(loops_json, loop);\n                        }\n                    }\n                    p_str = isl_printer_to_str(sa->ctx);\n                    p_str = isl_printer_print_str(p_str, sa->options->autosa->output_dir);\n                    p_str = isl_printer_print_str(p_str, \"/tuning.json\");\n                    tuning_path = isl_printer_get_str(p_str);\n                    fp = fopen(tuning_path, \"w\");\n                    if (!fp)\n                    {\n                        printf(\"[AutoSA] Error: Can't open tuning file: %s\\n\", tuning_path);\n                        exit(1);\n                    }\n                    content = cJSON_Print(tuning);\n                    fprintf(fp, \"%s\", content);\n                    fclose(fp);\n                    free(content);\n                    cJSON_Delete(tuning);\n                    free(tuning_path);\n                    isl_printer_free(p_str);\n                    exit(0);\n                }\n            }\n            else\n            {\n                throw std::runtime_error(\"[AutoSA] Error: Auto SIMD vectorization is not supported.\\n\");\n            }\n            free(data.ubs);\n        }\n\n        /* Perform the simd vectorization. */\n        data.loop_cnt = 0;\n        data.tile_size = tile_size;\n        node = isl_schedule_node_map_descendant_bottom_up(node,\n                                                          &autosa_simd_tile_loop, &data);\n    }\n    \n    free(data.legal);\n    free(tile_size);\n    /* Clean up the band pe_opt properties. */\n    schedule = isl_schedule_node_get_schedule(node);\n    isl_schedule_node_free(node);\n    schedule = isl_schedule_map_schedule_node_bottom_up(\n        schedule, &clear_pe_opt_prop, NULL);\n    free(data.scores);\n    sa->schedule = schedule;\n\n    /* Update the tuning config, dump out the sa dimensions. 
*/\n    if (data.has_space_candidate)\n    {\n        cJSON *tuning, *loops_json;\n        isl_printer *p_str;\n        char *tuning_path;\n        char *content;\n        FILE *fp;\n\n        tuning = cJSON_CreateObject();\n        loops_json = cJSON_CreateArray();\n        cJSON_AddItemToObject(tuning, \"sa_dims\", loops_json);\n        for (int i = 0; i < sa->n_sa_dim; i++) {\n            cJSON *loop = cJSON_CreateNumber(sa->sa_dim[i]);\n            cJSON_AddItemToArray(loops_json, loop);\n        }\n        p_str = isl_printer_to_str(sa->ctx);\n        p_str = isl_printer_print_str(p_str, sa->options->autosa->output_dir);\n        p_str = isl_printer_print_str(p_str, \"/tuning.json\");\n        tuning_path = isl_printer_get_str(p_str);\n        fp = fopen(tuning_path, \"w\");\n        if (!fp)\n        {\n            printf(\"[AutoSA] Error: Can't open tuning file: %s\\n\", tuning_path);\n            exit(1);\n        }\n        content = cJSON_Print(tuning);\n        fprintf(fp, \"%s\", content);\n        fclose(fp);\n        free(content);\n        cJSON_Delete(tuning);\n        free(tuning_path);\n        isl_printer_free(p_str);\n    }\n\n    /* Check if any of the space dimensions has size one, which is not supported by the current AutoSA. */\n    for (int i = 0; i < sa->n_sa_dim; i++) {\n        if (sa->sa_dim[i] == 1) {            \n            throw std::runtime_error(\"[AutoSA] Error: Array dimension of size 1 is not supported!\");\n        }\n    }\n\n    return isl_stat_ok;\n}\n\n/* Apply PE optimization including:\n * - array partitioning\n * - latency hiding\n * - SIMD vectorization\n */\nisl_stat compute_management(\n    struct autosa_gen *gen,\n    struct autosa_kernel *sa, bool pass_en[], char *pass_mode[])\n{\n    printf(\"[AutoSA] Apply compute management.\\n\");    \n\n    /* Preparation before the optimization. */\n    /* Initialize the autosa_loop_types. */\n    sa_loop_init(sa);\n    /* Set up the space_time properties. 
*/\n    sa_space_time_loop_setup(sa);    \n    /* Extract the communication pairs. 
*/\n    sa_io_update(sa);    \n\n    /* If any of the space dimensions are not parallel, \n     * check if local_reduce is enabled, otherwise error out.\n     */\n    //if (gen->options->autosa->tuning_method != 1) {\n    //    for (int i = 0; i < sa->n_sa_dim; i++) {        \n    //        if (sa->space_parallel[i] == 0 && !gen->options->autosa->local_reduce) {\n    //            throw std::runtime_error(\"[AutoSA] Error: Detected non-parallel space loops which is not supported unless local-reduce is specified.\");\n    //        }\n    //    }\n    //}\n\n    /* Extract the tile sizes. */\n    sa->sizes = extract_sizes_from_str(sa->ctx, sa->scop->options->autosa->sa_sizes);\n    /* Set the core */\n    isl_union_set *domain = isl_schedule_get_domain(sa->schedule);\n    sa->core = isl_union_set_universe(domain);\n    /* Array partitioning. */\n    sa_array_partitioning_optimize(sa, pass_en[0], pass_mode[0], pass_en[1], pass_mode[1]);    \n    /* Dump out the intermediate code if needed */\n    if (gen->options->autosa->dump_code) {\n        dump_intermediate_code(gen, isl_schedule_copy(sa->schedule), \"array_part\");\n    }\n    /* Latency hiding. */\n    sa_latency_hiding_optimize(sa, pass_en[2], pass_mode[2]);    \n    if (gen->options->autosa->dump_code) {\n        dump_intermediate_code(gen, isl_schedule_copy(sa->schedule), \"latency\");\n    }\n    /* SIMD vectorization. 
*/\n    if (pass_en[3]) {\n        sa_simd_vectorization_optimize(sa, pass_mode[3]);    \n        if (gen->options->autosa->dump_code) {\n            dump_intermediate_code(gen, isl_schedule_copy(sa->schedule), \"simd\");\n        }\n    }\n\n    return isl_stat_ok;\n}\n\n/* Extract the set of parameter values and outer schedule dimensions\n * for which any statement instance\n * in the kernel inserted at \"node\" needs to be executed.\n * Intersect the set of parameter values derived from the host schedule\n * relation with the context of \"prog\".\n */\nstatic __isl_give isl_set *extract_context(__isl_keep isl_schedule_node *node,\n                                           struct autosa_prog *prog)\n{\n    isl_union_map *schedule;\n    isl_union_set *schedule_domain;\n    isl_set *context;\n    int empty;\n\n    schedule = isl_schedule_node_get_prefix_schedule_relation(node);\n    schedule_domain = isl_union_map_range(schedule);\n    empty = isl_union_set_is_empty(schedule_domain);\n    if (empty < 0)\n    {\n        isl_union_set_free(schedule_domain);\n        return NULL;\n    }\n    if (empty)\n    {\n        int depth;\n        isl_space *space;\n\n        space = isl_union_set_get_space(schedule_domain);\n        isl_union_set_free(schedule_domain);\n        space = isl_space_set_from_params(space);\n        depth = isl_schedule_node_get_schedule_depth(node);\n        space = isl_space_add_dims(space, isl_dim_set, depth);\n        context = isl_set_empty(space);\n    }\n    else\n    {\n        context = isl_set_from_union_set(schedule_domain);\n    }\n    context = isl_set_intersect_params(context,\n                                       isl_set_copy(prog->context));\n\n    return context;\n}\n\n/* Return the set of outer array elements accessed\n * by the statement instances in \"domain\" in \"prog\".\n * The instances in \"domain\" are those that appear\n * in the domains of the access relations in \"prog\".\n */\nstatic __isl_give isl_union_set 
*accessed_by_domain(\n    __isl_take isl_union_set *domain, struct autosa_prog *prog)\n{\n    isl_union_map *access;\n    isl_union_set *arrays;\n\n    access = isl_union_map_union(isl_union_map_copy(prog->read),\n                                 isl_union_map_copy(prog->may_write));\n    access = isl_union_map_intersect_domain(access, domain);\n    arrays = isl_union_map_range(access);\n    arrays = isl_union_set_apply(arrays,\n                                 isl_union_map_copy(prog->to_outer));\n\n    return arrays;\n}\n\n/* Compute the effective grid size as a list of the sizes in each dimension.\n *\n * The grid size specified by the user or set by default\n * in read_grid_sizes() and applied by the block filter,\n * may be too large for the given code in the sense that\n * it may contain blocks that don't need to execute anything.\n * We therefore don't return this grid size, but instead the\n * smallest grid size that ensures that all blocks that actually\n * execute code are included in the grid.\n *\n * We first extract a description of the grid, i.e., the possible values\n * of the block ids, from the domain elements in \"domain\" and\n * kernel->block_filter.\n * The block ids are parameters in kernel->block_filter.\n * We simply need to change them into set dimensions.\n *\n * Then, for each block dimension, we compute the maximal value of the block id\n * and add one.\n */\nstatic __isl_give isl_multi_pw_aff *extract_grid_size(\n    struct autosa_kernel *kernel, __isl_take isl_union_set *domain)\n{\n    int i;\n    isl_set *grid;\n    isl_set *context;\n    isl_multi_pw_aff *size;\n\n    /* For AutoSA, we set the grid size as 1 */\n    grid = isl_union_set_params(domain);\n    grid = isl_set_from_params(grid);\n    grid = isl_set_add_dims(grid, isl_dim_set, kernel->n_grid);\n    for (i = 0; i < kernel->n_grid; ++i)\n    {\n        int pos;\n        isl_constraint *ls;\n\n        if (!grid)\n            return NULL;\n\n        /* Set this dimension as 
1. */\n        ls = isl_constraint_alloc_equality(isl_local_space_from_space(isl_set_get_space(grid)));\n        ls = isl_constraint_set_constant_si(ls, 0);\n        ls = isl_constraint_set_coefficient_si(ls, isl_dim_set, i, 1);\n        grid = isl_set_add_constraint(grid, ls);\n    }\n\n    grid = isl_set_coalesce(grid);\n    size = ppcg_size_from_extent(grid);\n    context = isl_set_params(isl_set_copy(kernel->context));\n    return isl_multi_pw_aff_gist(size, context);\n}\n\n/* Group the domain elements into a single space, named kernelX,\n * with X the kernel sequence number \"kernel_id\".\n */\nstatic __isl_give isl_schedule_node *group_statements(\n    __isl_take isl_schedule_node *node, int kernel_id)\n{\n    char buffer[20];\n    isl_id *id;\n\n    if (!node)\n        return NULL;\n\n    snprintf(buffer, sizeof(buffer), \"kernel%d\", kernel_id);\n    id = isl_id_alloc(isl_schedule_node_get_ctx(node), buffer, NULL);\n    return isl_schedule_node_group(node, id);\n}\n\n/* Replace \"pa\" by the zero function defined over the universe domain\n * in the space of \"pa\".\n */\nstatic __isl_give isl_pw_aff *set_universally_zero(__isl_take isl_pw_aff *pa)\n{\n    isl_space *space;\n    isl_aff *zero;\n\n    space = isl_space_domain(isl_pw_aff_get_space(pa));\n    isl_pw_aff_free(pa);\n    zero = isl_aff_zero_on_domain(isl_local_space_from_space(space));\n\n    return isl_pw_aff_from_aff(zero);\n}\n\n/* The sizes of the arrays on the host that have been computed by\n * extract_array_info may depend on the parameters.  Use the extra\n * constraints on the parameters that are valid at \"host_domain\"\n * to simplify these expressions and store the results in kernel->array.\n *\n * We only need these localized bounds for arrays that are accessed\n * by the current kernel.  
If we have found at least one reference group\n * then the array is accessed by the kernel.\n *\n * The resulting sizes may be functions that are nowhere defined\n * in case the access function cannot possibly access anything inside\n * the kernel for some reason.  If so, they are replaced by the zero\n * function.  Since the access function cannot actually access anything,\n * there is no harm in printing the array sizes as zero.\n */\nstatic void localize_bounds(struct autosa_kernel *kernel)\n{\n    int i, j;\n    isl_set *context;\n    isl_set *host_domain = kernel->host_domain;\n\n    context = isl_set_copy(host_domain);\n    context = isl_set_params(context);\n\n    for (i = 0; i < kernel->n_array; ++i)\n    {\n        struct autosa_local_array_info *local = &kernel->array[i];\n        isl_multi_pw_aff *bound;\n        int n_index;\n\n        if (local->n_pe_group == 0)\n            continue;\n\n        n_index = local->array->n_index;\n        bound = isl_multi_pw_aff_copy(local->array->bound);\n\n        for (j = 0; j < n_index; ++j)\n        {\n            isl_pw_aff *pwaff;\n            int empty;\n\n            pwaff = isl_multi_pw_aff_get_pw_aff(bound, j);\n            pwaff = isl_pw_aff_gist(pwaff, isl_set_copy(context));\n            empty = isl_pw_aff_is_empty(pwaff);\n            if (empty < 0)\n                pwaff = isl_pw_aff_free(pwaff);\n            else if (empty)\n                pwaff = set_universally_zero(pwaff);\n            bound = isl_multi_pw_aff_set_pw_aff(bound, j, pwaff);\n        }\n\n        local->n_index = n_index;\n        local->bound = bound;\n    }\n    isl_set_free(context);\n}\n\n/* Apply communication management including:\n * - data allocation\n * - I/O construction\n * - I/O optimization \n * First, data allocation allocates the on-chip buffers inside PEs.\n * Next, I/O construction builds the I/O system to transfer the data.\n * Lastly, I/O optimization optimizes the I/O system, performing tasks including:\n * - I/O 
module clustering\n * - L2 I/O buffering\n * - data packing\n */\nisl_stat comm_management(struct autosa_kernel *sa, struct autosa_gen *gen)\n{\n    printf(\"[AutoSA] Apply communication management.\\n\");\n\n    sa_io_construct_optimize(sa, gen);\n\n    /* Localize the array bounds using parameters from the host domain. */\n    localize_bounds(sa);\n\n    return isl_stat_ok;\n}\n\nstatic struct autosa_kernel *process_kernel_meta_data(struct autosa_kernel *kernel, struct autosa_gen *gen)\n{\n    isl_schedule_node *node;\n    isl_union_set *domain, *expanded;\n    int single_statement;\n    isl_union_pw_multi_aff *contraction;\n    isl_union_map *host_schedule;\n    isl_set *host_domain;\n    isl_id *id;    \n    int n_space_dim;\n\n    node = isl_schedule_get_root(kernel->schedule);\n    node = isl_schedule_node_child(node, 0);\n    node = isl_schedule_node_child(node, 0);\n\n    /* Insert \"local\" mark before the \"array\" mark. */\n    node = autosa_tree_insert_local_before_array(node);\n    if (!node)\n        return NULL;\n\n    domain = isl_schedule_node_get_domain(node);\n    single_statement = isl_union_set_n_set(domain) == 1;\n\n    /* Prepare some metadata. */\n    kernel->single_statement = single_statement;\n    kernel->context = extract_context(node, gen->prog);\n    contraction = isl_schedule_node_get_subtree_contraction(node);\n    kernel->contraction = isl_union_pw_multi_aff_copy(contraction);\n    expanded = isl_union_set_copy(domain);\n    expanded = isl_union_set_preimage_union_pw_multi_aff(expanded, contraction);\n    kernel->expanded_domain = isl_union_set_copy(expanded);\n    kernel->arrays = accessed_by_domain(expanded, gen->prog);\n    //kernel->id = gen->kernel_id++;\n    /* For FPGA, we set grid_size and block_size as 1, i.e. only one thread block \n     * and one thread inside the thread block. 
*/\n    kernel->n_grid = 1;\n    kernel->block_dim[0] = 1;\n    kernel->n_block = 1;\n    kernel->grid_dim[0] = 1;\n    kernel->grid_size = extract_grid_size(kernel, isl_union_set_copy(domain));\n    host_schedule = isl_schedule_node_get_prefix_schedule_union_map(node);\n    host_domain = isl_set_from_union_set(isl_union_map_range(host_schedule));\n    kernel->host_domain = host_domain;\n    kernel->domain = domain;\n\n    /* Make all the host loops atomic so that kernel is only called once. */\n    node = autosa_atomic_ancestors(node);\n\n    /* Insert the \"kernel\" mark. */\n    id = isl_id_alloc(gen->ctx, \"kernel\", kernel);\n    node = isl_schedule_node_insert_mark(node, id);\n    gen->kernel = kernel;\n\n    if (!single_statement)\n        node = group_statements(node, kernel->id);\n\n    /* Insert the PE mark below the space band */\n    node = autosa_tree_move_down_to_array(node, kernel->core);\n    node = isl_schedule_node_child(node, 0);\n    n_space_dim = 0;\n    for (int i = 0; i < isl_schedule_node_band_n_member(node); i++)\n    {\n        if (isl_schedule_node_band_member_get_space_time(node, i) == autosa_loop_space)\n        {\n            n_space_dim++;\n        }\n    }\n    if (isl_schedule_node_band_n_member(node) > n_space_dim)\n        node = isl_schedule_node_band_split(node, n_space_dim);\n    node = isl_schedule_node_child(node, 0);\n    id = isl_id_alloc(gen->ctx, \"pe\", NULL);\n    node = isl_schedule_node_insert_mark(node, id);\n    node = autosa_tree_move_up_to_kernel(node);\n\n    /* Save a copy of copy_schedule. 
*/\n    node = autosa_tree_move_down_to_pe(node, kernel->core);\n    kernel->copy_schedule_dim = isl_schedule_node_get_schedule_depth(node);\n    kernel->copy_schedule =\n        isl_schedule_node_get_prefix_schedule_union_pw_multi_aff(node);\n    contraction = isl_union_pw_multi_aff_copy(kernel->contraction);\n    kernel->copy_schedule =\n        isl_union_pw_multi_aff_pullback_union_pw_multi_aff(\n            kernel->copy_schedule, contraction);\n    node = autosa_tree_move_up_to_kernel(node);\n\n    /* Delete the local node. */\n    node = autosa_tree_move_down_to_local(node, kernel->core);\n    node = isl_schedule_node_delete(node);\n\n    node = autosa_tree_move_up_to_kernel(node);\n\n    kernel->schedule = isl_schedule_free(kernel->schedule);\n    kernel->schedule = isl_schedule_node_get_schedule(node);\n    isl_schedule_node_free(node);\n\n    return kernel;\n}\n\nstatic struct autosa_kernel *optimize_single_array(struct autosa_kernel *kernel, struct autosa_gen *gen) \n{\n    cJSON *array_part_json, *array_part_en_json, *array_part_mode_json;\n    cJSON *array_part_L2_json, *array_part_L2_en_json, *array_part_L2_mode_json;\n    cJSON *latency_json, *latency_en_json, *latency_mode_json;\n    cJSON *simd_json, *simd_en_json, *simd_mode_json;\n    /* Enable flags and modes for: array partitioning, L2 array partitioning, latency hiding, SIMD. */\n    bool pe_opt_en[4];\n    char *pe_opt_mode[4];    \n\n    kernel->prog = gen->prog;\n    kernel->options = gen->options;    \n\n    /* Create local arrays. */\n    kernel = autosa_kernel_create_local_arrays(kernel, gen->prog);\n    assert(kernel != NULL);\n\n    /* Update the sparse structures */\n    if (gen->options->autosa->block_sparse) {\n        autosa_kernel_extract_sparse_info(kernel, gen);\n    }\n\n    /* Apply PE optimization. 
*/\n    array_part_json = cJSON_GetObjectItemCaseSensitive(gen->tuning_config, \"array_part\");\n    array_part_en_json = cJSON_GetObjectItemCaseSensitive(array_part_json, \"enable\");\n    array_part_mode_json = cJSON_GetObjectItemCaseSensitive(array_part_json, \"mode\");\n\n    array_part_L2_json = cJSON_GetObjectItemCaseSensitive(gen->tuning_config, \"array_part_L2\");\n    array_part_L2_en_json = cJSON_GetObjectItemCaseSensitive(array_part_L2_json, \"enable\");\n    array_part_L2_mode_json = cJSON_GetObjectItemCaseSensitive(array_part_L2_json, \"mode\");\n\n    latency_json = cJSON_GetObjectItemCaseSensitive(gen->tuning_config, \"latency\");\n    latency_en_json = cJSON_GetObjectItemCaseSensitive(latency_json, \"enable\");\n    latency_mode_json = cJSON_GetObjectItemCaseSensitive(latency_json, \"mode\");\n\n    simd_json = cJSON_GetObjectItemCaseSensitive(gen->tuning_config, \"simd\");\n    simd_en_json = cJSON_GetObjectItemCaseSensitive(simd_json, \"enable\");\n    simd_mode_json = cJSON_GetObjectItemCaseSensitive(simd_json, \"mode\");\n\n    pe_opt_en[0] = array_part_en_json->valueint;\n    pe_opt_en[1] = array_part_L2_en_json->valueint;\n    pe_opt_en[2] = latency_en_json->valueint;\n    pe_opt_en[3] = simd_en_json->valueint;\n\n    pe_opt_mode[0] = array_part_mode_json->valuestring;\n    pe_opt_mode[1] = array_part_L2_mode_json->valuestring;\n    pe_opt_mode[2] = latency_mode_json->valuestring;\n    pe_opt_mode[3] = simd_mode_json->valuestring;\n\n    /* Compute Management */\n    compute_management(gen, kernel, pe_opt_en, pe_opt_mode);\n    /* Create the autosa_kernel object and attach to the schedule. 
*/\n    if (!kernel)    \n        return NULL;    \n\n    /* Process meta data */\n    kernel = process_kernel_meta_data(kernel, gen);\n    if (!kernel)\n        return NULL;\n    \n    /* Communication Management */\n    comm_management(kernel, gen);    \n\n    return kernel;\n}\n\n/* Create an autosa_kernel that represents the domain instances that reach \"node\" and \n * insert a mark node pointing to the autosa_kernel before \"node\".\n *\n * Mark all outer band nodes as atomic to ensure each kernel is only scheduled once.\n * If the domain elements that reach \"node\" live in more than one space,\n * then group the domain elements into a single space, named kernelX, \n * with X the kernel sequence number.\n *\n * [Space-time transformation]\n * We first perform the space-time transformation to map the design to a \n * systolic array.\n * [PE optimization]\n * PE optimization is applied next, including array partitioning, latency hiding, \n * and SIMD vectorization.\n * For array partitioning, the mark \"array\" is added between the tile and point loops.\n * All the loops below the \"array\" mark will be mapped to the FPGA device at once.\n * For latency hiding and SIMD vectorization, the generated loops are marked\n * \"latency\" and \"SIMD\", respectively.\n * [Communication management]\n * Then we perform communication management through data allocation, I/O construction, and \n * I/O optimization.\n * \n * [Ignore below...]\n * The linear branch between the kernel node and \"array\" mark may also have a \n * \"local\" mark. If present, the mapping to local memory is computed at this point. 
\n * The \"local\" mark will be removed at the end of this function.\n *\n * Compute array reference groups for all arrays, set the local array bounds \n * based on the set of domain instances that reach the kernel node, \n * check the total amount of shared memory used and compute \n * all group tilings.\n *\n * We save a copy of the schedule that may influence the mappings to shared or private\n * memory in kernel->copy_schedule.\n *\n * We add copy statements to the schedule tree and create representations for \n * the local variables in the kernel.\n *\n * We keep a copy of the isl_id that points to the kernel to ensure \n * that the kernel does not get destroyed if the schedule node \n * is freed due to some error condition.\n */\nstatic __isl_give isl_schedule_node *compute_and_comm_optimize(\n    struct autosa_gen *gen, __isl_take isl_schedule_node *node)\n{\n    isl_size num_sa = 0;\n    struct autosa_kernel **sa_candidates;\n    struct autosa_kernel *sa_opt, *kernel;\n    isl_schedule *schedule;                       \n    char *space_time_mode;\n    cJSON *space_time_json, *space_time_mode_json, *n_sa_json, *tuning;\n\n    /* Set up the sched_pos property */\n    node = sched_pos_setup(node);\n\n    /* Generate systolic arrays using space-time mapping. */\n    schedule = isl_schedule_node_get_schedule(node);\n    isl_schedule_node_free(node);\n    sa_candidates = sa_space_time_transform(schedule, gen->prog->scop, &num_sa);\n    if (num_sa > 0)\n        printf(\"[AutoSA] %d systolic arrays generated.\\n\", num_sa);\n    else\n    {\n        printf(\"[AutoSA] No systolic array generated. 
Exit now.\\n\");\n        exit(0);\n    }\n\n    space_time_json = cJSON_GetObjectItemCaseSensitive(gen->tuning_config, \"space_time\");\n    space_time_mode_json = cJSON_GetObjectItemCaseSensitive(space_time_json, \"mode\");\n    space_time_mode = space_time_mode_json->valuestring;\n    \n    if (!strcmp(space_time_mode, \"auto\"))\n    {\n        /* Space-time transformation is set in AUTO mode. We will pick up\n         * one systolic array to proceed based on heuristics. \n         */\n        kernel = sa_candidates_smart_pick(sa_candidates, num_sa);\n    } else {\n        /* Space-time transformation is set in MANUAL mode. We will take the user\n         * specification to select one systolic array to proceed.\n         */\n        isl_union_map *sizes = extract_sizes_from_str(gen->ctx,\n                                                      gen->options->autosa->sa_sizes);\n        int kernel_id = read_space_time_kernel_id(sizes);\n        isl_union_map_free(sizes);\n        if (kernel_id < 0)\n        {\n            /* User hasn't specified which systolic array to choose yet.\n             * We will dump out the number of systolic array designs and \n             * exit the program. 
*/\n            FILE *fp;\n            char *content;\n            isl_printer *p_str;\n            char *tuning_path;\n\n            tuning = cJSON_CreateObject();\n            space_time_json = cJSON_CreateObject();\n            n_sa_json = cJSON_CreateNumber(num_sa);\n            cJSON_AddItemToObject(space_time_json, \"n_kernel\", n_sa_json);\n            cJSON_AddItemToObject(tuning, \"space_time\", space_time_json);\n            p_str = isl_printer_to_str(gen->ctx);\n            p_str = isl_printer_print_str(p_str, gen->options->autosa->output_dir);\n            p_str = isl_printer_print_str(p_str, \"/tuning.json\");\n            tuning_path = isl_printer_get_str(p_str);\n            fp = fopen(tuning_path, \"w\");\n            if (!fp)\n            {\n                printf(\"[AutoSA] Error: Can't open tuning file: %s\\n\", tuning_path);\n                exit(1);\n            }\n            free(tuning_path);\n            isl_printer_free(p_str);\n            content = cJSON_Print(tuning);\n            fprintf(fp, \"%s\", content);\n            fclose(fp);\n            free(content);\n            cJSON_Delete(tuning);\n            exit(0);\n        }\n        else\n        {\n            kernel = sa_candidates_manual_pick(sa_candidates, num_sa, kernel_id);\n        }\n    }\n        \n    /* Dump out the intermediate code if needed */\n    if (gen->options->autosa->dump_code) {\n        dump_intermediate_code(gen, isl_schedule_copy(kernel->schedule), \"space_time\");\n    }\n    \n    /* Update the array information */\n    TP_extract_array_info(gen, kernel);    \n    kernel = optimize_single_array(kernel, gen);\n\n    if (kernel) {\n        gen->tuning_progs.push_back(kernel->tuning_program);\n        node = isl_schedule_get_root(kernel->schedule);\n        node = autosa_tree_move_down_to_kernel(node);\n    } else {\n        return NULL;\n    }\n\n    return node;\n}\n\n/* Return a read (\"read\" is 1) or write access relation for \"group\"\n * with those accesses removed that are only needed to communicate data\n * within 
the subtree of the schedule rooted at \"node\".\n * Furthermore, include the prefix schedule at \"node\".\n * That is, return a relation of the form\n *\n 
*\tS -> [D -> A]\n *\n * with D the outer schedule dimensions at \"node\".\n */\nstatic __isl_give isl_union_map *anchored_non_local_accesses(\n    struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n    __isl_take isl_schedule_node *node, int read)\n{\n    isl_union_map *access;\n    isl_union_map *prefix;\n\n    prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n    prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                              isl_union_pw_multi_aff_copy(kernel->contraction));\n    access = autosa_array_ref_group_access_relation(group, read, !read);\n    access = remove_local_accesses_group(kernel, group, access, prefix,\n                                         read);\n    /* Prefix: S -> D\n   * Access: S -> A\n   * range_product: S -> [D -> A]\n   */\n    access = isl_union_map_range_product(prefix, access);\n\n    return access;\n}\n\n/* Given an array reference group \"group\", create a mapping\n *\n *\tread[D -> A] -> [D -> A]\n *\n * if \"read\" is set or\n *\n *\twrite[D -> A] -> [D -> A]\n *\n * if \"read\" is not set.\n * D corresponds to the outer tile->depth dimensions of\n * the kernel schedule.\n */\nstatic __isl_give isl_multi_aff *create_from_access(isl_ctx *ctx,\n                                                    struct autosa_array_ref_group *group, int read)\n{\n    struct autosa_array_tile *tile;\n    isl_space *space;\n    isl_id *id;\n\n    tile = autosa_array_ref_group_tile(group);\n    space = isl_space_copy(group->array->space);\n    space = isl_space_from_range(space);\n    space = isl_space_add_dims(space, isl_dim_in, tile->depth);\n    space = isl_space_wrap(space);\n    space = isl_space_map_from_set(space);\n\n    id = isl_id_alloc(ctx, read ? 
\"read\" : \"write\", group);\n    space = isl_space_set_tuple_id(space, isl_dim_in, id);\n\n    return isl_multi_aff_identity(space);\n}\n\n/* Add copy statements to the schedule tree of \"node\"\n * for reading from global memory to local memory (if \"read\" is set) or\n * for writing back from local memory to global memory\n * (if \"read\" is not set) for the array reference group \"group\" that\n * is mapped to local memory.\n * On input, \"node\" points to the kernel node, and it is moved\n * back there on output.\n *\n * The copies are performed in the order of the corresponding local\n * memory tile.\n * The copy statement instances include a reference to the outer\n * tile->depth dimensions of the kernel schedule for ease of\n * combining them with the group tiling.\n *\n * If we are performing a read from global memory to local memory and\n * if the array involved is not a scalar, then we copy\n * the entire tile to local memory.  This may result in some extra\n * elements getting copied, but it should lead to simpler code\n * (which means that fewer registers may be needed) and less divergence.\n *\n * Otherwise, we only copy the elements that will be read or have been written\n * in the kernel.\n *\n * That is, the extra schedule is of the form\n *\n *\ttype[D -> A] -> T\n *\n * where D corresponds to the outer tile->depth dimensions of\n * the kernel schedule, A to the global array and T is the corresponding\n * local memory tile.\n *\n * The copying is inserted in the schedule tree through an extension\n * of the form\n *\n *\tD -> type[D -> A]\n *\n * where the extra domain elements type[D -> A] are those accessed\n * by the group.  
In the case of a read from a non-scalar, this set\n * is replaced by the entire local memory tile.\n *\n * If the \"unroll_copy_shared\" option is set, then the AST generator\n * is instructed to unroll the copying code.\n *\n * The extension is inserted before the core computation in case of a read\n * and after the core computation in case of a write.\n */\nstatic __isl_give isl_schedule_node *add_copies_group_local(\n    struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n    __isl_take isl_schedule_node *node, int read)\n{\n    struct autosa_array_tile *tile;\n    isl_union_map *access;\n    isl_union_set *domain;\n    isl_multi_aff *ma;\n    isl_multi_aff *from_access;\n    isl_multi_pw_aff *mpa;\n    isl_multi_union_pw_aff *mupa;\n    isl_schedule_node *graft;\n    isl_union_set *filter;\n    int skip;\n    int kernel_depth;\n    int empty;\n\n    tile = autosa_array_ref_group_tile(group);\n    kernel_depth = isl_schedule_node_get_schedule_depth(node);\n    node = autosa_tree_move_down_to_depth(node, tile->depth, kernel->core);\n\n    /* S -> [D -> A] \n   * S: domain elements\n   * D: prefix schedule dimensions\n   * A: access \n   */\n    access = anchored_non_local_accesses(kernel, group, node, read);\n    empty = isl_union_map_is_empty(access);\n    if (empty < 0 || empty)\n    {\n        isl_union_map_free(access);\n        if (empty < 0)\n            return isl_schedule_node_free(node);\n        return autosa_tree_move_up_to_kernel(node);\n    }\n\n    //group->array->global = 1;\n    //group->local_array->global = 1;\n\n    /* read[D -> A] -> [D -> A] */\n    from_access = create_from_access(kernel->ctx, group, read);\n\n    /* [D -> A] -> T */\n    ma = isl_multi_aff_copy(tile->tiling);\n    ma = isl_multi_aff_pullback_multi_aff(ma,\n                                          isl_multi_aff_copy(from_access));\n    mpa = isl_multi_pw_aff_from_multi_aff(ma);\n    /* read[D -> A] -> T */\n    mupa = 
isl_multi_union_pw_aff_from_multi_pw_aff(mpa);\n\n    /* [D -> A] */\n    domain = isl_union_map_range(access);\n\n    if (read && !autosa_array_is_scalar(group->array))\n    {\n        isl_map *map;\n        isl_set *set;\n        set = isl_map_domain(isl_map_from_union_map(isl_union_set_unwrap(domain)));\n        map = group_tile(group);\n        map = isl_map_intersect_domain(map, set);\n        domain = isl_union_set_from_set(isl_map_wrap(map));\n    }\n\n    /* read[D -> A] */\n    domain = isl_union_set_preimage_multi_aff(domain, from_access);\n    /* read[D -> A] -> D */\n    access = isl_union_set_wrapped_domain_map(domain);\n    /* D -> read[D -> A] */\n    access = isl_union_map_reverse(access);\n    access = isl_union_map_coalesce(access);\n    graft = isl_schedule_node_from_extension(access);\n    graft = isl_schedule_node_child(graft, 0);\n    graft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n    if (kernel->options->unroll_copy_shared)\n        graft = ppcg_set_schedule_node_type(graft, isl_ast_loop_unroll);\n\n    while (graft && isl_schedule_node_has_parent(graft))\n        graft = isl_schedule_node_parent(graft);\n\n    if (read)\n    {\n        node = isl_schedule_node_graft_before(node, graft);\n    }\n    else\n    {\n        node = isl_schedule_node_graft_after(node, graft);\n    }\n\n    node = autosa_tree_move_up_to_kernel(node);\n\n    return node;\n}\n\n/* Check whether the array reference group \"group\" is mapped to\n * local memory and, if so, add copy statements to the schedule tree of \"node\"\n * for reading from global memory to local memory\n * (if \"read\" is set) or for writing back from local memory\n * to global memory (if \"read\" is not set) for this group.\n * On input, \"node\" points to the kernel node, and it is moved\n * back there on output.\n */\nstatic __isl_give isl_schedule_node *add_copies_group(\n    struct autosa_kernel *kernel, struct autosa_array_ref_group *group,\n    __isl_take isl_schedule_node 
*node, int read)\n{\n    enum autosa_group_access_type type;\n\n    type = autosa_cpu_array_ref_group_type(group);\n    if (type == AUTOSA_ACCESS_LOCAL)\n        return add_copies_group_local(kernel, group, node, read);\n\n    return node;\n}\n\nstatic void create_kernel_var(isl_ctx *ctx,\n                              struct autosa_array_ref_group *group,\n                              struct autosa_kernel_var *var)\n{\n    int j;\n    struct autosa_array_tile *tile;\n    isl_printer *p;\n\n    var->array = group->array;\n\n    var->type = autosa_array_ref_group_type(group);\n    tile = autosa_array_ref_group_tile(group);\n\n    p = isl_printer_to_str(ctx);\n    p = autosa_array_ref_group_print_name(group, p);\n    var->name = isl_printer_get_str(p);\n    isl_printer_free(p);\n\n    var->size = isl_vec_alloc(ctx, group->array->n_index);\n\n    for (j = 0; j < group->array->n_index; ++j)\n        var->size = isl_vec_set_element_val(var->size, j,\n                                            isl_val_copy(tile->bound[j].size));\n}\n\nstatic isl_stat create_kernel_vars(struct autosa_kernel *kernel)\n{\n    int i, j, n;\n\n    n = 0;\n    for (i = 0; i < kernel->n_array; ++i)\n    {\n        struct autosa_local_array_info *array = &kernel->array[i];\n\n        for (j = 0; j < array->n_group; ++j)\n        {\n            struct autosa_array_ref_group *group = array->groups[j];\n            enum autosa_group_access_type type;\n\n            type = autosa_cpu_array_ref_group_type(group);\n            if (type != AUTOSA_ACCESS_GLOBAL)\n                ++n;\n        }\n    }\n\n    kernel->var = isl_calloc_array(kernel->ctx, struct autosa_kernel_var, n);\n    if (!kernel->var)\n        return isl_stat_error;\n    kernel->n_var = n;\n\n    n = 0;\n    for (i = 0; i < kernel->n_array; ++i)\n    {\n        struct autosa_local_array_info *array = &kernel->array[i];\n\n        for (j = 0; j < array->n_group; ++j)\n        {\n            struct autosa_array_ref_group *group = 
array->groups[j];\n            enum autosa_group_access_type type;\n\n            type = autosa_cpu_array_ref_group_type(group);\n            if (type == AUTOSA_ACCESS_GLOBAL)\n                continue;\n            create_kernel_var(kernel->ctx, group, &kernel->var[n]);\n            ++n;\n        }\n    }\n\n    return isl_stat_ok;\n}\n\n/* For each array reference group that is mapped to local memory,\n * add copy statements to the schedule tree of \"node\"\n * for reading from global memory to local memory\n * and for writing back.\n * On input, \"node\" points to the kernel node, and it is moved\n * back there on output.\n */\nstatic __isl_give isl_schedule_node *add_copies(struct autosa_kernel *kernel,\n                                                __isl_take isl_schedule_node *node)\n{\n    int i, j;\n\n    for (i = 0; i < kernel->n_array; ++i)\n    {\n        struct autosa_local_array_info *array = &kernel->array[i];\n\n        for (j = 0; j < array->n_group; ++j)\n        {\n            struct autosa_array_ref_group *group = array->groups[j];\n            node = add_copies_group(kernel, group, node, 1);\n            if (!node)\n                return NULL;\n            node = add_copies_group(kernel, group, node, 0);\n            if (!node)\n                return NULL;\n        }\n    }\n\n    return node;\n}\n\n/* Add copy-in/out stmts for the default schedule. */\nstatic __isl_give isl_schedule_node *sa_add_copies(\n    struct autosa_gen *gen, __isl_take isl_schedule_node *node)\n{\n    struct autosa_kernel *kernel;\n    isl_id *id;\n    isl_set *host_domain;\n    isl_union_pw_multi_aff *contraction;\n    int single_statement;\n\n    id = isl_schedule_node_mark_get_id(node);\n    kernel = (struct autosa_kernel *)isl_id_get_user(id);\n    host_domain = kernel->host_domain;\n    single_statement = kernel->single_statement;\n\n    /* Add the copy statements. 
*/\n    node = add_copies(kernel, node);\n\n    if (create_kernel_vars(kernel) < 0)\n        node = isl_schedule_node_free(node);\n\n    if (!single_statement)\n        node = isl_schedule_node_parent(node);\n\n    isl_id_free(id);\n\n    return node;\n}\n\n/* Perform computation and communication management to update the \n * \"schedule\" for mapping to FPGA.\n *\n * Unlike PPCG, in AutoSA, only one SA kernel is created out of the \n * original program, which is guaranteed by the previous step.\n * We will insert a context node, create an autosa_kernel for the schedule tree\n * beneath. Nodes for copying arrays in and out of the FPGA device and for\n * initializing and clearing the device are added. \n *\n * The FPGA code is generated in a context where at least one statement \n * instance is executed. The corresponding guard is inserted around \n * the entire schedule.\n */\n__isl_give isl_schedule *sa_map_to_device(struct autosa_gen *gen,\n                                          __isl_take isl_schedule *schedule)\n{\n    isl_schedule_node *node;\n    isl_set *context;\n    isl_set *guard;\n    isl_union_set *domain;\n    isl_union_map *prefix;\n    isl_union_pw_multi_aff *contraction;\n    struct autosa_prog *prog;\n    isl_schedule *hw_schedule;\n    struct autosa_kernel *kernel;\n    isl_id *id;\n    cJSON *tuning_config = NULL;\n\n    /* Load the tuning configuration file */\n    tuning_config = load_tuning_config(gen->options->autosa->config);\n    if (!tuning_config)\n    {\n        isl_schedule_free(schedule);\n        printf(\"[AutoSA] Error: AutoSA configuration file not found: %s\\n\",\n               gen->options->autosa->config);\n        exit(1);\n    }\n    gen->tuning_config = tuning_config;\n\n    context = isl_set_copy(gen->prog->context);\n    context = isl_set_from_params(context);\n    schedule = isl_schedule_insert_context(schedule, context);\n\n    prog = gen->prog;\n    guard = 
isl_union_set_params(isl_union_set_copy(prog->scop->domain));\n    prog->context = isl_set_intersect(prog->context, isl_set_copy(guard));\n    guard = isl_set_from_params(guard);\n\n    node = isl_schedule_get_root(schedule);\n    isl_schedule_free(schedule);\n    node = isl_schedule_node_child(node, 0);\n    node = isl_schedule_node_child(node, 0);\n    domain = isl_schedule_node_get_domain(node);\n    contraction = isl_schedule_node_get_subtree_contraction(node);\n    domain = isl_union_set_preimage_union_pw_multi_aff(domain,\n                                                       isl_union_pw_multi_aff_copy(contraction));\n    prefix = isl_schedule_node_get_prefix_schedule_union_map(node);\n    prefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n                                                              contraction);\n\n    /* Perform compute and comm optimization. */\n    node = compute_and_comm_optimize(gen, node);    \n\n    id = isl_schedule_node_mark_get_id(node);\n    kernel = (struct autosa_kernel *)isl_id_get_user(id);\n    isl_id_free(id);\n    schedule = isl_schedule_node_get_schedule(node);    \n    /* Generate hw modules in the systolic array. */    \n    generate_hw_modules(schedule, gen, kernel);        \n\n    /* Add copy statements for the default schedule (used for correctness verification). */\n    node = sa_add_copies(gen, node);\n\n    /* Add copy-in/out statement for transferring data to/from the FPGA device. */\n    node = sa_add_to_from_device(node, domain, prefix, gen->prog);\n    node = isl_schedule_node_root(node);\n    node = isl_schedule_node_child(node, 0);\n    node = isl_schedule_node_child(node, 0);\n    node = isl_schedule_node_insert_guard(node, guard);\n    node = isl_schedule_node_child(node, 0);    \n\n    /* Add init/clear device statements. */\n    node = sa_add_init_clear_device(node, kernel);\n\n    /* Add drain merge nodes. 
*/\n    node = sa_add_drain_merge(node, gen);    \n\n    isl_schedule_free(gen->schedule);\n    gen->schedule = isl_schedule_node_get_schedule(node);\n    isl_schedule_node_free(node);\n    cJSON_Delete(gen->tuning_config);\n\n    return gen->schedule;\n}\n\n/* Generate HLS code for \"scop\" and print it to \"p\".\n * After generating an AST for the transformed scop as explained below,\n * we call \"gen->print\" to print the AST in the desired output format \n * to \"p\".\n * \n * If it turns out that it does not make sense to generate SA code, \n * then we generate CPU code instead.\n * \n * The declarations of the arrays that are visible outside of the scop\n * are printed outside of the code generated from the schedule,\n * because the generated code may involve a guard around the entire code.\n * \n * We first compute a schedule that respects the dependences \n * of the original program and test if the current program can be mapped to sa.\n * If not, we will generate CPU code instead.\n * If the --load-schedule is specified, then the loaded schedule \n * is used instead of a computed schedule.\n * \n * For the candidate program, a sequence of optimizations are performed, \n * including: \n * - Space-time Transformation\n * - PE Optimization\n *   - Array Partitioning\n *   - Latency Hiding\n *   - SIMD Vectorization\n * - Data Transfer Optimization\n *   - Data Allocation\n *   - I/O Construction\n *   - I/O Optimization\n * \n * After the array partitioning, we have a program with\n * K\n * |\n * T\n * |\n * P\n * \n * We add the kernel marker on top.\n * For each iteration of the T band and for each array, we compute\n * the array elements accessed by that iteration, construct a rectangular\n * box around it and shift it to the origin. 
The result is used\n * as the on-chip memory for the array.\n * \n * Copying statements are added to this schedule tree.\n * In practice, these are added in front of the P band, but some of them \n * may get hoisted up to higher levels.\n * \n * The entire AST is then generated from the single resulting schedule tree.\n * During the generation the subtrees at kernel nodes (K) are saved aside and\n * replaced by kernel calls. The result is printed as host code while the saved\n * subtrees are printed as device code.\n */\nstatic __isl_give isl_printer *generate(__isl_take isl_printer *p,\n                                        struct autosa_gen *gen, struct ppcg_scop *scop,\n                                        struct ppcg_options *options)\n{\n    struct autosa_prog *prog;\n    isl_ctx *ctx;\n    isl_schedule *schedule;\n    isl_bool any_sa;\n\n    if (!scop)\n        return isl_printer_free(p);\n\n    ctx = isl_printer_get_ctx(p);\n    prog = autosa_prog_alloc(ctx, scop);\n    if (!prog)\n        return isl_printer_free(p);\n\n    gen->prog = prog;\n    /* Scheduling */\n    schedule = get_schedule(gen);    \n\n    /* The current ISL scheduler is limited and sometimes can't find the \n     * fully permutable loop band correctly.\n     * As a temporary hack, here we try a second time to merge the \n     * outer bands as much as possible.\n     */    \n    schedule = merge_outer_bands(schedule, gen);    \n    //DBGSCHD(stdout, schedule, isl_schedule_get_ctx(schedule));\n\n    /* Legality check */\n    isl_bool is_legal = sa_legality_check(schedule, scop);\n    if (is_legal < 0 || !is_legal)\n    {\n        if (is_legal < 0)\n            p = isl_printer_free(p);\n        else\n            p = print_cpu(p, scop, options);\n        isl_schedule_free(schedule);\n    }\n    else\n    {\n        if (gen->options->autosa->array_contraction) {\n            /* If array contraction is enabled, disable isl sink. 
*/\n            gen->options->autosa->isl_sink = 0;\n        }\n\n        /* Perform opt. stages:\n         * Computation Management -> Communication Management     \n         */        \n        gen->schedule = sa_map_to_device(gen, schedule);        \n\n        /* Generate the AST tree. */\n        gen->tree = sa_generate_code(gen, gen->schedule);\n        for (int i = 0; i < gen->n_hw_modules; i++)\n        {\n            if (gen->hw_modules[i]->is_filter == 1 &&\n                gen->hw_modules[i]->is_buffer == 1)\n            {\n                sa_filter_buffer_io_module_generate_code(gen, gen->hw_modules[i]);\n            }\n            else\n            {\n                sa_module_generate_code(gen, gen->hw_modules[i]);\n            }\n        }\n        sa_top_module_generate_code(gen);\n        for (int i = 0; i < gen->n_drain_merge_funcs; i++)\n        {\n            sa_drain_merge_generate_code(gen, gen->drain_merge_funcs[i]);\n        }\n        if (gen->options->autosa->host_serialize)\n        {\n            for (int i = 0; i < gen->n_hw_modules; i++)\n            {\n                if (gen->hw_modules[i]->to_mem)\n                {\n                    sa_host_serialize_generate_code(gen, gen->hw_modules[i]);\n                }\n            }\n        }\n\n        /* Extract loop structure for latency estimation */\n        for (int i = 0; i < gen->n_hw_modules; i++)\n        {\n            sa_extract_loop_info(gen, gen->hw_modules[i]);\n        }\n        if (options->autosa->tuning_method == 1) {\n            /* Extract the information for performance est in the auto tuner. 
*/\n            for (int i = 0; i < gen->n_hw_modules; i++) {     \n                TP_extract_loop_info(gen, gen->hw_modules[i]);\n                TP_extract_resource_info(gen, gen->hw_modules[i]);\n                TP_extract_module_attr(gen, gen->hw_modules[i]);\n            }        \n        }\n\n        /* Dump out the array information */\n        sa_extract_array_info(gen->kernel);\n        /* Extract design information for resource estimation */\n        sa_extract_design_info(gen);\n\n        /* Code generation */        \n        p = ppcg_print_exposed_declarations(p, prog->scop);\n        p = gen->print(p, gen->prog, gen->tree, gen->hw_modules, gen->n_hw_modules,\n                       gen->hw_top_module, gen->drain_merge_funcs, gen->n_drain_merge_funcs,\n                       &gen->types, gen->print_user);\n\n        /* Dump tuning information */\n        if (options->autosa->tuning_method == 1) {\n            std::string params_f(options->autosa->output_dir);\n            params_f += \"/tuning\";\n            for (int i = 0; i < gen->tuning_progs.size(); i++) {\n                gen->tuning_progs[i]->dump(params_f);\n            }\n        }\n\n        /* Clean up */\n        isl_ast_node_free(gen->tree);\n        autosa_kernel_free(gen->kernel);\n        for (int i = 0; i < gen->n_hw_modules; i++)\n        {\n            autosa_hw_module_free(gen->hw_modules[i]);\n        }\n        free(gen->hw_modules);\n        autosa_hw_top_module_free(gen->hw_top_module);\n        for (int i = 0; i < gen->n_drain_merge_funcs; i++)\n        {\n            autosa_drain_merge_func_free(gen->drain_merge_funcs[i]);\n        }\n        free(gen->drain_merge_funcs);\n    }\n\n    autosa_prog_free(prog);\n\n    return p;\n}\n\n/* Wrapper around generate for use as a ppcg_transform callback. 
\n */\nstatic __isl_give isl_printer *generate_wrap(__isl_take isl_printer *p,\n                                             struct ppcg_scop *scop, void *user)\n{\n    struct autosa_gen *gen = (struct autosa_gen *)user;\n\n    return generate(p, gen, scop, gen->options);\n}\n\n/* Transform the code in the file called \"input\" by replacing \n * all scops by corresponding HLS code and write the results to \"out\".\n */\nint generate_sa(isl_ctx *ctx, const char *input, FILE *out,\n                struct ppcg_options *options,\n                __isl_give isl_printer *(*print)(__isl_take isl_printer *p,\n                                                 struct autosa_prog *prog, __isl_keep isl_ast_node *trees,\n                                                 struct autosa_hw_module **modules, int n_module,\n                                                 struct autosa_hw_top_module *module,\n                                                 struct autosa_drain_merge_func **drain_merge_funcs, int n_drain_merge_funcs,\n                                                 struct autosa_types *types, void *user),\n                void *user)\n{\n    struct autosa_gen gen;\n    int r;\n    int i;\n\n    gen.ctx = ctx;\n    gen.sizes = extract_sizes_from_str(ctx, options->sizes);\n    gen.options = options;\n    gen.kernel_id = 0;\n    gen.print = print;\n    gen.print_user = user;\n    gen.types.n = 0;\n    gen.types.name = NULL;\n    gen.hw_modules = NULL;\n    gen.n_hw_modules = 0;\n    gen.hw_top_module = NULL;\n    gen.drain_merge_funcs = NULL;\n    gen.n_drain_merge_funcs = 0;\n    gen.schedule = NULL;\n    gen.kernel = NULL;\n    gen.tuning_config = NULL;    \n\n    r = ppcg_transform(ctx, input, out, options, &generate_wrap, &gen);    \n\n    isl_union_map_free(gen.sizes);\n    for (i = 0; i < gen.types.n; ++i)\n        free(gen.types.name[i]);\n    free(gen.types.name);    \n\n    return r;\n}\n"
  },
  {
    "path": "src/autosa_trans.h",
    "content": "/* Defines functions for computation management in AutoSA, including:\n * - space-time transformation\n * - array partitioning\n * - latency hiding\n * - SIMD vectorization\n */\n\n#ifndef _AUTOSA_TRANS_H\n#define _AUTOSA_TRANS_H\n\n#include <isl/constraint.h>\n\n#include \"autosa_common.h\"\n\n/* Internal structure for loop tiling in PE optimization.\n */\nstruct autosa_pe_opt_tile_data\n{\n    int n_tiled_loop;\n    int n_touched_loop;\n    int tile_len;\n    int *tile_size;\n    struct autosa_kernel *sa;\n};\n\nint generate_sa(isl_ctx *ctx, const char *input, FILE *out,\n                struct ppcg_options *options,\n                __isl_give isl_printer *(*print)(__isl_take isl_printer *p,\n                                                 struct autosa_prog *prog, __isl_keep isl_ast_node *tree,\n                                                 struct autosa_hw_module **modules, int n_modules,\n                                                 struct autosa_hw_top_module *top_module,\n                                                 struct autosa_drain_merge_func **drain_merge_funcs, int n_drain_merge_funcs,\n                                                 struct autosa_types *types, void *user),\n                void *user);\n__isl_give isl_schedule *sa_map_to_device(struct autosa_gen *gen,\n                                          __isl_take isl_schedule *schedule);\nisl_bool sa_legality_check(__isl_keep isl_schedule *schedule, struct ppcg_scop *scop);\n\n/* Space-Time transformation */\nstruct autosa_kernel **sa_space_time_transform_at_dim_async(\n    __isl_keep isl_schedule *schedule, struct ppcg_scop *scop,\n    isl_size dim, isl_size *num_sa);\nstruct autosa_kernel **sa_space_time_transform_at_dim_sync(\n    __isl_keep isl_schedule *schedule, struct ppcg_scop *scop,\n    isl_size dim, isl_size *num_sa);\nstruct autosa_kernel **sa_space_time_transform_at_dim(\n    __isl_keep isl_schedule *schedule, struct ppcg_scop *scop,\n    isl_size dim, isl_size *num_sa);\nstruct autosa_kernel 
*sa_candidates_smart_pick(\n    struct autosa_kernel **sa_list, __isl_keep isl_size num_sa);\nstruct autosa_kernel *sa_candidates_manual_pick(\n    struct autosa_kernel **sa_list, isl_size num_sa, int sa_id);\nstruct autosa_kernel **sa_space_time_transform(\n    __isl_take isl_schedule *schedule, struct ppcg_scop *scop, isl_size *num_sa);\n\n/* PE Optimization */\nisl_stat sa_array_partitioning_optimize(\n    struct autosa_kernel *sa, bool en, char *mode, bool L2_en, char *L2_mode);\nisl_stat sa_latency_hiding_optimize(\n    struct autosa_kernel *sa, bool en, char *mode);\nisl_stat sa_simd_vectorization_optimize(\n    struct autosa_kernel *sa, char *mode);\nisl_stat compute_management(\n    struct autosa_gen *gen,\n    struct autosa_kernel *sa, bool pass_en[], char *pass_mode[]);\n\nisl_stat sa_loop_init(struct autosa_kernel *sa);\nisl_stat sa_space_time_loop_setup(struct autosa_kernel *sa);\n\nvoid extract_sa_dims_from_node(__isl_keep isl_schedule_node *node, int *sa_dims, int n_sa_dim);\n\n#endif"
  },
  {
    "path": "src/autosa_tuning.cpp",
    "content": "/* This file defines all the functions used for AutoSA tuning.\n * When executed in the tuning mode, AutoSA will automatically optimize the program,\n * applying different permutation and tiling techniques.\n * The program transform history and program loop structure are recorded, which \n * are later used by the auto-tuner.\n */\n#include <iomanip>\n#include <iostream>\n#include <fstream>\n\n#include \"ppcg.h\"\n#include \"autosa_tuning.h\"\n#include \"autosa_schedule_tree.h\"\n\n__isl_give TPExpr *TPExpr::div_by_param(__isl_take TPExpr *divisor) {        \n    TPExpr *expr = new TPExpr(\"div\", this, divisor);\n    return expr;\n}\n\n__isl_give TPExpr *TPExpr::ceil() {    \n    TPExpr *expr = new TPExpr(\"ceil\", this);\n    return expr;\n}\n\n__isl_give TPExpr *TPExpr::add(__isl_take TPExpr *expr) {\n    if (this->func == \"NULL\") {        \n        delete this;\n        return expr;        \n    } else {\n        TPExpr *new_expr = new TPExpr(\"add\", this, expr);\n        return new_expr;\n    }\n}\n\n__isl_give TPExpr *TPExpr::mul(__isl_take TPExpr *expr) {   \n    if (this->func == \"NULL\") {\n        delete this;\n        return expr;\n    } else if (this->to_str() == \"1\") {\n        delete this;\n        return expr;\n    } else if (expr->to_str() == \"1\") {\n        delete expr;\n        return this;\n    } else {\n        TPExpr *new_expr = new TPExpr(\"mul\", this, expr);\n        return new_expr;    \n    }\n}\n\n__isl_give TPExpr *TPExpr::subtract(__isl_take TPExpr *expr) {    \n    if (this->func == \"literal\" && dynamic_cast<TPConst *>(this->ops[0])) {        \n        int val = ((TPConst *)(this->ops[0]))->val;\n        if (expr->func == \"literal\" && dynamic_cast<TPConst *>(expr->ops[0])) {\n            val -= ((TPConst *)(expr->ops[0]))->val;        \n            delete this;\n            delete expr;\n            return new TPExpr(\"literal\", new TPConst(val));\n        }\n    } else if (expr->func == \"literal\" && 
dynamic_cast<TPConst *>(expr->ops[0])) {\n        int val = ((TPConst *)(expr->ops[0]))->val;\n        if (val == 0) {\n            delete expr;\n            return this;\n        }        \n    }\n    TPExpr *new_expr = new TPExpr(\"sub\", this, expr);\n    return new_expr;\n}\n\n__isl_give TPExpr *TPExpr::min(__isl_take TPExpr *expr) {    \n    if (this->func == \"literal\" && dynamic_cast<TPConst *>(this->ops[0])) {        \n        int val = ((TPConst *)(this->ops[0]))->val;\n        if (expr->func == \"literal\" && dynamic_cast<TPConst *>(expr->ops[0])) {\n            val = std::min(val, ((TPConst *)(expr->ops[0]))->val);\n            delete this;\n            delete expr;\n            return new TPExpr(\"literal\", new TPConst(val));\n        }\n    } else if (this->func == \"NULL\") {\n        delete this;\n        return expr;\n    } else if (this->to_str() == expr->to_str()) {\n        delete expr;\n        return this;\n    }\n    TPExpr *new_expr = new TPExpr(\"min\", this, expr);\n    return new_expr;\n}\n\n__isl_give TPExpr *TPExpr::max(__isl_take TPExpr *expr) {\n    if (this->func == \"literal\" && dynamic_cast<TPConst *>(this->ops[0])) {        \n        int val = ((TPConst *)(this->ops[0]))->val;\n        if (expr->func == \"literal\" && dynamic_cast<TPConst *>(expr->ops[0])) {\n            val = std::max(val, ((TPConst *)(expr->ops[0]))->val);\n            delete this;\n            delete expr;\n            return new TPExpr(\"literal\", new TPConst(val));\n        }\n    } else if (this->func == \"NULL\") {\n        delete this;\n        return expr;\n    } else if (this->to_str() == expr->to_str()) {\n        delete expr;\n        return this;\n    }\n    TPExpr *new_expr = new TPExpr(\"max\", this, expr);\n    return new_expr;\n}\n\n/* Create a duplicate of the current expression. 
*/\n__isl_give TPExpr *TPExpr::dup() {\n    TPExpr *new_expr = new TPExpr();\n    new_expr->func = this->func;\n    if (this->func == \"literal\") {\n        TPExpr *op = this->ops[0];\n        if (dynamic_cast<TPParameter *>(op)) {            \n            new_expr->ops.push_back(((TPParameter *)(op))->dup());\n        } else if (dynamic_cast<TPConst *>(op)) {            \n            new_expr->ops.push_back(((TPConst *)(op))->dup());            \n        }\n    } else {\n        for (auto op : this->ops) {\n            new_expr->ops.push_back(op->dup());\n        }\n    }\n    return new_expr;\n}        \n\n__isl_give TPParameter *TPParameter::dup() {\n    TPParameter *new_param = new TPParameter();\n    new_param->name = this->name;\n    new_param->name_prefix = this->name_prefix;\n    new_param->type = this->type;\n    for (auto bound : this->bounds) {\n        new_param->bounds.push_back(std::shared_ptr<TPExpr>(bound->dup()));\n    }    \n    for (auto d : this->divisors) {\n        new_param->divisors.push_back(std::shared_ptr<TPExpr>(d->dup()));\n    }    \n    for (auto m : this->multiples) {\n        new_param->multiples.push_back(std::shared_ptr<TPExpr>(m->dup()));\n    }    \n    new_param->tune = this->tune;    \n    new_param->attr = this->attr; \n    for (auto tag : this->tags) {\n        new_param->tags.insert(tag);\n    }\n\n    return new_param;\n}\n\n__isl_give TPConst *TPConst::dup() {\n    TPConst *new_const = new TPConst();\n    new_const->type = this->type;\n    new_const->val = this->val;\n\n    return new_const; \n}\n\n/* Add \"cst\" to the value of \"expr\" by folding it into the trailing\n * constant operand of an add/sub chain. Return true on success.\n */\nbool propagate_cst(TPExpr *expr, int cst) {\n    bool status = false;\n    if (expr->func == \"add\" || expr->func == \"sub\") {\n        if (expr->ops[1]->func == \"add\" || expr->ops[1]->func == \"sub\") {\n            /* For a - b, adding cst to the whole expression subtracts cst from b. 
*/\n            status = propagate_cst(expr->ops[1], expr->func == \"sub\" ? -cst : cst);\n        } else if (expr->ops[1]->func == \"literal\" && dynamic_cast<TPConst *>(expr->ops[1]->ops[0])) {\n            int new_cst;\n            if (expr->func == 
\"sub\") \n                new_cst = dynamic_cast<TPConst *>(expr->ops[1]->ops[0])->val - cst;\n            else\n                new_cst = dynamic_cast<TPConst *>(expr->ops[1]->ops[0])->val + cst;\n            delete expr->ops[1]->ops[0];\n            expr->ops[1]->ops[0] = new TPConst(new_cst);\n            status = true;\n        }\n    }\n    return status;\n}\n\n__isl_give TPExpr *const_propagation(__isl_take TPExpr *expr) {\n    TPExpr *ret_expr = expr;\n    if (ret_expr->func == \"add\" || ret_expr->func == \"sub\") {\n        /* Check if const propagation is possible */\n        if (ret_expr->ops[1]->func == \"literal\" && dynamic_cast<TPConst *>(ret_expr->ops[1]->ops[0])) {            \n            /* Fold the trailing constant into ops[0]; negate it for a subtraction. */\n            int cst = dynamic_cast<TPConst *>(ret_expr->ops[1]->ops[0])->val;\n            if (ret_expr->func == \"sub\")\n                cst = -cst;\n            bool status = propagate_cst(ret_expr->ops[0], cst);\n            if (status) {                \n                TPExpr *new_expr = ret_expr->ops[0]->dup();                \n                delete ret_expr;\n                ret_expr = new_expr;\n            }\n        }\n        /* Check if there is any zero in the operands. 
*/\n        if (ret_expr->ops[1]->func == \"literal\" && dynamic_cast<TPConst *>(ret_expr->ops[1]->ops[0])) {\n            if (dynamic_cast<TPConst *>(ret_expr->ops[1]->ops[0])->val == 0) {\n                TPExpr *new_expr = ret_expr->ops[0]->dup();\n                delete ret_expr;\n                ret_expr = new_expr;\n            }\n        }        \n    }\n    for (int i = 0; i < ret_expr->ops.size(); i++) {\n        ret_expr->ops[i] = ret_expr->ops[i]->simplify();\n    }\n    return ret_expr;\n}\n\n__isl_give TPExpr *combine_like_terms(__isl_take TPExpr *expr) {\n    TPExpr *ret_expr = expr;\n\n    if (ret_expr->func == \"add\" || ret_expr->func == \"sub\") {\n        /* Try to combine like terms: (a * b) +/- b => (a +/- 1) * b */\n        //if (ret_expr->ops[0]->func == \"mul\") {\n        //    std::cout << \"f1: \" << ret_expr->ops[0]->ops[1]->to_str() << std::endl;\n        //    std::cout << \"f2: \" << ret_expr->ops[1]->to_str() << std::endl;\n        //}\n        if (ret_expr->ops[0]->func == \"mul\" && \n            (ret_expr->ops[0]->ops[1]->to_str() == ret_expr->ops[1]->to_str())) {\n            TPExpr *left = ret_expr->ops[0]->ops[0]->dup();\n            TPExpr *right = ret_expr->ops[0]->ops[1]->dup();\n            if (ret_expr->func == \"add\") {\n                left = left->add(new TPExpr(\"literal\", new TPConst(1)));\n            } else {\n                left = left->subtract(new TPExpr(\"literal\", new TPConst(1)));\n            }\n            TPExpr *new_expr = new TPExpr(\"mul\", left, right);\n            delete ret_expr;\n            ret_expr = new_expr;\n        }\n    }\n\n    ret_expr = const_propagation(ret_expr);\n\n    return ret_expr;\n}\n\n__isl_give TPExpr *simplify_chain_ops(__isl_take TPExpr *expr) {\n    TPExpr *ret_expr = expr;\n\n    if (ret_expr->func == \"mul\") {\n        if (ret_expr->ops[0]->func == \"div\" &&\n            (ret_expr->ops[0]->ops[1]->to_str() == ret_expr->ops[1]->to_str())) {\n            TPExpr *new_expr = 
ret_expr->ops[0]->ops[0]->dup();\n            delete ret_expr;\n            ret_expr = new_expr;\n        }\n    }\n\n    return ret_expr;\n}\n\n/* Simplify the expression. */\n__isl_give TPExpr *TPExpr::simplify() {\n    TPExpr *ret_expr = this;\n    /* Const propagation */\n    ret_expr = const_propagation(ret_expr);\n    /* Combine like terms */\n    ret_expr = combine_like_terms(ret_expr); \n    /* Simplify chain ops */\n    ret_expr = simplify_chain_ops(ret_expr);\n\n    return ret_expr;\n}\n\n/* Replace the expression that matches \"match\" with replace.\n */\n__isl_give TPExpr *TPExpr::replace(__isl_keep TPExpr *match, __isl_keep TPExpr *replace) {\n    if (this->to_str() == match->to_str()) {\n        /* Matched */\n        delete this;\n        return replace->dup();\n    } else {\n        if (this->func == \"literal\") {\n            return this;\n        } else if (this->func == \"floor\" || this->func == \"ceil\") {\n            this->ops[0] = this->ops[0]->replace(match, replace);        \n            return this;\n        } else if (this->func == \"div\" || this->func == \"add\" || this->func == \"mul\" || \n                   this->func == \"min\" || this->func == \"max\" || this->func == \"sub\") {\n            this->ops[0] = this->ops[0]->replace(match, replace);\n            this->ops[1] = this->ops[1]->replace(match, replace);\n            return this;\n        } else if (this->func == \"NULL\") {\n            return this;\n        } else {\n            std::cout << \"[AutoSA] Error: TPExpr::replace(): Unsupported TPExpr function type: \" << this->func << std::endl;\n            exit(1);\n        }\n    }\n}\n\nstd::string TPExpr::to_str() {\n    if (this->func == \"literal\") {\n        TPExpr *op = this->ops[0];        \n        if (dynamic_cast<TPParameter *>(op)) {            \n            return ((TPParameter *)(op))->name;\n        } else if (dynamic_cast<TPConst *>(op)) {            \n            return std::to_string(((TPConst 
*)(op))->val);\n        }\n    } else if (this->func == \"floor\") {        \n        std::string ret = \"floor(\";\n        ret += this->ops[0]->to_str();\n        ret += \")\";\n        return ret;\n    } else if (this->func == \"ceil\") {\n        std::string ret = \"ceil(\";\n        ret += this->ops[0]->to_str();\n        ret += \")\";\n        return ret;\n    } else if (this->func == \"div\") {\n        int single_op = 0;\n        std::string l = this->ops[0]->to_str();        \n        std::string r = this->ops[1]->to_str();\n        if (r == \"1\")\n            single_op = 1;            \n        std::string ret = \"\";\n        if (!single_op)\n            ret += \"(\";\n        ret += l;        \n        if (r != \"1\") {\n            ret += (\"/\" + r);\n        }\n        if (!single_op)\n            ret += \")\";        \n        return ret;\n    } else if (this->func == \"add\") {        \n        std::string l = this->ops[0]->to_str();        \n        std::string r = this->ops[1]->to_str();\n        std::string ret = \"(\" + l + \"+\" + r + \")\";\n        return ret;\n    } else if (this->func == \"sub\") {        \n        std::string l = this->ops[0]->to_str();        \n        std::string r = this->ops[1]->to_str();\n        std::string ret = \"(\" + l + \"-\" + r + \")\";\n        return ret;\n    } else if (this->func == \"mul\") {\n        int single_op = 0;        \n        std::string l = this->ops[0]->to_str();        \n        std::string r = this->ops[1]->to_str();\n        if (l == \"1\" || r == \"1\")\n            single_op = 1;\n        std::string ret = \"\";\n        if (!single_op)\n            ret += \"(\";\n        if (l != \"1\")\n            ret += l;\n        if (l != \"1\" && r != \"1\")\n            ret += \"*\";\n        if (r != \"1\")\n            ret += r;        \n        if (!single_op)\n            ret += \")\";\n        return ret;    \n    } else if (this->func == \"min\") {        \n        std::string l = 
this->ops[0]->to_str();        \n        std::string r = this->ops[1]->to_str();\n        std::string ret = \"min(\" + l + \",\" + r + \")\";\n        return ret;\n    } else if (this->func == \"max\") {        \n        std::string l = this->ops[0]->to_str();        \n        std::string r = this->ops[1]->to_str();\n        std::string ret = \"max(\" + l + \",\" + r + \")\";\n        return ret;\n    } else if (this->func == \"NULL\") {\n        return \"\";\n    } else {\n        std::cout << \"[AutoSA] Error: TPExpr::to_str(): Unsupported TPExpr function type: \" << this->func << std::endl;\n        exit(1);\n    }\n    return \"\";\n}\n\nstd::string TPParameter::to_str() {\n    return this->name;\n}\n\n__isl_give TPExpr *TPExpr::infer_bound(\n    std::unordered_map<std::string, TPExpr *> lbs, \n    std::unordered_map<std::string, TPExpr *> ubs,\n    std::unordered_set<std::string> ignore, int max)\n{    \n    if (this->func == \"literal\") {\n        TPExpr *op = this->ops[0];\n        if (dynamic_cast<TPParameter *>(op)) {          \n            TPParameter *param = (TPParameter *)(op);            \n            if (ignore.find(param->name) != ignore.end()) {\n                return new TPExpr(\"literal\", new TPConst(0));\n            } else if (lbs.find(param->name) != lbs.end() || ubs.find(param->name) != ubs.end()){\n                if (max == 1) {\n                    return (ubs[param->name]->dup())->subtract(new TPExpr(\"literal\", new TPConst(1)));\n                } else {                    \n                    return lbs[param->name]->dup();\n                }\n            } else {\n                return this->dup();\n            }\n        } else if (dynamic_cast<TPConst *>(op)) {                        \n            return this->dup();\n        }\n    } else if (this->func == \"floor\") {\n        std::cout << \"[AutoSA] Error: TPExpr::infer_bound(): Unsupported TPExpr function type: \" << this->func << std::endl;\n        exit(1);\n    } else if 
(this->func == \"ceil\") {\n        std::cout << \"[AutoSA] Error: TPExpr::infer_bound(): Unsupported TPExpr function type: \" << this->func << std::endl;\n        exit(1);\n    } else if (this->func == \"div\") {\n        std::cout << \"[AutoSA] Error: TPExpr::infer_bound(): Unsupported TPExpr function type: \" << this->func << std::endl;\n        exit(1);\n    } else if (this->func == \"add\") {\n        TPExpr *left, *right;\n        if (max == 1) {\n            left = this->ops[0]->infer_bound(lbs, ubs, ignore, 1);\n            right = this->ops[1]->infer_bound(lbs, ubs, ignore, 1);\n        } else {\n            left = this->ops[0]->infer_bound(lbs, ubs, ignore, 0);\n            right = this->ops[1]->infer_bound(lbs, ubs, ignore, 0);\n        }\n        if (left->to_str() == \"0\" && right->to_str() == \"0\") {\n            delete left;\n            delete right;\n            return new TPExpr(\"literal\", new TPConst(0));\n        } else if (left->to_str() == \"0\") {\n            delete left;\n            return right;\n        } else if (right->to_str() == \"0\") {\n            delete right;\n            return left;\n        } else {\n            return new TPExpr(\"add\", left, right);\n        }\n    } else if (this->func == \"mul\") {\n        TPExpr *left, *right;\n        if (max == 1) {\n            left = this->ops[0]->infer_bound(lbs, ubs, ignore, 1);\n            right = this->ops[1]->infer_bound(lbs, ubs, ignore, 1);\n        } else {\n            left = this->ops[0]->infer_bound(lbs, ubs, ignore, 0);\n            right = this->ops[1]->infer_bound(lbs, ubs, ignore, 0);\n        }\n        if (left->to_str() == \"0\" || right->to_str() == \"0\") {\n            delete left;\n            delete right;\n            return new TPExpr(\"literal\", new TPConst(0));\n        } else\n            return new TPExpr(\"mul\", left, right);\n    } else {\n        std::cout << \"[AutoSA] Error: TPExpr::infer_bound(): Unsupported TPExpr function type: \" << 
this->func << std::endl;\n        exit(1);\n    }\n    return NULL;\n}\n\nstd::string TPArrayRef::to_str() {\n    std::string ret = this->name;\n    for (auto index : this->index) {\n        ret += (\"[\" + index->to_str() + \"]\");                \n    }\n    return ret;\n}\n\nstatic __isl_give isl_schedule_node *extract_tuning_program_from_schedule(\n    __isl_take isl_schedule_node *node, void *user)\n{\n    if (!node)\n        return NULL;\n    \n    TuningProgram *prog = (TuningProgram *)user;\n\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band) \n    {        \n        int n = isl_schedule_node_band_n_member(node);        \n        for (int i = 0; i < n; i++) {            \n            /* We assume the loop bounds are independent and \n             * all the loops start from zero for now. \n             */                        \n            TPParameter *ub;            \n            if (prog->param_names.size() > 0) {\n                // Use the pre-assigned parameter names\n                ub = new TPParameter(prog->param_names[prog->params.size()], 0);\n                //std::cout << prog->params.size() << std::endl;\n                //std::cout << prog->param_names[prog->params.size()] << std::endl;\n                prog->param_names_cnt[ub->name] = 1;\n            } else {\n                ub = new TPParameter(\"p\" + std::to_string(prog->params.size()));\n            }\n            prog->params.push_back(ub);\n            prog->param_map[ub->name] = ub;\n            ub->tune = false;\n            ub->attr = \"loop_ub\";\n            ub->tags.insert(\"external\");\n\n            TPIterator *iter = new TPIterator(\n                \"c\" + std::to_string(prog->iters.size()),                \n                new TPExpr(\"literal\", new TPConst(0)),\n                new TPExpr(\"literal\", new TPParameter(ub)));            \n            // Assign the iterator to schedule dim                        \n            node = 
isl_schedule_node_band_member_set_iter(node, i, (void *)iter);            \n            prog->iters.push_back(iter);\n        }                \n    }\n\n    return node;\n}\n\n/* Initialize the tuning program from the schedule. \n * Each band dimension in the schedule is bound to an iterator variable so that it can be tracked.\n * All subsequent transformations on the band dimensions are also recorded by the tuning program.\n */\n__isl_give isl_schedule *TuningProgram::init_from_schedule(__isl_take isl_schedule *schedule) {\n    // Attach an iterator to each band dimension of the schedule tree.\n    // TODO: Add a legality check.\n    // Currently, we require all axes to be independent of each other, and the loop iterators\n    // should start from 0.\n    isl_schedule_node *root = isl_schedule_get_root(schedule);\n    root = isl_schedule_node_map_descendant_bottom_up(root, \n                                                      &extract_tuning_program_from_schedule, this);\n    isl_schedule_free(schedule);\n    schedule = isl_schedule_node_get_schedule(root);\n    isl_schedule_node_free(root);\n\n    return schedule;\n}\n\n/* Load the customized parameter names. */\nvoid TuningProgram::load_param_names(char *path) {\n    if (path == NULL)\n        return;\n    std::ifstream i(path);\n    json namings;\n    i >> namings;\n    std::string kernel_name = \"kernel\" + std::to_string(this->id);    \n    auto kernel_names = namings[kernel_name];    \n    for (std::string n : kernel_names) {\n        this->param_names.push_back(n);        \n    }    \n}\n\n/* Update the band iters after tiling. The \"node\" points to the tile band. 
\n * Div indicates if the tiling factors should be a divisor of the tiled loop.\n */\n__isl_give isl_schedule_node *TuningProgram::tile(__isl_take isl_schedule_node *node, int div, std::string step)\n{    \n    isl_schedule_node *tile_node = node;\n    isl_schedule_node *point_node = isl_schedule_node_child(isl_schedule_node_copy(node), 0);\n    int n = isl_schedule_node_band_n_member(point_node);\n    for (int i = 0; i < n; i++) {                \n        /* We assume all the loops start from zero for now. */\n        TPIterator *tile_iter = (TPIterator *)isl_schedule_node_band_member_get_iter(tile_node, i);\n        TPParameter *tile_ub = (TPParameter *)(tile_iter->ub->ops[0]);\n        /* Check if the parameter name is customized. \n         * If so, following the same naming fashion.\n         */\n        TPParameter *point_ub;\n        if (this->param_names.size() > 0) {\n            //std::cout << tile_ub->name_prefix << std::endl;\n            point_ub = new TPParameter(tile_ub->name_prefix, this->param_names_cnt[tile_ub->name_prefix]);\n            this->param_names_cnt[tile_ub->name_prefix] += 1;\n        } else {\n            point_ub = new TPParameter(\"p\" + std::to_string(this->params.size()));\n        }\n        point_ub->tune = true;\n        //point_ub->div = div;\n        point_ub->bounds.push_back(std::make_shared<TPExpr>(\"literal\", new TPConst(1)));        \n        this->param_map[tile_ub->to_str()]->split_by = point_ub;\n        point_ub->bounds.push_back(std::make_shared<TPExpr>(\"literal\", new TPParameter(tile_ub)));        \n        if (div) {\n            point_ub->divisors.push_back(std::make_shared<TPExpr>(\"literal\", new TPParameter(tile_ub)));\n        }\n        point_ub->attr = step + \"_tiling_factor\";\n        this->params.push_back(point_ub);\n        this->param_map[point_ub->name] = point_ub;\n                \n        // Update the loop bound\n        if (div == 0)\n            tile_iter->ub = 
(tile_iter->ub->div_by_param(new TPExpr(\"literal\", new TPParameter(point_ub))))->ceil();\n        else\n            tile_iter->ub = tile_iter->ub->div_by_param(new TPExpr(\"literal\", new TPParameter(point_ub)));\n\n        // Point loop                        \n        TPIterator *point_iter = new TPIterator(\n            \"c\" + std::to_string(this->iters.size()), \n            new TPExpr(\"literal\", new TPConst(0)), \n            new TPExpr(\"literal\", new TPParameter(point_ub)));\n        if (isl_schedule_node_band_member_get_space_time(point_node, i) == autosa_loop_space) {\n            //std::cout << \"iter space: \" << point_iter->name << std::endl;\n            point_iter->space_time = \"space\";\n        } else {\n            point_iter->space_time = \"time\";        \n        }\n        point_node = isl_schedule_node_band_member_set_iter(point_node, i, (void *)point_iter);\n        this->iters.push_back(point_iter);\n\n        // Update the array indices\n        this->update_tiled_arrays(tile_iter, point_iter, point_ub);\n    }\n\n    isl_schedule_node_free(tile_node);\n    node = isl_schedule_node_parent(point_node);    \n\n    return node;\n}\n\n/* Update the band iters after tiling. The \"node\" points to the tile band. \n * Dim \"pos\" in the band is tiled. 
Point band contains a single loop.\n */\n__isl_give isl_schedule_node *TuningProgram::tile(\n    __isl_take isl_schedule_node *node, int pos, int div, std::string step, std::unordered_set<std::string> tags, int bound)\n{    \n    isl_schedule_node *tile_node = node;\n    isl_schedule_node *point_node = isl_schedule_node_child(isl_schedule_node_copy(node), 0);    \n    TPIterator *tile_iter = (TPIterator *)isl_schedule_node_band_member_get_iter(tile_node, pos);\n    //std::cout << step << \" \" << tile_iter->name << \" \" << tile_iter->space_time << std::endl;\n    TPParameter *tile_ub = (TPParameter *)(tile_iter->ub->ops[0]);\n    //TPParameter *point_ub = new TPParameter(\"p\" + std::to_string(this->params.size()));\n    TPParameter *point_ub;\n    if (this->param_names.size() > 0) {\n        point_ub = new TPParameter(tile_ub->name_prefix, this->param_names_cnt[tile_ub->name_prefix]);\n        this->param_names_cnt[tile_ub->name_prefix] += 1;\n    } else {\n        point_ub = new TPParameter(\"p\" + std::to_string(this->params.size()));\n    }\n    point_ub->tune = true;\n    point_ub->bounds.push_back(std::make_shared<TPExpr>(\"literal\", new TPConst(1)));    \n    this->param_map[tile_ub->to_str()]->split_by = point_ub;\n    point_ub->bounds.push_back(std::make_shared<TPExpr>(\"literal\", new TPParameter(tile_ub)));    \n    if (step == \"SIMD\") {\n        point_ub->bounds[1] = std::shared_ptr<TPExpr>(point_ub->bounds[1]->dup()->min(new TPExpr(\"literal\", new TPConst(bound))));\n    }\n\n    point_ub->attr = step + \"_tiling_factor\";\n    for (auto tag : tags) {\n        point_ub->tags.insert(tag);\n    }\n\n    if (div) \n        point_ub->divisors.push_back(std::make_shared<TPExpr>(\"literal\", new TPParameter(tile_ub)));\n    this->params.push_back(point_ub);\n    this->param_map[point_ub->name] = point_ub;\n            \n    // Update the loop bound\n    if (div == 0)\n        tile_iter->ub = (tile_iter->ub->div_by_param(new TPExpr(\"literal\", new 
TPParameter(point_ub))))->ceil();\n    else\n        tile_iter->ub = tile_iter->ub->div_by_param(new TPExpr(\"literal\", new TPParameter(point_ub)));\n\n    // Point loop                        \n    TPIterator *point_iter = new TPIterator(\n        \"c\" + std::to_string(this->iters.size()), \n        new TPExpr(\"literal\", new TPConst(0)), \n        new TPExpr(\"literal\", new TPParameter(point_ub)));    \n    if (isl_schedule_node_band_member_get_space_time(point_node, 0) == autosa_loop_space) {\n        point_iter->space_time = \"space\";\n    } else {\n        point_iter->space_time = \"time\";        \n    }        \n    point_node = isl_schedule_node_band_member_set_iter(point_node, 0, (void *)point_iter);\n    this->iters.push_back(point_iter);    \n\n    isl_schedule_node_free(tile_node);\n    node = isl_schedule_node_parent(point_node);\n\n    // Update the array indices\n    this->update_tiled_arrays(tile_iter, point_iter, point_ub);\n\n    return node;\n}\n\n/* Dump out the tuning program information to a JSON file. 
\n */\nvoid TuningProgram::dump(std::string dir)\n{\n    json j;\n    // params\n    json j_params;\n    for (int i = 0; i < this->params.size(); i++) {\n        json j_param;\n        TPParameter *param = this->params[i];\n        j_param[\"name\"] = param->name;       \n        if (param->split_by)  {\n            j_param[\"split_by\"] = param->split_by->to_str();\n        }\n        for (auto d : param->divisors) {\n            j_param[\"divisors\"].push_back(d->to_str());\n        }        \n        for (auto m : param->multiples) {\n            j_param[\"multiples\"].push_back(m->to_str());\n        }\n        j_param[\"tunable\"] = param->tune;\n        j_param[\"attr\"] = param->attr;    \n        if (param->bounds.size() > 0)\n            j_param[\"bounds\"] = {param->bounds[0]->to_str(), param->bounds[1]->to_str()};        \n        for (auto tag : param->tags) {\n            j_param[\"tags\"].push_back(tag);\n        }\n        j_params.push_back(j_param);\n    }\n    j[\"params\"] = j_params;\n\n    // loop struct - latency    \n    for (auto x: this->module_loop_info) {\n        //std::cout << x.first << std::endl;\n        j[\"latency\"][x.first] = *x.second;\n    }\n    \n    // design stats - resource\n    for (auto x: this->module_memory_info) {\n        j[\"memory\"][x.first] = *x.second;\n    }\n    for (auto x: this->module_compute_info) {        \n        j[\"compute\"][x.first] = *x.second;\n    }\n    for (auto x: this->module_io_info) {\n        j[\"io\"][x.first] = *x.second;\n    }\n\n    for (auto x: this->module_attr) {\n        j[\"attr\"][x.first] = *x.second;\n    }\n\n    std::string file_name = dir + \"/kernel\" + std::to_string(this->id);\n    if (this->id2 >= 0) {\n        file_name += \"_\";        \n        file_name += std::to_string(this->id2);\n    }\n    std::ofstream o(file_name + \".json\");\n    o << std::setw(4) << j << std::endl;\n    o.close();    \n\n    return;\n}\n\n/* Break all band nodes into single-member bands, and add a 
mark node containing the \n * corresponding TPIterator pointer.\n */\nstatic __isl_give isl_schedule_node *modify_tuning_schedule(\n    __isl_take isl_schedule_node *node, void *user)\n{\n    if (!node)\n        return NULL;\n\n    TuningProgram *program = (TuningProgram *)user;\n    isl_ctx *ctx = isl_schedule_node_get_ctx(node);    \n\n    if (isl_schedule_node_get_type(node) == isl_schedule_node_band) {\n        int n = isl_schedule_node_band_n_member(node);\n        for (int i = n - 1; i >= 0; i--) {\n            if (i > 0) {\n                node = isl_schedule_node_band_split(node, i);\n                node = isl_schedule_node_child(node, 0);\n            }\n            TPIterator *iter = (TPIterator *)isl_schedule_node_band_member_get_iter(node, 0);\n            //if (iter) {\n            //    std::cout << iter->name << std::endl;\n            //    std::cout << iter->space_time << std::endl;\n            //}\n            if (iter) {\n                isl_id *id = isl_id_alloc(ctx, \"iter_info\", iter);\n                /* Insert it under the current band node. 
*/\n                node = isl_schedule_node_child(node, 0);\n                node = isl_schedule_node_insert_mark(node, id);\n                node = isl_schedule_node_parent(node); // band node\n            }\n            if (i > 0) {\n                node = isl_schedule_node_parent(node);\n            }\n        }\n        //node = isl_schedule_node_parent(node);\n    }\n\n    return node;\n}\n\n/* This function generates a new schedule used for performance estimation.\n * Specifically, all band nodes are split into single-member bands, and a new mark node is inserted below \n * each band, carrying the detailed information of the loop iterator.\n */\n__isl_give isl_schedule *TuningProgram::generate_tuning_schedule(__isl_take isl_schedule *schedule) {\n    isl_schedule *new_schedule = isl_schedule_dup(schedule);\n    isl_schedule_free(schedule);    \n\n    isl_schedule_node *root = isl_schedule_get_root(new_schedule);    \n    root = isl_schedule_node_map_descendant_bottom_up(root,\n                                                      &modify_tuning_schedule, this);\n\n    isl_schedule_free(new_schedule);\n    new_schedule = isl_schedule_node_get_schedule(root);\n    isl_schedule_node_free(root);    \n    \n    return new_schedule;\n}\n\nstd::shared_ptr<json> extract_isl_ast_node_user(__isl_keep isl_ast_node *node)\n{\n    isl_ctx *ctx = isl_ast_node_get_ctx(node);\n    isl_ast_expr *expr = isl_ast_node_user_get_expr(node);\n    isl_printer *p_str = isl_printer_to_str(ctx);\n    p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);\n    p_str = isl_printer_print_ast_expr(p_str, expr);\n    char *user_expr = isl_printer_get_str(p_str);\n    isl_printer_free(p_str);\n\n    std::shared_ptr<json> info = std::make_shared<json>();\n    std::string user_expr_str(user_expr);\n    (*info)[\"user_expr\"] = user_expr_str;\n\n    free(user_expr);\n    isl_ast_expr_free(expr);\n\n    return info;\n}\n\nstruct extract_loop_info_data {\n    int 
after_for;\n};\n\nstd::shared_ptr<json> extract_loop_info(__isl_keep isl_ast_node *node, void *user)\n{    \n    std::shared_ptr<json> j_info;\n    enum isl_ast_node_type type;    \n    isl_ctx *ctx = isl_ast_node_get_ctx(node);\n    type = isl_ast_node_get_type(node);\n    struct extract_loop_info_data *data = (struct extract_loop_info_data *)user;\n\n    switch(type) {\n        case isl_ast_node_for:\n        {         \n            data->after_for = 1;            \n            isl_ast_node *child;\n            child = isl_ast_node_for_get_body(node);\n            std::shared_ptr<json> j_child = extract_loop_info(child, user);\n            isl_ast_node_free(child);\n            j_info = j_child;            \n\n            break;\n        }\n        case isl_ast_node_block:\n        {\n            data->after_for = 0;\n            /* Extract the block information and insert it into the loop structure. */\n            j_info = std::make_shared<json>();\n            *j_info = {{\"type\", \"block\"}, {\"child\", {}}};\n            isl_ast_node_list *child_list = isl_ast_node_block_get_children(node);\n            int n_child = isl_ast_node_list_n_ast_node(child_list);\n            for (int i = 0; i < n_child; i++) {\n                isl_ast_node *child = isl_ast_node_list_get_ast_node(child_list, i);\n                std::shared_ptr<json> j_child = extract_loop_info(child, user);\n                isl_ast_node_free(child);\n                (*j_info)[\"child\"].push_back(*j_child);\n            }\n            isl_ast_node_list_free(child_list);            \n            break;\n        }\n        case isl_ast_node_user:\n        {\n            data->after_for = 0;\n            /* Record the user statement expression. 
*/\n            j_info = std::make_shared<json>();\n            std::shared_ptr<json> j_user = extract_isl_ast_node_user(node);\n            *j_info = {{\"type\", \"user\"}, {\"child\", *j_user}};            \n            break;\n        }\n        case isl_ast_node_if: \n        {\n            data->after_for = 0;\n            j_info = std::make_shared<json>();\n            *j_info = {{\"type\", \"if\"}, {\"child\", {}}};\n            isl_ast_node *then_child, *else_child;\n            then_child = isl_ast_node_if_get_then_node(node);\n            std::shared_ptr<json> j_then = extract_loop_info(then_child, user);\n            isl_ast_node_free(then_child);\n            (*j_info)[\"child\"].push_back(*j_then);\n\n            else_child = isl_ast_node_if_get_else_node(node);\n            if (else_child) {\n                std::shared_ptr<json> j_else = extract_loop_info(else_child, user);\n                isl_ast_node_free(else_child);\n                (*j_info)[\"child\"].push_back(*j_else);\n            }            \n            break;\n        }\n        case isl_ast_node_mark: \n        {            \n            isl_id *id = isl_ast_node_mark_get_id(node);                        \n            TPIterator *iter = NULL;\n            if (!strcmp(isl_id_get_name(id), \"iter_info\")) {\n                if (data->after_for == 1) {\n                    /* For loop */                \n                    isl_ast_node *child = isl_ast_node_mark_get_node(node);\n                    data->after_for = 0;\n                    std::shared_ptr<json> j_child = extract_loop_info(child, user);\n                    isl_ast_node_free(child);\n                    iter = (TPIterator *)isl_id_get_user(id);\n                    if (iter) {\n                        j_info = std::make_shared<json>();\n                        *j_info = {{\"type\", \"for\"}, {\"iterator\", iter->name}};\n                        (*j_info)[\"bounds\"].push_back(iter->lb->to_str());                \n        
                (*j_info)[\"bounds\"].push_back(iter->ub->to_str());\n                        (*j_info)[\"child\"] = *j_child;\n                    } else {\n                        j_info = j_child;\n                    }  \n                } else {\n                    /* Skip this one */\n                    isl_ast_node *child = isl_ast_node_mark_get_node(node);\n                    std::shared_ptr<json> j_child = extract_loop_info(child, user);\n                    isl_ast_node_free(child);\n                    j_info = j_child;\n                }                             \n            } else if (!strcmp(isl_id_get_name(id), \"tuning_array_tile\")) {\n                data->after_for = 0;\n                /* Print the array information */\n                TPArrayTile *tile = (TPArrayTile *)isl_id_get_user(id);\n                j_info = std::make_shared<json>();\n                *j_info = {{\"type\", \"array_tile\"}, {\"data_pack_factor\", tile->data_pack_factor_inter->name}};\n                std::string size = \"\";\n                int is_first = 1;\n                for (auto s : tile->sizes) {\n                    if (!is_first)\n                        size += \"*\";\n                    size += s->to_str();\n                    is_first = 0;\n                }\n                (*j_info)[\"size\"] = size;\n                (*j_info)[\"ele_size\"] = tile->ele_size;\n                (*j_info)[\"last_dim\"] = tile->sizes[tile->sizes.size() - 1]->to_str();\n            } else {\n                std::string mark_content(isl_id_get_name(id));\n                j_info = std::make_shared<json>();\n                *j_info = {{\"type\", \"mark\"}, {\"content\", mark_content}};\n                isl_ast_node *child = isl_ast_node_mark_get_node(node);\n                data->after_for = 0;\n                std::shared_ptr<json> j_child = extract_loop_info(child, user);\n                isl_ast_node_free(child);                \n                (*j_info)[\"child\"] = 
*j_child;\n            }\n            isl_id_free(id);                        \n\n            break;\n        }\n        default:\n        {\n            data->after_for = 0;\n            break;\n        }\n    }\n\n    return j_info;\n}\n\n/* Extract the loop structure from the \"ast\", used for latency estimation.\n * TODO: Extract the hw information for resource estimation. \n */\nvoid TuningProgram::extract_module_loop_info(std::string name, std::vector<isl_ast_node *> &ast) \n{\n    if (ast.size() == 0)\n        return;\n            \n    if (ast.size() == 1) {\n        std::shared_ptr<json> j_loop;    \n        struct extract_loop_info_data data = {0};\n        j_loop = extract_loop_info(ast[0], &data);\n        this->module_loop_info[name] = j_loop;\n    } else if (ast.size() == 3) {        \n        // outer module\n        std::shared_ptr<json> j_loop1;\n        struct extract_loop_info_data data = {0};\n        j_loop1 = extract_loop_info(ast[0], &data);\n        this->module_loop_info[name] = j_loop1;\n        // intra module\n        std::shared_ptr<json> j_loop2;\n        data.after_for = 0;\n        j_loop2 = extract_loop_info(ast[1], &data);\n        this->module_loop_info[name + \"_intra\"] = j_loop2;\n        // inter module\n        std::shared_ptr<json> j_loop3;\n        data.after_for = 0;\n        j_loop3 = extract_loop_info(ast[2], &data);\n        this->module_loop_info[name + \"_inter\"] = j_loop3;\n    }\n\n    return;\n}\n\nvoid TuningProgram::extract_module_attr(\n    std::string name, int double_buffer, int in, int io, int to_dram, int serialize, int to_pe, int filter) {\n    std::shared_ptr<json> j = std::make_shared<json>();    \n    (*j)[\"double_buffer\"] = double_buffer;\n    (*j)[\"in\"] = in;\n    (*j)[\"io\"] = io;\n    (*j)[\"to_dram\"] = to_dram;\n    (*j)[\"serialize\"] = serialize;\n    (*j)[\"to_pe\"] = to_pe;\n    (*j)[\"filter\"] = filter;\n\n    this->module_attr[name] = j;\n\n    return;\n}\n\nstruct 
build_dim_iter_map_data {\n    isl_map *ref;\n    isl_map *new_ref;\n    std::unordered_map<int, TPIterator *> dim_iter_map;  \n    TPExpr *dim_expr;\n    int done;\n};\n\n/* Test if the partial schedule above the \"node\" matches the \"domain\".\n * If so, climb the schedule tree and update the mapping between the schedule dimension and the \n * TPIterator.\n */\n__isl_give isl_schedule_node *build_dim_iter_map(__isl_take isl_schedule_node *node, void *user)\n{    \n    struct build_dim_iter_map_data *data = (struct build_dim_iter_map_data *)user;\n    if (data->done)\n        return node;\n\n    isl_union_set *domain = isl_schedule_node_get_domain(node);\n    isl_union_set *ref_domain = isl_union_set_from_set(isl_map_domain(isl_map_copy(data->ref)));\n    if (!isl_union_set_is_empty(domain) && isl_union_set_is_strict_subset(domain, ref_domain)) {                \n        isl_union_map *prefix = isl_schedule_node_get_prefix_schedule_relation(node);\n        data->new_ref = isl_map_from_union_map(isl_union_map_apply_domain(\n            isl_union_map_from_map(isl_map_copy(data->ref)), prefix));            \n        data->done = 1; \n        isl_schedule_node *new_node = isl_schedule_node_copy(node);\n        while (isl_schedule_node_has_parent(new_node)) {\n            if (isl_schedule_node_get_type(new_node) == isl_schedule_node_band) {\n                isl_set *new_prefix_sched_domain = \n                    isl_set_from_union_set(isl_union_map_range(isl_schedule_node_get_prefix_schedule_relation(new_node)));\n  \n                int n = isl_schedule_node_band_n_member(new_node);\n                for (int i = 0; i < n; i++) {\n                    TPIterator *iter = (TPIterator *)isl_schedule_node_band_member_get_iter(new_node, i);\n                    if (iter) {                        \n                        data->dim_iter_map[isl_set_dim(new_prefix_sched_domain, isl_dim_set) + i] = iter;\n                    }\n                }\n                
isl_set_free(new_prefix_sched_domain);\n            }\n            new_node = isl_schedule_node_parent(new_node);        \n        }\n        isl_schedule_node_free(new_node);\n    }\n    isl_union_set_free(domain);\n    isl_union_set_free(ref_domain);  \n\n    return node;\n}\n\nisl_stat extract_dim_expr(__isl_take isl_basic_map *bmap, void *user) \n{\n    struct build_dim_iter_map_data *data = (struct build_dim_iter_map_data *)user;    \n    isl_mat *cst_mat = isl_basic_map_equalities_matrix(\n        bmap, isl_dim_in, isl_dim_param, isl_dim_cst, isl_dim_div, isl_dim_out\n    );        \n    assert(isl_basic_map_dim(bmap, isl_dim_param) == 0);\n    assert(isl_basic_map_dim(bmap, isl_dim_div) == 0);\n    for (int r = 0; r < isl_mat_rows(cst_mat); r++) {\n        isl_val *val = isl_mat_get_element_val(cst_mat, r, \n            isl_basic_map_dim(bmap, isl_dim_in) + isl_basic_map_dim(bmap, isl_dim_param)\n            + isl_basic_map_dim(bmap, isl_dim_cst) + isl_basic_map_dim(bmap, isl_dim_div));\n        int val_i = isl_val_get_num_si(val);\n        isl_val_free(val);\n        if (val_i != 1) {\n            continue;\n        }\n        for (int i = 0; i < isl_basic_map_dim(bmap, isl_dim_in); i++) {\n            isl_val *val = isl_mat_get_element_val(cst_mat, r, i);\n            int val_i = isl_val_get_num_si(val);        \n            if (val_i != 0) {\n                auto it = data->dim_iter_map.find(i);\n                if (it != data->dim_iter_map.end()) {\n                    TPIterator *iter = data->dim_iter_map[i];                    \n                    TPExpr *expr = new TPExpr(\n                        \"mul\", \n                        new TPExpr(\"literal\", new TPConst(val_i * (-1))), \n                        new TPExpr(\"literal\", new TPParameter(iter->name))\n                    );                    \n                    data->dim_expr = data->dim_expr->add(expr);                    \n                }\n            }\n\n            
isl_val_free(val);\n        }\n        for (int i = 0; i < isl_basic_map_dim(bmap, isl_dim_cst); i++) {            \n            isl_val *val = isl_mat_get_element_val(cst_mat, r, isl_basic_map_dim(bmap, isl_dim_in) + i);\n            int val_i = isl_val_get_num_si(val);\n            if (val_i != 0) \n                data->dim_expr = data->dim_expr->add(new TPExpr(\"literal\", new TPConst(val_i * (-1))));            \n            isl_val_free(val);\n        }\n    }\n\n    isl_mat_free(cst_mat);\n    isl_basic_map_free(bmap);\n\n    return isl_stat_ok;\n}\n\nstd::shared_ptr<TPArrayRef> TuningProgram::build_array_ref(\n    std::string name, __isl_keep isl_map *ref, __isl_keep isl_schedule *schedule)\n{\n    // Step 1: Build the mapping from the schedule dims to the loop iterators\n    // i0 -> c0\n    // i1 -> c1\n    // i2 -> c2    \n    struct build_dim_iter_map_data data;\n    data.ref = ref;\n    // Stays NULL if no schedule node matches the access domain; isl calls below are NULL-safe\n    data.new_ref = NULL;\n    data.done = 0;\n    isl_schedule_node *root = isl_schedule_get_root(schedule);\n    root = isl_schedule_node_map_descendant_bottom_up(root, &build_dim_iter_map, &data);    \n    isl_schedule_node_free(root);\n    \n    // Step 2: Parse the access map to build the array reference\n    // [i0, i1, i2, 1] -> A[i0, i2];\n    // class array_ref\n    // {\n    //   std::string name; // A\n    //   std::vector<TPExpr *> index; // [i0, i2]\n    // }\n    auto tp_ref = std::make_shared<TPArrayRef>();\n    tp_ref->name = name;\n    int dim = isl_map_dim(ref, isl_dim_out);\n    for (int i = 0; i < dim; i++) {\n        // Project all the other output dims\n        isl_map *ref_dim = isl_map_project_out(isl_map_copy(data.new_ref), isl_dim_out, 0, i);\n        ref_dim = isl_map_project_out(ref_dim, isl_dim_out, 1, dim - i - 1);\n        TPExpr *dim_expr = new TPExpr();\n        data.dim_expr = dim_expr;\n        isl_map_foreach_basic_map(ref_dim, &extract_dim_expr, &data);\n        isl_map_free(ref_dim);\n        tp_ref->index.push_back(data.dim_expr);        \n    }\n    
isl_map_free(data.new_ref);        \n\n    return tp_ref;\n}\n\n/* Update the array indices after tiling. \n * Find the original parameter with the name as \"tile_iter\", replace it with a new expression\n * tile_iter * tile_factor + point_iter\n */\nvoid TuningProgram::update_tiled_arrays(TPIterator *tile_iter, TPIterator *point_iter, TPParameter *tile_factor)\n{    \n    for (int i = 0; i < this->arrays.size(); i++) {\n        TPArray *arr = this->arrays[i];\n        for (int j = 0; j < arr->refs.size(); j++) {\n            TPArrayRef *ref = arr->refs[j].get();     \n            for (int n = 0; n < ref->index.size(); n++) {\n                TPExpr *old_expr = new TPExpr(\"literal\", new TPParameter(tile_iter->name));\n                TPExpr *new_expr = new TPExpr(\"literal\", new TPParameter(tile_iter->name));\n                new_expr = (new_expr->mul(new TPExpr(\"literal\", new TPParameter(tile_factor))))\n                            ->add(new TPExpr(\"literal\", new TPParameter(point_iter->name)));\n                ref->index[n] = ref->index[n]->replace(old_expr, new_expr);\n                delete old_expr;\n                delete new_expr;\n            }            \n        }\n    }    \n}\n\nstd::vector<TPExpr *> TuningProgram::infer_tiled_array_bound_at_dim(int dim, std::vector<std::shared_ptr<TPArrayRef>> refs, std::vector<TPIterator *> fixed_iters)\n{\n    TPExpr *lb = new TPExpr();\n    TPExpr *ub = new TPExpr();\n    std::unordered_map<std::string, TPExpr *> iter_ubs;\n    for (auto iter : this->iters) {        \n        iter_ubs[iter->name] = iter->ub;\n    }\n    std::unordered_map<std::string, TPExpr *> iter_lbs;\n    for (auto iter : this->iters) {        \n        iter_lbs[iter->name] = iter->lb;\n    }\n    std::unordered_set<std::string> ignore_iters;\n    for (auto iter : fixed_iters) {        \n        ignore_iters.insert(iter->name);\n    }\n    for (auto ref : refs) {\n        TPExpr *index = ref->index[dim];        \n        TPExpr 
*local_lb = index->infer_bound(iter_lbs, iter_ubs, ignore_iters, 0);        \n        TPExpr *local_ub = index->infer_bound(iter_lbs, iter_ubs, ignore_iters, 1);\n        lb = lb->min(local_lb);\n        ub = ub->max(local_ub);\n    }    \n    TPExpr *size = (ub->subtract(lb->dup()))->add(new TPExpr(\"literal\", new TPConst(1)));    \n    size = size->simplify();\n    std::vector<TPExpr *> ret = {lb, size};\n\n    return ret;\n}\n\n/* Given the fixed iterators, infer the maximal bounds of the tiled array accessed by \"refs\".\n * Store the per-dimension lower bounds and sizes in \"tile\" and return it.\n */\nTPArrayTile *TuningProgram::infer_tiled_array_bounds(TPArrayTile *tile, std::vector<std::shared_ptr<TPArrayRef>> refs, std::vector<TPIterator *> fixed_iters)\n{        \n    std::vector<TPExpr *> lbs;\n    std::vector<TPExpr *> sizes;\n    int dim = refs[0]->index.size();\n    for (int i = 0; i < dim; i++) {\n        std::vector<TPExpr *> ret = this->infer_tiled_array_bound_at_dim(i, refs, fixed_iters);\n        lbs.push_back(ret[0]);\n        sizes.push_back(ret[1]);        \n    }    \n\n    tile->lbs = lbs;\n    tile->sizes = sizes;\n\n    return tile;\n}\n\nstd::shared_ptr<TPExpr> TPArrayTile::compute_size() {\n    TPExpr *size = new TPExpr();\n    for (auto s : this->sizes) {\n        size = size->mul(s->dup());\n    }\n    return std::shared_ptr<TPExpr>(size);\n}\n\nstd::shared_ptr<TPExpr> TPIterator::compute_size() {\n    TPExpr *size = this->ub->dup();\n    size = size->subtract(this->lb->dup());    \n    return std::shared_ptr<TPExpr>(size);\n}\n\nstruct mul_space_dim_data {    \n    TPExpr *num;\n    int after_for;\n};\n\nisl_bool mul_space_dim(__isl_keep isl_ast_node *node, void *user) {\n    struct mul_space_dim_data *data = (struct mul_space_dim_data *)user;\n    if (isl_ast_node_get_type(node) == isl_ast_node_for) {\n        data->after_for = 1;        \n    } else if (isl_ast_node_get_type(node) == isl_ast_node_mark) {\n        isl_id *id = isl_ast_node_mark_get_id(node);\n        if 
(!strcmp(isl_id_get_name(id), \"iter_info\") and data->after_for) {\n            TPIterator *iter = (TPIterator *)isl_id_get_user(id);                        \n            if (iter && iter->space_time == \"space\") {\n                data->num = data->num->mul(iter->compute_size().get()->dup());\n            }\n        }\n        isl_id_free(id);\n        data->after_for = 0;\n    } else {\n        data->after_for = 0;\n    }\n    return isl_bool_true;\n}\n\nstd::shared_ptr<TPExpr> TuningProgram::extract_module_num(isl_ast_node *tree)\n{\n    TPExpr *num = new TPExpr(\"literal\", new TPConst(1));\n    struct mul_space_dim_data data;\n    data.num = num;    \n    data.after_for = 0;\n    isl_ast_node_foreach_descendant_top_down(tree, &mul_space_dim, &data);\n    return std::shared_ptr<TPExpr>(data.num);\n}\n\nstruct extract_space_dim_data {    \n    std::vector<std::shared_ptr<TPExpr>> dims;\n    int after_for;\n    int after_array;\n    int io_level;\n};\n\nisl_bool extract_space_dim(__isl_keep isl_ast_node *node, void *user) {\n    struct extract_space_dim_data *data = (struct extract_space_dim_data *)user;\n    if (isl_ast_node_get_type(node) == isl_ast_node_for) {\n        data->after_for = 1;\n    } else if (isl_ast_node_get_type(node) == isl_ast_node_mark) {\n        isl_id *id = isl_ast_node_mark_get_id(node);\n        if (!strcmp(isl_id_get_name(id), \"iter_info\") and data->after_for) {\n            TPIterator *iter = (TPIterator *)isl_id_get_user(id);                        \n            if (iter && iter->space_time == \"space\") {\n                data->dims.push_back(std::shared_ptr<TPExpr>(iter->compute_size().get()->dup()));                \n            }\n        }\n        isl_id_free(id);\n        data->after_for = 0;\n    } else {\n        data->after_for = 0;\n    }\n    return isl_bool_true;\n}\n\nstd::vector<std::shared_ptr<TPExpr>> TuningProgram::extract_module_dims(isl_ast_node *tree)\n{\n    struct extract_space_dim_data data;\n    
data.after_for = 0;\n    isl_ast_node_foreach_descendant_top_down(tree, &extract_space_dim, &data);\n    return data.dims;\n}\n\nisl_bool extract_space_dim_io(__isl_keep isl_ast_node *node, void *user) {\n    /* Stop at the io_mark \"io_level\" */\n    struct extract_space_dim_data *data = (struct extract_space_dim_data *)user;    \n    if (isl_ast_node_get_type(node) == isl_ast_node_mark) {\n        isl_id *id = isl_ast_node_mark_get_id(node);\n        if (!strcmp(isl_id_get_name(id), \"iter_info\")) {            \n            TPIterator *iter = (TPIterator *)isl_id_get_user(id);                        \n            if (iter && (data->after_array || iter->space_time == \"space\")) {                                \n                data->dims.push_back(std::shared_ptr<TPExpr>(iter->compute_size().get()->dup()));                \n            }\n        }\n        char io_mark[20];\n        sprintf(io_mark, \"io_L%d\", data->io_level);        \n        if (!strcmp(isl_id_get_name(id), io_mark)) {\n            isl_id_free(id);                          \n            return isl_bool_false;\n        }        \n        if (!strcmp(isl_id_get_name(id), \"array\")) {\n            data->after_array = 1;\n        }        \n        isl_id_free(id);\n    }    \n    return isl_bool_true;\n}\n\nstd::vector<std::shared_ptr<TPExpr>> TuningProgram::extract_module_dims_io(isl_ast_node *tree, int io_level)\n{    \n    struct extract_space_dim_data data;    \n    data.after_for = 0;\n    data.after_array = 0;\n    data.io_level = io_level;    \n    isl_ast_node_foreach_descendant_top_down(tree, &extract_space_dim_io, &data);\n    return data.dims;\n}\n\nvoid TuningProgram::extract_module_memory_info(std::string name, int double_buffer, TPArrayTile *tile, \n    std::vector<isl_ast_node *> &asts)\n{\n    auto j_memory = std::make_shared<json>();\n    // Extract number of modules, double buffer, ele_type, ele_size, buffer_size, data_pack_factor\n    (*j_memory)[\"double_buffer\"] = 
double_buffer;\n    (*j_memory)[\"array\"] = tile->name;\n    (*j_memory)[\"ele_type\"] = tile->type;\n    (*j_memory)[\"ele_size\"] = tile->ele_size;    \n    (*j_memory)[\"buf_size\"] = tile->compute_size()->to_str();\n    if (tile->data_pack_factor_inter)\n        (*j_memory)[\"data_pack_factor_inter\"] = tile->data_pack_factor_inter->to_str();\n    if (tile->data_pack_factor_intra)\n        (*j_memory)[\"data_pack_factor_intra\"] = tile->data_pack_factor_intra->to_str();\n    TPExpr *num = new TPExpr(\"literal\", new TPConst(1));\n    for (isl_ast_node *ast : asts) {\n        num = num->mul(this->extract_module_num(ast).get()->dup());\n    }\n    (*j_memory)[\"num\"] = num->to_str();\n    delete num;\n    this->module_memory_info[name] = j_memory;\n}\n\nvoid TuningProgram::extract_module_compute_info(std::string name, std::string arr_type, isl_ast_node *tree)\n{\n    auto j_compute = std::make_shared<json>();\n    // Extract number of modules, unroll factor, array type\n    for (auto p : this->params) {\n        if (p->attr == \"SIMD_tiling_factor\")\n            (*j_compute)[\"unroll_factor\"] = p->name;\n    }\n    (*j_compute)[\"ele_type\"] = arr_type;\n    std::shared_ptr<TPExpr> num = this->extract_module_num(tree);    \n    (*j_compute)[\"num\"] = num->to_str();\n    std::vector<std::shared_ptr<TPExpr>> dims = this->extract_module_dims(tree);\n    for (auto dim : dims)\n        (*j_compute)[\"dims\"].push_back(dim->to_str());\n    \n    this->module_compute_info[name] = j_compute;\n}\n\nvoid TuningProgram::extract_module_io_info(std::string name, int io_level, std::vector<isl_ast_node *> &asts)\n{\n    auto j_io = std::make_shared<json>();\n    // Extract dims of io modules\n    for (isl_ast_node *ast : asts) {\n        std::vector<std::shared_ptr<TPExpr>> dims = this->extract_module_dims_io(ast, io_level);\n        for (auto dim : dims)\n            (*j_io)[\"dims\"].push_back(dim->to_str());\n    }\n    if ((*j_io)[\"dims\"].size() == 0) {\n        
TPExpr *num = new TPExpr(\"literal\", new TPConst(1));\n        (*j_io)[\"dims\"].push_back(num->to_str());\n        delete num;\n    }\n\n\n    this->module_io_info[name] = j_io;\n}"
  },
  {
    "path": "src/autosa_tuning.h",
    "content": "#ifndef _AUTOSA_TUNING_H\n#define _AUTOSA_TUNING_H\n\n#include <isl/schedule.h>\n#include <isl/schedule_node.h>\n#include <isl/constraint.h>\n\n#include <string>\n#include <vector>\n#include <unordered_map>\n#include <unordered_set>\n\n#include \"json.hpp\"\n#include \"autosa_utils.h\"\n\nusing json = nlohmann::json;\n\n#if defined(__cplusplus)\nextern \"C\" {\n#endif    \n\n//class TPTransformHistory {\n//    public:\n//        TPTransformHistory(){}\n//};\n\n//class TPStatement {\n//    public:         \n//};\n\nclass TPExpr {\n    public:\n        TPExpr() {func = \"NULL\";}\n        TPExpr(std::string f, TPExpr *op) {\n            func = f;\n            ops.push_back(op);\n        }\n        TPExpr(std::string f, TPExpr *op1, TPExpr *op2) {\n            func = f;\n            ops.push_back(op1);\n            ops.push_back(op2);\n        }\n\n        TPExpr *div_by_param(TPExpr *divisor);\n        TPExpr *ceil();\n        TPExpr *add(TPExpr *expr);        \n        TPExpr *mul(TPExpr *expr);\n        TPExpr *subtract(TPExpr *expr); // TODO\n        TPExpr *min(TPExpr *expr);\n        TPExpr *max(TPExpr *expr);\n\n        TPExpr *infer_bound(\n            std::unordered_map<std::string, TPExpr *> lbs, \n            std::unordered_map<std::string, TPExpr *> ubs,\n            std::unordered_set<std::string> ignore, int max);\n        TPExpr *simplify();\n        TPExpr *replace(TPExpr *match, TPExpr *replace);\n        TPExpr *dup();\n        virtual std::string to_str();\n        \n        std::string func; // [floor, ceil, div, literal, mul, null, min, max, sub, add]\n        std::vector<TPExpr *> ops;        \n        \n        virtual ~TPExpr() {            \n            for (int i = 0; i < ops.size(); i++) {                \n                delete ops[i];\n            }            \n        }\n};\n\nclass TPIterator {\n    public:\n        TPIterator(){}\n        TPIterator(std::string n, TPExpr *l, TPExpr *u) {\n            name = n;\n         
   lb = l;\n            ub = u;\n        }\n        std::shared_ptr<TPExpr> compute_size();\n        std::string name;\n        TPExpr *lb;\n        TPExpr *ub;     \n        std::string space_time;\n        ~TPIterator() {\n            delete lb;\n            delete ub;\n        }\n};\n\n/* Tunable parameters by the tuner. */\nclass TPParameter: public TPExpr {\n    public:\n        TPParameter() {}\n        TPParameter(std::string n) {\n            name = n;\n            type = \"param\";        \n            tune = false;\n            split_by = NULL;\n        }\n        TPParameter(std::string n_prefix, int cnt) {\n            if (cnt == 0) {\n                name = n_prefix;\n            } else {\n                /* Tiling factors. */\n                name = n_prefix + \"_t\" + std::to_string(cnt);\n            }\n            name_prefix = n_prefix;\n            type = \"param\";        \n            tune = false;\n            split_by = NULL;\n        }\n        TPParameter(TPParameter *p) {\n            name = p->name;\n            name_prefix = p->name_prefix;\n            type = p->type;            \n            tune = p->tune;\n            attr = p->attr;                        \n            split_by = p->split_by;\n        }     \n        TPParameter *dup();\n        std::string to_str();\n\n        std::string name;\n        std::string name_prefix;\n        std::string type;        \n        std::vector<std::shared_ptr<TPExpr>> bounds;        \n        bool tune;\n        /* The parameter is divisors of the following exps. */\n        std::vector <std::shared_ptr<TPExpr>> divisors; \n        /* The parameter is multiples of the following exps. */\n        std::vector <std::shared_ptr<TPExpr>> multiples;    \n        TPParameter *split_by;\n        /* Other constraint tags for this parameters. 
\n         * \"power_of_two\", this parameter should be a power of 2.\n         * \"auto_infer\", this parameter will be auto-inferred by other parameters.\n         * \"external\", this parameter will be provided externally.\n         */\n        std::unordered_set<std::string> tags;\n        std::string attr;\n        virtual ~TPParameter(){\n            //for (int i = 0; i < bounds.size(); i++)\n            //    delete bounds[i];\n            //for (int i = 0; i > divisors.size(); i++)\n            //    delete divisors[i];\n            //for (int i = 0; i > multiples.size(); i++)\n            //    delete multiples[i];\n        }\n};\n\nclass TPConst: public TPExpr {\n    public:\n        TPConst() {}\n        TPConst(int v) {\n            type = \"const\";\n            val = v;\n        }\n        TPConst *dup();\n\n        std::string type;\n        int val;\n};\n\nclass TPArrayRef {\n    public:\n        TPArrayRef(){}\n        TPArrayRef(std::string n, std::vector<TPExpr *> idx) {\n            name = n;\n            for (auto i : idx) {\n                index.push_back(i);\n            }\n        }\n        std::string name;\n        std::vector<TPExpr *> index;\n        std::string to_str();\n        ~TPArrayRef() {\n            for (auto i : index) {\n                delete i;\n            }\n        }\n};\n\nclass TPArray {\n    public:\n        TPArray(){}\n        TPArray(std::string n) {name = n;}\n        std::string name;\n        std::vector<std::shared_ptr<TPArrayRef>> refs;\n        ~TPArray() {\n            //for (auto ref : refs) \n            //    delete ref;\n        }\n};\n\nclass TPArrayTile {\n    public:\n        TPArrayTile(){data_pack_factor_inter = NULL; data_pack_factor_intra = NULL;}\n        std::string name;\n        std::string type;\n        int ele_size; \n        std::vector<TPExpr *> lbs;\n        std::vector<TPExpr *> sizes;\n        TPParameter *data_pack_factor_inter;\n        std::shared_ptr<TPExpr> 
data_pack_factor_intra;\n        std::shared_ptr<TPExpr> compute_size();\n        ~TPArrayTile() {\n            for (auto lb : lbs) {\n                delete lb;\n            }\n            for (auto size : sizes) {\n                delete size;\n            }\n        }\n};\n\nclass TuningProgram {\n    public:\n        TuningProgram(){id2 = -1;};\n        /* Initialize the tuning program from an ISL schedule */\n        __isl_give isl_schedule *init_from_schedule(__isl_take isl_schedule *schedule);\n        __isl_give isl_schedule_node *tile(__isl_take isl_schedule_node *node, int div, std::string step);\n        __isl_give isl_schedule_node *tile(\n            __isl_take isl_schedule_node *node, int pos, int div, std::string step, std::unordered_set<std::string> tags, int bound);\n        void dump(std::string dir);\n        __isl_give isl_schedule *generate_tuning_schedule(__isl_take isl_schedule *schedule);\n        __isl_give isl_schedule *generate_io_tuning_schedule(__isl_take isl_schedule *schedule, int io_level);\n        void extract_module_loop_info(std::string name, std::vector<isl_ast_node *> &tree);\n        std::shared_ptr<TPExpr> extract_module_num(isl_ast_node *tree);\n        //std::shared_ptr<TPExpr> extract_io_module_num(isl_ast_node *tree, int io_level);\n        std::vector<std::shared_ptr<TPExpr>> extract_module_dims(isl_ast_node *tree);\n        std::vector<std::shared_ptr<TPExpr>> extract_module_dims_io(isl_ast_node *tree, int io_level);\n        void extract_module_memory_info(std::string name, int double_buffer, TPArrayTile *tile, std::vector<isl_ast_node *> &tree);\n        void extract_module_compute_info(std::string name, std::string arr_type, isl_ast_node *tree);\n        void extract_module_io_info(std::string name, int io_level, std::vector<isl_ast_node *> &tree);\n        void extract_module_attr(std::string name, int double_buffer, int in, int io, int to_dram, int serialize, int to_pe, int filter);\n        
std::shared_ptr<TPArrayRef> build_array_ref(std::string name, __isl_keep isl_map *ref, __isl_keep isl_schedule *);\n        void update_tiled_arrays(TPIterator *tile_iter, TPIterator *point_iter, TPParameter *tile_factor);\n        TPArrayTile *infer_tiled_array_bounds(TPArrayTile *tile, std::vector<std::shared_ptr<TPArrayRef>> refs, std::vector<TPIterator *> fixed_iters);\n        std::vector<TPExpr *> infer_tiled_array_bound_at_dim(int dim, std::vector<std::shared_ptr<TPArrayRef>> refs, std::vector<TPIterator *> fixed_iters);\n        TPExpr *infer_array_index_lb(TPExpr *, std::vector<TPIterator *> fixed_iters);\n        TPExpr *infer_array_index_ub(TPExpr *, std::vector<TPIterator *> fixed_iters);\n        void load_param_names(char *path);\n\n        std::vector<TPIterator *> iters;        \n        std::vector<TPParameter *> params;                \n        std::vector<TPArray *> arrays;\n        // Maps each parameter name to its pointer in \"params\"\n        std::unordered_map<std::string, TPParameter *> param_map;        \n        // ID of the kernel this tuning program belongs to\n        int id;\n        // second-level id for loop permutation\n        int id2;\n        std::unordered_map<std::string, std::shared_ptr<json>> module_loop_info;        \n        std::unordered_map<std::string, std::shared_ptr<json>> module_memory_info;\n        std::unordered_map<std::string, std::shared_ptr<json>> module_compute_info;\n        std::unordered_map<std::string, std::shared_ptr<json>> module_io_info;\n        std::unordered_map<std::string, std::shared_ptr<json>> module_attr;\n        std::vector<std::string> param_names;\n        std::unordered_map<std::string, int> param_names_cnt;\n\n        ~TuningProgram() {                        \n            for (int i = 0; i < iters.size(); i++)\n                delete iters[i];            \n            for (int i = 0; i < params.size(); i++)\n                delete params[i];     \n            for (int i = 0; i < arrays.size(); i++)     
   \n                delete arrays[i];\n        }\n\n        // Future use\n        //std::unordered_set<TPStatement *> stmts;\n        //std::vector<TPTransformHistory *> transform_history;\n        //std::unordered_map<TPIterator *, TPIterator *> iter_map;\n        //std::unordered_map<TPStatement *, TPStatement *> stmt_map;\n};\n\n#if defined(__cplusplus)\n}\n#endif  \n\n#endif"
  },
  {
    "path": "src/autosa_utils.cpp",
    "content": "#include <assert.h>\n#include <string.h>\n#include <ctype.h>\n#include <stdexcept>\n#include <limits>\n#include <cmath>\n\n#include <isl/space.h>\n#include <barvinok/isl.h>\n\n#include \"autosa_utils.h\"\n\n__isl_give isl_union_map *extract_sizes_from_str(isl_ctx *ctx, const char *str)\n{\n  if (!str)\n    return NULL;\n  return isl_union_map_read_from_str(ctx, str);\n}\n\n/* Concat the basic maps in the map \"el\" with the basic map list \"user\". \n */\nstatic isl_stat concat_basic_map(__isl_take isl_map *el, void *user)\n{\n  isl_basic_map_list **bmap_list = (isl_basic_map_list **)(user);\n  isl_basic_map_list *bmap_list_sub = isl_map_get_basic_map_list(el);\n  if (!(*bmap_list))\n  {\n    *bmap_list = bmap_list_sub;\n  }\n  else\n  {\n    *bmap_list = isl_basic_map_list_concat(*bmap_list, bmap_list_sub);\n  }\n\n  isl_map_free(el);\n  return isl_stat_ok;\n}\n\n/* Extract the basic map list from the union map \"umap\".\n */\n__isl_give isl_basic_map_list *isl_union_map_get_basic_map_list(\n    __isl_keep isl_union_map *umap)\n{\n  isl_map_list *map_list = isl_union_map_get_map_list(umap);\n  isl_basic_map_list *bmap_list = NULL;\n  isl_map_list_foreach(map_list, &concat_basic_map, &bmap_list);\n\n  isl_map_list_free(map_list);\n  return bmap_list;\n}\n\nstatic isl_stat acc_n_basic_map(__isl_take isl_map *el, void *user)\n{\n  isl_size *n = (isl_size *)(user);\n  isl_basic_map_list *bmap_list = isl_map_get_basic_map_list(el);\n  *n = *n + isl_basic_map_list_n_basic_map(bmap_list);\n  isl_map_free(el);\n  isl_basic_map_list_free(bmap_list);\n  return isl_stat_ok;\n}\n\n/* Return the number of basic maps in the union map \"umap\".\n */\nisl_size isl_union_map_n_basic_map(__isl_keep isl_union_map *umap)\n{\n  isl_size n = 0;\n  isl_map_list *map_list = isl_union_map_get_map_list(umap);\n  isl_map_list_foreach(map_list, &acc_n_basic_map, &n);\n\n  isl_map_list_free(map_list);\n\n  return n;\n}\n\n__isl_give isl_basic_map 
*isl_basic_map_from_map(__isl_take isl_map *map)\n{\n  if (!map)\n    return NULL;\n\n  assert(isl_map_n_basic_map(map) == 1);\n  isl_basic_map_list *bmap_list = isl_map_get_basic_map_list(map);\n  isl_map_free(map);\n\n  isl_basic_map *bmap = isl_basic_map_list_get_basic_map(bmap_list, 0);\n  isl_basic_map_list_free(bmap_list);\n\n  return bmap;\n}\n\n/* Return a union set containing those elements in the domains\n * of the elements of \"mupa\" where they are all nonnegative.\n *\n * If there are no elements, then simply return the entire domain.\n */\n__isl_give isl_union_set *isl_multi_union_pw_aff_nonneg_union_set(\n    __isl_take isl_multi_union_pw_aff *mupa)\n{\n  int i;\n  isl_size n;\n  isl_union_pw_aff *upa;\n  isl_union_set *nonneg;\n\n  n = isl_multi_union_pw_aff_dim(mupa, isl_dim_set);\n  if (n < 0)\n    mupa = isl_multi_union_pw_aff_free(mupa);\n  if (!mupa)\n    return NULL;\n\n  if (n == 0)\n    return isl_multi_union_pw_aff_domain(mupa);\n\n  upa = isl_multi_union_pw_aff_get_union_pw_aff(mupa, 0);\n  nonneg = isl_union_pw_aff_nonneg_union_set(upa);\n\n  for (i = 1; i < n; ++i)\n  {\n    isl_union_set *nonneg_i;\n\n    upa = isl_multi_union_pw_aff_get_union_pw_aff(mupa, i);\n    nonneg_i = isl_union_pw_aff_nonneg_union_set(upa);\n\n    nonneg = isl_union_set_intersect(nonneg, nonneg_i);\n  }\n\n  isl_multi_union_pw_aff_free(mupa);\n  return nonneg;\n}\n\n/* Compute the set of elements in the domain of \"pa\" where it is nonnegative \n * and add this set to \"uset\".\n */\nstatic isl_stat nonneg_union_set(__isl_take isl_pw_aff *pa, void *user)\n{\n  isl_union_set **uset = (isl_union_set **)user;\n\n  *uset = isl_union_set_add_set(*uset, isl_pw_aff_nonneg_set(pa));\n\n  return *uset ? 
isl_stat_ok : isl_stat_error;\n}\n\n/* Return a union set containing those elements in the domains\n * of \"upa\" where it is nonnegative.\n */\n__isl_give isl_union_set *isl_union_pw_aff_nonneg_union_set(\n    __isl_take isl_union_pw_aff *upa)\n{\n  isl_union_set *nonneg;\n\n  nonneg = isl_union_set_empty(isl_union_pw_aff_get_space(upa));\n  if (isl_union_pw_aff_foreach_pw_aff(upa, &nonneg_union_set, &nonneg) < 0)\n    nonneg = isl_union_set_free(nonneg);\n\n  isl_union_pw_aff_free(upa);\n  return nonneg;\n}\n\n/* Return a union set containing those elements in the domains\n * of the elements of \"mupa\" where they are all non zero.\n *\n * If there are no elements, then simply return the entire domain.\n */\n__isl_give isl_union_set *isl_multi_union_pw_aff_non_zero_union_set(\n    __isl_take isl_multi_union_pw_aff *mupa)\n{\n  int i;\n  isl_size n;\n  isl_union_pw_aff *upa;\n  isl_union_set *non_zero;\n\n  n = isl_multi_union_pw_aff_dim(mupa, isl_dim_set);\n  if (n < 0)\n    mupa = isl_multi_union_pw_aff_free(mupa);\n  if (!mupa)\n    return NULL;\n\n  if (n == 0)\n    return isl_multi_union_pw_aff_domain(mupa);\n\n  upa = isl_multi_union_pw_aff_get_union_pw_aff(mupa, 0);\n  non_zero = isl_union_pw_aff_non_zero_union_set(upa);\n\n  for (i = 1; i < n; ++i)\n  {\n    isl_union_set *non_zero_i;\n\n    upa = isl_multi_union_pw_aff_get_union_pw_aff(mupa, i);\n    non_zero_i = isl_union_pw_aff_non_zero_union_set(upa);\n\n    non_zero = isl_union_set_intersect(non_zero, non_zero_i);\n  }\n\n  isl_multi_union_pw_aff_free(mupa);\n  return non_zero;\n}\n\n/* Compute the set of elements in the domain of \"pa\" where it is non zero\n * and add this set to \"uset\".\n */\nstatic isl_stat non_zero_union_set(__isl_take isl_pw_aff *pa, void *user)\n{\n  isl_union_set **uset = (isl_union_set **)user;\n  *uset = isl_union_set_add_set(*uset, isl_pw_aff_non_zero_set(pa));\n\n  return *uset ? 
isl_stat_ok : isl_stat_error;\n}\n\n/* Return a union_set containing those elements in the domains\n * of \"upa\" where it is non zero.\n */\n__isl_give isl_union_set *isl_union_pw_aff_non_zero_union_set(\n    __isl_take isl_union_pw_aff *upa)\n{\n  isl_union_set *non_zero;\n\n  non_zero = isl_union_set_empty(isl_union_pw_aff_get_space(upa));\n  if (isl_union_pw_aff_foreach_pw_aff(upa, &non_zero_union_set, &non_zero) < 0)\n    non_zero = isl_union_set_free(non_zero);\n\n  isl_union_pw_aff_free(upa);\n  return non_zero;\n}\n\n/* Print the isl_mat \"mat\" to \"fp\".\n */\nvoid print_mat(FILE *fp, __isl_keep isl_mat *mat)\n{\n  isl_printer *printer = isl_printer_to_file(isl_mat_get_ctx(mat), fp);\n  for (int i = 0; i < isl_mat_rows(mat); i++)\n  {\n    for (int j = 0; j < isl_mat_cols(mat); j++)\n    {\n      /* isl_printer_print_val() only keeps the value, so release it here. */\n      isl_val *val = isl_mat_get_element_val(mat, i, j);\n      printer = isl_printer_print_val(printer, val);\n      isl_val_free(val);\n      fprintf(fp, \" \");\n    }\n    fprintf(fp, \"\\n\");\n  }\n  isl_printer_free(printer);\n}\n\n/* Compare the two vectors, return 0 if equal.\n */\nint isl_vec_cmp(__isl_keep isl_vec *vec1, __isl_keep isl_vec *vec2)\n{\n  if (isl_vec_size(vec1) != isl_vec_size(vec2))\n    return 1;\n\n  for (int i = 0; i < isl_vec_size(vec1); i++)\n  {\n    if (isl_vec_cmp_element(vec1, vec2, i))\n      return 1;\n  }\n\n  return 0;\n}\n\n/* Construct the string \"<a>_<b>\".\n */\nchar *concat(isl_ctx *ctx, const char *a, const char *b)\n{\n  isl_printer *p;\n  char *s;\n\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_print_str(p, a);\n  p = isl_printer_print_str(p, \"_\");\n  p = isl_printer_print_str(p, b);\n  s = isl_printer_get_str(p);\n  isl_printer_free(p);\n\n  return s;\n}\n\nbool isl_vec_is_zero(__isl_keep isl_vec *vec)\n{\n  int n = isl_vec_size(vec);\n  for (int i = 0; i < n; i++)\n  {\n    isl_val *val = isl_vec_get_element_val(vec, i);\n    if (!isl_val_is_zero(val))\n    {\n      isl_val_free(val);\n      return false;\n    }\n    isl_val_free(val);\n  }\n  return true;\n}\n\nint 
suffixcmp(const char *s, const char *suffix)\n{\n  size_t len_s = strlen(s);\n  size_t len_suffix = strlen(suffix);\n\n  /* Compare the lengths as size_t before subtracting to avoid unsigned wrap-around. */\n  if (len_s < len_suffix)\n    return 1;\n  return strncmp(s + (len_s - len_suffix), suffix, len_suffix);\n}\n\n/* Add \"len\" parameters p[i] with identifiers \"ids\" and intersect \"set\"\n * with\n *\n *\t{ : 0 <= p[i] < size[i] }\n *\n * or an overapproximation.\n */\n__isl_give isl_set *add_bounded_parameters_dynamic(\n    __isl_take isl_set *set, __isl_keep isl_multi_pw_aff *size,\n    __isl_keep isl_id_list *ids)\n{\n  int i, len;\n  unsigned nparam;\n  isl_space *space;\n  isl_local_space *ls;\n\n  len = isl_multi_pw_aff_dim(size, isl_dim_out);\n  nparam = isl_set_dim(set, isl_dim_param);\n  set = isl_set_add_dims(set, isl_dim_param, len);\n\n  for (i = 0; i < len; ++i)\n  {\n    isl_id *id;\n\n    id = isl_id_list_get_id(ids, i);\n    set = isl_set_set_dim_id(set, isl_dim_param, nparam + i, id);\n  }\n\n  space = isl_space_params(isl_set_get_space(set));\n  ls = isl_local_space_from_space(space);\n  for (i = 0; i < len; ++i)\n  {\n    isl_pw_aff *param, *size_i, *zero;\n    isl_set *bound;\n\n    param = isl_pw_aff_var_on_domain(isl_local_space_copy(ls),\n                                     isl_dim_param, nparam + i);\n\n    size_i = isl_multi_pw_aff_get_pw_aff(size, i);\n    bound = isl_pw_aff_lt_set(isl_pw_aff_copy(param), size_i);\n    bound = isl_set_from_basic_set(isl_set_simple_hull(bound));\n    set = isl_set_intersect_params(set, bound);\n\n    zero = isl_pw_aff_zero_on_domain(isl_local_space_copy(ls));\n    bound = isl_pw_aff_ge_set(param, zero);\n    set = isl_set_intersect_params(set, bound);\n  }\n  isl_local_space_free(ls);\n\n  return set;\n}\n\n/* Print \"to_convert\" in C format and return it as an integer.\n * Throw an error if the printed form is not a plain non-negative integer.\n */\nlong int convert_pwqpoly_to_int(__isl_keep isl_pw_qpolynomial *to_convert)\n{\n  isl_ctx *ctx = isl_pw_qpolynomial_get_ctx(to_convert);\n  long int ret = -1;\n  isl_printer *p;\n  char *str;\n\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = 
isl_printer_print_pw_qpolynomial(p, to_convert);\n  str = isl_printer_get_str(p);\n  isl_printer_free(p);\n\n  /* Check that the string only contains digits; free it before reporting an error. */\n  for (size_t i = 0; str[i] != '\\0'; i++)\n  {\n    if (!isdigit((unsigned char)str[i])) {\n      free(str);\n      throw std::runtime_error(\"[AutoSA] Error: The pw_qpolynomial contains non-digits.\\n\");\n    }\n  }\n\n  ret = atol(str);\n  free(str);\n\n  return ret;\n}\n\n/* Return a newly allocated string representation of \"vec\". */\nchar *isl_vec_to_str(__isl_keep isl_vec *vec)\n{\n  isl_printer *p_str;\n  p_str = isl_printer_to_str(isl_vec_get_ctx(vec));\n  p_str = isl_printer_print_vec(p_str, vec);\n  char *ret = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return ret;\n}\n\n/* Convert the integer value \"val\" to a long, asserting that its\n * denominator is one. Takes ownership of \"val\".\n */\nlong isl_val_get_num(__isl_take isl_val *val)\n{\n  long ret;\n  isl_val *denominator = isl_val_get_den_val(val);\n  assert(isl_val_is_one(denominator));\n  isl_val_free(denominator);\n  ret = isl_val_get_num_si(val);\n  isl_val_free(val);\n\n  return ret;\n}\n\n/* Update the minimum stored in \"user\" with the piece \"aff\".\n * If the piece is not constant, use the smallest representable long.\n */\nstatic isl_stat find_pa_min(__isl_take isl_set *set, __isl_take isl_aff *aff, void *user)\n{\n  long *min = (long *)user;\n  if (isl_aff_is_cst(aff)) {\n    *min = std::min(*min, isl_val_get_num(isl_aff_get_constant_val(aff)));\n  } else {\n    *min = std::numeric_limits<long>::min();\n  }\n  isl_set_free(set);\n  isl_aff_free(aff);\n  return isl_stat_ok;\n}\n\n/* Return the minimum of dimension \"dim\" of \"set\" over all pieces. */\nlong compute_set_min(__isl_keep isl_set *set, int dim)\n{\n  long min = std::numeric_limits<long>::max();\n  isl_pw_aff *pa = isl_set_dim_min(isl_set_copy(set), dim);\n  isl_pw_aff_foreach_piece(pa, &find_pa_min, &min);\n  isl_pw_aff_free(pa);\n\n  return min;\n}\n\n/* Update the maximum stored in \"user\" with the piece \"aff\".\n * If the piece is not constant, use the largest representable long.\n */\nstatic isl_stat find_pa_max(__isl_take isl_set *set, __isl_take isl_aff *aff, void *user)\n{\n  long *max = (long *)user;\n  if (isl_aff_is_cst(aff)) {\n    *max = std::max(*max, isl_val_get_num(isl_aff_get_constant_val(aff)));\n  } else {\n    *max = std::numeric_limits<long>::max();\n  }\n  isl_set_free(set);\n  isl_aff_free(aff);\n  return isl_stat_ok;\n}\n\n/* Return the maximum of dimension \"dim\" of \"set\" over all pieces. */\nlong 
compute_set_max(__isl_keep isl_set *set, int dim)\n{\n  long max = std::numeric_limits<long>::min();\n  isl_pw_aff *pa = isl_set_dim_max(isl_set_copy(set), dim);\n  isl_pw_aff_foreach_piece(pa, &find_pa_max, &max);\n  isl_pw_aff_free(pa);\n\n  return max;\n}\n\n/* Return the positive divisors of \"x\" in ascending order. */\nstd::vector<int> get_factors(int x) {\n  std::vector<int> factors;\n  std::vector<int> large_factors;\n  /* \"i <= x / i\" is equivalent to \"i * i <= x\" but avoids overflow and float rounding. */\n  for (int i = 1; i <= x / i; i++) {\n    if (x % i == 0) {\n      factors.push_back(i);\n      /* Record the paired divisor x / i unless i is the square root of x. */\n      if (i * i != x)\n        large_factors.push_back(x / i);\n    }\n  }\n  /* The paired divisors were collected in descending order; append them reversed. */\n  for (int i = (int)large_factors.size() - 1; i >= 0; i--) {\n    factors.push_back(large_factors[i]);\n  }\n  return factors;\n}"
  },
  {
    "path": "src/autosa_utils.h",
    "content": "#ifndef _AUTOSA_UTILS_H\n#define _AUTOSA_UTILS_H\n\n#include <isl/ast.h>\n#include <isl/id.h>\n#include <isl/id_to_ast_expr.h>\n#include <isl/polynomial.h>\n\n#include <pet.h>\n\n#include <vector>\n\n#include \"ppcg.h\"\n#include \"ppcg_options.h\"\n\n#if defined(__cplusplus)\nextern \"C\" {\n#endif    \n\n__isl_give isl_union_map *extract_sizes_from_str(isl_ctx *ctx, const char *str);\n\n__isl_give isl_basic_map_list *isl_union_map_get_basic_map_list(\n    __isl_keep isl_union_map *umap);\nisl_size isl_union_map_n_basic_map(__isl_keep isl_union_map *umap);\n__isl_give isl_basic_map *isl_basic_map_from_map(__isl_take isl_map *map);\n\n__isl_give isl_union_set *isl_multi_union_pw_aff_nonneg_union_set(\n    __isl_take isl_multi_union_pw_aff *mupa);\n__isl_give isl_union_set *isl_union_pw_aff_nonneg_union_set(\n    __isl_take isl_union_pw_aff *upa);\n__isl_give isl_union_set *isl_multi_union_pw_aff_non_zero_union_set(\n    __isl_take isl_multi_union_pw_aff *mupa);\n__isl_give isl_union_set *isl_union_pw_aff_non_zero_union_set(\n    __isl_take isl_union_pw_aff *upa);\n\nvoid print_mat(FILE *fp, __isl_keep isl_mat *mat);\nint isl_vec_cmp(__isl_keep isl_vec *vec1, __isl_keep isl_vec *vec2);\nchar *concat(isl_ctx *ctx, const char *a, const char *b);\nbool isl_vec_is_zero(__isl_keep isl_vec *vec);\nint suffixcmp(const char *s, const char *suffix);\n\n__isl_give isl_set *add_bounded_parameters_dynamic(\n    __isl_take isl_set *set, __isl_keep isl_multi_pw_aff *size,\n    __isl_keep isl_id_list *ids);\n\nlong int convert_pwqpoly_to_int(__isl_keep isl_pw_qpolynomial *to_convert);\n\n/* Get strings */\nchar *isl_vec_to_str(__isl_keep isl_vec *vec);\n\nlong isl_val_get_num(__isl_take isl_val *val);\nlong compute_set_min(__isl_keep isl_set *set, int dim);\nlong compute_set_max(__isl_keep isl_set *set, int dim);\n\n/* Get the factors of the number x. */\nstd::vector<int> get_factors(int x);\n\n#if defined(__cplusplus)\n}\n#endif\n\n#endif"
  },
  {
    "path": "src/autosa_xilinx_hls_c.cpp",
    "content": "#include <isl/ctx.h>\n\n#include \"autosa_xilinx_hls_c.h\"\n#include \"autosa_common.h\"\n#include \"autosa_comm.h\"\n#include \"autosa_print.h\"\n#include \"autosa_trans.h\"\n#include \"autosa_codegen.h\"\n#include \"autosa_utils.h\"\n\n#include <set>\n\nstruct print_host_user_data\n{\n  struct hls_info *hls;\n  struct autosa_prog *prog;\n  struct autosa_hw_top_module *top;\n};\n\nstruct print_hw_module_data\n{\n  struct hls_info *hls;\n  struct autosa_prog *prog;\n  struct autosa_hw_module *module;\n  /* Used for double buffer codegen. Modify the printed iterator prefix. */\n  const char *iterator_prefix;\n};\n\n/* Print the includes for Xilinx OpenCL host.  \n */\nstatic void print_xilinx_host_header(FILE *fp)\n{\n  fprintf(fp, \"#include <iostream>\\n\");\n  fprintf(fp, \"#include <vector>\\n\");\n  fprintf(fp, \"#include <fstream>\\n\\n\");\n\n  fprintf(fp, \"#define CL_HPP_CL_1_2_DEFAULT_BUILD\\n\");\n  fprintf(fp, \"#define CL_HPP_TARGET_OPENCL_VERSION 120\\n\");\n  fprintf(fp, \"#define CL_HPP_MINIMUM_OPENCL_VERSION 120\\n\");\n  fprintf(fp, \"#define CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY 1\\n\");\n  fprintf(fp, \"#define CL_USE_DEPRECATED_OPENCL_1_2_APIS\\n\\n\");\n\n  fprintf(fp, \"#include <CL/cl2.hpp>\\n\");\n  fprintf(fp, \"#include <CL/cl_ext_xilinx.h>\\n\\n\");\n\n  fprintf(fp, \"#define OCL_CHECK(error,call)                                       \\\\\\n\");\n  fprintf(fp, \"    call;                                                           \\\\\\n\");\n  fprintf(fp, \"    if (error != CL_SUCCESS) {                                      \\\\\\n\");\n  fprintf(fp, \"      printf(\\\"%%s:%%d Error calling \\\" #call \\\", error code is: %%d\\\\n\\\",  \\\\\\n\");\n  fprintf(fp, \"              __FILE__,__LINE__, error);                            \\\\\\n\");\n  fprintf(fp, \"      exit(EXIT_FAILURE);                                           \\\\\\n\");\n  fprintf(fp, \"    }\\n\\n\");\n\n  fprintf(fp, 
\"std::string xclbin_file_name;\\n\\n\");\n\n  fprintf(fp, \"template <typename T>\\n\");\n  fprintf(fp, \"struct aligned_allocator\\n\");\n  fprintf(fp, \"{\\n\");\n  fprintf(fp, \"  using value_type = T;\\n\");\n  fprintf(fp, \"  T* allocate(std::size_t num)\\n\");\n  fprintf(fp, \"  {\\n\");\n  fprintf(fp, \"    void* ptr = nullptr;\\n\");\n  fprintf(fp, \"    if (posix_memalign(&ptr,4096,num*sizeof(T)))\\n\");\n  fprintf(fp, \"      throw std::bad_alloc();\\n\");\n  fprintf(fp, \"    return reinterpret_cast<T*>(ptr);\\n\");\n  fprintf(fp, \"  }\\n\");\n  fprintf(fp, \"  void deallocate(T* p, std::size_t num)\\n\");\n  fprintf(fp, \"  {\\n\");\n  fprintf(fp, \"    free(p);\\n\");\n  fprintf(fp, \"  }\\n\");\n  fprintf(fp, \"};\\n\\n\");\n\n  fprintf(fp, \"cl::Program::Binaries import_binary_file()\\n\");\n  fprintf(fp, \"{\\n\");\n  fprintf(fp, \"    std::cout << \\\"\\\\n Loading: \\\"<< xclbin_file_name.c_str() << \\\"\\\\n\\\";\\n\");\n  fprintf(fp, \"    std::ifstream bin_file(xclbin_file_name.c_str(), std::ifstream::binary);\\n\");\n  fprintf(fp, \"    bin_file.seekg (0, bin_file.end);\\n\");\n  fprintf(fp, \"    unsigned nb = bin_file.tellg();\\n\");\n  fprintf(fp, \"    bin_file.seekg (0, bin_file.beg);\\n\");\n  fprintf(fp, \"    char *buf = new char [nb];\\n\");\n  fprintf(fp, \"    bin_file.read(buf, nb);\\n\");\n  fprintf(fp, \"\\n\");\n  fprintf(fp, \"    cl::Program::Binaries bins;\\n\");\n  fprintf(fp, \"    bins.push_back({buf,nb});\\n\");\n  fprintf(fp, \"    return bins;\\n\");\n  fprintf(fp, \"}\\n\\n\");\n\n  fprintf(fp, \"std::vector<cl::Device> get_devices() {\\n\");\n  fprintf(fp, \"    size_t i;\\n\");\n  fprintf(fp, \"    cl_int err;\\n\");\n  fprintf(fp, \"    std::vector<cl::Platform> platforms;\\n\");\n  fprintf(fp, \"    OCL_CHECK(err, err = cl::Platform::get(&platforms));\\n\");\n  fprintf(fp, \"    cl::Platform platform;\\n\");\n  fprintf(fp, \"    for (i  = 0 ; i < platforms.size(); i++){\\n\");\n  fprintf(fp, \"        platform = 
platforms[i];\\n\");\n  fprintf(fp, \"        OCL_CHECK(err, std::string platformName = platform.getInfo<CL_PLATFORM_NAME>(&err));\\n\");\n  fprintf(fp, \"        if (platformName == \\\"Xilinx\\\"){\\n\");\n  fprintf(fp, \"            std::cout << \\\"\\\\nFound Platform\\\" << std::endl;\\n\");\n  fprintf(fp, \"            std::cout << \\\"\\\\nPlatform Name: \\\" << platformName.c_str() << std::endl;\\n\");\n  fprintf(fp, \"            break;\\n\");\n  fprintf(fp, \"        }\\n\");\n  fprintf(fp, \"    }\\n\");\n  fprintf(fp, \"    if (i == platforms.size()) {\\n\");\n  fprintf(fp, \"        std::cout << \\\"Error: Failed to find Xilinx platform\\\" << std::endl;\\n\");\n  fprintf(fp, \"        exit(EXIT_FAILURE);\\n\");\n  fprintf(fp, \"    }\\n\");\n  fprintf(fp, \"    //Getting ACCELERATOR Devices and selecting 1st such device\\n\");\n  fprintf(fp, \"    std::vector<cl::Device> devices;\\n\");\n  fprintf(fp, \"    OCL_CHECK(err, err = platform.getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices));\\n\");\n  fprintf(fp, \"    return devices;\\n\");\n  fprintf(fp, \"}\\n\\n\");\n}\n\n/* Open the host .cpp file and the kernel .h and .cpp files for writing.\n * Add the necessary includes.\n */\nstatic void hls_open_files(struct hls_info *info, const char *input)\n{\n  char name[PATH_MAX];\n  char dir[PATH_MAX];\n  int len, len_dir;\n  isl_printer *p_str;\n  char *file_path;\n\n  p_str = isl_printer_to_str(info->ctx);\n  p_str = isl_printer_print_str(p_str, info->output_dir);\n  p_str = isl_printer_print_str(p_str, \"/src/\");\n  file_path = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  len = ppcg_extract_base_name(name, input);\n  /* Add the prefix */\n  sprintf(dir, \"%s\", file_path);\n  len_dir = strlen(file_path);\n\n  strcpy(name + len, \"_host.cpp\");\n  strcpy(dir + len_dir, name);\n  info->host_c = fopen(dir, \"w\");\n  if (!info->host_c)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  if 
(!info->hls)\n  {\n    /* OpenCL host */\n    strcpy(name + len, \"_host.hpp\");\n    strcpy(dir + len_dir, name);\n    info->host_h = fopen(dir, \"w\");\n    if (!info->host_h)\n    {\n      printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n      exit(1);\n    }\n    print_xilinx_host_header(info->host_h);\n    fprintf(info->host_c, \"#include \\\"%s\\\"\\n\", name);\n  }\n\n  strcpy(name + len, \"_kernel_modules.cpp\");\n  strcpy(dir + len_dir, name);\n  info->kernel_c = fopen(dir, \"w\");\n  if (!info->kernel_c)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  strcpy(name + len, \"_kernel.h\");\n  strcpy(dir + len_dir, name);\n  info->kernel_h = fopen(dir, \"w\");\n  if (!info->kernel_h)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  fprintf(info->host_c, \"#include <assert.h>\\n\");\n  fprintf(info->host_c, \"#include <stdio.h>\\n\");\n  if (info->hls)\n    fprintf(info->host_c, \"#include \\\"%s\\\"\\n\\n\", name);\n\n  if (info->hls && !info->hcl)\n    fprintf(info->kernel_c, \"#include \\\"%s\\\"\\n\", name);\n\n  if (info->hcl) {\n    strcpy(name + len, \"_hcl_decl.h\");\n    strcpy(dir + len_dir, name);\n    info->hcl_decl = fopen(dir, \"w\");\n    if (!info->hcl_decl) {\n      printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n      exit(1);\n    }\n  }\n\n  strcpy(name + len, \"_top_gen.cpp\");\n  strcpy(dir + len_dir, name);\n  info->top_gen_c = fopen(dir, \"w\");\n  if (!info->top_gen_c)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  strcpy(name + len, \"_top_gen.h\");\n  strcpy(dir + len_dir, name);\n  info->top_gen_h = fopen(dir, \"w\");\n  if (!info->top_gen_h)\n  {\n    printf(\"[AutoSA] Error: Can't open the file: %s\\n\", dir);\n    exit(1);\n  }\n\n  fprintf(info->top_gen_c, \"#include <isl/printer.h>\\n\");\n  fprintf(info->top_gen_c, \"#include \\\"%s\\\"\\n\", name);\n\n  fprintf(info->kernel_h, \"#include <ap_int.h>\\n\");\n  fprintf(info->kernel_h, \"#include <hls_stream.h>\\n\");\n  fprintf(info->kernel_h, \"\\n\");\n\n  fprintf(info->kernel_h, \"#define min(x,y) ((x < y) ? x : y)\\n\");\n  fprintf(info->kernel_h, \"#define max(x,y) ((x > y) ? 
x : y)\\n\");\n  fprintf(info->kernel_h, \"\\n\");\n\n  free(file_path);\n}\n\n/* Close all output files.\n */\nstatic void hls_close_files(struct hls_info *info)\n{\n  isl_printer *p_str;\n  char *complete;\n  FILE *f;\n\n  fclose(info->kernel_c);\n  fclose(info->kernel_h);\n  fclose(info->host_c);\n  if (!info->hls)\n  {\n    fclose(info->host_h);\n  }\n  fclose(info->top_gen_c);\n  fclose(info->top_gen_h);\n  if (info->hcl)\n    fclose(info->hcl_decl);\n\n  p_str = isl_printer_to_str(info->ctx);\n  p_str = isl_printer_print_str(p_str, info->output_dir);\n  p_str = isl_printer_print_str(p_str, \"/src/completed\");\n  complete = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n  /* Create an empty marker file; fclose(NULL) is undefined, so check first. */\n  f = fopen(complete, \"w\");\n  if (f)\n    fclose(f);\n  free(complete);\n}\n\n/* Extract the data pack factors for each I/O buffer allocated for the current\n * I/O group.\n * Only insert the data pack factor that is not found in the current list\n * \"data_pack_factors\".\n * The list is in ascending order.\n */\nstatic int *extract_data_pack_factors(int *data_pack_factors,\n                                      int *n_factor, struct autosa_array_ref_group *group)\n{\n  /* Test if the group default packing factor needs to be inserted */\n  if (group->n_lane > 1)\n  {\n    int n_lane = group->n_lane;\n    bool insert = true;\n    int pos = 0;\n    for (pos = 0; pos < *n_factor; pos++)\n    {\n      if (n_lane > data_pack_factors[pos])\n      {\n        if (pos < *n_factor - 1)\n        {\n          if (n_lane < data_pack_factors[pos + 1])\n          {\n            // insert @pos+1\n            pos++;\n            break;\n          }\n        }\n      }\n      else if (n_lane == data_pack_factors[pos])\n      {\n        insert = false;\n        break;\n      }\n    }\n\n    if (insert) {\n      *n_factor = *n_factor + 1;\n      data_pack_factors = (int *)realloc(data_pack_factors,\n                                         sizeof(int) * (*n_factor));\n      for (int j = *n_factor - 1; j > 
pos; j--)\n      {\n        data_pack_factors[j] = data_pack_factors[j - 1];\n      }\n      data_pack_factors[pos] = n_lane;\n    }\n  }\n\n  for (int i = 0; i < group->n_io_buffer; i++)\n  {\n    struct autosa_io_buffer *buf = group->io_buffers[i];\n    bool insert = true;\n    int pos = 0;\n    for (pos = 0; pos < *n_factor; pos++)\n    {\n      if (buf->n_lane > data_pack_factors[pos])\n      {\n        if (pos < *n_factor - 1)\n        {\n          if (buf->n_lane < data_pack_factors[pos + 1])\n          {\n            // insert @pos+1\n            pos++;\n            break;\n          }\n        }\n      }\n      else if (buf->n_lane == data_pack_factors[pos])\n      {\n        insert = false;\n        break;\n      }\n    }\n\n    if (!insert)\n      continue;\n\n    *n_factor = *n_factor + 1;\n    data_pack_factors = (int *)realloc(data_pack_factors,\n                                       sizeof(int) * (*n_factor));\n    for (int j = *n_factor - 1; j > pos; j--)\n    {\n      data_pack_factors[j] = data_pack_factors[j - 1];\n    }\n    data_pack_factors[pos] = buf->n_lane;\n  }\n\n  return data_pack_factors;\n}\n\n/* Examine the local buffers of each array group. \n * Extract the data pack factors and build the data types \n * required by the program. \n */\nstatic isl_stat print_data_types_xilinx(\n    struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  isl_printer *p;\n  struct autosa_kernel *kernel;\n\n  kernel = top->kernel;\n  p = isl_printer_to_file(kernel->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_str_new_line(p, \"/* Data Type */\");\n\n  /* Print the primitive data type. 
*/\n  for (int i = 0; i < kernel->n_array; i++) {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"typedef \");\n    p = isl_printer_print_str(p, local->array->type);\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, local->array->name);\n    p = isl_printer_print_str(p, \"_t1;\");\n    p = isl_printer_end_line(p);\n  }\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local = &kernel->array[i];\n    int *data_pack_factors = (int *)malloc(sizeof(int));\n    int n_factor = 1;\n    /* First insert the default data pack factor for the array. */\n    data_pack_factors[0] = local->n_lane;    \n\n    /* IO group */\n    for (int n = 0; n < local->n_io_group; n++)\n    {\n      struct autosa_array_ref_group *group = local->io_groups[n];\n      data_pack_factors = extract_data_pack_factors(data_pack_factors, &n_factor, group);\n    }\n    /* Drain group */\n    if (local->drain_group)\n      data_pack_factors = extract_data_pack_factors(data_pack_factors, &n_factor, local->drain_group);\n\n    if (local->is_sparse) {\n      std::set<int> tmp_lanes;\n      for (int n = 0; n < n_factor; n++) {\n        tmp_lanes.insert(data_pack_factors[n] * kernel->n_nzero);\n        tmp_lanes.insert(data_pack_factors[n]);\n      }\n      for (auto it = tmp_lanes.begin(); it != tmp_lanes.end(); ++it) {\n        int f = *it;\n        if (local->array->size * 8 * f > 1024) {\n          printf(\"[AutoSA] Warning: The data width %d is greater than 1024-bit. 
The type definition is not generated.\\n\", local->array->size * 8 * f);\n          continue;\n        }\n        if (f > 1) {\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"typedef ap_uint<\");\n          p = isl_printer_print_int(p, local->array->size * 8 * f);\n          p = isl_printer_print_str(p, \"> \");\n          p = isl_printer_print_str(p, local->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, f);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n        }\n      }\n\n      for (int n = 0; n < n_factor; n++) {\n        if (data_pack_factors[n] * kernel->n_nzero * local->array->size * 8 > 1024)\n          continue;\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"typedef struct \");\n        p = isl_printer_print_str(p, local->array->name);\n        p = isl_printer_print_str(p, \"_s_t\");\n        p = isl_printer_print_int(p, data_pack_factors[n]);\n        p = isl_printer_print_str(p, \" {\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, 2);\n        \n        p = isl_printer_start_line(p);\n        if (data_pack_factors[n] == 1 && kernel->n_nzero == 1) {\n          p = isl_printer_print_str(p, local->array->type);\n        } else {\n          p = isl_printer_print_str(p, local->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, data_pack_factors[n] * kernel->n_nzero);\n        }\n        p = isl_printer_print_str(p, \" d;\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        if (data_pack_factors[n] == 1 && kernel->n_nzero == 1) {\n          p = isl_printer_print_str(p, \"unsigned char\");  \n        } else {\n          p = isl_printer_print_str(p, \"ap_uint<\");\n          p = isl_printer_print_int(p, 8 * data_pack_factors[n]);\n          p = isl_printer_print_str(p, 
\">\");\n        }\n        p = isl_printer_print_str(p, \" i;\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, -2);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"} \");\n        p = isl_printer_print_str(p, local->array->name);\n        p = isl_printer_print_str(p, \"_s_t\");\n        p = isl_printer_print_int(p, data_pack_factors[n]);\n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n      }\n    } else {\n      for (int n = 0; n < n_factor; n++)\n      {\n        if (data_pack_factors[n] != 1)\n        {\n          int width;\n          width = local->array->size * 8 * data_pack_factors[n];\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"typedef ap_uint<\");\n          p = isl_printer_print_int(p, width);\n          p = isl_printer_print_str(p, \"> \");\n          p = isl_printer_print_str(p, local->array->name);\n          p = isl_printer_print_str(p, \"_t\");\n          p = isl_printer_print_int(p, data_pack_factors[n]);\n          p = isl_printer_print_str(p, \";\");\n          p = isl_printer_end_line(p);\n        }\n      }\n    }\n    free(data_pack_factors);    \n  }\n  p = print_str_new_line(p, \"/* Data Type */\");\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\nstatic __isl_give isl_printer *find_device_xilinx(__isl_take isl_printer *p)\n{\n  p = print_str_new_line(p, \"if (argc != 2) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"std::cout << \\\"Usage: \\\" << argv[0] << \\\" <XCLBIN File>\\\" << std::endl;\");\n  p = print_str_new_line(p, \"return EXIT_FAILURE;\");\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"cl_int err;\");\n  p = print_str_new_line(p, \"std::vector<cl::Device> devices = get_devices();\");\n  p = print_str_new_line(p, \"cl::Device device = 
devices[0];\");\n  p = print_str_new_line(p, \"std::string device_name = device.getInfo<CL_DEVICE_NAME>();\");\n  p = print_str_new_line(p, \"std::cout << \\\"Found Device=\\\" << device_name.c_str() << std::endl;\");\n  p = print_str_new_line(p, \"// Creating Context and Command Queue for selected device\");\n  p = print_str_new_line(p, \"cl::Context context(device);\");\n  p = print_str_new_line(p, \"cl::CommandQueue q(context, device);\");\n  p = print_str_new_line(p, \"// Import XCLBIN\");\n  p = print_str_new_line(p, \"xclbin_file_name = argv[1];\");\n  p = print_str_new_line(p, \"cl::Program::Binaries kernel_bins = import_binary_file();\");\n  p = print_str_new_line(p, \"// Create Program and Kernel\");\n  p = print_str_new_line(p, \"//devices.erase(devices.begin());\");\n  p = print_str_new_line(p, \"devices.resize(1);\");\n  p = print_str_new_line(p, \"cl::Program program(context, devices, kernel_bins);\");\n  p = print_str_new_line(p, \"cl::Kernel krnl(program, \\\"kernel0\\\");\");\n\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *declare_and_allocate_device_arrays_xilinx(\n    __isl_take isl_printer *p, struct autosa_prog *prog, \n    struct autosa_kernel *kernel, struct autosa_hw_top_module *top)\n{\n  p = print_str_new_line(p, \"// Allocate memory in host memory\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    //if (local_array->n_mem_ports > 1 && local_array->array->copy_in)\n    if (local_array->n_mem_ports > 1)\n    {\n      /* Create multiple host buffers. 
*/\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::vector<std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">>> \");\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");      \n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">> \");\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"_tmp\");\n      p = isl_printer_print_str(p, \"(\");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \");\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \".push_back(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = 
isl_printer_print_str(p, \"_tmp);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n\n      if (local_array->host_serialize) {\n        /* Allocate additional serialize buffer. */\n        /* Create multiple host buffers. */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::vector<std::vector<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \", aligned_allocator<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \">>> \");\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n      \n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        p = isl_printer_print_int(p, local_array->n_mem_ports);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::vector<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \", aligned_allocator<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \">> \");\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_tmp\");\n        p = isl_printer_print_str(p, \"(\");        \n        p = isl_printer_print_pw_qpolynomial(p, local_array->serialize_bound);\n        if (local_array->is_sparse) {\n          p = isl_printer_print_str(p, \" / \");\n          p = isl_printer_print_double(p, (double)local_array->eff_compress_ratio);\n        }\n        p = 
isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \".push_back(dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_tmp);\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");        \n      }\n    }\n    else\n    {\n      /* Create a single host buffer. */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \", aligned_allocator<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \">> \");\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \"(\");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \");\");\n      p = isl_printer_end_line(p);\n\n      if (local_array->host_serialize) {\n        /* Create a single host buffer. 
*/\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::vector<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \", aligned_allocator<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \">> \");\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);      \n        p = isl_printer_print_str(p, \"(\");\n        //p = autosa_array_info_print_data_size(p, local_array->array);\n        //p = isl_printer_print_ast_expr(p, local_array->serialize_bound_expr);\n        p = isl_printer_print_pw_qpolynomial(p, local_array->serialize_bound);\n        if (local_array->is_sparse) {\n          p = isl_printer_print_str(p, \" / \");\n          p = isl_printer_print_double(p, (double)local_array->eff_compress_ratio);\n        }\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n  p = isl_printer_end_line(p);\n\n  /* Initialize buffer. 
*/\n  p = print_str_new_line(p, \"// Initialize host buffers\");\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    if (local_array->n_mem_ports > 1 && local_array->array->copy_in)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::copy(reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->is_sparse)\n        p = isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \"), reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->is_sparse)\n        p = isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \") + \");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \", dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \"[i]\");\n      p = isl_printer_print_str(p, \".begin());\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n    }\n    else if (local_array->array->copy_in)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, 
\"std::copy(reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->is_sparse)\n        p = isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \"), reinterpret_cast<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *>(\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->is_sparse)\n        p = isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \") + \");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \", dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \".begin());\");\n      p = isl_printer_end_line(p);\n    }    \n  }\n\n  /* Perform data serialization if needed. */\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->serialize_tree && module->in) {\n      struct autosa_array_ref_group *group = module->io_groups[0];\n      struct autosa_local_array_info *local_array = group->local_array;\n      if (local_array->n_mem_ports > 1 && local_array->array->copy_in)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        p = isl_printer_print_int(p, local_array->n_mem_ports);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n  \n        p = isl_printer_start_line(p);        \n        p = isl_printer_print_str(p, module->in? 
\"host_serialize_\" : \"host_deserialize_\");\n        p = isl_printer_print_str(p, local_array->array->name);            \n        p = isl_printer_print_str(p, \"(\");\n        p = print_host_serialize_arguments(p, kernel, group, module, 0, 0);  // TODO: add hbm support later.\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n  \n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n      } else \n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, module->in? \"host_serialize_\" : \"host_deserialize_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"(\");\n        p = print_host_serialize_arguments(p, kernel, group, module, 0, 0);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"// Allocate buffers in device memory\");\n  p = print_str_new_line(p, \"// Buffers are allocated using CL_MEM_USE_HOST_PTR for efficient memory and\");\n  p = print_str_new_line(p, \"// device-to-host communication\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"std::vector<cl::Buffer> buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  }\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    int indent1, indent2;\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    //for (int j = 0; j < local_array->n_mem_ports; j++) {\n    p = isl_printer_start_line(p);\n    p = 
isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, local_array->n_mem_ports);\n    p = isl_printer_print_str(p, \"; i++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    p = print_str_new_line(p, \"OCL_CHECK(err,\");\n    indent1 = strlen(\"OCL_CHECK(\");\n    p = isl_printer_indent(p, indent1);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"cl::Buffer buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_tmp\");\n    p = isl_printer_print_str(p, \"(context,\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, strlen(\"cl::Buffer buffer_\") +\n                                  strlen(local_array->array->name) + strlen(\"_tmp\") + 1);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"CL_MEM_USE_HOST_PTR | \");\n    if (local_array->array->copy_in && local_array->array->copy_out)\n    {\n      p = isl_printer_print_str(p, \"CL_MEM_READ_WRITE\");\n    }\n    else\n    {\n      if (local_array->array->copy_in)\n        p = isl_printer_print_str(p, \"CL_MEM_READ_ONLY\");\n      else if (local_array->array->copy_out)\n        p = isl_printer_print_str(p, \"CL_MEM_WRITE_ONLY\");\n    }\n    p = isl_printer_print_str(p, \",\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    if (local_array->host_serialize) {\n      p = autosa_array_info_print_serialize_size(p, local_array->array);\n    } else {\n      p = autosa_array_info_print_size(p, local_array->array);\n    }\n    p = isl_printer_print_str(p, \",\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"dev_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    if (local_array->n_mem_ports > 1)\n    {\n      p = isl_printer_print_str(p, \"[i]\");\n    }\n    p = isl_printer_print_str(p, \".data(),\");\n    p = isl_printer_end_line(p);\n    
p = print_str_new_line(p, \"&err));\");\n    p = isl_printer_indent(p, -(strlen(\"cl::Buffer buffer_\") +\n                                strlen(local_array->array->name) + strlen(\"_tmp\") + 1));\n    p = isl_printer_indent(p, -indent1);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \".push_back(std::move(buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_tmp));\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n  p = isl_printer_end_line(p);\n\n  /* Insert profiling information. */\n  p = print_str_new_line(p, \"auto host_begin = std::chrono::high_resolution_clock::now();\");\n  p = print_str_new_line(p, \"auto fpga_begin = std::chrono::high_resolution_clock::now();\");\n  p = print_str_new_line(p, \"auto fpga_end = std::chrono::high_resolution_clock::now();\");\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *declare_and_allocate_cpu_arrays_xilinx(\n    __isl_take isl_printer *p, struct autosa_prog *prog, \n    struct autosa_kernel *kernel, struct autosa_hw_top_module *top)\n{\n  p = print_str_new_line(p, \"// Allocate memory in host memory\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    if (local_array->n_mem_ports > 1)\n    {\n      /* Create multiple host buffers. 
*/\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"std::vector<\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *> \");\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize) {\n        p = isl_printer_print_str(p, \"_unserialized\");\n      }\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"_tmp\");\n      p = isl_printer_print_str(p, \" = (\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *)malloc(\");\n      p = autosa_array_info_print_data_size(p, local_array->array);      \n      p = isl_printer_print_str(p, \" * sizeof(\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \"));\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \".push_back(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"_tmp);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = 
print_str_new_line(p, \"}\");\n\n      if (local_array->host_serialize) {\n        /* Allocate additional serialize buffer. */\n        /* Create multiple host buffers. */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::vector<\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \" *> \");\n        p = isl_printer_print_str(p, \"dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);      \n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        p = isl_printer_print_int(p, local_array->n_mem_ports);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \" *dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_tmp\");\n        p = isl_printer_print_str(p, \" = (\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \" *)malloc(\");\n        //p = autosa_array_info_print_data_size(p, local_array->array);\n        p = isl_printer_print_pw_qpolynomial(p, local_array->serialize_bound);\n        if (local_array->is_sparse) {\n          p = isl_printer_print_str(p, \" / \");\n          p = isl_printer_print_double(p, (double)local_array->eff_compress_ratio);\n        }\n        p = isl_printer_print_str(p, \" * sizeof(\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \"));\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"dev_\");\n        p = 
isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \".push_back(dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_tmp);\");\n        p = isl_printer_end_line(p);\n\n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n      }\n    }\n    else\n    {\n      /* Create a single host buffer. */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \" = (\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \" *)malloc(\");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \" * sizeof(\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \"));\");\n      p = isl_printer_end_line(p);\n\n      if (local_array->host_serialize) {\n        /* Create a single host buffer. 
*/\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \" *dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);       \n        p = isl_printer_print_str(p, \" = (\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \" *)malloc(\");\n        //p = autosa_array_info_print_data_size(p, local_array->array);\n        p = isl_printer_print_pw_qpolynomial(p, local_array->serialize_bound);\n        if (local_array->is_sparse) {\n          p = isl_printer_print_str(p, \" / \");\n          p = isl_printer_print_double(p, (double)local_array->eff_compress_ratio);\n        }\n        p = isl_printer_print_str(p, \" * sizeof(\");\n        p = isl_printer_print_str(p, local_array->array->type);\n        p = isl_printer_print_str(p, \"));\");\n        p = isl_printer_end_line(p);\n      }\n    }\n    //    p = isl_printer_print_str(p, \" = (\");\n    //    p = autosa_print_array_type(p, array);\n    //    p = isl_printer_print_str(p, \" *)malloc(\");\n    //    p = autosa_array_info_print_data_size(p, array);\n    //    p = isl_printer_print_str(p, \" * sizeof(\");\n    //    p = isl_printer_print_str(p, array->type);\n    //    p = isl_printer_print_str(p, \"));\");\n    //    p = isl_printer_end_line(p);\n  }\n  p = isl_printer_end_line(p);\n\n  /* Initialize buffer. 
*/\n  p = print_str_new_line(p, \"// Initialize host buffers\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    if (local_array->n_mem_ports > 1 && local_array->array->copy_in)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"memcpy(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \"[i]\");      \n      p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->is_sparse)\n        p = isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \", \");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \" * sizeof(\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \"));\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n    }\n    else if (local_array->array->copy_in)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"memcpy(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->host_serialize)\n        p = isl_printer_print_str(p, \"_unserialized\");\n      p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, local_array->array->name);\n      if (local_array->is_sparse)\n        p = 
isl_printer_print_str(p, \"_s\");\n      p = isl_printer_print_str(p, \", \");\n      p = autosa_array_info_print_data_size(p, local_array->array);\n      p = isl_printer_print_str(p, \" * sizeof(\");\n      p = isl_printer_print_str(p, local_array->array->type);\n      p = isl_printer_print_str(p, \"));\");\n      p = isl_printer_end_line(p);\n    }\n  }\n  \n  /* Perform data serialization if needed. */\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->serialize_tree && module->in) {\n      struct autosa_array_ref_group *group = module->io_groups[0];\n      struct autosa_local_array_info *local_array = group->local_array;\n      if (local_array->n_mem_ports > 1 && local_array->array->copy_in)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n        p = isl_printer_print_int(p, local_array->n_mem_ports);\n        p = isl_printer_print_str(p, \"; i++) {\");\n        p = isl_printer_end_line(p);\n        p = isl_printer_indent(p, 2);\n  \n        p = isl_printer_start_line(p);        \n        p = isl_printer_print_str(p, module->in? \"host_serialize_\" : \"host_deserialize_\");\n        p = isl_printer_print_str(p, local_array->array->name);            \n        p = isl_printer_print_str(p, \"(\");\n        p = print_host_serialize_arguments(p, kernel, group, module, 0, 0);  // TODO: add hbm support later.\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n  \n        p = isl_printer_indent(p, -2);\n        p = print_str_new_line(p, \"}\");\n      } else \n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, module->in? 
\"host_serialize_\" : \"host_deserialize_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"(\");\n        p = print_host_serialize_arguments(p, kernel, group, module, 0, 0);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }  \n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"// Allocate buffers in device memory\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"std::vector<\");\n    p = autosa_print_array_type(p, local_array->array);\n    p = isl_printer_print_str(p, \" *> buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n\n    if (prog->scop->options->autosa->axi_stream) {\n      if (local_array->n_mem_ports > 1) {\n        printf(\"[AutoSA] Error: Can't generate AXI Stream interface for array with more than one memory port: %s\\n\", local_array->array->name);\n        exit(1);\n      }\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"hls::stream<\");\n      p = autosa_print_array_type(p, local_array->array);\n      p = isl_printer_print_str(p, \"> fifo_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    int indent1, indent2;\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, 
local_array->n_mem_ports);\n    p = isl_printer_print_str(p, \"; i++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    p = isl_printer_start_line(p);\n    p = autosa_print_array_type(p, local_array->array);\n    p = isl_printer_print_str(p, \" *buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_tmp = (\");\n    p = autosa_print_array_type(p, local_array->array);\n    p = isl_printer_print_str(p, \" *)malloc(\");\n    if (local_array->host_serialize) {\n      p = autosa_array_info_print_serialize_size(p, local_array->array);\n    } else {\n      p = autosa_array_info_print_size(p, local_array->array);\n    }\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \".push_back(buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"_tmp);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print code for initializing the device for execution of the transformed\n * code. 
This includes declaring locally defined variables as well as\n * declaring and allocating the required copies of arrays on the device.\n */\nstatic __isl_give isl_printer *init_device_xilinx(__isl_take isl_printer *p,\n                                                  struct autosa_prog *prog, \n                                                  struct autosa_kernel *kernel, \n                                                  int hls,\n                                                  struct autosa_hw_top_module *top)\n{\n  p = autosa_print_local_declarations(p, prog);\n  if (!hls)\n  {\n    p = find_device_xilinx(p);\n    p = declare_and_allocate_device_arrays_xilinx(p, prog, kernel, top);\n  }\n  else\n  {\n    p = declare_and_allocate_cpu_arrays_xilinx(p, prog, kernel, top);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *autosa_free_cpu_arrays_xilinx(\n    __isl_take isl_printer *p, struct autosa_prog *prog, struct autosa_kernel *kernel)\n{\n  p = print_str_new_line(p, \"// Clean up resources\");\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, local_array->n_mem_ports);\n    p = isl_printer_print_str(p, \"; i++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"free(buffer_\");\n    p = isl_printer_print_str(p, local_array->array->name);\n    p = isl_printer_print_str(p, \"[i]);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n\n  for (int i = 0; i < kernel->n_array; i++)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if 
(!autosa_array_requires_device_allocation(local_array->array))\n      continue;\n\n    if (local_array->n_mem_ports > 1)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n      p = isl_printer_print_int(p, local_array->n_mem_ports);\n      p = isl_printer_print_str(p, \"; i++) {\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, 2);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"free(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"[i]);\");\n      p = isl_printer_end_line(p);\n\n      if (local_array->host_serialize) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"free(dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_unserialized\");\n        p = isl_printer_print_str(p, \"[i]);\");\n        p = isl_printer_end_line(p);\n      }\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n    }\n    else\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"free(dev_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \");\");\n      p = isl_printer_end_line(p);\n\n      if (local_array->host_serialize) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"free(dev_\");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"_unserialized\");\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n\n  //  for (int i = 0; i < prog->n_array; i++) {\n  //    struct autosa_array_info *array = &prog->array[i];\n  //    if (!autosa_array_requires_device_allocation(&prog->array[i]))\n  //      continue;\n  //\n  //    p = isl_printer_start_line(p);\n  //    p = 
isl_printer_print_str(p, \"free(dev_\");\n  //    p = isl_printer_print_str(p, array->name);\n  //    p = isl_printer_print_str(p, \");\");\n  //    p = isl_printer_end_line(p);\n  //  }\n\n  return p;\n}\n\n/* Print code for clearing the device after execution of the transformed code.\n * In particular, free the memory that was allocated on the device.\n */\nstatic __isl_give isl_printer *clear_device_xilinx(__isl_take isl_printer *p,\n                                                   struct autosa_prog *prog, \n                                                   struct autosa_kernel *kernel, \n                                                   int hls,\n                                                   struct autosa_hw_top_module *top)\n{\n  if (!hls)\n  {\n    /* Profiling results */\n    p = print_str_new_line(p, \"q.finish();\");\n    p = print_str_new_line(p, \"auto host_end = std::chrono::high_resolution_clock::now();\");\n    p = isl_printer_end_line(p);\n    p = print_str_new_line(p, \"// Calculate time\");\n    p = print_str_new_line(p, \"std::chrono::duration<double> fpga_duration = fpga_end - fpga_begin;\");\n    p = print_str_new_line(p, \"std::cout << \\\"FPGA Time: \\\" << fpga_duration.count() / 10 << \\\" s\\\" << std::endl;\");\n    p = print_str_new_line(p, \"std::chrono::duration<double> host_duration = host_end - host_begin;\");\n    p = print_str_new_line(p, \"std::cout << \\\"Host Time: \\\" << host_duration.count() << \\\" s\\\" << std::endl;\");\n    p = isl_printer_end_line(p);\n  }\n\n  /* Deserialize the buffer data if necessary. 
*/\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->serialize_tree && !module->in) {\n      struct autosa_array_ref_group *group = module->io_groups[0];\n      struct autosa_local_array_info *local_array = group->local_array;\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"host_deserialize_\");\n      p = isl_printer_print_str(p, local_array->array->name);\n      p = isl_printer_print_str(p, \"(\");      \n      p = print_host_serialize_arguments(p, top->kernel, group, module, 0, 0);  // TODO: add hbm support later.\n      p = isl_printer_print_str(p, \");\");      \n      p = isl_printer_end_line(p);\n    }\n  }\n\n  if (hls)\n  {\n    /* Restore buffer */\n    p = print_str_new_line(p, \"// Restore data from host buffers\");\n    for (int i = 0; i < prog->n_array; i++)\n    {\n      struct autosa_array_info *array = &prog->array[i];\n      if (!autosa_array_requires_device_allocation(array))\n        continue;\n\n      if (array->copy_out)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"memcpy(\");\n        p = isl_printer_print_str(p, array->name);\n        p = isl_printer_print_str(p, \", dev_\");\n        p = isl_printer_print_str(p, array->name);\n        if (array->local_array->host_serialize) {\n          p = isl_printer_print_str(p, \"_unserialized\");\n        }\n        if (array->local_array->n_mem_ports > 1)\n        {\n          p = isl_printer_print_str(p, \"[0]\");\n        }\n        p = isl_printer_print_str(p, \", \");\n        p = autosa_array_info_print_size(p, array);\n        p = isl_printer_print_str(p, \");\");\n        p = isl_printer_end_line(p);\n      }\n    }\n    p = isl_printer_end_line(p);\n    p = autosa_free_cpu_arrays_xilinx(p, prog, kernel);\n  }\n  else\n  {\n    /* Restore buffer */\n    p = print_str_new_line(p, \"// Restore data from host buffers\");\n    for (int i = 0; i < 
prog->n_array; i++)\n    {\n      struct autosa_array_info *array = &prog->array[i];\n      if (!autosa_array_requires_device_allocation(array))\n        continue;\n\n      if (array->copy_out)\n      {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"std::copy(dev_\");\n        p = isl_printer_print_str(p, array->name);\n        if (array->local_array->host_serialize) {\n          p = isl_printer_print_str(p, \"_unserialized\");\n        }\n        if (array->local_array->n_mem_ports > 1)\n        {\n          p = isl_printer_print_str(p, \"[0]\");\n        }\n        p = isl_printer_print_str(p, \".begin(), dev_\");\n        p = isl_printer_print_str(p, array->name);\n        if (array->local_array->host_serialize) {\n          p = isl_printer_print_str(p, \"_unserialized\");\n        }\n        if (array->local_array->n_mem_ports > 1)\n        {\n          p = isl_printer_print_str(p, \"[0]\");\n        }\n        p = isl_printer_print_str(p, \".end(), reinterpret_cast<\");\n        p = isl_printer_print_str(p, array->type);\n        p = isl_printer_print_str(p, \" *>(\");\n        p = isl_printer_print_str(p, array->name);\n        p = isl_printer_print_str(p, \"));\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *drain_merge_xilinx(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_drain_merge_func *func,\n    int hls)\n{\n  struct autosa_array_ref_group *group = func->group;\n  p = print_str_new_line(p, \"// Merge results\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"for (int idx = \");\n  p = isl_printer_print_int(p, group->mem_port_id);\n  p = isl_printer_print_str(p, \"; idx < \");\n  p = isl_printer_print_int(p, group->mem_port_id + group->n_mem_ports);\n  p = isl_printer_print_str(p, \"; idx++) {\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, 2);\n  p = isl_printer_start_line(p);\n  p = 
autosa_array_ref_group_print_prefix(group, p);\n  p = isl_printer_print_str(p, \"_drain_merge(\");\n  p = print_drain_merge_arguments(p, func->kernel, group, func, 0, hls);\n  p = isl_printer_print_str(p, \");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_end_line(p);\n  return p;\n}\n\n/* Print code to \"p\" for copying \"array\" from the host to the device\n * in its entirety.  The bounds on the extent of \"array\" have\n * been precomputed in extract_array_info and are used in\n * autosa_array_info_print_size.\n */\nstatic __isl_give isl_printer *copy_array_to_device_xilinx(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_array_info *array, int hls)\n{\n  int indent;\n  if (!hls)\n  {\n    struct autosa_local_array_info *local_array = array->local_array;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, local_array->n_mem_ports);\n    p = isl_printer_print_str(p, \"; i++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    p = print_str_new_line(p, \"OCL_CHECK(err,\");\n    indent = strlen(\"OCL_CHECK(\");\n    p = isl_printer_indent(p, indent);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"err = q.enqueueMigrateMemObjects({buffer_\");\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"[i]\");\n    p = isl_printer_print_str(p, \"}, 0));\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, -indent);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n    p = isl_printer_end_line(p);\n  }\n  else\n  {\n    struct autosa_local_array_info *local_array = array->local_array;\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, local_array->n_mem_ports);\n    p = 
isl_printer_print_str(p, \"; i++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"memcpy(buffer_\");\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"[i], dev_\");\n    p = isl_printer_print_str(p, array->name);\n    if (local_array->n_mem_ports > 1)\n    {\n      p = isl_printer_print_str(p, \"[i]\");\n    }\n    p = isl_printer_print_str(p, \", \");\n    if (local_array->host_serialize) {\n      p = autosa_array_info_print_serialize_size(p, array);\n    } else {\n      p = autosa_array_info_print_size(p, array);\n    }\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n\n    if (prog->scop->options->autosa->axi_stream) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int j = 0; j < \");\n      if (!local_array->host_serialize) {\n        printf(\"[AutoSA] Error: Can't generate AXI Stream interface for array: %s without serialization\\n\", array->name);\n        exit(1);\n      }\n      p = autosa_array_info_print_serialize_data_size(p, array);\n      p = isl_printer_print_str(p, \" / \");\n      p = isl_printer_print_int(p, array->n_lane);\n      p = isl_printer_print_str(p, \"; j++) {\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, 2);\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fifo_\");\n      p = isl_printer_print_str(p, array->name);\n      p = isl_printer_print_str(p, \".write(buffer_\");\n      p = isl_printer_print_str(p, array->name);\n      p = isl_printer_print_str(p, \"[i][j]);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n    }\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n    p = isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\n/* Print code to \"p\" for copying \"array\" back from the 
device to the host\n * in its entirety.  The bounds on the extent of \"array\" have\n * been precomputed in extract_array_info and are used in\n * autosa_array_info_print_size.\n */\nstatic __isl_give isl_printer *copy_array_from_device_xilinx(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_array_info *array, int hls)\n{\n  struct autosa_local_array_info *local_array;\n  int indent;\n\n  local_array = array->local_array;\n  if (!hls)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, local_array->n_io_group_refs);\n    p = isl_printer_print_str(p, \"; i++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    p = print_str_new_line(p, \"OCL_CHECK(err,\");\n    indent = strlen(\"OCL_CHECK(\");\n    p = isl_printer_indent(p, indent);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"err = q.enqueueMigrateMemObjects({buffer_\");\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"[i]\");\n    p = isl_printer_print_str(p, \"}, CL_MIGRATE_MEM_OBJECT_HOST));\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, -indent);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n  else\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n    p = isl_printer_print_int(p, local_array->n_mem_ports);\n    p = isl_printer_print_str(p, \"; i++) {\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n\n    if (prog->scop->options->autosa->axi_stream) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"for (int j = 0; j < \");\n      if (!local_array->host_serialize) {\n        printf(\"[AutoSA] Error: Can't generate AXI Stream interface for array: %s without serialization\\n\", array->name);\n        exit(1);\n      }\n      p = 
autosa_array_info_print_serialize_data_size(p, array);\n      p = isl_printer_print_str(p, \" / \");\n      p = isl_printer_print_int(p, array->n_lane);\n      p = isl_printer_print_str(p, \"; j++) {\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, 2);\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"buffer_\");\n      p = isl_printer_print_str(p, array->name);\n      p = isl_printer_print_str(p, \"[i][j] = fifo_\");\n      p = isl_printer_print_str(p, array->name);\n      p = isl_printer_print_str(p, \".read();\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      p = print_str_new_line(p, \"}\");\n    }\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"memcpy(dev_\");\n    p = isl_printer_print_str(p, array->name);\n    if (local_array->n_mem_ports > 1)\n    {\n      p = isl_printer_print_str(p, \"[i]\");\n    }\n    p = isl_printer_print_str(p, \", buffer_\");\n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"[i], \");\n    if (local_array->host_serialize) {\n      p = autosa_array_info_print_serialize_size(p, array);\n    } else {\n      p = autosa_array_info_print_size(p, array);\n    }\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n    p = isl_printer_end_line(p);\n    //    p = isl_printer_start_line(p);\n    //    p = isl_printer_print_str(p, \"memcpy(\");\n    //    p = isl_printer_print_str(p, array->name);\n    //    p = isl_printer_print_str(p, \", dev_\");\n    //    p = isl_printer_print_str(p, array->name);\n    //    p = isl_printer_print_str(p, \", \");\n    //    p = autosa_array_info_print_data_size(p, array);\n    //    p = isl_printer_print_str(p, \" * sizeof(\");\n    //    p = isl_printer_print_str(p, array->type);\n    //    p = isl_printer_print_str(p, \"));\");\n    //    p = 
isl_printer_end_line(p);\n  }\n\n  return p;\n}\n\n/* Print a statement for copying an array to or from the device,\n * for initializing or clearing the device, or for merging drained results.\n * The statement identifier of a copying node is called\n * \"to_device_<array name>\" or \"from_device_<array name>\" and\n * its user pointer points to the autosa_array_info of the array\n * that needs to be copied.\n * The node for initializing the device is called \"init_device\".\n * The node for clearing the device is called \"clear_device\".\n * The node for merging drained results is called \"drain_merge\" and\n * its user pointer points to an autosa_drain_merge_func.\n *\n * Extract the kernel, drain-merge function or array (if any) from the\n * identifier and call init_device_xilinx, clear_device_xilinx,\n * drain_merge_xilinx, copy_array_to_device_xilinx or\n * copy_array_from_device_xilinx.\n */\nstatic __isl_give isl_printer *print_device_node_xilinx(__isl_take isl_printer *p,\n                                                        __isl_keep isl_ast_node *node, \n                                                        struct autosa_prog *prog, \n                                                        int hls,\n                                                        struct autosa_hw_top_module *top)\n{\n  isl_ast_expr *expr, *arg;\n  isl_id *id;\n  const char *name;\n  struct autosa_array_info *array = NULL;\n  struct autosa_kernel *kernel = NULL;\n  struct autosa_drain_merge_func *func = NULL;\n\n  expr = isl_ast_node_user_get_expr(node);\n  arg = isl_ast_expr_get_op_arg(expr, 0);\n  id = isl_ast_expr_get_id(arg);\n  name = isl_id_get_name(id);\n  if (!name)\n  {\n    /* Check for a missing name before calling strcmp() on it. 
*/\n    isl_id_free(id);\n    isl_ast_expr_free(arg);\n    isl_ast_expr_free(expr);\n    return isl_printer_free(p);\n  }\n  if (!strcmp(name, \"init_device\") || !strcmp(name, \"clear_device\"))\n    kernel = (struct autosa_kernel *)isl_id_get_user(id);\n  else if (!strcmp(name, \"drain_merge\"))\n    func = (struct autosa_drain_merge_func *)isl_id_get_user(id);\n  else\n    array = (struct autosa_array_info *)isl_id_get_user(id);\n  isl_id_free(id);\n  isl_ast_expr_free(arg);\n  isl_ast_expr_free(expr);\n\n  if (!strcmp(name, \"init_device\"))\n    return init_device_xilinx(p, prog, kernel, hls, top);\n  if (!strcmp(name, \"clear_device\"))\n    return 
clear_device_xilinx(p, prog, kernel, hls, top);\n  if (!strcmp(name, \"drain_merge\"))\n    return drain_merge_xilinx(p, prog, func, hls);\n  if (!array)\n    return isl_printer_free(p);\n\n  if (!prefixcmp(name, \"to_device\"))\n    return copy_array_to_device_xilinx(p, prog, array, hls);\n  else\n    return copy_array_from_device_xilinx(p, prog, array, hls);\n\n  return p;\n}\n\n/* Set kernel arguments:\n * - arrays\n * - parameters\n * - host iterators\n */\nstatic __isl_give isl_printer *print_set_kernel_arguments_xilinx(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_kernel *kernel)\n{\n  int n_arg = 0, n;\n  unsigned nparam;\n  isl_space *space;\n  const char *type;\n\n  /* array */\n  /*   for (int i = 0; i < prog->n_array; ++i) {\n    int required;\n\n    required = autosa_kernel_requires_array_argument(kernel, i);\n    if (required < 0)\n      return isl_printer_free(p);\n    if (!required)\n      continue;\n\n    struct autosa_array_info *array = &prog->array[i];\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"OCL_CHECK(err, err = krnl.setArg(\");\n    p = isl_printer_print_int(p, n_arg);    \n    p = isl_printer_print_str(p, \", buffer_\");    \n    p = isl_printer_print_str(p, array->name);\n    p = isl_printer_print_str(p, \"));\");\n    p = isl_printer_end_line(p);\n    n_arg++;\n  } */\n  for (int i = 0; i < kernel->n_array; ++i)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (autosa_kernel_requires_array_argument(kernel, i))\n    {\n      if (autosa_array_is_scalar(local_array->array))\n      {\n        /* Scalar */\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"OCL_CHECK(err, err = krnl.setArg(\");\n        p = isl_printer_print_int(p, n_arg);\n        p = isl_printer_print_str(p, \", \");\n        p = isl_printer_print_str(p, local_array->array->name);\n        p = isl_printer_print_str(p, \"));\");\n        p = 
isl_printer_end_line(p);\n        n_arg++;\n      }\n      else\n      {\n        for (int j = 0; j < local_array->n_io_group_refs; j++)\n        {\n          //auto ref_port_map = local_array->group_ref_mem_port_map.at(j);\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"OCL_CHECK(err, err = krnl.setArg(\");\n          p = isl_printer_print_int(p, n_arg);\n          p = isl_printer_print_str(p, \", buffer_\");\n          p = isl_printer_print_str(p, local_array->array->name);\n          p = isl_printer_print_str(p, \"[\");          \n          //p = isl_printer_print_int(p, ref_port_map.second);          \n          p = isl_printer_print_int(p, local_array->group_ref_mem_port_map.at(j * 2 + 1));\n          p = isl_printer_print_str(p, \"]));\");\n          p = isl_printer_end_line(p);\n          n_arg++;\n        }\n      }\n    }\n  }\n\n  /* param */\n  space = isl_union_set_get_space(kernel->arrays);\n  nparam = isl_space_dim(space, isl_dim_param);\n  for (int i = 0; i < nparam; ++i)\n  {\n    const char *name;\n    name = isl_space_get_dim_name(space, isl_dim_param, i);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"OCL_CHECK(err, err = krnl.setArg(\");\n    p = isl_printer_print_int(p, n_arg);\n    p = isl_printer_print_str(p, \", \");\n    p = isl_printer_print_str(p, name);\n    p = isl_printer_print_str(p, \"));\");\n    p = isl_printer_end_line(p);\n    n_arg++;\n  }\n  isl_space_free(space);\n\n  /* host iterator */\n  n = isl_space_dim(kernel->space, isl_dim_set);\n  type = isl_options_get_ast_iterator_type(prog->ctx);\n  for (int i = 0; i < n; ++i)\n  {\n    const char *name;\n    name = isl_space_get_dim_name(kernel->space, isl_dim_set, i);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"OCL_CHECK(err, err = krnl.setArg(\");\n    p = isl_printer_print_int(p, n_arg);\n    p = isl_printer_print_str(p, \", \");\n    p = isl_printer_print_str(p, name);\n    p = 
isl_printer_print_str(p, \"));\");\n    p = isl_printer_end_line(p);\n    n_arg++;\n  }\n\n  return p;\n}\n\n/* Print the header of the given kernel to hls->kernel_h.\n * For the OpenCL host (hls->hls == 0), the declaration is wrapped\n * in an extern \"C\" block.\n * If hls->hcl is set, also print the kernel declaration to hls->hcl_decl.\n */\nstatic void print_kernel_headers_xilinx(struct autosa_prog *prog,\n                                        struct autosa_kernel *kernel, struct hls_info *hls)\n{\n  isl_printer *p;\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  if (!hls->hls)\n  {\n    p = print_str_new_line(p, \"extern \\\"C\\\" {\");\n  }\n  p = print_kernel_header(p, prog, kernel, hls, 1);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  if (!hls->hls)\n  {\n    p = print_str_new_line(p, \"}\");\n  }\n\n  isl_printer_free(p);\n\n  if (hls->hcl) {\n    /* Print the kernel declaration to a separate file. */\n    p = isl_printer_to_file(prog->ctx, hls->hcl_decl);\n    p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n    p = print_kernel_header(p, prog, kernel, hls, 0);\n    p = isl_printer_end_line(p);\n    isl_printer_free(p);\n  }\n}\n\n/* Print the user statement of the host code to \"p\".\n *\n * The host code may contain original user statements, kernel launches,\n * statements that copy data to/from the device, statements\n * that initialize or clear the device and statements that merge\n * drained results.\n * The original user statements and the kernel launches have\n * an associated annotation, while the other statements do not.\n * The latter are handled by print_device_node_xilinx.\n * The annotation on the user statements is called \"user\".\n *\n * In case of a kernel launch, print a block of statements that\n * either sets the kernel arguments and enqueues the kernel (OpenCL host)\n * or calls the top-level kernel function directly (HLS host), and\n * then print 
the kernel declaration with print_kernel_headers_xilinx.\n */\nstatic __isl_give isl_printer *print_host_user_xilinx(__isl_take isl_printer *p,\n                                                      __isl_take isl_ast_print_options *print_options,\n                                                      __isl_keep 
isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  int is_user;\n  struct autosa_kernel *kernel;\n  struct autosa_kernel_stmt *stmt;\n  struct print_host_user_data *data;\n  struct hls_info *hls;\n  struct autosa_hw_top_module *top;\n\n  isl_ast_print_options_free(print_options);\n\n  data = (struct print_host_user_data *)user;\n  hls = data->hls;\n  top = data->top;\n\n  id = isl_ast_node_get_annotation(node);\n  if (!id)\n  {\n    return print_device_node_xilinx(p, node, data->prog, hls->hls, top);\n  }\n\n  is_user = !strcmp(isl_id_get_name(id), \"user\");\n  kernel = is_user ? NULL : (struct autosa_kernel *)isl_id_get_user(id);\n  stmt = is_user ? (struct autosa_kernel_stmt *)isl_id_get_user(id) : NULL;\n  isl_id_free(id);\n\n  if (is_user)\n    return autosa_kernel_print_domain(p, stmt);\n\n  if (!hls->hls)\n  {\n    /* Print OpenCL host. */\n    p = ppcg_start_block(p);\n\n    p = print_set_kernel_arguments_xilinx(p, data->prog, kernel);\n    p = print_str_new_line(p, \"q.finish();\");\n    p = isl_printer_end_line(p);\n\n    p = print_str_new_line(p, \"// Warm up\");\n    p = print_str_new_line(p, \"OCL_CHECK(err, err = q.enqueueTask(krnl));\");\n    p = print_str_new_line(p, \"q.finish();\");\n    p = isl_printer_end_line(p);\n\n    p = print_str_new_line(p, \"fpga_begin = std::chrono::high_resolution_clock::now();\");\n    p = isl_printer_end_line(p);\n\n    p = print_str_new_line(p, \"// Launch the kernel\");\n    p = print_str_new_line(p, \"for (int i = 0; i < 10; i++)\");\n    p = print_str_new_line(p, \"  OCL_CHECK(err, err = q.enqueueTask(krnl));\");\n    p = isl_printer_end_line(p);\n    p = print_str_new_line(p, \"q.finish();\");\n    p = print_str_new_line(p, \"fpga_end = std::chrono::high_resolution_clock::now();\");\n\n    p = ppcg_end_block(p);\n    p = isl_printer_end_line(p);\n  }\n  else\n  {\n    /* Print HLS host. 
*/\n    p = ppcg_start_block(p);\n\n    p = print_str_new_line(p, \"// Launch the kernel\");\n    p = isl_printer_start_line(p);\n    if (data->prog->scop->options->autosa->hcl) {\n      p = isl_printer_print_str(p, \"autosa_func\");\n    } else {\n      p = isl_printer_print_str(p, \"kernel0\");\n    }\n    p = isl_printer_print_str(p, \"(\");\n    p = print_kernel_arguments(p, data->prog, kernel, 0, hls);\n    p = isl_printer_print_str(p, \");\");\n    p = isl_printer_end_line(p);\n\n    p = ppcg_end_block(p);\n  }\n  /* Print the top kernel header. */\n  print_kernel_headers_xilinx(data->prog, kernel, data->hls);\n\n  return p;\n}\n\n/* Print the header of the given module.\n */\nstatic __isl_give isl_printer *print_module_header_xilinx(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    int inter, int boundary)\n{\n  int n = isl_id_list_n_id(module->inst_ids);\n  int first = 1;\n\n  if (n > 0 && prog->scop->options->autosa->use_cplusplus_template) {\n    /* Print the index template */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"template<\");\n    for (int i = 0; i < n; i++) {\n      if (!first)\n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"int p\");\n      p = isl_printer_print_int(p, i);\n      first = 0;\n    }\n    p = isl_printer_print_str(p, \">\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"void \");\n  p = isl_printer_print_str(p, module->name);\n  if (inter == 0)\n    p = isl_printer_print_str(p, \"_intra_trans\");\n  else if (inter == 1)\n    p = isl_printer_print_str(p, \"_inter_trans\");\n  if (boundary)\n    p = isl_printer_print_str(p, \"_boundary\");\n  p = isl_printer_print_str(p, \"(\");\n  p = print_module_arguments(p, prog, module->kernel, module, 1, XILINX_HW, inter, -1, boundary, 0);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\n/* Print 
the header of the given module to both hls->kernel_h\n * and hls->kernel_c.\n * If \"inter\" is -1, this is a normal module call.\n * If \"inter\" is 0, this is an intra_trans module call.\n * If \"inter\" is 1, this is an inter_trans module call.\n */\nstatic isl_stat print_module_headers_xilinx(\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    struct hls_info *hls, int inter, int boundary)\n{\n  isl_printer *p;\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_module_header_xilinx(p, prog, module, inter, boundary);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_module_header_xilinx(p, prog, module, inter, boundary);\n  //p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\n/* Print out variable declarations on Xilinx platforms.\n * The local variable can be mapped to different memory resources:\n * FF, LUTRAM, BRAM, URAM.\n */\nstatic __isl_give isl_printer *print_module_var_xilinx(\n    __isl_take isl_printer *p,\n    struct autosa_kernel_var *var, int double_buffer,\n    struct autosa_hw_module *module)\n{\n  int j;\n  int use_memory = 0; // 0: FF 1: LUTRAM 2: BRAM 3: URAM\n  use_memory = extract_memory_type(module, var, module->options->autosa->uram);\n\n  p = isl_printer_start_line(p);\n  if (var->array->local_array->is_sparse && module->type != PE_MODULE) {\n    p = isl_printer_print_str(p, var->array->name);\n    p = isl_printer_print_str(p, \"_s_t\");\n    p = isl_printer_print_int(p, var->n_lane);\n  } else {\n    //if (var->n_lane == 1)\n    //  p = isl_printer_print_str(p, var->array->type);\n    //else {\n      p = isl_printer_print_str(p, var->array->name);    \n      p = isl_printer_print_str(p, \"_t\");\n      p = isl_printer_print_int(p, 
var->n_lane);\n    //}\n  }\n  p = isl_printer_print_str(p, \" \");\n  p = isl_printer_print_str(p, var->name);\n  if (double_buffer)\n    p = isl_printer_print_str(p, \"_ping\");\n  for (j = 0; j < isl_vec_size(var->size); ++j)\n  {\n    isl_val *v;\n\n    p = isl_printer_print_str(p, \"[\");\n    v = isl_vec_get_element_val(var->size, j);\n    p = isl_printer_print_val(p, v);\n    isl_val_free(v);\n    p = isl_printer_print_str(p, \"]\");\n  }\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  if (use_memory && var->n_part != 1)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=\");\n    p = isl_printer_print_str(p, var->name);\n    if (double_buffer)\n      p = isl_printer_print_str(p, \"_ping\");\n    p = isl_printer_print_str(p, \" dim=\");\n    p = isl_printer_print_int(p, isl_vec_size(var->size));\n    p = isl_printer_print_str(p, \" factor=\");\n    p = isl_printer_print_int(p, var->n_part);\n    p = isl_printer_print_str(p, \" cyclic\");\n    p = isl_printer_end_line(p);\n  } else if (use_memory == 0) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=\");\n    p = isl_printer_print_str(p, var->name);\n    if (double_buffer)\n      p = isl_printer_print_str(p, \"_ping\");\n    p = isl_printer_print_str(p, \" dim=0 complete\");\n    p = isl_printer_end_line(p);\n  }\n\n  if (use_memory)\n  {\n    //if (double_buffer)\n    //{\n    //  p = isl_printer_start_line(p);\n    //  p = isl_printer_print_str(p, \"#pragma HLS ARRAY_MAP variable=\");\n    //  p = isl_printer_print_str(p, var->name);\n    //  p = isl_printer_print_str(p, \"_ping instance=\");\n    //  p = isl_printer_print_str(p, var->name);\n    //  p = isl_printer_print_str(p, \" horizontal\");\n    //  p = isl_printer_end_line(p);\n    //}\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"#pragma HLS RESOURCE variable=\");\n    p = 
isl_printer_print_str(p, var->name);\n    if (double_buffer)\n      p = isl_printer_print_str(p, \"_ping\");\n    if (module->type == IO_MODULE && module->data_pack_inter == module->data_pack_intra)\n      p = isl_printer_print_str(p, use_memory == 1 ? \" core=RAM_1P_LUTRAM\" : (use_memory == 2 ? \" core=RAM_1P_BRAM\" : \" core=RAM_1P_URAM\"));\n    else\n      p = isl_printer_print_str(p, use_memory == 1 ? \" core=RAM_2P_LUTRAM\" : (use_memory == 2 ? \" core=RAM_2P_BRAM\" : \" core=RAM_2P_URAM\"));\n    p = isl_printer_end_line(p);\n\n    if (var->array->local_array->is_sparse) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"#pragma HLS DATA_PACK variable=\");\n      p = isl_printer_print_str(p, var->name);\n      if (double_buffer)\n        p = isl_printer_print_str(p, \"_ping\");\n      p = isl_printer_end_line(p);  \n    }\n  }\n\n  /* Print pong buffer */\n  if (double_buffer)\n  {\n    p = isl_printer_start_line(p);\n    if (var->array->local_array->is_sparse) {\n      p = isl_printer_print_str(p, var->array->name);\n      p = isl_printer_print_str(p, \"_s_t\");      \n      p = isl_printer_print_int(p, var->n_lane);      \n    } else {\n      if (var->n_lane == 1)\n        p = isl_printer_print_str(p, var->array->type);\n      else {\n        p = isl_printer_print_str(p, var->array->name);        \n        p = isl_printer_print_str(p, \"_t\");\n        p = isl_printer_print_int(p, var->n_lane);\n      }\n    }\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, var->name);\n    if (double_buffer)\n      p = isl_printer_print_str(p, \"_pong\");\n    for (j = 0; j < isl_vec_size(var->size); ++j)\n    {\n      isl_val *v;\n\n      p = isl_printer_print_str(p, \"[\");\n      v = isl_vec_get_element_val(var->size, j);\n      p = isl_printer_print_val(p, v);\n      isl_val_free(v);\n      p = isl_printer_print_str(p, \"]\");\n    }\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n    
if (use_memory && var->n_part != 1)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=\");\n      p = isl_printer_print_str(p, var->name);\n      if (double_buffer)\n        p = isl_printer_print_str(p, \"_pong\");\n      p = isl_printer_print_str(p, \" dim=\");\n      p = isl_printer_print_int(p, isl_vec_size(var->size));\n      p = isl_printer_print_str(p, \" factor=\");\n      p = isl_printer_print_int(p, var->n_part);\n      p = isl_printer_print_str(p, \" cyclic\");\n      p = isl_printer_end_line(p);\n    } else if (use_memory == 0) {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"#pragma HLS ARRAY_PARTITION variable=\");\n      p = isl_printer_print_str(p, var->name);\n      if (double_buffer)\n        p = isl_printer_print_str(p, \"_pong\");\n      p = isl_printer_print_str(p, \" dim=0 complete\");\n      p = isl_printer_end_line(p);\n    }\n\n    if (use_memory)\n    {\n      //p = isl_printer_start_line(p);\n      //p = isl_printer_print_str(p, \"#pragma HLS ARRAY_MAP variable=\");\n      //p = isl_printer_print_str(p, var->name);\n      //p = isl_printer_print_str(p, \"_pong instance=\");\n      //p = isl_printer_print_str(p, var->name);\n      //p = isl_printer_print_str(p, \" horizontal\");\n      //p = isl_printer_end_line(p);\n\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"#pragma HLS RESOURCE variable=\");\n      p = isl_printer_print_str(p, var->name);\n      p = isl_printer_print_str(p, \"_pong\");\n      if (module->type == IO_MODULE && module->data_pack_inter == module->data_pack_intra)\n        p = isl_printer_print_str(p, use_memory == 1 ? \" core=RAM_1P_LUTRAM\" : (use_memory == 2 ? \" core=RAM_1P_BRAM\" : \" core=RAM_1P_URAM\"));\n      else\n        p = isl_printer_print_str(p, use_memory == 1 ? \" core=RAM_2P_LUTRAM\" : (use_memory == 2 ? 
\" core=RAM_2P_BRAM\" : \" core=RAM_2P_URAM\"));\n      //p = isl_printer_print_str(p, use_memory == 1 ? \" core=RAM_2P_LUTRAM\" : (use_memory == 2 ? \" core=RAM_2P_BRAM\" : \" core=RAM_2P_URAM\"));\n      p = isl_printer_end_line(p);\n\n      if (var->array->local_array->is_sparse) {\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"#pragma HLS DATA_PACK variable=\");\n        p = isl_printer_print_str(p, var->name);\n        p = isl_printer_print_str(p, \"_pong\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_module_vars_xilinx(__isl_take isl_printer *p,\n                                                        struct autosa_hw_module *module, int inter)\n{\n  int i, n;\n  isl_space *space;\n  const char *type;\n\n  if (inter == -1)\n  {\n    for (i = 0; i < module->n_var; ++i)\n      p = print_module_var_xilinx(p, &module->var[i], module->double_buffer, module);\n  }\n\n  if (module->double_buffer && inter == -1)\n  {\n    type = isl_options_get_ast_iterator_type(module->kernel->ctx);\n\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"bool arb = 0;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, module->in ? \"bool inter_trans_en = 1;\" : \"bool inter_trans_en = 0;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, module->in ? \"bool intra_trans_en = 0;\" : \"bool intra_trans_en = 1;\");\n    p = isl_printer_end_line(p);\n    /* iterators */\n    space = (module->in) ? 
module->intra_space : module->inter_space;\n    n = isl_space_dim(space, isl_dim_set);\n    for (int i = 0; i < n; i++)\n    {\n      const char *name;\n      name = isl_space_get_dim_name(space, isl_dim_set, i);\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, type);\n      p = isl_printer_print_str(p, \" \");\n      p = isl_printer_print_str(p, name);\n      p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, name);\n      p = isl_printer_print_str(p, \"_prev\");\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  return p;\n}\n\n//static __isl_give isl_printer *print_module_stmt(__isl_take isl_printer *p,\n//                                                 __isl_take isl_ast_print_options *print_options,\n//                                                 __isl_keep isl_ast_node *node, void *user)\n//{\n//  isl_id *id;\n//  struct autosa_kernel_stmt *stmt;\n//  struct print_hw_module_data *hw_data = (struct print_hw_module_data *)(user);\n//  struct autosa_hw_module *module = hw_data->module;\n//\n//  id = isl_ast_node_get_annotation(node);\n//  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n//  isl_id_free(id);\n//\n//  isl_ast_print_options_free(print_options);\n//\n//  switch (stmt->type)\n//  {\n//    //    case POLYSA_KERNEL_STMT_COPY:\n//    //      return autosa_kernel_print_copy(p, stmt);\n//    //    case POLYSA_KERNEL_STMT_SYNC:\n//    //      return print_sync(p, stmt);\n//  case AUTOSA_KERNEL_STMT_DOMAIN:\n//    return autosa_kernel_print_domain(p, stmt);\n//  case AUTOSA_KERNEL_STMT_IO:\n//    return autosa_kernel_print_io(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_TRANSFER:\n//    return autosa_kernel_print_io_transfer(p, stmt, hw_data->hls, \n//              module->options->autosa->double_buffer_style == 0?\n//                hw_data->iterator_prefix : NULL);\n//  case AUTOSA_KERNEL_STMT_IO_DRAM:\n//    return 
autosa_kernel_print_io_dram(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_TRANS:\n//    return autosa_kernel_print_inter_trans(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_TRANS:\n//    return autosa_kernel_print_intra_trans(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTER_INTRA:\n//    return autosa_kernel_print_inter_intra(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_INTRA_INTER:\n//    return autosa_kernel_print_intra_inter(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_IO_MODULE_CALL_STATE_HANDLE:\n//    return autosa_kernel_print_state_handle(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_DRAIN_MERGE:\n//    return autosa_kernel_print_drain_merge(p, stmt, hw_data->hls);\n//  case AUTOSA_KERNEL_STMT_HOST_SERIALIZE:\n//    return autosa_kernel_print_host_serialize(p, stmt, hw_data->hls);\n//  }\n//\n//  return p;\n//}\n\nstatic __isl_give isl_printer *print_for_with_pipeline(\n    __isl_keep isl_ast_node *node, __isl_take isl_printer *p,\n    __isl_take isl_ast_print_options *print_options)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"#pragma HLS PIPELINE II=1\");\n  p = isl_printer_end_line(p);\n\n  p = isl_ast_node_for_print(node, p, print_options);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_for_with_unroll(\n    __isl_keep isl_ast_node *node, __isl_take isl_printer *p,\n    __isl_take isl_ast_print_options *print_options)\n{\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"#pragma HLS UNROLL\");\n  p = isl_printer_end_line(p);\n\n  p = isl_ast_node_for_print(node, p, print_options);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_for_xilinx(__isl_take isl_printer *p,\n                                                __isl_take isl_ast_print_options *print_options,\n                                                __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  
int pipeline;\n  int unroll;\n\n  pipeline = 0;\n  unroll = 0;\n  id = isl_ast_node_get_annotation(node);\n\n  if (id)\n  {\n    struct autosa_ast_node_userinfo *info;\n\n    info = (struct autosa_ast_node_userinfo *)isl_id_get_user(id);\n    if (info && info->is_pipeline)\n      pipeline = 1;\n    if (info && info->is_unroll)\n      unroll = 1;\n  }\n\n  if (pipeline)\n    p = print_for_with_pipeline(node, p, print_options);\n  else if (unroll)\n    p = print_for_with_unroll(node, p, print_options);\n  else\n    p = isl_ast_node_for_print(node, p, print_options);\n\n  isl_id_free(id);\n\n  return p;\n}\n\n///* This function simply skips all for loops to print. */\n//static __isl_give isl_printer *print_for_skip(__isl_take isl_printer *p,\n//                                              __isl_take isl_ast_print_options *print_options,\n//                                              __isl_keep isl_ast_node *node, void *user)\n//{\n//  return p;\n//}\n\n/* Print the intra_trans module.\n */\nstatic __isl_give isl_printer *autosa_print_intra_trans_module(\n    __isl_take isl_printer *p,\n    struct autosa_hw_module *module, struct autosa_prog *prog,\n    struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  if (!module->intra_tree)\n    return p;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  print_module_headers_xilinx(prog, module, hls, 0, boundary);\n  fprintf(hls->kernel_c, \" {\\n\");\n  if (hls->target == XILINX_HW) {\n    /* If double buffering is disabled, the module is inlined to reduce the\n     * overheads.\n     * Double-buffered modules are not inlined, since inlining them might\n     * cause deadlocks.\n     */\n    //printf(\"intra trans module name: %s %d\\n\", module->name, module->use_FF);\n    if (module->double_buffer)\n      
fprintf(hls->kernel_c, \"#pragma HLS INLINE OFF\\n\");\n    else   \n      fprintf(hls->kernel_c, \"#pragma HLS INLINE\\n\");\n  }\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);\n  }\n  p = print_module_vars_xilinx(p, module, 0);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  if (module->double_buffer)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"if (!intra_trans_en) return;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_end_line(p);\n  }\n  /* For local reduce, print the buffer initialization. */\n  for (int i = 0; i < module->n_var; i++) {\n    if (module->var[i].init_required) {\n      p = autosa_print_var_initialization(p, &module->var[i], hls->target);\n    }\n  }\n  p = isl_printer_end_line(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  if (hls->target == XILINX_HW)\n  {\n    print_options = isl_ast_print_options_set_print_for(print_options,\n                                                        &print_for_xilinx, &hw_data);\n  }\n\n  p = isl_ast_node_print(module->intra_tree, p, print_options);\n  p = isl_printer_indent(p, -2);\n\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the inter_trans module.\n */\nstatic __isl_give isl_printer *autosa_print_inter_trans_module(\n    __isl_take isl_printer *p,\n    struct autosa_hw_module *module, struct autosa_prog *prog,\n    struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = 
{hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  if (boundary) {\n    if (!module->boundary_inter_tree)\n      return p;\n  } else {\n    if (!module->inter_tree)\n      return p;\n  }  \n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  if (hls->target == XILINX_HW)\n    print_module_headers_xilinx(prog, module, hls, 1, boundary);\n  fprintf(hls->kernel_c, \" {\\n\");\n  if (hls->target == XILINX_HW) {\n    if (module->double_buffer)\n      fprintf(hls->kernel_c, \"#pragma HLS INLINE OFF\\n\");\n    else\n      fprintf(hls->kernel_c, \"#pragma HLS INLINE\\n\");\n  }\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);\n  }\n  if (hls->target == XILINX_HW)\n    p = print_module_vars_xilinx(p, module, 1);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  if (module->double_buffer)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"if (!inter_trans_en) return;\");\n    p = isl_printer_end_line(p);\n    p = isl_printer_end_line(p);\n  }\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  if (hls->target == XILINX_HW)\n  {\n    print_options = isl_ast_print_options_set_print_for(print_options,\n                                                        &print_for_xilinx, &hw_data);\n  }\n\n  p = isl_ast_node_print((boundary == 0) ? 
module->inter_tree : module->boundary_inter_tree, p, print_options);\n  p = isl_printer_indent(p, -2);\n\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n///* Print the drained data merge functions. \n// */\n//static isl_stat print_drain_merge_funcs(\n//    struct autosa_kernel *kernel,\n//    struct autosa_drain_merge_func **funcs, int n_funcs,\n//    struct hls_info *hls)\n//{\n//  isl_printer *p;\n//  isl_ctx *ctx;\n//\n//  if (n_funcs == 0)\n//    return isl_stat_ok;\n//\n//  ctx = kernel->ctx;\n//  if (!hls->hls)\n//    p = isl_printer_to_file(kernel->ctx, hls->host_h);\n//  else\n//    p = isl_printer_to_file(kernel->ctx, hls->kernel_h);\n//  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n//  for (int i = 0; i < n_funcs; i++)\n//  {\n//    struct autosa_array_ref_group *group = funcs[i]->group;\n//    isl_ast_print_options *print_options;\n//    struct print_hw_module_data hw_data = {hls, NULL, NULL, NULL};\n//\n//    p = print_str_new_line(p, \"/* Helper Function */\");\n//    p = isl_printer_start_line(p);\n//    if (hls->hls)\n//      p = isl_printer_print_str(p, \"inline \");\n//    p = isl_printer_print_str(p, \"void \");\n//    p = autosa_array_ref_group_print_prefix(group, p);\n//    p = isl_printer_print_str(p, \"_drain_merge(\");\n//    p = print_drain_merge_arguments(p, kernel, group, funcs[i], 1, hls->hls);\n//    p = isl_printer_print_str(p, \"){\");\n//    p = isl_printer_end_line(p);\n//    p = isl_printer_indent(p, 2);\n//\n//    p = print_str_new_line(p, \"/* Variable Declaration */\");\n//    if (!hls->hls)\n//      print_func_iterators(hls->host_h, funcs[i]);\n//    else\n//      print_func_iterators(hls->kernel_h, funcs[i]);\n//    p = print_str_new_line(p, \"/* Variable Declaration */\");\n//    p = isl_printer_end_line(p);\n//\n//    print_options = 
isl_ast_print_options_alloc(ctx);\n//    print_options = isl_ast_print_options_set_print_user(print_options,\n//                                                         &print_module_stmt, &hw_data);\n//    p = isl_ast_node_print(funcs[i]->device_tree, p, print_options);\n//\n//    p = isl_printer_indent(p, -2);\n//    p = print_str_new_line(p, \"}\");\n//    p = print_str_new_line(p, \"/* Helper Function */\");\n//  }\n//  p = isl_printer_end_line(p);\n//  isl_printer_free(p);\n//\n//  return isl_stat_ok;\n//}\n\nstatic __isl_give isl_printer *print_module_core_header_xilinx(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    int inter, int boundary, int serialize, int types)\n{\n  int n = isl_id_list_n_id(module->inst_ids);\n  if (types && n > 0 && prog->scop->options->autosa->use_cplusplus_template) {\n    /* Print the template */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"template<\");\n    for (int i = 0; i < n; i++) {\n      if (i > 0)\n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"int p\");\n      p = isl_printer_print_int(p, i);\n    }\n    p = isl_printer_print_str(p, \">\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_start_line(p);  \n  if (types)\n    p = isl_printer_print_str(p, \"void \");\n  p = isl_printer_print_str(p, module->name);\n  if (inter == 0)\n    p = isl_printer_print_str(p, \"_intra_trans\");\n  else if (inter == 1)\n    p = isl_printer_print_str(p, \"_inter_trans\");\n  if (boundary)\n    p = isl_printer_print_str(p, \"_boundary\");\n  if (serialize)\n    p = isl_printer_print_str(p, \"_serialize\");\n  if (!types && n > 0 && prog->scop->options->autosa->use_cplusplus_template) {\n    p = isl_printer_print_str(p, \"<\");\n    for (int i = 0; i < n; i++) {\n      if (i > 0)\n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"p\");\n      p = isl_printer_print_int(p, i);\n    
}\n    p = isl_printer_print_str(p, \">\");\n  }\n  p = isl_printer_print_str(p, \"(\");\n  if (!types) {\n    p = isl_printer_end_line(p);\n    p = isl_printer_indent(p, 2);\n    p = isl_printer_start_line(p);  \n  }\n  p = print_module_arguments(p, prog, module->kernel, module, types,\n                             XILINX_HW, inter, -1, boundary, serialize);                             \n  p = isl_printer_print_str(p, \")\");\n  if (!types) {\n    p = isl_printer_indent(p, -2);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_module_core_headers_xilinx(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_hw_module *module, struct hls_info *hls,\n    int inter, int boundary, int serialize, int types)\n{\n  p = print_module_core_header_xilinx(p, prog, module, inter, boundary, serialize, types);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_module_wrapper_header_xilinx(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    int inter, int boundary)\n{\n  int n = isl_id_list_n_id(module->inst_ids);\n  if (n > 0 && prog->scop->options->autosa->use_cplusplus_template) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"template<\");\n    for (int i = 0; i < n; i++) {\n      if (i > 0)\n        p = isl_printer_print_str(p, \", \");\n      p = isl_printer_print_str(p, \"int p\");\n      p = isl_printer_print_int(p, i);        \n    }\n    p = isl_printer_print_str(p, \">\");\n    p = isl_printer_end_line(p);\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"void \");\n  p = isl_printer_print_str(p, module->name);\n  if (inter == 0)\n    p = isl_printer_print_str(p, \"_intra_trans\");\n  else if (inter == 1)\n    p = isl_printer_print_str(p, \"_inter_trans\");\n  if (boundary)\n    p = isl_printer_print_str(p, \"_boundary\");\n  p = isl_printer_print_str(p, \"_wrapper\");\n  p = isl_printer_print_str(p, \"(\");\n  p = 
print_module_arguments(p, prog, module->kernel, module, 1,\n                             XILINX_HW, inter, -1, boundary, 0);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\nstatic isl_stat print_module_wrapper_headers_xilinx(\n    struct autosa_prog *prog, struct autosa_hw_module *module,\n    struct hls_info *hls, int inter, int boundary)\n{\n  isl_printer *p;\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_module_wrapper_header_xilinx(p, prog, module, inter, boundary);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_module_wrapper_header_xilinx(p, prog, module, inter, boundary);\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\n/* Print the body for a module that connects to the DRAM with serialized data. 
\n */\n//static __isl_give isl_printer *print_module_serialize_body(\n//    __isl_take isl_printer *p, struct autosa_hw_module *module)\n//{\n//  isl_pw_qpolynomial *total_bound_pwq = module->io_groups[0]->array->local_array->serialize_bound;\n//  long int total_bound = -1;  \n//  int ele_size = module->io_groups[0]->array->size; // bytes\n//  total_bound = convert_pwqpoly_to_int(total_bound_pwq);\n//  int data_pack_in = module->data_pack_serialize;\n//  int data_pack_out = module->data_pack_inter;  \n//\n//  if (data_pack_in == data_pack_out) {    \n//    if (module->in) {\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n//      p = isl_printer_print_int(p, total_bound / data_pack_out);\n//      p = isl_printer_print_str(p, \"; i++) {\");\n//      p = isl_printer_end_line(p);\n//    \n//      p = print_str_new_line(p, \"#pragma HLS PIPELINE II=1\");\n//      p = isl_printer_indent(p, 2);\n//      p = isl_printer_start_line(p);\n//      p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_in);      \n//      p = isl_printer_print_str(p, \" fifo_data;\");\n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"fifo_data = \");\n//      p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n//      p = isl_printer_print_str(p, \"[i];\");\n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_start_line(p);\n//      p = autosa_array_ref_group_print_fifo_name(module->io_groups[0], p);\n//      p = isl_printer_print_str(p, \"_local_out.write(fifo_data);\");      \n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_indent(p, -2);\n//      p = print_str_new_line(p, \"}\");\n//    } else {\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n//      p = isl_printer_print_int(p, total_bound / data_pack_out);\n//      p = 
isl_printer_print_str(p, \"; i++) {\");\n//      p = isl_printer_end_line(p);\n//    \n//      p = print_str_new_line(p, \"#pragma HLS PIPELINE II=1\");\n//      p = isl_printer_indent(p, 2);\n//      p = isl_printer_start_line(p);\n//      p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_out);      \n//      p = isl_printer_print_str(p, \" fifo_data;\");\n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"fifo_data = \");\n//      p = autosa_array_ref_group_print_fifo_name(module->io_groups[0], p);\n//      p = isl_printer_print_str(p, \"_local_in.read();\");\n//      //p = isl_printer_print_str(p, \"fifo_data = fifo_\");\n//      //p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n//      //if (module->type == DRAIN_MODULE)      \n//        //p = isl_printer_print_str(p, \"_drain\");\n//      //p = isl_printer_print_str(p, \"_local_in.read();\");\n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n//      p = isl_printer_print_str(p, \"[i] = fifo_data;\");\n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_indent(p, -2);\n//      p = print_str_new_line(p, \"}\");\n//    }\n//  } else {    \n//    if (module->in) {\n//      /* [type] fifo_data; */\n//      p = isl_printer_start_line(p);\n//      p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_out);\n//      p = isl_printer_print_str(p, \" fifo_data;\");\n//      p = isl_printer_end_line(p);\n//\n//      /* [type2] mem_data; */\n//      p = isl_printer_start_line(p);\n//      p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_in);\n//      p = isl_printer_print_str(p, \" mem_data;\");\n//      p = isl_printer_end_line(p);\n//      \n//      p = isl_printer_start_line(p);\n//      if (data_pack_out == 1) {\n//        /* 
union {unsigned int ui; [type] ut;} u; */\n//        p = isl_printer_print_str(p, \"union {unsigned int ui; \");\n//        p = isl_printer_print_str(p, module->io_groups[0]->array->type);\n//        p = isl_printer_print_str(p, \" ut;} u;\");        \n//      }\n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n//      p = isl_printer_print_int(p, total_bound / data_pack_in);\n//      p = isl_printer_print_str(p, \"; i++) {\");\n//      p = isl_printer_end_line(p);\n//    \n//      p = print_str_new_line(p, \"#pragma HLS PIPELINE II=1\");\n//      p = isl_printer_indent(p, 2);\n//\n//      /* mem_data = array[]; */\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"mem_data = \");\n//      p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n//      p = isl_printer_print_str(p, \"[i];\");\n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"for (int p = 0; p < \");\n//      p = isl_printer_print_int(p, data_pack_in / data_pack_out);\n//      p = isl_printer_print_str(p, \"; p++) {\");\n//      p = isl_printer_end_line(p);\n//      p = isl_printer_indent(p, 2);\n//\n//      /* fifo_data = mem_data(..,..); */\n//      p = isl_printer_start_line(p);\n//      if (data_pack_out == 1) {\n//        p = isl_printer_print_str(p, \"u.ui = (unsigned int)mem_data(\");\n//        p = isl_printer_print_int(p, ele_size * data_pack_out * 8 - 1);\n//        p = isl_printer_print_str(p, \", 0);\");\n//        p = isl_printer_end_line(p);\n//\n//        p = print_str_new_line(p, \"fifo_data = u.ut;\");\n//      } else {\n//        p = isl_printer_print_str(p, \"fifo_data = mem_data(\");\n//        p = isl_printer_print_int(p, ele_size * data_pack_out * 8 - 1);\n//        p = isl_printer_print_str(p, \", 0);\");\n//      }\n//      p = isl_printer_end_line(p);\n//\n//      /* 
mem_data = mem_data >> .. */\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"mem_data = mem_data >> \");\n//      p = isl_printer_print_int(p, ele_size * data_pack_out * 8);\n//      p = isl_printer_print_str(p, \";\");\n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_start_line(p);\n//      p = autosa_array_ref_group_print_fifo_name(module->io_groups[0], p);\n//      p = isl_printer_print_str(p, \"_local_out.write(fifo_data);\");\n//      //p = isl_printer_print_str(p, \"fifo_\");\n//      //p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n//      //p = isl_printer_print_str(p, \"_local_out.write(fifo_data);\");\n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_indent(p, -2);\n//      p = print_str_new_line(p, \"}\");\n//\n//      p = isl_printer_indent(p, -2);\n//      p = print_str_new_line(p, \"}\");\n//    } else {\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"for (int i = 0; i < \");\n//      p = isl_printer_print_int(p, total_bound / data_pack_in);\n//      p = isl_printer_print_str(p, \"; i++) {\");\n//      p = isl_printer_end_line(p);\n//    \n//      p = print_str_new_line(p, \"#pragma HLS PIPELINE II=1\");\n//      p = isl_printer_indent(p, 2);\n//\n//      /* [type] fifo_data; */\n//      p = isl_printer_start_line(p);\n//      p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_out);      \n//      p = isl_printer_print_str(p, \" fifo_data;\");\n//      p = isl_printer_end_line(p);      \n//\n//      /* [type2] mem_data; */\n//      p = isl_printer_start_line(p);\n//      p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_in);      \n//      p = isl_printer_print_str(p, \" mem_data;\");\n//      p = isl_printer_end_line(p);      \n//\n//      if (data_pack_out == 1) {\n//        /* union {unsigned int ui; [type] ut;} u; */\n//        p = isl_printer_start_line(p);\n//        p = 
isl_printer_print_str(p, \"union {unsigned int ui; \");\n//        p = isl_printer_print_str(p, module->io_groups[0]->array->type);\n//        p = isl_printer_print_str(p, \" ut;} u;\");        \n//        p = isl_printer_end_line(p);\n//      }\n//\n//      p = isl_printer_start_line(p);\n//      if (data_pack_out == 1) {\n//        p = isl_printer_print_str(p, \"ap_uint<\");\n//        p = isl_printer_print_int(p, module->io_groups[0]->array->size * 8);\n//        p = isl_printer_print_str(p, \">\");\n//      } else {\n//        p = autosa_print_array_type_with_lane(p, module->io_groups[0]->array, data_pack_out);\n//      }      \n//      p = isl_printer_print_str(p, \" mem_data_split[\");\n//      p = isl_printer_print_int(p, data_pack_in / data_pack_out);\n//      p = isl_printer_print_str(p, \"];\");\n//      p = isl_printer_end_line(p);\n//\n//      p = print_str_new_line(p, \"#pragma HLS ARRAY_PARTITION variable=mem_data_split complete\");\n//\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"for (int p = 0; p < \");\n//      p = isl_printer_print_int(p, data_pack_in / data_pack_out);\n//      p = isl_printer_print_str(p, \"; p++) {\");\n//      p = isl_printer_end_line(p);\n//      p = isl_printer_indent(p, 2);\n//\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"fifo_data = \");\n//      p = autosa_array_ref_group_print_fifo_name(module->io_groups[0], p);\n//      p = isl_printer_print_str(p, \"_local_in.read();\");\n//      //p = isl_printer_print_str(p, \"fifo_data = fifo_\");\n//      //p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n//      //if (module->type == DRAIN_MODULE)      \n//        //p = isl_printer_print_str(p, \"_drain\");\n//      //p = isl_printer_print_str(p, \"_local_in.read();\");\n//      p = isl_printer_end_line(p);\n//\n//      if (data_pack_out == 1) {\n//        p = print_str_new_line(p, \"u.ut = fifo_data;\");\n//\n//        p = 
isl_printer_start_line(p);\n//        p = isl_printer_print_str(p, \"mem_data_split[p] = ap_uint<\");\n//        p = isl_printer_print_int(p, module->io_groups[0]->array->size * 8);\n//        p = isl_printer_print_str(p, \">(u.ui);\");\n//        p = isl_printer_end_line(p);\n//      } else {\n//        p = print_str_new_line(p, \"mem_data_split[p] = fifo_data;\");\n//      }\n//      \n//      p = isl_printer_indent(p, -2);\n//      p = print_str_new_line(p, \"}\");\n//\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, \"mem_data = (\");\n//      for (int i = data_pack_in / data_pack_out - 1; i >= 0; i--) {\n//        if (i < data_pack_in / data_pack_out - 1)\n//          p = isl_printer_print_str(p, \", \");\n//        p = isl_printer_print_str(p, \"mem_data_split[\");\n//        p = isl_printer_print_int(p, i);\n//        p = isl_printer_print_str(p, \"]\");\n//      }\n//      p = isl_printer_print_str(p, \");\");\n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_start_line(p);\n//      p = isl_printer_print_str(p, module->io_groups[0]->array->name);\n//      p = isl_printer_print_str(p, \"[i] = mem_data;\");\n//      p = isl_printer_end_line(p);\n//\n//      p = isl_printer_indent(p, -2);\n//      p = print_str_new_line(p, \"}\");\n//    }\n//  }\n//\n//  return p;\n//}\n\n/* Print the serialization module that connects the external memory to the\n * top-level I/O module.\n */\nstatic __isl_give isl_printer *autosa_print_serialize_module(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{  \n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);  \n\n  /* Print core. 
*/\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  if (hls->target == XILINX_HW)\n    p = print_module_core_headers_xilinx(p, prog, module, hls, -1, boundary, 1, 1); // TODO\n  fprintf(hls->kernel_c, \" {\\n\");  \n  fprintf(hls->kernel_c, \"#pragma HLS INLINE OFF\\n\");  \n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);    \n  }\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  p = print_module_serialize_body(p, module, hls);\n  p = isl_printer_indent(p, -2);\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n  return p;\n}\n\n/* Print the default module.\n * For PE modules and level-1 I/O modules, we print a wrapper function to\n * speed up HLS synthesis.\n * For all other modules, the wrapper is disabled.\n */\nstatic __isl_give isl_printer *autosa_print_default_module(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  if (!boundary) {\n    if (!module->device_tree)\n      return p;\n  } else {\n    if (!module->boundary_tree)\n      return p;\n  }    \n\n  bool wrapper = 0;\n  struct print_hw_module_data hw_data = {hls, prog, module, NULL};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n  \n  /* Print wrappers for PE modules and level-1 I/O modules. */\n  if (module->type == PE_MODULE || (module->type != PE_MODULE && module->level == 1)) \n    wrapper = 1;  \n\n  /* Print core. 
*/\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  //if (hls->target == XILINX_HW)\n  p = print_module_core_headers_xilinx(p, prog, module, hls, -1, boundary, 0, 1);\n  fprintf(hls->kernel_c, \" {\\n\");\n  if (!boundary || !wrapper)\n    fprintf(hls->kernel_c, \"#pragma HLS INLINE OFF\\n\");\n  else\n    fprintf(hls->kernel_c, \"#pragma HLS INLINE\\n\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  if (!prog->scop->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);  \n  }\n  if (prog->scop->options->autosa->block_sparse) {\n    for (int i = 0; i < module->n_io_group; i++) {\n      struct autosa_array_ref_group *group = module->io_groups[i];\n      if (group->local_array->array_type == AUTOSA_EXT_ARRAY) {      \n        int n_lane = get_io_group_n_lane(module, NULL, group);\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, group->array->name);\n        if (group->local_array->is_sparse)\n          p = isl_printer_print_str(p, \"_s_t\");\n        else\n          p = isl_printer_print_str(p, \"_t\");      \n        p = isl_printer_print_int(p, n_lane);\n        p = isl_printer_print_str(p, \" fifo_data_\");\n        p = isl_printer_print_str(p, group->array->name);\n        p = isl_printer_print_str(p, \";\");\n        p = isl_printer_end_line(p);\n      }\n    }\n  }\n  p = print_module_vars_xilinx(p, module, -1);  \n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  if (module->credit && !module->in)\n  {\n    if (hls->target == XILINX_HW)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"credit.write(1);\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = 
isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  if (hls->target == XILINX_HW)\n  {    \n    print_options = isl_ast_print_options_set_print_for(print_options,\n                                                        &print_for_xilinx, &hw_data);    \n  }\n\n  if (!boundary)\n    p = isl_ast_node_print(module->device_tree, p, print_options);\n  else\n    p = isl_ast_node_print(module->boundary_tree, p, print_options);\n\n  if (module->credit && module->in)\n  {\n    if (hls->target == XILINX_HW)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"int token = credit.read();\");\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  p = isl_printer_indent(p, -2);\n\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  if (wrapper) {\n    /* Print wrapper. */\n    if (hls->target == XILINX_HW)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"/* Module Definition */\");\n      p = isl_printer_end_line(p);\n\n      print_module_wrapper_headers_xilinx(prog, module, hls, -1, boundary);\n\n      fprintf(hls->kernel_c, \" {\\n\");\n      p = isl_printer_indent(p, 2);\n\n      p = print_module_core_headers_xilinx(p, prog, module, hls, -1, boundary, 0, 0);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_indent(p, -2);\n      fprintf(hls->kernel_c, \"}\\n\");\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"/* Module Definition */\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_end_line(p);\n    }\n  }\n\n  /* If the module serialization is enabled, we will print out an extra module\n   * for serializing the data. 
*/\n  if (module->to_mem && module->options->autosa->host_serialize) {\n    p = autosa_print_serialize_module(p, module, prog, hls, boundary);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_pe_dummy_module_core_header_xilinx(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_pe_dummy_module *module, int types)\n{\n  struct autosa_array_ref_group *group = module->io_group;\n\n  p = isl_printer_start_line(p);\n  if (types)\n    p = isl_printer_print_str(p, \"void \");\n  // group_name\n  p = isl_printer_print_str(p, group->array->name);\n  if (group->group_type == AUTOSA_IO_GROUP)\n  {\n    if (group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  }\n  else if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, \"drain\");\n  }\n  p = isl_printer_print_str(p, \"_PE_dummy\");\n  p = isl_printer_print_str(p, module->in? 
\"_in\" : \"_out\");\n  p = isl_printer_print_str(p, \"(\");\n  p = print_pe_dummy_module_arguments(p, prog, module->module->kernel,\n                                      module, types, XILINX_HW);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_pe_dummy_module_core_headers_xilinx(\n    __isl_take isl_printer *p, struct autosa_prog *prog,\n    struct autosa_pe_dummy_module *module, struct hls_info *hls, int types)\n{\n  p = print_pe_dummy_module_core_header_xilinx(p, prog, module, types);\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_pe_dummy_module_wrapper_header_xilinx(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_pe_dummy_module *module)\n{\n  struct autosa_array_ref_group *group = module->io_group;\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"void \");\n  // group_name\n  p = isl_printer_print_str(p, group->array->name);\n  if (group->group_type == AUTOSA_IO_GROUP)\n  {\n    if (group->local_array->n_io_group > 1)\n    {\n      p = isl_printer_print_str(p, \"_\");\n      p = isl_printer_print_int(p, group->nr);\n    }\n  }\n  else if (group->group_type == AUTOSA_DRAIN_GROUP)\n  {\n    p = isl_printer_print_str(p, \"_\");\n    p = isl_printer_print_str(p, \"drain\");\n  }\n  p = isl_printer_print_str(p, \"_PE_dummy\");\n  p = isl_printer_print_str(p, module->in? 
\"_in\": \"_out\");\n  p = isl_printer_print_str(p, \"_wrapper\");\n  p = isl_printer_print_str(p, \"(\");\n  p = print_pe_dummy_module_arguments(p, prog, module->module->kernel,\n                                      module, 1, XILINX_HW);\n  p = isl_printer_print_str(p, \")\");\n\n  return p;\n}\n\nstatic isl_stat print_pe_dummy_module_wrapper_headers_xilinx(\n    struct autosa_prog *prog, struct autosa_pe_dummy_module *module,\n    struct hls_info *hls)\n{\n  isl_printer *p;\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_h);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_pe_dummy_module_wrapper_header_xilinx(p, prog, module);\n  p = isl_printer_print_str(p, \";\");\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  p = isl_printer_to_file(prog->ctx, hls->kernel_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p = print_pe_dummy_module_wrapper_header_xilinx(p, prog, module);\n  p = isl_printer_end_line(p);\n  isl_printer_free(p);\n\n  return isl_stat_ok;\n}\n\nstatic __isl_give isl_printer *autosa_print_default_pe_dummy_module(\n    __isl_take isl_printer *p,\n    struct autosa_pe_dummy_module *pe_dummy_module,\n    struct autosa_prog *prog, struct hls_info *hls, int boundary)\n{\n  /* For dummy module, we disable wrapper by default due to the relatively\n   * high overheads.\n   */\n  bool wrapper = 0;\n  struct autosa_hw_module *module = pe_dummy_module->module;\n  struct print_hw_module_data hw_data = {hls, prog, module};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  /* Print core. 
*/\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  if (hls->target == XILINX_HW)\n    p = print_pe_dummy_module_core_headers_xilinx(p, prog,\n                                                  pe_dummy_module, hls, 1);\n\n  fprintf(hls->kernel_c, \" {\\n\");\n  if (wrapper)\n    fprintf(hls->kernel_c, \"#pragma HLS INLINE\\n\");\n\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"/* Variable Declaration */\"); \n  if (!prog->scop->options->autosa->use_cplusplus_template) {   \n    p = print_module_iterators(p, hls->kernel_c, module);\n  }\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n\n  p = isl_printer_end_line(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  if (hls->target == XILINX_HW)\n  {\n    print_options = isl_ast_print_options_set_print_for(print_options,\n                                                        &print_for_xilinx, &hw_data);\n  }\n\n  p = isl_ast_node_print(pe_dummy_module->device_tree, p, print_options);\n\n  p = isl_printer_indent(p, -2);\n\n  fprintf(hls->kernel_c, \"}\\n\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_end_line(p);\n\n  /* Print wrapper. 
*/\n  if (wrapper) {\n    if (hls->target == XILINX_HW)\n    {\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"/* Module Definition */\");\n      p = isl_printer_end_line(p);\n  \n      print_pe_dummy_module_wrapper_headers_xilinx(prog, pe_dummy_module, hls);\n  \n      fprintf(hls->kernel_c, \" {\\n\");\n      p = isl_printer_indent(p, 2);\n      p = print_pe_dummy_module_core_headers_xilinx(p, prog, pe_dummy_module, hls, 0);\n      p = isl_printer_print_str(p, \";\");\n      p = isl_printer_end_line(p);\n      p = isl_printer_indent(p, -2);\n      fprintf(hls->kernel_c, \"}\\n\");\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"/* Module Definition */\");\n      p = isl_printer_end_line(p);\n  \n      p = isl_printer_end_line(p);\n    }\n  }\n\n  return p;\n}\n\nstruct print_db_module_while_data {\n  int inter; // -1: outer 0: intra 1: inter  \n  int under_if; \n  int reach_user;\n\n  isl_printer *p_for;\n  isl_printer *p_user;\n  /* Outer */\n  std::vector<char *> outer_for_logic;  \n  std::vector<char *> outer_iterator_name;\n  std::vector<char *> outer_iterator_lb;\n  std::vector<char *> outer_iterator_ub;\n  int outer_for_level;\n  /* Inter */\n  std::vector<char *> inter_for_logic;  \n  std::vector<char *> inter_iterator_name;\n  std::vector<char *> inter_iterator_lb;\n  std::vector<char *> inter_iterator_ub;  \n  int inter_for_level;\n  /* Intra */\n  std::vector<char *> intra_for_logic;  \n  std::vector<char *> intra_iterator_name;\n  std::vector<char *> intra_iterator_lb;\n  std::vector<char *> intra_iterator_ub;\n  int intra_for_level;\n};\n\nstatic __isl_give isl_printer *print_double_buffer_module_vars_while(\n  __isl_take isl_printer *p, struct autosa_hw_module *module, \n  struct hls_info *hls,\n  struct print_db_module_while_data *data)\n{\n  /* Inst ids */\n  if (!module->options->autosa->use_cplusplus_template) {\n    p = print_module_iterators(p, hls->kernel_c, module);\n  }\n  /* Local 
buffer */\n  for (int i = 0; i < module->n_var; i++) {\n    struct autosa_kernel_var *var = &module->var[i];\n    p = isl_printer_start_line(p);\n    if (var->n_lane == 1) \n      p = isl_printer_print_str(p, var->array->type);\n    else\n    {\n      p = isl_printer_print_str(p, var->array->name);\n      p = isl_printer_print_str(p, \"_t\");\n      p = isl_printer_print_int(p, var->n_lane);\n    }\n    p = isl_printer_print_str(p, \" \");\n    p = isl_printer_print_str(p, var->name);\n    p = isl_printer_print_str(p, \"[2]\");\n    for (int j = 0; j < isl_vec_size(var->size); j++) {\n      isl_val *v;\n\n      p = isl_printer_print_str(p, \"[\");\n      v = isl_vec_get_element_val(var->size, j);\n      p = isl_printer_print_val(p, v);\n      isl_val_free(v);\n      p = isl_printer_print_str(p, \"]\");      \n    }\n    p = isl_printer_print_str(p, \";\");\n    p = isl_printer_end_line(p);\n  }\n\n  /* State handle variables */\n  p = print_str_new_line(p, \"bool arb = 0;\");  \n  p = print_str_new_line(p, module->in? \"bool inter_trans_en = 1;\" : \"bool inter_trans_en = 0;\");\n  p = print_str_new_line(p, module->in? \"bool intra_trans_en = 0;\" : \"bool intra_trans_en = 1;\");\n  p = print_str_new_line(p, module->in? \"bool inter_done = 0;\" : \"bool inter_done = 1;\");\n  p = print_str_new_line(p, module->in? 
\"bool intra_done = 1;\" : \"bool intra_done = 0;\");\n  /* Iterators */\n  for (int i = 0; i < data->outer_iterator_name.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, data->outer_iterator_name[i]);\n    free(data->outer_iterator_name[i]);\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_str(p, data->outer_iterator_lb[i]);\n    free(data->outer_iterator_lb[i]);\n    p = isl_printer_print_str(p, \"; \");\n    p = isl_printer_print_str(p, \"/* UB: \");\n    p = isl_printer_print_str(p, data->outer_iterator_ub[i]);\n    free(data->outer_iterator_ub[i]);\n    p = isl_printer_print_str(p, \" */\");\n    p = isl_printer_end_line(p);\n  }\n  for (int i = 0; i < data->inter_iterator_name.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, data->inter_iterator_name[i]);\n    free(data->inter_iterator_name[i]);\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_str(p, data->inter_iterator_lb[i]);\n    free(data->inter_iterator_lb[i]);\n    p = isl_printer_print_str(p, \"; \");\n    p = isl_printer_print_str(p, \"/* UB: \");\n    p = isl_printer_print_str(p, data->inter_iterator_ub[i]);\n    free(data->inter_iterator_ub[i]);\n    p = isl_printer_print_str(p, \" */\");\n    p = isl_printer_end_line(p);\n  }\n  for (int i = 0; i < data->intra_iterator_name.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, data->intra_iterator_name[i]);\n    free(data->intra_iterator_name[i]);\n    p = isl_printer_print_str(p, \" = \");\n    p = isl_printer_print_str(p, data->intra_iterator_lb[i]);\n    free(data->intra_iterator_lb[i]);\n    p = isl_printer_print_str(p, \"; \");\n    p = isl_printer_print_str(p, \"/* UB: \");\n    p = isl_printer_print_str(p, data->intra_iterator_ub[i]);\n    
free(data->intra_iterator_ub[i]);\n    p = isl_printer_print_str(p, \" */\");\n    p = isl_printer_end_line(p);\n  }\n  \n  p = print_str_new_line(p, \"bool last_run = false;\");\n\n  return p;\n}\n\n/* Count the for-loop nesting level.\n */\nstatic __isl_give isl_printer *count_module_for(__isl_take isl_printer *p,\n                                                __isl_take isl_ast_print_options *print_options,\n                                                __isl_keep isl_ast_node *node, void *user)\n{\n  struct print_db_module_while_data *data = (struct print_db_module_while_data *)user;\n  isl_ast_node *body;\n\n  if (data->inter == -1)\n    data->outer_for_level++;\n  else if (data->inter == 0)\n    data->intra_for_level++;\n  else if (data->inter == 1)\n    data->inter_for_level++;\n\n  body = isl_ast_node_for_get_body(node);\n  p = isl_ast_node_print(body, p, print_options);\n  isl_ast_node_free(body);\n\n  return p;\n}\n\n/* Count the for-loop nesting level (alternative implementation).\n * Currently only used for the inter_trans module.\n * Since the AST may contain if branches, only one branch is counted;\n * we assume that both branches have the same loop depth.\n */\nstatic isl_bool count_module_for_alt(__isl_keep isl_ast_node *node, void *user) {\n  struct print_db_module_while_data *data = (struct print_db_module_while_data *)user;\n  if (isl_ast_node_get_type(node) == isl_ast_node_if) {\n    data->under_if = 1;\n  }  \n\n  if (isl_ast_node_get_type(node) == isl_ast_node_for) {\n    if (data->under_if == 0 || (data->under_if == 1 && data->reach_user == 0)) {\n      data->inter_for_level++;    \n    }\n  }\n  if (isl_ast_node_get_type(node) == isl_ast_node_user) {\n    data->reach_user = 1;\n  }\n\n  return isl_bool_true;\n}\n\n/* Extract the loop information. 
\n */\nstatic __isl_give isl_printer *extract_module_for(__isl_take isl_printer *p,\n                                                  __isl_take isl_ast_print_options *print_options,\n                                                  __isl_keep isl_ast_node *node, void *user)\n{\n  struct print_db_module_while_data *data = (struct print_db_module_while_data *)user;\n  isl_ast_expr *iterator, *init, *cond, *ub;  \n  const char *iterator_suffix;\n  isl_printer *p_local, *p_str;  \n  char *text;\n  std::vector<char *> text_lines;\n  isl_ast_node *body;\n\n//  if (data->inter == -1)\n//    iterator_suffix = \"outer_\";\n//  else if (data->inter == 0)\n//    iterator_suffix = \"intra_\";\n//  else\n//    iterator_suffix = \"inter_\";\n  p_local = data->p_for;  \n\n  /* Extract the lower bound and upper bound. */\n  iterator = isl_ast_node_for_get_iterator(node);\n  init = isl_ast_node_for_get_init(node);\n  cond = isl_ast_node_for_get_cond(node);\n  ub = isl_ast_expr_op_get_arg(cond, 1);\n\n  p_str = isl_printer_to_str(isl_ast_node_get_ctx(node));\n  p_str = isl_printer_set_output_format(p_str, ISL_FORMAT_C);\n  //p_str = isl_printer_print_str(p_str, iterator_suffix);\n  p_str = isl_printer_print_ast_expr(p_str, iterator);\n  if (data->inter == -1)\n    data->outer_iterator_name.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 0)\n    data->intra_iterator_name.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 1)\n    data->inter_iterator_name.push_back(isl_printer_get_str(p_str));\n  isl_printer_flush(p_str);\n\n  p_str = isl_printer_print_ast_expr(p_str, ub);\n  if (data->inter == -1)\n    data->outer_iterator_ub.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 0)\n    data->intra_iterator_ub.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 1)\n    data->inter_iterator_ub.push_back(isl_printer_get_str(p_str));\n  isl_printer_flush(p_str);\n\n  p_str = isl_printer_print_ast_expr(p_str, init);\n  if 
(data->inter == -1)\n    data->outer_iterator_lb.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 0)\n    data->intra_iterator_lb.push_back(isl_printer_get_str(p_str));\n  else if (data->inter == 1)\n    data->inter_iterator_lb.push_back(isl_printer_get_str(p_str));\n  isl_printer_free(p_str);\n\n  p_local = isl_printer_indent(p_local, -4);\n\n  p_local = isl_printer_start_line(p_local);  \n  //p_local = isl_printer_print_str(p_local, iterator_suffix);  \n  p_local = isl_printer_print_ast_expr(p_local, iterator);\n  p_local = isl_printer_print_str(p_local, \"++;\");\n  p_local = isl_printer_end_line(p_local);\n  text = isl_printer_get_str(p_local);\n  text_lines.push_back(text);\n  p_local = isl_printer_flush(p_local);\n\n  p_local = isl_printer_start_line(p_local);\n  p_local = isl_printer_print_str(p_local, \"if (\");\n  //p_local = isl_printer_print_str(p_local, iterator_suffix);  \n  p_local = isl_printer_print_ast_expr(p_local, iterator);\n  p_local = isl_printer_print_str(p_local, \" == \"); \n  p_local = isl_printer_print_ast_expr(p_local, ub);\n  p_local = isl_printer_print_str(p_local, \" + 1) {\"); \n  p_local = isl_printer_end_line(p_local);\n  text = isl_printer_get_str(p_local);\n  text_lines.push_back(text);\n  p_local = isl_printer_flush(p_local);\n\n  p_local = isl_printer_indent(p_local, 4);\n  p_local = isl_printer_start_line(p_local);  \n  //p_local = isl_printer_print_str(p_local, iterator_suffix);\n  p_local = isl_printer_print_ast_expr(p_local, iterator);\n  p_local = isl_printer_print_str(p_local, \" = \");\n  p_local = isl_printer_print_ast_expr(p_local, init);\n  p_local = isl_printer_print_str(p_local, \";\");\n  p_local = isl_printer_end_line(p_local);\n  text = isl_printer_get_str(p_local);\n  text_lines.push_back(text);\n  p_local = isl_printer_flush(p_local);\n\n  if (data->inter == -1)\n    data->outer_for_logic.insert(data->outer_for_logic.begin(), text_lines.begin(), text_lines.end());\n  else if (data->inter == 
0)\n    data->intra_for_logic.insert(data->intra_for_logic.begin(), text_lines.begin(), text_lines.end());\n  else if (data->inter == 1)\n    data->inter_for_logic.insert(data->inter_for_logic.begin(), text_lines.begin(), text_lines.end());\n\n  isl_ast_expr_free(iterator);\n  isl_ast_expr_free(init);\n  isl_ast_expr_free(cond);\n  isl_ast_expr_free(ub);\n\n  p_local = isl_printer_indent(p_local, -4);\n\n  body = isl_ast_node_for_get_body(node);\n  p = isl_ast_node_print(body, p, print_options);\n  isl_ast_node_free(body);\n\n  return p;\n}    \n\nstatic void extract_double_buffer_module_while_data(\n  struct autosa_hw_module *module, int boundary, \n  struct print_db_module_while_data *data)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = module->kernel->ctx;\n  isl_printer *p_for, *p_user, *p;\n  const char *for_logic, *user_logic;\n\n  /* Outer module */\n  data->inter = -1;\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p_for = isl_printer_to_str(ctx);\n  p_for = isl_printer_set_output_format(p_for, ISL_FORMAT_C);\n  p_user = isl_printer_to_str(ctx);\n  p_user = isl_printer_set_output_format(p_user, ISL_FORMAT_C);\n  data->p_for = p_for;\n  data->p_user = p_user;\n  data->outer_for_level = 0;\n\n  /* Count the for level first. */\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &count_module_for, data);\n  if (!boundary)\n    p = isl_ast_node_print(module->device_tree, p, print_options);\n  else\n    p = isl_ast_node_print(module->boundary_tree, p, print_options);  \n\n  /* Extract the for and user logic. 
*/\n  data->p_for = isl_printer_indent(data->p_for, 4 * data->outer_for_level);\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &extract_module_for, data);\n  if (!boundary)\n    p = isl_ast_node_print(module->device_tree, p, print_options);\n  else\n    p = isl_ast_node_print(module->boundary_tree, p, print_options);\n  isl_printer_free(p);  \n  isl_printer_free(data->p_for);\n  isl_printer_free(data->p_user);\n\n  /* Intra module */\n  data->inter = 0;\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p_for = isl_printer_to_str(ctx);\n  p_for = isl_printer_set_output_format(p_for, ISL_FORMAT_C);\n  p_user = isl_printer_to_str(ctx);\n  p_user = isl_printer_set_output_format(p_user, ISL_FORMAT_C);\n  data->p_for = p_for;\n  data->p_user = p_user;\n  data->intra_for_level = 0;\n\n  /* Count the for level first. */\n  print_options = isl_ast_print_options_alloc(ctx);  \n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &count_module_for, data);\n  p = isl_ast_node_print(module->intra_tree, p, print_options);  \n\n  /* Extract the for logic. 
*/\n  data->p_for = isl_printer_indent(data->p_for, 4 * data->intra_for_level);\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &extract_module_for, data);  \n  p = isl_ast_node_print(module->intra_tree, p, print_options);  \n  isl_printer_free(p);  \n  isl_printer_free(data->p_for);\n  isl_printer_free(data->p_user);\n\n  /* Inter module */\n  data->inter = 1;\n  data->under_if = 0;\n  data->reach_user = 0;\n  p = isl_printer_to_str(ctx);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n  p_for = isl_printer_to_str(ctx);\n  p_for = isl_printer_set_output_format(p_for, ISL_FORMAT_C);\n  p_user = isl_printer_to_str(ctx);\n  p_user = isl_printer_set_output_format(p_user, ISL_FORMAT_C);\n  data->p_for = p_for;\n  data->p_user = p_user;\n  data->inter_for_level = 0;\n\n  /* Count the for level first. */\n  if (!boundary) {\n    isl_ast_node_foreach_descendant_top_down(module->inter_tree, &count_module_for_alt, data);\n  } else {        \n    isl_ast_node_foreach_descendant_top_down(module->boundary_inter_tree, &count_module_for_alt, data);\n  }  \n\n  /* Extract the for logic. 
*/\n  data->p_for = isl_printer_indent(data->p_for, 4 * data->inter_for_level);\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &extract_module_for, data);\n  if (!boundary)\n    p = isl_ast_node_print(module->inter_tree, p, print_options);\n  else\n    p = isl_ast_node_print(module->boundary_inter_tree, p, print_options);\n  isl_printer_free(p);  \n  isl_printer_free(data->p_for);\n  isl_printer_free(data->p_user);\n}\n\nstatic __isl_give isl_printer *print_null_for(__isl_take isl_printer *p,\n                                              __isl_take isl_ast_print_options *print_options,\n                                              __isl_keep isl_ast_node *node, void *user)\n{\n  isl_ast_node *body;\n  \n  body = isl_ast_node_for_get_body(node);\n  p = isl_ast_node_print(body, p, print_options);\n  isl_ast_node_free(body);\n\n  return p;\n}    \n\n/* Print the inter_trans module in double buffer mode. \n */\nstatic __isl_give isl_printer *autosa_print_inter_trans_module_double_buffer(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, \"inter_c\"};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_null_for, &hw_data);\n\n  p = isl_ast_node_print((boundary == 0) ? 
module->inter_tree : module->boundary_inter_tree, p, print_options);\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the intra_trans module in double buffer mode. \n */\nstatic __isl_give isl_printer *autosa_print_intra_trans_module_double_buffer(\n  __isl_take isl_printer *p,\n  struct autosa_hw_module *module, struct autosa_prog *prog,\n  struct hls_info *hls, int boundary)\n{\n  struct print_hw_module_data hw_data = {hls, prog, module, \"intra_c\"};\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_printer_get_ctx(p);\n\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_module_stmt, &hw_data);\n  print_options = isl_ast_print_options_set_print_for(print_options,\n                                                      &print_null_for, &hw_data);\n\n  p = isl_ast_node_print(module->intra_tree, p, print_options);\n  p = isl_printer_end_line(p);\n\n  return p;\n}\n\n/* Print the double buffer module using while loops instead of for loops.\n * First, the local buffer is changed to\n * local_buffer[2][...][...].\n * \n * Specifically, when handling a code structure:\n * [outer for loops]\n * for ...\n *   for ...\n * [outer for loops]\n * { \n *   if (arb) {\n *     ld(local_buffer_ping, ld_en);\n *     st(local_buffer_pong, st_en);\n *   } else {\n *     ld(local_buffer_pong, ld_en);\n *     st(local_buffer_ping, st_en);\n *   }\n *   [state handle logic]\n *   arb = !arb;\n *   [state handle logic]\n * }\n * [last batch]\n * if (arb) {\n *   st(local_buffer_pong, st_en);\n * } else {\n *   st(local_buffer_ping, st_en);\n * }\n * [last batch]\n * we convert it to the following code structure:\n * while (1) {\n *   if (ld_en) {\n *     [inlined logic]\n *     ld(local_buffer[arb][...]);\n *     [inlined logic]\n *   } \n *   if (st_en) {\n *     [inlined logic]\n *     st(local_buffer[!arb][...]);\n *     [inlined 
logic]\n *   }\n *   [state handle logic]\n *   arb = !arb;\n *   ld_en = 1;\n *   st_en = 1;\n *   [state handle logic]\n *   [outer for loops]\n *   outer_iter0++;\n *   if (outer_iter0 == ...) {\n *     outer_iter0 = 0;\n *     [last batch]\n *     ld_en = 0;\n *     [last batch]\n *   }\n *   [outer for loops]\n * }\n * \n * Note that this only works if each for loop structure is a perfectly \n * nested loop so that we could convert to a while loop.\n */\nstatic __isl_give isl_printer *print_double_buffer_module_while(\n  __isl_take isl_printer *p, struct autosa_hw_module *module,\n  struct autosa_prog *prog, struct hls_info *hls, int boundary)\n{\n  if (!boundary) {\n    if (!module->device_tree)\n      return p;    \n  } else {\n    if (!module->boundary_tree)\n      return p;\n  }\n\n  struct print_db_module_while_data print_data;\n\n  /* Extract the code snippets. */\n  extract_double_buffer_module_while_data(module, boundary, &print_data);\n\n  /* Print header */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  print_module_headers_xilinx(prog, module, hls, -1, boundary);\n  p = print_str_new_line(p, \"{\");\n  p = isl_printer_indent(p, 2);\n\n  /* Print variables */\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = print_double_buffer_module_vars_while(p, module, hls, &print_data);\n  p = print_str_new_line(p, \"/* Variable Declaration */\");\n  p = isl_printer_end_line(p);\n\n  /* Print content */\n  p = print_str_new_line(p, \"while (1) {\");\n  p = print_str_new_line(p, \"#pragma HLS PIPELINE II=1\");\n  p = isl_printer_indent(p, 2);\n  \n  /* Print inter_trans */\n  p = print_str_new_line(p, \"if (inter_trans_en) {\");\n  p = isl_printer_indent(p, 2);\n  /* Print the module logic */\n  p = autosa_print_inter_trans_module_double_buffer(p, module, prog, hls, boundary);\n  /* Print the loop counter */  \n  for (int i = 0; i < 
print_data.inter_for_logic.size(); i++) {    \n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, print_data.inter_for_logic[i]);\n    free(print_data.inter_for_logic[i]);\n  }\n  p = isl_printer_indent(p, 4 * print_data.inter_for_level);\n  p = print_str_new_line(p, \"inter_done = 1;\");\n  p = print_str_new_line(p, \"inter_trans_en = 0;\");\n  for (int i = 0; i < print_data.inter_for_level; i++) {\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n  \n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  /* Print intra_trans */\n  p = print_str_new_line(p, \"if (intra_trans_en) {\");\n  p = isl_printer_indent(p, 2);\n  /* Print the module logic */\n  p = autosa_print_intra_trans_module_double_buffer(p, module, prog, hls, boundary);\n  /* Print the loop counter */\n  for (int i = 0; i < print_data.intra_for_logic.size(); i++) {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, print_data.intra_for_logic[i]);\n    free(print_data.intra_for_logic[i]);\n  }\n  p = isl_printer_indent(p, 4 * print_data.intra_for_level);\n  p = print_str_new_line(p, \"intra_done = 1;\");\n  p = print_str_new_line(p, \"intra_trans_en = 0;\");\n  for (int i = 0; i < print_data.intra_for_level; i++) {\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  /* Print state_handle */\n  p = print_str_new_line(p, \"if (inter_done && intra_done) {\");\n  p = isl_printer_indent(p, 2);\n  p = print_str_new_line(p, \"if (last_run) break;\");\n  p = print_str_new_line(p, \"intra_trans_en = 1;\");\n  p = print_str_new_line(p, \"inter_trans_en = 1;\");\n  p = print_str_new_line(p, \"intra_done = 0;\");\n  p = print_str_new_line(p, \"inter_done = 0;\");\n  p = print_str_new_line(p, \"arb = !arb;\");\n  /* Print the loop counter */\n  for (int i = 0; i < print_data.outer_for_logic.size(); i++) {\n    p = 
isl_printer_start_line(p);\n    p = isl_printer_print_str(p, print_data.outer_for_logic[i]);\n    free(print_data.outer_for_logic[i]);\n  }\n  p = isl_printer_indent(p, 4 * print_data.outer_for_level);\n  p = print_str_new_line(p, module->in? \"inter_trans_en = 0;\" : \"intra_trans_en = 0;\");\n  p = print_str_new_line(p, module->in? \"inter_done = 1;\" : \"intra_done = 1;\");\n  p = print_str_new_line(p, \"last_run = true;\");\n  for (int i = 0; i < print_data.outer_for_level; i++) {\n    p = isl_printer_indent(p, -2);\n    p = print_str_new_line(p, \"}\");\n  }\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n\n  p = isl_printer_indent(p, -2);\n  p = print_str_new_line(p, \"}\");\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"/* Module Definition */\");\n  p = isl_printer_end_line(p);\n\n  /* If the module serialization is enabled, we will print out an extra module\n   * for serializing the data. 
*/\n  if (module->to_mem && module->options->autosa->host_serialize) {\n    p = autosa_print_serialize_module(p, module, prog, hls, boundary);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *autosa_print_host_code(__isl_take isl_printer *p,\n                                                      struct autosa_prog *prog, __isl_keep isl_ast_node *tree,\n                                                      struct autosa_hw_module **modules, int n_modules,\n                                                      struct autosa_hw_top_module *top,\n                                                      struct autosa_drain_merge_func **drain_merge_funcs, int n_drain_merge_funcs,\n                                                      struct hls_info *hls)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_ast_node_get_ctx(tree);\n  struct print_host_user_data data = {hls, prog, top};\n  struct print_hw_module_data hw_data = {hls, prog, NULL};\n  isl_printer *p_module;\n\n  /* Print the data pack types in the program. */\n  print_data_types_xilinx(top, hls);\n\n  /* Print the macros for sparse data structure */\n  if (prog->scop->options->autosa->block_sparse) {\n    print_sparse_macros(top->kernel, hls);\n  }\n\n  /* Print the helper functions in the program. */\n  print_drain_merge_funcs(top->kernel, drain_merge_funcs, n_drain_merge_funcs, hls);\n\n  /* Print the host data serialization function. */\n  print_host_serialize_funcs(top->kernel, modules, n_modules, hls); // TODO\n\n  /* Print the default AST. */\n  print_options = isl_ast_print_options_alloc(ctx);\n  print_options = isl_ast_print_options_set_print_user(print_options,\n                                                       &print_host_user_xilinx, &data);\n\n  /* Print the macros definitions in the program. */\n  p = autosa_print_macros(p, tree);\n  p = isl_ast_node_print(tree, p, print_options);\n\n  /* Print the hw module ASTs. 
*/\n  p_module = isl_printer_to_file(ctx, hls->kernel_c);\n  p_module = isl_printer_set_output_format(p_module, ISL_FORMAT_C);\n\n  for (int i = 0; i < n_modules; i++)\n  {\n    if (modules[i]->double_buffer && modules[i]->options->autosa->double_buffer_style == 0) \n    {\n      p_module = print_double_buffer_module_while(p_module, modules[i], prog, hls, 0);\n      if (modules[i]->boundary) {\n        p_module = print_double_buffer_module_while(p_module, modules[i], prog, hls, 1);\n      }\n    } else {\n      if (modules[i]->is_filter && modules[i]->is_buffer)\n      {\n        /* Print out the definitions for inter_trans and intra_trans function calls. */\n        /* Intra transfer function */\n        p_module = autosa_print_intra_trans_module(p_module, modules[i], prog, hls, 0);\n  \n        /* Inter transfer function */\n        p_module = autosa_print_inter_trans_module(p_module, modules[i], prog, hls, 0);\n        if (modules[i]->boundary)\n          p_module = autosa_print_inter_trans_module(p_module, modules[i], prog, hls, 1);\n      }\n\n      p_module = autosa_print_default_module(p_module, modules[i], prog, hls, 0);\n  \n      if (modules[i]->boundary)\n      {\n        /* Print out the definitions for boundary trans function calls. */\n        p_module = autosa_print_default_module(p_module, modules[i], prog, hls, 1);\n      }\n\n      if (modules[i]->n_pe_dummy_modules > 0)\n      {\n        /* Print out the definitions for pe dummy function calls. */\n        for (int j = 0; j < modules[i]->n_pe_dummy_modules; j++)\n        {\n          p_module = autosa_print_default_pe_dummy_module(\n              p_module, modules[i]->pe_dummy_modules[j], prog, hls, 0);\n        }\n      }\n    }\n  }\n  isl_printer_free(p_module);\n\n  return p;\n}\n\n/* Declare the AXI interface for each global pointer. 
\n */\nstatic __isl_give isl_printer *print_top_module_interface_xilinx(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_kernel *kernel)\n{\n  int n;\n  unsigned nparam;\n  isl_space *space;\n  const char *type;\n\n  for (int i = 0; i < kernel->n_array; ++i)\n  {\n    struct autosa_local_array_info *local_array = &kernel->array[i];\n    if (autosa_kernel_requires_array_argument(kernel, i) && !autosa_array_is_scalar(local_array->array))\n    {\n      if (local_array->n_io_group_refs > 1)\n      {\n        for (int j = 0; j < local_array->n_io_group_refs; j++)\n        {\n          p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n          p = isl_printer_start_line(p);\n          if (prog->scop->options->autosa->axi_stream) {\n            p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS INTERFACE axis port=fifo_\");\n            p = isl_printer_print_str(p, local_array->array->name);\n            p = isl_printer_print_str(p, \"_\");\n            p = isl_printer_print_int(p, j);\n            p = isl_printer_print_str(p, \" bundle=gmem_\");\n            p = isl_printer_print_str(p, local_array->array->name);\n            p = isl_printer_print_str(p, \"_\");\n            p = isl_printer_print_int(p, j);\n            p = isl_printer_print_str(p, \"\\\");\");            \n          } else {\n            p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS INTERFACE m_axi port=\");\n            p = isl_printer_print_str(p, local_array->array->name);\n            p = isl_printer_print_str(p, \"_\");\n            p = isl_printer_print_int(p, j);\n            p = isl_printer_print_str(p, \" offset=slave bundle=gmem_\");\n            p = isl_printer_print_str(p, local_array->array->name);\n            p = isl_printer_print_str(p, \"_\");\n            p = isl_printer_print_int(p, j);\n            p = isl_printer_print_str(p, \"\\\");\");\n          }\n          p = 
isl_printer_end_line(p);          \n          p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n        }\n      }\n      else\n      {\n        p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n        p = isl_printer_start_line(p);\n        if (prog->scop->options->autosa->axi_stream) {\n          p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS INTERFACE axis port=fifo_\");\n          p = isl_printer_print_str(p, local_array->array->name);\n          p = isl_printer_print_str(p, \" bundle=gmem_\");\n          p = isl_printer_print_str(p, local_array->array->name);\n          p = isl_printer_print_str(p, \"\\\");\");\n        } else {\n          p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS INTERFACE m_axi port=\");\n          p = isl_printer_print_str(p, local_array->array->name);\n          p = isl_printer_print_str(p, \" offset=slave bundle=gmem_\");\n          p = isl_printer_print_str(p, local_array->array->name);\n          p = isl_printer_print_str(p, \"\\\");\");          \n        }\n        p = isl_printer_end_line(p);\n        p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n      }\n    }\n  }\n\n  if (!prog->scop->options->autosa->axi_stream) {\n    for (int i = 0; i < kernel->n_array; ++i)\n    {\n      struct autosa_local_array_info *local_array = &kernel->array[i];\n      if (autosa_kernel_requires_array_argument(kernel, i))\n      {\n        if (local_array->n_io_group_refs > 1)\n        {\n          for (int j = 0; j < local_array->n_io_group_refs; j++)\n          {\n            p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n            p = isl_printer_start_line(p);\n            p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS INTERFACE s_axilite port=\");\n            p = isl_printer_print_str(p, local_array->array->name);\n            p = isl_printer_print_str(p, \"_\");\n            p = 
isl_printer_print_int(p, j);\n            p = isl_printer_print_str(p, \" bundle=control\\\");\");\n            p = isl_printer_end_line(p);\n            p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n          }\n        }\n        else\n        {\n          p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n          p = isl_printer_start_line(p);\n          p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS INTERFACE s_axilite port=\");\n          p = isl_printer_print_str(p, local_array->array->name);\n          p = isl_printer_print_str(p, \" bundle=control\\\");\");\n          p = isl_printer_end_line(p);\n          p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n        }\n      }\n    }\n  }\n\n  space = isl_union_set_get_space(kernel->arrays);\n  nparam = isl_space_dim(space, isl_dim_param);\n  for (int i = 0; i < nparam; i++)\n  {\n    const char *name;\n    name = isl_space_get_dim_name(space, isl_dim_param, i);\n    p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS INTERFACE s_axilite port=\");\n    p = isl_printer_print_str(p, name);\n    p = isl_printer_print_str(p, \" bundle=control\\\");\");\n    p = isl_printer_end_line(p);\n    p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  }\n  isl_space_free(space);\n\n  n = isl_space_dim(kernel->space, isl_dim_set);\n  type = isl_options_get_ast_iterator_type(prog->ctx);\n  for (int i = 0; i < n; i++)\n  {\n    const char *name;\n    name = isl_space_get_dim_name(kernel->space, isl_dim_set, i);\n    p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS INTERFACE s_axilite port=\");\n    p = isl_printer_print_str(p, name);\n    p = isl_printer_print_str(p, \" 
bundle=control\\\");\");\n    p = isl_printer_end_line(p);\n    p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  }\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS INTERFACE s_axilite port=return bundle=control\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_top_module_headers_xilinx(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  struct autosa_kernel *kernel = top->kernel;\n\n  if (!hls->hls)\n  {\n    p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n    p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"extern \\\\\\\"C\\\\\\\" {\\\");\");\n    p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  }\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n\n  p = isl_printer_start_line(p);\n  if (prog->scop->options->autosa->hcl) {\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"void autosa_func\");\n  } else {\n    p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"void kernel\");\n    //p = isl_printer_print_int(p, top->kernel->id);\n    p = isl_printer_print_int(p, 0);\n  }\n  p = isl_printer_print_str(p, \"(\");\n  p = print_kernel_arguments(p, prog, top->kernel, 1, hls);\n  p = isl_printer_print_str(p, \")\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"{\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  /* Print out the interface pragmas. 
*/\n  if (!prog->scop->options->autosa->hcl) {\n    p = print_top_module_interface_xilinx(p, prog, kernel);\n    p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  }\n\n  /* Print out the dataflow pragma. */  \n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS DATAFLOW\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n  return p;\n}\n\n/* Return the FIFO name, i.e., the part of fifo_decl_name that precedes\n * the first '.' (or the whole string if there is no '.').\n */\nstatic char *extract_fifo_name_from_fifo_decl_name(isl_ctx *ctx, char *fifo_decl_name)\n{\n  int loc = 0;\n  char ch;\n  isl_printer *p_str = isl_printer_to_str(ctx);\n  char *name = NULL;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    if (ch == '.')\n      break;\n    char buf[2];\n    buf[0] = ch;\n    buf[1] = '\\0';\n    p_str = isl_printer_print_str(p_str, buf);\n    loc++;\n  }\n\n  name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return name;\n}\n\n/* Return the FIFO width, i.e., the part of fifo_decl_name that follows\n * the first '.' (or an empty string if there is no '.').\n */\nstatic char *extract_fifo_width_from_fifo_decl_name(isl_ctx *ctx, char *fifo_decl_name)\n{\n  int loc = 0;\n  char ch;\n  isl_printer *p_str = isl_printer_to_str(ctx);\n  char *name = NULL;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    if (ch == '.')\n      break;\n    loc++;\n  }\n\n  /* Skip the '.' separator only if one was found, so that we never\n   * read past the terminating NUL character. */\n  if (fifo_decl_name[loc] == '.')\n    loc++;\n\n  while ((ch = fifo_decl_name[loc]) != '\\0')\n  {\n    char buf[2];\n    buf[0] = ch;\n    buf[1] = '\\0';\n    p_str = isl_printer_print_str(p_str, buf);\n    loc++;\n  }\n\n  name = isl_printer_get_str(p_str);\n  isl_printer_free(p_str);\n\n  return name;\n}\n\nstatic __isl_give isl_printer *print_top_module_fifo_stmt(__isl_take isl_printer *p,\n                                                          __isl_take isl_ast_print_options *print_options,\n                                                          __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *data = (struct 
print_hw_module_data *)(user);\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n  case AUTOSA_KERNEL_STMT_FIFO_DECL:\n    return autosa_kernel_print_fifo_decl(p, stmt, data->prog, data->hls);\n  }\n\n  return p;\n}\n\nstatic __isl_give isl_printer *print_top_module_call_stmt(\n  __isl_take isl_printer *p,\n  __isl_take isl_ast_print_options *print_options,\n  __isl_keep isl_ast_node *node, void *user)\n{\n  isl_id *id;\n  struct autosa_kernel_stmt *stmt;\n  struct print_hw_module_data *data = (struct print_hw_module_data *)(user);\n\n  id = isl_ast_node_get_annotation(node);\n  stmt = (struct autosa_kernel_stmt *)isl_id_get_user(id);\n  isl_id_free(id);\n\n  isl_ast_print_options_free(print_options);\n\n  switch (stmt->type)\n  {\n  case AUTOSA_KERNEL_STMT_MODULE_CALL:\n    return autosa_kernel_print_module_call(p, stmt, data->prog, data->hls->target);\n  }\n\n  return p;\n}\n\n/* This function prints the code that prints out the top function that \n * calls the hardware modules and declares the fifos.\n */\nstatic void print_top_gen_host_code(\n    struct autosa_prog *prog, __isl_keep isl_ast_node *node,\n    struct autosa_hw_top_module *top, struct hls_info *hls)\n{\n  isl_ast_print_options *print_options;\n  isl_ctx *ctx = isl_ast_node_get_ctx(node);\n  isl_printer *p;\n  int fifo_depth = prog->scop->options->autosa->fifo_depth;\n  struct print_hw_module_data hw_data = {hls, prog, NULL};\n\n  /* Print the top module ASTs. 
*/\n  p = isl_printer_to_file(ctx, hls->top_gen_c);\n  p = isl_printer_set_output_format(p, ISL_FORMAT_C);\n\n  print_top_gen_headers(prog, top, hls);\n  fprintf(hls->top_gen_c, \" {\\n\");\n  p = isl_printer_indent(p, 2);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"FILE *fd = fopen(\\\"\");\n  p = isl_printer_print_str(p, hls->output_dir);\n  p = isl_printer_print_str(p, \"/resource_est/design_info.dat\\\", \\\"w\\\");\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"int fifo_cnt;\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_ctx *ctx = isl_ctx_alloc();\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_printer *p = isl_printer_to_file(ctx, f);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_end_line(p);\n\n  if (hls->target == XILINX_HW)\n    p = print_top_module_headers_xilinx(p, prog, top, hls);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_indent(p, 2);\");\n  p = isl_printer_end_line(p);\n\n  /* Print FIFO declarations */\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* FIFO Declaration */\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_end_line(p);\n\n  /* Print the serialization FIFOs, if any. */\n  for (int i = 0; i < top->n_hw_modules; i++) {\n    struct autosa_hw_module *module = top->hw_modules[i];\n    if (module->is_serialized) {\n      struct autosa_array_ref_group *group = module->io_groups[0];\n      /* Generate fifo decl counter. 
*/\n      char *fifo_name;\n      int fifo_w;  // bytes\n      fifo_w = module->data_pack_inter * group->array->size;\n      isl_printer *p_str;\n      p_str = isl_printer_to_str(ctx);\n      p_str = autosa_array_ref_group_print_fifo_name(group, p_str);\n      p_str = isl_printer_print_str(p_str, \"_\");\n      p_str = isl_printer_print_str(p_str, module->name);\n      p_str = isl_printer_print_str(p_str, \"_serialize\");\n      fifo_name = isl_printer_get_str(p_str);\n      isl_printer_free(p_str);\n\n      p = print_str_new_line(p, \"fifo_cnt = 1;\");\n      p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* \");\n      p = isl_printer_print_str(p, module->name);\n      p = isl_printer_print_str(p, \"_serialize fifo */ \");      \n      p = print_fifo_type_xilinx(p, group, module->data_pack_inter);\n      p = isl_printer_print_str(p, \" \");\n      p = isl_printer_print_str(p, fifo_name);      \n      p = isl_printer_print_str(p, \";\\\");\");\n      p = isl_printer_end_line(p);\n      p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n      /* Resource pragma */\n      p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS STREAM variable=\");\n      p = isl_printer_print_str(p, fifo_name);\n      p = isl_printer_print_str(p, \"\\\");\");\n      p = isl_printer_end_line(p);\n      //p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\" depth=2\\\");\");\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\" depth=\");\n      p = isl_printer_print_int(p, fifo_depth);\n      p = isl_printer_print_str(p, \"\\\");\");\n      p = isl_printer_end_line(p);\n\n      p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n\n      if 
(group->local_array->is_sparse) {\n        p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n        p = isl_printer_start_line(p);\n        p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"#pragma HLS DATA_PACK variable=\");\n        p = isl_printer_print_str(p, fifo_name);\n        p = isl_printer_print_str(p, \"\\\");\");\n        p = isl_printer_end_line(p);\n        p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n      }\n\n      /* fifo:fifo_name:fifo_cnt:fifo_width */\n      p = isl_printer_start_line(p);\n      p = isl_printer_print_str(p, \"fprintf(fd, \\\"fifo:\");\n      p = isl_printer_print_str(p, fifo_name);\n      p = isl_printer_print_str(p, \":\\%d:\");\n      p = isl_printer_print_int(p, fifo_w);\n      p = isl_printer_print_str(p, \"\\\\n\\\", fifo_cnt);\");\n      p = isl_printer_end_line(p);\n\n      p = isl_printer_end_line(p);      \n      free(fifo_name);\n    }\n  }\n\n  for (int i = 0; i < top->n_fifo_decls; i++) {\n    /* Generate fifo decl counter. 
*/\n    char *fifo_decl_name = top->fifo_decl_names[i];\n    char *fifo_name = extract_fifo_name_from_fifo_decl_name(ctx, fifo_decl_name);\n    char *fifo_w = extract_fifo_width_from_fifo_decl_name(ctx, fifo_decl_name);\n    p = print_str_new_line(p, \"fifo_cnt = 0;\");\n\n    /* Print AST */\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_top_module_fifo_stmt, &hw_data);\n\n    p = isl_ast_node_print(top->fifo_decl_wrapped_trees[i],\n                           p, print_options);\n\n    /* fifo:fifo_name:fifo_cnt:fifo_width */\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"fprintf(fd, \\\"fifo:\");\n    p = isl_printer_print_str(p, fifo_name);\n    p = isl_printer_print_str(p, \":\\%d:\");\n    p = isl_printer_print_str(p, fifo_w);\n    p = isl_printer_print_str(p, \"\\\\n\\\", fifo_cnt);\");\n    p = isl_printer_end_line(p);\n\n    p = isl_printer_end_line(p);\n\n    free(fifo_name);\n    free(fifo_w);\n  }\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_start_line(p);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_print_str(p, \\\"/* FIFO Declaration */\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_end_line(p);\");\n  p = isl_printer_end_line(p);\n\n  int n_module_names = 0;\n  char **module_names = NULL;\n  for (int i = 0; i < top->n_hw_modules; i++)\n  {\n    /* Generate module call counter. 
*/\n    struct autosa_hw_module *module = top->hw_modules[i];\n    char *module_name;\n\n    if (module->is_filter && module->is_buffer)\n    {\n      module_name = concat(ctx, module->name, \"intra_trans\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n\n      module_name = concat(ctx, module->name, \"inter_trans\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n\n      if (module->boundary)\n      {\n        module_name = concat(ctx, module->name, \"inter_trans_boundary\");\n\n        n_module_names++;\n        module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n        module_names[n_module_names - 1] = module_name;\n      }\n    }\n\n    module_name = strdup(module->name);\n\n    n_module_names++;\n    module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n    module_names[n_module_names - 1] = module_name;\n\n    if (module->boundary)\n    {\n      module_name = concat(ctx, module->name, \"boundary\");\n\n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n    }\n\n    if (module->n_pe_dummy_modules > 0)\n    {\n      for (int j = 0; j < module->n_pe_dummy_modules; j++)\n      {\n        struct autosa_pe_dummy_module *dummy_module = module->pe_dummy_modules[j];\n        struct autosa_array_ref_group *group = dummy_module->io_group;\n        isl_printer *p_str = isl_printer_to_str(ctx);\n        p_str = autosa_array_ref_group_print_prefix(group, p_str);\n        p_str = isl_printer_print_str(p_str, \"_PE_dummy\");\n        p_str = isl_printer_print_str(p_str, dummy_module->in? 
\"_in\" : \"_out\");\n        module_name = isl_printer_get_str(p_str);\n        isl_printer_free(p_str);\n\n        n_module_names++;\n        module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n        module_names[n_module_names - 1] = module_name;\n      }\n    }\n\n    if (module->is_serialized) { \n      if (module->boundary)      \n        module_name = concat(ctx, module->name, \"boundary_serialize\");\n      else\n        module_name = concat(ctx, module->name, \"serialize\");\n      \n      n_module_names++;\n      module_names = (char **)realloc(module_names, n_module_names * sizeof(char *));\n      module_names[n_module_names - 1] = module_name;\n    }\n  }\n  for (int i = 0; i < n_module_names; i++)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"int \");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \"_cnt = 0;\");\n    p = isl_printer_end_line(p);\n  }\n\n  /* Print module calls. */\n  for (int i = 0; i < top->n_module_calls; i++)\n  {\n    /* Print AST */\n    print_options = isl_ast_print_options_alloc(ctx);\n    print_options = isl_ast_print_options_set_print_user(print_options,\n                                                         &print_top_module_call_stmt, &hw_data);    \n\n    p = isl_ast_node_print(top->module_call_wrapped_trees[i],\n                           p, print_options);\n  }\n\n  /* module:module_name:module_cnt. 
*/\n  for (int i = 0; i < n_module_names; i++)\n  {\n    p = isl_printer_start_line(p);\n    p = isl_printer_print_str(p, \"fprintf(fd, \\\"module:\");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \":\\%d\\\\n\\\", \");\n    p = isl_printer_print_str(p, module_names[i]);\n    p = isl_printer_print_str(p, \"_cnt);\");\n    p = isl_printer_end_line(p);\n  }\n  p = isl_printer_end_line(p);\n\n  for (int i = 0; i < n_module_names; i++)\n  {\n    free(module_names[i]);\n  }\n  free(module_names);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"p = isl_printer_indent(p, -2);\");\n  p = isl_printer_end_line(p);\n\n  p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n  p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"}\\\");\");\n  p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n  if (hls->target == XILINX_HW)\n  {\n    if (!hls->hls)\n    {\n      p = print_str_new_line(p, \"p = isl_printer_start_line(p);\");\n      p = print_str_new_line(p, \"p = isl_printer_print_str(p, \\\"}\\\");\");\n      p = print_str_new_line(p, \"p = isl_printer_end_line(p);\");\n    }\n  }\n\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"fclose(fd);\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_printer_free(p);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"isl_ctx_free(ctx);\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_indent(p, -2);\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"}\");\n  p = isl_printer_end_line(p);\n  p = isl_printer_end_line(p);\n\n  /* For internal testing only. 
*/\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"int main()\");\n  p = isl_printer_end_line(p);\n\n  p = ppcg_start_block(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"FILE *f = fopen(\\\"\");\n  p = isl_printer_print_str(p, hls->output_dir);\n  p = isl_printer_print_str(p, \"/src/top.cpp\\\", \\\"w\\\");\");\n  p = isl_printer_end_line(p);\n\n  p = isl_printer_start_line(p);\n  p = isl_printer_print_str(p, \"top_generate(f);\");\n  p = isl_printer_end_line(p);\n\n  p = ppcg_end_block(p);\n  p = isl_printer_free(p);\n\n  return;\n}\n\n/* Given an autosa_prog \"prog\" and the corresponding transformed AST\n * \"tree\", print the entire OpenCL/HLS code to \"p\".\n * \"types\" collects the types for which a definition has already been\n * printed.\n */\nstatic __isl_give isl_printer *print_hw(\n    __isl_take isl_printer *p,\n    struct autosa_prog *prog, __isl_keep isl_ast_node *tree,\n    struct autosa_hw_module **modules, int n_modules,\n    struct autosa_hw_top_module *top_module,\n    struct autosa_drain_merge_func **drain_merge_funcs, int n_drain_merge_funcs,\n    struct autosa_types *types, void *user)\n{\n  struct hls_info *hls = (struct hls_info *)user;\n  isl_printer *p_tmp;\n\n  p_tmp = isl_printer_to_file(isl_printer_get_ctx(p), hls->kernel_c);\n  p_tmp = isl_printer_set_output_format(p_tmp, ISL_FORMAT_C);\n  p_tmp = autosa_print_types(p_tmp, types, prog);\n  p_tmp = isl_printer_free(p_tmp);  \n\n  /* Print the OpenCL host and kernel functions. */\n  p = autosa_print_host_code(p, prog, tree, modules, n_modules, top_module,\n                             drain_merge_funcs, n_drain_merge_funcs, hls);\n  /* Print the separate top module code generation function. 
*/\n  print_top_gen_host_code(prog, tree, top_module, hls);\n\n  return p;\n}\n\n/* Generate systolic arrays on Xilinx FPGAs.\n */\nint generate_autosa_xilinx_hls_c(isl_ctx *ctx, struct ppcg_options *options,\n                                 const char *input)\n{\n  struct hls_info hls;\n  int r;\n\n  hls.target = XILINX_HW;\n  hls.hls = options->autosa->hls;\n  hls.ctx = ctx;\n  hls.output_dir = options->autosa->output_dir;\n  hls.hcl = options->autosa->hcl;\n  hls_open_files(&hls, input);\n\n  r = generate_sa(ctx, input, hls.host_c, options, &print_hw, &hls);\n\n  hls_close_files(&hls);\n\n  return r;\n}\n"
  },
  {
    "path": "src/autosa_xilinx_hls_c.h",
    "content": "#ifndef _AUTOSA_XILINX_HLS_C_H\n#define _AUTOSA_XILINX_HLS_C_H\n\n#include <pet.h>\n#include \"ppcg_options.h\"\n#include \"ppcg.h\"\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\nint generate_autosa_xilinx_hls_c(isl_ctx *ctx, struct ppcg_options *options,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t const char *input);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif"
  },
  {
    "path": "src/configure.ac",
    "content": "AC_INIT([autosa], [0.02], [jiewang@cs.ucla.edu])\nAC_CONFIG_AUX_DIR([build])\nAC_CONFIG_MACRO_DIR([m4])\nAM_INIT_AUTOMAKE([foreign subdir-objects])\nm4_ifdef([AM_SILENT_RULES],[AM_SILENT_RULES([yes])])\n\nAC_PROG_CC\nAC_PROG_CXX\nAC_PROG_LIBTOOL\nPKG_PROG_PKG_CONFIG\n\n# AX_CHECK_OPENMP\n# AX_CHECK_OPENCL\n# if test $HAVE_OPENCL = yes; then\n# \textra_tests=\"$extra_tests opencl_test.sh\"\n# fi\n\nAX_SUBMODULE(isl,build|bundled|system,bundled)\nAM_CONDITIONAL(BUNDLED_ISL, test $with_isl = bundled)\nAM_CONDITIONAL(BUILD_ISL, test $with_isl = build)\n\nAC_SUBST(ISL_CFLAGS)\nAC_SUBST(ISL_LIBS)\nAC_SUBST(ISL_SRCDIR)\nAC_SUBST(ISL_BUILDDIR)\ncase \"$with_isl\" in\nbundled)\n\tISL_CFLAGS=\"-I\\$(top_srcdir)/isl/include -I\\$(top_builddir)/isl/include\"\n\tISL_CFLAGS=\"$ISL_CFLAGS\"\n  ISL_SRCDIR=\"$srcdir/isl\"\n  ISL_BUILDDIR=isl\n\tppcg_configure_args=\"$ppcg_configure_args --with-isl-builddir=../isl\"\n\tppcg_configure_args=\"$ppcg_configure_args --with-isl=build\"\n\t#ppcg_configure_args=\"$ppcg_configure_args --with-clang=system\"\n\tppcg_configure_args=\"$ppcg_configure_args --with-clang=no\"\n  PACKAGE_CFLAGS_ISL='-I${prefix}/include'\n\t;;\nbuild)\n  ISL_SRCDIR=\"$isl_srcdir\"\n\tISL_BUILDDIR=`echo @abs_builddir@ | $with_isl_builddir/config.status --file=-`\n\tISL_CFLAGS=\"-I$isl_srcdir/include -I$ISL_BUILDDIR/include\"\n\tISL_CFLAGS=\"$ISL_CFLAGS\"\n\tISL_LIBS=\"$with_isl_builddir/libisl.la\"\n  PACKAGE_CFLAGS_ISL='-I${prefix}/include'\n\t;;\nsystem)\n\tPKG_CHECK_MODULES([ISL], [isl])\n  PACKAGE_CFLAGS_ISL=\"$ISL_CFLAGS\"\n  ;;\nesac\nAM_CONDITIONAL(HAVE_ISL_BUILDDIR, test \"x$ISL_BUILDDIR\" != \"x\")\n\nAX_SUBMODULE(barvinok,bundled|system,bundled)\nAM_CONDITIONAL(BUNDLED_BARVINOK, test $with_barvinok = bundled)\nAM_CONDITIONAL(BUILD_BARVINOK, test $with_barvinok = build)\n\nAC_SUBST(BARVINOK_CFLAGS)\nAC_SUBST(BARVINOK_LIBS)\nAC_SUBST(BARVINOK_SRCDIR)\nAC_SUBST(BARVINOK_BUILDDIR)\ncase \"$with_barvinok\" in\nbundled)\n  
BARVINOK_CFLAGS=\"$BARVINOK_CFLAGS -I\\$(top_srcdir)/barvinok -I\\$(top_builddir)/barvinok\"\n  BARVINOK_CFLAGS=\"$BARVINOK_CFLAGS\"\n  BARVINOK_SRCDIR=\"$srcdir/barvinok\"\n  BARVINOK_BUILDDIR=barvinok\n  ;;\nbuild)\n  BARVINOK_SRCDIR=\"$barvinok_srcdir\"\n  BARVINOK_CFLAGS=\"$BARVINOK_CFLAGS\"\n  BARVINOK_BUILDDIR=`echo @abs_builddir@ | $with_barvinok_builddir/config.status --file=-`\n  BARVINOK_CFLAGS=\"-I$barvinok_srcdir/ -I$BARVINOK_BUILDDIR/\"\n  BARVINOK_LIBS=\"$with_barvinok_builddir/libbarvinok.la\"\n  ;;\nsystem)\n  PKG_CHECK_MODULES([BARVINOK], [barvinok])\n  PACKAGE_CFLAGS_BARVINOK=\"$BARVINOK_CFLAGS\"\n  ;;\nesac\nAM_CONDITIONAL(HAVE_BARVINOK_BUILDDIR, test \"x$BARVINOK_BUILDDIR\" != \"x\")\n\nAX_SUBMODULE(pet,bundled|system,bundled)\nAM_CONDITIONAL(BUNDLED_PET, test $with_pet = bundled)\nAM_CONDITIONAL(BUILD_PET, test $with_pet = build)\n\nAC_SUBST(PET_CFLAGS)\nAC_SUBST(PET_LIBS)\nAC_SUBST(PET_BUILDDIR)\ncase \"$with_pet\" in\nbundled)\n\tPET_CFLAGS=\"$PET_CFLAGS -I\\$(top_srcdir)/pet/include\"\n\t;;\nbuild)\n  PET_BUILDDIR=`echo @abs_builddir@ | $with_pet_builddir/config.status --file=-`\n  PET_CFLAGS=\"-I$pet_srcdir/include\"\n  ;;\nsystem)\n\tPKG_CHECK_MODULES([PET], [pet])\n  PACKAGE_CFLAGS_PET=\"$PET_CFLAGS\"\n\t;;\nesac\n\n# AC_SUBST(POLYBENCH_DIR)\n# AC_SUBST(extra_tests)\n# AC_ARG_WITH([polybench],\n# \t[AS_HELP_STRING([--with-polybench=DIR], [PolyBench location])],\n# \t[\n# \tif test -f \"$with_polybench/utilities/benchmark_list\"; then\n# \t\tPOLYBENCH_DIR=$with_polybench\n# \t\textra_tests=\"$extra_tests polybench_test.sh\"\n# \tfi\n# \t])\n\n# AX_DETECT_GIT_HEAD\n\nAC_CONFIG_FILES(Makefile)\n# AC_CONFIG_FILES([polybench_test.sh], [chmod +x polybench_test.sh])\n# AC_CONFIG_FILES([opencl_test.sh], [chmod +x opencl_test.sh])\nif test $with_isl = bundled; then\n\tAC_CONFIG_SUBDIRS(isl)\nfi\nif test $with_barvinok = bundled; then\n  AC_CONFIG_SUBDIRS(barvinok)\nfi\nif test $with_pet = bundled; 
then\n\tAC_CONFIG_SUBDIRS(pet)\nfi\nAC_CONFIG_COMMANDS_POST([\n\tdnl pass on arguments to subdir configures, but don't\n\tdnl add them to config.status\n\tac_configure_args=\"$ac_configure_args $ppcg_configure_args\"\n])\nAC_OUTPUT\n"
  },
  {
    "path": "src/cpu.c",
    "content": "/*\n * Copyright 2012 INRIA Paris-Rocquencourt\n * Copyright 2012 Ecole Normale Superieure\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Tobias Grosser, INRIA Paris-Rocquencourt,\n * Domaine de Voluceau, Rocquenqourt, B.P. 105,\n * 78153 Le Chesnay Cedex France\n * and Sven Verdoolaege,\n * Ecole Normale Superieure, 45 rue d'Ulm, 75230 Paris, France\n */\n\n#include <limits.h>\n#include <stdio.h>\n#include <string.h>\n\n#include <isl/aff.h>\n#include <isl/ctx.h>\n#include <isl/flow.h>\n#include <isl/map.h>\n#include <isl/ast_build.h>\n#include <isl/schedule.h>\n#include <isl/schedule_node.h>\n#include <pet.h>\n\n#include \"ppcg.h\"\n#include \"ppcg_options.h\"\n#include \"cpu.h\"\n#include \"print.h\"\n#include \"schedule.h\"\n#include \"util.h\"\n\n/* Representation of a statement inside a generated AST.\n *\n * \"stmt\" refers to the original statement.\n * \"ref2expr\" maps the reference identifier of each access in\n * the statement to an AST expression that should be printed\n * at the place of the access.\n */\nstruct ppcg_stmt {\n\tstruct pet_stmt *stmt;\n\n\tisl_id_to_ast_expr *ref2expr;\n};\n\nstatic void ppcg_stmt_free(void *user)\n{\n\tstruct ppcg_stmt *stmt = user;\n\n\tif (!stmt)\n\t\treturn;\n\n\tisl_id_to_ast_expr_free(stmt->ref2expr);\n\n\tfree(stmt);\n}\n\n/* Derive the output file name from the input file name.\n * 'input' is the entire path of the input file. The output\n * is the file name plus the additional extension.\n *\n * We will basically replace everything after the last point\n * with '.ppcg.c'. This means file.c becomes file.ppcg.c\n */\nstatic FILE *get_output_file(const char *input, const char *output)\n{\n\tchar name[PATH_MAX];\n\tconst char *ext;\n\tconst char ppcg_marker[] = \".ppcg\";\n\tint len;\n\tFILE *file;\n\n\tlen = ppcg_extract_base_name(name, input);\n\n\tstrcpy(name + len, ppcg_marker);\n\text = strrchr(input, '.');\n\tstrcpy(name + len + sizeof(ppcg_marker) - 1, ext ? 
ext : \".c\");\n\n\tif (!output)\n\t\toutput = name;\n\n\tfile = fopen(output, \"w\");\n\tif (!file) {\n\t\tfprintf(stderr, \"Unable to open '%s' for writing\\n\", output);\n\t\treturn NULL;\n\t}\n\n\treturn file;\n}\n\n/* Data used to annotate for nodes in the ast.\n */\nstruct ast_node_userinfo {\n\t/* The for node is an openmp parallel for node. */\n\tint is_openmp;\n};\n\n/* Information used while building the ast.\n */\nstruct ast_build_userinfo {\n\t/* The current ppcg scop. */\n\tstruct ppcg_scop *scop;\n\n\t/* Are we currently in a parallel for loop? */\n\tint in_parallel_for;\n\n\t/* The contraction of the entire schedule tree. */\n\tisl_union_pw_multi_aff *contraction;\n};\n\n/* Check if the current scheduling dimension is parallel.\n *\n * We check for parallelism by verifying that the loop does not carry any\n * dependences.\n *\n * If any expansion nodes are present in the schedule tree,\n * then they are assumed to be situated near the leaves of the schedule tree,\n * underneath any node that may result in a for loop.\n * In particular, these expansions may have been introduced\n * by the call to isl_schedule_expand inside ppcg_compute_grouping_schedule.\n * The dependence relations are formulated in terms of the expanded\n * domains, while, by assumption, the partial schedule returned\n * by isl_ast_build_get_schedule refers to the contracted domains.\n * Plug in the contraction such that the schedule would also\n * refer to the expanded domains.\n * Note that if the schedule tree does not contain any expansions,\n * then the contraction is an identity function.\n *\n * If the live_range_reordering option is set, then this currently\n * includes the order dependences.  
In principle, non-zero order dependences\n * could be allowed, but this would require privatization and/or expansion.\n *\n * Parallelism test: if the distance is zero in all outer dimensions, then it\n * has to be zero in the current dimension as well.\n * Implementation: first, translate dependences into time space, then force\n * outer dimensions to be equal.  If the distance is zero in the current\n * dimension, then the loop is parallel.\n * The distance is zero in the current dimension if it is a subset of a map\n * with equal values for the current dimension.\n */\nstatic int ast_schedule_dim_is_parallel(__isl_keep isl_ast_build *build,\n\tstruct ast_build_userinfo *build_info)\n{\n\tstruct ppcg_scop *scop = build_info->scop;\n\tisl_union_map *schedule, *deps;\n\tisl_map *schedule_deps, *test;\n\tisl_space *schedule_space;\n\tunsigned i, dimension, is_parallel;\n\n\tschedule = isl_ast_build_get_schedule(build);\n\tschedule = isl_union_map_preimage_domain_union_pw_multi_aff(schedule,\n\t\tisl_union_pw_multi_aff_copy(build_info->contraction));\n\tschedule_space = isl_ast_build_get_schedule_space(build);\n\n\tdimension = isl_space_dim(schedule_space, isl_dim_out) - 1;\n\n\tdeps = isl_union_map_copy(scop->dep_flow);\n\tdeps = isl_union_map_union(deps, isl_union_map_copy(scop->dep_false));\n\tif (scop->options->live_range_reordering) {\n\t\tisl_union_map *order = isl_union_map_copy(scop->dep_order);\n\t\tdeps = isl_union_map_union(deps, order);\n\t}\n\tdeps = isl_union_map_apply_range(deps, isl_union_map_copy(schedule));\n\tdeps = isl_union_map_apply_domain(deps, schedule);\n\n\tif (isl_union_map_is_empty(deps)) {\n\t\tisl_union_map_free(deps);\n\t\tisl_space_free(schedule_space);\n\t\treturn 1;\n\t}\n\n\tschedule_deps = isl_map_from_union_map(deps);\n\n\tfor (i = 0; i < dimension; i++)\n\t\tschedule_deps = isl_map_equate(schedule_deps, isl_dim_out, i,\n\t\t\t\t\t       isl_dim_in, i);\n\n\ttest = isl_map_universe(isl_map_get_space(schedule_deps));\n\ttest = 
isl_map_equate(test, isl_dim_out, dimension, isl_dim_in,\n\t\t\t      dimension);\n\tis_parallel = isl_map_is_subset(schedule_deps, test);\n\n\tisl_space_free(schedule_space);\n\tisl_map_free(test);\n\tisl_map_free(schedule_deps);\n\n\treturn is_parallel;\n}\n\n/* Mark a for node as openmp parallel if it is the outermost parallel for node.\n */\nstatic void mark_openmp_parallel(__isl_keep isl_ast_build *build,\n\tstruct ast_build_userinfo *build_info,\n\tstruct ast_node_userinfo *node_info)\n{\n\tif (build_info->in_parallel_for)\n\t\treturn;\n\n\tif (ast_schedule_dim_is_parallel(build, build_info)) {\n\t\tbuild_info->in_parallel_for = 1;\n\t\tnode_info->is_openmp = 1;\n\t}\n}\n\n/* Allocate an ast_node_userinfo structure and initialize it\n * with default values.\n */\nstatic struct ast_node_userinfo *allocate_ast_node_userinfo(void)\n{\n\tstruct ast_node_userinfo *node_info;\n\tnode_info = (struct ast_node_userinfo *)\n\t\tmalloc(sizeof(struct ast_node_userinfo));\n\tnode_info->is_openmp = 0;\n\treturn node_info;\n}\n\n/* Free an ast_node_userinfo structure.\n */\nstatic void free_ast_node_userinfo(void *ptr)\n{\n\tstruct ast_node_userinfo *info;\n\tinfo = (struct ast_node_userinfo *) ptr;\n\tfree(info);\n}\n\n/* This method is executed before the construction of a for node. 
It creates\n * an isl_id that is used to annotate the subsequently generated ast for nodes.\n *\n * In this function we also run the following analyses:\n *\n * \t- Detection of openmp parallel loops\n */\nstatic __isl_give isl_id *ast_build_before_for(\n\t__isl_keep isl_ast_build *build, void *user)\n{\n\tisl_id *id;\n\tstruct ast_build_userinfo *build_info;\n\tstruct ast_node_userinfo *node_info;\n\n\tbuild_info = (struct ast_build_userinfo *) user;\n\tnode_info = allocate_ast_node_userinfo();\n\tid = isl_id_alloc(isl_ast_build_get_ctx(build), \"\", node_info);\n\tid = isl_id_set_free_user(id, free_ast_node_userinfo);\n\n\tmark_openmp_parallel(build, build_info, node_info);\n\n\treturn id;\n}\n\n/* This method is executed after the construction of a for node.\n *\n * It performs the following actions:\n *\n * \t- Reset the 'in_parallel_for' flag, as soon as we leave a for node,\n * \t  that is marked as openmp parallel.\n *\n */\nstatic __isl_give isl_ast_node *ast_build_after_for(\n\t__isl_take isl_ast_node *node, __isl_keep isl_ast_build *build,\n\tvoid *user)\n{\n\tisl_id *id;\n\tstruct ast_build_userinfo *build_info;\n\tstruct ast_node_userinfo *info;\n\n\tid = isl_ast_node_get_annotation(node);\n\tinfo = isl_id_get_user(id);\n\n\tif (info && info->is_openmp) {\n\t\tbuild_info = (struct ast_build_userinfo *) user;\n\t\tbuild_info->in_parallel_for = 0;\n\t}\n\n\tisl_id_free(id);\n\n\treturn node;\n}\n\n/* Find the element in scop->stmts that has the given \"id\".\n */\nstatic struct pet_stmt *find_stmt(struct ppcg_scop *scop, __isl_keep isl_id *id)\n{\n\tint i;\n\n\tfor (i = 0; i < scop->pet->n_stmt; ++i) {\n\t\tstruct pet_stmt *stmt = scop->pet->stmts[i];\n\t\tisl_id *id_i;\n\n\t\tid_i = isl_set_get_tuple_id(stmt->domain);\n\t\tisl_id_free(id_i);\n\n\t\tif (id_i == id)\n\t\t\treturn stmt;\n\t}\n\n\tisl_die(isl_id_get_ctx(id), isl_error_internal,\n\t\t\"statement not found\", return NULL);\n}\n\n/* Print a user statement in the generated AST.\n * The ppcg_stmt 
has been attached to the node in at_each_domain.\n */\nstatic __isl_give isl_printer *print_user(__isl_take isl_printer *p,\n\t__isl_take isl_ast_print_options *print_options,\n\t__isl_keep isl_ast_node *node, void *user)\n{\n\tstruct ppcg_stmt *stmt;\n\tisl_id *id;\n\n\tid = isl_ast_node_get_annotation(node);\n\tstmt = isl_id_get_user(id);\n\tisl_id_free(id);\n\n\tp = pet_stmt_print_body(stmt->stmt, p, stmt->ref2expr);\n\n\tisl_ast_print_options_free(print_options);\n\n\treturn p;\n}\n\n\n/* Print a for loop node as an openmp parallel loop.\n *\n * To print an openmp parallel loop we print a normal for loop, but add\n * \"#pragma omp parallel for\" in front.\n *\n * Variables that are declared within the body of this for loop are\n * automatically openmp 'private'. Iterators declared outside of the\n * for loop are automatically openmp 'shared'. As ppcg declares all iterators\n * at the position where they are assigned, there is no need to explicitly mark\n * variables. Their automatically assigned type is already correct.\n *\n * This function only generates valid OpenMP code if the ast was generated\n * with the 'atomic-bounds' option enabled.\n *\n */\nstatic __isl_give isl_printer *print_for_with_openmp(\n\t__isl_keep isl_ast_node *node, __isl_take isl_printer *p,\n\t__isl_take isl_ast_print_options *print_options)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"#pragma omp parallel for\");\n\tp = isl_printer_end_line(p);\n\n\tp = isl_ast_node_for_print(node, p, print_options);\n\n\treturn p;\n}\n\n/* Print a for node.\n *\n * Depending on how the node is annotated, we either print a normal\n * for node or an openmp parallel for node.\n */\nstatic __isl_give isl_printer *print_for(__isl_take isl_printer *p,\n\t__isl_take isl_ast_print_options *print_options,\n\t__isl_keep isl_ast_node *node, void *user)\n{\n\tisl_id *id;\n\tint openmp;\n\n\topenmp = 0;\n\tid = isl_ast_node_get_annotation(node);\n\n\tif (id) {\n\t\tstruct 
ast_node_userinfo *info;\n\n\t\tinfo = (struct ast_node_userinfo *) isl_id_get_user(id);\n\t\tif (info && info->is_openmp)\n\t\t\topenmp = 1;\n\t}\n\n\tif (openmp)\n\t\tp = print_for_with_openmp(node, p, print_options);\n\telse\n\t\tp = isl_ast_node_for_print(node, p, print_options);\n\n\tisl_id_free(id);\n\n\treturn p;\n}\n\n/* Index transformation callback for pet_stmt_build_ast_exprs.\n *\n * \"index\" expresses the array indices in terms of statement iterators\n * \"iterator_map\" expresses the statement iterators in terms of\n * AST loop iterators.\n *\n * The result expresses the array indices in terms of\n * AST loop iterators.\n */\nstatic __isl_give isl_multi_pw_aff *pullback_index(\n\t__isl_take isl_multi_pw_aff *index, __isl_keep isl_id *id, void *user)\n{\n\tisl_pw_multi_aff *iterator_map = user;\n\n\titerator_map = isl_pw_multi_aff_copy(iterator_map);\n\treturn isl_multi_pw_aff_pullback_pw_multi_aff(index, iterator_map);\n}\n\n/* Transform the accesses in the statement associated to the domain\n * called by \"node\" to refer to the AST loop iterators, construct\n * corresponding AST expressions using \"build\",\n * collect them in a ppcg_stmt and annotate the node with the ppcg_stmt.\n */\nstatic __isl_give isl_ast_node *at_each_domain(__isl_take isl_ast_node *node,\n\t__isl_keep isl_ast_build *build, void *user)\n{\n\tstruct ppcg_scop *scop = user;\n\tisl_ast_expr *expr, *arg;\n\tisl_ctx *ctx;\n\tisl_id *id;\n\tisl_map *map;\n\tisl_pw_multi_aff *iterator_map;\n\tstruct ppcg_stmt *stmt;\n\n\tctx = isl_ast_node_get_ctx(node);\n\tstmt = isl_calloc_type(ctx, struct ppcg_stmt);\n\tif (!stmt)\n\t\tgoto error;\n\n\texpr = isl_ast_node_user_get_expr(node);\n\targ = isl_ast_expr_get_op_arg(expr, 0);\n\tisl_ast_expr_free(expr);\n\tid = isl_ast_expr_get_id(arg);\n\tisl_ast_expr_free(arg);\n\tstmt->stmt = find_stmt(scop, id);\n\tisl_id_free(id);\n\tif (!stmt->stmt)\n\t\tgoto error;\n\n\tmap = isl_map_from_union_map(isl_ast_build_get_schedule(build));\n\tmap = 
isl_map_reverse(map);\n\titerator_map = isl_pw_multi_aff_from_map(map);\n\tstmt->ref2expr = pet_stmt_build_ast_exprs(stmt->stmt, build,\n\t\t\t\t    &pullback_index, iterator_map, NULL, NULL);\n\tisl_pw_multi_aff_free(iterator_map);\n\n\tid = isl_id_alloc(isl_ast_node_get_ctx(node), NULL, stmt);\n\tid = isl_id_set_free_user(id, &ppcg_stmt_free);\n\treturn isl_ast_node_set_annotation(node, id);\nerror:\n\tppcg_stmt_free(stmt);\n\treturn isl_ast_node_free(node);\n}\n\n/* Set *depth (initialized to 0 by the caller) to the maximum\n * of the schedule depths of the leaf nodes for which this function is called.\n */\nstatic isl_bool update_depth(__isl_keep isl_schedule_node *node, void *user)\n{\n\tint *depth = user;\n\tint node_depth;\n\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n\t\treturn isl_bool_true;\n\tnode_depth = isl_schedule_node_get_schedule_depth(node);\n\tif (node_depth > *depth)\n\t\t*depth = node_depth;\n\n\treturn isl_bool_false;\n}\n\n/* This function is called for each node in a CPU AST.\n * In case of a user node, print the macro definitions required\n * for printing the AST expressions in the annotation, if any.\n * For other nodes, return true such that descendants are also\n * visited.\n *\n * In particular, print the macro definitions needed for the substitutions\n * of the original user statements.\n */\nstatic isl_bool at_node(__isl_keep isl_ast_node *node, void *user)\n{\n\tstruct ppcg_stmt *stmt;\n\tisl_id *id;\n\tisl_printer **p = user;\n\n\tif (isl_ast_node_get_type(node) != isl_ast_node_user)\n\t\treturn isl_bool_true;\n\n\tid = isl_ast_node_get_annotation(node);\n\tstmt = isl_id_get_user(id);\n\tisl_id_free(id);\n\n\tif (!stmt)\n\t\treturn isl_bool_error;\n\n\t*p = ppcg_print_body_macros(*p, stmt->ref2expr);\n\tif (!*p)\n\t\treturn isl_bool_error;\n\n\treturn isl_bool_false;\n}\n\n/* Print the required macros for the CPU AST \"node\" to \"p\",\n * including those needed for the user statements inside the AST.\n 
*/\nstatic __isl_give isl_printer *cpu_print_macros(__isl_take isl_printer *p,\n\t__isl_keep isl_ast_node *node)\n{\n\tif (isl_ast_node_foreach_descendant_top_down(node, &at_node, &p) < 0)\n\t\treturn isl_printer_free(p);\n\tp = ppcg_print_macros(p, node);\n\treturn p;\n}\n\n/* Initialize the fields of \"build_info\".\n *\n * Initially, the AST generation is not inside any parallel for loop.\n *\n * The contraction of the entire schedule tree is extracted\n * right underneath the root node.\n */\nstatic isl_stat init_build_info(struct ast_build_userinfo *build_info,\n\tstruct ppcg_scop *scop, __isl_keep isl_schedule *schedule)\n{\n\tisl_schedule_node *node = isl_schedule_get_root(schedule);\n\tnode = isl_schedule_node_child(node, 0);\n\n\tbuild_info->scop = scop;\n\tbuild_info->in_parallel_for = 0;\n\tbuild_info->contraction =\n\t\tisl_schedule_node_get_subtree_contraction(node);\n\n\tisl_schedule_node_free(node);\n\n\treturn isl_stat_non_null(build_info->contraction);\n}\n\n/* Clear all memory allocated by \"build_info\".\n */\nstatic void clear_build_info(struct ast_build_userinfo *build_info)\n{\n\tisl_union_pw_multi_aff_free(build_info->contraction);\n}\n\n/* Code generate the scop 'scop' using \"schedule\"\n * and print the corresponding C code to 'p'.\n */\nstatic __isl_give isl_printer *print_scop(struct ppcg_scop *scop,\n\t__isl_take isl_schedule *schedule, __isl_take isl_printer *p,\n\tstruct ppcg_options *options)\n{\n\tisl_ctx *ctx = isl_printer_get_ctx(p);\n\tisl_ast_build *build;\n\tisl_ast_print_options *print_options;\n\tisl_ast_node *tree;\n\tisl_id_list *iterators;\n\tstruct ast_build_userinfo build_info;\n\tint depth;\n\n\tdepth = 0;\n\tif (isl_schedule_foreach_schedule_node_top_down(schedule, &update_depth,\n\t\t\t\t\t\t&depth) < 0)\n\t\tgoto error;\n\n\tbuild = isl_ast_build_alloc(ctx);\n\titerators = ppcg_scop_generate_names(scop, depth, \"c\");\n\tbuild = isl_ast_build_set_iterators(build, iterators);\n\tbuild = 
isl_ast_build_set_at_each_domain(build, &at_each_domain, scop);\n\n\tif (options->openmp) {\n\t\tif (init_build_info(&build_info, scop, schedule) < 0)\n\t\t\tbuild = isl_ast_build_free(build);\n\n\t\tbuild = isl_ast_build_set_before_each_for(build,\n\t\t\t\t\t\t\t&ast_build_before_for,\n\t\t\t\t\t\t\t&build_info);\n\t\tbuild = isl_ast_build_set_after_each_for(build,\n\t\t\t\t\t\t\t&ast_build_after_for,\n\t\t\t\t\t\t\t&build_info);\n\t}\n\n\ttree = isl_ast_build_node_from_schedule(build, schedule);\n\tisl_ast_build_free(build);\n\n\tif (options->openmp)\n\t\tclear_build_info(&build_info);\n\n\tprint_options = isl_ast_print_options_alloc(ctx);\n\tprint_options = isl_ast_print_options_set_print_user(print_options,\n\t\t\t\t\t\t\t&print_user, NULL);\n\n\tprint_options = isl_ast_print_options_set_print_for(print_options,\n\t\t\t\t\t\t\t&print_for, NULL);\n\n\tp = cpu_print_macros(p, tree);\n\tp = isl_ast_node_print(tree, p, print_options);\n\n\tisl_ast_node_free(tree);\n\n\treturn p;\nerror:\n\tisl_schedule_free(schedule);\n\tisl_printer_free(p);\n\treturn NULL;\n}\n\n/* Tile the band node \"node\" with tile sizes \"sizes\" and\n * mark all members of the resulting tile node as \"atomic\".\n */\nstatic __isl_give isl_schedule_node *tile(__isl_take isl_schedule_node *node,\n\t__isl_take isl_multi_val *sizes)\n{\n\tnode = isl_schedule_node_band_tile(node, sizes);\n\tnode = ppcg_set_schedule_node_type(node, isl_ast_loop_atomic);\n\n\treturn node;\n}\n\n/* Tile \"node\", if it is a band node with at least 2 members.\n * The tile sizes are set from the \"tile_size\" option.\n */\nstatic __isl_give isl_schedule_node *tile_band(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tstruct ppcg_scop *scop = user;\n\tint n;\n\tisl_space *space;\n\tisl_multi_val *sizes;\n\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n\t\treturn node;\n\n\tn = isl_schedule_node_band_n_member(node);\n\tif (n <= 1)\n\t\treturn node;\n\n\tspace = 
isl_schedule_node_band_get_space(node);\n\tsizes = ppcg_multi_val_from_int(space, scop->options->tile_size);\n\n\treturn tile(node, sizes);\n}\n\n/* Construct schedule constraints from the dependences in ps\n * for the purpose of computing a schedule for a CPU.\n *\n * The proximity constraints are set to the flow dependences.\n *\n * If live-range reordering is allowed then the conditional validity\n * constraints are set to the order dependences with the flow dependences\n * as condition.  That is, a live-range (flow dependence) will be either\n * local to an iteration of a band or all adjacent order dependences\n * will be respected by the band.\n * The validity constraints are set to the union of the flow dependences\n * and the forced dependences, while the coincidence constraints\n * are set to the union of the flow dependences, the forced dependences and\n * the order dependences.\n *\n * If live-range reordering is not allowed, then both the validity\n * and the coincidence constraints are set to the union of the flow\n * dependences and the false dependences.\n *\n * Note that the coincidence constraints are only set when the \"openmp\"\n * option is set.  Even though the way openmp pragmas are introduced\n * does not rely on the coincident property of the schedule band members,\n * the coincidence constraints do affect the way the schedule is constructed,\n * such that more schedule dimensions should be detected as parallel\n * by ast_schedule_dim_is_parallel.\n * Since the order dependences are also taken into account by\n * ast_schedule_dim_is_parallel, they are also added to\n * the coincidence constraints.  
If the openmp handling learns\n * how to privatize some memory, then the corresponding order\n * dependences can be removed from the coincidence constraints.\n */\nstatic __isl_give isl_schedule_constraints *construct_cpu_schedule_constraints(\n\tstruct ppcg_scop *ps)\n{\n\tisl_schedule_constraints *sc;\n\tisl_union_map *validity, *coincidence;\n\n\tsc = isl_schedule_constraints_on_domain(isl_union_set_copy(ps->domain));\n\tif (ps->options->live_range_reordering) {\n\t\tsc = isl_schedule_constraints_set_conditional_validity(sc,\n\t\t\t\tisl_union_map_copy(ps->tagged_dep_flow),\n\t\t\t\tisl_union_map_copy(ps->tagged_dep_order));\n\t\tvalidity = isl_union_map_copy(ps->dep_flow);\n\t\tvalidity = isl_union_map_union(validity,\n\t\t\t\tisl_union_map_copy(ps->dep_forced));\n\t\tif (ps->options->openmp) {\n\t\t\tcoincidence = isl_union_map_copy(validity);\n\t\t\tcoincidence = isl_union_map_union(coincidence,\n\t\t\t\t\tisl_union_map_copy(ps->dep_order));\n\t\t}\n\t} else {\n\t\tvalidity = isl_union_map_copy(ps->dep_flow);\n\t\tvalidity = isl_union_map_union(validity,\n\t\t\t\tisl_union_map_copy(ps->dep_false));\n\t\tif (ps->options->openmp)\n\t\t\tcoincidence = isl_union_map_copy(validity);\n\t}\n\tif (ps->options->openmp)\n\t\tsc = isl_schedule_constraints_set_coincidence(sc, coincidence);\n\tsc = isl_schedule_constraints_set_validity(sc, validity);\n\tsc = isl_schedule_constraints_set_proximity(sc,\n\t\t\t\t\tisl_union_map_copy(ps->dep_flow));\n\n\treturn sc;\n}\n\n/* Compute a schedule for the scop \"ps\".\n *\n * First derive the appropriate schedule constraints from the dependences\n * in \"ps\" and then compute a schedule from those schedule constraints,\n * possibly grouping statement instances based on the input schedule.\n */\nstatic __isl_give isl_schedule *compute_cpu_schedule(struct ppcg_scop *ps)\n{\n\tisl_schedule_constraints *sc;\n\tisl_schedule *schedule;\n\n\tif (!ps)\n\t\treturn NULL;\n\n\tsc = construct_cpu_schedule_constraints(ps);\n\n\tschedule = 
ppcg_compute_schedule(sc, ps->schedule, ps->options);\n\n\treturn schedule;\n}\n\n/* Compute a new schedule for the scop \"ps\" if the reschedule option is set.\n * Otherwise, return a copy of the original schedule.\n */\nstatic __isl_give isl_schedule *optionally_compute_schedule(void *user)\n{\n\tstruct ppcg_scop *ps = user;\n\n\tif (!ps)\n\t\treturn NULL;\n\tif (!ps->options->reschedule)\n\t\treturn isl_schedule_copy(ps->schedule);\n\treturn compute_cpu_schedule(ps);\n}\n\n/* Compute a schedule based on the dependences in \"ps\" and\n * tile it if requested by the user.\n */\nstatic __isl_give isl_schedule *get_schedule(struct ppcg_scop *ps,\n\tstruct ppcg_options *options)\n{\n\tisl_ctx *ctx;\n\tisl_schedule *schedule;\n\n\tif (!ps)\n\t\treturn NULL;\n\n\tctx = isl_union_set_get_ctx(ps->domain);\n\tschedule = ppcg_get_schedule(ctx, options,\n\t\t\t\t    &optionally_compute_schedule, ps);\n\tif (ps->options->tile)\n\t\tschedule = isl_schedule_map_schedule_node_bottom_up(schedule,\n\t\t\t\t\t\t\t&tile_band, ps);\n\n\treturn schedule;\n}\n\n/* Generate CPU code for the scop \"ps\" using \"schedule\" and\n * print the corresponding C code to \"p\", including variable declarations.\n */\nstatic __isl_give isl_printer *print_cpu_with_schedule(\n\t__isl_take isl_printer *p, struct ppcg_scop *ps,\n\t__isl_take isl_schedule *schedule, struct ppcg_options *options)\n{\n\tint hidden;\n\tisl_set *context;\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"/* ppcg generated CPU code */\");\n\tp = isl_printer_end_line(p);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_end_line(p);\n\n\tp = ppcg_set_macro_names(p);\n\tp = ppcg_print_exposed_declarations(p, ps);\n\thidden = ppcg_scop_any_hidden_declarations(ps);\n\tif (hidden) {\n\t\tp = ppcg_start_block(p);\n\t\tp = ppcg_print_hidden_declarations(p, ps);\n\t}\n\n\tcontext = isl_set_copy(ps->context);\n\tcontext = isl_set_from_params(context);\n\tschedule = 
isl_schedule_insert_context(schedule, 
context);\n\tif (options->debug->dump_final_schedule)\n\t\tisl_schedule_dump(schedule);\n\tp = print_scop(ps, schedule, p, options);\n\tif (hidden)\n\t\tp = ppcg_end_block(p);\n\n\treturn p;\n}\n\n/* Generate CPU code for the scop \"ps\" and print the corresponding C code\n * to \"p\", including variable declarations.\n */\n__isl_give isl_printer *print_cpu(__isl_take isl_printer *p,\n\tstruct ppcg_scop *ps, struct ppcg_options *options)\n{\n\tisl_schedule *schedule;\n\n\tschedule = isl_schedule_copy(ps->schedule);\n\treturn print_cpu_with_schedule(p, ps, schedule, options);\n}\n\n/* Generate CPU code for \"scop\" and print it to \"p\".\n *\n * First obtain a schedule for \"scop\" and then print code for \"scop\"\n * using that schedule.\n */\nstatic __isl_give isl_printer *generate(__isl_take isl_printer *p,\n\tstruct ppcg_scop *scop, struct ppcg_options *options)\n{\n\tisl_schedule *schedule;\n\n\tschedule = get_schedule(scop, options);\n\n\treturn print_cpu_with_schedule(p, scop, schedule, options);\n}\n\n/* Wrapper around generate for use as a ppcg_transform callback.\n */\nstatic __isl_give isl_printer *print_cpu_wrap(__isl_take isl_printer *p,\n\tstruct ppcg_scop *scop, void *user)\n{\n\tstruct ppcg_options *options = user;\n\n\treturn generate(p, scop, options);\n}\n\n/* Transform the code in the file called \"input\" by replacing\n * all scops by corresponding CPU code and write the results to a file\n * called \"output\".\n */\nint generate_cpu(isl_ctx *ctx, struct ppcg_options *options,\n\tconst char *input, const char *output)\n{\n\tFILE *output_file;\n\tint r;\n\n\toutput_file = get_output_file(input, output);\n\tif (!output_file)\n\t\treturn -1;\n\n\tr = ppcg_transform(ctx, input, output_file, options,\n\t\t\t\t\t&print_cpu_wrap, options);\n\n\tfclose(output_file);\n\n\treturn r;\n}\n"
  },
  {
    "path": "src/cpu.h",
    "content": "#ifndef _CPU_H\n#define _CPU_H\n\n#include <isl/ctx.h>\n\n#include \"ppcg.h\"\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\n\tstruct ppcg_options;\n\n\t__isl_give isl_printer *print_cpu(__isl_take isl_printer *p,\n\t\tstruct ppcg_scop *ps, struct ppcg_options *options);\n\tint generate_cpu(isl_ctx *ctx, struct ppcg_options *options,\n\t\tconst char *input, const char *output);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "src/examples/chemv.c",
    "content": "/*\n * Copyright 2014      ARM Ltd.\n *\n * Use of this software is governed by the MIT license\n */\n\n#include <stdio.h>\n#include <stdlib.h>\n\nstruct ComplexFloat\n{\n\tfloat Re;\n\tfloat Im;\n};\n\n/* chemv - complex hermitian matrix-vector multiplication\n * The function body was taken from a VOBLA-generated BLAS library.\n */\nvoid chemv(int n, float alpha_re, float alpha_im,\n\tint ldAT, struct ComplexFloat AT[restrict const static n][ldAT],\n\tint incX, struct ComplexFloat X[restrict const static n][incX],\n\tfloat beta_re, float beta_im,\n\tint incY, struct ComplexFloat Y[restrict const static n][incY])\n{\n#pragma scop\n\tfor (int i0 = 0; i0 <= (n-1); i0 += 1) {\n\t\tfloat var5_Re;\n\t\tfloat var5_Im;\n\t\tvar5_Re = ((Y[i0][0].Re*beta_re)-(Y[i0][0].Im*beta_im));\n\t\tvar5_Im = ((Y[i0][0].Im*beta_re)+(Y[i0][0].Re*beta_im));\n\t\tY[i0][0].Re = var5_Re;\n\t\tY[i0][0].Im = var5_Im;\n\t}\n\tfor (int i1 = 0; i1 <= ((n-1)+1)-1; i1 += 1) {\n\t\tfloat var2_Re;\n\t\tfloat var3_Im;\n\t\tfloat var2_Im;\n\t\tfloat var4_Im;\n\t\tfloat var4_Re;\n\t\tfloat var3_Re;\n\t\tvar2_Re = (alpha_re*AT[i1][i1].Re);\n\t\tvar2_Im = (alpha_im*AT[i1][i1].Re);\n\t\tvar3_Re = ((var2_Re*X[i1][0].Re)-(var2_Im*X[i1][0].Im));\n\t\tvar3_Im = ((var2_Im*X[i1][0].Re)+(var2_Re*X[i1][0].Im));\n\t\tvar4_Re = (Y[i1][0].Re+var3_Re);\n\t\tvar4_Im = (Y[i1][0].Im+var3_Im);\n\t\tY[i1][0].Re = var4_Re;\n\t\tY[i1][0].Im = var4_Im;\n\t}\n\tfor (int i2 = 0; i2 <= ((n-1)-1); i2 += 1) {\n\t\tfor (int i3 = 0; i3 <= (n-1)-(1+i2); i3 += 1) {\n\t\t\tfloat var99_Re;\n\t\t\tfloat var96_Re;\n\t\t\tfloat var98_Im;\n\t\t\tfloat var96_Im;\n\t\t\tfloat var94_Im;\n\t\t\tfloat var95_Im;\n\t\t\tfloat var94_Re;\n\t\t\tfloat var95_Re;\n\t\t\tfloat var97_Im;\n\t\t\tfloat var99_Im;\n\t\t\tfloat var97_Re;\n\t\t\tfloat var98_Re;\n\t\t\tvar94_Re = ((alpha_re*AT[i2][((1+i2)+i3)].Re)-\n\t\t\t\t(alpha_im*(-AT[i2][((1+i2)+i3)].Im)));\n\t\t\tvar94_Im = 
((alpha_im*AT[i2][((1+i2)+i3)].Re)+\n\t\t\t\t(alpha_re*(-AT[i2][((1+i2)+i3)].Im)));\n\t\t\tvar95_Re = ((var94_Re*X[((i3+i2)+1)][0].Re)-\n\t\t\t\t(var94_Im*X[((i3+i2)+1)][0].Im));\n\t\t\tvar95_Im = ((var94_Im*X[((i3+i2)+1)][0].Re)+\n\t\t\t\t(var94_Re*X[((i3+i2)+1)][0].Im));\n\t\t\tvar96_Re = (Y[i2][0].Re+var95_Re);\n\t\t\tvar96_Im = (Y[i2][0].Im+var95_Im);\n\t\t\tY[i2][0].Re = var96_Re;\n\t\t\tY[i2][0].Im = var96_Im;\n\t\t\tvar97_Re = ((alpha_re*AT[i2][((1+i2)+i3)].Re)-\n\t\t\t\t(alpha_im*AT[i2][((1+i2)+i3)].Im));\n\t\t\tvar97_Im = ((alpha_im*AT[i2][((1+i2)+i3)].Re)+\n\t\t\t\t(alpha_re*AT[i2][((1+i2)+i3)].Im));\n\t\t\tvar98_Re = ((var97_Re*X[i2][0].Re)-\n\t\t\t\t(var97_Im*X[i2][0].Im));\n\t\t\tvar98_Im = ((var97_Im*X[i2][0].Re)+\n\t\t\t\t(var97_Re*X[i2][0].Im));\n\t\t\tvar99_Re = (Y[((i3+i2)+1)][0].Re+var98_Re);\n\t\t\tvar99_Im = (Y[((i3+i2)+1)][0].Im+var98_Im);\n\t\t\tY[((i3+i2)+1)][0].Re = var99_Re;\n\t\t\tY[((i3+i2)+1)][0].Im = var99_Im;\n\t\t}\n\t}\n#pragma endscop\n}\n\nint main()\n{\n\tconst int n = 37;\n\tconst int incX = 1;\n\tconst int incY = 1;\n\tconst int ldAT = n;\n\tstruct ComplexFloat AT[n][ldAT];\n\tstruct ComplexFloat X[n][incX];\n\tstruct ComplexFloat Y[n][incY];\n\n\tfor (int i = 0; i < n; i++) {\n\t\tX[i][0] = (struct ComplexFloat){i + 5, i * 2};\n\t\tY[i][0] = (struct ComplexFloat){i * 3, i + 7};\n\t\tfor (int j = 0; j < ldAT; j++) {\n\t\t\tAT[i][j] = (struct ComplexFloat){i + j, i + 3};\n\t\t}\n\t}\n\n\tchemv(n, 3.14f, 1.59f, ldAT, AT, incX, X, 2.71f, 8.28f, incY, Y);\n\n\tfor (int i = 0; i < n; i++)\n\t\tprintf(\"%0.2f %0.2f\\n\", Y[i][0].Re, Y[i][0].Im);\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/get_submodules.sh",
    "content": "#!/bin/sh\ngit submodule init\ngit submodule update\n(cd isl; git submodule init imath; git submodule update imath)\n"
  },
  {
    "path": "src/grouping.c",
    "content": "/*\n * Copyright 2016      Sven Verdoolaege\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege.\n */\n\n#include <isl/ctx.h>\n#include <isl/id.h>\n#include <isl/val.h>\n#include <isl/space.h>\n#include <isl/aff.h>\n#include <isl/set.h>\n#include <isl/map.h>\n#include <isl/union_set.h>\n#include <isl/union_map.h>\n#include <isl/schedule.h>\n#include <isl/schedule_node.h>\n\n#include \"grouping.h\"\n#include \"schedule.h\"\n\n/* Internal data structure for use during the detection of statements\n * that can be grouped.\n *\n * \"sc\" contains the original schedule constraints (not a copy).\n * The validity constraints of \"sc\" are adjusted based on the groups\n * found so far.\n * \"dep\" contains the intersection of the validity and the proximity\n * constraints in \"sc\".  It may be NULL if it has not been computed yet.\n * \"group_id\" is the identifier for the next group that is extracted.\n *\n * \"domain\" is the set of statement instances that belong to any of the groups.\n * \"contraction\" maps the elements of \"domain\" to the corresponding group\n * instances.\n * \"schedule\" schedules the statements in each group relative to each other.\n * These last three fields are NULL if no groups have been found so far.\n */\nstruct ppcg_grouping {\n\tisl_schedule_constraints *sc;\n\n\tisl_union_map *dep;\n\tint group_id;\n\n\tisl_union_set *domain;\n\tisl_union_pw_multi_aff *contraction;\n\tisl_schedule *schedule;\n};\n\n/* Clear all memory allocated by \"grouping\".\n */\nstatic void ppcg_grouping_clear(struct ppcg_grouping *grouping)\n{\n\tisl_union_map_free(grouping->dep);\n\tisl_union_set_free(grouping->domain);\n\tisl_union_pw_multi_aff_free(grouping->contraction);\n\tisl_schedule_free(grouping->schedule);\n}\n\n/* Compute the intersection of the proximity and validity dependences\n * in grouping->sc and store the result in grouping->dep, unless\n * this intersection has been computed before.\n 
*/\nstatic isl_stat ppcg_grouping_compute_dep(struct ppcg_grouping *grouping)\n{\n\tisl_union_map *validity, *proximity;\n\n\tif (grouping->dep)\n\t\treturn isl_stat_ok;\n\n\tvalidity = isl_schedule_constraints_get_validity(grouping->sc);\n\tproximity = isl_schedule_constraints_get_proximity(grouping->sc);\n\tgrouping->dep = isl_union_map_intersect(validity, proximity);\n\n\tif (!grouping->dep)\n\t\treturn isl_stat_error;\n\n\treturn isl_stat_ok;\n}\n\n/* Information extracted from one or more consecutive leaves\n * in the input schedule.\n *\n * \"list\" contains the sets of statement instances in the leaves,\n * one element in the list for each original leaf.\n * \"domain\" contains the union of the sets in \"list\".\n * \"prefix\" contains the prefix schedule of these elements.\n */\nstruct ppcg_grouping_leaf {\n\tisl_union_set *domain;\n\tisl_union_set_list *list;\n\tisl_multi_union_pw_aff *prefix;\n};\n\n/* Free all memory allocated for \"leaves\".\n */\nstatic void ppcg_grouping_leaf_free(int n, struct ppcg_grouping_leaf leaves[n])\n{\n\tint i;\n\n\tif (!leaves)\n\t\treturn;\n\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_union_set_free(leaves[i].domain);\n\t\tisl_union_set_list_free(leaves[i].list);\n\t\tisl_multi_union_pw_aff_free(leaves[i].prefix);\n\t}\n\n\tfree(leaves);\n}\n\n/* Short-hand for retrieving the prefix schedule at \"node\"\n * in the form of an isl_multi_union_pw_aff.\n */\nstatic __isl_give isl_multi_union_pw_aff *get_prefix(\n\t__isl_keep isl_schedule_node *node)\n{\n\treturn isl_schedule_node_get_prefix_schedule_multi_union_pw_aff(node);\n}\n\n/* Return an array of \"n\" elements with information extracted from\n * the \"n\" children of \"node\" starting at \"first\", all of which\n * are known to be filtered leaves.\n */\nstruct ppcg_grouping_leaf *extract_leaves(__isl_keep isl_schedule_node *node,\n\tint first, int n)\n{\n\tint i;\n\tisl_ctx *ctx;\n\tstruct ppcg_grouping_leaf *leaves;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tctx = 
isl_schedule_node_get_ctx(node);\n\tleaves = isl_calloc_array(ctx, struct ppcg_grouping_leaf, n);\n\tif (!leaves)\n\t\treturn NULL;\n\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_schedule_node *child;\n\t\tisl_union_set *domain;\n\n\t\tchild = isl_schedule_node_get_child(node, first + i);\n\t\tchild = isl_schedule_node_child(child, 0);\n\t\tdomain = isl_schedule_node_get_domain(child);\n\t\tleaves[i].domain = isl_union_set_copy(domain);\n\t\tleaves[i].list = isl_union_set_list_from_union_set(domain);\n\t\tleaves[i].prefix = get_prefix(child);\n\t\tisl_schedule_node_free(child);\n\t}\n\n\treturn leaves;\n}\n\n/* Internal data structure used by merge_leaves.\n *\n * \"src\" and \"dst\" point to the two consecutive leaves that are\n * under investigation for being merged.\n * \"merge\" is initially set to 0 and is set to 1 as soon as\n * it turns out that it is useful to merge the two leaves.\n */\nstruct ppcg_merge_leaves_data {\n\tint merge;\n\tstruct ppcg_grouping_leaf *src;\n\tstruct ppcg_grouping_leaf *dst;\n};\n\n/* Given a relation \"map\" between instances of two statements A and B,\n * does it relate every instance of A (according to the domain of \"src\")\n * to every instance of B (according to the domain of \"dst\")?\n */\nstatic isl_bool covers_src_and_dst(__isl_keep isl_map *map,\n\tstruct ppcg_grouping_leaf *src, struct ppcg_grouping_leaf *dst)\n{\n\tisl_space *space;\n\tisl_set *set1, *set2;\n\tisl_bool is_subset;\n\n\tspace = isl_space_domain(isl_map_get_space(map));\n\tset1 = isl_union_set_extract_set(src->domain, space);\n\tset2 = isl_map_domain(isl_map_copy(map));\n\tis_subset = isl_set_is_subset(set1, set2);\n\tisl_set_free(set1);\n\tisl_set_free(set2);\n\tif (is_subset < 0 || !is_subset)\n\t\treturn is_subset;\n\n\tspace = isl_space_range(isl_map_get_space(map));\n\tset1 = isl_union_set_extract_set(dst->domain, space);\n\tset2 = isl_map_range(isl_map_copy(map));\n\tis_subset = isl_set_is_subset(set1, 
set2);\n\tisl_set_free(set1);\n\tisl_set_free(set2);\n\n\treturn is_subset;\n}\n\n/* Given a relation \"map\" between instances of two statements A and B,\n * are pairs of related instances executed together in the input schedule?\n * That is, is each pair of instances assigned the same value\n * by the corresponding prefix schedules?\n *\n * In particular, select the subset of \"map\" that has pairs of elements\n * with the same value for the prefix schedules and then check\n * if \"map\" is still a subset of the result.\n */\nstatic isl_bool matches_prefix(__isl_keep isl_map *map,\n\tstruct ppcg_grouping_leaf *src, struct ppcg_grouping_leaf *dst)\n{\n\tisl_union_map *umap, *equal;\n\tisl_multi_union_pw_aff *src_prefix, *dst_prefix, *prefix;\n\tisl_bool is_subset;\n\n\tsrc_prefix = isl_multi_union_pw_aff_copy(src->prefix);\n\tdst_prefix = isl_multi_union_pw_aff_copy(dst->prefix);\n\tprefix = isl_multi_union_pw_aff_union_add(src_prefix, dst_prefix);\n\n\tumap = isl_union_map_from_map(isl_map_copy(map));\n\tequal = isl_union_map_copy(umap);\n\tequal = isl_union_map_eq_at_multi_union_pw_aff(equal, prefix);\n\n\tis_subset = isl_union_map_is_subset(umap, equal);\n\n\tisl_union_map_free(umap);\n\tisl_union_map_free(equal);\n\n\treturn is_subset;\n}\n\n/* Given a set of validity and proximity schedule constraints \"map\"\n * between statements in consecutive leaves in a valid schedule,\n * should the two leaves be merged into one?\n *\n * In particular, the two are merged if the constraints form\n * a bijection between every instance of the first statement and\n * every instance of the second statement.  Moreover, each\n * pair of such dependent instances needs to be executed consecutively\n * in the input schedule.  
That is, they need to be assigned\n * the same value by their prefix schedules.\n *\n * What this means is that for each instance of the first statement\n * there is exactly one instance of the second statement that\n * is executed immediately after the instance of the first statement and\n * that, moreover, both depends on this statement instance and\n * should be brought as close as possible to this statement instance.\n * In other words, it is both possible to execute the two instances\n * together (according to the input schedule) and desirable to do so\n * (according to the validity and proximity schedule constraints).\n */\nstatic isl_stat check_merge(__isl_take isl_map *map, void *user)\n{\n\tstruct ppcg_merge_leaves_data *data = user;\n\tisl_bool ok;\n\n\tok = covers_src_and_dst(map, data->src, data->dst);\n\tif (ok >= 0 && ok)\n\t\tok = isl_map_is_bijective(map);\n\tif (ok >= 0 && ok)\n\t\tok = matches_prefix(map, data->src, data->dst);\n\n\tisl_map_free(map);\n\n\tif (ok < 0)\n\t\treturn isl_stat_error;\n\tif (!ok)\n\t\treturn isl_stat_ok;\n\n\tdata->merge = 1;\n\treturn isl_stat_error;\n}\n\n/* Merge the leaves at position \"pos\" and \"pos + 1\" in \"leaves\".\n */\nstatic isl_stat merge_pair(int n, struct ppcg_grouping_leaf leaves[n], int pos)\n{\n\tint i;\n\n\tleaves[pos].domain = isl_union_set_union(leaves[pos].domain,\n\t\t\t\t\t\tleaves[pos + 1].domain);\n\tleaves[pos].list = isl_union_set_list_concat(leaves[pos].list,\n\t\t\t\t\t\tleaves[pos + 1].list);\n\tleaves[pos].prefix = isl_multi_union_pw_aff_union_add(\n\t\t\t\tleaves[pos].prefix, leaves[pos + 1].prefix);\n\tfor (i = pos + 1; i + 1 < n; ++i)\n\t\tleaves[i] = leaves[i + 1];\n\tleaves[n - 1].domain = NULL;\n\tleaves[n - 1].list = NULL;\n\tleaves[n - 1].prefix = NULL;\n\n\tif (!leaves[pos].domain || !leaves[pos].list || !leaves[pos].prefix)\n\t\treturn isl_stat_error;\n\n\treturn isl_stat_ok;\n}\n\n/* Merge pairs of consecutive leaves in \"leaves\" taking into account\n * the intersection of 
validity and proximity schedule constraints \"dep\".\n *\n * If a leaf has been merged with the next leaf, then the combination\n * is checked again for merging with the next leaf.\n * That is, if the leaves are A, B and C, then B may not have been\n * merged with C, but after merging A and B, it could still be useful\n * to merge the combination AB with C.\n *\n * Two leaves A and B are merged if there are instances of at least\n * one pair of statements, one statement in A and one in B, such that\n * the validity and proximity schedule constraints between them\n * make them suitable for merging according to check_merge.\n *\n * Return the final number of leaves in the sequence, or -1 on error.\n */\nstatic int merge_leaves(int n, struct ppcg_grouping_leaf leaves[n],\n\t__isl_keep isl_union_map *dep)\n{\n\tint i;\n\tstruct ppcg_merge_leaves_data data;\n\n\tfor (i = n - 1; i >= 0; --i) {\n\t\tisl_union_map *dep_i;\n\t\tisl_stat ok;\n\n\t\tif (i + 1 >= n)\n\t\t\tcontinue;\n\n\t\tdep_i = isl_union_map_copy(dep);\n\t\tdep_i = isl_union_map_intersect_domain(dep_i,\n\t\t\t\tisl_union_set_copy(leaves[i].domain));\n\t\tdep_i = isl_union_map_intersect_range(dep_i,\n\t\t\t\tisl_union_set_copy(leaves[i + 1].domain));\n\t\tdata.merge = 0;\n\t\tdata.src = &leaves[i];\n\t\tdata.dst = &leaves[i + 1];\n\t\tok = isl_union_map_foreach_map(dep_i, &check_merge, &data);\n\t\tisl_union_map_free(dep_i);\n\t\tif (ok < 0 && !data.merge)\n\t\t\treturn -1;\n\t\tif (!data.merge)\n\t\t\tcontinue;\n\t\tif (merge_pair(n, leaves, i) < 0)\n\t\t\treturn -1;\n\t\t--n;\n\t\t++i;\n\t}\n\n\treturn n;\n}\n\n/* Construct a schedule with \"domain\" as domain, that executes\n * the elements of \"list\" in order (as a sequence).\n */\nstatic __isl_give isl_schedule *schedule_from_domain_and_list(\n\t__isl_keep isl_union_set *domain, __isl_keep isl_union_set_list *list)\n{\n\tisl_schedule *schedule;\n\tisl_schedule_node *node;\n\n\tschedule = 
isl_schedule_from_domain(isl_union_set_copy(domain));\n\tnode = 
isl_schedule_get_root(schedule);\n\tisl_schedule_free(schedule);\n\tnode = isl_schedule_node_child(node, 0);\n\tlist = isl_union_set_list_copy(list);\n\tnode = isl_schedule_node_insert_sequence(node, list);\n\tschedule = isl_schedule_node_get_schedule(node);\n\tisl_schedule_node_free(node);\n\n\treturn schedule;\n}\n\n/* Construct a unique identifier for a group in \"grouping\".\n *\n * The name is of the form G_n, with n the first value starting at\n * grouping->group_id that does not result in an identifier\n * that is already in use in the domain of the original schedule\n * constraints.\n */\nstatic isl_id *construct_group_id(struct ppcg_grouping *grouping,\n\t__isl_take isl_space *space)\n{\n\tisl_ctx *ctx;\n\tisl_id *id;\n\tisl_bool empty;\n\tisl_union_set *domain;\n\n\tif (!space)\n\t\treturn NULL;\n\n\tctx = isl_space_get_ctx(space);\n\tdomain = isl_schedule_constraints_get_domain(grouping->sc);\n\n\tdo {\n\t\tchar buffer[20];\n\t\tisl_id *id;\n\t\tisl_set *set;\n\n\t\tsnprintf(buffer, sizeof(buffer), \"G_%d\", grouping->group_id);\n\t\tgrouping->group_id++;\n\t\tid = isl_id_alloc(ctx, buffer, NULL);\n\t\tspace = isl_space_set_tuple_id(space, isl_dim_set, id);\n\t\tset = isl_union_set_extract_set(domain, isl_space_copy(space));\n\t\tempty = isl_set_plain_is_empty(set);\n\t\tisl_set_free(set);\n\t} while (empty >= 0 && !empty);\n\n\tif (empty < 0)\n\t\tspace = isl_space_free(space);\n\n\tid = isl_space_get_tuple_id(space, isl_dim_set);\n\n\tisl_space_free(space);\n\tisl_union_set_free(domain);\n\n\treturn id;\n}\n\n/* Construct a contraction from \"prefix\" and \"domain\" for a new group\n * in \"grouping\".\n *\n * The values of the prefix schedule \"prefix\" are used as instances\n * of the new group.  
The identifier of the group is constructed\n * in such a way that it does not conflict with those of earlier\n * groups nor with statements in the domain of the original\n * schedule constraints.\n * The isl_multi_union_pw_aff \"prefix\" then simply needs to be\n * converted to an isl_union_pw_multi_aff.  However, this is not\n * possible if \"prefix\" is zero-dimensional, so in this case,\n * a contraction is constructed from \"domain\" instead.\n */\nstatic isl_union_pw_multi_aff *group_contraction_from_prefix_and_domain(\n\tstruct ppcg_grouping *grouping,\n\t__isl_keep isl_multi_union_pw_aff *prefix,\n\t__isl_keep isl_union_set *domain)\n{\n\tisl_id *id;\n\tisl_space *space;\n\tint dim;\n\n\tspace = isl_multi_union_pw_aff_get_space(prefix);\n\tif (!space)\n\t\treturn NULL;\n\tdim = isl_space_dim(space, isl_dim_set);\n\tid = construct_group_id(grouping, space);\n\tif (dim == 0) {\n\t\tisl_multi_val *mv;\n\n\t\tspace = isl_multi_union_pw_aff_get_space(prefix);\n\t\tspace = isl_space_set_tuple_id(space, isl_dim_set, id);\n\t\tmv = isl_multi_val_zero(space);\n\t\tdomain = isl_union_set_copy(domain);\n\t\treturn isl_union_pw_multi_aff_multi_val_on_domain(domain, mv);\n\t}\n\tprefix = isl_multi_union_pw_aff_copy(prefix);\n\tprefix = isl_multi_union_pw_aff_set_tuple_id(prefix, isl_dim_out, id);\n\treturn isl_union_pw_multi_aff_from_multi_union_pw_aff(prefix);\n}\n\n/* Remove the validity schedule constraints from \"sc\" between\n * statement instances that get contracted to the same group instance\n * by the contraction described by \"prefix\" and \"domain\".\n *\n * The values of the prefix schedule \"prefix\" are used as instances\n * of the new group.  This means that validity schedule constraints\n * between instances with the same prefix schedule value need to be removed.\n * If \"prefix\" is zero-dimensional, then it does not contain any\n * information about the domain.  
Instead, those schedule constraints\n * are removed that connect pairs of instances in \"domain\".\n */\nstatic __isl_give isl_schedule_constraints *remove_group_validity(\n\t__isl_take isl_schedule_constraints *sc,\n\t__isl_keep isl_multi_union_pw_aff *prefix,\n\t__isl_keep isl_union_set *domain)\n{\n\tint n;\n\tisl_union_map *validity, *joined;\n\n\tvalidity = isl_schedule_constraints_get_validity(sc);\n\tjoined = isl_union_map_copy(validity);\n\tn = isl_multi_union_pw_aff_dim(prefix, isl_dim_out);\n\tif (n == 0) {\n\t\tjoined = isl_union_map_intersect_domain(joined,\n\t\t\t\t\t\tisl_union_set_copy(domain));\n\t\tjoined = isl_union_map_intersect_range(joined,\n\t\t\t\t\t\tisl_union_set_copy(domain));\n\t} else {\n\t\tjoined = isl_union_map_eq_at_multi_union_pw_aff(joined,\n\t\t\t\t\tisl_multi_union_pw_aff_copy(prefix));\n\t}\n\tvalidity = isl_union_map_subtract(validity, joined);\n\tsc = isl_schedule_constraints_set_validity(sc, validity);\n\treturn sc;\n}\n\n/* Extend \"grouping\" with groups corresponding to merged\n * leaves in the list of potentially merged leaves \"leaves\".\n *\n * The \"list\" field of each element in \"leaves\" contains a list\n * of the instance sets of the original leaves that have been\n * merged into this element.  
If at least two of the original leaves\n * have been merged into a given element, then add the corresponding\n * group to \"grouping\" and remove validity schedule constraints\n * between statement instances that get mapped to the same group instance.\n * In particular, the domain is extended with the statement instances\n * of the merged leaves, the contraction is extended with a mapping\n * of these statement instances to instances of a new group and\n * the schedule is extended with a schedule that executes\n * the statement instances according to the order of the leaves\n * in which they appear.\n * Since the instances of the groups should already be scheduled apart\n * in the schedule into which this schedule will be plugged in,\n * the schedules of the individual groups are combined independently\n * of each other (as a set).\n */\nstatic isl_stat add_groups(struct ppcg_grouping *grouping,\n\tint n, struct ppcg_grouping_leaf leaves[n])\n{\n\tint i;\n\n\tfor (i = 0; i < n; ++i) {\n\t\tint n_leaf;\n\t\tisl_schedule *schedule;\n\t\tisl_union_set *domain;\n\t\tisl_union_pw_multi_aff *upma;\n\n\t\tn_leaf = isl_union_set_list_n_union_set(leaves[i].list);\n\t\tif (n_leaf < 0)\n\t\t\treturn isl_stat_error;\n\t\tif (n_leaf <= 1)\n\t\t\tcontinue;\n\t\tschedule = schedule_from_domain_and_list(leaves[i].domain,\n\t\t\t\t\t\t\tleaves[i].list);\n\t\tupma = group_contraction_from_prefix_and_domain(grouping,\n\t\t\t\t\tleaves[i].prefix, leaves[i].domain);\n\t\tgrouping->sc = remove_group_validity(grouping->sc,\n\t\t\t\t\tleaves[i].prefix, leaves[i].domain);\n\n\t\tdomain = isl_union_set_copy(leaves[i].domain);\n\t\tif (grouping->domain) {\n\t\t\tdomain = isl_union_set_union(domain, grouping->domain);\n\t\t\tupma = isl_union_pw_multi_aff_union_add(upma,\n\t\t\t\t\t\tgrouping->contraction);\n\t\t\tschedule = isl_schedule_set(schedule,\n\t\t\t\t\t\tgrouping->schedule);\n\t\t}\n\t\tgrouping->domain = domain;\n\t\tgrouping->contraction = upma;\n\t\tgrouping->schedule = 
schedule;\n\n\t\tif (!grouping->domain || !grouping->contraction ||\n\t\t    !grouping->schedule)\n\t\t\treturn isl_stat_error;\n\t}\n\n\treturn isl_stat_ok;\n}\n\n/* Look for any pairs of consecutive leaves among the \"n\" children of \"node\"\n * starting at \"first\" that should be merged together.\n * Store the results in \"grouping\".\n *\n * First make sure the intersection of validity and proximity\n * schedule constraints is available and extract the required\n * information from the \"n\" leaves.\n * Then try to merge consecutive leaves based on the validity\n * and proximity constraints.\n * If any pairs were successfully merged, then add groups\n * corresponding to the merged leaves to \"grouping\".\n * The extracted leaves are freed on both the success and\n * the error paths.\n */\nstatic isl_stat group_subsequence(__isl_keep isl_schedule_node *node,\n\tint first, int n, struct ppcg_grouping *grouping)\n{\n\tint n_merge;\n\tstruct ppcg_grouping_leaf *leaves;\n\n\tif (ppcg_grouping_compute_dep(grouping) < 0)\n\t\treturn isl_stat_error;\n\n\tleaves = extract_leaves(node, first, n);\n\tif (!leaves)\n\t\treturn isl_stat_error;\n\n\tn_merge = merge_leaves(n, leaves, grouping->dep);\n\tif (n_merge < 0 || (n_merge < n &&\n\t    add_groups(grouping, n_merge, leaves) < 0)) {\n\t\t/* Release the leaves on error as well to avoid a leak. */\n\t\tppcg_grouping_leaf_free(n, leaves);\n\t\treturn isl_stat_error;\n\t}\n\n\tppcg_grouping_leaf_free(n, leaves);\n\n\treturn isl_stat_ok;\n}\n\n/* If \"node\" is a sequence, then check if it has any consecutive\n * leaves that should be merged together and store the results\n * in \"grouping\".\n *\n * In particular, call group_subsequence on each consecutive\n * sequence of (filtered) leaves among the children of \"node\".\n */\nstatic isl_bool detect_groups(__isl_keep isl_schedule_node *node, void *user)\n{\n\tint i, n, first;\n\tstruct ppcg_grouping *grouping = user;\n\n\tif 
(isl_schedule_node_get_type(node) != isl_schedule_node_sequence)\n\t\treturn isl_bool_true;\n\n\tn = isl_schedule_node_n_children(node);\n\tif (n < 0)\n\t\treturn isl_bool_error;\n\n\tfirst = -1;\n\tfor (i = 0; i < n; ++i) 
{\n\t\tisl_schedule_node *child;\n\t\tenum isl_schedule_node_type type;\n\n\t\tchild = isl_schedule_node_get_child(node, i);\n\t\tchild = isl_schedule_node_child(child, 0);\n\t\ttype = isl_schedule_node_get_type(child);\n\t\tisl_schedule_node_free(child);\n\n\t\tif (first >= 0 && type != isl_schedule_node_leaf) {\n\t\t\tif (group_subsequence(node, first, i - first,\n\t\t\t\t\t\tgrouping) < 0)\n\t\t\t\treturn isl_bool_error;\n\t\t\tfirst = -1;\n\t\t}\n\t\tif (first < 0 && type == isl_schedule_node_leaf)\n\t\t\tfirst = i;\n\t}\n\tif (first >= 0) {\n\t\tif (group_subsequence(node, first, n - first, grouping) < 0)\n\t\t\treturn isl_bool_error;\n\t}\n\n\treturn isl_bool_true;\n}\n\n/* Complete \"grouping\" to cover all statement instances in the domain\n * of grouping->sc.\n *\n * In particular, grouping->domain is set to the full set of statement\n * instances; grouping->contraction is extended with an identity\n * contraction on the additional instances and grouping->schedule\n * is extended with an independent schedule on those additional instances.\n * In the extension of grouping->contraction, the additional instances\n * are split into those that belong to statements that do not appear\n * in any group and those that belong to statements that also appear\n * in some group.  
The former\n * are replaced by their universe in order to simplify the contraction\n * extension.\n */\nstatic void complete_grouping(struct ppcg_grouping *grouping)\n{\n\tisl_union_set *domain, *left, *overlap;\n\tisl_union_pw_multi_aff *upma;\n\tisl_schedule *schedule;\n\n\tdomain = isl_schedule_constraints_get_domain(grouping->sc);\n\tleft = isl_union_set_subtract(isl_union_set_copy(domain),\n\t\t\t\t    isl_union_set_copy(grouping->domain));\n\tschedule = isl_schedule_from_domain(isl_union_set_copy(left));\n\tschedule = isl_schedule_set(schedule, grouping->schedule);\n\tgrouping->schedule = schedule;\n\n\toverlap = isl_union_set_universe(grouping->domain);\n\tgrouping->domain = domain;\n\toverlap = isl_union_set_intersect(isl_union_set_copy(left), overlap);\n\tleft = isl_union_set_subtract(left, isl_union_set_copy(overlap));\n\tleft = isl_union_set_universe(left);\n\tleft = isl_union_set_union(left, overlap);\n\tupma = isl_union_set_identity_union_pw_multi_aff(left);\n\tupma = isl_union_pw_multi_aff_union_add(upma, grouping->contraction);\n\tgrouping->contraction = upma;\n}\n\n/* Report that the given grouping is used during scheduling\n * (if the verbose option is set).\n */\nstatic void report_grouping(__isl_keep isl_union_pw_multi_aff *contraction,\n\tstruct ppcg_options *options)\n{\n\tisl_ctx *ctx;\n\tisl_printer *p;\n\n\tif (!options->debug->verbose)\n\t\treturn;\n\n\tctx = isl_union_pw_multi_aff_get_ctx(contraction);\n\tp = isl_printer_to_file(ctx, stdout);\n\tp = isl_printer_print_str(p, \"Scheduling performed with grouping \");\n\tp = isl_printer_print_union_pw_multi_aff(p, contraction);\n\tp = isl_printer_print_str(p, \" (use --no-group-chains to disable)\");\n\tp = isl_printer_end_line(p);\n\tisl_printer_free(p);\n}\n\n/* Compute a schedule on the domain of \"sc\" that respects the schedule\n * constraints in \"sc\", after trying to combine groups of statements.\n *\n * \"schedule\" is a known correct schedule that is used while combining\n * groups 
of statements.\n * In particular, statements that are executed consecutively in a sequence\n * in this schedule and where all instances of the second depend on\n * the instance of the first that is executed in the same iteration\n * of outer band nodes are grouped together into a single statement.\n * The schedule constraints are then mapped to these groups of statements\n * and the resulting schedule is expanded again to refer to the original\n * statements.\n */\n__isl_give isl_schedule *ppcg_compute_grouping_schedule(\n\t__isl_take isl_schedule_constraints *sc,\n\t__isl_keep isl_schedule *schedule, struct ppcg_options *options)\n{\n\tstruct ppcg_grouping grouping = { sc };\n\tisl_union_pw_multi_aff *contraction;\n\tisl_union_map *umap;\n\tisl_schedule *res, *expansion;\n\n\tgrouping.group_id = 0;\n\tif (isl_schedule_foreach_schedule_node_top_down(schedule,\n\t\t\t&detect_groups, &grouping) < 0)\n\t\tgoto error;\n\tif (!grouping.contraction) {\n\t\tppcg_grouping_clear(&grouping);\n\t\treturn ppcg_compute_non_grouping_schedule(grouping.sc, options);\n\t}\n\tcomplete_grouping(&grouping);\n\tcontraction = isl_union_pw_multi_aff_copy(grouping.contraction);\n\treport_grouping(contraction, options);\n\tumap = isl_union_map_from_union_pw_multi_aff(contraction);\n\n\tsc = isl_schedule_constraints_apply(grouping.sc, umap);\n\n\tres = ppcg_compute_non_grouping_schedule(sc, options);\n\n\tcontraction = isl_union_pw_multi_aff_copy(grouping.contraction);\n\texpansion = isl_schedule_copy(grouping.schedule);\n\tres = isl_schedule_expand(res, contraction, expansion);\n\n\tppcg_grouping_clear(&grouping);\n\treturn res;\nerror:\n\tppcg_grouping_clear(&grouping);\n\t/* grouping.sc may have been replaced by remove_group_validity,\n\t * so free that pointer rather than the original \"sc\". */\n\tisl_schedule_constraints_free(grouping.sc);\n\treturn NULL;\n}\n"
  },
  {
    "path": "src/grouping.h",
    "content": "#ifndef PPCG_GROUPING_H\n#define PPCG_GROUPING_H\n\n#include <isl/schedule.h>\n\n#include \"ppcg_options.h\"\n\n__isl_give isl_schedule *ppcg_compute_grouping_schedule(\n\t\t__isl_take isl_schedule_constraints *sc,\n\t\t__isl_keep isl_schedule *schedule, struct ppcg_options *options);\n\n#endif\n"
  },
  {
    "path": "src/hybrid.c",
    "content": "/*\n * Copyright 2013      Ecole Normale Superieure\n * Copyright 2015      Sven Verdoolaege\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege,\n * Ecole Normale Superieure, 45 rue d'Ulm, 75230 Paris, France\n */\n\n#include <string.h>\n\n#include <isl/space.h>\n#include <isl/constraint.h>\n#include <isl/val.h>\n#include <isl/aff.h>\n#include <isl/set.h>\n#include <isl/map.h>\n#include <isl/union_set.h>\n#include <isl/union_map.h>\n\n#include \"hybrid.h\"\n#include \"schedule.h\"\n\n/* The hybrid tiling implemented in this file is based on\n * Grosser et al., \"Hybrid Hexagonal/Classical Tiling for GPUs\".\n */\n\n/* Bounds on relative dependence distances in input to hybrid tiling.\n * upper is an upper bound on the relative dependence distances\n * in the first space dimension\n * -lower is a lower bound on the relative dependence distances\n * in all space dimensions.\n *\n * In particular,\n *\n *\td_i >= -lower_i d_0\n * and\n *\td_1 <= upper d_0\n *\n * for each dependence distance vector d, where d_1 is the component\n * corresponding to the first space dimension.\n *\n * upper and lower are always non-negative.\n * Some of the values may be NaN if no bound could be found.\n */\nstruct ppcg_ht_bounds {\n\tisl_val *upper;\n\tisl_multi_val *lower;\n};\n\n/* Free \"bounds\" along with all its fields.\n */\n__isl_null ppcg_ht_bounds *ppcg_ht_bounds_free(\n\t__isl_take ppcg_ht_bounds *bounds)\n{\n\tif (!bounds)\n\t\treturn NULL;\n\tisl_val_free(bounds->upper);\n\tisl_multi_val_free(bounds->lower);\n\tfree(bounds);\n\n\treturn NULL;\n}\n\n/* Create a ppcg_ht_bounds object for a band living in \"space\".\n * The bounds are initialized to NaN.\n */\n__isl_give ppcg_ht_bounds *ppcg_ht_bounds_alloc(__isl_take isl_space *space)\n{\n\tint i, n;\n\tisl_ctx *ctx;\n\tppcg_ht_bounds *bounds;\n\n\tif (!space)\n\t\treturn NULL;\n\n\tctx = isl_space_get_ctx(space);\n\tbounds = isl_alloc_type(ctx, struct 
ppcg_ht_bounds);\n\tif (!bounds)\n\t\tgoto error;\n\tbounds->upper = isl_val_nan(ctx);\n\tbounds->lower = isl_multi_val_zero(space);\n\tn = isl_multi_val_dim(bounds->lower, isl_dim_set);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_val *v = isl_val_copy(bounds->upper);\n\t\tbounds->lower = isl_multi_val_set_val(bounds->lower, i, v);\n\t}\n\n\tif (!bounds->lower || !bounds->upper)\n\t\treturn ppcg_ht_bounds_free(bounds);\n\n\treturn bounds;\nerror:\n\tisl_space_free(space);\n\treturn NULL;\n}\n\nvoid ppcg_ht_bounds_dump(__isl_keep ppcg_ht_bounds *bounds)\n{\n\tif (!bounds)\n\t\treturn;\n\n\tfprintf(stderr, \"lower: \");\n\tisl_multi_val_dump(bounds->lower);\n\tfprintf(stderr, \"upper: \");\n\tisl_val_dump(bounds->upper);\n}\n\n/* Return the upper bound on the relative dependence distances\n * in the first space dimension.\n */\n__isl_give isl_val *ppcg_ht_bounds_get_upper(__isl_keep ppcg_ht_bounds *bounds)\n{\n\tif (!bounds)\n\t\treturn NULL;\n\treturn isl_val_copy(bounds->upper);\n}\n\n/* Replace the upper bound on the relative dependence distances\n * in the first space dimension by \"upper\".\n */\n__isl_give ppcg_ht_bounds *ppcg_ht_bounds_set_upper(\n\t__isl_take ppcg_ht_bounds *bounds, __isl_take isl_val *upper)\n{\n\tif (!bounds || !upper)\n\t\tgoto error;\n\tisl_val_free(bounds->upper);\n\tbounds->upper = upper;\n\treturn bounds;\nerror:\n\tppcg_ht_bounds_free(bounds);\n\tisl_val_free(upper);\n\treturn NULL;\n}\n\n/* Return the lower bound on the relative dependence distances\n * in space dimension \"pos\".\n */\n__isl_give isl_val *ppcg_ht_bounds_get_lower(__isl_keep ppcg_ht_bounds *bounds,\n\tint pos)\n{\n\tif (!bounds)\n\t\treturn NULL;\n\treturn isl_multi_val_get_val(bounds->lower, pos);\n}\n\n/* Replace the lower bound on the relative dependence distances\n * in space dimension \"pos\" by \"lower\".\n */\n__isl_give ppcg_ht_bounds *ppcg_ht_bounds_set_lower(\n\t__isl_take ppcg_ht_bounds *bounds, int pos, __isl_take isl_val *lower)\n{\n\tif (!bounds || 
!lower)\n\t\tgoto error;\n\tbounds->lower = isl_multi_val_set_val(bounds->lower, pos, lower);\n\tif (!bounds->lower)\n\t\treturn ppcg_ht_bounds_free(bounds);\n\treturn bounds;\nerror:\n\tppcg_ht_bounds_free(bounds);\n\tisl_val_free(lower);\n\treturn NULL;\n}\n\n/* Can the bounds on relative dependence distances recorded in \"bounds\"\n * be used to perform hybrid tiling?\n * In particular, have appropriate lower and upper bounds been found?\n * Any NaN indicates that no corresponding bound was found.\n */\nisl_bool ppcg_ht_bounds_is_valid(__isl_keep ppcg_ht_bounds *bounds)\n{\n\tisl_bool is_nan;\n\tint i, n;\n\n\tif (!bounds)\n\t\treturn isl_bool_error;\n\tis_nan = isl_val_is_nan(bounds->upper);\n\tif (is_nan < 0)\n\t\treturn isl_bool_error;\n\tif (is_nan)\n\t\treturn isl_bool_false;\n\n\tn = isl_multi_val_dim(bounds->lower, isl_dim_set);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_val *v;\n\n\t\tv = isl_multi_val_get_val(bounds->lower, i);\n\t\tis_nan = isl_val_is_nan(v);\n\t\t/* Free \"v\" before any early return to avoid a leak. */\n\t\tisl_val_free(v);\n\t\tif (is_nan < 0)\n\t\t\treturn isl_bool_error;\n\t\tif (is_nan)\n\t\t\treturn isl_bool_false;\n\t}\n\n\treturn isl_bool_true;\n}\n\n/* Structure that represents the basic hexagonal tiling,\n * along with information that is needed to perform the hybrid tiling.\n *\n * \"bounds\" are the bounds on the dependence distances that\n * define the hexagonal shape and the required skewing in the remaining\n * space dimensions.\n *\n * \"input_node\" points to the input pair of band nodes.\n * \"input_schedule\" is the partial schedule of this input pair of band nodes.\n * The space of this schedule is [P -> C], where P is the space\n * of the parent node and C is the space of the child node.\n *\n * \"space_sizes\" represents the total size of a tile for the space\n * dimensions, i.e., those corresponding to the child node.\n * The space of \"space_sizes\" is C.\n * If S_0 is the original tile size in the first space dimension,\n * then 
the first entry of \"space_sizes\" is equal to\n 
* W = 2*S_0 + floor(d_l h) + floor(d_u h).\n * The remaining entries are the same as in the original tile sizes.\n *\n * The basic hexagonal tiling \"hex\" is defined\n * in a \"ts\" (time-space) space and corresponds to the phase-1 tiles.\n * \"time_tile\" maps the \"ts\" space to the outer time tile.\n * It is equal to ts[t, s] -> floor(t/(2 * S_t)), with S_t the original tile\n * size corresponding to the parent node.\n * \"local_time\" maps the \"ts\" space to the time dimension inside each tile.\n * It is equal to ts[t, s] -> t mod (2 S_t), with S_t the original tile\n * size corresponding to the parent node.\n * \"shift_space\" shifts the tiles at time tile T = floor(t/(2 S_t))\n * in the space dimension such that they align to a multiple of W.\n * It is equal to ts[t, s] -> s + (-(2 * shift_s)*T) % W,\n * with shift_s = S_0 + floor(d_u h).\n * \"shift_phase\" is the shift taken to go from phase 0 to phase 1.\n * It is equal to ts[t, s] -> ts[t + S_t, s + shift_s],\n * with shift_s = S_0 + floor(d_u h).\n *\n * \"project_ts\" projects the space of the input schedule to the ts-space.\n * It is equal to [P[t] -> C[s_0, ...]] -> ts[t, s_0].\n */\nstruct ppcg_ht_tiling {\n\tint ref;\n\n\tppcg_ht_bounds *bounds;\n\tisl_schedule_node *input_node;\n\tisl_multi_union_pw_aff *input_schedule;\n\n\tisl_multi_val *space_sizes;\n\n\tisl_aff *time_tile;\n\tisl_aff *local_time;\n\tisl_aff *shift_space;\n\tisl_multi_aff *shift_phase;\n\tisl_set *hex;\n\n\tisl_multi_aff *project_ts;\n};\ntypedef struct ppcg_ht_tiling ppcg_ht_tiling;\n\n/* Return the space of the pair of band nodes that form the input\n * to the hybrid tiling.\n * In particular, return the space [P -> C], where P is the space\n * of the parent node and C is the space of the child node.\n */\n__isl_give isl_space *ppcg_ht_tiling_get_input_space(\n\t__isl_keep ppcg_ht_tiling *tile)\n{\n\tif (!tile)\n\t\treturn NULL;\n\n\treturn isl_multi_union_pw_aff_get_space(tile->input_schedule);\n}\n\n/* Remove a reference to 
\"tiling\" and free \"tiling\" along with all its fields\n * as soon as the reference count drops to zero.\n */\nstatic __isl_null ppcg_ht_tiling *ppcg_ht_tiling_free(\n\t__isl_take ppcg_ht_tiling *tiling)\n{\n\tif (!tiling)\n\t\treturn NULL;\n\tif (--tiling->ref > 0)\n\t\treturn NULL;\n\n\tppcg_ht_bounds_free(tiling->bounds);\n\tisl_schedule_node_free(tiling->input_node);\n\tisl_multi_union_pw_aff_free(tiling->input_schedule);\n\tisl_multi_val_free(tiling->space_sizes);\n\tisl_aff_free(tiling->time_tile);\n\tisl_aff_free(tiling->local_time);\n\tisl_aff_free(tiling->shift_space);\n\tisl_multi_aff_free(tiling->shift_phase);\n\tisl_set_free(tiling->hex);\n\tisl_multi_aff_free(tiling->project_ts);\n\tfree(tiling);\n\n\treturn NULL;\n}\n\n/* Return a new reference to \"tiling\".\n */\n__isl_give ppcg_ht_tiling *ppcg_ht_tiling_copy(\n\t__isl_keep ppcg_ht_tiling *tiling)\n{\n\tif (!tiling)\n\t\treturn NULL;\n\n\ttiling->ref++;\n\treturn tiling;\n}\n\n/* Return the isl_ctx to which \"tiling\" belongs.\n */\nisl_ctx *ppcg_ht_tiling_get_ctx(__isl_keep ppcg_ht_tiling *tiling)\n{\n\tif (!tiling)\n\t\treturn NULL;\n\n\treturn isl_multi_union_pw_aff_get_ctx(tiling->input_schedule);\n}\n\n/* Representation of one of the two phases of hybrid tiling.\n *\n * \"tiling\" points to the shared tiling data.\n *\n * \"time_tile\", \"local_time\" and \"shift_space\" are equal to the corresponding\n * fields of \"tiling\", pulled back to the input space.\n * In case of phase 0, these expressions have also been moved\n * from phase 1 to phase 0.\n *\n * \"domain\" contains the hexagonal tiling of this phase.\n *\n * \"space_shift\" is the shift that should be added to the space band\n * in order to be able to apply rectangular tiling to the space.\n * For phase 1, it is equal to\n *\n *\t[P[t] -> C[s_0, s_i]] -> C[(-(2 * shift_s)*T) % W, dl_i * u]\n *\n * with shift_s = S_0 + floor(d_u h),\n * T equal to \"time_tile\" and u equal to \"local_time\".\n * For phase 0, it is equal to\n *\n 
*\t[P[t] -> C[s_0, s_i]] -> C[shift_s + (-(2 * shift_s)*T) % W, dl_i * u]\n *\n * \"space_tile\" is the space tiling.  It is equal to\n *\n *\t[P[t] -> C[s]] -> C[floor((s + space_shift)/space_size)]\n */\nstruct ppcg_ht_phase {\n\tppcg_ht_tiling *tiling;\n\n\tisl_aff *time_tile;\n\tisl_aff *local_time;\n\tisl_aff *shift_space;\n\tisl_set *domain;\n\n\tisl_multi_aff *space_shift;\n\tisl_multi_aff *space_tile;\n};\n\n/* Free \"phase\" along with all its fields.\n */\nstatic __isl_null ppcg_ht_phase *ppcg_ht_phase_free(\n\t__isl_take ppcg_ht_phase *phase)\n{\n\tif (!phase)\n\t\treturn NULL;\n\n\tppcg_ht_tiling_free(phase->tiling);\n\tisl_aff_free(phase->time_tile);\n\tisl_aff_free(phase->local_time);\n\tisl_aff_free(phase->shift_space);\n\tisl_set_free(phase->domain);\n\tisl_multi_aff_free(phase->space_shift);\n\tisl_multi_aff_free(phase->space_tile);\n\tfree(phase);\n\n\treturn NULL;\n}\n\n/* Wrapper around ppcg_ht_phase_free for use as an argument\n * to isl_id_set_free_user.\n */\nstatic void ppcg_ht_phase_free_wrap(void *user)\n{\n\tppcg_ht_phase *phase = user;\n\n\tppcg_ht_phase_free(phase);\n}\n\n/* Return the domain of hybrid tiling phase \"phase\".\n */\nstatic __isl_give isl_set *ppcg_ht_phase_get_domain(ppcg_ht_phase *phase)\n{\n\tif (!phase)\n\t\treturn NULL;\n\n\treturn isl_set_copy(phase->domain);\n}\n\n/* Return the space of the pair of band nodes that form the input\n * to the hybrid tiling of which \"phase\" is a phase.\n * In particular, return the space [P -> C], where P is the space\n * of the parent node and C is the space of the child node.\n */\nstatic __isl_give isl_space *ppcg_ht_phase_get_input_space(\n\t__isl_keep ppcg_ht_phase *phase)\n{\n\tif (!phase)\n\t\treturn NULL;\n\n\treturn ppcg_ht_tiling_get_input_space(phase->tiling);\n}\n\n/* Construct the lower left constraint of the hexagonal tile, i.e.,\n *\n *\tdu a - b <= (2h+1) du - duh\n *\t-du a + b + (2h+1) du - duh >= 0\n *\n * where duh = floor(du * h).\n *\n * This constraint 
corresponds to (6) in\n * \"Hybrid Hexagonal/Classical Tiling for GPUs\".\n */\nstatic __isl_give isl_constraint *hex_lower_left(__isl_take isl_local_space *ls,\n\t__isl_keep isl_val *h, __isl_keep isl_val *du, __isl_keep isl_val *duh)\n{\n\tisl_val *v;\n\tisl_aff *aff;\n\n\tv = isl_val_add_ui(isl_val_mul_ui(isl_val_copy(h), 2), 1);\n\tv = isl_val_mul(v, isl_val_copy(du));\n\tv = isl_val_sub(v, isl_val_copy(duh));\n\taff = isl_aff_val_on_domain(ls, v);\n\tv = isl_val_neg(isl_val_copy(du));\n\taff = isl_aff_set_coefficient_val(aff, isl_dim_in, 0, v);\n\taff = isl_aff_set_coefficient_si(aff, isl_dim_in, 1, 1);\n\n\treturn isl_inequality_from_aff(aff);\n}\n\n/* Construct the lower constraint of the hexagonal tile, i.e.,\n *\n *\ta <= 2h+1\n *\t-a + 2h+1 >= 0\n *\n * This constraint corresponds to (7) in\n * \"Hybrid Hexagonal/Classical Tiling for GPUs\".\n */\nstatic __isl_give isl_constraint *hex_lower(__isl_take isl_local_space *ls,\n\t__isl_keep isl_val *h)\n{\n\tisl_val *v;\n\tisl_aff *aff;\n\n\tv = isl_val_add_ui(isl_val_mul_ui(isl_val_copy(h), 2), 1);\n\taff = isl_aff_val_on_domain(ls, v);\n\taff = isl_aff_set_coefficient_si(aff, isl_dim_in, 0, -1);\n\n\treturn isl_inequality_from_aff(aff);\n}\n\n/* Construct the lower right constraint of the hexagonal tile, i.e.,\n *\n *\tdl a + b <= (2h+1) dl + duh + (s0-1)\n *\t-dl a - b + (2h+1) dl + duh + (s0-1) >= 0\n *\n * where duh = floor(du * h).\n *\n * This constraint corresponds to (8) in\n * \"Hybrid Hexagonal/Classical Tiling for GPUs\".\n */\nstatic __isl_give isl_constraint *hex_lower_right(\n\t__isl_take isl_local_space *ls, __isl_keep isl_val *h,\n\t__isl_keep isl_val *s0, __isl_keep isl_val *dl, __isl_keep isl_val *duh)\n{\n\tisl_val *v;\n\tisl_aff *aff;\n\n\tv = isl_val_add_ui(isl_val_mul_ui(isl_val_copy(h), 2), 1);\n\tv = isl_val_mul(v, isl_val_copy(dl));\n\tv = isl_val_add(v, isl_val_copy(duh));\n\tv = isl_val_add(v, isl_val_copy(s0));\n\tv = isl_val_sub_ui(v, 1);\n\taff = isl_aff_val_on_domain(ls, 
v);\n\tv = isl_val_neg(isl_val_copy(dl));\n\taff = isl_aff_set_coefficient_val(aff, isl_dim_in, 0, v);\n\taff = isl_aff_set_coefficient_si(aff, isl_dim_in, 1, -1);\n\n\treturn isl_inequality_from_aff(aff);\n}\n\n/* Construct the upper left constraint of the hexagonal tile, i.e.,\n *\n *\tdl a + b >= h dl - (d - 1)/d\t\t\t\twith d = den(dl)\n *\tdl a + b - h dl + (d - 1)/d >= 0\n *\n * This constraint corresponds to (10) in\n * \"Hybrid Hexagonal/Classical Tiling for GPUs\".\n */\nstatic __isl_give isl_constraint *hex_upper_left(__isl_take isl_local_space *ls,\n\t__isl_keep isl_val *h, __isl_keep isl_val *dl)\n{\n\tisl_val *v, *d;\n\tisl_aff *aff;\n\n\td = isl_val_get_den_val(dl);\n\tv = isl_val_sub_ui(isl_val_copy(d), 1);\n\tv = isl_val_div(v, d);\n\tv = isl_val_sub(v, isl_val_mul(isl_val_copy(h), isl_val_copy(dl)));\n\taff = isl_aff_val_on_domain(ls, v);\n\taff = isl_aff_set_coefficient_val(aff, isl_dim_in, 0, isl_val_copy(dl));\n\taff = isl_aff_set_coefficient_si(aff, isl_dim_in, 1, 1);\n\n\treturn isl_inequality_from_aff(aff);\n}\n\n/* Construct the upper right constraint of the hexagonal tile, i.e.,\n *\n *\tdu a - b >= du h - duh - (s0-1) - dlh - (d - 1)/d\twith d = den(du)\n *\tdu a - b - du h + duh + (s0-1) + dlh + (d - 1)/d >= 0\n *\n * where dlh = floor(dl * h) and duh = floor(du * h).\n *\n * This constraint corresponds to (12) in\n * \"Hybrid Hexagonal/Classical Tiling for GPUs\".\n */\nstatic __isl_give isl_constraint *hex_upper_right(\n\t__isl_take isl_local_space *ls, __isl_keep isl_val *h,\n\t__isl_keep isl_val *s0, __isl_keep isl_val *du,\n\t__isl_keep isl_val *dlh, __isl_keep isl_val *duh)\n{\n\tisl_val *v, *d;\n\tisl_aff *aff;\n\n\td = isl_val_get_den_val(du);\n\tv = isl_val_sub_ui(isl_val_copy(d), 1);\n\tv = isl_val_div(v, d);\n\tv = isl_val_sub(v, isl_val_mul(isl_val_copy(h), isl_val_copy(du)));\n\tv = isl_val_add(v, isl_val_copy(duh));\n\tv = isl_val_add(v, isl_val_copy(dlh));\n\tv = isl_val_add(v, isl_val_copy(s0));\n\tv = isl_val_sub_ui(v, 
1);\n\taff = isl_aff_val_on_domain(ls, v);\n\taff = isl_aff_set_coefficient_val(aff, isl_dim_in, 0, isl_val_copy(du));\n\taff = isl_aff_set_coefficient_si(aff, isl_dim_in, 1, -1);\n\n\treturn isl_inequality_from_aff(aff);\n}\n\n/* Construct the upper constraint of the hexagonal tile, i.e.,\n *\n *\ta >= 0\n *\n * This constraint corresponds to (13) in\n * \"Hybrid Hexagonal/Classical Tiling for GPUs\".\n */\nstatic __isl_give isl_constraint *hex_upper(__isl_take isl_local_space *ls)\n{\n\tisl_aff *aff;\n\n\taff = isl_aff_var_on_domain(ls, isl_dim_set, 0);\n\n\treturn isl_inequality_from_aff(aff);\n}\n\n/* Construct the basic hexagonal tile shape.\n * \"space\" is the 2D space in which the hexagon should be constructed.\n * h is st-1, with st the tile size in the time dimension\n * s0 is the tile size in the space dimension\n * dl is a bound on the negative relative dependence distances, i.e.,\n *\n *\td_s >= -dl d_t\n *\n * du is a bound on the positive relative dependence distances, i.e.,\n *\n *\td_s <= du d_t\n *\n * with (d_t,d_s) any dependence distance vector.\n * dlh = floor(dl * h)\n * duh = floor(du * h)\n *\n * The shape of the hexagon is as follows:\n *\n *\t\t0 dlh   dlh+s0-1\n *\t\t   ______                __\n * 0\t\t  /      \\_             /\n *\t\t /         \\_          /\n * h\t\t/            \\ ______ /\n * h+1\t\t\\_           //      \\\\_\n *\t\t  \\_        //         \\\\_\n * 2h+1\t\t    \\______//            \\\\\n *\t\t0   duh   duh+s0-1\n *\t\t             duh+s0-1+dlh\n *\t\t                  duh+s0-1+dlh+1+s0+1\n *\n * The next hexagon is shifted by duh + dlh + 2 * s0.\n *\n * The slope of the \"/\" constraints is dl.\n * The slope of the \"\\_\" constraints is du.\n */\nstatic __isl_give isl_set *compute_hexagon(__isl_take isl_space *space,\n\t__isl_keep isl_val *h, __isl_keep isl_val *s0,\n\t__isl_keep isl_val *dl, __isl_keep isl_val *du,\n\t__isl_keep isl_val *dlh, __isl_keep isl_val *duh)\n{\n\tisl_local_space 
*ls;\n\tisl_constraint *c;\n\tisl_basic_set *bset;\n\n\tls = isl_local_space_from_space(space);\n\n\tc = hex_lower_left(isl_local_space_copy(ls), h, du, duh);\n\tbset = isl_basic_set_from_constraint(c);\n\n\tc = hex_lower(isl_local_space_copy(ls), h);\n\tbset = isl_basic_set_add_constraint(bset, c);\n\n\tc = hex_lower_right(isl_local_space_copy(ls), h, s0, dl, duh);\n\tbset = isl_basic_set_add_constraint(bset, c);\n\n\tc = hex_upper_left(isl_local_space_copy(ls), h, dl);\n\tbset = isl_basic_set_add_constraint(bset, c);\n\n\tc = hex_upper_right(isl_local_space_copy(ls), h, s0, du, dlh, duh);\n\tbset = isl_basic_set_add_constraint(bset, c);\n\n\tc = hex_upper(ls);\n\tbset = isl_basic_set_add_constraint(bset, c);\n\n\treturn isl_set_from_basic_set(bset);\n}\n\n/* Name of the ts-space.\n */\nstatic const char *ts_space_name = \"ts\";\n\n/* Construct and return the space ts[t, s].\n */\nstatic __isl_give isl_space *construct_ts_space(isl_ctx *ctx)\n{\n\tisl_space *s;\n\n\ts = isl_space_set_alloc(ctx, 0, 2);\n\ts = isl_space_set_tuple_name(s, isl_dim_set, ts_space_name);\n\n\treturn s;\n}\n\n/* Name of the local ts-space.\n */\nstatic const char *local_ts_space_name = \"local_ts\";\n\n/* Construct and return the space local_ts[t, s].\n */\nstatic __isl_give isl_space *construct_local_ts_space(isl_ctx *ctx)\n{\n\tisl_space *s;\n\n\ts = isl_space_set_alloc(ctx, 0, 2);\n\ts = isl_space_set_tuple_name(s, isl_dim_set, local_ts_space_name);\n\n\treturn s;\n}\n\n/* Compute the total size of a tile for the space dimensions,\n * i.e., those corresponding to the child node\n * of the input pattern.\n * If S_0 is the original tile size in the first space dimension,\n * then the first entry of \"space_sizes\" is equal to\n * W = 2*S_0 + floor(d_l h) + floor(d_u h).\n * The remaining entries are the same as in the original tile sizes.\n * \"tile_sizes\" contains the original tile sizes, including\n * the tile size corresponding to the parent node.\n * \"dlh\" is equal to floor(d_l 
h).\n * \"duh\" is equal to floor(d_u h).\n */\nstatic __isl_give isl_multi_val *compute_space_sizes(\n\t__isl_keep isl_multi_val *tile_sizes,\n\t__isl_keep isl_val *dlh, __isl_keep isl_val *duh)\n{\n\tisl_val *size;\n\tisl_multi_val *space_sizes;\n\n\tspace_sizes = isl_multi_val_copy(tile_sizes);\n\tspace_sizes = isl_multi_val_factor_range(space_sizes);\n\tsize = isl_multi_val_get_val(space_sizes, 0);\n\tsize = isl_val_mul_ui(size, 2);\n\tsize = isl_val_add(size, isl_val_copy(duh));\n\tsize = isl_val_add(size, isl_val_copy(dlh));\n\tspace_sizes = isl_multi_val_set_val(space_sizes, 0, size);\n\n\treturn space_sizes;\n}\n\n/* Compute the offset of phase 1 with respect to phase 0\n * in the ts-space (\"space\").\n * In particular, return\n *\n *\tts[st, s0 + duh]\n */\nstatic __isl_give isl_multi_val *compute_phase_shift(\n\t__isl_keep isl_space *space, __isl_keep isl_val *st,\n\t__isl_keep isl_val *s0, __isl_keep isl_val *duh)\n{\n\tisl_val *v;\n\tisl_multi_val *phase_shift;\n\n\tphase_shift = isl_multi_val_zero(isl_space_copy(space));\n\tphase_shift = isl_multi_val_set_val(phase_shift, 0, isl_val_copy(st));\n\tv = isl_val_add(isl_val_copy(duh), isl_val_copy(s0));\n\tphase_shift = isl_multi_val_set_val(phase_shift, 1, v);\n\n\treturn phase_shift;\n}\n\n/* Return the function\n *\n *\tts[t, s] -> floor(t/(2 * st))\n *\n * representing the time tile.\n * \"space\" is the space ts[t, s].\n */\nstatic __isl_give isl_aff *compute_time_tile(__isl_keep isl_space *space,\n\t__isl_keep isl_val *st)\n{\n\tisl_val *v;\n\tisl_aff *t;\n\tisl_local_space *ls;\n\n\tls = isl_local_space_from_space(isl_space_copy(space));\n\tt = isl_aff_var_on_domain(ls, isl_dim_set, 0);\n\tv = isl_val_mul_ui(isl_val_copy(st), 2);\n\tt = isl_aff_floor(isl_aff_scale_down_val(t, v));\n\n\treturn t;\n}\n\n/* Compute a shift in the space dimension for tiles\n * at time tile T = floor(t/(2 * S_t))\n * such that they align to a multiple of the total space tile dimension W.\n * In particular, compute\n *\n 
*\tts[t, s] -> s + (-(2 * shift_s)*T) % W\n *\n * where shift_s is the shift of phase 1 with respect to phase 0\n * in the space dimension (the first element of \"phase_shift\").\n * W is stored in the first element of \"space_sizes\".\n * \"time_tile\" is the function\n *\n *\tts[t, s] -> floor(t/(2 * S_t))\n *\n * Since phase 1 is shifted by shift_s with respect to phase 0,\n * the next line of phase 0 (at T+1) is shifted by 2*shift_s\n * with respect to the previous line (at T).\n * A shift of -(2 * shift_s)*T therefore allows the basic pattern\n * (which starts at 0) to be applied.\n * However, this shift will be used to obtain the tile coordinate\n * in the first space dimension and if the original values\n * in the space dimension are non-negative, then the shift should\n * not make them negative.  Moreover, the shift should be as small\n * as possible.\n * Since the pattern repeats itself with a period of W in the space\n * dimension, the shift can be replaced by (-(2 * shift_s)*T) % W.\n */\nstatic __isl_give isl_aff *compute_shift_space(__isl_keep isl_aff *time_tile,\n\t__isl_keep isl_multi_val *space_sizes,\n\t__isl_keep isl_multi_val *phase_shift)\n{\n\tisl_val *v;\n\tisl_aff *s, *t;\n\tisl_local_space *ls;\n\n\tls = isl_local_space_from_space(isl_aff_get_domain_space(time_tile));\n\tt = isl_aff_copy(time_tile);\n\tv = isl_val_mul_ui(isl_multi_val_get_val(phase_shift, 1), 2);\n\tv = isl_val_neg(v);\n\tt = isl_aff_scale_val(t, v);\n\tv = isl_multi_val_get_val(space_sizes, 0);\n\tt = isl_aff_mod_val(t, v);\n\ts = isl_aff_var_on_domain(ls, isl_dim_set, 1);\n\ts = isl_aff_add(s, t);\n\n\treturn s;\n}\n\n/* Given the phase_shift ts[S_t, S_0 + floor(d_u h)],\n * compute a function that applies the shift, i.e.,\n *\n *\tts[t, s] -> ts[t + S_t, s + S_0 + floor(d_u h)].\n */\nstatic __isl_give isl_multi_aff *compute_shift_phase(\n\t__isl_keep isl_multi_val *phase_shift)\n{\n\tisl_space *space;\n\tisl_multi_aff *shift;\n\n\tspace = 
isl_multi_val_get_space(phase_shift);\n\tshift = isl_multi_aff_multi_val_on_space(space,\n\t\t\t\t\tisl_multi_val_copy(phase_shift));\n\tspace = isl_multi_aff_get_space(shift);\n\tshift = isl_multi_aff_add(shift, isl_multi_aff_identity(space));\n\n\treturn shift;\n}\n\n/* Compute a mapping from the ts-space to the local coordinates\n * within each tile.  In particular, compute\n *\n *\tts[t, s] -> local_ts[t % (2 S_t), (s + (-(2 * shift_s)*T) % W) % W]\n *\n * \"ts\" is the space ts[t, s]\n * \"local_ts\" is the space local_ts[t, s]\n * \"shift_space\" is equal to ts[t, s] -> s + (-(2 * shift_s)*T) % W\n * \"st\" is the tile size in the time dimension S_t.\n * The first element of \"space_sizes\" is equal to W.\n */\nstatic __isl_give isl_multi_aff *compute_localize(\n\t__isl_keep isl_space *local_ts, __isl_keep isl_aff *shift_space,\n\t__isl_keep isl_val *st, __isl_keep isl_multi_val *space_sizes)\n{\n\tisl_val *v;\n\tisl_space *space;\n\tisl_aff *s, *t;\n\tisl_multi_aff *localize;\n\n\tspace = isl_aff_get_domain_space(shift_space);\n\tlocal_ts = isl_space_copy(local_ts);\n\tspace = isl_space_map_from_domain_and_range(space, local_ts);\n\tlocalize = isl_multi_aff_identity(space);\n\tt = isl_multi_aff_get_aff(localize, 0);\n\tv = isl_val_mul_ui(isl_val_copy(st), 2);\n\tt = isl_aff_mod_val(t, v);\n\tlocalize = isl_multi_aff_set_aff(localize, 0, t);\n\ts = isl_aff_copy(shift_space);\n\tv = isl_multi_val_get_val(space_sizes, 0);\n\ts = isl_aff_mod_val(s, v);\n\tlocalize = isl_multi_aff_set_aff(localize, 1, s);\n\n\treturn localize;\n}\n\n/* Set the project_ts field of \"tiling\".\n *\n * This field projects the space of the input schedule to the ts-space.\n * It is equal to [P[t] -> C[s_0, ...]] -> ts[t, s_0].\n */\nstatic __isl_give ppcg_ht_tiling *ppcg_ht_tiling_set_project_ts(\n\t__isl_take ppcg_ht_tiling *tiling)\n{\n\tint n;\n\tisl_space *space;\n\tisl_multi_aff *project;\n\n\tif (!tiling)\n\t\treturn NULL;\n\n\tspace = 
ppcg_ht_tiling_get_input_space(tiling);\n\tn = isl_space_dim(space, isl_dim_set);\n\tproject = isl_multi_aff_project_out_map(space, isl_dim_set, 2, n - 2);\n\tproject = isl_multi_aff_set_tuple_name(project,\n\t\t\t\t\t\tisl_dim_out, ts_space_name);\n\tif (!project)\n\t\treturn ppcg_ht_tiling_free(tiling);\n\n\ttiling->project_ts = project;\n\n\treturn tiling;\n}\n\n/* Construct a hybrid tiling description from bounds on the dependence\n * distances \"bounds\".\n * \"input_node\" points to the original parent node.\n * \"input_schedule\" is the combined schedule of the parent and child\n * node in the input.\n * \"tile_sizes\" are the original, user specified tile sizes.\n */\nstatic __isl_give ppcg_ht_tiling *ppcg_ht_bounds_construct_tiling(\n\t__isl_take ppcg_ht_bounds *bounds,\n\t__isl_keep isl_schedule_node *input_node,\n\t__isl_keep isl_multi_union_pw_aff *input_schedule,\n\t__isl_keep isl_multi_val *tile_sizes)\n{\n\tisl_ctx *ctx;\n\tppcg_ht_tiling *tiling;\n\tisl_multi_val *space_sizes, *phase_shift;\n\tisl_aff *time_tile, *shift_space;\n\tisl_multi_aff *localize;\n\tisl_val *h, *duh, *dlh;\n\tisl_val *st, *s0, *du, *dl;\n\tisl_space *ts, *local_ts;\n\n\tif (!bounds || !input_node || !input_schedule || !tile_sizes)\n\t\tgoto error;\n\n\tctx = isl_multi_union_pw_aff_get_ctx(input_schedule);\n\ttiling = isl_calloc_type(ctx, struct ppcg_ht_tiling);\n\tif (!tiling)\n\t\tgoto error;\n\ttiling->ref = 1;\n\n\tst = isl_multi_val_get_val(tile_sizes, 0);\n\th = isl_val_sub_ui(isl_val_copy(st), 1);\n\ts0 = isl_multi_val_get_val(tile_sizes, 1);\n\tdu = ppcg_ht_bounds_get_upper(bounds);\n\tdl = ppcg_ht_bounds_get_lower(bounds, 0);\n\n\tduh = isl_val_floor(isl_val_mul(isl_val_copy(du), isl_val_copy(h)));\n\tdlh = isl_val_floor(isl_val_mul(isl_val_copy(dl), isl_val_copy(h)));\n\n\tts = construct_ts_space(ctx);\n\tlocal_ts = construct_local_ts_space(ctx);\n\n\tspace_sizes = compute_space_sizes(tile_sizes, dlh, duh);\n\tphase_shift = compute_phase_shift(ts, st, s0, 
duh);\n\ttime_tile = compute_time_tile(ts, st);\n\tshift_space = compute_shift_space(time_tile, space_sizes, phase_shift);\n\tlocalize = compute_localize(local_ts, shift_space, st, space_sizes);\n\tisl_space_free(ts);\n\n\ttiling->input_node = isl_schedule_node_copy(input_node);\n\ttiling->input_schedule = isl_multi_union_pw_aff_copy(input_schedule);\n\ttiling->space_sizes = space_sizes;\n\ttiling->bounds = bounds;\n\ttiling->local_time = isl_multi_aff_get_aff(localize, 0);\n\ttiling->hex = compute_hexagon(local_ts, h, s0, dl, du, dlh, duh);\n\ttiling->hex = isl_set_preimage_multi_aff(tiling->hex, localize);\n\ttiling->time_tile = time_tile;\n\ttiling->shift_space = shift_space;\n\ttiling->shift_phase = compute_shift_phase(phase_shift);\n\tisl_multi_val_free(phase_shift);\n\n\tisl_val_free(duh);\n\tisl_val_free(dlh);\n\tisl_val_free(du);\n\tisl_val_free(dl);\n\tisl_val_free(s0);\n\tisl_val_free(st);\n\tisl_val_free(h);\n\n\tif (!tiling->input_schedule || !tiling->local_time || !tiling->hex ||\n\t    !tiling->shift_space || !tiling->shift_phase)\n\t\treturn ppcg_ht_tiling_free(tiling);\n\n\ttiling = ppcg_ht_tiling_set_project_ts(tiling);\n\n\treturn tiling;\nerror:\n\tppcg_ht_bounds_free(bounds);\n\treturn NULL;\n}\n\n/* Are all members of the band node \"node\" coincident?\n */\nstatic isl_bool all_coincident(__isl_keep isl_schedule_node *node)\n{\n\tint i, n;\n\n\tn = isl_schedule_node_band_n_member(node);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_bool c;\n\n\t\tc = isl_schedule_node_band_member_get_coincident(node, i);\n\t\tif (c < 0 || !c)\n\t\t\treturn c;\n\t}\n\n\treturn isl_bool_true;\n}\n\n/* Does \"node\" satisfy the properties of the inner node in the input\n * pattern for hybrid tiling?\n * That is, is it a band node with only coincident members, of which\n * there is at least one?\n */\nstatic isl_bool has_child_properties(__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn isl_bool_error;\n\tif (isl_schedule_node_get_type(node) != 
isl_schedule_node_band)\n\t\treturn isl_bool_false;\n\tif (isl_schedule_node_band_n_member(node) < 1)\n\t\treturn isl_bool_false;\n\treturn all_coincident(node);\n}\n\n/* Does \"node\" satisfy the properties of the outer node in the input\n * pattern for hybrid tiling?\n * That is, is it a band node with a single member?\n */\nstatic isl_bool has_parent_properties(__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn isl_bool_error;\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n\t\treturn isl_bool_false;\n\tif (isl_schedule_node_band_n_member(node) != 1)\n\t\treturn isl_bool_false;\n\treturn isl_bool_true;\n}\n\n/* Does the parent of \"node\" satisfy the input pattern for hybrid tiling?\n * That is, does \"node\" satisfy the properties of the inner node and\n * does the parent of \"node\" satisfy the properties of the outer node?\n */\nisl_bool ppcg_ht_parent_has_input_pattern(__isl_keep isl_schedule_node *node)\n{\n\tisl_bool has_pattern;\n\n\thas_pattern = has_child_properties(node);\n\tif (has_pattern < 0 || !has_pattern)\n\t\treturn has_pattern;\n\n\tnode = isl_schedule_node_copy(node);\n\tnode = isl_schedule_node_parent(node);\n\thas_pattern = has_parent_properties(node);\n\tisl_schedule_node_free(node);\n\n\treturn has_pattern;\n}\n\n/* Does \"node\" satisfy the input pattern for hybrid tiling?\n * That is, does \"node\" satisfy the properties of the outer node and\n * does the child of \"node\" satisfy the properties of the inner node?\n */\nisl_bool ppcg_ht_has_input_pattern(__isl_keep isl_schedule_node *node)\n{\n\tisl_bool has_pattern;\n\n\thas_pattern = has_parent_properties(node);\n\tif (has_pattern < 0 || !has_pattern)\n\t\treturn has_pattern;\n\n\tnode = isl_schedule_node_get_child(node, 0);\n\thas_pattern = has_child_properties(node);\n\tisl_schedule_node_free(node);\n\n\treturn has_pattern;\n}\n\n/* Check that \"node\" satisfies the input pattern for hybrid tiling.\n * Error out if it does not.\n */\nstatic 
isl_stat check_input_pattern(__isl_keep isl_schedule_node *node)\n{\n\tisl_bool has_pattern;\n\n\thas_pattern = ppcg_ht_has_input_pattern(node);\n\tif (has_pattern < 0)\n\t\treturn isl_stat_error;\n\tif (!has_pattern)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_invalid,\n\t\t\t\"invalid input pattern for hybrid tiling\",\n\t\t\treturn isl_stat_error);\n\n\treturn isl_stat_ok;\n}\n\n/* Extract the input schedule from \"node\", i.e., the product\n * of the partial schedules of the parent and child nodes\n * in the input pattern.\n */\nstatic __isl_give isl_multi_union_pw_aff *extract_input_schedule(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_multi_union_pw_aff *partial, *partial2;\n\n\tpartial = isl_schedule_node_band_get_partial_schedule(node);\n\tnode = isl_schedule_node_get_child(node, 0);\n\tpartial2 = isl_schedule_node_band_get_partial_schedule(node);\n\tisl_schedule_node_free(node);\n\n\treturn isl_multi_union_pw_aff_range_product(partial, partial2);\n}\n\n/* Collect all dependences from \"scop\" that are relevant for performing\n * hybrid tiling on \"node\" and its child and map them to the schedule\n * space of this pair of nodes.\n *\n * In case live range reordering is not used,\n * the flow and the false dependences are collected.\n * In case live range reordering is used,\n * the flow and the forced dependences are collected, as well\n * as the order dependences that are adjacent to non-local\n * flow dependences.\n *\n * In all cases, only dependences that map to the same instance\n * of the outer part of the schedule are considered.\n */\nstatic __isl_give isl_map *collect_deps(struct ppcg_scop *scop,\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_space *space;\n\tisl_multi_union_pw_aff *prefix, *partial;\n\tisl_union_map *flow, *other, *dep, *umap;\n\tisl_map *map;\n\n\tprefix = isl_schedule_node_get_prefix_schedule_multi_union_pw_aff(node);\n\tpartial = extract_input_schedule(node);\n\tspace = 
isl_multi_union_pw_aff_get_space(partial);\n\n\tflow = isl_union_map_copy(scop->dep_flow);\n\tflow = isl_union_map_eq_at_multi_union_pw_aff(flow,\n\t\t\t\t\tisl_multi_union_pw_aff_copy(prefix));\n\tif (!scop->options->live_range_reordering) {\n\t\tother = isl_union_map_copy(scop->dep_false);\n\t\tother = isl_union_map_eq_at_multi_union_pw_aff(other, prefix);\n\t} else {\n\t\tisl_union_map *local, *non_local, *order, *adj;\n\t\tisl_union_set *domain, *range;\n\n\t\tother = isl_union_map_copy(scop->dep_forced);\n\t\tother = isl_union_map_eq_at_multi_union_pw_aff(other,\n\t\t\t\t\tisl_multi_union_pw_aff_copy(prefix));\n\t\tlocal = isl_union_map_copy(flow);\n\t\tlocal = isl_union_map_eq_at_multi_union_pw_aff(local,\n\t\t\t\t\tisl_multi_union_pw_aff_copy(partial));\n\t\tnon_local = isl_union_map_copy(flow);\n\t\tnon_local = isl_union_map_subtract(non_local, local);\n\n\t\torder = isl_union_map_copy(scop->dep_order);\n\t\torder = isl_union_map_eq_at_multi_union_pw_aff(order, prefix);\n\t\tadj = isl_union_map_copy(order);\n\t\tdomain = isl_union_map_domain(isl_union_map_copy(non_local));\n\t\tdomain = isl_union_set_coalesce(domain);\n\t\tadj = isl_union_map_intersect_range(adj, domain);\n\t\tother = isl_union_map_union(other, adj);\n\n\t\tadj = order;\n\t\trange = isl_union_map_range(non_local);\n\t\trange = isl_union_set_coalesce(range);\n\t\tadj = isl_union_map_intersect_domain(adj, range);\n\t\tother = isl_union_map_union(other, adj);\n\t}\n\tdep = isl_union_map_union(flow, other);\n\n\tumap = isl_union_map_from_multi_union_pw_aff(partial);\n\tdep = isl_union_map_apply_domain(dep, isl_union_map_copy(umap));\n\tdep = isl_union_map_apply_range(dep, umap);\n\n\tspace = isl_space_map_from_set(space);\n\tmap = isl_union_map_extract_map(dep, space);\n\tisl_union_map_free(dep);\n\n\tmap = isl_map_coalesce(map);\n\n\treturn map;\n}\n\n/* Given a constraint of the form\n *\n *\ta i_0 + b i_1 >= 0\n * or\n *\ta i_0 + b i_1 = 0\n *\n * use it to update one or both of the 
non-negative bounds\n * in \"list\" = (min, max) such that\n *\n *\ti_1 >= -min i_0\n * and\n *\ti_1 <= max i_0\n *\n * If b = 0, then the constraint cannot be used.\n * Otherwise, the constraint is equivalent to\n *\n *\tsgn(b) i_1 >= - a/abs(b) i_0\n * i.e.,\n *\ti_1 >= - a/abs(b) i_0\n * or\n *\ti_1 <= a/abs(b) i_0\n *\n * Set the first or second element of \"list\" to max(0, a/abs(b)),\n * according to the sign of \"b\".  Or set both in case the constraint\n * is an equality, taking into account the sign change.\n */\nstatic __isl_give isl_val_list *list_set_min_max(__isl_take isl_val_list *list,\n\t__isl_keep isl_constraint *c)\n{\n\tisl_val *a, *b;\n\tint sign;\n\tint pos;\n\tisl_bool eq, is_zero, is_neg;\n\n\teq = isl_constraint_is_equality(c);\n\tif (eq < 0)\n\t\treturn isl_val_list_free(list);\n\n\tb = isl_constraint_get_coefficient_val(c, isl_dim_set, 1);\n\tis_zero = isl_val_is_zero(b);\n\tif (is_zero == isl_bool_true) {\n\t\tisl_val_free(b);\n\t\treturn list;\n\t}\n\ta = isl_constraint_get_coefficient_val(c, isl_dim_set, 0);\n\tsign = isl_val_sgn(b);\n\tb = isl_val_abs(b);\n\ta = isl_val_div(a, b);\n\n\tif (eq)\n\t\tb = isl_val_copy(a);\n\n\tpos = sign > 0 ? 0 : 1;\n\tis_neg = isl_val_is_neg(a);\n\tif (is_neg == isl_bool_true)\n\t\ta = isl_val_set_si(a, 0);\n\tlist = isl_val_list_set_val(list, pos, a);\n\n\tif (!eq)\n\t\treturn is_neg < 0 ? isl_val_list_free(list) : list;\n\n\tpos = 1 - pos;\n\ta = isl_val_neg(b);\n\tis_neg = isl_val_is_neg(a);\n\tif (is_neg == isl_bool_true)\n\t\ta = isl_val_set_si(a, 0);\n\tlist = isl_val_list_set_val(list, pos, a);\n\n\treturn is_neg < 0 ? 
isl_val_list_free(list) : list;\n}\n\n/* If constraint \"c\" passes through the origin, then try to use it\n * to update the non-negative bounds in \"list\" = (min, max) such that\n *\n *\ti_1 >= -min i_0\n * and\n *\ti_1 <= max i_0\n */\nstatic isl_stat set_min_max(__isl_take isl_constraint *c, void *user)\n{\n\tisl_val *v;\n\tisl_val_list **list = user;\n\tisl_bool is_zero;\n\n\tv = isl_constraint_get_constant_val(c);\n\tis_zero = isl_val_is_zero(v);\n\tisl_val_free(v);\n\n\tif (is_zero == isl_bool_true)\n\t\t*list = list_set_min_max(*list, c);\n\n\tisl_constraint_free(c);\n\treturn is_zero < 0 ? isl_stat_error : isl_stat_ok;\n}\n\n/* Given a set of dependence distance vectors \"dist\", compute\n * a pair of non-negative bounds min and max such that\n *\n *\td_pos >= -min d_0\n * and\n *\td_pos <= max d_0\n *\n * and return the pair (min, max).\n * If no bound can be found in either direction, then the bound\n * is replaced by NaN.\n *\n * The dependence distances are first projected onto the (d_0, d_pos) plane.\n * Then the zero dependence distance is added and the convex hull is computed.\n * Finally, the bounds are extracted from the constraints of the convex hull\n * that pass through the origin.\n */\nstatic __isl_give isl_val_list *min_max_dist(__isl_keep isl_set *dist, int pos)\n{\n\tisl_space *space;\n\tisl_basic_set *hull;\n\tint dim;\n\tisl_ctx *ctx;\n\tisl_val *nan;\n\tisl_val_list *list;\n\n\tctx = isl_set_get_ctx(dist);\n\tnan = isl_val_nan(ctx);\n\tlist = isl_val_list_alloc(ctx, 2);\n\tlist = isl_val_list_add(list, isl_val_copy(nan));\n\tlist = isl_val_list_add(list, nan);\n\n\tdist = isl_set_copy(dist);\n\tdim = isl_set_dim(dist, isl_dim_set);\n\tif (dist && pos >= dim)\n\t\tisl_die(ctx, isl_error_internal, \"position out of bounds\",\n\t\t\tdist = isl_set_free(dist));\n\tdist = isl_set_project_out(dist, isl_dim_set, pos + 1, dim - (pos + 1));\n\tdist = isl_set_project_out(dist, isl_dim_set, 1, pos - 1);\n\n\tspace = isl_set_get_space(dist);\n\tdist = 
isl_set_union(dist, isl_set_from_point(isl_point_zero(space)));\n\tdist = isl_set_remove_divs(dist);\n\thull = isl_set_convex_hull(dist);\n\n\tif (isl_basic_set_foreach_constraint(hull, &set_min_max, &list) < 0)\n\t\tlist = isl_val_list_free(list);\n\tisl_basic_set_free(hull);\n\n\treturn list;\n}\n\n/* Given a schedule node \"node\" that, together with its child,\n * satisfies the input pattern for hybrid tiling, compute bounds\n * on the relative dependence distances of the child node with\n * respect to the parent node.  These bounds are needed to\n * construct a hybrid tiling.\n *\n * First all relevant dependences are collected and mapped\n * to the schedule space of the pair of nodes.  Then, the\n * dependence distances are computed in this space.\n *\n * These dependence distances are then projected onto a two-dimensional\n * space consisting of the single schedule dimension of the outer node\n * and one of the schedule dimensions of the inner node.\n * The maximal and minimal relative dependence distances are extracted\n * from these projections.\n * This process is repeated for each of the schedule dimensions\n * of the inner node.  
For the first dimension, both minimal and\n * maximal relative dependence distances are stored in the result.\n * For the other dimensions, only the minimal relative dependence\n * distance is stored.\n */\n__isl_give ppcg_ht_bounds *ppcg_ht_compute_bounds(struct ppcg_scop *scop,\n\t__isl_keep isl_schedule_node *node)\n{\n\tppcg_ht_bounds *bnd;\n\tisl_space *space;\n\tisl_map *map;\n\tisl_set *dist;\n\tisl_val_list *pair;\n\tisl_schedule_node *child;\n\tint n;\n\tint i, dim;\n\n\tif (!scop || !node || check_input_pattern(node) < 0)\n\t\treturn NULL;\n\n\tchild = isl_schedule_node_get_child(node, 0);\n\tspace = isl_schedule_node_band_get_space(child);\n\tdim = isl_schedule_node_band_n_member(child);\n\tisl_schedule_node_free(child);\n\tbnd = ppcg_ht_bounds_alloc(space);\n\tif (!bnd)\n\t\treturn NULL;\n\n\tmap = collect_deps(scop, node);\n\n\tdist = isl_map_deltas(map);\n\tn = isl_set_dim(dist, isl_dim_param);\n\tdist = isl_set_project_out(dist, isl_dim_param, 0, n);\n\n\tpair = min_max_dist(dist, 1);\n\tbnd = ppcg_ht_bounds_set_lower(bnd, 0, isl_val_list_get_val(pair, 0));\n\tbnd = ppcg_ht_bounds_set_upper(bnd, isl_val_list_get_val(pair, 1));\n\tisl_val_list_free(pair);\n\n\tfor (i = 1; i < dim; ++i) {\n\t\tpair = min_max_dist(dist, 1 + i);\n\t\tbnd = ppcg_ht_bounds_set_lower(bnd, i,\n\t\t\t\t\t\tisl_val_list_get_val(pair, 0));\n\t\tisl_val_list_free(pair);\n\t}\n\n\tisl_set_free(dist);\n\n\treturn bnd;\n}\n\n/* Check if all the fields of \"phase\" are valid, freeing \"phase\"\n * if they are not.\n */\nstatic __isl_give ppcg_ht_phase *check_phase(__isl_take ppcg_ht_phase *phase)\n{\n\tif (!phase)\n\t\treturn NULL;\n\n\tif (!phase->tiling || !phase->local_time ||\n\t    !phase->shift_space || !phase->domain)\n\t\treturn ppcg_ht_phase_free(phase);\n\n\treturn phase;\n}\n\n/* Construct a ppcg_ht_phase object that simply copies\n * information from \"tiling\".\n * That is, the result is defined over the \"ts\" space and\n * corresponds to phase 1.\n */\nstatic 
__isl_give ppcg_ht_phase *construct_phase(\n\t__isl_keep ppcg_ht_tiling *tiling)\n{\n\tisl_ctx *ctx;\n\tppcg_ht_phase *phase;\n\n\tif (!tiling)\n\t\treturn NULL;\n\n\tctx = ppcg_ht_tiling_get_ctx(tiling);\n\tphase = isl_calloc_type(ctx, struct ppcg_ht_phase);\n\tif (!phase)\n\t\treturn NULL;\n\tphase->tiling = ppcg_ht_tiling_copy(tiling);\n\tphase->time_tile = isl_aff_copy(tiling->time_tile);\n\tphase->local_time = isl_aff_copy(tiling->local_time);\n\tphase->shift_space = isl_aff_copy(tiling->shift_space);\n\tphase->domain = isl_set_copy(tiling->hex);\n\n\treturn check_phase(phase);\n}\n\n/* Align the parameters of the elements of \"phase\" to those of \"space\".\n */\nstatic __isl_give ppcg_ht_phase *phase_align_params(\n\t__isl_take ppcg_ht_phase *phase, __isl_take isl_space *space)\n{\n\tif (!phase)\n\t\tgoto error;\n\n\tphase->time_tile = isl_aff_align_params(phase->time_tile,\n\t\t\t\t\t\t\tisl_space_copy(space));\n\tphase->local_time = isl_aff_align_params(phase->local_time,\n\t\t\t\t\t\t\tisl_space_copy(space));\n\tphase->shift_space = isl_aff_align_params(phase->shift_space,\n\t\t\t\t\t\t\tisl_space_copy(space));\n\tphase->domain = isl_set_align_params(phase->domain, space);\n\n\treturn check_phase(phase);\nerror:\n\tisl_space_free(space);\n\treturn NULL;\n}\n\n/* Pull back \"phase\" over \"ma\".\n * That is, take a phase defined over the range of \"ma\" and\n * turn it into a phase defined over the domain of \"ma\".\n */\nstatic __isl_give ppcg_ht_phase *pullback_phase(__isl_take ppcg_ht_phase *phase,\n\t__isl_take isl_multi_aff *ma)\n{\n\tphase = phase_align_params(phase, isl_multi_aff_get_space(ma));\n\tif (!phase)\n\t\tgoto error;\n\n\tphase->time_tile = isl_aff_pullback_multi_aff(phase->time_tile,\n\t\t\t\t\t\t\tisl_multi_aff_copy(ma));\n\tphase->local_time = isl_aff_pullback_multi_aff(phase->local_time,\n\t\t\t\t\t\t\tisl_multi_aff_copy(ma));\n\tphase->shift_space = 
isl_aff_pullback_multi_aff(phase->shift_space,\n\t\t\t\t\t\t\tisl_multi_aff_copy(ma));\n\tphase->domain = isl_set_preimage_multi_aff(phase->domain, ma);\n\n\treturn check_phase(phase);\nerror:\n\tisl_multi_aff_free(ma);\n\treturn NULL;\n}\n\n/* Pullback \"phase\" over phase->tiling->shift_phase, which shifts\n * phase 0 to phase 1.  The pullback therefore takes a phase 1\n * description and turns it into a phase 0 description.\n */\nstatic __isl_give ppcg_ht_phase *shift_phase(__isl_take ppcg_ht_phase *phase)\n{\n\tppcg_ht_tiling *tiling;\n\n\tif (!phase)\n\t\treturn NULL;\n\n\ttiling = phase->tiling;\n\treturn pullback_phase(phase, isl_multi_aff_copy(tiling->shift_phase));\n}\n\n/* Take a \"phase\" defined over the ts-space and plug in the projection\n * from the input schedule space to the ts-space.\n * The result is then defined over this input schedule space.\n */\nstatic __isl_give ppcg_ht_phase *lift_phase(__isl_take ppcg_ht_phase *phase)\n{\n\tppcg_ht_tiling *tiling;\n\n\tif (!phase)\n\t\treturn NULL;\n\n\ttiling = phase->tiling;\n\treturn pullback_phase(phase, isl_multi_aff_copy(tiling->project_ts));\n}\n\n/* Compute the shift that should be added to the space band\n * in order to be able to apply rectangular tiling to the space.\n * Store the shift in phase->space_shift.\n *\n * In the first dimension, it is equal to shift_space - s.\n * For phase 1, this results in\n *\n *\t(-(2 * shift_s)*T) % W\n *\n * In phase 0, the \"s\" in shift_space has been replaced by \"s + shift_s\",\n * so the result is\n *\n *\tshift_s + (-(2 * shift_s)*T) % W\n *\n * In the other dimensions, the shift is equal to\n *\n *\tdl_i * local_time.\n */\nstatic __isl_give ppcg_ht_phase *compute_space_shift(\n\t__isl_take ppcg_ht_phase *phase)\n{\n\tint i, n;\n\tisl_space *space;\n\tisl_local_space *ls;\n\tisl_aff *aff, *s;\n\tisl_multi_aff *space_shift;\n\n\tif (!phase)\n\t\treturn NULL;\n\n\tspace = ppcg_ht_phase_get_input_space(phase);\n\tspace = isl_space_unwrap(space);\n\tspace 
= isl_space_range_map(space);\n\n\tspace_shift = isl_multi_aff_zero(space);\n\taff = isl_aff_copy(phase->shift_space);\n\tls = isl_local_space_from_space(isl_aff_get_domain_space(aff));\n\ts = isl_aff_var_on_domain(ls, isl_dim_set, 1);\n\taff = isl_aff_sub(aff, s);\n\tspace_shift = isl_multi_aff_set_aff(space_shift, 0, aff);\n\n\tn = isl_multi_aff_dim(space_shift, isl_dim_out);\n\tfor (i = 1; i < n; ++i) {\n\t\tisl_val *v;\n\t\tisl_aff *time;\n\n\t\tv = ppcg_ht_bounds_get_lower(phase->tiling->bounds, i);\n\t\ttime = isl_aff_copy(phase->local_time);\n\t\ttime = isl_aff_scale_val(time, v);\n\t\tspace_shift = isl_multi_aff_set_aff(space_shift, i, time);\n\t}\n\n\tif (!space_shift)\n\t\treturn ppcg_ht_phase_free(phase);\n\tphase->space_shift = space_shift;\n\treturn phase;\n}\n\n/* Compute the space tiling and store the result in phase->space_tile.\n * The space tiling is of the form\n *\n *\t[P[t] -> C[s]] -> C[floor((s + space_shift)/space_size)]\n */\nstatic __isl_give ppcg_ht_phase *compute_space_tile(\n\t__isl_take ppcg_ht_phase *phase)\n{\n\tisl_space *space;\n\tisl_multi_val *space_sizes;\n\tisl_multi_aff *space_shift;\n\tisl_multi_aff *tile;\n\n\tif (!phase)\n\t\treturn NULL;\n\n\tspace = ppcg_ht_phase_get_input_space(phase);\n\tspace = isl_space_unwrap(space);\n\ttile = isl_multi_aff_range_map(space);\n\tspace_shift = isl_multi_aff_copy(phase->space_shift);\n\ttile = isl_multi_aff_add(space_shift, tile);\n\tspace_sizes = isl_multi_val_copy(phase->tiling->space_sizes);\n\ttile = isl_multi_aff_scale_down_multi_val(tile, space_sizes);\n\ttile = isl_multi_aff_floor(tile);\n\n\tif (!tile)\n\t\treturn ppcg_ht_phase_free(phase);\n\tphase->space_tile = tile;\n\treturn phase;\n}\n\n/* Construct a representation for one of the two phases of hybrid tiling\n * \"tiling\".  If \"shift\" is not set, then the phase is constructed\n * directly from the hexagonal tile shape in \"tiling\", which represents\n * the phase-1 tiles.  
If \"shift\" is set, then this tile shape is shifted\n * back over tiling->shift_phase to obtain the phase-0 tiles.\n *\n * First copy data from \"tiling\", then optionally shift the phase and\n * finally move the tiling from the \"ts\" space of \"tiling\" to\n * the space of the input pattern.\n *\n * After the basic phase has been computed, also compute\n * the corresponding space shift.\n */\nstatic __isl_give ppcg_ht_phase *ppcg_ht_tiling_compute_phase(\n\t__isl_keep ppcg_ht_tiling *tiling, int shift)\n{\n\tppcg_ht_phase *phase;\n\n\tphase = construct_phase(tiling);\n\tif (shift)\n\t\tphase = shift_phase(phase);\n\tphase = lift_phase(phase);\n\n\tphase = compute_space_shift(phase);\n\tphase = compute_space_tile(phase);\n\n\treturn phase;\n}\n\n/* Construct a function that is equal to the time tile of \"phase0\"\n * on the domain of \"phase0\" and equal to the time tile of \"phase1\"\n * on the domain of \"phase1\".\n * The two domains are assumed to form a partition of the input\n * schedule space.\n */\nstatic __isl_give isl_pw_multi_aff *combine_time_tile(\n\t__isl_keep ppcg_ht_phase *phase0, __isl_keep ppcg_ht_phase *phase1)\n{\n\tisl_aff *T;\n\tisl_pw_aff *time, *time1;\n\n\tif (!phase0 || !phase1)\n\t\treturn NULL;\n\n\tT = isl_aff_copy(phase0->time_tile);\n\ttime = isl_pw_aff_alloc(ppcg_ht_phase_get_domain(phase0), T);\n\n\tT = isl_aff_copy(phase1->time_tile);\n\ttime1 = isl_pw_aff_alloc(ppcg_ht_phase_get_domain(phase1), T);\n\n\ttime = isl_pw_aff_union_add(time, time1);\n\n\treturn isl_pw_multi_aff_from_pw_aff(time);\n}\n\n/* Name used in mark nodes that contain a pointer to a ppcg_ht_phase.\n */\nstatic char *ppcg_phase_name = \"phase\";\n\n/* Does \"id\" contain a pointer to a ppcg_ht_phase?\n * That is, is it called \"phase\"?\n */\nstatic isl_bool is_phase_id(__isl_keep isl_id *id)\n{\n\tconst char *name;\n\n\tname = isl_id_get_name(id);\n\tif (!name)\n\t\treturn isl_bool_error;\n\n\treturn !strcmp(name, ppcg_phase_name);\n}\n\n/* Given a mark node 
with an identifier that points to a ppcg_ht_phase,\n * extract this ppcg_ht_phase pointer.\n */\n__isl_keep ppcg_ht_phase *ppcg_ht_phase_extract_from_mark(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_bool is_phase;\n\tisl_id *id;\n\tvoid *p;\n\n\tif (!node)\n\t\treturn NULL;\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_mark)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_internal,\n\t\t\t\"not a phase mark\", return NULL);\n\n\tid = isl_schedule_node_mark_get_id(node);\n\tis_phase = is_phase_id(id);\n\tp = isl_id_get_user(id);\n\tisl_id_free(id);\n\n\tif (is_phase < 0)\n\t\treturn NULL;\n\tif (!is_phase)\n\t\tisl_die(isl_schedule_node_get_ctx(node), isl_error_internal,\n\t\t\t\"not a phase mark\", return NULL);\n\n\treturn p;\n}\n\n/* Insert a mark node at \"node\" holding a pointer to \"phase\".\n */\nstatic __isl_give isl_schedule_node *insert_phase(\n\t__isl_take isl_schedule_node *node, __isl_take ppcg_ht_phase *phase)\n{\n\tisl_ctx *ctx;\n\tisl_id *id;\n\n\tif (!node)\n\t\tgoto error;\n\tctx = isl_schedule_node_get_ctx(node);\n\tid = isl_id_alloc(ctx, ppcg_phase_name, phase);\n\tif (!id)\n\t\tgoto error;\n\tid = isl_id_set_free_user(id, &ppcg_ht_phase_free_wrap);\n\tnode = isl_schedule_node_insert_mark(node, id);\n\n\treturn node;\nerror:\n\tppcg_ht_phase_free(phase);\n\tisl_schedule_node_free(node);\n\treturn NULL;\n}\n\n/* Construct a mapping from the elements of the original pair of bands\n * to which tiling was applied that belong to a tile of \"phase\"\n * to that tile, preserving the values for the outer bands.\n *\n * The mapping is of the form\n *\n *\t[[outer] -> [P -> C]] -> [[outer] -> [tile]]\n *\n * where tile is defined by a concatenation of the time_tile and\n * the space_tile.\n */\nstatic __isl_give isl_map *construct_tile_map(__isl_keep ppcg_ht_phase *phase)\n{\n\tint depth;\n\tisl_space *space;\n\tisl_multi_aff *ma;\n\tisl_multi_aff *tiling;\n\tisl_map *el2tile;\n\n\tdepth = 
isl_schedule_node_get_schedule_depth(\n\t\t\t\t\t\tphase->tiling->input_node);\n\tspace = isl_aff_get_space(phase->time_tile);\n\tspace = isl_space_params(space);\n\tspace = isl_space_set_from_params(space);\n\tspace = isl_space_add_dims(space, isl_dim_set, depth);\n\tspace = isl_space_map_from_set(space);\n\tma = isl_multi_aff_identity(space);\n\n\ttiling = isl_multi_aff_flat_range_product(\n\t\tisl_multi_aff_from_aff(isl_aff_copy(phase->time_tile)),\n\t\tisl_multi_aff_copy(phase->space_tile));\n\tel2tile = isl_map_from_multi_aff(tiling);\n\tel2tile = isl_map_intersect_domain(el2tile,\n\t\t\t\t\t\tisl_set_copy(phase->domain));\n\tel2tile = isl_map_product(isl_map_from_multi_aff(ma), el2tile);\n\n\treturn el2tile;\n}\n\n/* Return a description of the full tiles of \"phase\" at the point\n * in the original schedule tree where the tiling was applied.\n *\n * First construct a mapping from the input schedule dimensions\n * up to and including the original pair of bands to which hybrid tiling\n * was applied to schedule dimensions in which this original pair\n * has been replaced by the tiles.\n * This mapping is of the form\n *\n *\t[[outer] -> [P -> C]] -> [[outer] -> [tile]]\n *\n * Apply this mapping to the set of all values for the input\n * schedule dimensions and then apply its inverse.\n * The result is the set of values for the input schedule dimensions\n * that would map to any of the tiles.  Subtracting from this set\n * the set of values that are actually executed produces the set\n * of values that belong to a tile but that are not executed.\n * Mapping these back to the tiles produces a description of\n * the partial tiles.  
Subtracting these from the set of all tiles\n * produces a description of the full tiles in the form\n *\n *\t[[outer] -> [tile]]\n */\nstatic __isl_give isl_set *compute_full_tile(__isl_keep ppcg_ht_phase *phase)\n{\n\tisl_schedule_node *node;\n\tisl_union_set *domain;\n\tisl_union_map *prefix, *schedule;\n\tisl_set *all, *partial, *all_el;\n\tisl_map *tile2el, *el2tile;\n\tisl_multi_union_pw_aff *mupa;\n\n\tel2tile = construct_tile_map(phase);\n\ttile2el = isl_map_reverse(isl_map_copy(el2tile));\n\n\tnode = phase->tiling->input_node;\n\tprefix = isl_schedule_node_get_prefix_schedule_union_map(node);\n\tdomain = isl_schedule_node_get_domain(node);\n\tmupa = isl_multi_union_pw_aff_copy(phase->tiling->input_schedule);\n\tschedule = isl_union_map_from_multi_union_pw_aff(mupa);\n\tschedule = isl_union_map_range_product(prefix, schedule);\n\tall_el = isl_set_from_union_set(isl_union_set_apply(domain, schedule));\n\tall_el = isl_set_coalesce(all_el);\n\n\tall = isl_set_apply(isl_set_copy(all_el), isl_map_copy(el2tile));\n\n\tpartial = isl_set_copy(all);\n\tpartial = isl_set_apply(partial, tile2el);\n\tpartial = isl_set_subtract(partial, all_el);\n\tpartial = isl_set_apply(partial, el2tile);\n\n\treturn isl_set_subtract(all, partial);\n}\n\n/* Copy the AST loop types of the non-isolated part to those\n * of the isolated part.\n */\nstatic __isl_give isl_schedule_node *set_isolate_loop_type(\n\t__isl_take isl_schedule_node *node)\n{\n\tint i, n;\n\n\tn = isl_schedule_node_band_n_member(node);\n\tfor (i = 0; i < n; ++i) {\n\t\tenum isl_ast_loop_type type;\n\n\t\ttype = isl_schedule_node_band_member_get_ast_loop_type(node, i);\n\t\tnode = isl_schedule_node_band_member_set_isolate_ast_loop_type(\n\t\t\t\t\t\t\t\tnode, i, type);\n\t}\n\n\treturn node;\n}\n\n/* If options->isolate_full_tiles is set, then mark the full tiles\n * in \"node\" for isolation.  
The full tiles are derived from \"phase\".\n * \"node\" may point to a part of the tiling, e.g., the space tiling.\n *\n * The full tiles are originally computed in the form\n *\n *\t[[outer] -> [tile]]\n *\n * However, the band that \"node\" points to may only contain\n * a subset of the tile dimensions.\n * The description above is therefore treated as\n *\n *\t[[outer] -> [before; this; after]]\n *\n * before is of size \"pos\"; this is of size \"dim\"; and\n * after is of size \"out - pos - dim\".\n * The after part is first projected out.  Then the range is split\n * into a before and this part and finally the before part is moved\n * to the domain, resulting in\n *\n *\t[[outer; before] -> [this]]\n *\n * This description is then used as the isolate option.\n *\n * The AST loop type for the isolated part is set to be the same\n * as that of the non-isolated part.\n */\nstatic __isl_give isl_schedule_node *ppcg_ht_phase_isolate_full_tile_node(\n\t__isl_keep ppcg_ht_phase *phase, __isl_take isl_schedule_node *node,\n\tstruct ppcg_options *options)\n{\n\tint in, out, pos, depth, dim;\n\tisl_space *space;\n\tisl_multi_aff *ma1, *ma2;\n\tisl_set *tile;\n\tisl_map *map;\n\tisl_set *set;\n\tisl_union_set *opt;\n\n\tif (!options->isolate_full_tiles)\n\t\treturn node;\n\n\tdepth = isl_schedule_node_get_schedule_depth(node);\n\tdim = isl_schedule_node_band_n_member(node);\n\n\ttile = compute_full_tile(phase);\n\tmap = isl_set_unwrap(tile);\n\tin = isl_map_dim(map, isl_dim_in);\n\tout = isl_map_dim(map, isl_dim_out);\n\tpos = depth - in;\n\tmap = isl_map_project_out(map, isl_dim_out, pos + dim,\n\t\t\t\tout - (pos + dim));\n\tspace = isl_space_range(isl_map_get_space(map));\n\tma1 = isl_multi_aff_project_out_map(isl_space_copy(space),\n\t\t\t\t\t   isl_dim_set, pos, dim);\n\tma2 = isl_multi_aff_project_out_map(space, isl_dim_set, 0, pos);\n\tma1 = isl_multi_aff_range_product(ma1, ma2);\n\tmap = isl_map_apply_range(map, isl_map_from_multi_aff(ma1));\n\tmap = 
isl_map_uncurry(map);\n\tmap = isl_map_flatten_domain(map);\n\tset = isl_map_wrap(map);\n\tset = isl_set_set_tuple_name(set, \"isolate\");\n\n\topt = isl_schedule_node_band_get_ast_build_options(node);\n\topt = isl_union_set_add_set(opt, set);\n\tnode = isl_schedule_node_band_set_ast_build_options(node, opt);\n\tnode = set_isolate_loop_type(node);\n\n\treturn node;\n}\n\n/* Insert a band node for performing the space tiling for \"phase\" at \"node\".\n * In particular, insert a band node with partial schedule\n *\n *\t[P[t] -> C[s]] -> C[floor((s + space_shift)/space_size)]\n *\n * pulled back over the input schedule.\n * \"options\" determines whether full tiles should be separated\n * from partial tiles.\n *\n * The first tile dimension iterates over the hexagons in the same\n * phase, which are independent by construction.  The first dimension\n * is therefore marked coincident.\n * All dimensions are also marked for being generated as atomic loops\n * because separation is usually not desirable on tile loops.\n */\nstatic __isl_give isl_schedule_node *insert_space_tiling(\n\t__isl_keep ppcg_ht_phase *phase, __isl_take isl_schedule_node *node,\n\tstruct ppcg_options *options)\n{\n\tisl_multi_aff *space_tile;\n\tisl_multi_union_pw_aff *mupa;\n\n\tif (!phase)\n\t\treturn isl_schedule_node_free(node);\n\n\tspace_tile = isl_multi_aff_copy(phase->space_tile);\n\tmupa = isl_multi_union_pw_aff_copy(phase->tiling->input_schedule);\n\tmupa = isl_multi_union_pw_aff_apply_multi_aff(mupa, space_tile);\n\tnode = isl_schedule_node_insert_partial_schedule(node, mupa);\n\tnode = ppcg_set_schedule_node_type(node, isl_ast_loop_atomic);\n\tnode = ppcg_ht_phase_isolate_full_tile_node(phase, node, options);\n\tnode = isl_schedule_node_band_member_set_coincident(node, 0, 1);\n\n\treturn node;\n}\n\n/* Given a pointer \"node\" to (a copy of) the original child node\n * in the input pattern, adjust its partial schedule such that\n * it starts at zero within each tile.\n *\n * That is, 
replace \"s\" by (s + space_shift) % space_sizes.\n */\n__isl_give isl_schedule_node *ppcg_ht_phase_shift_space_point(\n\t__isl_keep ppcg_ht_phase *phase, __isl_take isl_schedule_node *node)\n{\n\tisl_multi_val *space_sizes;\n\tisl_multi_aff *space_shift;\n\tisl_multi_union_pw_aff *mupa;\n\n\tspace_shift = isl_multi_aff_copy(phase->space_shift);\n\tmupa = isl_multi_union_pw_aff_copy(phase->tiling->input_schedule);\n\tmupa = isl_multi_union_pw_aff_apply_multi_aff(mupa, space_shift);\n\tnode = isl_schedule_node_band_shift(node, mupa);\n\tspace_sizes = isl_multi_val_copy(phase->tiling->space_sizes);\n\tnode = isl_schedule_node_band_mod(node, space_sizes);\n\n\treturn node;\n}\n\n/* Does\n *\n *\ts0 > delta + 2 * {delta * h} - 1\n *\n * hold?\n */\nstatic isl_bool wide_enough(__isl_keep isl_val *s0, __isl_keep isl_val *delta,\n\t__isl_keep isl_val *h)\n{\n\tisl_val *v, *v2;\n\tisl_bool ok;\n\n\tv = isl_val_mul(isl_val_copy(delta), isl_val_copy(h));\n\tv2 = isl_val_floor(isl_val_copy(v));\n\tv = isl_val_sub(v, v2);\n\tv = isl_val_mul_ui(v, 2);\n\tv = isl_val_add(v, isl_val_copy(delta));\n\tv = isl_val_sub_ui(v, 1);\n\tok = isl_val_gt(s0, v);\n\tisl_val_free(v);\n\n\treturn ok;\n}\n\n/* Is the tile size specified by \"sizes\" wide enough in the first space\n * dimension, i.e., the base of the hexagon?  This ensures that,\n * after hybrid tiling using \"bounds\" and these sizes,\n * neighboring hexagons in the same phase are far enough apart\n * that they do not depend on each other.\n * The test is only meaningful if the bounds are valid.\n *\n * Let st be (half) the size in the time dimension and s0 the base\n * size in the first space dimension.  Let delta be the dependence\n * distance in either positive or negative direction.  
In principle,\n * it should be enough to have s0 + 1 > delta, i.e., s0 >= delta.\n * However, in case of fractional delta, the tile is not extended\n * with delta * (st - 1), but instead with floor(delta * (st - 1)).\n * The condition therefore needs to be adjusted to\n *\n *\ts0 + 1 > delta + 2 {delta * (st - 1)}\n *\n * (with {} the fractional part) to account for the two slanted sides.\n * The condition in the paper \"Hybrid Hexagonal/Classical Tiling for GPUs\"\n * translates to\n *\n *\ts0 >= delta + {delta * (st - 1)}\n *\n * Since 1 > frac(delta * (st - 1)), this condition implies\n * the condition above.\n *\n * The condition is checked for both directions.\n */\nisl_bool ppcg_ht_bounds_supports_sizes(__isl_keep ppcg_ht_bounds *bounds,\n\t__isl_keep isl_multi_val *sizes)\n{\n\tisl_val *s0, *h;\n\tisl_val *delta;\n\tisl_bool ok;\n\n\tok = ppcg_ht_bounds_is_valid(bounds);\n\tif (ok < 0 || !ok)\n\t\treturn ok;\n\n\th = isl_val_sub_ui(isl_multi_val_get_val(sizes, 0), 1);\n\ts0 = isl_multi_val_get_val(sizes, 1);\n\n\tdelta = ppcg_ht_bounds_get_lower(bounds, 0);\n\tok = wide_enough(s0, delta, h);\n\tisl_val_free(delta);\n\n\tdelta = ppcg_ht_bounds_get_upper(bounds);\n\tif (ok == isl_bool_true)\n\t\tok = wide_enough(s0, delta, h);\n\tisl_val_free(delta);\n\n\tisl_val_free(s0);\n\tisl_val_free(h);\n\n\treturn ok;\n}\n\n/* Check that the tile will be wide enough in the first space\n * dimension, i.e., the base of the hexagon.  
This ensures that\n * neighboring hexagons in the same phase are far enough apart\n * that they do not depend on each other.\n *\n * Error out if the condition fails to hold.\n */\nstatic isl_stat check_width(__isl_keep ppcg_ht_bounds *bounds,\n\t__isl_keep isl_multi_val *sizes)\n{\n\tisl_bool ok;\n\n\tok = ppcg_ht_bounds_supports_sizes(bounds, sizes);\n\n\tif (ok < 0)\n\t\treturn isl_stat_error;\n\tif (!ok)\n\t\tisl_die(isl_multi_val_get_ctx(sizes), isl_error_invalid,\n\t\t\t\"base of hybrid tiling hexagon not sufficiently wide\",\n\t\t\treturn isl_stat_error);\n\n\treturn isl_stat_ok;\n}\n\n/* Given valid bounds on the relative dependence distances for\n * the pair of nested nodes that \"node\" points to, as well as sufficiently\n * wide tile sizes \"sizes\", insert the corresponding time and space tiling\n * at \"node\", along with a pair of phase nodes that can be used\n * to make further changes.\n * The space of \"sizes\" should be the product of the spaces\n * of the schedules of the pair of parent and child nodes.\n * \"options\" determines whether full tiles should be separated\n * from partial tiles.\n *\n * In particular, given an input of the form\n *\n *\tP - C - ...\n *\n * the output has the form\n *\n *\t        /- F0 - M0 - CT0 - P - C - ...\n *\tPT - seq\n *\t        \\- F1 - M1 - CT1 - P - C - ...\n *\n * PT is the global time tiling.  Within each of these tiles,\n * two phases are executed in order.  Within each phase, the schedule\n * space is further subdivided into tiles through CT0 and CT1.\n * The first dimension of each of these iterates over the hexagons\n * within a phase and these are independent by construction.\n * The F0 and F1 filters filter the statement instances that belong\n * to the corresponding phase.  
The M0 and M1 marks contain a pointer\n * to a ppcg_ht_phase object that can be used to perform further changes.\n *\n * After checking that the input satisfies the requirements,\n * a data structure is constructed that represents the tiling and\n * two additional data structures are constructed for the two phases\n * of the tiling.  These are then used to define the filters F0 and F1 and\n * combined to construct the time tiling PT.\n * Then the time tiling node PT is inserted, followed by\n * the sequence with the two filters, the CT space tiling nodes and\n * the phase markers M.\n */\n__isl_give isl_schedule_node *ppcg_ht_bounds_insert_tiling(\n\t__isl_take ppcg_ht_bounds *bounds, __isl_take isl_multi_val *sizes,\n\t__isl_take isl_schedule_node *node, struct ppcg_options *options)\n{\n\tisl_ctx *ctx;\n\tisl_union_set *phase0;\n\tisl_union_set *phase1;\n\tisl_multi_union_pw_aff *input, *dom_time;\n\tisl_union_pw_multi_aff *upma;\n\tisl_pw_multi_aff *time;\n\tisl_union_set_list *phases;\n\tppcg_ht_tiling *tiling;\n\tppcg_ht_phase *phase_0;\n\tppcg_ht_phase *phase_1;\n\n\tif (!node || !sizes || !bounds)\n\t\tgoto error;\n\tif (check_input_pattern(node) < 0 || check_width(bounds, sizes) < 0)\n\t\tgoto error;\n\n\tctx = isl_schedule_node_get_ctx(node);\n\n\tinput = extract_input_schedule(node);\n\n\ttiling = ppcg_ht_bounds_construct_tiling(bounds, node, input, sizes);\n\tphase_0 = ppcg_ht_tiling_compute_phase(tiling, 1);\n\tphase_1 = ppcg_ht_tiling_compute_phase(tiling, 0);\n\ttime = combine_time_tile(phase_0, phase_1);\n\tppcg_ht_tiling_free(tiling);\n\n\tupma = isl_union_pw_multi_aff_from_multi_union_pw_aff(\n\t\t\t\t\tisl_multi_union_pw_aff_copy(input));\n\tphase0 = isl_union_set_from_set(ppcg_ht_phase_get_domain(phase_0));\n\tphase0 = isl_union_set_preimage_union_pw_multi_aff(phase0,\n\t\t\t\t\tisl_union_pw_multi_aff_copy(upma));\n\tphase1 = isl_union_set_from_set(ppcg_ht_phase_get_domain(phase_1));\n\tphase1 = isl_union_set_preimage_union_pw_multi_aff(phase1, 
upma);\n\n\tphases = isl_union_set_list_alloc(ctx, 2);\n\tphases = isl_union_set_list_add(phases, phase0);\n\tphases = isl_union_set_list_add(phases, phase1);\n\n\tdom_time = isl_multi_union_pw_aff_apply_pw_multi_aff(input, time);\n\tnode = isl_schedule_node_insert_partial_schedule(node, dom_time);\n\n\tnode = isl_schedule_node_child(node, 0);\n\n\tnode = isl_schedule_node_insert_sequence(node, phases);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = insert_space_tiling(phase_0, node, options);\n\tnode = insert_phase(node, phase_0);\n\tnode = isl_schedule_node_parent(node);\n\tnode = isl_schedule_node_next_sibling(node);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = insert_space_tiling(phase_1, node, options);\n\tnode = insert_phase(node, phase_1);\n\tnode = isl_schedule_node_parent(node);\n\tnode = isl_schedule_node_parent(node);\n\n\tnode = isl_schedule_node_parent(node);\n\n\tisl_multi_val_free(sizes);\n\treturn node;\nerror:\n\tisl_multi_val_free(sizes);\n\tisl_schedule_node_free(node);\n\tppcg_ht_bounds_free(bounds);\n\treturn NULL;\n}\n\n/* Given a branch \"node\" that contains a sequence node with two phases\n * of hybrid tiling as input, call \"fn\" on each of the two phase marker\n * nodes.\n *\n * That is, the input is as follows\n *\n *\t         /- F0 - M0 - ...\n *\t... 
- seq\n *\t         \\- F1 - M1 - ...\n *\n * and \"fn\" is called on M0 and on M1.\n */\n__isl_give isl_schedule_node *hybrid_tile_foreach_phase(\n\t__isl_take isl_schedule_node *node,\n\t__isl_give isl_schedule_node *(*fn)(__isl_take isl_schedule_node *node,\n\t\tvoid *user), void *user)\n{\n\tint depth0, depth;\n\n\tdepth0 = isl_schedule_node_get_tree_depth(node);\n\n\twhile (node &&\n\t    isl_schedule_node_get_type(node) != isl_schedule_node_sequence)\n\t\tnode = isl_schedule_node_child(node, 0);\n\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = isl_schedule_node_child(node, 0);\n\tif (!node)\n\t\treturn NULL;\n\tnode = fn(node, user);\n\tnode = isl_schedule_node_parent(node);\n\tnode = isl_schedule_node_next_sibling(node);\n\tnode = isl_schedule_node_child(node, 0);\n\tif (!node)\n\t\treturn NULL;\n\tnode = fn(node, user);\n\tnode = isl_schedule_node_parent(node);\n\tnode = isl_schedule_node_parent(node);\n\n\tdepth = isl_schedule_node_get_tree_depth(node);\n\tnode = isl_schedule_node_ancestor(node, depth - depth0);\n\n\treturn node;\n}\n\n/* This function is called on each of the two phase marks\n * in a hybrid tiling tree.\n * Drop the phase mark at \"node\".\n */\nstatic __isl_give isl_schedule_node *drop_phase_mark(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tisl_id *id;\n\tisl_bool is_phase;\n\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_mark)\n\t\treturn node;\n\n\tid = isl_schedule_node_mark_get_id(node);\n\tis_phase = is_phase_id(id);\n\tisl_id_free(id);\n\n\tif (is_phase < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (is_phase)\n\t\tnode = isl_schedule_node_delete(node);\n\n\treturn node;\n}\n\n/* Given a branch \"node\" that contains a sequence node with two phases\n * of hybrid tiling as input, remove the two phase marker nodes.\n *\n * That is, the input is as follows\n *\n *\t         /- F0 - M0 - ...\n *\t... 
- seq\n *\t         \\- F1 - M1 - ...\n *\n * and the output is\n *\n *\t         /- F0 - ...\n *\t... - seq\n *\t         \\- F1 - ...\n */\n__isl_give isl_schedule_node *hybrid_tile_drop_phase_marks(\n\t__isl_take isl_schedule_node *node)\n{\n\treturn hybrid_tile_foreach_phase(node, &drop_phase_mark, NULL);\n}\n"
  },
  {
    "path": "src/hybrid.h",
    "content": "#ifndef HYBRID_H\n#define HYBRID_H\n\n#include <isl/val.h>\n#include <isl/schedule_node.h>\n\n#include \"ppcg.h\"\n\nstruct ppcg_ht_bounds;\ntypedef struct ppcg_ht_bounds ppcg_ht_bounds;\n\nstruct ppcg_ht_phase;\ntypedef struct ppcg_ht_phase ppcg_ht_phase;\n\nisl_bool ppcg_ht_has_input_pattern(__isl_keep isl_schedule_node *node);\nisl_bool ppcg_ht_parent_has_input_pattern(__isl_keep isl_schedule_node *node);\n\n__isl_give ppcg_ht_bounds *ppcg_ht_compute_bounds(struct ppcg_scop *scop,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t__isl_keep isl_schedule_node *node);\nvoid ppcg_ht_bounds_dump(__isl_keep ppcg_ht_bounds *bounds);\nisl_bool ppcg_ht_bounds_is_valid(__isl_keep ppcg_ht_bounds *bounds);\nisl_bool ppcg_ht_bounds_supports_sizes(__isl_keep ppcg_ht_bounds *bounds,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t __isl_keep isl_multi_val *sizes);\n__isl_give isl_schedule_node *ppcg_ht_bounds_insert_tiling(\n\t\t__isl_take ppcg_ht_bounds *bounds, __isl_take isl_multi_val *sizes,\n\t\t__isl_take isl_schedule_node *node, struct ppcg_options *options);\n__isl_null ppcg_ht_bounds *ppcg_ht_bounds_free(\n\t\t__isl_take ppcg_ht_bounds *bounds);\n\n__isl_keep ppcg_ht_phase *ppcg_ht_phase_extract_from_mark(\n\t\t__isl_keep isl_schedule_node *node);\n__isl_give isl_schedule_node *ppcg_ht_phase_shift_space_point(\n\t\t__isl_keep ppcg_ht_phase *phase, __isl_take isl_schedule_node *node);\n__isl_give isl_schedule_node *hybrid_tile_foreach_phase(\n\t\t__isl_take isl_schedule_node *node,\n\t\t__isl_give isl_schedule_node *(*fn)(__isl_take isl_schedule_node *node,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tvoid *user),\n\t\tvoid *user);\n__isl_give isl_schedule_node *hybrid_tile_drop_phase_marks(\n\t\t__isl_take isl_schedule_node *node);\n\n#endif\n"
  },
  {
    "path": "src/json.hpp",
    "content": "/*\n    __ _____ _____ _____\n __|  |   __|     |   | |  JSON for Modern C++\n|  |  |__   |  |  | | | |  version 3.9.1\n|_____|_____|_____|_|___|  https://github.com/nlohmann/json\n\nLicensed under the MIT License <http://opensource.org/licenses/MIT>.\nSPDX-License-Identifier: MIT\nCopyright (c) 2013-2019 Niels Lohmann <http://nlohmann.me>.\n\nPermission is hereby  granted, free of charge, to any  person obtaining a copy\nof this software and associated  documentation files (the \"Software\"), to deal\nin the Software  without restriction, including without  limitation the rights\nto  use, copy,  modify, merge,  publish, distribute,  sublicense, and/or  sell\ncopies  of  the Software,  and  to  permit persons  to  whom  the Software  is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE  IS PROVIDED \"AS  IS\", WITHOUT WARRANTY  OF ANY KIND,  EXPRESS OR\nIMPLIED,  INCLUDING BUT  NOT  LIMITED TO  THE  WARRANTIES OF  MERCHANTABILITY,\nFITNESS FOR  A PARTICULAR PURPOSE AND  NONINFRINGEMENT. 
IN NO EVENT  SHALL THE\nAUTHORS  OR COPYRIGHT  HOLDERS  BE  LIABLE FOR  ANY  CLAIM,  DAMAGES OR  OTHER\nLIABILITY, WHETHER IN AN ACTION OF  CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE  OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n*/\n\n#ifndef INCLUDE_NLOHMANN_JSON_HPP_\n#define INCLUDE_NLOHMANN_JSON_HPP_\n\n#define NLOHMANN_JSON_VERSION_MAJOR 3\n#define NLOHMANN_JSON_VERSION_MINOR 9\n#define NLOHMANN_JSON_VERSION_PATCH 1\n\n#include <algorithm> // all_of, find, for_each\n#include <cstddef> // nullptr_t, ptrdiff_t, size_t\n#include <functional> // hash, less\n#include <initializer_list> // initializer_list\n#include <iosfwd> // istream, ostream\n#include <iterator> // random_access_iterator_tag\n#include <memory> // unique_ptr\n#include <numeric> // accumulate\n#include <string> // string, stoi, to_string\n#include <utility> // declval, forward, move, pair, swap\n#include <vector> // vector\n\n// #include <nlohmann/adl_serializer.hpp>\n\n\n#include <utility>\n\n// #include <nlohmann/detail/conversions/from_json.hpp>\n\n\n#include <algorithm> // transform\n#include <array> // array\n#include <forward_list> // forward_list\n#include <iterator> // inserter, front_inserter, end\n#include <map> // map\n#include <string> // string\n#include <tuple> // tuple, make_tuple\n#include <type_traits> // is_arithmetic, is_same, is_enum, underlying_type, is_convertible\n#include <unordered_map> // unordered_map\n#include <utility> // pair, declval\n#include <valarray> // valarray\n\n// #include <nlohmann/detail/exceptions.hpp>\n\n\n#include <exception> // exception\n#include <stdexcept> // runtime_error\n#include <string> // to_string\n#include <vector> // vector\n\n// #include <nlohmann/detail/value_t.hpp>\n\n\n#include <array> // array\n#include <cstddef> // size_t\n#include <cstdint> // uint8_t\n#include <string> // string\n\nnamespace nlohmann\n{\nnamespace detail\n{\n///////////////////////////\n// JSON type enumeration 
//\n///////////////////////////\n\n/*!\n@brief the JSON type enumeration\n\nThis enumeration collects the different JSON types. It is internally used to\ndistinguish the stored values, and the functions @ref basic_json::is_null(),\n@ref basic_json::is_object(), @ref basic_json::is_array(),\n@ref basic_json::is_string(), @ref basic_json::is_boolean(),\n@ref basic_json::is_number() (with @ref basic_json::is_number_integer(),\n@ref basic_json::is_number_unsigned(), and @ref basic_json::is_number_float()),\n@ref basic_json::is_discarded(), @ref basic_json::is_primitive(), and\n@ref basic_json::is_structured() rely on it.\n\n@note There are three enumeration entries (number_integer, number_unsigned, and\nnumber_float), because the library distinguishes these three types for numbers:\n@ref basic_json::number_unsigned_t is used for unsigned integers,\n@ref basic_json::number_integer_t is used for signed integers, and\n@ref basic_json::number_float_t is used for floating-point numbers or to\napproximate integers which do not fit in the limits of their respective type.\n\n@sa see @ref basic_json::basic_json(const value_t value_type) -- create a JSON\nvalue with the default value for a given type\n\n@since version 1.0.0\n*/\nenum class value_t : std::uint8_t\n{\n    null,             ///< null value\n    object,           ///< object (unordered set of name/value pairs)\n    array,            ///< array (ordered collection of values)\n    string,           ///< string value\n    boolean,          ///< boolean value\n    number_integer,   ///< number value (signed integer)\n    number_unsigned,  ///< number value (unsigned integer)\n    number_float,     ///< number value (floating-point)\n    binary,           ///< binary array (ordered collection of bytes)\n    discarded         ///< discarded by the parser callback function\n};\n\n/*!\n@brief comparison operator for JSON types\n\nReturns an ordering that is similar to Python:\n- order: null < boolean < number < object < 
array < string < binary\n- furthermore, each type is not smaller than itself\n- discarded values are not comparable\n- binary is represented as a b\"\" string in python and directly comparable to a\n  string; however, making a binary array directly comparable with a string would\n  be surprising behavior in a JSON file.\n\n@since version 1.0.0\n*/\ninline bool operator<(const value_t lhs, const value_t rhs) noexcept\n{\n    static constexpr std::array<std::uint8_t, 9> order = {{\n            0 /* null */, 3 /* object */, 4 /* array */, 5 /* string */,\n            1 /* boolean */, 2 /* integer */, 2 /* unsigned */, 2 /* float */,\n            6 /* binary */\n        }\n    };\n\n    const auto l_index = static_cast<std::size_t>(lhs);\n    const auto r_index = static_cast<std::size_t>(rhs);\n    return l_index < order.size() && r_index < order.size() && order[l_index] < order[r_index];\n}\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/string_escape.hpp>\n\n\n#include <string>\n// #include <nlohmann/detail/macro_scope.hpp>\n\n\n#include <utility> // pair\n// #include <nlohmann/thirdparty/hedley/hedley.hpp>\n\n\n/* Hedley - https://nemequ.github.io/hedley\n * Created by Evan Nemerson <evan@nemerson.com>\n *\n * To the extent possible under law, the author(s) have dedicated all\n * copyright and related and neighboring rights to this software to\n * the public domain worldwide. 
This software is distributed without\n * any warranty.\n *\n * For details, see <http://creativecommons.org/publicdomain/zero/1.0/>.\n * SPDX-License-Identifier: CC0-1.0\n */\n\n#if !defined(JSON_HEDLEY_VERSION) || (JSON_HEDLEY_VERSION < 15)\n#if defined(JSON_HEDLEY_VERSION)\n    #undef JSON_HEDLEY_VERSION\n#endif\n#define JSON_HEDLEY_VERSION 15\n\n#if defined(JSON_HEDLEY_STRINGIFY_EX)\n    #undef JSON_HEDLEY_STRINGIFY_EX\n#endif\n#define JSON_HEDLEY_STRINGIFY_EX(x) #x\n\n#if defined(JSON_HEDLEY_STRINGIFY)\n    #undef JSON_HEDLEY_STRINGIFY\n#endif\n#define JSON_HEDLEY_STRINGIFY(x) JSON_HEDLEY_STRINGIFY_EX(x)\n\n#if defined(JSON_HEDLEY_CONCAT_EX)\n    #undef JSON_HEDLEY_CONCAT_EX\n#endif\n#define JSON_HEDLEY_CONCAT_EX(a,b) a##b\n\n#if defined(JSON_HEDLEY_CONCAT)\n    #undef JSON_HEDLEY_CONCAT\n#endif\n#define JSON_HEDLEY_CONCAT(a,b) JSON_HEDLEY_CONCAT_EX(a,b)\n\n#if defined(JSON_HEDLEY_CONCAT3_EX)\n    #undef JSON_HEDLEY_CONCAT3_EX\n#endif\n#define JSON_HEDLEY_CONCAT3_EX(a,b,c) a##b##c\n\n#if defined(JSON_HEDLEY_CONCAT3)\n    #undef JSON_HEDLEY_CONCAT3\n#endif\n#define JSON_HEDLEY_CONCAT3(a,b,c) JSON_HEDLEY_CONCAT3_EX(a,b,c)\n\n#if defined(JSON_HEDLEY_VERSION_ENCODE)\n    #undef JSON_HEDLEY_VERSION_ENCODE\n#endif\n#define JSON_HEDLEY_VERSION_ENCODE(major,minor,revision) (((major) * 1000000) + ((minor) * 1000) + (revision))\n\n#if defined(JSON_HEDLEY_VERSION_DECODE_MAJOR)\n    #undef JSON_HEDLEY_VERSION_DECODE_MAJOR\n#endif\n#define JSON_HEDLEY_VERSION_DECODE_MAJOR(version) ((version) / 1000000)\n\n#if defined(JSON_HEDLEY_VERSION_DECODE_MINOR)\n    #undef JSON_HEDLEY_VERSION_DECODE_MINOR\n#endif\n#define JSON_HEDLEY_VERSION_DECODE_MINOR(version) (((version) % 1000000) / 1000)\n\n#if defined(JSON_HEDLEY_VERSION_DECODE_REVISION)\n    #undef JSON_HEDLEY_VERSION_DECODE_REVISION\n#endif\n#define JSON_HEDLEY_VERSION_DECODE_REVISION(version) ((version) % 1000)\n\n#if defined(JSON_HEDLEY_GNUC_VERSION)\n    #undef JSON_HEDLEY_GNUC_VERSION\n#endif\n#if defined(__GNUC__) && 
defined(__GNUC_PATCHLEVEL__)\n    #define JSON_HEDLEY_GNUC_VERSION JSON_HEDLEY_VERSION_ENCODE(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__)\n#elif defined(__GNUC__)\n    #define JSON_HEDLEY_GNUC_VERSION JSON_HEDLEY_VERSION_ENCODE(__GNUC__, __GNUC_MINOR__, 0)\n#endif\n\n#if defined(JSON_HEDLEY_GNUC_VERSION_CHECK)\n    #undef JSON_HEDLEY_GNUC_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_GNUC_VERSION)\n    #define JSON_HEDLEY_GNUC_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_GNUC_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_GNUC_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_MSVC_VERSION)\n    #undef JSON_HEDLEY_MSVC_VERSION\n#endif\n#if defined(_MSC_FULL_VER) && (_MSC_FULL_VER >= 140000000) && !defined(__ICL)\n    #define JSON_HEDLEY_MSVC_VERSION JSON_HEDLEY_VERSION_ENCODE(_MSC_FULL_VER / 10000000, (_MSC_FULL_VER % 10000000) / 100000, (_MSC_FULL_VER % 100000) / 100)\n#elif defined(_MSC_FULL_VER) && !defined(__ICL)\n    #define JSON_HEDLEY_MSVC_VERSION JSON_HEDLEY_VERSION_ENCODE(_MSC_FULL_VER / 1000000, (_MSC_FULL_VER % 1000000) / 10000, (_MSC_FULL_VER % 10000) / 10)\n#elif defined(_MSC_VER) && !defined(__ICL)\n    #define JSON_HEDLEY_MSVC_VERSION JSON_HEDLEY_VERSION_ENCODE(_MSC_VER / 100, _MSC_VER % 100, 0)\n#endif\n\n#if defined(JSON_HEDLEY_MSVC_VERSION_CHECK)\n    #undef JSON_HEDLEY_MSVC_VERSION_CHECK\n#endif\n#if !defined(JSON_HEDLEY_MSVC_VERSION)\n    #define JSON_HEDLEY_MSVC_VERSION_CHECK(major,minor,patch) (0)\n#elif defined(_MSC_VER) && (_MSC_VER >= 1400)\n    #define JSON_HEDLEY_MSVC_VERSION_CHECK(major,minor,patch) (_MSC_FULL_VER >= ((major * 10000000) + (minor * 100000) + (patch)))\n#elif defined(_MSC_VER) && (_MSC_VER >= 1200)\n    #define JSON_HEDLEY_MSVC_VERSION_CHECK(major,minor,patch) (_MSC_FULL_VER >= ((major * 1000000) + (minor * 10000) + (patch)))\n#else\n    #define JSON_HEDLEY_MSVC_VERSION_CHECK(major,minor,patch) (_MSC_VER >= ((major * 100) + (minor)))\n#endif\n\n#if 
defined(JSON_HEDLEY_INTEL_VERSION)\n    #undef JSON_HEDLEY_INTEL_VERSION\n#endif\n#if defined(__INTEL_COMPILER) && defined(__INTEL_COMPILER_UPDATE) && !defined(__ICL)\n    #define JSON_HEDLEY_INTEL_VERSION JSON_HEDLEY_VERSION_ENCODE(__INTEL_COMPILER / 100, __INTEL_COMPILER % 100, __INTEL_COMPILER_UPDATE)\n#elif defined(__INTEL_COMPILER) && !defined(__ICL)\n    #define JSON_HEDLEY_INTEL_VERSION JSON_HEDLEY_VERSION_ENCODE(__INTEL_COMPILER / 100, __INTEL_COMPILER % 100, 0)\n#endif\n\n#if defined(JSON_HEDLEY_INTEL_VERSION_CHECK)\n    #undef JSON_HEDLEY_INTEL_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_INTEL_VERSION)\n    #define JSON_HEDLEY_INTEL_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_INTEL_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_INTEL_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_INTEL_CL_VERSION)\n    #undef JSON_HEDLEY_INTEL_CL_VERSION\n#endif\n#if defined(__INTEL_COMPILER) && defined(__INTEL_COMPILER_UPDATE) && defined(__ICL)\n    #define JSON_HEDLEY_INTEL_CL_VERSION JSON_HEDLEY_VERSION_ENCODE(__INTEL_COMPILER, __INTEL_COMPILER_UPDATE, 0)\n#endif\n\n#if defined(JSON_HEDLEY_INTEL_CL_VERSION_CHECK)\n    #undef JSON_HEDLEY_INTEL_CL_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_INTEL_CL_VERSION)\n    #define JSON_HEDLEY_INTEL_CL_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_INTEL_CL_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_INTEL_CL_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_PGI_VERSION)\n    #undef JSON_HEDLEY_PGI_VERSION\n#endif\n#if defined(__PGI) && defined(__PGIC__) && defined(__PGIC_MINOR__) && defined(__PGIC_PATCHLEVEL__)\n    #define JSON_HEDLEY_PGI_VERSION JSON_HEDLEY_VERSION_ENCODE(__PGIC__, __PGIC_MINOR__, __PGIC_PATCHLEVEL__)\n#endif\n\n#if defined(JSON_HEDLEY_PGI_VERSION_CHECK)\n    #undef JSON_HEDLEY_PGI_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_PGI_VERSION)\n    #define 
JSON_HEDLEY_PGI_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_PGI_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_PGI_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_SUNPRO_VERSION)\n    #undef JSON_HEDLEY_SUNPRO_VERSION\n#endif\n#if defined(__SUNPRO_C) && (__SUNPRO_C > 0x1000)\n    #define JSON_HEDLEY_SUNPRO_VERSION JSON_HEDLEY_VERSION_ENCODE((((__SUNPRO_C >> 16) & 0xf) * 10) + ((__SUNPRO_C >> 12) & 0xf), (((__SUNPRO_C >> 8) & 0xf) * 10) + ((__SUNPRO_C >> 4) & 0xf), (__SUNPRO_C & 0xf) * 10)\n#elif defined(__SUNPRO_C)\n    #define JSON_HEDLEY_SUNPRO_VERSION JSON_HEDLEY_VERSION_ENCODE((__SUNPRO_C >> 8) & 0xf, (__SUNPRO_C >> 4) & 0xf, (__SUNPRO_C) & 0xf)\n#elif defined(__SUNPRO_CC) && (__SUNPRO_CC > 0x1000)\n    #define JSON_HEDLEY_SUNPRO_VERSION JSON_HEDLEY_VERSION_ENCODE((((__SUNPRO_CC >> 16) & 0xf) * 10) + ((__SUNPRO_CC >> 12) & 0xf), (((__SUNPRO_CC >> 8) & 0xf) * 10) + ((__SUNPRO_CC >> 4) & 0xf), (__SUNPRO_CC & 0xf) * 10)\n#elif defined(__SUNPRO_CC)\n    #define JSON_HEDLEY_SUNPRO_VERSION JSON_HEDLEY_VERSION_ENCODE((__SUNPRO_CC >> 8) & 0xf, (__SUNPRO_CC >> 4) & 0xf, (__SUNPRO_CC) & 0xf)\n#endif\n\n#if defined(JSON_HEDLEY_SUNPRO_VERSION_CHECK)\n    #undef JSON_HEDLEY_SUNPRO_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_SUNPRO_VERSION)\n    #define JSON_HEDLEY_SUNPRO_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_SUNPRO_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_SUNPRO_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_EMSCRIPTEN_VERSION)\n    #undef JSON_HEDLEY_EMSCRIPTEN_VERSION\n#endif\n#if defined(__EMSCRIPTEN__)\n    #define JSON_HEDLEY_EMSCRIPTEN_VERSION JSON_HEDLEY_VERSION_ENCODE(__EMSCRIPTEN_major__, __EMSCRIPTEN_minor__, __EMSCRIPTEN_tiny__)\n#endif\n\n#if defined(JSON_HEDLEY_EMSCRIPTEN_VERSION_CHECK)\n    #undef JSON_HEDLEY_EMSCRIPTEN_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_EMSCRIPTEN_VERSION)\n    #define 
JSON_HEDLEY_EMSCRIPTEN_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_EMSCRIPTEN_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_EMSCRIPTEN_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_ARM_VERSION)\n    #undef JSON_HEDLEY_ARM_VERSION\n#endif\n#if defined(__CC_ARM) && defined(__ARMCOMPILER_VERSION)\n    #define JSON_HEDLEY_ARM_VERSION JSON_HEDLEY_VERSION_ENCODE(__ARMCOMPILER_VERSION / 1000000, (__ARMCOMPILER_VERSION % 1000000) / 10000, (__ARMCOMPILER_VERSION % 10000) / 100)\n#elif defined(__CC_ARM) && defined(__ARMCC_VERSION)\n    #define JSON_HEDLEY_ARM_VERSION JSON_HEDLEY_VERSION_ENCODE(__ARMCC_VERSION / 1000000, (__ARMCC_VERSION % 1000000) / 10000, (__ARMCC_VERSION % 10000) / 100)\n#endif\n\n#if defined(JSON_HEDLEY_ARM_VERSION_CHECK)\n    #undef JSON_HEDLEY_ARM_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_ARM_VERSION)\n    #define JSON_HEDLEY_ARM_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_ARM_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_ARM_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_IBM_VERSION)\n    #undef JSON_HEDLEY_IBM_VERSION\n#endif\n#if defined(__ibmxl__)\n    #define JSON_HEDLEY_IBM_VERSION JSON_HEDLEY_VERSION_ENCODE(__ibmxl_version__, __ibmxl_release__, __ibmxl_modification__)\n#elif defined(__xlC__) && defined(__xlC_ver__)\n    #define JSON_HEDLEY_IBM_VERSION JSON_HEDLEY_VERSION_ENCODE(__xlC__ >> 8, __xlC__ & 0xff, (__xlC_ver__ >> 8) & 0xff)\n#elif defined(__xlC__)\n    #define JSON_HEDLEY_IBM_VERSION JSON_HEDLEY_VERSION_ENCODE(__xlC__ >> 8, __xlC__ & 0xff, 0)\n#endif\n\n#if defined(JSON_HEDLEY_IBM_VERSION_CHECK)\n    #undef JSON_HEDLEY_IBM_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_IBM_VERSION)\n    #define JSON_HEDLEY_IBM_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_IBM_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_IBM_VERSION_CHECK(major,minor,patch) 
(0)\n#endif\n\n#if defined(JSON_HEDLEY_TI_VERSION)\n    #undef JSON_HEDLEY_TI_VERSION\n#endif\n#if \\\n    defined(__TI_COMPILER_VERSION__) && \\\n    ( \\\n      defined(__TMS470__) || defined(__TI_ARM__) || \\\n      defined(__MSP430__) || \\\n      defined(__TMS320C2000__) \\\n    )\n#if (__TI_COMPILER_VERSION__ >= 16000000)\n    #define JSON_HEDLEY_TI_VERSION JSON_HEDLEY_VERSION_ENCODE(__TI_COMPILER_VERSION__ / 1000000, (__TI_COMPILER_VERSION__ % 1000000) / 1000, (__TI_COMPILER_VERSION__ % 1000))\n#endif\n#endif\n\n#if defined(JSON_HEDLEY_TI_VERSION_CHECK)\n    #undef JSON_HEDLEY_TI_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_TI_VERSION)\n    #define JSON_HEDLEY_TI_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_TI_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_TI_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_TI_CL2000_VERSION)\n    #undef JSON_HEDLEY_TI_CL2000_VERSION\n#endif\n#if defined(__TI_COMPILER_VERSION__) && defined(__TMS320C2000__)\n    #define JSON_HEDLEY_TI_CL2000_VERSION JSON_HEDLEY_VERSION_ENCODE(__TI_COMPILER_VERSION__ / 1000000, (__TI_COMPILER_VERSION__ % 1000000) / 1000, (__TI_COMPILER_VERSION__ % 1000))\n#endif\n\n#if defined(JSON_HEDLEY_TI_CL2000_VERSION_CHECK)\n    #undef JSON_HEDLEY_TI_CL2000_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_TI_CL2000_VERSION)\n    #define JSON_HEDLEY_TI_CL2000_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_TI_CL2000_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_TI_CL2000_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_TI_CL430_VERSION)\n    #undef JSON_HEDLEY_TI_CL430_VERSION\n#endif\n#if defined(__TI_COMPILER_VERSION__) && defined(__MSP430__)\n    #define JSON_HEDLEY_TI_CL430_VERSION JSON_HEDLEY_VERSION_ENCODE(__TI_COMPILER_VERSION__ / 1000000, (__TI_COMPILER_VERSION__ % 1000000) / 1000, (__TI_COMPILER_VERSION__ % 1000))\n#endif\n\n#if 
defined(JSON_HEDLEY_TI_CL430_VERSION_CHECK)\n    #undef JSON_HEDLEY_TI_CL430_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_TI_CL430_VERSION)\n    #define JSON_HEDLEY_TI_CL430_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_TI_CL430_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_TI_CL430_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_TI_ARMCL_VERSION)\n    #undef JSON_HEDLEY_TI_ARMCL_VERSION\n#endif\n#if defined(__TI_COMPILER_VERSION__) && (defined(__TMS470__) || defined(__TI_ARM__))\n    #define JSON_HEDLEY_TI_ARMCL_VERSION JSON_HEDLEY_VERSION_ENCODE(__TI_COMPILER_VERSION__ / 1000000, (__TI_COMPILER_VERSION__ % 1000000) / 1000, (__TI_COMPILER_VERSION__ % 1000))\n#endif\n\n#if defined(JSON_HEDLEY_TI_ARMCL_VERSION_CHECK)\n    #undef JSON_HEDLEY_TI_ARMCL_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_TI_ARMCL_VERSION)\n    #define JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_TI_ARMCL_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_TI_CL6X_VERSION)\n    #undef JSON_HEDLEY_TI_CL6X_VERSION\n#endif\n#if defined(__TI_COMPILER_VERSION__) && defined(__TMS320C6X__)\n    #define JSON_HEDLEY_TI_CL6X_VERSION JSON_HEDLEY_VERSION_ENCODE(__TI_COMPILER_VERSION__ / 1000000, (__TI_COMPILER_VERSION__ % 1000000) / 1000, (__TI_COMPILER_VERSION__ % 1000))\n#endif\n\n#if defined(JSON_HEDLEY_TI_CL6X_VERSION_CHECK)\n    #undef JSON_HEDLEY_TI_CL6X_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_TI_CL6X_VERSION)\n    #define JSON_HEDLEY_TI_CL6X_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_TI_CL6X_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_TI_CL6X_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_TI_CL7X_VERSION)\n    #undef JSON_HEDLEY_TI_CL7X_VERSION\n#endif\n#if defined(__TI_COMPILER_VERSION__) && 
defined(__C7000__)\n    #define JSON_HEDLEY_TI_CL7X_VERSION JSON_HEDLEY_VERSION_ENCODE(__TI_COMPILER_VERSION__ / 1000000, (__TI_COMPILER_VERSION__ % 1000000) / 1000, (__TI_COMPILER_VERSION__ % 1000))\n#endif\n\n#if defined(JSON_HEDLEY_TI_CL7X_VERSION_CHECK)\n    #undef JSON_HEDLEY_TI_CL7X_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_TI_CL7X_VERSION)\n    #define JSON_HEDLEY_TI_CL7X_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_TI_CL7X_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_TI_CL7X_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_TI_CLPRU_VERSION)\n    #undef JSON_HEDLEY_TI_CLPRU_VERSION\n#endif\n#if defined(__TI_COMPILER_VERSION__) && defined(__PRU__)\n    #define JSON_HEDLEY_TI_CLPRU_VERSION JSON_HEDLEY_VERSION_ENCODE(__TI_COMPILER_VERSION__ / 1000000, (__TI_COMPILER_VERSION__ % 1000000) / 1000, (__TI_COMPILER_VERSION__ % 1000))\n#endif\n\n#if defined(JSON_HEDLEY_TI_CLPRU_VERSION_CHECK)\n    #undef JSON_HEDLEY_TI_CLPRU_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_TI_CLPRU_VERSION)\n    #define JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_TI_CLPRU_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_CRAY_VERSION)\n    #undef JSON_HEDLEY_CRAY_VERSION\n#endif\n#if defined(_CRAYC)\n    #if defined(_RELEASE_PATCHLEVEL)\n        #define JSON_HEDLEY_CRAY_VERSION JSON_HEDLEY_VERSION_ENCODE(_RELEASE_MAJOR, _RELEASE_MINOR, _RELEASE_PATCHLEVEL)\n    #else\n        #define JSON_HEDLEY_CRAY_VERSION JSON_HEDLEY_VERSION_ENCODE(_RELEASE_MAJOR, _RELEASE_MINOR, 0)\n    #endif\n#endif\n\n#if defined(JSON_HEDLEY_CRAY_VERSION_CHECK)\n    #undef JSON_HEDLEY_CRAY_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_CRAY_VERSION)\n    #define JSON_HEDLEY_CRAY_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_CRAY_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, 
patch))\n#else\n    #define JSON_HEDLEY_CRAY_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_IAR_VERSION)\n    #undef JSON_HEDLEY_IAR_VERSION\n#endif\n#if defined(__IAR_SYSTEMS_ICC__)\n    #if __VER__ > 1000\n        #define JSON_HEDLEY_IAR_VERSION JSON_HEDLEY_VERSION_ENCODE((__VER__ / 1000000), ((__VER__ / 1000) % 1000), (__VER__ % 1000))\n    #else\n        #define JSON_HEDLEY_IAR_VERSION JSON_HEDLEY_VERSION_ENCODE(__VER__ / 100, __VER__ % 100, 0)\n    #endif\n#endif\n\n#if defined(JSON_HEDLEY_IAR_VERSION_CHECK)\n    #undef JSON_HEDLEY_IAR_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_IAR_VERSION)\n    #define JSON_HEDLEY_IAR_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_IAR_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_IAR_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_TINYC_VERSION)\n    #undef JSON_HEDLEY_TINYC_VERSION\n#endif\n#if defined(__TINYC__)\n    #define JSON_HEDLEY_TINYC_VERSION JSON_HEDLEY_VERSION_ENCODE(__TINYC__ / 1000, (__TINYC__ / 100) % 10, __TINYC__ % 100)\n#endif\n\n#if defined(JSON_HEDLEY_TINYC_VERSION_CHECK)\n    #undef JSON_HEDLEY_TINYC_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_TINYC_VERSION)\n    #define JSON_HEDLEY_TINYC_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_TINYC_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_TINYC_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_DMC_VERSION)\n    #undef JSON_HEDLEY_DMC_VERSION\n#endif\n#if defined(__DMC__)\n    #define JSON_HEDLEY_DMC_VERSION JSON_HEDLEY_VERSION_ENCODE(__DMC__ >> 8, (__DMC__ >> 4) & 0xf, __DMC__ & 0xf)\n#endif\n\n#if defined(JSON_HEDLEY_DMC_VERSION_CHECK)\n    #undef JSON_HEDLEY_DMC_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_DMC_VERSION)\n    #define JSON_HEDLEY_DMC_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_DMC_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define 
JSON_HEDLEY_DMC_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_COMPCERT_VERSION)\n    #undef JSON_HEDLEY_COMPCERT_VERSION\n#endif\n#if defined(__COMPCERT_VERSION__)\n    #define JSON_HEDLEY_COMPCERT_VERSION JSON_HEDLEY_VERSION_ENCODE(__COMPCERT_VERSION__ / 10000, (__COMPCERT_VERSION__ / 100) % 100, __COMPCERT_VERSION__ % 100)\n#endif\n\n#if defined(JSON_HEDLEY_COMPCERT_VERSION_CHECK)\n    #undef JSON_HEDLEY_COMPCERT_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_COMPCERT_VERSION)\n    #define JSON_HEDLEY_COMPCERT_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_COMPCERT_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_COMPCERT_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_PELLES_VERSION)\n    #undef JSON_HEDLEY_PELLES_VERSION\n#endif\n#if defined(__POCC__)\n    #define JSON_HEDLEY_PELLES_VERSION JSON_HEDLEY_VERSION_ENCODE(__POCC__ / 100, __POCC__ % 100, 0)\n#endif\n\n#if defined(JSON_HEDLEY_PELLES_VERSION_CHECK)\n    #undef JSON_HEDLEY_PELLES_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_PELLES_VERSION)\n    #define JSON_HEDLEY_PELLES_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_PELLES_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_PELLES_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_MCST_LCC_VERSION)\n    #undef JSON_HEDLEY_MCST_LCC_VERSION\n#endif\n#if defined(__LCC__) && defined(__LCC_MINOR__)\n    #define JSON_HEDLEY_MCST_LCC_VERSION JSON_HEDLEY_VERSION_ENCODE(__LCC__ / 100, __LCC__ % 100, __LCC_MINOR__)\n#endif\n\n#if defined(JSON_HEDLEY_MCST_LCC_VERSION_CHECK)\n    #undef JSON_HEDLEY_MCST_LCC_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_MCST_LCC_VERSION)\n    #define JSON_HEDLEY_MCST_LCC_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_MCST_LCC_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_MCST_LCC_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if 
defined(JSON_HEDLEY_GCC_VERSION)\n    #undef JSON_HEDLEY_GCC_VERSION\n#endif\n#if \\\n    defined(JSON_HEDLEY_GNUC_VERSION) && \\\n    !defined(__clang__) && \\\n    !defined(JSON_HEDLEY_INTEL_VERSION) && \\\n    !defined(JSON_HEDLEY_PGI_VERSION) && \\\n    !defined(JSON_HEDLEY_ARM_VERSION) && \\\n    !defined(JSON_HEDLEY_CRAY_VERSION) && \\\n    !defined(JSON_HEDLEY_TI_VERSION) && \\\n    !defined(JSON_HEDLEY_TI_ARMCL_VERSION) && \\\n    !defined(JSON_HEDLEY_TI_CL430_VERSION) && \\\n    !defined(JSON_HEDLEY_TI_CL2000_VERSION) && \\\n    !defined(JSON_HEDLEY_TI_CL6X_VERSION) && \\\n    !defined(JSON_HEDLEY_TI_CL7X_VERSION) && \\\n    !defined(JSON_HEDLEY_TI_CLPRU_VERSION) && \\\n    !defined(__COMPCERT__) && \\\n    !defined(JSON_HEDLEY_MCST_LCC_VERSION)\n    #define JSON_HEDLEY_GCC_VERSION JSON_HEDLEY_GNUC_VERSION\n#endif\n\n#if defined(JSON_HEDLEY_GCC_VERSION_CHECK)\n    #undef JSON_HEDLEY_GCC_VERSION_CHECK\n#endif\n#if defined(JSON_HEDLEY_GCC_VERSION)\n    #define JSON_HEDLEY_GCC_VERSION_CHECK(major,minor,patch) (JSON_HEDLEY_GCC_VERSION >= JSON_HEDLEY_VERSION_ENCODE(major, minor, patch))\n#else\n    #define JSON_HEDLEY_GCC_VERSION_CHECK(major,minor,patch) (0)\n#endif\n\n#if defined(JSON_HEDLEY_HAS_ATTRIBUTE)\n    #undef JSON_HEDLEY_HAS_ATTRIBUTE\n#endif\n#if \\\n  defined(__has_attribute) && \\\n  ( \\\n    (!defined(JSON_HEDLEY_IAR_VERSION) || JSON_HEDLEY_IAR_VERSION_CHECK(8,5,9)) \\\n  )\n#  define JSON_HEDLEY_HAS_ATTRIBUTE(attribute) __has_attribute(attribute)\n#else\n#  define JSON_HEDLEY_HAS_ATTRIBUTE(attribute) (0)\n#endif\n\n#if defined(JSON_HEDLEY_GNUC_HAS_ATTRIBUTE)\n    #undef JSON_HEDLEY_GNUC_HAS_ATTRIBUTE\n#endif\n#if defined(__has_attribute)\n    #define JSON_HEDLEY_GNUC_HAS_ATTRIBUTE(attribute,major,minor,patch) JSON_HEDLEY_HAS_ATTRIBUTE(attribute)\n#else\n    #define JSON_HEDLEY_GNUC_HAS_ATTRIBUTE(attribute,major,minor,patch) JSON_HEDLEY_GNUC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_GCC_HAS_ATTRIBUTE)\n    #undef 
JSON_HEDLEY_GCC_HAS_ATTRIBUTE\n#endif\n#if defined(__has_attribute)\n    #define JSON_HEDLEY_GCC_HAS_ATTRIBUTE(attribute,major,minor,patch) JSON_HEDLEY_HAS_ATTRIBUTE(attribute)\n#else\n    #define JSON_HEDLEY_GCC_HAS_ATTRIBUTE(attribute,major,minor,patch) JSON_HEDLEY_GCC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_HAS_CPP_ATTRIBUTE)\n    #undef JSON_HEDLEY_HAS_CPP_ATTRIBUTE\n#endif\n#if \\\n    defined(__has_cpp_attribute) && \\\n    defined(__cplusplus) && \\\n    (!defined(JSON_HEDLEY_SUNPRO_VERSION) || JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,15,0))\n    #define JSON_HEDLEY_HAS_CPP_ATTRIBUTE(attribute) __has_cpp_attribute(attribute)\n#else\n    #define JSON_HEDLEY_HAS_CPP_ATTRIBUTE(attribute) (0)\n#endif\n\n#if defined(JSON_HEDLEY_HAS_CPP_ATTRIBUTE_NS)\n    #undef JSON_HEDLEY_HAS_CPP_ATTRIBUTE_NS\n#endif\n#if !defined(__cplusplus) || !defined(__has_cpp_attribute)\n    #define JSON_HEDLEY_HAS_CPP_ATTRIBUTE_NS(ns,attribute) (0)\n#elif \\\n    !defined(JSON_HEDLEY_PGI_VERSION) && \\\n    !defined(JSON_HEDLEY_IAR_VERSION) && \\\n    (!defined(JSON_HEDLEY_SUNPRO_VERSION) || JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,15,0)) && \\\n    (!defined(JSON_HEDLEY_MSVC_VERSION) || JSON_HEDLEY_MSVC_VERSION_CHECK(19,20,0))\n    #define JSON_HEDLEY_HAS_CPP_ATTRIBUTE_NS(ns,attribute) JSON_HEDLEY_HAS_CPP_ATTRIBUTE(ns::attribute)\n#else\n    #define JSON_HEDLEY_HAS_CPP_ATTRIBUTE_NS(ns,attribute) (0)\n#endif\n\n#if defined(JSON_HEDLEY_GNUC_HAS_CPP_ATTRIBUTE)\n    #undef JSON_HEDLEY_GNUC_HAS_CPP_ATTRIBUTE\n#endif\n#if defined(__has_cpp_attribute) && defined(__cplusplus)\n    #define JSON_HEDLEY_GNUC_HAS_CPP_ATTRIBUTE(attribute,major,minor,patch) __has_cpp_attribute(attribute)\n#else\n    #define JSON_HEDLEY_GNUC_HAS_CPP_ATTRIBUTE(attribute,major,minor,patch) JSON_HEDLEY_GNUC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_GCC_HAS_CPP_ATTRIBUTE)\n    #undef JSON_HEDLEY_GCC_HAS_CPP_ATTRIBUTE\n#endif\n#if defined(__has_cpp_attribute) && 
defined(__cplusplus)\n    #define JSON_HEDLEY_GCC_HAS_CPP_ATTRIBUTE(attribute,major,minor,patch) __has_cpp_attribute(attribute)\n#else\n    #define JSON_HEDLEY_GCC_HAS_CPP_ATTRIBUTE(attribute,major,minor,patch) JSON_HEDLEY_GCC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_HAS_BUILTIN)\n    #undef JSON_HEDLEY_HAS_BUILTIN\n#endif\n#if defined(__has_builtin)\n    #define JSON_HEDLEY_HAS_BUILTIN(builtin) __has_builtin(builtin)\n#else\n    #define JSON_HEDLEY_HAS_BUILTIN(builtin) (0)\n#endif\n\n#if defined(JSON_HEDLEY_GNUC_HAS_BUILTIN)\n    #undef JSON_HEDLEY_GNUC_HAS_BUILTIN\n#endif\n#if defined(__has_builtin)\n    #define JSON_HEDLEY_GNUC_HAS_BUILTIN(builtin,major,minor,patch) __has_builtin(builtin)\n#else\n    #define JSON_HEDLEY_GNUC_HAS_BUILTIN(builtin,major,minor,patch) JSON_HEDLEY_GNUC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_GCC_HAS_BUILTIN)\n    #undef JSON_HEDLEY_GCC_HAS_BUILTIN\n#endif\n#if defined(__has_builtin)\n    #define JSON_HEDLEY_GCC_HAS_BUILTIN(builtin,major,minor,patch) __has_builtin(builtin)\n#else\n    #define JSON_HEDLEY_GCC_HAS_BUILTIN(builtin,major,minor,patch) JSON_HEDLEY_GCC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_HAS_FEATURE)\n    #undef JSON_HEDLEY_HAS_FEATURE\n#endif\n#if defined(__has_feature)\n    #define JSON_HEDLEY_HAS_FEATURE(feature) __has_feature(feature)\n#else\n    #define JSON_HEDLEY_HAS_FEATURE(feature) (0)\n#endif\n\n#if defined(JSON_HEDLEY_GNUC_HAS_FEATURE)\n    #undef JSON_HEDLEY_GNUC_HAS_FEATURE\n#endif\n#if defined(__has_feature)\n    #define JSON_HEDLEY_GNUC_HAS_FEATURE(feature,major,minor,patch) __has_feature(feature)\n#else\n    #define JSON_HEDLEY_GNUC_HAS_FEATURE(feature,major,minor,patch) JSON_HEDLEY_GNUC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_GCC_HAS_FEATURE)\n    #undef JSON_HEDLEY_GCC_HAS_FEATURE\n#endif\n#if defined(__has_feature)\n    #define JSON_HEDLEY_GCC_HAS_FEATURE(feature,major,minor,patch) 
__has_feature(feature)\n#else\n    #define JSON_HEDLEY_GCC_HAS_FEATURE(feature,major,minor,patch) JSON_HEDLEY_GCC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_HAS_EXTENSION)\n    #undef JSON_HEDLEY_HAS_EXTENSION\n#endif\n#if defined(__has_extension)\n    #define JSON_HEDLEY_HAS_EXTENSION(extension) __has_extension(extension)\n#else\n    #define JSON_HEDLEY_HAS_EXTENSION(extension) (0)\n#endif\n\n#if defined(JSON_HEDLEY_GNUC_HAS_EXTENSION)\n    #undef JSON_HEDLEY_GNUC_HAS_EXTENSION\n#endif\n#if defined(__has_extension)\n    #define JSON_HEDLEY_GNUC_HAS_EXTENSION(extension,major,minor,patch) __has_extension(extension)\n#else\n    #define JSON_HEDLEY_GNUC_HAS_EXTENSION(extension,major,minor,patch) JSON_HEDLEY_GNUC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_GCC_HAS_EXTENSION)\n    #undef JSON_HEDLEY_GCC_HAS_EXTENSION\n#endif\n#if defined(__has_extension)\n    #define JSON_HEDLEY_GCC_HAS_EXTENSION(extension,major,minor,patch) __has_extension(extension)\n#else\n    #define JSON_HEDLEY_GCC_HAS_EXTENSION(extension,major,minor,patch) JSON_HEDLEY_GCC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_HAS_DECLSPEC_ATTRIBUTE)\n    #undef JSON_HEDLEY_HAS_DECLSPEC_ATTRIBUTE\n#endif\n#if defined(__has_declspec_attribute)\n    #define JSON_HEDLEY_HAS_DECLSPEC_ATTRIBUTE(attribute) __has_declspec_attribute(attribute)\n#else\n    #define JSON_HEDLEY_HAS_DECLSPEC_ATTRIBUTE(attribute) (0)\n#endif\n\n#if defined(JSON_HEDLEY_GNUC_HAS_DECLSPEC_ATTRIBUTE)\n    #undef JSON_HEDLEY_GNUC_HAS_DECLSPEC_ATTRIBUTE\n#endif\n#if defined(__has_declspec_attribute)\n    #define JSON_HEDLEY_GNUC_HAS_DECLSPEC_ATTRIBUTE(attribute,major,minor,patch) __has_declspec_attribute(attribute)\n#else\n    #define JSON_HEDLEY_GNUC_HAS_DECLSPEC_ATTRIBUTE(attribute,major,minor,patch) JSON_HEDLEY_GNUC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_GCC_HAS_DECLSPEC_ATTRIBUTE)\n    #undef 
JSON_HEDLEY_GCC_HAS_DECLSPEC_ATTRIBUTE\n#endif\n#if defined(__has_declspec_attribute)\n    #define JSON_HEDLEY_GCC_HAS_DECLSPEC_ATTRIBUTE(attribute,major,minor,patch) __has_declspec_attribute(attribute)\n#else\n    #define JSON_HEDLEY_GCC_HAS_DECLSPEC_ATTRIBUTE(attribute,major,minor,patch) JSON_HEDLEY_GCC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_HAS_WARNING)\n    #undef JSON_HEDLEY_HAS_WARNING\n#endif\n#if defined(__has_warning)\n    #define JSON_HEDLEY_HAS_WARNING(warning) __has_warning(warning)\n#else\n    #define JSON_HEDLEY_HAS_WARNING(warning) (0)\n#endif\n\n#if defined(JSON_HEDLEY_GNUC_HAS_WARNING)\n    #undef JSON_HEDLEY_GNUC_HAS_WARNING\n#endif\n#if defined(__has_warning)\n    #define JSON_HEDLEY_GNUC_HAS_WARNING(warning,major,minor,patch) __has_warning(warning)\n#else\n    #define JSON_HEDLEY_GNUC_HAS_WARNING(warning,major,minor,patch) JSON_HEDLEY_GNUC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_GCC_HAS_WARNING)\n    #undef JSON_HEDLEY_GCC_HAS_WARNING\n#endif\n#if defined(__has_warning)\n    #define JSON_HEDLEY_GCC_HAS_WARNING(warning,major,minor,patch) __has_warning(warning)\n#else\n    #define JSON_HEDLEY_GCC_HAS_WARNING(warning,major,minor,patch) JSON_HEDLEY_GCC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if \\\n    (defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)) || \\\n    defined(__clang__) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(3,0,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_IAR_VERSION_CHECK(8,0,0) || \\\n    JSON_HEDLEY_PGI_VERSION_CHECK(18,4,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n    JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,7,0) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(2,0,1) || \\\n    JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,1,0) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,0,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    
JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n    JSON_HEDLEY_CRAY_VERSION_CHECK(5,0,0) || \\\n    JSON_HEDLEY_TINYC_VERSION_CHECK(0,9,17) || \\\n    JSON_HEDLEY_SUNPRO_VERSION_CHECK(8,0,0) || \\\n    (JSON_HEDLEY_IBM_VERSION_CHECK(10,1,0) && defined(__C99_PRAGMA_OPERATOR))\n    #define JSON_HEDLEY_PRAGMA(value) _Pragma(#value)\n#elif JSON_HEDLEY_MSVC_VERSION_CHECK(15,0,0)\n    #define JSON_HEDLEY_PRAGMA(value) __pragma(value)\n#else\n    #define JSON_HEDLEY_PRAGMA(value)\n#endif\n\n#if defined(JSON_HEDLEY_DIAGNOSTIC_PUSH)\n    #undef JSON_HEDLEY_DIAGNOSTIC_PUSH\n#endif\n#if defined(JSON_HEDLEY_DIAGNOSTIC_POP)\n    #undef JSON_HEDLEY_DIAGNOSTIC_POP\n#endif\n#if defined(__clang__)\n    #define JSON_HEDLEY_DIAGNOSTIC_PUSH _Pragma(\"clang diagnostic push\")\n    #define JSON_HEDLEY_DIAGNOSTIC_POP _Pragma(\"clang diagnostic pop\")\n#elif JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_PUSH _Pragma(\"warning(push)\")\n    #define JSON_HEDLEY_DIAGNOSTIC_POP _Pragma(\"warning(pop)\")\n#elif JSON_HEDLEY_GCC_VERSION_CHECK(4,6,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_PUSH _Pragma(\"GCC diagnostic push\")\n    #define JSON_HEDLEY_DIAGNOSTIC_POP _Pragma(\"GCC diagnostic pop\")\n#elif \\\n    JSON_HEDLEY_MSVC_VERSION_CHECK(15,0,0) || \\\n    JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_PUSH __pragma(warning(push))\n    #define JSON_HEDLEY_DIAGNOSTIC_POP __pragma(warning(pop))\n#elif JSON_HEDLEY_ARM_VERSION_CHECK(5,6,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_PUSH _Pragma(\"push\")\n    #define JSON_HEDLEY_DIAGNOSTIC_POP _Pragma(\"pop\")\n#elif \\\n    JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,4,0) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(8,1,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_PUSH _Pragma(\"diag_push\")\n    
#define JSON_HEDLEY_DIAGNOSTIC_POP _Pragma(\"diag_pop\")\n#elif JSON_HEDLEY_PELLES_VERSION_CHECK(2,90,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_PUSH _Pragma(\"warning(push)\")\n    #define JSON_HEDLEY_DIAGNOSTIC_POP _Pragma(\"warning(pop)\")\n#else\n    #define JSON_HEDLEY_DIAGNOSTIC_PUSH\n    #define JSON_HEDLEY_DIAGNOSTIC_POP\n#endif\n\n/* JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_ is for\n   HEDLEY INTERNAL USE ONLY.  API subject to change without notice. */\n#if defined(JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_)\n    #undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_\n#endif\n#if defined(__cplusplus)\n#  if JSON_HEDLEY_HAS_WARNING(\"-Wc++98-compat\")\n#    if JSON_HEDLEY_HAS_WARNING(\"-Wc++17-extensions\")\n#      if JSON_HEDLEY_HAS_WARNING(\"-Wc++1z-extensions\")\n#        define JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_(xpr) \\\n    JSON_HEDLEY_DIAGNOSTIC_PUSH \\\n    _Pragma(\"clang diagnostic ignored \\\"-Wc++98-compat\\\"\") \\\n    _Pragma(\"clang diagnostic ignored \\\"-Wc++17-extensions\\\"\") \\\n    _Pragma(\"clang diagnostic ignored \\\"-Wc++1z-extensions\\\"\") \\\n    xpr \\\n    JSON_HEDLEY_DIAGNOSTIC_POP\n#      else\n#        define JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_(xpr) \\\n    JSON_HEDLEY_DIAGNOSTIC_PUSH \\\n    _Pragma(\"clang diagnostic ignored \\\"-Wc++98-compat\\\"\") \\\n    _Pragma(\"clang diagnostic ignored \\\"-Wc++17-extensions\\\"\") \\\n    xpr \\\n    JSON_HEDLEY_DIAGNOSTIC_POP\n#      endif\n#    else\n#      define JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_(xpr) \\\n    JSON_HEDLEY_DIAGNOSTIC_PUSH \\\n    _Pragma(\"clang diagnostic ignored \\\"-Wc++98-compat\\\"\") \\\n    xpr \\\n    JSON_HEDLEY_DIAGNOSTIC_POP\n#    endif\n#  endif\n#endif\n#if !defined(JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_(x) x\n#endif\n\n#if defined(JSON_HEDLEY_CONST_CAST)\n    #undef JSON_HEDLEY_CONST_CAST\n#endif\n#if 
defined(__cplusplus)\n#  define JSON_HEDLEY_CONST_CAST(T, expr) (const_cast<T>(expr))\n#elif \\\n  JSON_HEDLEY_HAS_WARNING(\"-Wcast-qual\") || \\\n  JSON_HEDLEY_GCC_VERSION_CHECK(4,6,0) || \\\n  JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0)\n#  define JSON_HEDLEY_CONST_CAST(T, expr) (__extension__ ({ \\\n        JSON_HEDLEY_DIAGNOSTIC_PUSH \\\n        JSON_HEDLEY_DIAGNOSTIC_DISABLE_CAST_QUAL \\\n        ((T) (expr)); \\\n        JSON_HEDLEY_DIAGNOSTIC_POP \\\n    }))\n#else\n#  define JSON_HEDLEY_CONST_CAST(T, expr) ((T) (expr))\n#endif\n\n#if defined(JSON_HEDLEY_REINTERPRET_CAST)\n    #undef JSON_HEDLEY_REINTERPRET_CAST\n#endif\n#if defined(__cplusplus)\n    #define JSON_HEDLEY_REINTERPRET_CAST(T, expr) (reinterpret_cast<T>(expr))\n#else\n    #define JSON_HEDLEY_REINTERPRET_CAST(T, expr) ((T) (expr))\n#endif\n\n#if defined(JSON_HEDLEY_STATIC_CAST)\n    #undef JSON_HEDLEY_STATIC_CAST\n#endif\n#if defined(__cplusplus)\n    #define JSON_HEDLEY_STATIC_CAST(T, expr) (static_cast<T>(expr))\n#else\n    #define JSON_HEDLEY_STATIC_CAST(T, expr) ((T) (expr))\n#endif\n\n#if defined(JSON_HEDLEY_CPP_CAST)\n    #undef JSON_HEDLEY_CPP_CAST\n#endif\n#if defined(__cplusplus)\n#  if JSON_HEDLEY_HAS_WARNING(\"-Wold-style-cast\")\n#    define JSON_HEDLEY_CPP_CAST(T, expr) \\\n    JSON_HEDLEY_DIAGNOSTIC_PUSH \\\n    _Pragma(\"clang diagnostic ignored \\\"-Wold-style-cast\\\"\") \\\n    ((T) (expr)) \\\n    JSON_HEDLEY_DIAGNOSTIC_POP\n#  elif JSON_HEDLEY_IAR_VERSION_CHECK(8,3,0)\n#    define JSON_HEDLEY_CPP_CAST(T, expr) \\\n    JSON_HEDLEY_DIAGNOSTIC_PUSH \\\n    _Pragma(\"diag_suppress=Pe137\") \\\n    JSON_HEDLEY_DIAGNOSTIC_POP\n#  else\n#    define JSON_HEDLEY_CPP_CAST(T, expr) ((T) (expr))\n#  endif\n#else\n#  define JSON_HEDLEY_CPP_CAST(T, expr) (expr)\n#endif\n\n#if defined(JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED)\n    #undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED\n#endif\n#if JSON_HEDLEY_HAS_WARNING(\"-Wdeprecated-declarations\")\n    #define 
JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED _Pragma(\"clang diagnostic ignored \\\"-Wdeprecated-declarations\\\"\")\n#elif JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED _Pragma(\"warning(disable:1478 1786)\")\n#elif JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED __pragma(warning(disable:1478 1786))\n#elif JSON_HEDLEY_PGI_VERSION_CHECK(20,7,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED _Pragma(\"diag_suppress 1215,1216,1444,1445\")\n#elif JSON_HEDLEY_PGI_VERSION_CHECK(17,10,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED _Pragma(\"diag_suppress 1215,1444\")\n#elif JSON_HEDLEY_GCC_VERSION_CHECK(4,3,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED _Pragma(\"GCC diagnostic ignored \\\"-Wdeprecated-declarations\\\"\")\n#elif JSON_HEDLEY_MSVC_VERSION_CHECK(15,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED __pragma(warning(disable:4996))\n#elif JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED _Pragma(\"diag_suppress 1215,1444\")\n#elif \\\n    JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n    (JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,8,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n    (JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,4,0) || \\\n    (JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n    (JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,2,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,5,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED _Pragma(\"diag_suppress 1291,1718\")\n#elif 
JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,13,0) && !defined(__cplusplus)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED _Pragma(\"error_messages(off,E_DEPRECATED_ATT,E_DEPRECATED_ATT_MESS)\")\n#elif JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,13,0) && defined(__cplusplus)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED _Pragma(\"error_messages(off,symdeprecated,symdeprecated2)\")\n#elif JSON_HEDLEY_IAR_VERSION_CHECK(8,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED _Pragma(\"diag_suppress=Pe1444,Pe1215\")\n#elif JSON_HEDLEY_PELLES_VERSION_CHECK(2,90,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED _Pragma(\"warn(disable:2241)\")\n#else\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED\n#endif\n\n#if defined(JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS)\n    #undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS\n#endif\n#if JSON_HEDLEY_HAS_WARNING(\"-Wunknown-pragmas\")\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS _Pragma(\"clang diagnostic ignored \\\"-Wunknown-pragmas\\\"\")\n#elif JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS _Pragma(\"warning(disable:161)\")\n#elif JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS __pragma(warning(disable:161))\n#elif JSON_HEDLEY_PGI_VERSION_CHECK(17,10,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS _Pragma(\"diag_suppress 1675\")\n#elif JSON_HEDLEY_GCC_VERSION_CHECK(4,3,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS _Pragma(\"GCC diagnostic ignored \\\"-Wunknown-pragmas\\\"\")\n#elif JSON_HEDLEY_MSVC_VERSION_CHECK(15,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS __pragma(warning(disable:4068))\n#elif \\\n    JSON_HEDLEY_TI_VERSION_CHECK(16,9,0) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(8,0,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,3,0)\n    #define 
JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS _Pragma(\"diag_suppress 163\")\n#elif JSON_HEDLEY_TI_CL6X_VERSION_CHECK(8,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS _Pragma(\"diag_suppress 163\")\n#elif JSON_HEDLEY_IAR_VERSION_CHECK(8,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS _Pragma(\"diag_suppress=Pe161\")\n#elif JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS _Pragma(\"diag_suppress 161\")\n#else\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS\n#endif\n\n#if defined(JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES)\n    #undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES\n#endif\n#if JSON_HEDLEY_HAS_WARNING(\"-Wunknown-attributes\")\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES _Pragma(\"clang diagnostic ignored \\\"-Wunknown-attributes\\\"\")\n#elif JSON_HEDLEY_GCC_VERSION_CHECK(4,6,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES _Pragma(\"GCC diagnostic ignored \\\"-Wdeprecated-declarations\\\"\")\n#elif JSON_HEDLEY_INTEL_VERSION_CHECK(17,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES _Pragma(\"warning(disable:1292)\")\n#elif JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES __pragma(warning(disable:1292))\n#elif JSON_HEDLEY_MSVC_VERSION_CHECK(19,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES __pragma(warning(disable:5030))\n#elif JSON_HEDLEY_PGI_VERSION_CHECK(20,7,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES _Pragma(\"diag_suppress 1097,1098\")\n#elif JSON_HEDLEY_PGI_VERSION_CHECK(17,10,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES _Pragma(\"diag_suppress 1097\")\n#elif JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,14,0) && defined(__cplusplus)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES 
_Pragma(\"error_messages(off,attrskipunsup)\")\n#elif \\\n    JSON_HEDLEY_TI_VERSION_CHECK(18,1,0) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(8,3,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES _Pragma(\"diag_suppress 1173\")\n#elif JSON_HEDLEY_IAR_VERSION_CHECK(8,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES _Pragma(\"diag_suppress=Pe1097\")\n#elif JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES _Pragma(\"diag_suppress 1097\")\n#else\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES\n#endif\n\n#if defined(JSON_HEDLEY_DIAGNOSTIC_DISABLE_CAST_QUAL)\n    #undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_CAST_QUAL\n#endif\n#if JSON_HEDLEY_HAS_WARNING(\"-Wcast-qual\")\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_CAST_QUAL _Pragma(\"clang diagnostic ignored \\\"-Wcast-qual\\\"\")\n#elif JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_CAST_QUAL _Pragma(\"warning(disable:2203 2331)\")\n#elif JSON_HEDLEY_GCC_VERSION_CHECK(3,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_CAST_QUAL _Pragma(\"GCC diagnostic ignored \\\"-Wcast-qual\\\"\")\n#else\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_CAST_QUAL\n#endif\n\n#if defined(JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNUSED_FUNCTION)\n    #undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNUSED_FUNCTION\n#endif\n#if JSON_HEDLEY_HAS_WARNING(\"-Wunused-function\")\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNUSED_FUNCTION _Pragma(\"clang diagnostic ignored \\\"-Wunused-function\\\"\")\n#elif JSON_HEDLEY_GCC_VERSION_CHECK(3,4,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNUSED_FUNCTION _Pragma(\"GCC diagnostic ignored \\\"-Wunused-function\\\"\")\n#elif JSON_HEDLEY_MSVC_VERSION_CHECK(1,0,0)\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNUSED_FUNCTION __pragma(warning(disable:4505))\n#elif JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define 
JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNUSED_FUNCTION _Pragma(\"diag_suppress 3142\")\n#else\n    #define JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNUSED_FUNCTION\n#endif\n\n#if defined(JSON_HEDLEY_DEPRECATED)\n    #undef JSON_HEDLEY_DEPRECATED\n#endif\n#if defined(JSON_HEDLEY_DEPRECATED_FOR)\n    #undef JSON_HEDLEY_DEPRECATED_FOR\n#endif\n#if \\\n    JSON_HEDLEY_MSVC_VERSION_CHECK(14,0,0) || \\\n    JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n    #define JSON_HEDLEY_DEPRECATED(since) __declspec(deprecated(\"Since \" # since))\n    #define JSON_HEDLEY_DEPRECATED_FOR(since, replacement) __declspec(deprecated(\"Since \" #since \"; use \" #replacement))\n#elif \\\n    (JSON_HEDLEY_HAS_EXTENSION(attribute_deprecated_with_message) && !defined(JSON_HEDLEY_IAR_VERSION)) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(4,5,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(5,6,0) || \\\n    JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,13,0) || \\\n    JSON_HEDLEY_PGI_VERSION_CHECK(17,10,0) || \\\n    JSON_HEDLEY_TI_VERSION_CHECK(18,1,0) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(18,1,0) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(8,3,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,3,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_DEPRECATED(since) __attribute__((__deprecated__(\"Since \" #since)))\n    #define JSON_HEDLEY_DEPRECATED_FOR(since, replacement) __attribute__((__deprecated__(\"Since \" #since \"; use \" #replacement)))\n#elif defined(__cplusplus) && (__cplusplus >= 201402L)\n    #define JSON_HEDLEY_DEPRECATED(since) JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_([[deprecated(\"Since \" #since)]])\n    #define JSON_HEDLEY_DEPRECATED_FOR(since, replacement) JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_([[deprecated(\"Since \" #since \"; use \" #replacement)]])\n#elif \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(deprecated) || \\\n    
JSON_HEDLEY_GCC_VERSION_CHECK(3,1,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n    JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n    (JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,8,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n    (JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,4,0) || \\\n    (JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n    (JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,2,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,5,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10) || \\\n    JSON_HEDLEY_IAR_VERSION_CHECK(8,10,0)\n    #define JSON_HEDLEY_DEPRECATED(since) __attribute__((__deprecated__))\n    #define JSON_HEDLEY_DEPRECATED_FOR(since, replacement) __attribute__((__deprecated__))\n#elif \\\n    JSON_HEDLEY_MSVC_VERSION_CHECK(13,10,0) || \\\n    JSON_HEDLEY_PELLES_VERSION_CHECK(6,50,0) || \\\n    JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n    #define JSON_HEDLEY_DEPRECATED(since) __declspec(deprecated)\n    #define JSON_HEDLEY_DEPRECATED_FOR(since, replacement) __declspec(deprecated)\n#elif JSON_HEDLEY_IAR_VERSION_CHECK(8,0,0)\n    #define JSON_HEDLEY_DEPRECATED(since) _Pragma(\"deprecated\")\n    #define JSON_HEDLEY_DEPRECATED_FOR(since, replacement) _Pragma(\"deprecated\")\n#else\n    #define JSON_HEDLEY_DEPRECATED(since)\n    #define JSON_HEDLEY_DEPRECATED_FOR(since, replacement)\n#endif\n\n#if defined(JSON_HEDLEY_UNAVAILABLE)\n    #undef JSON_HEDLEY_UNAVAILABLE\n#endif\n#if \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(warning) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(4,3,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    
JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_UNAVAILABLE(available_since) __attribute__((__warning__(\"Not available until \" #available_since)))\n#else\n    #define JSON_HEDLEY_UNAVAILABLE(available_since)\n#endif\n\n#if defined(JSON_HEDLEY_WARN_UNUSED_RESULT)\n    #undef JSON_HEDLEY_WARN_UNUSED_RESULT\n#endif\n#if defined(JSON_HEDLEY_WARN_UNUSED_RESULT_MSG)\n    #undef JSON_HEDLEY_WARN_UNUSED_RESULT_MSG\n#endif\n#if \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(warn_unused_result) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(3,4,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n    (JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,8,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n    (JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,4,0) || \\\n    (JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n    (JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,2,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,5,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n    (JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,15,0) && defined(__cplusplus)) || \\\n    JSON_HEDLEY_PGI_VERSION_CHECK(17,10,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_WARN_UNUSED_RESULT __attribute__((__warn_unused_result__))\n    #define JSON_HEDLEY_WARN_UNUSED_RESULT_MSG(msg) __attribute__((__warn_unused_result__))\n#elif (JSON_HEDLEY_HAS_CPP_ATTRIBUTE(nodiscard) >= 201907L)\n    #define JSON_HEDLEY_WARN_UNUSED_RESULT JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_([[nodiscard]])\n    #define JSON_HEDLEY_WARN_UNUSED_RESULT_MSG(msg) 
JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_([[nodiscard(msg)]])\n#elif JSON_HEDLEY_HAS_CPP_ATTRIBUTE(nodiscard)\n    #define JSON_HEDLEY_WARN_UNUSED_RESULT JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_([[nodiscard]])\n    #define JSON_HEDLEY_WARN_UNUSED_RESULT_MSG(msg) JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_([[nodiscard]])\n#elif defined(_Check_return_) /* SAL */\n    #define JSON_HEDLEY_WARN_UNUSED_RESULT _Check_return_\n    #define JSON_HEDLEY_WARN_UNUSED_RESULT_MSG(msg) _Check_return_\n#else\n    #define JSON_HEDLEY_WARN_UNUSED_RESULT\n    #define JSON_HEDLEY_WARN_UNUSED_RESULT_MSG(msg)\n#endif\n\n#if defined(JSON_HEDLEY_SENTINEL)\n    #undef JSON_HEDLEY_SENTINEL\n#endif\n#if \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(sentinel) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(4,0,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(5,4,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_SENTINEL(position) __attribute__((__sentinel__(position)))\n#else\n    #define JSON_HEDLEY_SENTINEL(position)\n#endif\n\n#if defined(JSON_HEDLEY_NO_RETURN)\n    #undef JSON_HEDLEY_NO_RETURN\n#endif\n#if JSON_HEDLEY_IAR_VERSION_CHECK(8,0,0)\n    #define JSON_HEDLEY_NO_RETURN __noreturn\n#elif \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_NO_RETURN __attribute__((__noreturn__))\n#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L\n    #define JSON_HEDLEY_NO_RETURN _Noreturn\n#elif defined(__cplusplus) && (__cplusplus >= 201103L)\n    #define JSON_HEDLEY_NO_RETURN JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_([[noreturn]])\n#elif \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(noreturn) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(3,2,0) || \\\n    JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,11,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n    JSON_HEDLEY_IBM_VERSION_CHECK(10,1,0) || \\\n    
JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n    (JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,8,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n    (JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,4,0) || \\\n    (JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n    (JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,2,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,5,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n    JSON_HEDLEY_IAR_VERSION_CHECK(8,10,0)\n    #define JSON_HEDLEY_NO_RETURN __attribute__((__noreturn__))\n#elif JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,10,0)\n    #define JSON_HEDLEY_NO_RETURN _Pragma(\"does_not_return\")\n#elif \\\n    JSON_HEDLEY_MSVC_VERSION_CHECK(13,10,0) || \\\n    JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n    #define JSON_HEDLEY_NO_RETURN __declspec(noreturn)\n#elif JSON_HEDLEY_TI_CL6X_VERSION_CHECK(6,0,0) && defined(__cplusplus)\n    #define JSON_HEDLEY_NO_RETURN _Pragma(\"FUNC_NEVER_RETURNS;\")\n#elif JSON_HEDLEY_COMPCERT_VERSION_CHECK(3,2,0)\n    #define JSON_HEDLEY_NO_RETURN __attribute((noreturn))\n#elif JSON_HEDLEY_PELLES_VERSION_CHECK(9,0,0)\n    #define JSON_HEDLEY_NO_RETURN __declspec(noreturn)\n#else\n    #define JSON_HEDLEY_NO_RETURN\n#endif\n\n#if defined(JSON_HEDLEY_NO_ESCAPE)\n    #undef JSON_HEDLEY_NO_ESCAPE\n#endif\n#if JSON_HEDLEY_HAS_ATTRIBUTE(noescape)\n    #define JSON_HEDLEY_NO_ESCAPE __attribute__((__noescape__))\n#else\n    #define JSON_HEDLEY_NO_ESCAPE\n#endif\n\n#if defined(JSON_HEDLEY_UNREACHABLE)\n    #undef JSON_HEDLEY_UNREACHABLE\n#endif\n#if defined(JSON_HEDLEY_UNREACHABLE_RETURN)\n    #undef JSON_HEDLEY_UNREACHABLE_RETURN\n#endif\n#if defined(JSON_HEDLEY_ASSUME)\n    #undef 
JSON_HEDLEY_ASSUME\n#endif\n#if \\\n    JSON_HEDLEY_MSVC_VERSION_CHECK(13,10,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n    #define JSON_HEDLEY_ASSUME(expr) __assume(expr)\n#elif JSON_HEDLEY_HAS_BUILTIN(__builtin_assume)\n    #define JSON_HEDLEY_ASSUME(expr) __builtin_assume(expr)\n#elif \\\n    JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,2,0) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(4,0,0)\n    #if defined(__cplusplus)\n        #define JSON_HEDLEY_ASSUME(expr) std::_nassert(expr)\n    #else\n        #define JSON_HEDLEY_ASSUME(expr) _nassert(expr)\n    #endif\n#endif\n#if \\\n    (JSON_HEDLEY_HAS_BUILTIN(__builtin_unreachable) && (!defined(JSON_HEDLEY_ARM_VERSION))) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(4,5,0) || \\\n    JSON_HEDLEY_PGI_VERSION_CHECK(18,10,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_IBM_VERSION_CHECK(13,1,5) || \\\n    JSON_HEDLEY_CRAY_VERSION_CHECK(10,0,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_UNREACHABLE() __builtin_unreachable()\n#elif defined(JSON_HEDLEY_ASSUME)\n    #define JSON_HEDLEY_UNREACHABLE() JSON_HEDLEY_ASSUME(0)\n#endif\n#if !defined(JSON_HEDLEY_ASSUME)\n    #if defined(JSON_HEDLEY_UNREACHABLE)\n        #define JSON_HEDLEY_ASSUME(expr) JSON_HEDLEY_STATIC_CAST(void, ((expr) ? 
1 : (JSON_HEDLEY_UNREACHABLE(), 1)))\n    #else\n        #define JSON_HEDLEY_ASSUME(expr) JSON_HEDLEY_STATIC_CAST(void, expr)\n    #endif\n#endif\n#if defined(JSON_HEDLEY_UNREACHABLE)\n    #if  \\\n        JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,2,0) || \\\n        JSON_HEDLEY_TI_CL6X_VERSION_CHECK(4,0,0)\n        #define JSON_HEDLEY_UNREACHABLE_RETURN(value) return (JSON_HEDLEY_STATIC_CAST(void, JSON_HEDLEY_ASSUME(0)), (value))\n    #else\n        #define JSON_HEDLEY_UNREACHABLE_RETURN(value) JSON_HEDLEY_UNREACHABLE()\n    #endif\n#else\n    #define JSON_HEDLEY_UNREACHABLE_RETURN(value) return (value)\n#endif\n#if !defined(JSON_HEDLEY_UNREACHABLE)\n    #define JSON_HEDLEY_UNREACHABLE() JSON_HEDLEY_ASSUME(0)\n#endif\n\nJSON_HEDLEY_DIAGNOSTIC_PUSH\n#if JSON_HEDLEY_HAS_WARNING(\"-Wpedantic\")\n    #pragma clang diagnostic ignored \"-Wpedantic\"\n#endif\n#if JSON_HEDLEY_HAS_WARNING(\"-Wc++98-compat-pedantic\") && defined(__cplusplus)\n    #pragma clang diagnostic ignored \"-Wc++98-compat-pedantic\"\n#endif\n#if JSON_HEDLEY_GCC_HAS_WARNING(\"-Wvariadic-macros\",4,0,0)\n    #if defined(__clang__)\n        #pragma clang diagnostic ignored \"-Wvariadic-macros\"\n    #elif defined(JSON_HEDLEY_GCC_VERSION)\n        #pragma GCC diagnostic ignored \"-Wvariadic-macros\"\n    #endif\n#endif\n#if defined(JSON_HEDLEY_NON_NULL)\n    #undef JSON_HEDLEY_NON_NULL\n#endif\n#if \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(nonnull) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(3,3,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0)\n    #define JSON_HEDLEY_NON_NULL(...) 
__attribute__((__nonnull__(__VA_ARGS__)))\n#else\n    #define JSON_HEDLEY_NON_NULL(...)\n#endif\nJSON_HEDLEY_DIAGNOSTIC_POP\n\n#if defined(JSON_HEDLEY_PRINTF_FORMAT)\n    #undef JSON_HEDLEY_PRINTF_FORMAT\n#endif\n#if defined(__MINGW32__) && JSON_HEDLEY_GCC_HAS_ATTRIBUTE(format,4,4,0) && !defined(__USE_MINGW_ANSI_STDIO)\n    #define JSON_HEDLEY_PRINTF_FORMAT(string_idx,first_to_check) __attribute__((__format__(ms_printf, string_idx, first_to_check)))\n#elif defined(__MINGW32__) && JSON_HEDLEY_GCC_HAS_ATTRIBUTE(format,4,4,0) && defined(__USE_MINGW_ANSI_STDIO)\n    #define JSON_HEDLEY_PRINTF_FORMAT(string_idx,first_to_check) __attribute__((__format__(gnu_printf, string_idx, first_to_check)))\n#elif \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(format) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(3,1,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(5,6,0) || \\\n    JSON_HEDLEY_IBM_VERSION_CHECK(10,1,0) || \\\n    JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n    (JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,8,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n    (JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,4,0) || \\\n    (JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n    (JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,2,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,5,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_PRINTF_FORMAT(string_idx,first_to_check) __attribute__((__format__(__printf__, string_idx, first_to_check)))\n#elif JSON_HEDLEY_PELLES_VERSION_CHECK(6,0,0)\n    #define JSON_HEDLEY_PRINTF_FORMAT(string_idx,first_to_check) 
__declspec(vaformat(printf,string_idx,first_to_check))\n#else\n    #define JSON_HEDLEY_PRINTF_FORMAT(string_idx,first_to_check)\n#endif\n\n#if defined(JSON_HEDLEY_CONSTEXPR)\n    #undef JSON_HEDLEY_CONSTEXPR\n#endif\n#if defined(__cplusplus)\n    #if __cplusplus >= 201103L\n        #define JSON_HEDLEY_CONSTEXPR JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_(constexpr)\n    #endif\n#endif\n#if !defined(JSON_HEDLEY_CONSTEXPR)\n    #define JSON_HEDLEY_CONSTEXPR\n#endif\n\n#if defined(JSON_HEDLEY_PREDICT)\n    #undef JSON_HEDLEY_PREDICT\n#endif\n#if defined(JSON_HEDLEY_LIKELY)\n    #undef JSON_HEDLEY_LIKELY\n#endif\n#if defined(JSON_HEDLEY_UNLIKELY)\n    #undef JSON_HEDLEY_UNLIKELY\n#endif\n#if defined(JSON_HEDLEY_UNPREDICTABLE)\n    #undef JSON_HEDLEY_UNPREDICTABLE\n#endif\n#if JSON_HEDLEY_HAS_BUILTIN(__builtin_unpredictable)\n    #define JSON_HEDLEY_UNPREDICTABLE(expr) __builtin_unpredictable((expr))\n#endif\n#if \\\n  (JSON_HEDLEY_HAS_BUILTIN(__builtin_expect_with_probability) && !defined(JSON_HEDLEY_PGI_VERSION)) || \\\n  JSON_HEDLEY_GCC_VERSION_CHECK(9,0,0) || \\\n  JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n#  define JSON_HEDLEY_PREDICT(expr, value, probability) __builtin_expect_with_probability(  (expr), (value), (probability))\n#  define JSON_HEDLEY_PREDICT_TRUE(expr, probability)   __builtin_expect_with_probability(!!(expr),    1   , (probability))\n#  define JSON_HEDLEY_PREDICT_FALSE(expr, probability)  __builtin_expect_with_probability(!!(expr),    0   , (probability))\n#  define JSON_HEDLEY_LIKELY(expr)                      __builtin_expect                 (!!(expr),    1                  )\n#  define JSON_HEDLEY_UNLIKELY(expr)                    __builtin_expect                 (!!(expr),    0                  )\n#elif \\\n  (JSON_HEDLEY_HAS_BUILTIN(__builtin_expect) && !defined(JSON_HEDLEY_INTEL_CL_VERSION)) || \\\n  JSON_HEDLEY_GCC_VERSION_CHECK(3,0,0) || \\\n  JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n  
(JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,15,0) && defined(__cplusplus)) || \\\n  JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n  JSON_HEDLEY_IBM_VERSION_CHECK(10,1,0) || \\\n  JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n  JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,7,0) || \\\n  JSON_HEDLEY_TI_CL430_VERSION_CHECK(3,1,0) || \\\n  JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,1,0) || \\\n  JSON_HEDLEY_TI_CL6X_VERSION_CHECK(6,1,0) || \\\n  JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n  JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n  JSON_HEDLEY_TINYC_VERSION_CHECK(0,9,27) || \\\n  JSON_HEDLEY_CRAY_VERSION_CHECK(8,1,0) || \\\n  JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n#  define JSON_HEDLEY_PREDICT(expr, expected, probability) \\\n    (((probability) >= 0.9) ? __builtin_expect((expr), (expected)) : (JSON_HEDLEY_STATIC_CAST(void, expected), (expr)))\n#  define JSON_HEDLEY_PREDICT_TRUE(expr, probability) \\\n    (__extension__ ({ \\\n        double hedley_probability_ = (probability); \\\n        ((hedley_probability_ >= 0.9) ? __builtin_expect(!!(expr), 1) : ((hedley_probability_ <= 0.1) ? __builtin_expect(!!(expr), 0) : !!(expr))); \\\n    }))\n#  define JSON_HEDLEY_PREDICT_FALSE(expr, probability) \\\n    (__extension__ ({ \\\n        double hedley_probability_ = (probability); \\\n        ((hedley_probability_ >= 0.9) ? __builtin_expect(!!(expr), 0) : ((hedley_probability_ <= 0.1) ? 
__builtin_expect(!!(expr), 1) : !!(expr))); \\\n    }))\n#  define JSON_HEDLEY_LIKELY(expr)   __builtin_expect(!!(expr), 1)\n#  define JSON_HEDLEY_UNLIKELY(expr) __builtin_expect(!!(expr), 0)\n#else\n#  define JSON_HEDLEY_PREDICT(expr, expected, probability) (JSON_HEDLEY_STATIC_CAST(void, expected), (expr))\n#  define JSON_HEDLEY_PREDICT_TRUE(expr, probability) (!!(expr))\n#  define JSON_HEDLEY_PREDICT_FALSE(expr, probability) (!!(expr))\n#  define JSON_HEDLEY_LIKELY(expr) (!!(expr))\n#  define JSON_HEDLEY_UNLIKELY(expr) (!!(expr))\n#endif\n#if !defined(JSON_HEDLEY_UNPREDICTABLE)\n    #define JSON_HEDLEY_UNPREDICTABLE(expr) JSON_HEDLEY_PREDICT(expr, 1, 0.5)\n#endif\n\n#if defined(JSON_HEDLEY_MALLOC)\n    #undef JSON_HEDLEY_MALLOC\n#endif\n#if \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(malloc) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(3,1,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,11,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n    JSON_HEDLEY_IBM_VERSION_CHECK(12,1,0) || \\\n    JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n    (JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,8,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n    (JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,4,0) || \\\n    (JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n    (JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,2,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,5,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_MALLOC __attribute__((__malloc__))\n#elif JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,10,0)\n    #define JSON_HEDLEY_MALLOC 
_Pragma(\"returns_new_memory\")\n#elif \\\n    JSON_HEDLEY_MSVC_VERSION_CHECK(14,0,0) || \\\n    JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n    #define JSON_HEDLEY_MALLOC __declspec(restrict)\n#else\n    #define JSON_HEDLEY_MALLOC\n#endif\n\n#if defined(JSON_HEDLEY_PURE)\n    #undef JSON_HEDLEY_PURE\n#endif\n#if \\\n  JSON_HEDLEY_HAS_ATTRIBUTE(pure) || \\\n  JSON_HEDLEY_GCC_VERSION_CHECK(2,96,0) || \\\n  JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n  JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,11,0) || \\\n  JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n  JSON_HEDLEY_IBM_VERSION_CHECK(10,1,0) || \\\n  JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n  (JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,8,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n  JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n  (JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n  JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,4,0) || \\\n  (JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n  JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n  (JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,2,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n  JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,5,0) || \\\n  JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n  JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n  JSON_HEDLEY_PGI_VERSION_CHECK(17,10,0) || \\\n  JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n#  define JSON_HEDLEY_PURE __attribute__((__pure__))\n#elif JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,10,0)\n#  define JSON_HEDLEY_PURE _Pragma(\"does_not_write_global_data\")\n#elif defined(__cplusplus) && \\\n    ( \\\n      JSON_HEDLEY_TI_CL430_VERSION_CHECK(2,0,1) || \\\n      JSON_HEDLEY_TI_CL6X_VERSION_CHECK(4,0,0) || \\\n      JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) \\\n    )\n#  define JSON_HEDLEY_PURE _Pragma(\"FUNC_IS_PURE;\")\n#else\n#  define JSON_HEDLEY_PURE\n#endif\n\n#if defined(JSON_HEDLEY_CONST)\n    #undef JSON_HEDLEY_CONST\n#endif\n#if \\\n    
JSON_HEDLEY_HAS_ATTRIBUTE(const) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(2,5,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,11,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n    JSON_HEDLEY_IBM_VERSION_CHECK(10,1,0) || \\\n    JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n    (JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,8,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n    (JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,4,0) || \\\n    (JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n    (JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,2,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,5,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n    JSON_HEDLEY_PGI_VERSION_CHECK(17,10,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_CONST __attribute__((__const__))\n#elif \\\n    JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,10,0)\n    #define JSON_HEDLEY_CONST _Pragma(\"no_side_effect\")\n#else\n    #define JSON_HEDLEY_CONST JSON_HEDLEY_PURE\n#endif\n\n#if defined(JSON_HEDLEY_RESTRICT)\n    #undef JSON_HEDLEY_RESTRICT\n#endif\n#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) && !defined(__cplusplus)\n    #define JSON_HEDLEY_RESTRICT restrict\n#elif \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(3,1,0) || \\\n    JSON_HEDLEY_MSVC_VERSION_CHECK(14,0,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n    JSON_HEDLEY_IBM_VERSION_CHECK(10,1,0) || \\\n    JSON_HEDLEY_PGI_VERSION_CHECK(17,10,0) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n  
  JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,2,4) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(8,1,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    (JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,14,0) && defined(__cplusplus)) || \\\n    JSON_HEDLEY_IAR_VERSION_CHECK(8,0,0) || \\\n    defined(__clang__) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_RESTRICT __restrict\n#elif JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,3,0) && !defined(__cplusplus)\n    #define JSON_HEDLEY_RESTRICT _Restrict\n#else\n    #define JSON_HEDLEY_RESTRICT\n#endif\n\n#if defined(JSON_HEDLEY_INLINE)\n    #undef JSON_HEDLEY_INLINE\n#endif\n#if \\\n    (defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)) || \\\n    (defined(__cplusplus) && (__cplusplus >= 199711L))\n    #define JSON_HEDLEY_INLINE inline\n#elif \\\n    defined(JSON_HEDLEY_GCC_VERSION) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(6,2,0)\n    #define JSON_HEDLEY_INLINE __inline__\n#elif \\\n    JSON_HEDLEY_MSVC_VERSION_CHECK(12,0,0) || \\\n    JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,1,0) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(3,1,0) || \\\n    JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,2,0) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(8,0,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_INLINE __inline\n#else\n    #define JSON_HEDLEY_INLINE\n#endif\n\n#if defined(JSON_HEDLEY_ALWAYS_INLINE)\n    #undef JSON_HEDLEY_ALWAYS_INLINE\n#endif\n#if \\\n  JSON_HEDLEY_HAS_ATTRIBUTE(always_inline) || \\\n  JSON_HEDLEY_GCC_VERSION_CHECK(4,0,0) || \\\n  JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n  JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,11,0) || \\\n  JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n  JSON_HEDLEY_IBM_VERSION_CHECK(10,1,0) || \\\n  
JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n  (JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,8,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n  JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n  (JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n  JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,4,0) || \\\n  (JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n  JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n  (JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,2,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n  JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,5,0) || \\\n  JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n  JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n  JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10) || \\\n  JSON_HEDLEY_IAR_VERSION_CHECK(8,10,0)\n#  define JSON_HEDLEY_ALWAYS_INLINE __attribute__((__always_inline__)) JSON_HEDLEY_INLINE\n#elif \\\n  JSON_HEDLEY_MSVC_VERSION_CHECK(12,0,0) || \\\n  JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n#  define JSON_HEDLEY_ALWAYS_INLINE __forceinline\n#elif defined(__cplusplus) && \\\n    ( \\\n      JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n      JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n      JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,4,0) || \\\n      JSON_HEDLEY_TI_CL6X_VERSION_CHECK(6,1,0) || \\\n      JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n      JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) \\\n    )\n#  define JSON_HEDLEY_ALWAYS_INLINE _Pragma(\"FUNC_ALWAYS_INLINE;\")\n#elif JSON_HEDLEY_IAR_VERSION_CHECK(8,0,0)\n#  define JSON_HEDLEY_ALWAYS_INLINE _Pragma(\"inline=forced\")\n#else\n#  define JSON_HEDLEY_ALWAYS_INLINE JSON_HEDLEY_INLINE\n#endif\n\n#if defined(JSON_HEDLEY_NEVER_INLINE)\n    #undef JSON_HEDLEY_NEVER_INLINE\n#endif\n#if \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(noinline) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(4,0,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,11,0) || \\\n    
JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n    JSON_HEDLEY_IBM_VERSION_CHECK(10,1,0) || \\\n    JSON_HEDLEY_TI_VERSION_CHECK(15,12,0) || \\\n    (JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(4,8,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_ARMCL_VERSION_CHECK(5,2,0) || \\\n    (JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL2000_VERSION_CHECK(6,4,0) || \\\n    (JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,0,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL430_VERSION_CHECK(4,3,0) || \\\n    (JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,2,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,5,0) || \\\n    JSON_HEDLEY_TI_CL7X_VERSION_CHECK(1,2,0) || \\\n    JSON_HEDLEY_TI_CLPRU_VERSION_CHECK(2,1,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10) || \\\n    JSON_HEDLEY_IAR_VERSION_CHECK(8,10,0)\n    #define JSON_HEDLEY_NEVER_INLINE __attribute__((__noinline__))\n#elif \\\n    JSON_HEDLEY_MSVC_VERSION_CHECK(13,10,0) || \\\n    JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n    #define JSON_HEDLEY_NEVER_INLINE __declspec(noinline)\n#elif JSON_HEDLEY_PGI_VERSION_CHECK(10,2,0)\n    #define JSON_HEDLEY_NEVER_INLINE _Pragma(\"noinline\")\n#elif JSON_HEDLEY_TI_CL6X_VERSION_CHECK(6,0,0) && defined(__cplusplus)\n    #define JSON_HEDLEY_NEVER_INLINE _Pragma(\"FUNC_CANNOT_INLINE;\")\n#elif JSON_HEDLEY_IAR_VERSION_CHECK(8,0,0)\n    #define JSON_HEDLEY_NEVER_INLINE _Pragma(\"inline=never\")\n#elif JSON_HEDLEY_COMPCERT_VERSION_CHECK(3,2,0)\n    #define JSON_HEDLEY_NEVER_INLINE __attribute((noinline))\n#elif JSON_HEDLEY_PELLES_VERSION_CHECK(9,0,0)\n    #define JSON_HEDLEY_NEVER_INLINE __declspec(noinline)\n#else\n    #define JSON_HEDLEY_NEVER_INLINE\n#endif\n\n#if defined(JSON_HEDLEY_PRIVATE)\n    #undef JSON_HEDLEY_PRIVATE\n#endif\n#if defined(JSON_HEDLEY_PUBLIC)\n    #undef JSON_HEDLEY_PUBLIC\n#endif\n#if defined(JSON_HEDLEY_IMPORT)\n    #undef 
JSON_HEDLEY_IMPORT\n#endif\n#if defined(_WIN32) || defined(__CYGWIN__)\n#  define JSON_HEDLEY_PRIVATE\n#  define JSON_HEDLEY_PUBLIC   __declspec(dllexport)\n#  define JSON_HEDLEY_IMPORT   __declspec(dllimport)\n#else\n#  if \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(visibility) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(3,3,0) || \\\n    JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,11,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n    JSON_HEDLEY_IBM_VERSION_CHECK(13,1,0) || \\\n    ( \\\n      defined(__TI_EABI__) && \\\n      ( \\\n        (JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,2,0) && defined(__TI_GNU_ATTRIBUTE_SUPPORT__)) || \\\n        JSON_HEDLEY_TI_CL6X_VERSION_CHECK(7,5,0) \\\n      ) \\\n    ) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n#    define JSON_HEDLEY_PRIVATE __attribute__((__visibility__(\"hidden\")))\n#    define JSON_HEDLEY_PUBLIC  __attribute__((__visibility__(\"default\")))\n#  else\n#    define JSON_HEDLEY_PRIVATE\n#    define JSON_HEDLEY_PUBLIC\n#  endif\n#  define JSON_HEDLEY_IMPORT    extern\n#endif\n\n#if defined(JSON_HEDLEY_NO_THROW)\n    #undef JSON_HEDLEY_NO_THROW\n#endif\n#if \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(nothrow) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(3,3,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_NO_THROW __attribute__((__nothrow__))\n#elif \\\n    JSON_HEDLEY_MSVC_VERSION_CHECK(13,1,0) || \\\n    JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0)\n    #define JSON_HEDLEY_NO_THROW __declspec(nothrow)\n#else\n    #define JSON_HEDLEY_NO_THROW\n#endif\n\n#if defined(JSON_HEDLEY_FALL_THROUGH)\n    #undef JSON_HEDLEY_FALL_THROUGH\n#endif\n#if \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(fallthrough) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(7,0,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_FALL_THROUGH 
__attribute__((__fallthrough__))\n#elif JSON_HEDLEY_HAS_CPP_ATTRIBUTE_NS(clang,fallthrough)\n    #define JSON_HEDLEY_FALL_THROUGH JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_([[clang::fallthrough]])\n#elif JSON_HEDLEY_HAS_CPP_ATTRIBUTE(fallthrough)\n    #define JSON_HEDLEY_FALL_THROUGH JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_([[fallthrough]])\n#elif defined(__fallthrough) /* SAL */\n    #define JSON_HEDLEY_FALL_THROUGH __fallthrough\n#else\n    #define JSON_HEDLEY_FALL_THROUGH\n#endif\n\n#if defined(JSON_HEDLEY_RETURNS_NON_NULL)\n    #undef JSON_HEDLEY_RETURNS_NON_NULL\n#endif\n#if \\\n    JSON_HEDLEY_HAS_ATTRIBUTE(returns_nonnull) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(4,9,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_RETURNS_NON_NULL __attribute__((__returns_nonnull__))\n#elif defined(_Ret_notnull_) /* SAL */\n    #define JSON_HEDLEY_RETURNS_NON_NULL _Ret_notnull_\n#else\n    #define JSON_HEDLEY_RETURNS_NON_NULL\n#endif\n\n#if defined(JSON_HEDLEY_ARRAY_PARAM)\n    #undef JSON_HEDLEY_ARRAY_PARAM\n#endif\n#if \\\n    defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) && \\\n    !defined(__STDC_NO_VLA__) && \\\n    !defined(__cplusplus) && \\\n    !defined(JSON_HEDLEY_PGI_VERSION) && \\\n    !defined(JSON_HEDLEY_TINYC_VERSION)\n    #define JSON_HEDLEY_ARRAY_PARAM(name) (name)\n#else\n    #define JSON_HEDLEY_ARRAY_PARAM(name)\n#endif\n\n#if defined(JSON_HEDLEY_IS_CONSTANT)\n    #undef JSON_HEDLEY_IS_CONSTANT\n#endif\n#if defined(JSON_HEDLEY_REQUIRE_CONSTEXPR)\n    #undef JSON_HEDLEY_REQUIRE_CONSTEXPR\n#endif\n/* JSON_HEDLEY_IS_CONSTEXPR_ is for\n   HEDLEY INTERNAL USE ONLY.  API subject to change without notice. 
*/\n#if defined(JSON_HEDLEY_IS_CONSTEXPR_)\n    #undef JSON_HEDLEY_IS_CONSTEXPR_\n#endif\n#if \\\n    JSON_HEDLEY_HAS_BUILTIN(__builtin_constant_p) || \\\n    JSON_HEDLEY_GCC_VERSION_CHECK(3,4,0) || \\\n    JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n    JSON_HEDLEY_TINYC_VERSION_CHECK(0,9,19) || \\\n    JSON_HEDLEY_ARM_VERSION_CHECK(4,1,0) || \\\n    JSON_HEDLEY_IBM_VERSION_CHECK(13,1,0) || \\\n    JSON_HEDLEY_TI_CL6X_VERSION_CHECK(6,1,0) || \\\n    (JSON_HEDLEY_SUNPRO_VERSION_CHECK(5,10,0) && !defined(__cplusplus)) || \\\n    JSON_HEDLEY_CRAY_VERSION_CHECK(8,1,0) || \\\n    JSON_HEDLEY_MCST_LCC_VERSION_CHECK(1,25,10)\n    #define JSON_HEDLEY_IS_CONSTANT(expr) __builtin_constant_p(expr)\n#endif\n#if !defined(__cplusplus)\n#  if \\\n       JSON_HEDLEY_HAS_BUILTIN(__builtin_types_compatible_p) || \\\n       JSON_HEDLEY_GCC_VERSION_CHECK(3,4,0) || \\\n       JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n       JSON_HEDLEY_IBM_VERSION_CHECK(13,1,0) || \\\n       JSON_HEDLEY_CRAY_VERSION_CHECK(8,1,0) || \\\n       JSON_HEDLEY_ARM_VERSION_CHECK(5,4,0) || \\\n       JSON_HEDLEY_TINYC_VERSION_CHECK(0,9,24)\n#if defined(__INTPTR_TYPE__)\n    #define JSON_HEDLEY_IS_CONSTEXPR_(expr) __builtin_types_compatible_p(__typeof__((1 ? (void*) ((__INTPTR_TYPE__) ((expr) * 0)) : (int*) 0)), int*)\n#else\n    #include <stdint.h>\n    #define JSON_HEDLEY_IS_CONSTEXPR_(expr) __builtin_types_compatible_p(__typeof__((1 ? 
(void*) ((intptr_t) ((expr) * 0)) : (int*) 0)), int*)\n#endif\n#  elif \\\n       ( \\\n          defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) && \\\n          !defined(JSON_HEDLEY_SUNPRO_VERSION) && \\\n          !defined(JSON_HEDLEY_PGI_VERSION) && \\\n          !defined(JSON_HEDLEY_IAR_VERSION)) || \\\n       (JSON_HEDLEY_HAS_EXTENSION(c_generic_selections) && !defined(JSON_HEDLEY_IAR_VERSION)) || \\\n       JSON_HEDLEY_GCC_VERSION_CHECK(4,9,0) || \\\n       JSON_HEDLEY_INTEL_VERSION_CHECK(17,0,0) || \\\n       JSON_HEDLEY_IBM_VERSION_CHECK(12,1,0) || \\\n       JSON_HEDLEY_ARM_VERSION_CHECK(5,3,0)\n#if defined(__INTPTR_TYPE__)\n    #define JSON_HEDLEY_IS_CONSTEXPR_(expr) _Generic((1 ? (void*) ((__INTPTR_TYPE__) ((expr) * 0)) : (int*) 0), int*: 1, void*: 0)\n#else\n    #include <stdint.h>\n    #define JSON_HEDLEY_IS_CONSTEXPR_(expr) _Generic((1 ? (void*) ((intptr_t) ((expr) * 0)) : (int*) 0), int*: 1, void*: 0)\n#endif\n#  elif \\\n       defined(JSON_HEDLEY_GCC_VERSION) || \\\n       defined(JSON_HEDLEY_INTEL_VERSION) || \\\n       defined(JSON_HEDLEY_TINYC_VERSION) || \\\n       defined(JSON_HEDLEY_TI_ARMCL_VERSION) || \\\n       JSON_HEDLEY_TI_CL430_VERSION_CHECK(18,12,0) || \\\n       defined(JSON_HEDLEY_TI_CL2000_VERSION) || \\\n       defined(JSON_HEDLEY_TI_CL6X_VERSION) || \\\n       defined(JSON_HEDLEY_TI_CL7X_VERSION) || \\\n       defined(JSON_HEDLEY_TI_CLPRU_VERSION) || \\\n       defined(__clang__)\n#    define JSON_HEDLEY_IS_CONSTEXPR_(expr) ( \\\n        sizeof(void) != \\\n        sizeof(*( \\\n                  1 ? 
\\\n                  ((void*) ((expr) * 0L) ) : \\\n((struct { char v[sizeof(void) * 2]; } *) 1) \\\n                ) \\\n              ) \\\n                                            )\n#  endif\n#endif\n#if defined(JSON_HEDLEY_IS_CONSTEXPR_)\n    #if !defined(JSON_HEDLEY_IS_CONSTANT)\n        #define JSON_HEDLEY_IS_CONSTANT(expr) JSON_HEDLEY_IS_CONSTEXPR_(expr)\n    #endif\n    #define JSON_HEDLEY_REQUIRE_CONSTEXPR(expr) (JSON_HEDLEY_IS_CONSTEXPR_(expr) ? (expr) : (-1))\n#else\n    #if !defined(JSON_HEDLEY_IS_CONSTANT)\n        #define JSON_HEDLEY_IS_CONSTANT(expr) (0)\n    #endif\n    #define JSON_HEDLEY_REQUIRE_CONSTEXPR(expr) (expr)\n#endif\n\n#if defined(JSON_HEDLEY_BEGIN_C_DECLS)\n    #undef JSON_HEDLEY_BEGIN_C_DECLS\n#endif\n#if defined(JSON_HEDLEY_END_C_DECLS)\n    #undef JSON_HEDLEY_END_C_DECLS\n#endif\n#if defined(JSON_HEDLEY_C_DECL)\n    #undef JSON_HEDLEY_C_DECL\n#endif\n#if defined(__cplusplus)\n    #define JSON_HEDLEY_BEGIN_C_DECLS extern \"C\" {\n    #define JSON_HEDLEY_END_C_DECLS }\n    #define JSON_HEDLEY_C_DECL extern \"C\"\n#else\n    #define JSON_HEDLEY_BEGIN_C_DECLS\n    #define JSON_HEDLEY_END_C_DECLS\n    #define JSON_HEDLEY_C_DECL\n#endif\n\n#if defined(JSON_HEDLEY_STATIC_ASSERT)\n    #undef JSON_HEDLEY_STATIC_ASSERT\n#endif\n#if \\\n  !defined(__cplusplus) && ( \\\n      (defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)) || \\\n      (JSON_HEDLEY_HAS_FEATURE(c_static_assert) && !defined(JSON_HEDLEY_INTEL_CL_VERSION)) || \\\n      JSON_HEDLEY_GCC_VERSION_CHECK(6,0,0) || \\\n      JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0) || \\\n      defined(_Static_assert) \\\n    )\n#  define JSON_HEDLEY_STATIC_ASSERT(expr, message) _Static_assert(expr, message)\n#elif \\\n  (defined(__cplusplus) && (__cplusplus >= 201103L)) || \\\n  JSON_HEDLEY_MSVC_VERSION_CHECK(16,0,0) || \\\n  JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n#  define JSON_HEDLEY_STATIC_ASSERT(expr, message) 
JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_(static_assert(expr, message))\n#else\n#  define JSON_HEDLEY_STATIC_ASSERT(expr, message)\n#endif\n\n#if defined(JSON_HEDLEY_NULL)\n    #undef JSON_HEDLEY_NULL\n#endif\n#if defined(__cplusplus)\n    #if __cplusplus >= 201103L\n        #define JSON_HEDLEY_NULL JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_(nullptr)\n    #elif defined(NULL)\n        #define JSON_HEDLEY_NULL NULL\n    #else\n        #define JSON_HEDLEY_NULL JSON_HEDLEY_STATIC_CAST(void*, 0)\n    #endif\n#elif defined(NULL)\n    #define JSON_HEDLEY_NULL NULL\n#else\n    #define JSON_HEDLEY_NULL ((void*) 0)\n#endif\n\n#if defined(JSON_HEDLEY_MESSAGE)\n    #undef JSON_HEDLEY_MESSAGE\n#endif\n#if JSON_HEDLEY_HAS_WARNING(\"-Wunknown-pragmas\")\n#  define JSON_HEDLEY_MESSAGE(msg) \\\n    JSON_HEDLEY_DIAGNOSTIC_PUSH \\\n    JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS \\\n    JSON_HEDLEY_PRAGMA(message msg) \\\n    JSON_HEDLEY_DIAGNOSTIC_POP\n#elif \\\n  JSON_HEDLEY_GCC_VERSION_CHECK(4,4,0) || \\\n  JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0)\n#  define JSON_HEDLEY_MESSAGE(msg) JSON_HEDLEY_PRAGMA(message msg)\n#elif JSON_HEDLEY_CRAY_VERSION_CHECK(5,0,0)\n#  define JSON_HEDLEY_MESSAGE(msg) JSON_HEDLEY_PRAGMA(_CRI message msg)\n#elif JSON_HEDLEY_IAR_VERSION_CHECK(8,0,0)\n#  define JSON_HEDLEY_MESSAGE(msg) JSON_HEDLEY_PRAGMA(message(msg))\n#elif JSON_HEDLEY_PELLES_VERSION_CHECK(2,0,0)\n#  define JSON_HEDLEY_MESSAGE(msg) JSON_HEDLEY_PRAGMA(message(msg))\n#else\n#  define JSON_HEDLEY_MESSAGE(msg)\n#endif\n\n#if defined(JSON_HEDLEY_WARNING)\n    #undef JSON_HEDLEY_WARNING\n#endif\n#if JSON_HEDLEY_HAS_WARNING(\"-Wunknown-pragmas\")\n#  define JSON_HEDLEY_WARNING(msg) \\\n    JSON_HEDLEY_DIAGNOSTIC_PUSH \\\n    JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS \\\n    JSON_HEDLEY_PRAGMA(clang warning msg) \\\n    JSON_HEDLEY_DIAGNOSTIC_POP\n#elif \\\n  JSON_HEDLEY_GCC_VERSION_CHECK(4,8,0) || \\\n  JSON_HEDLEY_PGI_VERSION_CHECK(18,4,0) || \\\n  
JSON_HEDLEY_INTEL_VERSION_CHECK(13,0,0)\n#  define JSON_HEDLEY_WARNING(msg) JSON_HEDLEY_PRAGMA(GCC warning msg)\n#elif \\\n  JSON_HEDLEY_MSVC_VERSION_CHECK(15,0,0) || \\\n  JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n#  define JSON_HEDLEY_WARNING(msg) JSON_HEDLEY_PRAGMA(message(msg))\n#else\n#  define JSON_HEDLEY_WARNING(msg) JSON_HEDLEY_MESSAGE(msg)\n#endif\n\n#if defined(JSON_HEDLEY_REQUIRE)\n    #undef JSON_HEDLEY_REQUIRE\n#endif\n#if defined(JSON_HEDLEY_REQUIRE_MSG)\n    #undef JSON_HEDLEY_REQUIRE_MSG\n#endif\n#if JSON_HEDLEY_HAS_ATTRIBUTE(diagnose_if)\n#  if JSON_HEDLEY_HAS_WARNING(\"-Wgcc-compat\")\n#    define JSON_HEDLEY_REQUIRE(expr) \\\n    JSON_HEDLEY_DIAGNOSTIC_PUSH \\\n    _Pragma(\"clang diagnostic ignored \\\"-Wgcc-compat\\\"\") \\\n    __attribute__((diagnose_if(!(expr), #expr, \"error\"))) \\\n    JSON_HEDLEY_DIAGNOSTIC_POP\n#    define JSON_HEDLEY_REQUIRE_MSG(expr,msg) \\\n    JSON_HEDLEY_DIAGNOSTIC_PUSH \\\n    _Pragma(\"clang diagnostic ignored \\\"-Wgcc-compat\\\"\") \\\n    __attribute__((diagnose_if(!(expr), msg, \"error\"))) \\\n    JSON_HEDLEY_DIAGNOSTIC_POP\n#  else\n#    define JSON_HEDLEY_REQUIRE(expr) __attribute__((diagnose_if(!(expr), #expr, \"error\")))\n#    define JSON_HEDLEY_REQUIRE_MSG(expr,msg) __attribute__((diagnose_if(!(expr), msg, \"error\")))\n#  endif\n#else\n#  define JSON_HEDLEY_REQUIRE(expr)\n#  define JSON_HEDLEY_REQUIRE_MSG(expr,msg)\n#endif\n\n#if defined(JSON_HEDLEY_FLAGS)\n    #undef JSON_HEDLEY_FLAGS\n#endif\n#if JSON_HEDLEY_HAS_ATTRIBUTE(flag_enum) && (!defined(__cplusplus) || JSON_HEDLEY_HAS_WARNING(\"-Wbitfield-enum-conversion\"))\n    #define JSON_HEDLEY_FLAGS __attribute__((__flag_enum__))\n#else\n    #define JSON_HEDLEY_FLAGS\n#endif\n\n#if defined(JSON_HEDLEY_FLAGS_CAST)\n    #undef JSON_HEDLEY_FLAGS_CAST\n#endif\n#if JSON_HEDLEY_INTEL_VERSION_CHECK(19,0,0)\n#  define JSON_HEDLEY_FLAGS_CAST(T, expr) (__extension__ ({ \\\n        JSON_HEDLEY_DIAGNOSTIC_PUSH \\\n        _Pragma(\"warning(disable:188)\") 
\\\n        ((T) (expr)); \\\n        JSON_HEDLEY_DIAGNOSTIC_POP \\\n    }))\n#else\n#  define JSON_HEDLEY_FLAGS_CAST(T, expr) JSON_HEDLEY_STATIC_CAST(T, expr)\n#endif\n\n#if defined(JSON_HEDLEY_EMPTY_BASES)\n    #undef JSON_HEDLEY_EMPTY_BASES\n#endif\n#if \\\n    (JSON_HEDLEY_MSVC_VERSION_CHECK(19,0,23918) && !JSON_HEDLEY_MSVC_VERSION_CHECK(20,0,0)) || \\\n    JSON_HEDLEY_INTEL_CL_VERSION_CHECK(2021,1,0)\n    #define JSON_HEDLEY_EMPTY_BASES __declspec(empty_bases)\n#else\n    #define JSON_HEDLEY_EMPTY_BASES\n#endif\n\n/* Remaining macros are deprecated. */\n\n#if defined(JSON_HEDLEY_GCC_NOT_CLANG_VERSION_CHECK)\n    #undef JSON_HEDLEY_GCC_NOT_CLANG_VERSION_CHECK\n#endif\n#if defined(__clang__)\n    #define JSON_HEDLEY_GCC_NOT_CLANG_VERSION_CHECK(major,minor,patch) (0)\n#else\n    #define JSON_HEDLEY_GCC_NOT_CLANG_VERSION_CHECK(major,minor,patch) JSON_HEDLEY_GCC_VERSION_CHECK(major,minor,patch)\n#endif\n\n#if defined(JSON_HEDLEY_CLANG_HAS_ATTRIBUTE)\n    #undef JSON_HEDLEY_CLANG_HAS_ATTRIBUTE\n#endif\n#define JSON_HEDLEY_CLANG_HAS_ATTRIBUTE(attribute) JSON_HEDLEY_HAS_ATTRIBUTE(attribute)\n\n#if defined(JSON_HEDLEY_CLANG_HAS_CPP_ATTRIBUTE)\n    #undef JSON_HEDLEY_CLANG_HAS_CPP_ATTRIBUTE\n#endif\n#define JSON_HEDLEY_CLANG_HAS_CPP_ATTRIBUTE(attribute) JSON_HEDLEY_HAS_CPP_ATTRIBUTE(attribute)\n\n#if defined(JSON_HEDLEY_CLANG_HAS_BUILTIN)\n    #undef JSON_HEDLEY_CLANG_HAS_BUILTIN\n#endif\n#define JSON_HEDLEY_CLANG_HAS_BUILTIN(builtin) JSON_HEDLEY_HAS_BUILTIN(builtin)\n\n#if defined(JSON_HEDLEY_CLANG_HAS_FEATURE)\n    #undef JSON_HEDLEY_CLANG_HAS_FEATURE\n#endif\n#define JSON_HEDLEY_CLANG_HAS_FEATURE(feature) JSON_HEDLEY_HAS_FEATURE(feature)\n\n#if defined(JSON_HEDLEY_CLANG_HAS_EXTENSION)\n    #undef JSON_HEDLEY_CLANG_HAS_EXTENSION\n#endif\n#define JSON_HEDLEY_CLANG_HAS_EXTENSION(extension) JSON_HEDLEY_HAS_EXTENSION(extension)\n\n#if defined(JSON_HEDLEY_CLANG_HAS_DECLSPEC_DECLSPEC_ATTRIBUTE)\n    #undef JSON_HEDLEY_CLANG_HAS_DECLSPEC_DECLSPEC_ATTRIBUTE\n#endif\n#define 
JSON_HEDLEY_CLANG_HAS_DECLSPEC_ATTRIBUTE(attribute) JSON_HEDLEY_HAS_DECLSPEC_ATTRIBUTE(attribute)\n\n#if defined(JSON_HEDLEY_CLANG_HAS_WARNING)\n    #undef JSON_HEDLEY_CLANG_HAS_WARNING\n#endif\n#define JSON_HEDLEY_CLANG_HAS_WARNING(warning) JSON_HEDLEY_HAS_WARNING(warning)\n\n#endif /* !defined(JSON_HEDLEY_VERSION) || (JSON_HEDLEY_VERSION < X) */\n\n\n// This file contains all internal macro definitions\n// You MUST include macro_unscope.hpp at the end of json.hpp to undef all of them\n\n// exclude unsupported compilers\n#if !defined(JSON_SKIP_UNSUPPORTED_COMPILER_CHECK)\n    #if defined(__clang__)\n        #if (__clang_major__ * 10000 + __clang_minor__ * 100 + __clang_patchlevel__) < 30400\n            #error \"unsupported Clang version - see https://github.com/nlohmann/json#supported-compilers\"\n        #endif\n    #elif defined(__GNUC__) && !(defined(__ICC) || defined(__INTEL_COMPILER))\n        #if (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) < 40800\n            #error \"unsupported GCC version - see https://github.com/nlohmann/json#supported-compilers\"\n        #endif\n    #endif\n#endif\n\n// C++ language standard detection\n#if (defined(__cplusplus) && __cplusplus >= 202002L) || (defined(_MSVC_LANG) && _MSVC_LANG >= 202002L)\n    #define JSON_HAS_CPP_20\n    #define JSON_HAS_CPP_17\n    #define JSON_HAS_CPP_14\n#elif (defined(__cplusplus) && __cplusplus >= 201703L) || (defined(_HAS_CXX17) && _HAS_CXX17 == 1) // fix for issue #464\n    #define JSON_HAS_CPP_17\n    #define JSON_HAS_CPP_14\n#elif (defined(__cplusplus) && __cplusplus >= 201402L) || (defined(_HAS_CXX14) && _HAS_CXX14 == 1)\n    #define JSON_HAS_CPP_14\n#endif\n\n// disable documentation warnings on clang\n#if defined(__clang__)\n    #pragma GCC diagnostic push\n    #pragma GCC diagnostic ignored \"-Wdocumentation\"\n#endif\n\n// allow to disable exceptions\n#if (defined(__cpp_exceptions) || defined(__EXCEPTIONS) || defined(_CPPUNWIND)) && !defined(JSON_NOEXCEPTION)\n    
#define JSON_THROW(exception) throw exception\n    #define JSON_TRY try\n    #define JSON_CATCH(exception) catch(exception)\n    #define JSON_INTERNAL_CATCH(exception) catch(exception)\n#else\n    #include <cstdlib>\n    #define JSON_THROW(exception) std::abort()\n    #define JSON_TRY if(true)\n    #define JSON_CATCH(exception) if(false)\n    #define JSON_INTERNAL_CATCH(exception) if(false)\n#endif\n\n// override exception macros\n#if defined(JSON_THROW_USER)\n    #undef JSON_THROW\n    #define JSON_THROW JSON_THROW_USER\n#endif\n#if defined(JSON_TRY_USER)\n    #undef JSON_TRY\n    #define JSON_TRY JSON_TRY_USER\n#endif\n#if defined(JSON_CATCH_USER)\n    #undef JSON_CATCH\n    #define JSON_CATCH JSON_CATCH_USER\n    #undef JSON_INTERNAL_CATCH\n    #define JSON_INTERNAL_CATCH JSON_CATCH_USER\n#endif\n#if defined(JSON_INTERNAL_CATCH_USER)\n    #undef JSON_INTERNAL_CATCH\n    #define JSON_INTERNAL_CATCH JSON_INTERNAL_CATCH_USER\n#endif\n\n// allow to override assert\n#if !defined(JSON_ASSERT)\n    #include <cassert> // assert\n    #define JSON_ASSERT(x) assert(x)\n#endif\n\n// allow to access some private functions (needed by the test suite)\n#if defined(JSON_TESTS_PRIVATE)\n    #define JSON_PRIVATE_UNLESS_TESTED public\n#else\n    #define JSON_PRIVATE_UNLESS_TESTED private\n#endif\n\n/*!\n@brief macro to briefly define a mapping between an enum and JSON\n@def NLOHMANN_JSON_SERIALIZE_ENUM\n@since version 3.4.0\n*/\n#define NLOHMANN_JSON_SERIALIZE_ENUM(ENUM_TYPE, ...)                                            
\\\n    template<typename BasicJsonType>                                                            \\\n    inline void to_json(BasicJsonType& j, const ENUM_TYPE& e)                                   \\\n    {                                                                                           \\\n        static_assert(std::is_enum<ENUM_TYPE>::value, #ENUM_TYPE \" must be an enum!\");          \\\n        static const std::pair<ENUM_TYPE, BasicJsonType> m[] = __VA_ARGS__;                     \\\n        auto it = std::find_if(std::begin(m), std::end(m),                                      \\\n                               [e](const std::pair<ENUM_TYPE, BasicJsonType>& ej_pair) -> bool  \\\n        {                                                                                       \\\n            return ej_pair.first == e;                                                          \\\n        });                                                                                     \\\n        j = ((it != std::end(m)) ? 
it : std::begin(m))->second;                                 \\\n    }                                                                                           \\\n    template<typename BasicJsonType>                                                            \\\n    inline void from_json(const BasicJsonType& j, ENUM_TYPE& e)                                 \\\n    {                                                                                           \\\n        static_assert(std::is_enum<ENUM_TYPE>::value, #ENUM_TYPE \" must be an enum!\");          \\\n        static const std::pair<ENUM_TYPE, BasicJsonType> m[] = __VA_ARGS__;                     \\\n        auto it = std::find_if(std::begin(m), std::end(m),                                      \\\n                               [&j](const std::pair<ENUM_TYPE, BasicJsonType>& ej_pair) -> bool \\\n        {                                                                                       \\\n            return ej_pair.second == j;                                                         \\\n        });                                                                                     \\\n        e = ((it != std::end(m)) ? it : std::begin(m))->first;                                  \\\n    }\n\n// Ugly macros to avoid uglier copy-paste when specializing basic_json. 
They\n// may be removed in the future once the class is split.\n\n#define NLOHMANN_BASIC_JSON_TPL_DECLARATION                                \\\n    template<template<typename, typename, typename...> class ObjectType,   \\\n             template<typename, typename...> class ArrayType,              \\\n             class StringType, class BooleanType, class NumberIntegerType, \\\n             class NumberUnsignedType, class NumberFloatType,              \\\n             template<typename> class AllocatorType,                       \\\n             template<typename, typename = void> class JSONSerializer,     \\\n             class BinaryType>\n\n#define NLOHMANN_BASIC_JSON_TPL                                            \\\n    basic_json<ObjectType, ArrayType, StringType, BooleanType,             \\\n    NumberIntegerType, NumberUnsignedType, NumberFloatType,                \\\n    AllocatorType, JSONSerializer, BinaryType>\n\n// Macros to simplify conversion from/to types\n\n#define NLOHMANN_JSON_EXPAND( x ) x\n#define NLOHMANN_JSON_GET_MACRO(_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _13, _14, _15, _16, _17, _18, _19, _20, _21, _22, _23, _24, _25, _26, _27, _28, _29, _30, _31, _32, _33, _34, _35, _36, _37, _38, _39, _40, _41, _42, _43, _44, _45, _46, _47, _48, _49, _50, _51, _52, _53, _54, _55, _56, _57, _58, _59, _60, _61, _62, _63, _64, NAME,...) NAME\n#define NLOHMANN_JSON_PASTE(...) 
NLOHMANN_JSON_EXPAND(NLOHMANN_JSON_GET_MACRO(__VA_ARGS__, \\\n        NLOHMANN_JSON_PASTE64, \\\n        NLOHMANN_JSON_PASTE63, \\\n        NLOHMANN_JSON_PASTE62, \\\n        NLOHMANN_JSON_PASTE61, \\\n        NLOHMANN_JSON_PASTE60, \\\n        NLOHMANN_JSON_PASTE59, \\\n        NLOHMANN_JSON_PASTE58, \\\n        NLOHMANN_JSON_PASTE57, \\\n        NLOHMANN_JSON_PASTE56, \\\n        NLOHMANN_JSON_PASTE55, \\\n        NLOHMANN_JSON_PASTE54, \\\n        NLOHMANN_JSON_PASTE53, \\\n        NLOHMANN_JSON_PASTE52, \\\n        NLOHMANN_JSON_PASTE51, \\\n        NLOHMANN_JSON_PASTE50, \\\n        NLOHMANN_JSON_PASTE49, \\\n        NLOHMANN_JSON_PASTE48, \\\n        NLOHMANN_JSON_PASTE47, \\\n        NLOHMANN_JSON_PASTE46, \\\n        NLOHMANN_JSON_PASTE45, \\\n        NLOHMANN_JSON_PASTE44, \\\n        NLOHMANN_JSON_PASTE43, \\\n        NLOHMANN_JSON_PASTE42, \\\n        NLOHMANN_JSON_PASTE41, \\\n        NLOHMANN_JSON_PASTE40, \\\n        NLOHMANN_JSON_PASTE39, \\\n        NLOHMANN_JSON_PASTE38, \\\n        NLOHMANN_JSON_PASTE37, \\\n        NLOHMANN_JSON_PASTE36, \\\n        NLOHMANN_JSON_PASTE35, \\\n        NLOHMANN_JSON_PASTE34, \\\n        NLOHMANN_JSON_PASTE33, \\\n        NLOHMANN_JSON_PASTE32, \\\n        NLOHMANN_JSON_PASTE31, \\\n        NLOHMANN_JSON_PASTE30, \\\n        NLOHMANN_JSON_PASTE29, \\\n        NLOHMANN_JSON_PASTE28, \\\n        NLOHMANN_JSON_PASTE27, \\\n        NLOHMANN_JSON_PASTE26, \\\n        NLOHMANN_JSON_PASTE25, \\\n        NLOHMANN_JSON_PASTE24, \\\n        NLOHMANN_JSON_PASTE23, \\\n        NLOHMANN_JSON_PASTE22, \\\n        NLOHMANN_JSON_PASTE21, \\\n        NLOHMANN_JSON_PASTE20, \\\n        NLOHMANN_JSON_PASTE19, \\\n        NLOHMANN_JSON_PASTE18, \\\n        NLOHMANN_JSON_PASTE17, \\\n        NLOHMANN_JSON_PASTE16, \\\n        NLOHMANN_JSON_PASTE15, \\\n        NLOHMANN_JSON_PASTE14, \\\n        NLOHMANN_JSON_PASTE13, \\\n        NLOHMANN_JSON_PASTE12, \\\n        NLOHMANN_JSON_PASTE11, \\\n        NLOHMANN_JSON_PASTE10, \\\n        
NLOHMANN_JSON_PASTE9, \\\n        NLOHMANN_JSON_PASTE8, \\\n        NLOHMANN_JSON_PASTE7, \\\n        NLOHMANN_JSON_PASTE6, \\\n        NLOHMANN_JSON_PASTE5, \\\n        NLOHMANN_JSON_PASTE4, \\\n        NLOHMANN_JSON_PASTE3, \\\n        NLOHMANN_JSON_PASTE2, \\\n        NLOHMANN_JSON_PASTE1)(__VA_ARGS__))\n#define NLOHMANN_JSON_PASTE2(func, v1) func(v1)\n#define NLOHMANN_JSON_PASTE3(func, v1, v2) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE2(func, v2)\n#define NLOHMANN_JSON_PASTE4(func, v1, v2, v3) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE3(func, v2, v3)\n#define NLOHMANN_JSON_PASTE5(func, v1, v2, v3, v4) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE4(func, v2, v3, v4)\n#define NLOHMANN_JSON_PASTE6(func, v1, v2, v3, v4, v5) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE5(func, v2, v3, v4, v5)\n#define NLOHMANN_JSON_PASTE7(func, v1, v2, v3, v4, v5, v6) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE6(func, v2, v3, v4, v5, v6)\n#define NLOHMANN_JSON_PASTE8(func, v1, v2, v3, v4, v5, v6, v7) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE7(func, v2, v3, v4, v5, v6, v7)\n#define NLOHMANN_JSON_PASTE9(func, v1, v2, v3, v4, v5, v6, v7, v8) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE8(func, v2, v3, v4, v5, v6, v7, v8)\n#define NLOHMANN_JSON_PASTE10(func, v1, v2, v3, v4, v5, v6, v7, v8, v9) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE9(func, v2, v3, v4, v5, v6, v7, v8, v9)\n#define NLOHMANN_JSON_PASTE11(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE10(func, v2, v3, v4, v5, v6, v7, v8, v9, v10)\n#define NLOHMANN_JSON_PASTE12(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE11(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11)\n#define NLOHMANN_JSON_PASTE13(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE12(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12)\n#define 
NLOHMANN_JSON_PASTE14(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE13(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13)\n#define NLOHMANN_JSON_PASTE15(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE14(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14)\n#define NLOHMANN_JSON_PASTE16(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE15(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15)\n#define NLOHMANN_JSON_PASTE17(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE16(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16)\n#define NLOHMANN_JSON_PASTE18(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE17(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17)\n#define NLOHMANN_JSON_PASTE19(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE18(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18)\n#define NLOHMANN_JSON_PASTE20(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE19(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19)\n#define NLOHMANN_JSON_PASTE21(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE20(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20)\n#define NLOHMANN_JSON_PASTE22(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, 
v14, v15, v16, v17, v18, v19, v20, v21) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE21(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21)\n#define NLOHMANN_JSON_PASTE23(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE22(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22)\n#define NLOHMANN_JSON_PASTE24(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE23(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23)\n#define NLOHMANN_JSON_PASTE25(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE24(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24)\n#define NLOHMANN_JSON_PASTE26(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE25(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25)\n#define NLOHMANN_JSON_PASTE27(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE26(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26)\n#define NLOHMANN_JSON_PASTE28(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE27(func, v2, v3, v4, v5, 
v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27)\n#define NLOHMANN_JSON_PASTE29(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE28(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28)\n#define NLOHMANN_JSON_PASTE30(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE29(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29)\n#define NLOHMANN_JSON_PASTE31(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE30(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30)\n#define NLOHMANN_JSON_PASTE32(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE31(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31)\n#define NLOHMANN_JSON_PASTE33(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE32(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32)\n#define 
NLOHMANN_JSON_PASTE34(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE33(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33)\n#define NLOHMANN_JSON_PASTE35(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE34(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34)\n#define NLOHMANN_JSON_PASTE36(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE35(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35)\n#define NLOHMANN_JSON_PASTE37(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE36(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36)\n#define NLOHMANN_JSON_PASTE38(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE37(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, 
v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37)\n#define NLOHMANN_JSON_PASTE39(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE38(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38)\n#define NLOHMANN_JSON_PASTE40(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE39(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39)\n#define NLOHMANN_JSON_PASTE41(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE40(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40)\n#define NLOHMANN_JSON_PASTE42(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE41(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41)\n#define 
NLOHMANN_JSON_PASTE43(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE42(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42)\n#define NLOHMANN_JSON_PASTE44(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE43(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43)\n#define NLOHMANN_JSON_PASTE45(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE44(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44)\n#define NLOHMANN_JSON_PASTE46(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE45(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, 
v40, v41, v42, v43, v44, v45)\n#define NLOHMANN_JSON_PASTE47(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE46(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46)\n#define NLOHMANN_JSON_PASTE48(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE47(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47)\n#define NLOHMANN_JSON_PASTE49(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE48(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48)\n#define NLOHMANN_JSON_PASTE50(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE49(func, v2, 
v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49)\n#define NLOHMANN_JSON_PASTE51(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE50(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50)\n#define NLOHMANN_JSON_PASTE52(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE51(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51)\n#define NLOHMANN_JSON_PASTE53(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE52(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52)\n#define 
NLOHMANN_JSON_PASTE54(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE53(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53)\n#define NLOHMANN_JSON_PASTE55(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE54(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54)\n#define NLOHMANN_JSON_PASTE56(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE55(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55)\n#define NLOHMANN_JSON_PASTE57(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, 
v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE56(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56)\n#define NLOHMANN_JSON_PASTE58(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE57(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57)\n#define NLOHMANN_JSON_PASTE59(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE58(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58)\n#define NLOHMANN_JSON_PASTE60(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, 
v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58, v59) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE59(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58, v59)\n#define NLOHMANN_JSON_PASTE61(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58, v59, v60) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE60(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58, v59, v60)\n#define NLOHMANN_JSON_PASTE62(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58, v59, v60, v61) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE61(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58, v59, v60, v61)\n#define NLOHMANN_JSON_PASTE63(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, 
v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58, v59, v60, v61, v62) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE62(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58, v59, v60, v61, v62)\n#define NLOHMANN_JSON_PASTE64(func, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58, v59, v60, v61, v62, v63) NLOHMANN_JSON_PASTE2(func, v1) NLOHMANN_JSON_PASTE63(func, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31, v32, v33, v34, v35, v36, v37, v38, v39, v40, v41, v42, v43, v44, v45, v46, v47, v48, v49, v50, v51, v52, v53, v54, v55, v56, v57, v58, v59, v60, v61, v62, v63)\n\n#define NLOHMANN_JSON_TO(v1) nlohmann_json_j[#v1] = nlohmann_json_t.v1;\n#define NLOHMANN_JSON_FROM(v1) nlohmann_json_j.at(#v1).get_to(nlohmann_json_t.v1);\n\n/*!\n@brief macro\n@def NLOHMANN_DEFINE_TYPE_INTRUSIVE\n@since version 3.9.0\n*/\n#define NLOHMANN_DEFINE_TYPE_INTRUSIVE(Type, ...)  \\\n    friend void to_json(nlohmann::json& nlohmann_json_j, const Type& nlohmann_json_t) { NLOHMANN_JSON_EXPAND(NLOHMANN_JSON_PASTE(NLOHMANN_JSON_TO, __VA_ARGS__)) } \\\n    friend void from_json(const nlohmann::json& nlohmann_json_j, Type& nlohmann_json_t) { NLOHMANN_JSON_EXPAND(NLOHMANN_JSON_PASTE(NLOHMANN_JSON_FROM, __VA_ARGS__)) }\n\n/*!\n@brief macro\n@def NLOHMANN_DEFINE_TYPE_NON_INTRUSIVE\n@since version 3.9.0\n*/\n#define NLOHMANN_DEFINE_TYPE_NON_INTRUSIVE(Type, ...)  
\\\n    inline void to_json(nlohmann::json& nlohmann_json_j, const Type& nlohmann_json_t) { NLOHMANN_JSON_EXPAND(NLOHMANN_JSON_PASTE(NLOHMANN_JSON_TO, __VA_ARGS__)) } \\\n    inline void from_json(const nlohmann::json& nlohmann_json_j, Type& nlohmann_json_t) { NLOHMANN_JSON_EXPAND(NLOHMANN_JSON_PASTE(NLOHMANN_JSON_FROM, __VA_ARGS__)) }\n\n#ifndef JSON_USE_IMPLICIT_CONVERSIONS\n    #define JSON_USE_IMPLICIT_CONVERSIONS 1\n#endif\n\n#if JSON_USE_IMPLICIT_CONVERSIONS\n    #define JSON_EXPLICIT\n#else\n    #define JSON_EXPLICIT explicit\n#endif\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n\n/*!\n@brief replace all occurrences of a substring by another string\n\n@param[in,out] s  the string to manipulate; changed so that all\n               occurrences of @a f are replaced with @a t\n@param[in]     f  the substring to replace with @a t\n@param[in]     t  the string to replace @a f\n\n@pre The search string @a f must not be empty. **This precondition is\nenforced with an assertion.**\n\n@since version 2.0.0\n*/\ninline void replace_substring(std::string& s, const std::string& f,\n                              const std::string& t)\n{\n    JSON_ASSERT(!f.empty());\n    for (auto pos = s.find(f);                // find first occurrence of f\n            pos != std::string::npos;         // make sure f was found\n            s.replace(pos, f.size(), t),      // replace with t, and\n            pos = s.find(f, pos + t.size()))  // find next occurrence of f\n    {}\n}\n\n/*!\n * @brief string escaping as described in RFC 6901 (Sect. 4)\n * @param[in] s string to escape\n * @return    escaped string\n *\n * Note the order of escaping \"~\" to \"~0\" and \"/\" to \"~1\" is important.\n */\ninline std::string escape(std::string s)\n{\n    replace_substring(s, \"~\", \"~0\");\n    replace_substring(s, \"/\", \"~1\");\n    return s;\n}\n\n/*!\n * @brief string unescaping as described in RFC 6901 (Sect. 
4)\n * @param[in] s string to unescape\n * @return    unescaped string\n *\n * Note the order of escaping \"~1\" to \"/\" and \"~0\" to \"~\" is important.\n */\nstatic void unescape(std::string& s)\n{\n    replace_substring(s, \"~1\", \"/\");\n    replace_substring(s, \"~0\", \"~\");\n}\n\n} // namespace detail\n} // namespace nlohmann\n\n// #include <nlohmann/detail/input/position_t.hpp>\n\n\n#include <cstddef> // size_t\n\nnamespace nlohmann\n{\nnamespace detail\n{\n/// struct to capture the start position of the current token\nstruct position_t\n{\n    /// the total number of characters read\n    std::size_t chars_read_total = 0;\n    /// the number of characters read in the current line\n    std::size_t chars_read_current_line = 0;\n    /// the number of lines read\n    std::size_t lines_read = 0;\n\n    /// conversion to size_t to preserve SAX interface\n    constexpr operator size_t() const\n    {\n        return chars_read_total;\n    }\n};\n\n} // namespace detail\n} // namespace nlohmann\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n////////////////\n// exceptions //\n////////////////\n\n/*!\n@brief general exception of the @ref basic_json class\n\nThis class is an extension of `std::exception` objects with a member @a id for\nexception ids. It is used as the base class for all exceptions thrown by the\n@ref basic_json class. 
This class can hence be used as \"wildcard\" to catch\nexceptions.\n\nSubclasses:\n- @ref parse_error for exceptions indicating a parse error\n- @ref invalid_iterator for exceptions indicating errors with iterators\n- @ref type_error for exceptions indicating executing a member function with\n                  a wrong type\n- @ref out_of_range for exceptions indicating access out of the defined range\n- @ref other_error for exceptions indicating other library errors\n\n@internal\n@note To have nothrow-copy-constructible exceptions, we internally use\n      `std::runtime_error` which can cope with arbitrary-length error messages.\n      Intermediate strings are built with static functions and then passed to\n      the actual constructor.\n@endinternal\n\n@liveexample{The following code shows how arbitrary library exceptions can be\ncaught.,exception}\n\n@since version 3.0.0\n*/\nclass exception : public std::exception\n{\n  public:\n    /// returns the explanatory string\n    const char* what() const noexcept override\n    {\n        return m.what();\n    }\n\n    /// the id of the exception\n    const int id; // NOLINT(cppcoreguidelines-non-private-member-variables-in-classes)\n\n  protected:\n    JSON_HEDLEY_NON_NULL(3)\n    exception(int id_, const char* what_arg) : id(id_), m(what_arg) {}\n\n    static std::string name(const std::string& ename, int id_)\n    {\n        return \"[json.exception.\" + ename + \".\" + std::to_string(id_) + \"] \";\n    }\n\n    template<typename BasicJsonType>\n    static std::string diagnostics(const BasicJsonType& leaf_element)\n    {\n#if JSON_DIAGNOSTICS\n        std::vector<std::string> tokens;\n        for (const auto* current = &leaf_element; current->m_parent != nullptr; current = current->m_parent)\n        {\n            switch (current->m_parent->type())\n            {\n                case value_t::array:\n                {\n                    for (std::size_t i = 0; i < current->m_parent->m_value.array->size(); ++i)\n  
                  {\n                        if (&current->m_parent->m_value.array->operator[](i) == current)\n                        {\n                            tokens.emplace_back(std::to_string(i));\n                            break;\n                        }\n                    }\n                    break;\n                }\n\n                case value_t::object:\n                {\n                    for (const auto& element : *current->m_parent->m_value.object)\n                    {\n                        if (&element.second == current)\n                        {\n                            tokens.emplace_back(element.first.c_str());\n                            break;\n                        }\n                    }\n                    break;\n                }\n\n                default:   // LCOV_EXCL_LINE\n                    break; // LCOV_EXCL_LINE\n            }\n        }\n\n        if (tokens.empty())\n        {\n            return \"\";\n        }\n\n        return \"(\" + std::accumulate(tokens.rbegin(), tokens.rend(), std::string{},\n                                     [](const std::string & a, const std::string & b)\n        {\n            return a + \"/\" + detail::escape(b);\n        }) + \") \";\n#else\n        static_cast<void>(leaf_element);\n        return \"\";\n#endif\n    }\n\n  private:\n    /// an exception object as storage for error messages\n    std::runtime_error m;\n};\n\n/*!\n@brief exception indicating a parse error\n\nThis exception is thrown by the library when a parse error occurs. 
Parse errors\ncan occur during the deserialization of JSON text, CBOR, MessagePack, as well\nas when using JSON Patch.\n\nMember @a byte holds the byte index of the last read character in the input\nfile.\n\nExceptions have ids 1xx.\n\nname / id                      | example message | description\n------------------------------ | --------------- | -------------------------\njson.exception.parse_error.101 | parse error at 2: unexpected end of input; expected string literal | This error indicates a syntax error while deserializing a JSON text. The error message describes that an unexpected token (character) was encountered, and the member @a byte indicates the error position.\njson.exception.parse_error.102 | parse error at 14: missing or wrong low surrogate | JSON uses the `\\uxxxx` format to describe Unicode characters. Code points above 0xFFFF are split into two `\\uxxxx` entries (\"surrogate pairs\"). This error indicates that the surrogate pair is incomplete or contains an invalid code point.\njson.exception.parse_error.103 | parse error: code points above 0x10FFFF are invalid | Unicode supports code points up to 0x10FFFF. Code points above 0x10FFFF are invalid.\njson.exception.parse_error.104 | parse error: JSON patch must be an array of objects | [RFC 6902](https://tools.ietf.org/html/rfc6902) requires a JSON Patch document to be a JSON document that represents an array of objects.\njson.exception.parse_error.105 | parse error: operation must have string member 'op' | An operation of a JSON Patch document must contain exactly one \"op\" member, whose value indicates the operation to perform. 
Its value must be one of \"add\", \"remove\", \"replace\", \"move\", \"copy\", or \"test\"; other values are errors.\njson.exception.parse_error.106 | parse error: array index '01' must not begin with '0' | An array index in a JSON Pointer ([RFC 6901](https://tools.ietf.org/html/rfc6901)) may be `0` or any number without a leading `0`.\njson.exception.parse_error.107 | parse error: JSON pointer must be empty or begin with '/' - was: 'foo' | A JSON Pointer must be a Unicode string containing a sequence of zero or more reference tokens, each prefixed by a `/` character.\njson.exception.parse_error.108 | parse error: escape character '~' must be followed with '0' or '1' | In a JSON Pointer, only `~0` and `~1` are valid escape sequences.\njson.exception.parse_error.109 | parse error: array index 'one' is not a number | A JSON Pointer array index must be a number.\njson.exception.parse_error.110 | parse error at 1: cannot read 2 bytes from vector | When parsing CBOR or MessagePack, the byte vector ends before the complete value has been read.\njson.exception.parse_error.112 | parse error at 1: error reading CBOR; last byte: 0xF8 | Not all types of CBOR or MessagePack are supported. This exception occurs if an unsupported byte was read.\njson.exception.parse_error.113 | parse error at 2: expected a CBOR string; last byte: 0x98 | While parsing a map key, a value that is not a string has been read.\njson.exception.parse_error.114 | parse error: Unsupported BSON record type 0x0F | The parsing of the corresponding BSON record type is not implemented (yet).\njson.exception.parse_error.115 | parse error at byte 5: syntax error while parsing UBJSON high-precision number: invalid number text: 1A | A UBJSON high-precision number could not be parsed.\n\n@note For an input with n bytes, 1 is the index of the first character and n+1\n      is the index of the terminating null byte or the end of file. 
This also\n      holds true when reading a byte vector (CBOR or MessagePack).\n\n@liveexample{The following code shows how a `parse_error` exception can be\ncaught.,parse_error}\n\n@sa - @ref exception for the base class of the library exceptions\n@sa - @ref invalid_iterator for exceptions indicating errors with iterators\n@sa - @ref type_error for exceptions indicating executing a member function with\n                    a wrong type\n@sa - @ref out_of_range for exceptions indicating access out of the defined range\n@sa - @ref other_error for exceptions indicating other library errors\n\n@since version 3.0.0\n*/\nclass parse_error : public exception\n{\n  public:\n    /*!\n    @brief create a parse error exception\n    @param[in] id_       the id of the exception\n    @param[in] pos       the position where the error occurred (or with\n                         chars_read_total=0 if the position cannot be\n                         determined)\n    @param[in] what_arg  the explanatory string\n    @return parse_error object\n    */\n    template<typename BasicJsonType>\n    static parse_error create(int id_, const position_t& pos, const std::string& what_arg, const BasicJsonType& context)\n    {\n        std::string w = exception::name(\"parse_error\", id_) + \"parse error\" +\n                        position_string(pos) + \": \" + exception::diagnostics(context) + what_arg;\n        return parse_error(id_, pos.chars_read_total, w.c_str());\n    }\n\n    template<typename BasicJsonType>\n    static parse_error create(int id_, std::size_t byte_, const std::string& what_arg, const BasicJsonType& context)\n    {\n        std::string w = exception::name(\"parse_error\", id_) + \"parse error\" +\n                        (byte_ != 0 ? 
(\" at byte \" + std::to_string(byte_)) : \"\") +\n                        \": \" + exception::diagnostics(context) + what_arg;\n        return parse_error(id_, byte_, w.c_str());\n    }\n\n    /*!\n    @brief byte index of the parse error\n\n    The byte index of the last read character in the input file.\n\n    @note For an input with n bytes, 1 is the index of the first character and\n          n+1 is the index of the terminating null byte or the end of file.\n          This also holds true when reading a byte vector (CBOR or MessagePack).\n    */\n    const std::size_t byte;\n\n  private:\n    parse_error(int id_, std::size_t byte_, const char* what_arg)\n        : exception(id_, what_arg), byte(byte_) {}\n\n    static std::string position_string(const position_t& pos)\n    {\n        return \" at line \" + std::to_string(pos.lines_read + 1) +\n               \", column \" + std::to_string(pos.chars_read_current_line);\n    }\n};\n\n/*!\n@brief exception indicating errors with iterators\n\nThis exception is thrown if iterators passed to a library function do not match\nthe expected semantics.\n\nExceptions have ids 2xx.\n\nname / id                           | example message | description\n----------------------------------- | --------------- | -------------------------\njson.exception.invalid_iterator.201 | iterators are not compatible | The iterators passed to constructor @ref basic_json(InputIT first, InputIT last) are not compatible, meaning they do not belong to the same container. Therefore, the range (@a first, @a last) is invalid.\njson.exception.invalid_iterator.202 | iterator does not fit current value | In an erase or insert function, the passed iterator @a pos does not belong to the JSON value for which the function was called. 
It hence does not define a valid position for the deletion/insertion.\njson.exception.invalid_iterator.203 | iterators do not fit current value | Either iterator passed to function @ref erase(IteratorType first, IteratorType last) does not belong to the JSON value from which values shall be erased. It hence does not define a valid range to delete values from.\njson.exception.invalid_iterator.204 | iterators out of range | When an iterator range for a primitive type (number, boolean, or string) is passed to a constructor or an erase function, this range has to be exactly (@ref begin(), @ref end()), because this is the only way the single stored value is expressed. All other ranges are invalid.\njson.exception.invalid_iterator.205 | iterator out of range | When an iterator for a primitive type (number, boolean, or string) is passed to an erase function, the iterator has to be the @ref begin() iterator, because it is the only way to address the stored value. All other iterators are invalid.\njson.exception.invalid_iterator.206 | cannot construct with iterators from null | The iterators passed to constructor @ref basic_json(InputIT first, InputIT last) belong to a JSON null value and hence do not define a valid range.\njson.exception.invalid_iterator.207 | cannot use key() for non-object iterators | The key() member function can only be used on iterators belonging to a JSON object, because other types do not have a concept of a key.\njson.exception.invalid_iterator.208 | cannot use operator[] for object iterators | The operator[] to specify a concrete offset cannot be used on iterators belonging to a JSON object, because JSON objects are unordered.\njson.exception.invalid_iterator.209 | cannot use offsets with object iterators | The offset operators (+, -, +=, -=) cannot be used on iterators belonging to a JSON object, because JSON objects are unordered.\njson.exception.invalid_iterator.210 | iterators do not fit | The iterator range passed to the insert function is 
not compatible, meaning they do not belong to the same container. Therefore, the range (@a first, @a last) is invalid.\njson.exception.invalid_iterator.211 | passed iterators may not belong to container | The iterator range passed to the insert function must not be a subrange of the container to insert to.\njson.exception.invalid_iterator.212 | cannot compare iterators of different containers | When two iterators are compared, they must belong to the same container.\njson.exception.invalid_iterator.213 | cannot compare order of object iterators | The order of object iterators cannot be compared, because JSON objects are unordered.\njson.exception.invalid_iterator.214 | cannot get value | Cannot get value for iterator: Either the iterator belongs to a null value or it is an iterator to a primitive type (number, boolean, or string), but the iterator is different to @ref begin().\n\n@liveexample{The following code shows how an `invalid_iterator` exception can be\ncaught.,invalid_iterator}\n\n@sa - @ref exception for the base class of the library exceptions\n@sa - @ref parse_error for exceptions indicating a parse error\n@sa - @ref type_error for exceptions indicating executing a member function with\n                    a wrong type\n@sa - @ref out_of_range for exceptions indicating access out of the defined range\n@sa - @ref other_error for exceptions indicating other library errors\n\n@since version 3.0.0\n*/\nclass invalid_iterator : public exception\n{\n  public:\n    template<typename BasicJsonType>\n    static invalid_iterator create(int id_, const std::string& what_arg, const BasicJsonType& context)\n    {\n        std::string w = exception::name(\"invalid_iterator\", id_) + exception::diagnostics(context) + what_arg;\n        return invalid_iterator(id_, w.c_str());\n    }\n\n  private:\n    JSON_HEDLEY_NON_NULL(3)\n    invalid_iterator(int id_, const char* what_arg)\n        : exception(id_, what_arg) {}\n};\n\n/*!\n@brief exception indicating executing a 
member function with a wrong type\n\nThis exception is thrown in case of a type error; that is, a library function is\nexecuted on a JSON value whose type does not match the expected semantics.\n\nExceptions have ids 3xx.\n\nname / id                     | example message | description\n----------------------------- | --------------- | -------------------------\njson.exception.type_error.301 | cannot create object from initializer list | To create an object from an initializer list, the initializer list must consist only of a list of pairs whose first element is a string. When this constraint is violated, an array is created instead.\njson.exception.type_error.302 | type must be object, but is array | During implicit or explicit value conversion, the JSON type must be compatible to the target type. For instance, a JSON string can only be converted into string types, but not into numbers or boolean types.\njson.exception.type_error.303 | incompatible ReferenceType for get_ref, actual type is object | To retrieve a reference to a value stored in a @ref basic_json object with @ref get_ref, the type of the reference must match the value type. 
For instance, for a JSON array, the @a ReferenceType must be @ref array_t &.\njson.exception.type_error.304 | cannot use at() with string | The @ref at() member functions can only be executed for certain JSON types.\njson.exception.type_error.305 | cannot use operator[] with string | The @ref operator[] member functions can only be executed for certain JSON types.\njson.exception.type_error.306 | cannot use value() with string | The @ref value() member functions can only be executed for certain JSON types.\njson.exception.type_error.307 | cannot use erase() with string | The @ref erase() member functions can only be executed for certain JSON types.\njson.exception.type_error.308 | cannot use push_back() with string | The @ref push_back() and @ref operator+= member functions can only be executed for certain JSON types.\njson.exception.type_error.309 | cannot use insert() with | The @ref insert() member functions can only be executed for certain JSON types.\njson.exception.type_error.310 | cannot use swap() with number | The @ref swap() member functions can only be executed for certain JSON types.\njson.exception.type_error.311 | cannot use emplace_back() with string | The @ref emplace_back() member function can only be executed for certain JSON types.\njson.exception.type_error.312 | cannot use update() with string | The @ref update() member functions can only be executed for certain JSON types.\njson.exception.type_error.313 | invalid value to unflatten | The @ref unflatten function converts an object whose keys are JSON Pointers back into an arbitrary nested JSON value. 
The JSON Pointers must not overlap, because then the resulting value would not be well defined.\njson.exception.type_error.314 | only objects can be unflattened | The @ref unflatten function only works for an object whose keys are JSON Pointers.\njson.exception.type_error.315 | values in object must be primitive | The @ref unflatten function only works for an object whose keys are JSON Pointers and whose values are primitive.\njson.exception.type_error.316 | invalid UTF-8 byte at index 10: 0x7E | The @ref dump function only works with UTF-8 encoded strings; that is, if you assign a `std::string` to a JSON value, make sure it is UTF-8 encoded. |\njson.exception.type_error.317 | JSON value cannot be serialized to requested format | The dynamic type of the object cannot be represented in the requested serialization format (e.g. a raw `true` or `null` JSON object cannot be serialized to BSON) |\n\n@liveexample{The following code shows how a `type_error` exception can be\ncaught.,type_error}\n\n@sa - @ref exception for the base class of the library exceptions\n@sa - @ref parse_error for exceptions indicating a parse error\n@sa - @ref invalid_iterator for exceptions indicating errors with iterators\n@sa - @ref out_of_range for exceptions indicating access out of the defined range\n@sa - @ref other_error for exceptions indicating other library errors\n\n@since version 3.0.0\n*/\nclass type_error : public exception\n{\n  public:\n    template<typename BasicJsonType>\n    static type_error create(int id_, const std::string& what_arg, const BasicJsonType& context)\n    {\n        std::string w = exception::name(\"type_error\", id_) + exception::diagnostics(context) + what_arg;\n        return type_error(id_, w.c_str());\n    }\n\n  private:\n    JSON_HEDLEY_NON_NULL(3)\n    type_error(int id_, const char* what_arg) : exception(id_, what_arg) {}\n};\n\n/*!\n@brief exception indicating access out of the defined range\n\nThis exception is thrown in case a library function is 
called on an input\nparameter that exceeds the expected range, for instance in case of array\nindices or nonexistent object keys.\n\nExceptions have ids 4xx.\n\nname / id                       | example message | description\n------------------------------- | --------------- | -------------------------\njson.exception.out_of_range.401 | array index 3 is out of range | The provided array index @a i is larger than @a size-1.\njson.exception.out_of_range.402 | array index '-' (3) is out of range | The special array index `-` in a JSON Pointer never describes a valid element of the array, but the index past the end. That is, it can only be used to add elements at this position, but not to read it.\njson.exception.out_of_range.403 | key 'foo' not found | The provided key was not found in the JSON object.\njson.exception.out_of_range.404 | unresolved reference token 'foo' | A reference token in a JSON Pointer could not be resolved.\njson.exception.out_of_range.405 | JSON pointer has no parent | The JSON Patch operations 'remove' and 'add' cannot be applied to the root element of the JSON value.\njson.exception.out_of_range.406 | number overflow parsing '10E1000' | A parsed number could not be stored without changing it to NaN or INF.\njson.exception.out_of_range.407 | number overflow serializing '9223372036854775808' | UBJSON and BSON only support integer numbers up to 9223372036854775807. (until version 3.8.0) |\njson.exception.out_of_range.408 | excessive array size: 8658170730974374167 | The size (following `#`) of a UBJSON array or object exceeds the maximal capacity. 
|\njson.exception.out_of_range.409 | BSON key cannot contain code point U+0000 (at byte 2) | Key identifiers to be serialized to BSON cannot contain code point U+0000, since the key is stored as zero-terminated c-string |\n\n@liveexample{The following code shows how an `out_of_range` exception can be\ncaught.,out_of_range}\n\n@sa - @ref exception for the base class of the library exceptions\n@sa - @ref parse_error for exceptions indicating a parse error\n@sa - @ref invalid_iterator for exceptions indicating errors with iterators\n@sa - @ref type_error for exceptions indicating executing a member function with\n                    a wrong type\n@sa - @ref other_error for exceptions indicating other library errors\n\n@since version 3.0.0\n*/\nclass out_of_range : public exception\n{\n  public:\n    template<typename BasicJsonType>\n    static out_of_range create(int id_, const std::string& what_arg, const BasicJsonType& context)\n    {\n        std::string w = exception::name(\"out_of_range\", id_) + exception::diagnostics(context) + what_arg;\n        return out_of_range(id_, w.c_str());\n    }\n\n  private:\n    JSON_HEDLEY_NON_NULL(3)\n    out_of_range(int id_, const char* what_arg) : exception(id_, what_arg) {}\n};\n\n/*!\n@brief exception indicating other library errors\n\nThis exception is thrown in case of errors that cannot be classified with the\nother exception types.\n\nExceptions have ids 5xx.\n\nname / id                      | example message | description\n------------------------------ | --------------- | -------------------------\njson.exception.other_error.501 | unsuccessful: {\"op\":\"test\",\"path\":\"/baz\", \"value\":\"bar\"} | A JSON Patch operation 'test' failed. 
The unsuccessful operation is also printed.\n\n@sa - @ref exception for the base class of the library exceptions\n@sa - @ref parse_error for exceptions indicating a parse error\n@sa - @ref invalid_iterator for exceptions indicating errors with iterators\n@sa - @ref type_error for exceptions indicating executing a member function with\n                    a wrong type\n@sa - @ref out_of_range for exceptions indicating access out of the defined range\n\n@liveexample{The following code shows how an `other_error` exception can be\ncaught.,other_error}\n\n@since version 3.0.0\n*/\nclass other_error : public exception\n{\n  public:\n    template<typename BasicJsonType>\n    static other_error create(int id_, const std::string& what_arg, const BasicJsonType& context)\n    {\n        std::string w = exception::name(\"other_error\", id_) + exception::diagnostics(context) + what_arg;\n        return other_error(id_, w.c_str());\n    }\n\n  private:\n    JSON_HEDLEY_NON_NULL(3)\n    other_error(int id_, const char* what_arg) : exception(id_, what_arg) {}\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n// #include <nlohmann/detail/meta/cpp_future.hpp>\n\n\n#include <cstddef> // size_t\n#include <type_traits> // conditional, enable_if, false_type, integral_constant, is_constructible, is_integral, is_same, remove_cv, remove_reference, true_type\n#include <utility> // index_sequence, make_index_sequence, index_sequence_for\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n\ntemplate<typename T>\nusing uncvref_t = typename std::remove_cv<typename std::remove_reference<T>::type>::type;\n\n#ifdef JSON_HAS_CPP_14\n\n// the following utilities are natively available in C++14\nusing std::enable_if_t;\nusing std::index_sequence;\nusing std::make_index_sequence;\nusing std::index_sequence_for;\n\n#else\n\n// alias templates to reduce boilerplate\ntemplate<bool B, typename T = 
void>\nusing enable_if_t = typename std::enable_if<B, T>::type;\n\n// The following code is taken from https://github.com/abseil/abseil-cpp/blob/10cb35e459f5ecca5b2ff107635da0bfa41011b4/absl/utility/utility.h\n// which is part of Google Abseil (https://github.com/abseil/abseil-cpp), licensed under the Apache License 2.0.\n\n//// START OF CODE FROM GOOGLE ABSEIL\n\n// integer_sequence\n//\n// Class template representing a compile-time integer sequence. An instantiation\n// of `integer_sequence<T, Ints...>` has a sequence of integers encoded in its\n// type through its template arguments (which is a common need when\n// working with C++11 variadic templates). `absl::integer_sequence` is designed\n// to be a drop-in replacement for C++14's `std::integer_sequence`.\n//\n// Example:\n//\n//   template< class T, T... Ints >\n//   void user_function(integer_sequence<T, Ints...>);\n//\n//   int main()\n//   {\n//     // user_function's `T` will be deduced to `int` and `Ints...`\n//     // will be deduced to `0, 1, 2, 3, 4`.\n//     user_function(make_integer_sequence<int, 5>());\n//   }\ntemplate <typename T, T... Ints>\nstruct integer_sequence\n{\n    using value_type = T;\n    static constexpr std::size_t size() noexcept\n    {\n        return sizeof...(Ints);\n    }\n};\n\n// index_sequence\n//\n// A helper template for an `integer_sequence` of `size_t`,\n// `absl::index_sequence` is designed to be a drop-in replacement for C++14's\n// `std::index_sequence`.\ntemplate <size_t... Ints>\nusing index_sequence = integer_sequence<size_t, Ints...>;\n\nnamespace utility_internal\n{\n\ntemplate <typename Seq, size_t SeqSize, size_t Rem>\nstruct Extend;\n\n// Note that SeqSize == sizeof...(Ints). It's passed explicitly for efficiency.\ntemplate <typename T, T... Ints, size_t SeqSize>\nstruct Extend<integer_sequence<T, Ints...>, SeqSize, 0>\n{\n    using type = integer_sequence < T, Ints..., (Ints + SeqSize)... >;\n};\n\ntemplate <typename T, T... 
Ints, size_t SeqSize>\nstruct Extend<integer_sequence<T, Ints...>, SeqSize, 1>\n{\n    using type = integer_sequence < T, Ints..., (Ints + SeqSize)..., 2 * SeqSize >;\n};\n\n// Recursion helper for 'make_integer_sequence<T, N>'.\n// 'Gen<T, N>::type' is an alias for 'integer_sequence<T, 0, 1, ... N-1>'.\ntemplate <typename T, size_t N>\nstruct Gen\n{\n    using type =\n        typename Extend < typename Gen < T, N / 2 >::type, N / 2, N % 2 >::type;\n};\n\ntemplate <typename T>\nstruct Gen<T, 0>\n{\n    using type = integer_sequence<T>;\n};\n\n}  // namespace utility_internal\n\n// Compile-time sequences of integers\n\n// make_integer_sequence\n//\n// This template alias is equivalent to\n// `integer_sequence<int, 0, 1, ..., N-1>`, and is designed to be a drop-in\n// replacement for C++14's `std::make_integer_sequence`.\ntemplate <typename T, T N>\nusing make_integer_sequence = typename utility_internal::Gen<T, N>::type;\n\n// make_index_sequence\n//\n// This template alias is equivalent to `index_sequence<0, 1, ..., N-1>`,\n// and is designed to be a drop-in replacement for C++14's\n// `std::make_index_sequence`.\ntemplate <size_t N>\nusing make_index_sequence = make_integer_sequence<size_t, N>;\n\n// index_sequence_for\n//\n// Converts a typename pack into an index sequence of the same length, and\n// is designed to be a drop-in replacement for C++14's\n// `std::index_sequence_for()`\ntemplate <typename... 
Ts>\nusing index_sequence_for = make_index_sequence<sizeof...(Ts)>;\n\n//// END OF CODE FROM GOOGLE ABSEIL\n\n#endif\n\n// dispatch utility (taken from ranges-v3)\ntemplate<unsigned N> struct priority_tag : priority_tag < N - 1 > {};\ntemplate<> struct priority_tag<0> {};\n\n// taken from ranges-v3\ntemplate<typename T>\nstruct static_const\n{\n    static constexpr T value{};\n};\n\ntemplate<typename T>\nconstexpr T static_const<T>::value;\n\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/meta/type_traits.hpp>\n\n\n#include <limits> // numeric_limits\n#include <type_traits> // false_type, is_constructible, is_integral, is_same, true_type\n#include <utility> // declval\n#include <tuple> // tuple\n\n// #include <nlohmann/detail/iterators/iterator_traits.hpp>\n\n\n#include <iterator> // random_access_iterator_tag\n\n// #include <nlohmann/detail/meta/void_t.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\ntemplate<typename ...Ts> struct make_void\n{\n    using type = void;\n};\ntemplate<typename ...Ts> using void_t = typename make_void<Ts...>::type;\n} // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/meta/cpp_future.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\ntemplate<typename It, typename = void>\nstruct iterator_types {};\n\ntemplate<typename It>\nstruct iterator_types <\n    It,\n    void_t<typename It::difference_type, typename It::value_type, typename It::pointer,\n    typename It::reference, typename It::iterator_category >>\n{\n    using difference_type = typename It::difference_type;\n    using value_type = typename It::value_type;\n    using pointer = typename It::pointer;\n    using reference = typename It::reference;\n    using iterator_category = typename It::iterator_category;\n};\n\n// This is required as some compilers implement std::iterator_traits in a way that\n// doesn't work with SFINAE. 
See https://github.com/nlohmann/json/issues/1341.\ntemplate<typename T, typename = void>\nstruct iterator_traits\n{\n};\n\ntemplate<typename T>\nstruct iterator_traits < T, enable_if_t < !std::is_pointer<T>::value >>\n            : iterator_types<T>\n{\n};\n\ntemplate<typename T>\nstruct iterator_traits<T*, enable_if_t<std::is_object<T>::value>>\n{\n    using iterator_category = std::random_access_iterator_tag;\n    using value_type = T;\n    using difference_type = ptrdiff_t;\n    using pointer = T*;\n    using reference = T&;\n};\n} // namespace detail\n} // namespace nlohmann\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n// #include <nlohmann/detail/meta/cpp_future.hpp>\n\n// #include <nlohmann/detail/meta/detected.hpp>\n\n\n#include <type_traits>\n\n// #include <nlohmann/detail/meta/void_t.hpp>\n\n\n// https://en.cppreference.com/w/cpp/experimental/is_detected\nnamespace nlohmann\n{\nnamespace detail\n{\nstruct nonesuch\n{\n    nonesuch() = delete;\n    ~nonesuch() = delete;\n    nonesuch(nonesuch const&) = delete;\n    nonesuch(nonesuch const&&) = delete;\n    void operator=(nonesuch const&) = delete;\n    void operator=(nonesuch&&) = delete;\n};\n\ntemplate<class Default,\n         class AlwaysVoid,\n         template<class...> class Op,\n         class... Args>\nstruct detector\n{\n    using value_t = std::false_type;\n    using type = Default;\n};\n\ntemplate<class Default, template<class...> class Op, class... Args>\nstruct detector<Default, void_t<Op<Args...>>, Op, Args...>\n{\n    using value_t = std::true_type;\n    using type = Op<Args...>;\n};\n\ntemplate<template<class...> class Op, class... Args>\nusing is_detected = typename detector<nonesuch, void, Op, Args...>::value_t;\n\ntemplate<template<class...> class Op, class... Args>\nusing detected_t = typename detector<nonesuch, void, Op, Args...>::type;\n\ntemplate<class Default, template<class...> class Op, class... 
Args>\nusing detected_or = detector<Default, void, Op, Args...>;\n\ntemplate<class Default, template<class...> class Op, class... Args>\nusing detected_or_t = typename detected_or<Default, Op, Args...>::type;\n\ntemplate<class Expected, template<class...> class Op, class... Args>\nusing is_detected_exact = std::is_same<Expected, detected_t<Op, Args...>>;\n\ntemplate<class To, template<class...> class Op, class... Args>\nusing is_detected_convertible =\n    std::is_convertible<detected_t<Op, Args...>, To>;\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/json_fwd.hpp>\n#ifndef INCLUDE_NLOHMANN_JSON_FWD_HPP_\n#define INCLUDE_NLOHMANN_JSON_FWD_HPP_\n\n#include <cstdint> // int64_t, uint64_t\n#include <map> // map\n#include <memory> // allocator\n#include <string> // string\n#include <vector> // vector\n\n/*!\n@brief namespace for Niels Lohmann\n@see https://github.com/nlohmann\n@since version 1.0.0\n*/\nnamespace nlohmann\n{\n/*!\n@brief default JSONSerializer template argument\n\nThis serializer ignores the template arguments and uses ADL\n([argument-dependent lookup](https://en.cppreference.com/w/cpp/language/adl))\nfor serialization.\n*/\ntemplate<typename T = void, typename SFINAE = void>\nstruct adl_serializer;\n\ntemplate<template<typename U, typename V, typename... Args> class ObjectType =\n         std::map,\n         template<typename U, typename... 
Args> class ArrayType = std::vector,\n         class StringType = std::string, class BooleanType = bool,\n         class NumberIntegerType = std::int64_t,\n         class NumberUnsignedType = std::uint64_t,\n         class NumberFloatType = double,\n         template<typename U> class AllocatorType = std::allocator,\n         template<typename T, typename SFINAE = void> class JSONSerializer =\n         adl_serializer,\n         class BinaryType = std::vector<std::uint8_t>>\nclass basic_json;\n\n/*!\n@brief JSON Pointer\n\nA JSON pointer defines a string syntax for identifying a specific value\nwithin a JSON document. It can be used with functions `at` and\n`operator[]`. Furthermore, JSON pointers are the base for JSON patches.\n\n@sa [RFC 6901](https://tools.ietf.org/html/rfc6901)\n\n@since version 2.0.0\n*/\ntemplate<typename BasicJsonType>\nclass json_pointer;\n\n/*!\n@brief default JSON class\n\nThis type is the default specialization of the @ref basic_json class which\nuses the standard template types.\n\n@since version 1.0.0\n*/\nusing json = basic_json<>;\n\ntemplate<class Key, class T, class IgnoredLess, class Allocator>\nstruct ordered_map;\n\n/*!\n@brief ordered JSON class\n\nThis type preserves the insertion order of object keys.\n\n@since version 3.9.0\n*/\nusing ordered_json = basic_json<nlohmann::ordered_map>;\n\n}  // namespace nlohmann\n\n#endif  // INCLUDE_NLOHMANN_JSON_FWD_HPP_\n\n\nnamespace nlohmann\n{\n/*!\n@brief detail namespace with internal helper functions\n\nThis namespace collects functions that should not be exposed,\nimplementations of some @ref basic_json methods, and meta-programming helpers.\n\n@since version 2.1.0\n*/\nnamespace detail\n{\n/////////////\n// helpers //\n/////////////\n\n// Note to maintainers:\n//\n// Every trait in this file expects a non CV-qualified type.\n// The only exceptions are in the 'aliases for detected' section\n// (i.e. 
those of the form: decltype(T::member_function(std::declval<T>())))\n//\n// In this case, T has to be properly CV-qualified to constrain the function arguments\n// (e.g. to_json(BasicJsonType&, const T&))\n\ntemplate<typename> struct is_basic_json : std::false_type {};\n\nNLOHMANN_BASIC_JSON_TPL_DECLARATION\nstruct is_basic_json<NLOHMANN_BASIC_JSON_TPL> : std::true_type {};\n\n//////////////////////\n// json_ref helpers //\n//////////////////////\n\ntemplate<typename>\nclass json_ref;\n\ntemplate<typename>\nstruct is_json_ref : std::false_type {};\n\ntemplate<typename T>\nstruct is_json_ref<json_ref<T>> : std::true_type {};\n\n//////////////////////////\n// aliases for detected //\n//////////////////////////\n\ntemplate<typename T>\nusing mapped_type_t = typename T::mapped_type;\n\ntemplate<typename T>\nusing key_type_t = typename T::key_type;\n\ntemplate<typename T>\nusing value_type_t = typename T::value_type;\n\ntemplate<typename T>\nusing difference_type_t = typename T::difference_type;\n\ntemplate<typename T>\nusing pointer_t = typename T::pointer;\n\ntemplate<typename T>\nusing reference_t = typename T::reference;\n\ntemplate<typename T>\nusing iterator_category_t = typename T::iterator_category;\n\ntemplate<typename T>\nusing iterator_t = typename T::iterator;\n\ntemplate<typename T, typename... Args>\nusing to_json_function = decltype(T::to_json(std::declval<Args>()...));\n\ntemplate<typename T, typename... 
Args>\nusing from_json_function = decltype(T::from_json(std::declval<Args>()...));\n\ntemplate<typename T, typename U>\nusing get_template_function = decltype(std::declval<T>().template get<U>());\n\n// trait checking if JSONSerializer<T>::from_json(json const&, udt&) exists\ntemplate<typename BasicJsonType, typename T, typename = void>\nstruct has_from_json : std::false_type {};\n\n// trait checking if j.get<T> is valid\n// use this trait instead of std::is_constructible or std::is_convertible,\n// both rely on, or make use of implicit conversions, and thus fail when T\n// has several constructors/operator= (see https://github.com/nlohmann/json/issues/958)\ntemplate <typename BasicJsonType, typename T>\nstruct is_getable\n{\n    static constexpr bool value = is_detected<get_template_function, const BasicJsonType&, T>::value;\n};\n\ntemplate<typename BasicJsonType, typename T>\nstruct has_from_json < BasicJsonType, T,\n           enable_if_t < !is_basic_json<T>::value >>\n{\n    using serializer = typename BasicJsonType::template json_serializer<T, void>;\n\n    static constexpr bool value =\n        is_detected_exact<void, from_json_function, serializer,\n        const BasicJsonType&, T&>::value;\n};\n\n// This trait checks if JSONSerializer<T>::from_json(json const&) exists\n// this overload is used for non-default-constructible user-defined-types\ntemplate<typename BasicJsonType, typename T, typename = void>\nstruct has_non_default_from_json : std::false_type {};\n\ntemplate<typename BasicJsonType, typename T>\nstruct has_non_default_from_json < BasicJsonType, T, enable_if_t < !is_basic_json<T>::value >>\n{\n    using serializer = typename BasicJsonType::template json_serializer<T, void>;\n\n    static constexpr bool value =\n        is_detected_exact<T, from_json_function, serializer,\n        const BasicJsonType&>::value;\n};\n\n// This trait checks if BasicJsonType::json_serializer<T>::to_json exists\n// Do not evaluate the trait when T is a basic_json type, 
to avoid template instantiation infinite recursion.\ntemplate<typename BasicJsonType, typename T, typename = void>\nstruct has_to_json : std::false_type {};\n\ntemplate<typename BasicJsonType, typename T>\nstruct has_to_json < BasicJsonType, T, enable_if_t < !is_basic_json<T>::value >>\n{\n    using serializer = typename BasicJsonType::template json_serializer<T, void>;\n\n    static constexpr bool value =\n        is_detected_exact<void, to_json_function, serializer, BasicJsonType&,\n        T>::value;\n};\n\n\n///////////////////\n// is_ functions //\n///////////////////\n\ntemplate<typename T, typename = void>\nstruct is_iterator_traits : std::false_type {};\n\ntemplate<typename T>\nstruct is_iterator_traits<iterator_traits<T>>\n{\n  private:\n    using traits = iterator_traits<T>;\n\n  public:\n    static constexpr auto value =\n        is_detected<value_type_t, traits>::value &&\n        is_detected<difference_type_t, traits>::value &&\n        is_detected<pointer_t, traits>::value &&\n        is_detected<iterator_category_t, traits>::value &&\n        is_detected<reference_t, traits>::value;\n};\n\n// The following implementation of is_complete_type is taken from\n// https://blogs.msdn.microsoft.com/vcblog/2015/12/02/partial-support-for-expression-sfinae-in-vs-2015-update-1/\n// and is written by Xiang Fan who agreed to using it in this library.\n\ntemplate<typename T, typename = void>\nstruct is_complete_type : std::false_type {};\n\ntemplate<typename T>\nstruct is_complete_type<T, decltype(void(sizeof(T)))> : std::true_type {};\n\ntemplate<typename BasicJsonType, typename CompatibleObjectType,\n         typename = void>\nstruct is_compatible_object_type_impl : std::false_type {};\n\ntemplate<typename BasicJsonType, typename CompatibleObjectType>\nstruct is_compatible_object_type_impl <\n    BasicJsonType, CompatibleObjectType,\n    enable_if_t < is_detected<mapped_type_t, CompatibleObjectType>::value&&\n    is_detected<key_type_t, 
CompatibleObjectType>::value >>\n{\n    using object_t = typename BasicJsonType::object_t;\n\n    // macOS's is_constructible does not play well with nonesuch...\n    static constexpr bool value =\n        std::is_constructible<typename object_t::key_type,\n        typename CompatibleObjectType::key_type>::value &&\n        std::is_constructible<typename object_t::mapped_type,\n        typename CompatibleObjectType::mapped_type>::value;\n};\n\ntemplate<typename BasicJsonType, typename CompatibleObjectType>\nstruct is_compatible_object_type\n    : is_compatible_object_type_impl<BasicJsonType, CompatibleObjectType> {};\n\ntemplate<typename BasicJsonType, typename ConstructibleObjectType,\n         typename = void>\nstruct is_constructible_object_type_impl : std::false_type {};\n\ntemplate<typename BasicJsonType, typename ConstructibleObjectType>\nstruct is_constructible_object_type_impl <\n    BasicJsonType, ConstructibleObjectType,\n    enable_if_t < is_detected<mapped_type_t, ConstructibleObjectType>::value&&\n    is_detected<key_type_t, ConstructibleObjectType>::value >>\n{\n    using object_t = typename BasicJsonType::object_t;\n\n    static constexpr bool value =\n        (std::is_default_constructible<ConstructibleObjectType>::value &&\n         (std::is_move_assignable<ConstructibleObjectType>::value ||\n          std::is_copy_assignable<ConstructibleObjectType>::value) &&\n         (std::is_constructible<typename ConstructibleObjectType::key_type,\n          typename object_t::key_type>::value &&\n          std::is_same <\n          typename object_t::mapped_type,\n          typename ConstructibleObjectType::mapped_type >::value)) ||\n        (has_from_json<BasicJsonType,\n         typename ConstructibleObjectType::mapped_type>::value ||\n         has_non_default_from_json <\n         BasicJsonType,\n         typename ConstructibleObjectType::mapped_type >::value);\n};\n\ntemplate<typename BasicJsonType, typename ConstructibleObjectType>\nstruct 
is_constructible_object_type\n    : is_constructible_object_type_impl<BasicJsonType,\n      ConstructibleObjectType> {};\n\ntemplate<typename BasicJsonType, typename CompatibleStringType,\n         typename = void>\nstruct is_compatible_string_type_impl : std::false_type {};\n\ntemplate<typename BasicJsonType, typename CompatibleStringType>\nstruct is_compatible_string_type_impl <\n    BasicJsonType, CompatibleStringType,\n    enable_if_t<is_detected_exact<typename BasicJsonType::string_t::value_type,\n    value_type_t, CompatibleStringType>::value >>\n{\n    static constexpr auto value =\n        std::is_constructible<typename BasicJsonType::string_t, CompatibleStringType>::value;\n};\n\ntemplate<typename BasicJsonType, typename ConstructibleStringType>\nstruct is_compatible_string_type\n    : is_compatible_string_type_impl<BasicJsonType, ConstructibleStringType> {};\n\ntemplate<typename BasicJsonType, typename ConstructibleStringType,\n         typename = void>\nstruct is_constructible_string_type_impl : std::false_type {};\n\ntemplate<typename BasicJsonType, typename ConstructibleStringType>\nstruct is_constructible_string_type_impl <\n    BasicJsonType, ConstructibleStringType,\n    enable_if_t<is_detected_exact<typename BasicJsonType::string_t::value_type,\n    value_type_t, ConstructibleStringType>::value >>\n{\n    static constexpr auto value =\n        std::is_constructible<ConstructibleStringType,\n        typename BasicJsonType::string_t>::value;\n};\n\ntemplate<typename BasicJsonType, typename ConstructibleStringType>\nstruct is_constructible_string_type\n    : is_constructible_string_type_impl<BasicJsonType, ConstructibleStringType> {};\n\ntemplate<typename BasicJsonType, typename CompatibleArrayType, typename = void>\nstruct is_compatible_array_type_impl : std::false_type {};\n\ntemplate<typename BasicJsonType, typename CompatibleArrayType>\nstruct is_compatible_array_type_impl <\n    BasicJsonType, CompatibleArrayType,\n    enable_if_t < 
is_detected<value_type_t, CompatibleArrayType>::value&&\n    is_detected<iterator_t, CompatibleArrayType>::value&&\n// This is needed because json_reverse_iterator has a ::iterator type...\n// Therefore it is detected as a CompatibleArrayType.\n// The real fix would be to have an Iterable concept.\n    !is_iterator_traits <\n    iterator_traits<CompatibleArrayType >>::value >>\n{\n    static constexpr bool value =\n        std::is_constructible<BasicJsonType,\n        typename CompatibleArrayType::value_type>::value;\n};\n\ntemplate<typename BasicJsonType, typename CompatibleArrayType>\nstruct is_compatible_array_type\n    : is_compatible_array_type_impl<BasicJsonType, CompatibleArrayType> {};\n\ntemplate<typename BasicJsonType, typename ConstructibleArrayType, typename = void>\nstruct is_constructible_array_type_impl : std::false_type {};\n\ntemplate<typename BasicJsonType, typename ConstructibleArrayType>\nstruct is_constructible_array_type_impl <\n    BasicJsonType, ConstructibleArrayType,\n    enable_if_t<std::is_same<ConstructibleArrayType,\n    typename BasicJsonType::value_type>::value >>\n            : std::true_type {};\n\ntemplate<typename BasicJsonType, typename ConstructibleArrayType>\nstruct is_constructible_array_type_impl <\n    BasicJsonType, ConstructibleArrayType,\n    enable_if_t < !std::is_same<ConstructibleArrayType,\n    typename BasicJsonType::value_type>::value&&\n    std::is_default_constructible<ConstructibleArrayType>::value&&\n(std::is_move_assignable<ConstructibleArrayType>::value ||\n std::is_copy_assignable<ConstructibleArrayType>::value)&&\nis_detected<value_type_t, ConstructibleArrayType>::value&&\nis_detected<iterator_t, ConstructibleArrayType>::value&&\nis_complete_type <\ndetected_t<value_type_t, ConstructibleArrayType >>::value >>\n{\n    static constexpr bool value =\n        // This is needed because json_reverse_iterator has a ::iterator type,\n        // furthermore, std::back_insert_iterator (and other iterators) have a\n   
     // base class `iterator`... Therefore it is detected as a\n        // ConstructibleArrayType. The real fix would be to have an Iterable\n        // concept.\n        !is_iterator_traits<iterator_traits<ConstructibleArrayType>>::value &&\n\n        (std::is_same<typename ConstructibleArrayType::value_type,\n         typename BasicJsonType::array_t::value_type>::value ||\n         has_from_json<BasicJsonType,\n         typename ConstructibleArrayType::value_type>::value ||\n         has_non_default_from_json <\n         BasicJsonType, typename ConstructibleArrayType::value_type >::value);\n};\n\ntemplate<typename BasicJsonType, typename ConstructibleArrayType>\nstruct is_constructible_array_type\n    : is_constructible_array_type_impl<BasicJsonType, ConstructibleArrayType> {};\n\ntemplate<typename RealIntegerType, typename CompatibleNumberIntegerType,\n         typename = void>\nstruct is_compatible_integer_type_impl : std::false_type {};\n\ntemplate<typename RealIntegerType, typename CompatibleNumberIntegerType>\nstruct is_compatible_integer_type_impl <\n    RealIntegerType, CompatibleNumberIntegerType,\n    enable_if_t < std::is_integral<RealIntegerType>::value&&\n    std::is_integral<CompatibleNumberIntegerType>::value&&\n    !std::is_same<bool, CompatibleNumberIntegerType>::value >>\n{\n    // is there an assert somewhere on overflows?\n    using RealLimits = std::numeric_limits<RealIntegerType>;\n    using CompatibleLimits = std::numeric_limits<CompatibleNumberIntegerType>;\n\n    static constexpr auto value =\n        std::is_constructible<RealIntegerType,\n        CompatibleNumberIntegerType>::value &&\n        CompatibleLimits::is_integer &&\n        RealLimits::is_signed == CompatibleLimits::is_signed;\n};\n\ntemplate<typename RealIntegerType, typename CompatibleNumberIntegerType>\nstruct is_compatible_integer_type\n    : is_compatible_integer_type_impl<RealIntegerType,\n      CompatibleNumberIntegerType> {};\n\ntemplate<typename BasicJsonType, typename 
CompatibleType, typename = void>\nstruct is_compatible_type_impl: std::false_type {};\n\ntemplate<typename BasicJsonType, typename CompatibleType>\nstruct is_compatible_type_impl <\n    BasicJsonType, CompatibleType,\n    enable_if_t<is_complete_type<CompatibleType>::value >>\n{\n    static constexpr bool value =\n        has_to_json<BasicJsonType, CompatibleType>::value;\n};\n\ntemplate<typename BasicJsonType, typename CompatibleType>\nstruct is_compatible_type\n    : is_compatible_type_impl<BasicJsonType, CompatibleType> {};\n\n// https://en.cppreference.com/w/cpp/types/conjunction\ntemplate<class...> struct conjunction : std::true_type { };\ntemplate<class B1> struct conjunction<B1> : B1 { };\ntemplate<class B1, class... Bn>\nstruct conjunction<B1, Bn...>\n: std::conditional<bool(B1::value), conjunction<Bn...>, B1>::type {};\n\ntemplate<typename T1, typename T2>\nstruct is_constructible_tuple : std::false_type {};\n\ntemplate<typename T1, typename... Args>\nstruct is_constructible_tuple<T1, std::tuple<Args...>> : conjunction<std::is_constructible<T1, Args>...> {};\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/value_t.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\ntemplate<typename BasicJsonType>\nvoid from_json(const BasicJsonType& j, typename std::nullptr_t& n)\n{\n    if (JSON_HEDLEY_UNLIKELY(!j.is_null()))\n    {\n        JSON_THROW(type_error::create(302, \"type must be null, but is \" + std::string(j.type_name()), j));\n    }\n    n = nullptr;\n}\n\n// overloads for basic_json template parameters\ntemplate < typename BasicJsonType, typename ArithmeticType,\n           enable_if_t < std::is_arithmetic<ArithmeticType>::value&&\n                         !std::is_same<ArithmeticType, typename BasicJsonType::boolean_t>::value,\n                         int > = 0 >\nvoid get_arithmetic_value(const BasicJsonType& j, ArithmeticType& val)\n{\n    switch (static_cast<value_t>(j))\n    {\n        case 
value_t::number_unsigned:\n        {\n            val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_unsigned_t*>());\n            break;\n        }\n        case value_t::number_integer:\n        {\n            val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_integer_t*>());\n            break;\n        }\n        case value_t::number_float:\n        {\n            val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_float_t*>());\n            break;\n        }\n\n        default:\n            JSON_THROW(type_error::create(302, \"type must be number, but is \" + std::string(j.type_name()), j));\n    }\n}\n\ntemplate<typename BasicJsonType>\nvoid from_json(const BasicJsonType& j, typename BasicJsonType::boolean_t& b)\n{\n    if (JSON_HEDLEY_UNLIKELY(!j.is_boolean()))\n    {\n        JSON_THROW(type_error::create(302, \"type must be boolean, but is \" + std::string(j.type_name()), j));\n    }\n    b = *j.template get_ptr<const typename BasicJsonType::boolean_t*>();\n}\n\ntemplate<typename BasicJsonType>\nvoid from_json(const BasicJsonType& j, typename BasicJsonType::string_t& s)\n{\n    if (JSON_HEDLEY_UNLIKELY(!j.is_string()))\n    {\n        JSON_THROW(type_error::create(302, \"type must be string, but is \" + std::string(j.type_name()), j));\n    }\n    s = *j.template get_ptr<const typename BasicJsonType::string_t*>();\n}\n\ntemplate <\n    typename BasicJsonType, typename ConstructibleStringType,\n    enable_if_t <\n        is_constructible_string_type<BasicJsonType, ConstructibleStringType>::value&&\n        !std::is_same<typename BasicJsonType::string_t,\n                      ConstructibleStringType>::value,\n        int > = 0 >\nvoid from_json(const BasicJsonType& j, ConstructibleStringType& s)\n{\n    if (JSON_HEDLEY_UNLIKELY(!j.is_string()))\n    {\n        JSON_THROW(type_error::create(302, \"type must be string, but is \" + 
std::string(j.type_name()), j));\n    }\n\n    s = *j.template get_ptr<const typename BasicJsonType::string_t*>();\n}\n\ntemplate<typename BasicJsonType>\nvoid from_json(const BasicJsonType& j, typename BasicJsonType::number_float_t& val)\n{\n    get_arithmetic_value(j, val);\n}\n\ntemplate<typename BasicJsonType>\nvoid from_json(const BasicJsonType& j, typename BasicJsonType::number_unsigned_t& val)\n{\n    get_arithmetic_value(j, val);\n}\n\ntemplate<typename BasicJsonType>\nvoid from_json(const BasicJsonType& j, typename BasicJsonType::number_integer_t& val)\n{\n    get_arithmetic_value(j, val);\n}\n\ntemplate<typename BasicJsonType, typename EnumType,\n         enable_if_t<std::is_enum<EnumType>::value, int> = 0>\nvoid from_json(const BasicJsonType& j, EnumType& e)\n{\n    typename std::underlying_type<EnumType>::type val;\n    get_arithmetic_value(j, val);\n    e = static_cast<EnumType>(val);\n}\n\n// forward_list doesn't have an insert method\ntemplate<typename BasicJsonType, typename T, typename Allocator,\n         enable_if_t<is_getable<BasicJsonType, T>::value, int> = 0>\nvoid from_json(const BasicJsonType& j, std::forward_list<T, Allocator>& l)\n{\n    if (JSON_HEDLEY_UNLIKELY(!j.is_array()))\n    {\n        JSON_THROW(type_error::create(302, \"type must be array, but is \" + std::string(j.type_name()), j));\n    }\n    l.clear();\n    std::transform(j.rbegin(), j.rend(),\n                   std::front_inserter(l), [](const BasicJsonType & i)\n    {\n        return i.template get<T>();\n    });\n}\n\n// valarray doesn't have an insert method\ntemplate<typename BasicJsonType, typename T,\n         enable_if_t<is_getable<BasicJsonType, T>::value, int> = 0>\nvoid from_json(const BasicJsonType& j, std::valarray<T>& l)\n{\n    if (JSON_HEDLEY_UNLIKELY(!j.is_array()))\n    {\n        JSON_THROW(type_error::create(302, \"type must be array, but is \" + std::string(j.type_name()), j));\n    }\n    l.resize(j.size());\n    std::transform(j.begin(), j.end(), 
std::begin(l),\n                   [](const BasicJsonType & elem)\n    {\n        return elem.template get<T>();\n    });\n}\n\ntemplate<typename BasicJsonType, typename T, std::size_t N>\nauto from_json(const BasicJsonType& j, T (&arr)[N]) // NOLINT(cppcoreguidelines-avoid-c-arrays,hicpp-avoid-c-arrays,modernize-avoid-c-arrays)\n-> decltype(j.template get<T>(), void())\n{\n    for (std::size_t i = 0; i < N; ++i)\n    {\n        arr[i] = j.at(i).template get<T>();\n    }\n}\n\ntemplate<typename BasicJsonType>\nvoid from_json_array_impl(const BasicJsonType& j, typename BasicJsonType::array_t& arr, priority_tag<3> /*unused*/)\n{\n    arr = *j.template get_ptr<const typename BasicJsonType::array_t*>();\n}\n\ntemplate<typename BasicJsonType, typename T, std::size_t N>\nauto from_json_array_impl(const BasicJsonType& j, std::array<T, N>& arr,\n                          priority_tag<2> /*unused*/)\n-> decltype(j.template get<T>(), void())\n{\n    for (std::size_t i = 0; i < N; ++i)\n    {\n        arr[i] = j.at(i).template get<T>();\n    }\n}\n\ntemplate<typename BasicJsonType, typename ConstructibleArrayType>\nauto from_json_array_impl(const BasicJsonType& j, ConstructibleArrayType& arr, priority_tag<1> /*unused*/)\n-> decltype(\n    arr.reserve(std::declval<typename ConstructibleArrayType::size_type>()),\n    j.template get<typename ConstructibleArrayType::value_type>(),\n    void())\n{\n    using std::end;\n\n    ConstructibleArrayType ret;\n    ret.reserve(j.size());\n    std::transform(j.begin(), j.end(),\n                   std::inserter(ret, end(ret)), [](const BasicJsonType & i)\n    {\n        // get<BasicJsonType>() returns *this, this won't call a from_json\n        // method when value_type is BasicJsonType\n        return i.template get<typename ConstructibleArrayType::value_type>();\n    });\n    arr = std::move(ret);\n}\n\ntemplate<typename BasicJsonType, typename ConstructibleArrayType>\nvoid from_json_array_impl(const BasicJsonType& j, 
ConstructibleArrayType& arr,\n                          priority_tag<0> /*unused*/)\n{\n    using std::end;\n\n    ConstructibleArrayType ret;\n    std::transform(\n        j.begin(), j.end(), std::inserter(ret, end(ret)),\n        [](const BasicJsonType & i)\n    {\n        // get<BasicJsonType>() returns *this, this won't call a from_json\n        // method when value_type is BasicJsonType\n        return i.template get<typename ConstructibleArrayType::value_type>();\n    });\n    arr = std::move(ret);\n}\n\ntemplate < typename BasicJsonType, typename ConstructibleArrayType,\n           enable_if_t <\n               is_constructible_array_type<BasicJsonType, ConstructibleArrayType>::value&&\n               !is_constructible_object_type<BasicJsonType, ConstructibleArrayType>::value&&\n               !is_constructible_string_type<BasicJsonType, ConstructibleArrayType>::value&&\n               !std::is_same<ConstructibleArrayType, typename BasicJsonType::binary_t>::value&&\n               !is_basic_json<ConstructibleArrayType>::value,\n               int > = 0 >\nauto from_json(const BasicJsonType& j, ConstructibleArrayType& arr)\n-> decltype(from_json_array_impl(j, arr, priority_tag<3> {}),\nj.template get<typename ConstructibleArrayType::value_type>(),\nvoid())\n{\n    if (JSON_HEDLEY_UNLIKELY(!j.is_array()))\n    {\n        JSON_THROW(type_error::create(302, \"type must be array, but is \" + std::string(j.type_name()), j));\n    }\n\n    from_json_array_impl(j, arr, priority_tag<3> {});\n}\n\ntemplate<typename BasicJsonType>\nvoid from_json(const BasicJsonType& j, typename BasicJsonType::binary_t& bin)\n{\n    if (JSON_HEDLEY_UNLIKELY(!j.is_binary()))\n    {\n        JSON_THROW(type_error::create(302, \"type must be binary, but is \" + std::string(j.type_name()), j));\n    }\n\n    bin = *j.template get_ptr<const typename BasicJsonType::binary_t*>();\n}\n\ntemplate<typename BasicJsonType, typename ConstructibleObjectType,\n         
enable_if_t<is_constructible_object_type<BasicJsonType, ConstructibleObjectType>::value, int> = 0>\nvoid from_json(const BasicJsonType& j, ConstructibleObjectType& obj)\n{\n    if (JSON_HEDLEY_UNLIKELY(!j.is_object()))\n    {\n        JSON_THROW(type_error::create(302, \"type must be object, but is \" + std::string(j.type_name()), j));\n    }\n\n    ConstructibleObjectType ret;\n    const auto* inner_object = j.template get_ptr<const typename BasicJsonType::object_t*>();\n    using value_type = typename ConstructibleObjectType::value_type;\n    std::transform(\n        inner_object->begin(), inner_object->end(),\n        std::inserter(ret, ret.begin()),\n        [](typename BasicJsonType::object_t::value_type const & p)\n    {\n        return value_type(p.first, p.second.template get<typename ConstructibleObjectType::mapped_type>());\n    });\n    obj = std::move(ret);\n}\n\n// overload for arithmetic types, not chosen for basic_json template arguments\n// (BooleanType, etc..); note: Is it really necessary to provide explicit\n// overloads for boolean_t etc. 
in case of a custom BooleanType which is not\n// an arithmetic type?\ntemplate < typename BasicJsonType, typename ArithmeticType,\n           enable_if_t <\n               std::is_arithmetic<ArithmeticType>::value&&\n               !std::is_same<ArithmeticType, typename BasicJsonType::number_unsigned_t>::value&&\n               !std::is_same<ArithmeticType, typename BasicJsonType::number_integer_t>::value&&\n               !std::is_same<ArithmeticType, typename BasicJsonType::number_float_t>::value&&\n               !std::is_same<ArithmeticType, typename BasicJsonType::boolean_t>::value,\n               int > = 0 >\nvoid from_json(const BasicJsonType& j, ArithmeticType& val)\n{\n    switch (static_cast<value_t>(j))\n    {\n        case value_t::number_unsigned:\n        {\n            val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_unsigned_t*>());\n            break;\n        }\n        case value_t::number_integer:\n        {\n            val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_integer_t*>());\n            break;\n        }\n        case value_t::number_float:\n        {\n            val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::number_float_t*>());\n            break;\n        }\n        case value_t::boolean:\n        {\n            val = static_cast<ArithmeticType>(*j.template get_ptr<const typename BasicJsonType::boolean_t*>());\n            break;\n        }\n\n        default:\n            JSON_THROW(type_error::create(302, \"type must be number, but is \" + std::string(j.type_name()), j));\n    }\n}\n\ntemplate<typename BasicJsonType, typename A1, typename A2>\nvoid from_json(const BasicJsonType& j, std::pair<A1, A2>& p)\n{\n    p = {j.at(0).template get<A1>(), j.at(1).template get<A2>()};\n}\n\ntemplate<typename BasicJsonType, typename Tuple, std::size_t... 
Idx>\nvoid from_json_tuple_impl(const BasicJsonType& j, Tuple& t, index_sequence<Idx...> /*unused*/)\n{\n    t = std::make_tuple(j.at(Idx).template get<typename std::tuple_element<Idx, Tuple>::type>()...);\n}\n\ntemplate<typename BasicJsonType, typename... Args>\nvoid from_json(const BasicJsonType& j, std::tuple<Args...>& t)\n{\n    from_json_tuple_impl(j, t, index_sequence_for<Args...> {});\n}\n\ntemplate < typename BasicJsonType, typename Key, typename Value, typename Compare, typename Allocator,\n           typename = enable_if_t < !std::is_constructible <\n                                        typename BasicJsonType::string_t, Key >::value >>\nvoid from_json(const BasicJsonType& j, std::map<Key, Value, Compare, Allocator>& m)\n{\n    if (JSON_HEDLEY_UNLIKELY(!j.is_array()))\n    {\n        JSON_THROW(type_error::create(302, \"type must be array, but is \" + std::string(j.type_name()), j));\n    }\n    m.clear();\n    for (const auto& p : j)\n    {\n        if (JSON_HEDLEY_UNLIKELY(!p.is_array()))\n        {\n            JSON_THROW(type_error::create(302, \"type must be array, but is \" + std::string(p.type_name()), j));\n        }\n        m.emplace(p.at(0).template get<Key>(), p.at(1).template get<Value>());\n    }\n}\n\ntemplate < typename BasicJsonType, typename Key, typename Value, typename Hash, typename KeyEqual, typename Allocator,\n           typename = enable_if_t < !std::is_constructible <\n                                        typename BasicJsonType::string_t, Key >::value >>\nvoid from_json(const BasicJsonType& j, std::unordered_map<Key, Value, Hash, KeyEqual, Allocator>& m)\n{\n    if (JSON_HEDLEY_UNLIKELY(!j.is_array()))\n    {\n        JSON_THROW(type_error::create(302, \"type must be array, but is \" + std::string(j.type_name()), j));\n    }\n    m.clear();\n    for (const auto& p : j)\n    {\n        if (JSON_HEDLEY_UNLIKELY(!p.is_array()))\n        {\n            JSON_THROW(type_error::create(302, \"type must be array, but is \" + 
std::string(p.type_name()), j));\n        }\n        m.emplace(p.at(0).template get<Key>(), p.at(1).template get<Value>());\n    }\n}\n\nstruct from_json_fn\n{\n    template<typename BasicJsonType, typename T>\n    auto operator()(const BasicJsonType& j, T& val) const\n    noexcept(noexcept(from_json(j, val)))\n    -> decltype(from_json(j, val), void())\n    {\n        return from_json(j, val);\n    }\n};\n}  // namespace detail\n\n/// namespace to hold default `from_json` function\n/// to see why this is required:\n/// http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4381.html\nnamespace // NOLINT(cert-dcl59-cpp,fuchsia-header-anon-namespaces,google-build-namespaces)\n{\nconstexpr const auto& from_json = detail::static_const<detail::from_json_fn>::value; // NOLINT(misc-definitions-in-headers)\n} // namespace\n} // namespace nlohmann\n\n// #include <nlohmann/detail/conversions/to_json.hpp>\n\n\n#include <algorithm> // copy\n#include <iterator> // begin, end\n#include <string> // string\n#include <tuple> // tuple, get\n#include <type_traits> // is_same, is_constructible, is_floating_point, is_enum, underlying_type\n#include <utility> // move, forward, declval, pair\n#include <valarray> // valarray\n#include <vector> // vector\n\n// #include <nlohmann/detail/iterators/iteration_proxy.hpp>\n\n\n#include <cstddef> // size_t\n#include <iterator> // input_iterator_tag\n#include <string> // string, to_string\n#include <tuple> // tuple_size, get, tuple_element\n#include <utility> // move\n\n// #include <nlohmann/detail/meta/type_traits.hpp>\n\n// #include <nlohmann/detail/value_t.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\ntemplate<typename string_type>\nvoid int_to_string( string_type& target, std::size_t value )\n{\n    // For ADL\n    using std::to_string;\n    target = to_string(value);\n}\ntemplate<typename IteratorType> class iteration_proxy_value\n{\n  public:\n    using difference_type = std::ptrdiff_t;\n    using value_type = 
iteration_proxy_value;\n    using pointer = value_type * ;\n    using reference = value_type & ;\n    using iterator_category = std::input_iterator_tag;\n    using string_type = typename std::remove_cv< typename std::remove_reference<decltype( std::declval<IteratorType>().key() ) >::type >::type;\n\n  private:\n    /// the iterator\n    IteratorType anchor;\n    /// an index for arrays (used to create key names)\n    std::size_t array_index = 0;\n    /// last stringified array index\n    mutable std::size_t array_index_last = 0;\n    /// a string representation of the array index\n    mutable string_type array_index_str = \"0\";\n    /// an empty string (to return a reference for primitive values)\n    const string_type empty_str{};\n\n  public:\n    explicit iteration_proxy_value(IteratorType it) noexcept\n        : anchor(std::move(it))\n    {}\n\n    /// dereference operator (needed for range-based for)\n    iteration_proxy_value& operator*()\n    {\n        return *this;\n    }\n\n    /// increment operator (needed for range-based for)\n    iteration_proxy_value& operator++()\n    {\n        ++anchor;\n        ++array_index;\n\n        return *this;\n    }\n\n    /// equality operator (needed for InputIterator)\n    bool operator==(const iteration_proxy_value& o) const\n    {\n        return anchor == o.anchor;\n    }\n\n    /// inequality operator (needed for range-based for)\n    bool operator!=(const iteration_proxy_value& o) const\n    {\n        return anchor != o.anchor;\n    }\n\n    /// return key of the iterator\n    const string_type& key() const\n    {\n        JSON_ASSERT(anchor.m_object != nullptr);\n\n        switch (anchor.m_object->type())\n        {\n            // use integer array index as key\n            case value_t::array:\n            {\n                if (array_index != array_index_last)\n                {\n                    int_to_string( array_index_str, array_index );\n                    array_index_last = array_index;\n          
      }\n                return array_index_str;\n            }\n\n            // use key from the object\n            case value_t::object:\n                return anchor.key();\n\n            // use an empty key for all primitive types\n            default:\n                return empty_str;\n        }\n    }\n\n    /// return value of the iterator\n    typename IteratorType::reference value() const\n    {\n        return anchor.value();\n    }\n};\n\n/// proxy class for the items() function\ntemplate<typename IteratorType> class iteration_proxy\n{\n  private:\n    /// the container to iterate\n    typename IteratorType::reference container;\n\n  public:\n    /// construct iteration proxy from a container\n    explicit iteration_proxy(typename IteratorType::reference cont) noexcept\n        : container(cont) {}\n\n    /// return iterator begin (needed for range-based for)\n    iteration_proxy_value<IteratorType> begin() noexcept\n    {\n        return iteration_proxy_value<IteratorType>(container.begin());\n    }\n\n    /// return iterator end (needed for range-based for)\n    iteration_proxy_value<IteratorType> end() noexcept\n    {\n        return iteration_proxy_value<IteratorType>(container.end());\n    }\n};\n// Structured Bindings Support\n// For further reference see https://blog.tartanllama.xyz/structured-bindings/\n// And see https://github.com/nlohmann/json/pull/1391\ntemplate<std::size_t N, typename IteratorType, enable_if_t<N == 0, int> = 0>\nauto get(const nlohmann::detail::iteration_proxy_value<IteratorType>& i) -> decltype(i.key())\n{\n    return i.key();\n}\n// Structured Bindings Support\n// For further reference see https://blog.tartanllama.xyz/structured-bindings/\n// And see https://github.com/nlohmann/json/pull/1391\ntemplate<std::size_t N, typename IteratorType, enable_if_t<N == 1, int> = 0>\nauto get(const nlohmann::detail::iteration_proxy_value<IteratorType>& i) -> decltype(i.value())\n{\n    return i.value();\n}\n}  // namespace detail\n} 
 // namespace nlohmann\n\n// The Addition to the STD Namespace is required to add\n// Structured Bindings Support to the iteration_proxy_value class\n// For further reference see https://blog.tartanllama.xyz/structured-bindings/\n// And see https://github.com/nlohmann/json/pull/1391\nnamespace std\n{\n#if defined(__clang__)\n    // Fix: https://github.com/nlohmann/json/issues/1401\n    #pragma clang diagnostic push\n    #pragma clang diagnostic ignored \"-Wmismatched-tags\"\n#endif\ntemplate<typename IteratorType>\nclass tuple_size<::nlohmann::detail::iteration_proxy_value<IteratorType>>\n            : public std::integral_constant<std::size_t, 2> {};\n\ntemplate<std::size_t N, typename IteratorType>\nclass tuple_element<N, ::nlohmann::detail::iteration_proxy_value<IteratorType >>\n{\n  public:\n    using type = decltype(\n                     get<N>(std::declval <\n                            ::nlohmann::detail::iteration_proxy_value<IteratorType >> ()));\n};\n#if defined(__clang__)\n    #pragma clang diagnostic pop\n#endif\n} // namespace std\n\n// #include <nlohmann/detail/meta/cpp_future.hpp>\n\n// #include <nlohmann/detail/meta/type_traits.hpp>\n\n// #include <nlohmann/detail/value_t.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n//////////////////\n// constructors //\n//////////////////\n\ntemplate<value_t> struct external_constructor;\n\ntemplate<>\nstruct external_constructor<value_t::boolean>\n{\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, typename BasicJsonType::boolean_t b) noexcept\n    {\n        j.m_type = value_t::boolean;\n        j.m_value = b;\n        j.assert_invariant();\n    }\n};\n\ntemplate<>\nstruct external_constructor<value_t::string>\n{\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, const typename BasicJsonType::string_t& s)\n    {\n        j.m_type = value_t::string;\n        j.m_value = s;\n        j.assert_invariant();\n    }\n\n    template<typename 
BasicJsonType>\n    static void construct(BasicJsonType& j, typename BasicJsonType::string_t&& s)\n    {\n        j.m_type = value_t::string;\n        j.m_value = std::move(s);\n        j.assert_invariant();\n    }\n\n    template < typename BasicJsonType, typename CompatibleStringType,\n               enable_if_t < !std::is_same<CompatibleStringType, typename BasicJsonType::string_t>::value,\n                             int > = 0 >\n    static void construct(BasicJsonType& j, const CompatibleStringType& str)\n    {\n        j.m_type = value_t::string;\n        j.m_value.string = j.template create<typename BasicJsonType::string_t>(str);\n        j.assert_invariant();\n    }\n};\n\ntemplate<>\nstruct external_constructor<value_t::binary>\n{\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, const typename BasicJsonType::binary_t& b)\n    {\n        j.m_type = value_t::binary;\n        j.m_value = typename BasicJsonType::binary_t(b);\n        j.assert_invariant();\n    }\n\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, typename BasicJsonType::binary_t&& b)\n    {\n        j.m_type = value_t::binary;\n        j.m_value = typename BasicJsonType::binary_t(std::move(b));\n        j.assert_invariant();\n    }\n};\n\ntemplate<>\nstruct external_constructor<value_t::number_float>\n{\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, typename BasicJsonType::number_float_t val) noexcept\n    {\n        j.m_type = value_t::number_float;\n        j.m_value = val;\n        j.assert_invariant();\n    }\n};\n\ntemplate<>\nstruct external_constructor<value_t::number_unsigned>\n{\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, typename BasicJsonType::number_unsigned_t val) noexcept\n    {\n        j.m_type = value_t::number_unsigned;\n        j.m_value = val;\n        j.assert_invariant();\n    }\n};\n\ntemplate<>\nstruct 
external_constructor<value_t::number_integer>\n{\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, typename BasicJsonType::number_integer_t val) noexcept\n    {\n        j.m_type = value_t::number_integer;\n        j.m_value = val;\n        j.assert_invariant();\n    }\n};\n\ntemplate<>\nstruct external_constructor<value_t::array>\n{\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, const typename BasicJsonType::array_t& arr)\n    {\n        j.m_type = value_t::array;\n        j.m_value = arr;\n        j.set_parents();\n        j.assert_invariant();\n    }\n\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, typename BasicJsonType::array_t&& arr)\n    {\n        j.m_type = value_t::array;\n        j.m_value = std::move(arr);\n        j.set_parents();\n        j.assert_invariant();\n    }\n\n    template < typename BasicJsonType, typename CompatibleArrayType,\n               enable_if_t < !std::is_same<CompatibleArrayType, typename BasicJsonType::array_t>::value,\n                             int > = 0 >\n    static void construct(BasicJsonType& j, const CompatibleArrayType& arr)\n    {\n        using std::begin;\n        using std::end;\n        j.m_type = value_t::array;\n        j.m_value.array = j.template create<typename BasicJsonType::array_t>(begin(arr), end(arr));\n        j.set_parents();\n        j.assert_invariant();\n    }\n\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, const std::vector<bool>& arr)\n    {\n        j.m_type = value_t::array;\n        j.m_value = value_t::array;\n        j.m_value.array->reserve(arr.size());\n        for (const bool x : arr)\n        {\n            j.m_value.array->push_back(x);\n            j.set_parent(j.m_value.array->back());\n        }\n        j.assert_invariant();\n    }\n\n    template<typename BasicJsonType, typename T,\n             enable_if_t<std::is_convertible<T, 
BasicJsonType>::value, int> = 0>\n    static void construct(BasicJsonType& j, const std::valarray<T>& arr)\n    {\n        j.m_type = value_t::array;\n        j.m_value = value_t::array;\n        j.m_value.array->resize(arr.size());\n        if (arr.size() > 0)\n        {\n            std::copy(std::begin(arr), std::end(arr), j.m_value.array->begin());\n        }\n        j.set_parents();\n        j.assert_invariant();\n    }\n};\n\ntemplate<>\nstruct external_constructor<value_t::object>\n{\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, const typename BasicJsonType::object_t& obj)\n    {\n        j.m_type = value_t::object;\n        j.m_value = obj;\n        j.set_parents();\n        j.assert_invariant();\n    }\n\n    template<typename BasicJsonType>\n    static void construct(BasicJsonType& j, typename BasicJsonType::object_t&& obj)\n    {\n        j.m_type = value_t::object;\n        j.m_value = std::move(obj);\n        j.set_parents();\n        j.assert_invariant();\n    }\n\n    template < typename BasicJsonType, typename CompatibleObjectType,\n               enable_if_t < !std::is_same<CompatibleObjectType, typename BasicJsonType::object_t>::value, int > = 0 >\n    static void construct(BasicJsonType& j, const CompatibleObjectType& obj)\n    {\n        using std::begin;\n        using std::end;\n\n        j.m_type = value_t::object;\n        j.m_value.object = j.template create<typename BasicJsonType::object_t>(begin(obj), end(obj));\n        j.set_parents();\n        j.assert_invariant();\n    }\n};\n\n/////////////\n// to_json //\n/////////////\n\ntemplate<typename BasicJsonType, typename T,\n         enable_if_t<std::is_same<T, typename BasicJsonType::boolean_t>::value, int> = 0>\nvoid to_json(BasicJsonType& j, T b) noexcept\n{\n    external_constructor<value_t::boolean>::construct(j, b);\n}\n\ntemplate<typename BasicJsonType, typename CompatibleString,\n         enable_if_t<std::is_constructible<typename 
BasicJsonType::string_t, CompatibleString>::value, int> = 0>\nvoid to_json(BasicJsonType& j, const CompatibleString& s)\n{\n    external_constructor<value_t::string>::construct(j, s);\n}\n\ntemplate<typename BasicJsonType>\nvoid to_json(BasicJsonType& j, typename BasicJsonType::string_t&& s)\n{\n    external_constructor<value_t::string>::construct(j, std::move(s));\n}\n\ntemplate<typename BasicJsonType, typename FloatType,\n         enable_if_t<std::is_floating_point<FloatType>::value, int> = 0>\nvoid to_json(BasicJsonType& j, FloatType val) noexcept\n{\n    external_constructor<value_t::number_float>::construct(j, static_cast<typename BasicJsonType::number_float_t>(val));\n}\n\ntemplate<typename BasicJsonType, typename CompatibleNumberUnsignedType,\n         enable_if_t<is_compatible_integer_type<typename BasicJsonType::number_unsigned_t, CompatibleNumberUnsignedType>::value, int> = 0>\nvoid to_json(BasicJsonType& j, CompatibleNumberUnsignedType val) noexcept\n{\n    external_constructor<value_t::number_unsigned>::construct(j, static_cast<typename BasicJsonType::number_unsigned_t>(val));\n}\n\ntemplate<typename BasicJsonType, typename CompatibleNumberIntegerType,\n         enable_if_t<is_compatible_integer_type<typename BasicJsonType::number_integer_t, CompatibleNumberIntegerType>::value, int> = 0>\nvoid to_json(BasicJsonType& j, CompatibleNumberIntegerType val) noexcept\n{\n    external_constructor<value_t::number_integer>::construct(j, static_cast<typename BasicJsonType::number_integer_t>(val));\n}\n\ntemplate<typename BasicJsonType, typename EnumType,\n         enable_if_t<std::is_enum<EnumType>::value, int> = 0>\nvoid to_json(BasicJsonType& j, EnumType e) noexcept\n{\n    using underlying_type = typename std::underlying_type<EnumType>::type;\n    external_constructor<value_t::number_integer>::construct(j, static_cast<underlying_type>(e));\n}\n\ntemplate<typename BasicJsonType>\nvoid to_json(BasicJsonType& j, const std::vector<bool>& e)\n{\n    
external_constructor<value_t::array>::construct(j, e);\n}\n\ntemplate < typename BasicJsonType, typename CompatibleArrayType,\n           enable_if_t < is_compatible_array_type<BasicJsonType,\n                         CompatibleArrayType>::value&&\n                         !is_compatible_object_type<BasicJsonType, CompatibleArrayType>::value&&\n                         !is_compatible_string_type<BasicJsonType, CompatibleArrayType>::value&&\n                         !std::is_same<typename BasicJsonType::binary_t, CompatibleArrayType>::value&&\n                         !is_basic_json<CompatibleArrayType>::value,\n                         int > = 0 >\nvoid to_json(BasicJsonType& j, const CompatibleArrayType& arr)\n{\n    external_constructor<value_t::array>::construct(j, arr);\n}\n\ntemplate<typename BasicJsonType>\nvoid to_json(BasicJsonType& j, const typename BasicJsonType::binary_t& bin)\n{\n    external_constructor<value_t::binary>::construct(j, bin);\n}\n\ntemplate<typename BasicJsonType, typename T,\n         enable_if_t<std::is_convertible<T, BasicJsonType>::value, int> = 0>\nvoid to_json(BasicJsonType& j, const std::valarray<T>& arr)\n{\n    external_constructor<value_t::array>::construct(j, std::move(arr));\n}\n\ntemplate<typename BasicJsonType>\nvoid to_json(BasicJsonType& j, typename BasicJsonType::array_t&& arr)\n{\n    external_constructor<value_t::array>::construct(j, std::move(arr));\n}\n\ntemplate < typename BasicJsonType, typename CompatibleObjectType,\n           enable_if_t < is_compatible_object_type<BasicJsonType, CompatibleObjectType>::value&& !is_basic_json<CompatibleObjectType>::value, int > = 0 >\nvoid to_json(BasicJsonType& j, const CompatibleObjectType& obj)\n{\n    external_constructor<value_t::object>::construct(j, obj);\n}\n\ntemplate<typename BasicJsonType>\nvoid to_json(BasicJsonType& j, typename BasicJsonType::object_t&& obj)\n{\n    external_constructor<value_t::object>::construct(j, std::move(obj));\n}\n\ntemplate <\n    typename 
BasicJsonType, typename T, std::size_t N,\n    enable_if_t < !std::is_constructible<typename BasicJsonType::string_t,\n                  const T(&)[N]>::value, // NOLINT(cppcoreguidelines-avoid-c-arrays,hicpp-avoid-c-arrays,modernize-avoid-c-arrays)\n                  int > = 0 >\nvoid to_json(BasicJsonType& j, const T(&arr)[N]) // NOLINT(cppcoreguidelines-avoid-c-arrays,hicpp-avoid-c-arrays,modernize-avoid-c-arrays)\n{\n    external_constructor<value_t::array>::construct(j, arr);\n}\n\ntemplate < typename BasicJsonType, typename T1, typename T2, enable_if_t < std::is_constructible<BasicJsonType, T1>::value&& std::is_constructible<BasicJsonType, T2>::value, int > = 0 >\nvoid to_json(BasicJsonType& j, const std::pair<T1, T2>& p)\n{\n    j = { p.first, p.second };\n}\n\n// for https://github.com/nlohmann/json/pull/1134\ntemplate<typename BasicJsonType, typename T,\n         enable_if_t<std::is_same<T, iteration_proxy_value<typename BasicJsonType::iterator>>::value, int> = 0>\nvoid to_json(BasicJsonType& j, const T& b)\n{\n    j = { {b.key(), b.value()} };\n}\n\ntemplate<typename BasicJsonType, typename Tuple, std::size_t... Idx>\nvoid to_json_tuple_impl(BasicJsonType& j, const Tuple& t, index_sequence<Idx...> /*unused*/)\n{\n    j = { std::get<Idx>(t)... 
};\n}\n\ntemplate<typename BasicJsonType, typename T, enable_if_t<is_constructible_tuple<BasicJsonType, T>::value, int > = 0>\nvoid to_json(BasicJsonType& j, const T& t)\n{\n    to_json_tuple_impl(j, t, make_index_sequence<std::tuple_size<T>::value> {});\n}\n\nstruct to_json_fn\n{\n    template<typename BasicJsonType, typename T>\n    auto operator()(BasicJsonType& j, T&& val) const noexcept(noexcept(to_json(j, std::forward<T>(val))))\n    -> decltype(to_json(j, std::forward<T>(val)), void())\n    {\n        return to_json(j, std::forward<T>(val));\n    }\n};\n}  // namespace detail\n\n/// namespace to hold default `to_json` function\n/// to see why this is required:\n/// http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4381.html\nnamespace // NOLINT(cert-dcl59-cpp,fuchsia-header-anon-namespaces,google-build-namespaces)\n{\nconstexpr const auto& to_json = detail::static_const<detail::to_json_fn>::value; // NOLINT(misc-definitions-in-headers)\n} // namespace\n} // namespace nlohmann\n\n\nnamespace nlohmann\n{\n\ntemplate<typename, typename>\nstruct adl_serializer\n{\n    /*!\n    @brief convert a JSON value to any value type\n\n    This function is usually called by the `get()` function of the\n    @ref basic_json class (either explicit or via conversion operators).\n\n    @param[in] j        JSON value to read from\n    @param[in,out] val  value to write to\n    */\n    template<typename BasicJsonType, typename ValueType>\n    static auto from_json(BasicJsonType&& j, ValueType& val) noexcept(\n        noexcept(::nlohmann::from_json(std::forward<BasicJsonType>(j), val)))\n    -> decltype(::nlohmann::from_json(std::forward<BasicJsonType>(j), val), void())\n    {\n        ::nlohmann::from_json(std::forward<BasicJsonType>(j), val);\n    }\n\n    /*!\n    @brief convert any value type to a JSON value\n\n    This function is usually called by the constructors of the @ref basic_json\n    class.\n\n    @param[in,out] j  JSON value to write to\n    @param[in] val    
value to read from\n    */\n    template<typename BasicJsonType, typename ValueType>\n    static auto to_json(BasicJsonType& j, ValueType&& val) noexcept(\n        noexcept(::nlohmann::to_json(j, std::forward<ValueType>(val))))\n    -> decltype(::nlohmann::to_json(j, std::forward<ValueType>(val)), void())\n    {\n        ::nlohmann::to_json(j, std::forward<ValueType>(val));\n    }\n};\n\n}  // namespace nlohmann\n\n// #include <nlohmann/byte_container_with_subtype.hpp>\n\n\n#include <cstdint> // uint8_t\n#include <tuple> // tie\n#include <utility> // move\n\nnamespace nlohmann\n{\n\n/*!\n@brief an internal type for a backed binary type\n\nThis type extends the template parameter @a BinaryType provided to `basic_json`\nwith a subtype used by BSON and MessagePack. This type exists so that the user\ndoes not have to specify a type themselves with a specific naming scheme in\norder to override the binary type.\n\n@tparam BinaryType container to store bytes (`std::vector<std::uint8_t>` by\n                   default)\n\n@since version 3.8.0\n*/\ntemplate<typename BinaryType>\nclass byte_container_with_subtype : public BinaryType\n{\n  public:\n    /// the type of the underlying container\n    using container_type = BinaryType;\n\n    byte_container_with_subtype() noexcept(noexcept(container_type()))\n        : container_type()\n    {}\n\n    byte_container_with_subtype(const container_type& b) noexcept(noexcept(container_type(b)))\n        : container_type(b)\n    {}\n\n    byte_container_with_subtype(container_type&& b) noexcept(noexcept(container_type(std::move(b))))\n        : container_type(std::move(b))\n    {}\n\n    byte_container_with_subtype(const container_type& b, std::uint8_t subtype_) noexcept(noexcept(container_type(b)))\n        : container_type(b)\n        , m_subtype(subtype_)\n        , m_has_subtype(true)\n    {}\n\n    byte_container_with_subtype(container_type&& b, std::uint8_t subtype_) noexcept(noexcept(container_type(std::move(b))))\n        : 
container_type(std::move(b))\n        , m_subtype(subtype_)\n        , m_has_subtype(true)\n    {}\n\n    bool operator==(const byte_container_with_subtype& rhs) const\n    {\n        return std::tie(static_cast<const BinaryType&>(*this), m_subtype, m_has_subtype) ==\n               std::tie(static_cast<const BinaryType&>(rhs), rhs.m_subtype, rhs.m_has_subtype);\n    }\n\n    bool operator!=(const byte_container_with_subtype& rhs) const\n    {\n        return !(rhs == *this);\n    }\n\n    /*!\n    @brief sets the binary subtype\n\n    Sets the binary subtype of the value, also flags a binary JSON value as\n    having a subtype, which has implications for serialization.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @sa see @ref subtype() -- return the binary subtype\n    @sa see @ref clear_subtype() -- clears the binary subtype\n    @sa see @ref has_subtype() -- returns whether or not the binary value has a\n    subtype\n\n    @since version 3.8.0\n    */\n    void set_subtype(std::uint8_t subtype_) noexcept\n    {\n        m_subtype = subtype_;\n        m_has_subtype = true;\n    }\n\n    /*!\n    @brief return the binary subtype\n\n    Returns the numerical subtype of the value if it has a subtype. 
If it does\n    not have a subtype, this function returns 0; use has_subtype() to\n    distinguish a missing subtype from an explicit subtype of 0.\n\n    @return the numerical subtype of the binary value\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @sa see @ref set_subtype() -- sets the binary subtype\n    @sa see @ref clear_subtype() -- clears the binary subtype\n    @sa see @ref has_subtype() -- returns whether or not the binary value has a\n    subtype\n\n    @since version 3.8.0\n    */\n    constexpr std::uint8_t subtype() const noexcept\n    {\n        return m_subtype;\n    }\n\n    /*!\n    @brief return whether the value has a subtype\n\n    @return whether the value has a subtype\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @sa see @ref subtype() -- return the binary subtype\n    @sa see @ref set_subtype() -- sets the binary subtype\n    @sa see @ref clear_subtype() -- clears the binary subtype\n\n    @since version 3.8.0\n    */\n    constexpr bool has_subtype() const noexcept\n    {\n        return m_has_subtype;\n    }\n\n    /*!\n    @brief clears the binary subtype\n\n    Clears the binary subtype and flags the value as not having a subtype, which\n    has implications for serialization; for instance MessagePack will prefer the\n    bin family over the ext family.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @sa see @ref subtype() -- return the binary subtype\n    @sa see @ref set_subtype() -- sets the binary subtype\n    @sa see @ref has_subtype() -- returns whether or not the binary value has a\n    subtype\n\n    @since version 3.8.0\n    */\n    void clear_subtype() noexcept\n    {\n        m_subtype = 0;\n        m_has_subtype = false;\n    }\n\n  private:\n    std::uint8_t m_subtype = 0;\n    bool m_has_subtype = false;\n};\n\n}  // 
namespace nlohmann\n\n// #include <nlohmann/detail/conversions/from_json.hpp>\n\n// #include <nlohmann/detail/conversions/to_json.hpp>\n\n// #include <nlohmann/detail/exceptions.hpp>\n\n// #include <nlohmann/detail/hash.hpp>\n\n\n#include <cstdint> // uint8_t\n#include <cstddef> // size_t\n#include <functional> // hash\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n\n// boost::hash_combine\ninline std::size_t combine(std::size_t seed, std::size_t h) noexcept\n{\n    seed ^= h + 0x9e3779b9 + (seed << 6U) + (seed >> 2U);\n    return seed;\n}\n\n/*!\n@brief hash a JSON value\n\nThe hash function tries to rely on std::hash where possible. Furthermore, the\ntype of the JSON value is taken into account to have different hash values for\nnull, 0, 0U, and false, etc.\n\n@tparam BasicJsonType basic_json specialization\n@param j JSON value to hash\n@return hash value of j\n*/\ntemplate<typename BasicJsonType>\nstd::size_t hash(const BasicJsonType& j)\n{\n    using string_t = typename BasicJsonType::string_t;\n    using number_integer_t = typename BasicJsonType::number_integer_t;\n    using number_unsigned_t = typename BasicJsonType::number_unsigned_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n\n    const auto type = static_cast<std::size_t>(j.type());\n    switch (j.type())\n    {\n        case BasicJsonType::value_t::null:\n        case BasicJsonType::value_t::discarded:\n        {\n            return combine(type, 0);\n        }\n\n        case BasicJsonType::value_t::object:\n        {\n            auto seed = combine(type, j.size());\n            for (const auto& element : j.items())\n            {\n                const auto h = std::hash<string_t> {}(element.key());\n                seed = combine(seed, h);\n                seed = combine(seed, hash(element.value()));\n            }\n            return seed;\n        }\n\n        case BasicJsonType::value_t::array:\n        {\n            
auto seed = combine(type, j.size());\n            for (const auto& element : j)\n            {\n                seed = combine(seed, hash(element));\n            }\n            return seed;\n        }\n\n        case BasicJsonType::value_t::string:\n        {\n            const auto h = std::hash<string_t> {}(j.template get_ref<const string_t&>());\n            return combine(type, h);\n        }\n\n        case BasicJsonType::value_t::boolean:\n        {\n            const auto h = std::hash<bool> {}(j.template get<bool>());\n            return combine(type, h);\n        }\n\n        case BasicJsonType::value_t::number_integer:\n        {\n            const auto h = std::hash<number_integer_t> {}(j.template get<number_integer_t>());\n            return combine(type, h);\n        }\n\n        case BasicJsonType::value_t::number_unsigned:\n        {\n            const auto h = std::hash<number_unsigned_t> {}(j.template get<number_unsigned_t>());\n            return combine(type, h);\n        }\n\n        case BasicJsonType::value_t::number_float:\n        {\n            const auto h = std::hash<number_float_t> {}(j.template get<number_float_t>());\n            return combine(type, h);\n        }\n\n        case BasicJsonType::value_t::binary:\n        {\n            auto seed = combine(type, j.get_binary().size());\n            const auto h = std::hash<bool> {}(j.get_binary().has_subtype());\n            seed = combine(seed, h);\n            seed = combine(seed, j.get_binary().subtype());\n            for (const auto byte : j.get_binary())\n            {\n                seed = combine(seed, std::hash<std::uint8_t> {}(byte));\n            }\n            return seed;\n        }\n\n        default:                   // LCOV_EXCL_LINE\n            JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert) LCOV_EXCL_LINE\n            return 0;              // LCOV_EXCL_LINE\n    }\n}\n\n}  // namespace detail\n}  // namespace nlohmann\n\n// 
#include <nlohmann/detail/input/binary_reader.hpp>\n\n\n#include <algorithm> // generate_n\n#include <array> // array\n#include <cmath> // ldexp\n#include <cstddef> // size_t\n#include <cstdint> // uint8_t, uint16_t, uint32_t, uint64_t\n#include <cstdio> // snprintf\n#include <cstring> // memcpy\n#include <iterator> // back_inserter\n#include <limits> // numeric_limits\n#include <string> // char_traits, string\n#include <utility> // make_pair, move\n#include <vector> // vector\n\n// #include <nlohmann/detail/exceptions.hpp>\n\n// #include <nlohmann/detail/input/input_adapters.hpp>\n\n\n#include <array> // array\n#include <cstddef> // size_t\n#include <cstdio> //FILE *\n#include <cstring> // strlen\n#include <istream> // istream\n#include <iterator> // begin, end, iterator_traits, random_access_iterator_tag, distance, next\n#include <memory> // shared_ptr, make_shared, addressof\n#include <numeric> // accumulate\n#include <string> // string, char_traits\n#include <type_traits> // enable_if, is_base_of, is_pointer, is_integral, remove_pointer\n#include <utility> // pair, declval\n\n// #include <nlohmann/detail/iterators/iterator_traits.hpp>\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n/// the supported input formats\nenum class input_format_t { json, cbor, msgpack, ubjson, bson };\n\n////////////////////\n// input adapters //\n////////////////////\n\n/*!\nInput adapter for stdio file access. This adapter reads only 1 byte at a time and does not use any\n buffer. 
This adapter is a very low level adapter.\n*/\nclass file_input_adapter\n{\n  public:\n    using char_type = char;\n\n    JSON_HEDLEY_NON_NULL(2)\n    explicit file_input_adapter(std::FILE* f) noexcept\n        : m_file(f)\n    {}\n\n    // make class move-only\n    file_input_adapter(const file_input_adapter&) = delete;\n    file_input_adapter(file_input_adapter&&) noexcept = default;\n    file_input_adapter& operator=(const file_input_adapter&) = delete;\n    file_input_adapter& operator=(file_input_adapter&&) = delete;\n    ~file_input_adapter() = default;\n\n    std::char_traits<char>::int_type get_character() noexcept\n    {\n        return std::fgetc(m_file);\n    }\n\n  private:\n    /// the file pointer to read from\n    std::FILE* m_file;\n};\n\n\n/*!\nInput adapter for a (caching) istream. Ignores a UTF Byte Order Mark at\nthe beginning of input. Does not support changing the underlying std::streambuf\nin mid-input. Maintains underlying std::istream and std::streambuf to support\nsubsequent use of standard std::istream operations to process any input\ncharacters following those used in parsing the JSON input.  
Clears the\nstd::istream flags; any input errors (e.g., EOF) will be detected by the first\nsubsequent call for input from the std::istream.\n*/\nclass input_stream_adapter\n{\n  public:\n    using char_type = char;\n\n    ~input_stream_adapter()\n    {\n        // clear stream flags; we use underlying streambuf I/O, do not\n        // maintain ifstream flags, except eof\n        if (is != nullptr)\n        {\n            is->clear(is->rdstate() & std::ios::eofbit);\n        }\n    }\n\n    explicit input_stream_adapter(std::istream& i)\n        : is(&i), sb(i.rdbuf())\n    {}\n\n    // delete because of pointer members\n    input_stream_adapter(const input_stream_adapter&) = delete;\n    input_stream_adapter& operator=(input_stream_adapter&) = delete;\n    input_stream_adapter& operator=(input_stream_adapter&&) = delete;\n\n    input_stream_adapter(input_stream_adapter&& rhs) noexcept\n        : is(rhs.is), sb(rhs.sb)\n    {\n        rhs.is = nullptr;\n        rhs.sb = nullptr;\n    }\n\n    // std::istream/std::streambuf use std::char_traits<char>::to_int_type, to\n    // ensure that std::char_traits<char>::eof() and the character 0xFF do not\n    // end up as the same value, eg. 0xFFFFFFFF.\n    std::char_traits<char>::int_type get_character()\n    {\n        auto res = sb->sbumpc();\n        // set eof manually, as we don't use the istream interface.\n        if (JSON_HEDLEY_UNLIKELY(res == EOF))\n        {\n            is->clear(is->rdstate() | std::ios::eofbit);\n        }\n        return res;\n    }\n\n  private:\n    /// the associated input stream\n    std::istream* is = nullptr;\n    std::streambuf* sb = nullptr;\n};\n\n// General-purpose iterator-based adapter. 
It might not be as fast as\n// theoretically possible for some containers, but it is extremely versatile.\ntemplate<typename IteratorType>\nclass iterator_input_adapter\n{\n  public:\n    using char_type = typename std::iterator_traits<IteratorType>::value_type;\n\n    iterator_input_adapter(IteratorType first, IteratorType last)\n        : current(std::move(first)), end(std::move(last))\n    {}\n\n    typename std::char_traits<char_type>::int_type get_character()\n    {\n        if (JSON_HEDLEY_LIKELY(current != end))\n        {\n            auto result = std::char_traits<char_type>::to_int_type(*current);\n            std::advance(current, 1);\n            return result;\n        }\n\n        return std::char_traits<char_type>::eof();\n    }\n\n  private:\n    IteratorType current;\n    IteratorType end;\n\n    template<typename BaseInputAdapter, size_t T>\n    friend struct wide_string_input_helper;\n\n    bool empty() const\n    {\n        return current == end;\n    }\n};\n\n\ntemplate<typename BaseInputAdapter, size_t T>\nstruct wide_string_input_helper;\n\ntemplate<typename BaseInputAdapter>\nstruct wide_string_input_helper<BaseInputAdapter, 4>\n{\n    // UTF-32\n    static void fill_buffer(BaseInputAdapter& input,\n                            std::array<std::char_traits<char>::int_type, 4>& utf8_bytes,\n                            size_t& utf8_bytes_index,\n                            size_t& utf8_bytes_filled)\n    {\n        utf8_bytes_index = 0;\n\n        if (JSON_HEDLEY_UNLIKELY(input.empty()))\n        {\n            utf8_bytes[0] = std::char_traits<char>::eof();\n            utf8_bytes_filled = 1;\n        }\n        else\n        {\n            // get the current character\n            const auto wc = input.get_character();\n\n            // UTF-32 to UTF-8 encoding\n            if (wc < 0x80)\n            {\n                utf8_bytes[0] = static_cast<std::char_traits<char>::int_type>(wc);\n                utf8_bytes_filled = 1;\n            }\n    
        else if (wc <= 0x7FF)\n            {\n                utf8_bytes[0] = static_cast<std::char_traits<char>::int_type>(0xC0u | ((static_cast<unsigned int>(wc) >> 6u) & 0x1Fu));\n                utf8_bytes[1] = static_cast<std::char_traits<char>::int_type>(0x80u | (static_cast<unsigned int>(wc) & 0x3Fu));\n                utf8_bytes_filled = 2;\n            }\n            else if (wc <= 0xFFFF)\n            {\n                utf8_bytes[0] = static_cast<std::char_traits<char>::int_type>(0xE0u | ((static_cast<unsigned int>(wc) >> 12u) & 0x0Fu));\n                utf8_bytes[1] = static_cast<std::char_traits<char>::int_type>(0x80u | ((static_cast<unsigned int>(wc) >> 6u) & 0x3Fu));\n                utf8_bytes[2] = static_cast<std::char_traits<char>::int_type>(0x80u | (static_cast<unsigned int>(wc) & 0x3Fu));\n                utf8_bytes_filled = 3;\n            }\n            else if (wc <= 0x10FFFF)\n            {\n                utf8_bytes[0] = static_cast<std::char_traits<char>::int_type>(0xF0u | ((static_cast<unsigned int>(wc) >> 18u) & 0x07u));\n                utf8_bytes[1] = static_cast<std::char_traits<char>::int_type>(0x80u | ((static_cast<unsigned int>(wc) >> 12u) & 0x3Fu));\n                utf8_bytes[2] = static_cast<std::char_traits<char>::int_type>(0x80u | ((static_cast<unsigned int>(wc) >> 6u) & 0x3Fu));\n                utf8_bytes[3] = static_cast<std::char_traits<char>::int_type>(0x80u | (static_cast<unsigned int>(wc) & 0x3Fu));\n                utf8_bytes_filled = 4;\n            }\n            else\n            {\n                // unknown character\n                utf8_bytes[0] = static_cast<std::char_traits<char>::int_type>(wc);\n                utf8_bytes_filled = 1;\n            }\n        }\n    }\n};\n\ntemplate<typename BaseInputAdapter>\nstruct wide_string_input_helper<BaseInputAdapter, 2>\n{\n    // UTF-16\n    static void fill_buffer(BaseInputAdapter& input,\n                            std::array<std::char_traits<char>::int_type, 
4>& utf8_bytes,\n                            size_t& utf8_bytes_index,\n                            size_t& utf8_bytes_filled)\n    {\n        utf8_bytes_index = 0;\n\n        if (JSON_HEDLEY_UNLIKELY(input.empty()))\n        {\n            utf8_bytes[0] = std::char_traits<char>::eof();\n            utf8_bytes_filled = 1;\n        }\n        else\n        {\n            // get the current character\n            const auto wc = input.get_character();\n\n            // UTF-16 to UTF-8 encoding\n            if (wc < 0x80)\n            {\n                utf8_bytes[0] = static_cast<std::char_traits<char>::int_type>(wc);\n                utf8_bytes_filled = 1;\n            }\n            else if (wc <= 0x7FF)\n            {\n                utf8_bytes[0] = static_cast<std::char_traits<char>::int_type>(0xC0u | ((static_cast<unsigned int>(wc) >> 6u)));\n                utf8_bytes[1] = static_cast<std::char_traits<char>::int_type>(0x80u | (static_cast<unsigned int>(wc) & 0x3Fu));\n                utf8_bytes_filled = 2;\n            }\n            else if (0xD800 > wc || wc >= 0xE000)\n            {\n                utf8_bytes[0] = static_cast<std::char_traits<char>::int_type>(0xE0u | ((static_cast<unsigned int>(wc) >> 12u)));\n                utf8_bytes[1] = static_cast<std::char_traits<char>::int_type>(0x80u | ((static_cast<unsigned int>(wc) >> 6u) & 0x3Fu));\n                utf8_bytes[2] = static_cast<std::char_traits<char>::int_type>(0x80u | (static_cast<unsigned int>(wc) & 0x3Fu));\n                utf8_bytes_filled = 3;\n            }\n            else\n            {\n                if (JSON_HEDLEY_UNLIKELY(!input.empty()))\n                {\n                    const auto wc2 = static_cast<unsigned int>(input.get_character());\n                    const auto charcode = 0x10000u + (((static_cast<unsigned int>(wc) & 0x3FFu) << 10u) | (wc2 & 0x3FFu));\n                    utf8_bytes[0] = static_cast<std::char_traits<char>::int_type>(0xF0u | (charcode >> 18u));\n      
              utf8_bytes[1] = static_cast<std::char_traits<char>::int_type>(0x80u | ((charcode >> 12u) & 0x3Fu));\n                    utf8_bytes[2] = static_cast<std::char_traits<char>::int_type>(0x80u | ((charcode >> 6u) & 0x3Fu));\n                    utf8_bytes[3] = static_cast<std::char_traits<char>::int_type>(0x80u | (charcode & 0x3Fu));\n                    utf8_bytes_filled = 4;\n                }\n                else\n                {\n                    utf8_bytes[0] = static_cast<std::char_traits<char>::int_type>(wc);\n                    utf8_bytes_filled = 1;\n                }\n            }\n        }\n    }\n};\n\n// Wraps another input adapter to convert wide character types into individual bytes.\ntemplate<typename BaseInputAdapter, typename WideCharType>\nclass wide_string_input_adapter\n{\n  public:\n    using char_type = char;\n\n    wide_string_input_adapter(BaseInputAdapter base)\n        : base_adapter(base) {}\n\n    typename std::char_traits<char>::int_type get_character() noexcept\n    {\n        // check if buffer needs to be filled\n        if (utf8_bytes_index == utf8_bytes_filled)\n        {\n            fill_buffer<sizeof(WideCharType)>();\n\n            JSON_ASSERT(utf8_bytes_filled > 0);\n            JSON_ASSERT(utf8_bytes_index == 0);\n        }\n\n        // use buffer\n        JSON_ASSERT(utf8_bytes_filled > 0);\n        JSON_ASSERT(utf8_bytes_index < utf8_bytes_filled);\n        return utf8_bytes[utf8_bytes_index++];\n    }\n\n  private:\n    BaseInputAdapter base_adapter;\n\n    template<size_t T>\n    void fill_buffer()\n    {\n        wide_string_input_helper<BaseInputAdapter, T>::fill_buffer(base_adapter, utf8_bytes, utf8_bytes_index, utf8_bytes_filled);\n    }\n\n    /// a buffer for UTF-8 bytes\n    std::array<std::char_traits<char>::int_type, 4> utf8_bytes = {{0, 0, 0, 0}};\n\n    /// index to the utf8_codes array for the next valid byte\n    std::size_t utf8_bytes_index = 0;\n    /// number of valid bytes in the 
utf8_codes array\n    std::size_t utf8_bytes_filled = 0;\n};\n\n\ntemplate<typename IteratorType, typename Enable = void>\nstruct iterator_input_adapter_factory\n{\n    using iterator_type = IteratorType;\n    using char_type = typename std::iterator_traits<iterator_type>::value_type;\n    using adapter_type = iterator_input_adapter<iterator_type>;\n\n    static adapter_type create(IteratorType first, IteratorType last)\n    {\n        return adapter_type(std::move(first), std::move(last));\n    }\n};\n\ntemplate<typename T>\nstruct is_iterator_of_multibyte\n{\n    using value_type = typename std::iterator_traits<T>::value_type;\n    enum\n    {\n        value = sizeof(value_type) > 1\n    };\n};\n\ntemplate<typename IteratorType>\nstruct iterator_input_adapter_factory<IteratorType, enable_if_t<is_iterator_of_multibyte<IteratorType>::value>>\n{\n    using iterator_type = IteratorType;\n    using char_type = typename std::iterator_traits<iterator_type>::value_type;\n    using base_adapter_type = iterator_input_adapter<iterator_type>;\n    using adapter_type = wide_string_input_adapter<base_adapter_type, char_type>;\n\n    static adapter_type create(IteratorType first, IteratorType last)\n    {\n        return adapter_type(base_adapter_type(std::move(first), std::move(last)));\n    }\n};\n\n// General purpose iterator-based input\ntemplate<typename IteratorType>\ntypename iterator_input_adapter_factory<IteratorType>::adapter_type input_adapter(IteratorType first, IteratorType last)\n{\n    using factory_type = iterator_input_adapter_factory<IteratorType>;\n    return factory_type::create(first, last);\n}\n\n// Convenience shorthand from container to iterator\n// Enables ADL on begin(container) and end(container)\n// Encloses the using declarations in namespace for not to leak them to outside scope\n\nnamespace container_input_adapter_factory_impl\n{\n\nusing std::begin;\nusing std::end;\n\ntemplate<typename ContainerType, typename Enable = void>\nstruct 
container_input_adapter_factory {};\n\ntemplate<typename ContainerType>\nstruct container_input_adapter_factory< ContainerType,\n       void_t<decltype(begin(std::declval<ContainerType>()), end(std::declval<ContainerType>()))>>\n       {\n           using adapter_type = decltype(input_adapter(begin(std::declval<ContainerType>()), end(std::declval<ContainerType>())));\n\n           static adapter_type create(const ContainerType& container)\n{\n    return input_adapter(begin(container), end(container));\n}\n       };\n\n} // namespace container_input_adapter_factory_impl\n\ntemplate<typename ContainerType>\ntypename container_input_adapter_factory_impl::container_input_adapter_factory<ContainerType>::adapter_type input_adapter(const ContainerType& container)\n{\n    return container_input_adapter_factory_impl::container_input_adapter_factory<ContainerType>::create(container);\n}\n\n// Special cases with fast paths\ninline file_input_adapter input_adapter(std::FILE* file)\n{\n    return file_input_adapter(file);\n}\n\ninline input_stream_adapter input_adapter(std::istream& stream)\n{\n    return input_stream_adapter(stream);\n}\n\ninline input_stream_adapter input_adapter(std::istream&& stream)\n{\n    return input_stream_adapter(stream);\n}\n\nusing contiguous_bytes_input_adapter = decltype(input_adapter(std::declval<const char*>(), std::declval<const char*>()));\n\n// Null-delimited strings, and the like.\ntemplate < typename CharT,\n           typename std::enable_if <\n               std::is_pointer<CharT>::value&&\n               !std::is_array<CharT>::value&&\n               std::is_integral<typename std::remove_pointer<CharT>::type>::value&&\n               sizeof(typename std::remove_pointer<CharT>::type) == 1,\n               int >::type = 0 >\ncontiguous_bytes_input_adapter input_adapter(CharT b)\n{\n    auto length = std::strlen(reinterpret_cast<const char*>(b));\n    const auto* ptr = reinterpret_cast<const char*>(b);\n    return input_adapter(ptr, ptr + 
length);\n}\n\ntemplate<typename T, std::size_t N>\nauto input_adapter(T (&array)[N]) -> decltype(input_adapter(array, array + N)) // NOLINT(cppcoreguidelines-avoid-c-arrays,hicpp-avoid-c-arrays,modernize-avoid-c-arrays)\n{\n    return input_adapter(array, array + N);\n}\n\n// This class only handles inputs of input_buffer_adapter type.\n// It's required so that expressions like {ptr, len} can be implicitly converted\n// to the correct adapter.\nclass span_input_adapter\n{\n  public:\n    template < typename CharT,\n               typename std::enable_if <\n                   std::is_pointer<CharT>::value&&\n                   std::is_integral<typename std::remove_pointer<CharT>::type>::value&&\n                   sizeof(typename std::remove_pointer<CharT>::type) == 1,\n                   int >::type = 0 >\n    span_input_adapter(CharT b, std::size_t l)\n        : ia(reinterpret_cast<const char*>(b), reinterpret_cast<const char*>(b) + l) {}\n\n    template<class IteratorType,\n             typename std::enable_if<\n                 std::is_same<typename iterator_traits<IteratorType>::iterator_category, std::random_access_iterator_tag>::value,\n                 int>::type = 0>\n    span_input_adapter(IteratorType first, IteratorType last)\n        : ia(input_adapter(first, last)) {}\n\n    contiguous_bytes_input_adapter&& get()\n    {\n        return std::move(ia); // NOLINT(hicpp-move-const-arg,performance-move-const-arg)\n    }\n\n  private:\n    contiguous_bytes_input_adapter ia;\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/input/json_sax.hpp>\n\n\n#include <cstddef>\n#include <string> // string\n#include <utility> // move\n#include <vector> // vector\n\n// #include <nlohmann/detail/exceptions.hpp>\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n\nnamespace nlohmann\n{\n\n/*!\n@brief SAX interface\n\nThis class describes the SAX interface used by @ref nlohmann::json::sax_parse.\nEach function is called in different 
situations while the input is parsed. The\nboolean return value informs the parser whether to continue processing the\ninput.\n*/\ntemplate<typename BasicJsonType>\nstruct json_sax\n{\n    using number_integer_t = typename BasicJsonType::number_integer_t;\n    using number_unsigned_t = typename BasicJsonType::number_unsigned_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n    using string_t = typename BasicJsonType::string_t;\n    using binary_t = typename BasicJsonType::binary_t;\n\n    /*!\n    @brief a null value was read\n    @return whether parsing should proceed\n    */\n    virtual bool null() = 0;\n\n    /*!\n    @brief a boolean value was read\n    @param[in] val  boolean value\n    @return whether parsing should proceed\n    */\n    virtual bool boolean(bool val) = 0;\n\n    /*!\n    @brief an integer number was read\n    @param[in] val  integer value\n    @return whether parsing should proceed\n    */\n    virtual bool number_integer(number_integer_t val) = 0;\n\n    /*!\n    @brief an unsigned integer number was read\n    @param[in] val  unsigned integer value\n    @return whether parsing should proceed\n    */\n    virtual bool number_unsigned(number_unsigned_t val) = 0;\n\n    /*!\n    @brief a floating-point number was read\n    @param[in] val  floating-point value\n    @param[in] s    raw token value\n    @return whether parsing should proceed\n    */\n    virtual bool number_float(number_float_t val, const string_t& s) = 0;\n\n    /*!\n    @brief a string was read\n    @param[in] val  string value\n    @return whether parsing should proceed\n    @note It is safe to move the passed string.\n    */\n    virtual bool string(string_t& val) = 0;\n\n    /*!\n    @brief a binary string was read\n    @param[in] val  binary value\n    @return whether parsing should proceed\n    @note It is safe to move the passed binary.\n    */\n    virtual bool binary(binary_t& val) = 0;\n\n    /*!\n    @brief the beginning of an object was read\n 
   @param[in] elements  number of object elements or -1 if unknown\n    @return whether parsing should proceed\n    @note binary formats may report the number of elements\n    */\n    virtual bool start_object(std::size_t elements) = 0;\n\n    /*!\n    @brief an object key was read\n    @param[in] val  object key\n    @return whether parsing should proceed\n    @note It is safe to move the passed string.\n    */\n    virtual bool key(string_t& val) = 0;\n\n    /*!\n    @brief the end of an object was read\n    @return whether parsing should proceed\n    */\n    virtual bool end_object() = 0;\n\n    /*!\n    @brief the beginning of an array was read\n    @param[in] elements  number of array elements or -1 if unknown\n    @return whether parsing should proceed\n    @note binary formats may report the number of elements\n    */\n    virtual bool start_array(std::size_t elements) = 0;\n\n    /*!\n    @brief the end of an array was read\n    @return whether parsing should proceed\n    */\n    virtual bool end_array() = 0;\n\n    /*!\n    @brief a parse error occurred\n    @param[in] position    the position in the input where the error occurs\n    @param[in] last_token  the last read token\n    @param[in] ex          an exception object describing the error\n    @return whether parsing should proceed (must return false)\n    */\n    virtual bool parse_error(std::size_t position,\n                             const std::string& last_token,\n                             const detail::exception& ex) = 0;\n\n    json_sax() = default;\n    json_sax(const json_sax&) = default;\n    json_sax(json_sax&&) noexcept = default;\n    json_sax& operator=(const json_sax&) = default;\n    json_sax& operator=(json_sax&&) noexcept = default;\n    virtual ~json_sax() = default;\n};\n\n\nnamespace detail\n{\n/*!\n@brief SAX implementation to create a JSON value from SAX events\n\nThis class implements the @ref json_sax interface and processes the SAX events\nto create a JSON value which 
makes it basically a DOM parser. The structure or\nhierarchy of the JSON value is managed by the stack `ref_stack` which contains\na pointer to the respective array or object for each recursion depth.\n\nAfter successful parsing, the value that is passed by reference to the\nconstructor contains the parsed value.\n\n@tparam BasicJsonType  the JSON type\n*/\ntemplate<typename BasicJsonType>\nclass json_sax_dom_parser\n{\n  public:\n    using number_integer_t = typename BasicJsonType::number_integer_t;\n    using number_unsigned_t = typename BasicJsonType::number_unsigned_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n    using string_t = typename BasicJsonType::string_t;\n    using binary_t = typename BasicJsonType::binary_t;\n\n    /*!\n    @param[in,out] r  reference to a JSON value that is manipulated while\n                       parsing\n    @param[in] allow_exceptions_  whether parse errors yield exceptions\n    */\n    explicit json_sax_dom_parser(BasicJsonType& r, const bool allow_exceptions_ = true)\n        : root(r), allow_exceptions(allow_exceptions_)\n    {}\n\n    // make class move-only\n    json_sax_dom_parser(const json_sax_dom_parser&) = delete;\n    json_sax_dom_parser(json_sax_dom_parser&&) = default; // NOLINT(hicpp-noexcept-move,performance-noexcept-move-constructor)\n    json_sax_dom_parser& operator=(const json_sax_dom_parser&) = delete;\n    json_sax_dom_parser& operator=(json_sax_dom_parser&&) = default; // NOLINT(hicpp-noexcept-move,performance-noexcept-move-constructor)\n    ~json_sax_dom_parser() = default;\n\n    bool null()\n    {\n        handle_value(nullptr);\n        return true;\n    }\n\n    bool boolean(bool val)\n    {\n        handle_value(val);\n        return true;\n    }\n\n    bool number_integer(number_integer_t val)\n    {\n        handle_value(val);\n        return true;\n    }\n\n    bool number_unsigned(number_unsigned_t val)\n    {\n        handle_value(val);\n        return true;\n    }\n\n  
  bool number_float(number_float_t val, const string_t& /*unused*/)\n    {\n        handle_value(val);\n        return true;\n    }\n\n    bool string(string_t& val)\n    {\n        handle_value(val);\n        return true;\n    }\n\n    bool binary(binary_t& val)\n    {\n        handle_value(std::move(val));\n        return true;\n    }\n\n    bool start_object(std::size_t len)\n    {\n        ref_stack.push_back(handle_value(BasicJsonType::value_t::object));\n\n        if (JSON_HEDLEY_UNLIKELY(len != std::size_t(-1) && len > ref_stack.back()->max_size()))\n        {\n            JSON_THROW(out_of_range::create(408, \"excessive object size: \" + std::to_string(len), *ref_stack.back()));\n        }\n\n        return true;\n    }\n\n    bool key(string_t& val)\n    {\n        // add null at given key and store the reference for later\n        object_element = &(ref_stack.back()->m_value.object->operator[](val));\n        return true;\n    }\n\n    bool end_object()\n    {\n        ref_stack.back()->set_parents();\n        ref_stack.pop_back();\n        return true;\n    }\n\n    bool start_array(std::size_t len)\n    {\n        ref_stack.push_back(handle_value(BasicJsonType::value_t::array));\n\n        if (JSON_HEDLEY_UNLIKELY(len != std::size_t(-1) && len > ref_stack.back()->max_size()))\n        {\n            JSON_THROW(out_of_range::create(408, \"excessive array size: \" + std::to_string(len), *ref_stack.back()));\n        }\n\n        return true;\n    }\n\n    bool end_array()\n    {\n        ref_stack.back()->set_parents();\n        ref_stack.pop_back();\n        return true;\n    }\n\n    template<class Exception>\n    bool parse_error(std::size_t /*unused*/, const std::string& /*unused*/,\n                     const Exception& ex)\n    {\n        errored = true;\n        static_cast<void>(ex);\n        if (allow_exceptions)\n        {\n            JSON_THROW(ex);\n        }\n        return false;\n    }\n\n    constexpr bool is_errored() const\n    {\n      
  return errored;\n    }\n\n  private:\n    /*!\n    @invariant If the ref stack is empty, then the passed value will be the new\n               root.\n    @invariant If the ref stack contains a value, then it is an array or an\n               object to which we can add elements\n    */\n    template<typename Value>\n    JSON_HEDLEY_RETURNS_NON_NULL\n    BasicJsonType* handle_value(Value&& v)\n    {\n        if (ref_stack.empty())\n        {\n            root = BasicJsonType(std::forward<Value>(v));\n            return &root;\n        }\n\n        JSON_ASSERT(ref_stack.back()->is_array() || ref_stack.back()->is_object());\n\n        if (ref_stack.back()->is_array())\n        {\n            ref_stack.back()->m_value.array->emplace_back(std::forward<Value>(v));\n            return &(ref_stack.back()->m_value.array->back());\n        }\n\n        JSON_ASSERT(ref_stack.back()->is_object());\n        JSON_ASSERT(object_element);\n        *object_element = BasicJsonType(std::forward<Value>(v));\n        return object_element;\n    }\n\n    /// the parsed JSON value\n    BasicJsonType& root;\n    /// stack to model hierarchy of values\n    std::vector<BasicJsonType*> ref_stack {};\n    /// helper to hold the reference for the next object element\n    BasicJsonType* object_element = nullptr;\n    /// whether a syntax error occurred\n    bool errored = false;\n    /// whether to throw exceptions in case of errors\n    const bool allow_exceptions = true;\n};\n\ntemplate<typename BasicJsonType>\nclass json_sax_dom_callback_parser\n{\n  public:\n    using number_integer_t = typename BasicJsonType::number_integer_t;\n    using number_unsigned_t = typename BasicJsonType::number_unsigned_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n    using string_t = typename BasicJsonType::string_t;\n    using binary_t = typename BasicJsonType::binary_t;\n    using parser_callback_t = typename BasicJsonType::parser_callback_t;\n    using parse_event_t = typename 
BasicJsonType::parse_event_t;\n\n    json_sax_dom_callback_parser(BasicJsonType& r,\n                                 const parser_callback_t cb,\n                                 const bool allow_exceptions_ = true)\n        : root(r), callback(cb), allow_exceptions(allow_exceptions_)\n    {\n        keep_stack.push_back(true);\n    }\n\n    // make class move-only\n    json_sax_dom_callback_parser(const json_sax_dom_callback_parser&) = delete;\n    json_sax_dom_callback_parser(json_sax_dom_callback_parser&&) = default; // NOLINT(hicpp-noexcept-move,performance-noexcept-move-constructor)\n    json_sax_dom_callback_parser& operator=(const json_sax_dom_callback_parser&) = delete;\n    json_sax_dom_callback_parser& operator=(json_sax_dom_callback_parser&&) = default; // NOLINT(hicpp-noexcept-move,performance-noexcept-move-constructor)\n    ~json_sax_dom_callback_parser() = default;\n\n    bool null()\n    {\n        handle_value(nullptr);\n        return true;\n    }\n\n    bool boolean(bool val)\n    {\n        handle_value(val);\n        return true;\n    }\n\n    bool number_integer(number_integer_t val)\n    {\n        handle_value(val);\n        return true;\n    }\n\n    bool number_unsigned(number_unsigned_t val)\n    {\n        handle_value(val);\n        return true;\n    }\n\n    bool number_float(number_float_t val, const string_t& /*unused*/)\n    {\n        handle_value(val);\n        return true;\n    }\n\n    bool string(string_t& val)\n    {\n        handle_value(val);\n        return true;\n    }\n\n    bool binary(binary_t& val)\n    {\n        handle_value(std::move(val));\n        return true;\n    }\n\n    bool start_object(std::size_t len)\n    {\n        // check callback for object start\n        const bool keep = callback(static_cast<int>(ref_stack.size()), parse_event_t::object_start, discarded);\n        keep_stack.push_back(keep);\n\n        auto val = handle_value(BasicJsonType::value_t::object, true);\n        
ref_stack.push_back(val.second);\n\n        // check object limit\n        if (ref_stack.back() && JSON_HEDLEY_UNLIKELY(len != std::size_t(-1) && len > ref_stack.back()->max_size()))\n        {\n            JSON_THROW(out_of_range::create(408, \"excessive object size: \" + std::to_string(len), *ref_stack.back()));\n        }\n\n        return true;\n    }\n\n    bool key(string_t& val)\n    {\n        BasicJsonType k = BasicJsonType(val);\n\n        // check callback for key\n        const bool keep = callback(static_cast<int>(ref_stack.size()), parse_event_t::key, k);\n        key_keep_stack.push_back(keep);\n\n        // add discarded value at given key and store the reference for later\n        if (keep && ref_stack.back())\n        {\n            object_element = &(ref_stack.back()->m_value.object->operator[](val) = discarded);\n        }\n\n        return true;\n    }\n\n    bool end_object()\n    {\n        if (ref_stack.back())\n        {\n            if (!callback(static_cast<int>(ref_stack.size()) - 1, parse_event_t::object_end, *ref_stack.back()))\n            {\n                // discard object\n                *ref_stack.back() = discarded;\n            }\n            else\n            {\n                ref_stack.back()->set_parents();\n            }\n        }\n\n        JSON_ASSERT(!ref_stack.empty());\n        JSON_ASSERT(!keep_stack.empty());\n        ref_stack.pop_back();\n        keep_stack.pop_back();\n\n        if (!ref_stack.empty() && ref_stack.back() && ref_stack.back()->is_structured())\n        {\n            // remove discarded value\n            for (auto it = ref_stack.back()->begin(); it != ref_stack.back()->end(); ++it)\n            {\n                if (it->is_discarded())\n                {\n                    ref_stack.back()->erase(it);\n                    break;\n                }\n            }\n        }\n\n        return true;\n    }\n\n    bool start_array(std::size_t len)\n    {\n        const bool keep = 
callback(static_cast<int>(ref_stack.size()), parse_event_t::array_start, discarded);\n        keep_stack.push_back(keep);\n\n        auto val = handle_value(BasicJsonType::value_t::array, true);\n        ref_stack.push_back(val.second);\n\n        // check array limit\n        if (ref_stack.back() && JSON_HEDLEY_UNLIKELY(len != std::size_t(-1) && len > ref_stack.back()->max_size()))\n        {\n            JSON_THROW(out_of_range::create(408, \"excessive array size: \" + std::to_string(len), *ref_stack.back()));\n        }\n\n        return true;\n    }\n\n    bool end_array()\n    {\n        bool keep = true;\n\n        if (ref_stack.back())\n        {\n            keep = callback(static_cast<int>(ref_stack.size()) - 1, parse_event_t::array_end, *ref_stack.back());\n            if (keep)\n            {\n                ref_stack.back()->set_parents();\n            }\n            else\n            {\n                // discard array\n                *ref_stack.back() = discarded;\n            }\n        }\n\n        JSON_ASSERT(!ref_stack.empty());\n        JSON_ASSERT(!keep_stack.empty());\n        ref_stack.pop_back();\n        keep_stack.pop_back();\n\n        // remove discarded value\n        if (!keep && !ref_stack.empty() && ref_stack.back()->is_array())\n        {\n            ref_stack.back()->m_value.array->pop_back();\n        }\n\n        return true;\n    }\n\n    template<class Exception>\n    bool parse_error(std::size_t /*unused*/, const std::string& /*unused*/,\n                     const Exception& ex)\n    {\n        errored = true;\n        static_cast<void>(ex);\n        if (allow_exceptions)\n        {\n            JSON_THROW(ex);\n        }\n        return false;\n    }\n\n    constexpr bool is_errored() const\n    {\n        return errored;\n    }\n\n  private:\n    /*!\n    @param[in] v  value to add to the JSON value we build during parsing\n    @param[in] skip_callback  whether we should skip calling the callback\n               function; 
this is required after start_array() and\n               start_object() SAX events, because otherwise we would call the\n               callback function with an empty array or object, respectively.\n\n    @invariant If the ref stack is empty, then the passed value will be the new\n               root.\n    @invariant If the ref stack contains a value, then it is an array or an\n               object to which we can add elements\n\n    @return pair of boolean (whether value should be kept) and pointer (to the\n            passed value in the ref_stack hierarchy; nullptr if not kept)\n    */\n    template<typename Value>\n    std::pair<bool, BasicJsonType*> handle_value(Value&& v, const bool skip_callback = false)\n    {\n        JSON_ASSERT(!keep_stack.empty());\n\n        // do not handle this value if we know it would be added to a discarded\n        // container\n        if (!keep_stack.back())\n        {\n            return {false, nullptr};\n        }\n\n        // create value\n        auto value = BasicJsonType(std::forward<Value>(v));\n\n        // check callback\n        const bool keep = skip_callback || callback(static_cast<int>(ref_stack.size()), parse_event_t::value, value);\n\n        // do not handle this value if we just learnt it shall be discarded\n        if (!keep)\n        {\n            return {false, nullptr};\n        }\n\n        if (ref_stack.empty())\n        {\n            root = std::move(value);\n            return {true, &root};\n        }\n\n        // skip this value if we already decided to skip the parent\n        // (https://github.com/nlohmann/json/issues/971#issuecomment-413678360)\n        if (!ref_stack.back())\n        {\n            return {false, nullptr};\n        }\n\n        // we now only expect arrays and objects\n        JSON_ASSERT(ref_stack.back()->is_array() || ref_stack.back()->is_object());\n\n        // array\n        if (ref_stack.back()->is_array())\n        {\n            
ref_stack.back()->m_value.array->emplace_back(std::move(value));\n            return {true, &(ref_stack.back()->m_value.array->back())};\n        }\n\n        // object\n        JSON_ASSERT(ref_stack.back()->is_object());\n        // check if we should store an element for the current key\n        JSON_ASSERT(!key_keep_stack.empty());\n        const bool store_element = key_keep_stack.back();\n        key_keep_stack.pop_back();\n\n        if (!store_element)\n        {\n            return {false, nullptr};\n        }\n\n        JSON_ASSERT(object_element);\n        *object_element = std::move(value);\n        return {true, object_element};\n    }\n\n    /// the parsed JSON value\n    BasicJsonType& root;\n    /// stack to model hierarchy of values\n    std::vector<BasicJsonType*> ref_stack {};\n    /// stack to manage which values to keep\n    std::vector<bool> keep_stack {};\n    /// stack to manage which object keys to keep\n    std::vector<bool> key_keep_stack {};\n    /// helper to hold the reference for the next object element\n    BasicJsonType* object_element = nullptr;\n    /// whether a syntax error occurred\n    bool errored = false;\n    /// callback function\n    const parser_callback_t callback = nullptr;\n    /// whether to throw exceptions in case of errors\n    const bool allow_exceptions = true;\n    /// a discarded value for the callback\n    BasicJsonType discarded = BasicJsonType::value_t::discarded;\n};\n\ntemplate<typename BasicJsonType>\nclass json_sax_acceptor\n{\n  public:\n    using number_integer_t = typename BasicJsonType::number_integer_t;\n    using number_unsigned_t = typename BasicJsonType::number_unsigned_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n    using string_t = typename BasicJsonType::string_t;\n    using binary_t = typename BasicJsonType::binary_t;\n\n    bool null()\n    {\n        return true;\n    }\n\n    bool boolean(bool /*unused*/)\n    {\n        return true;\n    }\n\n    bool 
number_integer(number_integer_t /*unused*/)\n    {\n        return true;\n    }\n\n    bool number_unsigned(number_unsigned_t /*unused*/)\n    {\n        return true;\n    }\n\n    bool number_float(number_float_t /*unused*/, const string_t& /*unused*/)\n    {\n        return true;\n    }\n\n    bool string(string_t& /*unused*/)\n    {\n        return true;\n    }\n\n    bool binary(binary_t& /*unused*/)\n    {\n        return true;\n    }\n\n    bool start_object(std::size_t /*unused*/ = std::size_t(-1))\n    {\n        return true;\n    }\n\n    bool key(string_t& /*unused*/)\n    {\n        return true;\n    }\n\n    bool end_object()\n    {\n        return true;\n    }\n\n    bool start_array(std::size_t /*unused*/ = std::size_t(-1))\n    {\n        return true;\n    }\n\n    bool end_array()\n    {\n        return true;\n    }\n\n    bool parse_error(std::size_t /*unused*/, const std::string& /*unused*/, const detail::exception& /*unused*/)\n    {\n        return false;\n    }\n};\n}  // namespace detail\n\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/input/lexer.hpp>\n\n\n#include <array> // array\n#include <clocale> // localeconv\n#include <cstddef> // size_t\n#include <cstdio> // snprintf\n#include <cstdlib> // strtof, strtod, strtold, strtoll, strtoull\n#include <initializer_list> // initializer_list\n#include <string> // char_traits, string\n#include <utility> // move\n#include <vector> // vector\n\n// #include <nlohmann/detail/input/input_adapters.hpp>\n\n// #include <nlohmann/detail/input/position_t.hpp>\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n///////////\n// lexer //\n///////////\n\ntemplate<typename BasicJsonType>\nclass lexer_base\n{\n  public:\n    /// token types for the parser\n    enum class token_type\n    {\n        uninitialized,    ///< indicating the scanner is uninitialized\n        literal_true,     ///< the `true` literal\n        literal_false,    ///< the `false` 
literal\n        literal_null,     ///< the `null` literal\n        value_string,     ///< a string -- use get_string() for actual value\n        value_unsigned,   ///< an unsigned integer -- use get_number_unsigned() for actual value\n        value_integer,    ///< a signed integer -- use get_number_integer() for actual value\n        value_float,      ///< a floating-point number -- use get_number_float() for actual value\n        begin_array,      ///< the character for array begin `[`\n        begin_object,     ///< the character for object begin `{`\n        end_array,        ///< the character for array end `]`\n        end_object,       ///< the character for object end `}`\n        name_separator,   ///< the name separator `:`\n        value_separator,  ///< the value separator `,`\n        parse_error,      ///< indicating a parse error\n        end_of_input,     ///< indicating the end of the input buffer\n        literal_or_value  ///< a literal or the begin of a value (only for diagnostics)\n    };\n\n    /// return name of values of type token_type (only used for errors)\n    JSON_HEDLEY_RETURNS_NON_NULL\n    JSON_HEDLEY_CONST\n    static const char* token_type_name(const token_type t) noexcept\n    {\n        switch (t)\n        {\n            case token_type::uninitialized:\n                return \"<uninitialized>\";\n            case token_type::literal_true:\n                return \"true literal\";\n            case token_type::literal_false:\n                return \"false literal\";\n            case token_type::literal_null:\n                return \"null literal\";\n            case token_type::value_string:\n                return \"string literal\";\n            case token_type::value_unsigned:\n            case token_type::value_integer:\n            case token_type::value_float:\n                return \"number literal\";\n            case token_type::begin_array:\n                return \"'['\";\n            case 
token_type::begin_object:\n                return \"'{'\";\n            case token_type::end_array:\n                return \"']'\";\n            case token_type::end_object:\n                return \"'}'\";\n            case token_type::name_separator:\n                return \"':'\";\n            case token_type::value_separator:\n                return \"','\";\n            case token_type::parse_error:\n                return \"<parse error>\";\n            case token_type::end_of_input:\n                return \"end of input\";\n            case token_type::literal_or_value:\n                return \"'[', '{', or a literal\";\n            // LCOV_EXCL_START\n            default: // catch non-enum values\n                return \"unknown token\";\n                // LCOV_EXCL_STOP\n        }\n    }\n};\n/*!\n@brief lexical analysis\n\nThis class organizes the lexical analysis during JSON deserialization.\n*/\ntemplate<typename BasicJsonType, typename InputAdapterType>\nclass lexer : public lexer_base<BasicJsonType>\n{\n    using number_integer_t = typename BasicJsonType::number_integer_t;\n    using number_unsigned_t = typename BasicJsonType::number_unsigned_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n    using string_t = typename BasicJsonType::string_t;\n    using char_type = typename InputAdapterType::char_type;\n    using char_int_type = typename std::char_traits<char_type>::int_type;\n\n  public:\n    using token_type = typename lexer_base<BasicJsonType>::token_type;\n\n    explicit lexer(InputAdapterType&& adapter, bool ignore_comments_ = false) noexcept\n        : ia(std::move(adapter))\n        , ignore_comments(ignore_comments_)\n        , decimal_point_char(static_cast<char_int_type>(get_decimal_point()))\n    {}\n\n    // delete because of pointer members\n    lexer(const lexer&) = delete;\n    lexer(lexer&&) = default; // NOLINT(hicpp-noexcept-move,performance-noexcept-move-constructor)\n    lexer& operator=(lexer&) = 
delete;\n    lexer& operator=(lexer&&) = default; // NOLINT(hicpp-noexcept-move,performance-noexcept-move-constructor)\n    ~lexer() = default;\n\n  private:\n    /////////////////////\n    // locales\n    /////////////////////\n\n    /// return the locale-dependent decimal point\n    JSON_HEDLEY_PURE\n    static char get_decimal_point() noexcept\n    {\n        const auto* loc = localeconv();\n        JSON_ASSERT(loc != nullptr);\n        return (loc->decimal_point == nullptr) ? '.' : *(loc->decimal_point);\n    }\n\n    /////////////////////\n    // scan functions\n    /////////////////////\n\n    /*!\n    @brief get codepoint from 4 hex characters following `\\u`\n\n    For input \"\\u c1 c2 c3 c4\" the codepoint is:\n      (c1 * 0x1000) + (c2 * 0x0100) + (c3 * 0x0010) + c4\n    = (c1 << 12) + (c2 << 8) + (c3 << 4) + (c4 << 0)\n\n    Furthermore, the possible characters '0'..'9', 'A'..'F', and 'a'..'f'\n    must be converted to the integers 0x0..0x9, 0xA..0xF, 0xA..0xF, resp. The\n    conversion is done by subtracting the offset (0x30, 0x37, and 0x57)\n    between the ASCII value of the character and the desired integer value.\n\n    @return codepoint (0x0000..0xFFFF) or -1 in case of an error (e.g. 
EOF or\n            non-hex character)\n    */\n    int get_codepoint()\n    {\n        // this function only makes sense after reading `\\u`\n        JSON_ASSERT(current == 'u');\n        int codepoint = 0;\n\n        const auto factors = { 12u, 8u, 4u, 0u };\n        for (const auto factor : factors)\n        {\n            get();\n\n            if (current >= '0' && current <= '9')\n            {\n                codepoint += static_cast<int>((static_cast<unsigned int>(current) - 0x30u) << factor);\n            }\n            else if (current >= 'A' && current <= 'F')\n            {\n                codepoint += static_cast<int>((static_cast<unsigned int>(current) - 0x37u) << factor);\n            }\n            else if (current >= 'a' && current <= 'f')\n            {\n                codepoint += static_cast<int>((static_cast<unsigned int>(current) - 0x57u) << factor);\n            }\n            else\n            {\n                return -1;\n            }\n        }\n\n        JSON_ASSERT(0x0000 <= codepoint && codepoint <= 0xFFFF);\n        return codepoint;\n    }\n\n    /*!\n    @brief check if the next byte(s) are inside a given range\n\n    Adds the current byte and, for each passed range, reads a new byte and\n    checks if it is inside the range. If a violation was detected, set up an\n    error message and return false. Otherwise, return true.\n\n    @param[in] ranges  list of integers; interpreted as list of pairs of\n                       inclusive lower and upper bound, respectively\n\n    @pre The passed list @a ranges must have 2, 4, or 6 elements; that is,\n         1, 2, or 3 pairs. 
This precondition is enforced by an assertion.\n\n    @return true if and only if no range violation was detected\n    */\n    bool next_byte_in_range(std::initializer_list<char_int_type> ranges)\n    {\n        JSON_ASSERT(ranges.size() == 2 || ranges.size() == 4 || ranges.size() == 6);\n        add(current);\n\n        for (auto range = ranges.begin(); range != ranges.end(); ++range)\n        {\n            get();\n            if (JSON_HEDLEY_LIKELY(*range <= current && current <= *(++range)))\n            {\n                add(current);\n            }\n            else\n            {\n                error_message = \"invalid string: ill-formed UTF-8 byte\";\n                return false;\n            }\n        }\n\n        return true;\n    }\n\n    /*!\n    @brief scan a string literal\n\n    This function scans a string according to Sect. 7 of RFC 7159. While\n    scanning, bytes are unescaped and copied into buffer token_buffer. When the\n    function returns successfully, token_buffer is *not* null-terminated (as it\n    may contain \\0 bytes), and token_buffer.size() is the number of bytes in the\n    string.\n\n    @return token_type::value_string if string could be successfully scanned,\n            token_type::parse_error otherwise\n\n    @note In case of errors, variable error_message contains a textual\n          description.\n    */\n    token_type scan_string()\n    {\n        // reset token_buffer (ignore opening quote)\n        reset();\n\n        // we entered the function by reading an open quote\n        JSON_ASSERT(current == '\\\"');\n\n        while (true)\n        {\n            // get next character\n            switch (get())\n            {\n                // end of file while parsing string\n                case std::char_traits<char_type>::eof():\n                {\n                    error_message = \"invalid string: missing closing quote\";\n                    return token_type::parse_error;\n                }\n\n                // 
closing quote\n                case '\\\"':\n                {\n                    return token_type::value_string;\n                }\n\n                // escapes\n                case '\\\\':\n                {\n                    switch (get())\n                    {\n                        // quotation mark\n                        case '\\\"':\n                            add('\\\"');\n                            break;\n                        // reverse solidus\n                        case '\\\\':\n                            add('\\\\');\n                            break;\n                        // solidus\n                        case '/':\n                            add('/');\n                            break;\n                        // backspace\n                        case 'b':\n                            add('\\b');\n                            break;\n                        // form feed\n                        case 'f':\n                            add('\\f');\n                            break;\n                        // line feed\n                        case 'n':\n                            add('\\n');\n                            break;\n                        // carriage return\n                        case 'r':\n                            add('\\r');\n                            break;\n                        // tab\n                        case 't':\n                            add('\\t');\n                            break;\n\n                        // unicode escapes\n                        case 'u':\n                        {\n                            const int codepoint1 = get_codepoint();\n                            int codepoint = codepoint1; // start with codepoint1\n\n                            if (JSON_HEDLEY_UNLIKELY(codepoint1 == -1))\n                            {\n                                error_message = \"invalid string: '\\\\u' must be followed by 4 hex digits\";\n                                
return token_type::parse_error;\n                            }\n\n                            // check if code point is a high surrogate\n                            if (0xD800 <= codepoint1 && codepoint1 <= 0xDBFF)\n                            {\n                                // expect next \\uxxxx entry\n                                if (JSON_HEDLEY_LIKELY(get() == '\\\\' && get() == 'u'))\n                                {\n                                    const int codepoint2 = get_codepoint();\n\n                                    if (JSON_HEDLEY_UNLIKELY(codepoint2 == -1))\n                                    {\n                                        error_message = \"invalid string: '\\\\u' must be followed by 4 hex digits\";\n                                        return token_type::parse_error;\n                                    }\n\n                                    // check if codepoint2 is a low surrogate\n                                    if (JSON_HEDLEY_LIKELY(0xDC00 <= codepoint2 && codepoint2 <= 0xDFFF))\n                                    {\n                                        // overwrite codepoint\n                                        codepoint = static_cast<int>(\n                                                        // high surrogate provides the upper 10 bits of the 20-bit offset\n                                                        (static_cast<unsigned int>(codepoint1) << 10u)\n                                                        // low surrogate provides the lower 10 bits\n                                                        + static_cast<unsigned int>(codepoint2)\n                                                        // there is still the 0xD800, 0xDC00 and 0x10000 noise\n                                                        // in the result, so we have to subtract:\n                                                        // (0xD800 << 10) + 0xDC00 - 0x10000 = 0x35FDC00\n                           
                             - 0x35FDC00u);\n                                    }\n                                    else\n                                    {\n                                        error_message = \"invalid string: surrogate U+D800..U+DBFF must be followed by U+DC00..U+DFFF\";\n                                        return token_type::parse_error;\n                                    }\n                                }\n                                else\n                                {\n                                    error_message = \"invalid string: surrogate U+D800..U+DBFF must be followed by U+DC00..U+DFFF\";\n                                    return token_type::parse_error;\n                                }\n                            }\n                            else\n                            {\n                                if (JSON_HEDLEY_UNLIKELY(0xDC00 <= codepoint1 && codepoint1 <= 0xDFFF))\n                                {\n                                    error_message = \"invalid string: surrogate U+DC00..U+DFFF must follow U+D800..U+DBFF\";\n                                    return token_type::parse_error;\n                                }\n                            }\n\n                            // result of the above calculation yields a proper codepoint\n                            JSON_ASSERT(0x00 <= codepoint && codepoint <= 0x10FFFF);\n\n                            // translate codepoint into bytes\n                            if (codepoint < 0x80)\n                            {\n                                // 1-byte characters: 0xxxxxxx (ASCII)\n                                add(static_cast<char_int_type>(codepoint));\n                            }\n                            else if (codepoint <= 0x7FF)\n                            {\n                                // 2-byte characters: 110xxxxx 10xxxxxx\n                                add(static_cast<char_int_type>(0xC0u | 
(static_cast<unsigned int>(codepoint) >> 6u)));\n                                add(static_cast<char_int_type>(0x80u | (static_cast<unsigned int>(codepoint) & 0x3Fu)));\n                            }\n                            else if (codepoint <= 0xFFFF)\n                            {\n                                // 3-byte characters: 1110xxxx 10xxxxxx 10xxxxxx\n                                add(static_cast<char_int_type>(0xE0u | (static_cast<unsigned int>(codepoint) >> 12u)));\n                                add(static_cast<char_int_type>(0x80u | ((static_cast<unsigned int>(codepoint) >> 6u) & 0x3Fu)));\n                                add(static_cast<char_int_type>(0x80u | (static_cast<unsigned int>(codepoint) & 0x3Fu)));\n                            }\n                            else\n                            {\n                                // 4-byte characters: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx\n                                add(static_cast<char_int_type>(0xF0u | (static_cast<unsigned int>(codepoint) >> 18u)));\n                                add(static_cast<char_int_type>(0x80u | ((static_cast<unsigned int>(codepoint) >> 12u) & 0x3Fu)));\n                                add(static_cast<char_int_type>(0x80u | ((static_cast<unsigned int>(codepoint) >> 6u) & 0x3Fu)));\n                                add(static_cast<char_int_type>(0x80u | (static_cast<unsigned int>(codepoint) & 0x3Fu)));\n                            }\n\n                            break;\n                        }\n\n                        // other characters after escape\n                        default:\n                            error_message = \"invalid string: forbidden character after backslash\";\n                            return token_type::parse_error;\n                    }\n\n                    break;\n                }\n\n                // invalid control characters\n                case 0x00:\n                {\n                    error_message = 
\"invalid string: control character U+0000 (NUL) must be escaped to \\\\u0000\";\n                    return token_type::parse_error;\n                }\n\n                case 0x01:\n                {\n                    error_message = \"invalid string: control character U+0001 (SOH) must be escaped to \\\\u0001\";\n                    return token_type::parse_error;\n                }\n\n                case 0x02:\n                {\n                    error_message = \"invalid string: control character U+0002 (STX) must be escaped to \\\\u0002\";\n                    return token_type::parse_error;\n                }\n\n                case 0x03:\n                {\n                    error_message = \"invalid string: control character U+0003 (ETX) must be escaped to \\\\u0003\";\n                    return token_type::parse_error;\n                }\n\n                case 0x04:\n                {\n                    error_message = \"invalid string: control character U+0004 (EOT) must be escaped to \\\\u0004\";\n                    return token_type::parse_error;\n                }\n\n                case 0x05:\n                {\n                    error_message = \"invalid string: control character U+0005 (ENQ) must be escaped to \\\\u0005\";\n                    return token_type::parse_error;\n                }\n\n                case 0x06:\n                {\n                    error_message = \"invalid string: control character U+0006 (ACK) must be escaped to \\\\u0006\";\n                    return token_type::parse_error;\n                }\n\n                case 0x07:\n                {\n                    error_message = \"invalid string: control character U+0007 (BEL) must be escaped to \\\\u0007\";\n                    return token_type::parse_error;\n                }\n\n                case 0x08:\n                {\n                    error_message = \"invalid string: control character U+0008 (BS) must be escaped to \\\\u0008 or 
\\\\b\";\n                    return token_type::parse_error;\n                }\n\n                case 0x09:\n                {\n                    error_message = \"invalid string: control character U+0009 (HT) must be escaped to \\\\u0009 or \\\\t\";\n                    return token_type::parse_error;\n                }\n\n                case 0x0A:\n                {\n                    error_message = \"invalid string: control character U+000A (LF) must be escaped to \\\\u000A or \\\\n\";\n                    return token_type::parse_error;\n                }\n\n                case 0x0B:\n                {\n                    error_message = \"invalid string: control character U+000B (VT) must be escaped to \\\\u000B\";\n                    return token_type::parse_error;\n                }\n\n                case 0x0C:\n                {\n                    error_message = \"invalid string: control character U+000C (FF) must be escaped to \\\\u000C or \\\\f\";\n                    return token_type::parse_error;\n                }\n\n                case 0x0D:\n                {\n                    error_message = \"invalid string: control character U+000D (CR) must be escaped to \\\\u000D or \\\\r\";\n                    return token_type::parse_error;\n                }\n\n                case 0x0E:\n                {\n                    error_message = \"invalid string: control character U+000E (SO) must be escaped to \\\\u000E\";\n                    return token_type::parse_error;\n                }\n\n                case 0x0F:\n                {\n                    error_message = \"invalid string: control character U+000F (SI) must be escaped to \\\\u000F\";\n                    return token_type::parse_error;\n                }\n\n                case 0x10:\n                {\n                    error_message = \"invalid string: control character U+0010 (DLE) must be escaped to \\\\u0010\";\n                    return 
token_type::parse_error;\n                }\n\n                case 0x11:\n                {\n                    error_message = \"invalid string: control character U+0011 (DC1) must be escaped to \\\\u0011\";\n                    return token_type::parse_error;\n                }\n\n                case 0x12:\n                {\n                    error_message = \"invalid string: control character U+0012 (DC2) must be escaped to \\\\u0012\";\n                    return token_type::parse_error;\n                }\n\n                case 0x13:\n                {\n                    error_message = \"invalid string: control character U+0013 (DC3) must be escaped to \\\\u0013\";\n                    return token_type::parse_error;\n                }\n\n                case 0x14:\n                {\n                    error_message = \"invalid string: control character U+0014 (DC4) must be escaped to \\\\u0014\";\n                    return token_type::parse_error;\n                }\n\n                case 0x15:\n                {\n                    error_message = \"invalid string: control character U+0015 (NAK) must be escaped to \\\\u0015\";\n                    return token_type::parse_error;\n                }\n\n                case 0x16:\n                {\n                    error_message = \"invalid string: control character U+0016 (SYN) must be escaped to \\\\u0016\";\n                    return token_type::parse_error;\n                }\n\n                case 0x17:\n                {\n                    error_message = \"invalid string: control character U+0017 (ETB) must be escaped to \\\\u0017\";\n                    return token_type::parse_error;\n                }\n\n                case 0x18:\n                {\n                    error_message = \"invalid string: control character U+0018 (CAN) must be escaped to \\\\u0018\";\n                    return token_type::parse_error;\n                }\n\n                case 0x19:\n             
   {\n                    error_message = \"invalid string: control character U+0019 (EM) must be escaped to \\\\u0019\";\n                    return token_type::parse_error;\n                }\n\n                case 0x1A:\n                {\n                    error_message = \"invalid string: control character U+001A (SUB) must be escaped to \\\\u001A\";\n                    return token_type::parse_error;\n                }\n\n                case 0x1B:\n                {\n                    error_message = \"invalid string: control character U+001B (ESC) must be escaped to \\\\u001B\";\n                    return token_type::parse_error;\n                }\n\n                case 0x1C:\n                {\n                    error_message = \"invalid string: control character U+001C (FS) must be escaped to \\\\u001C\";\n                    return token_type::parse_error;\n                }\n\n                case 0x1D:\n                {\n                    error_message = \"invalid string: control character U+001D (GS) must be escaped to \\\\u001D\";\n                    return token_type::parse_error;\n                }\n\n                case 0x1E:\n                {\n                    error_message = \"invalid string: control character U+001E (RS) must be escaped to \\\\u001E\";\n                    return token_type::parse_error;\n                }\n\n                case 0x1F:\n                {\n                    error_message = \"invalid string: control character U+001F (US) must be escaped to \\\\u001F\";\n                    return token_type::parse_error;\n                }\n\n                // U+0020..U+007F (except U+0022 (quote) and U+005C (backslash))\n                case 0x20:\n                case 0x21:\n                case 0x23:\n                case 0x24:\n                case 0x25:\n                case 0x26:\n                case 0x27:\n                case 0x28:\n                case 0x29:\n                case 0x2A:\n           
     case 0x2B:\n                case 0x2C:\n                case 0x2D:\n                case 0x2E:\n                case 0x2F:\n                case 0x30:\n                case 0x31:\n                case 0x32:\n                case 0x33:\n                case 0x34:\n                case 0x35:\n                case 0x36:\n                case 0x37:\n                case 0x38:\n                case 0x39:\n                case 0x3A:\n                case 0x3B:\n                case 0x3C:\n                case 0x3D:\n                case 0x3E:\n                case 0x3F:\n                case 0x40:\n                case 0x41:\n                case 0x42:\n                case 0x43:\n                case 0x44:\n                case 0x45:\n                case 0x46:\n                case 0x47:\n                case 0x48:\n                case 0x49:\n                case 0x4A:\n                case 0x4B:\n                case 0x4C:\n                case 0x4D:\n                case 0x4E:\n                case 0x4F:\n                case 0x50:\n                case 0x51:\n                case 0x52:\n                case 0x53:\n                case 0x54:\n                case 0x55:\n                case 0x56:\n                case 0x57:\n                case 0x58:\n                case 0x59:\n                case 0x5A:\n                case 0x5B:\n                case 0x5D:\n                case 0x5E:\n                case 0x5F:\n                case 0x60:\n                case 0x61:\n                case 0x62:\n                case 0x63:\n                case 0x64:\n                case 0x65:\n                case 0x66:\n                case 0x67:\n                case 0x68:\n                case 0x69:\n                case 0x6A:\n                case 0x6B:\n                case 0x6C:\n                case 0x6D:\n                case 0x6E:\n                case 0x6F:\n                case 0x70:\n                case 0x71:\n                case 0x72:\n                case 
0x73:\n                case 0x74:\n                case 0x75:\n                case 0x76:\n                case 0x77:\n                case 0x78:\n                case 0x79:\n                case 0x7A:\n                case 0x7B:\n                case 0x7C:\n                case 0x7D:\n                case 0x7E:\n                case 0x7F:\n                {\n                    add(current);\n                    break;\n                }\n\n                // U+0080..U+07FF: bytes C2..DF 80..BF\n                case 0xC2:\n                case 0xC3:\n                case 0xC4:\n                case 0xC5:\n                case 0xC6:\n                case 0xC7:\n                case 0xC8:\n                case 0xC9:\n                case 0xCA:\n                case 0xCB:\n                case 0xCC:\n                case 0xCD:\n                case 0xCE:\n                case 0xCF:\n                case 0xD0:\n                case 0xD1:\n                case 0xD2:\n                case 0xD3:\n                case 0xD4:\n                case 0xD5:\n                case 0xD6:\n                case 0xD7:\n                case 0xD8:\n                case 0xD9:\n                case 0xDA:\n                case 0xDB:\n                case 0xDC:\n                case 0xDD:\n                case 0xDE:\n                case 0xDF:\n                {\n                    if (JSON_HEDLEY_UNLIKELY(!next_byte_in_range({0x80, 0xBF})))\n                    {\n                        return token_type::parse_error;\n                    }\n                    break;\n                }\n\n                // U+0800..U+0FFF: bytes E0 A0..BF 80..BF\n                case 0xE0:\n                {\n                    if (JSON_HEDLEY_UNLIKELY(!(next_byte_in_range({0xA0, 0xBF, 0x80, 0xBF}))))\n                    {\n                        return token_type::parse_error;\n                    }\n                    break;\n                }\n\n                // U+1000..U+CFFF: bytes E1..EC 
80..BF 80..BF\n                // U+E000..U+FFFF: bytes EE..EF 80..BF 80..BF\n                case 0xE1:\n                case 0xE2:\n                case 0xE3:\n                case 0xE4:\n                case 0xE5:\n                case 0xE6:\n                case 0xE7:\n                case 0xE8:\n                case 0xE9:\n                case 0xEA:\n                case 0xEB:\n                case 0xEC:\n                case 0xEE:\n                case 0xEF:\n                {\n                    if (JSON_HEDLEY_UNLIKELY(!(next_byte_in_range({0x80, 0xBF, 0x80, 0xBF}))))\n                    {\n                        return token_type::parse_error;\n                    }\n                    break;\n                }\n\n                // U+D000..U+D7FF: bytes ED 80..9F 80..BF\n                case 0xED:\n                {\n                    if (JSON_HEDLEY_UNLIKELY(!(next_byte_in_range({0x80, 0x9F, 0x80, 0xBF}))))\n                    {\n                        return token_type::parse_error;\n                    }\n                    break;\n                }\n\n                // U+10000..U+3FFFF F0 90..BF 80..BF 80..BF\n                case 0xF0:\n                {\n                    if (JSON_HEDLEY_UNLIKELY(!(next_byte_in_range({0x90, 0xBF, 0x80, 0xBF, 0x80, 0xBF}))))\n                    {\n                        return token_type::parse_error;\n                    }\n                    break;\n                }\n\n                // U+40000..U+FFFFF F1..F3 80..BF 80..BF 80..BF\n                case 0xF1:\n                case 0xF2:\n                case 0xF3:\n                {\n                    if (JSON_HEDLEY_UNLIKELY(!(next_byte_in_range({0x80, 0xBF, 0x80, 0xBF, 0x80, 0xBF}))))\n                    {\n                        return token_type::parse_error;\n                    }\n                    break;\n                }\n\n                // U+100000..U+10FFFF F4 80..8F 80..BF 80..BF\n                case 0xF4:\n                {\n   
                 if (JSON_HEDLEY_UNLIKELY(!(next_byte_in_range({0x80, 0x8F, 0x80, 0xBF, 0x80, 0xBF}))))\n                    {\n                        return token_type::parse_error;\n                    }\n                    break;\n                }\n\n                // remaining bytes (80..C1 and F5..FF) are ill-formed\n                default:\n                {\n                    error_message = \"invalid string: ill-formed UTF-8 byte\";\n                    return token_type::parse_error;\n                }\n            }\n        }\n    }\n\n    /*!\n     * @brief scan a comment\n     * @return whether comment could be scanned successfully\n     */\n    bool scan_comment()\n    {\n        switch (get())\n        {\n            // single-line comments skip input until a newline or EOF is read\n            case '/':\n            {\n                while (true)\n                {\n                    switch (get())\n                    {\n                        case '\\n':\n                        case '\\r':\n                        case std::char_traits<char_type>::eof():\n                        case '\\0':\n                            return true;\n\n                        default:\n                            break;\n                    }\n                }\n            }\n\n            // multi-line comments skip input until */ is read\n            case '*':\n            {\n                while (true)\n                {\n                    switch (get())\n                    {\n                        case std::char_traits<char_type>::eof():\n                        case '\\0':\n                        {\n                            error_message = \"invalid comment; missing closing '*/'\";\n                            return false;\n                        }\n\n                        case '*':\n                        {\n                            switch (get())\n                            {\n                                case '/':\n        
                            return true;\n\n                                default:\n                                {\n                                    unget();\n                                    continue;\n                                }\n                            }\n                        }\n\n                        default:\n                            continue;\n                    }\n                }\n            }\n\n            // unexpected character after reading '/'\n            default:\n            {\n                error_message = \"invalid comment; expecting '/' or '*' after '/'\";\n                return false;\n            }\n        }\n    }\n\n    JSON_HEDLEY_NON_NULL(2)\n    static void strtof(float& f, const char* str, char** endptr) noexcept\n    {\n        f = std::strtof(str, endptr);\n    }\n\n    JSON_HEDLEY_NON_NULL(2)\n    static void strtof(double& f, const char* str, char** endptr) noexcept\n    {\n        f = std::strtod(str, endptr);\n    }\n\n    JSON_HEDLEY_NON_NULL(2)\n    static void strtof(long double& f, const char* str, char** endptr) noexcept\n    {\n        f = std::strtold(str, endptr);\n    }\n\n    /*!\n    @brief scan a number literal\n\n    This function scans a string according to Sect. 6 of RFC 7159.\n\n    The function is realized with a deterministic finite state machine derived\n    from the grammar described in RFC 7159. Starting in state \"init\", the\n    input is read and used to determine the next state. Only state \"done\"\n    accepts the number. State \"error\" is a trap state to model errors. In the\n    table below, \"anything\" means any character but the ones listed before.\n\n    state    | 0        | 1-9      | e E      | +       | -       | .        
| anything\n    ---------|----------|----------|----------|---------|---------|----------|-----------\n    init     | zero     | any1     | [error]  | [error] | minus   | [error]  | [error]\n    minus    | zero     | any1     | [error]  | [error] | [error] | [error]  | [error]\n    zero     | done     | done     | exponent | done    | done    | decimal1 | done\n    any1     | any1     | any1     | exponent | done    | done    | decimal1 | done\n    decimal1 | decimal2 | decimal2 | [error]  | [error] | [error] | [error]  | [error]\n    decimal2 | decimal2 | decimal2 | exponent | done    | done    | done     | done\n    exponent | any2     | any2     | [error]  | sign    | sign    | [error]  | [error]\n    sign     | any2     | any2     | [error]  | [error] | [error] | [error]  | [error]\n    any2     | any2     | any2     | done     | done    | done    | done     | done\n\n    The state machine is realized with one label per state (prefixed with\n    \"scan_number_\") and `goto` statements between them. The state machine\n    contains cycles, but any cycle can be left when EOF is read. Therefore,\n    the function is guaranteed to terminate.\n\n    During scanning, the read bytes are stored in token_buffer. This string is\n    then converted to a signed integer, an unsigned integer, or a\n    floating-point number.\n\n    @return token_type::value_unsigned, token_type::value_integer, or\n            token_type::value_float if number could be successfully scanned,\n            token_type::parse_error otherwise\n\n    @note The scanner is independent of the current locale. 
Internally, the\n          locale's decimal point is used instead of `.` to work with the\n          locale-dependent converters.\n    */\n    token_type scan_number()  // lgtm [cpp/use-of-goto]\n    {\n        // reset token_buffer to store the number's bytes\n        reset();\n\n        // the type of the parsed number; initially set to unsigned; will be\n        // changed if minus sign, decimal point or exponent is read\n        token_type number_type = token_type::value_unsigned;\n\n        // state (init): we just found out we need to scan a number\n        switch (current)\n        {\n            case '-':\n            {\n                add(current);\n                goto scan_number_minus;\n            }\n\n            case '0':\n            {\n                add(current);\n                goto scan_number_zero;\n            }\n\n            case '1':\n            case '2':\n            case '3':\n            case '4':\n            case '5':\n            case '6':\n            case '7':\n            case '8':\n            case '9':\n            {\n                add(current);\n                goto scan_number_any1;\n            }\n\n            // all other characters are rejected outside scan_number()\n            default:            // LCOV_EXCL_LINE\n                JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert) LCOV_EXCL_LINE\n        }\n\nscan_number_minus:\n        // state: we just parsed a leading minus sign\n        number_type = token_type::value_integer;\n        switch (get())\n        {\n            case '0':\n            {\n                add(current);\n                goto scan_number_zero;\n            }\n\n            case '1':\n            case '2':\n            case '3':\n            case '4':\n            case '5':\n            case '6':\n            case '7':\n            case '8':\n            case '9':\n            {\n                add(current);\n                goto scan_number_any1;\n        
    }\n\n            default:\n            {\n                error_message = \"invalid number; expected digit after '-'\";\n                return token_type::parse_error;\n            }\n        }\n\nscan_number_zero:\n        // state: we just parsed a zero (maybe with a leading minus sign)\n        switch (get())\n        {\n            case '.':\n            {\n                add(decimal_point_char);\n                goto scan_number_decimal1;\n            }\n\n            case 'e':\n            case 'E':\n            {\n                add(current);\n                goto scan_number_exponent;\n            }\n\n            default:\n                goto scan_number_done;\n        }\n\nscan_number_any1:\n        // state: we just parsed a number 0-9 (maybe with a leading minus sign)\n        switch (get())\n        {\n            case '0':\n            case '1':\n            case '2':\n            case '3':\n            case '4':\n            case '5':\n            case '6':\n            case '7':\n            case '8':\n            case '9':\n            {\n                add(current);\n                goto scan_number_any1;\n            }\n\n            case '.':\n            {\n                add(decimal_point_char);\n                goto scan_number_decimal1;\n            }\n\n            case 'e':\n            case 'E':\n            {\n                add(current);\n                goto scan_number_exponent;\n            }\n\n            default:\n                goto scan_number_done;\n        }\n\nscan_number_decimal1:\n        // state: we just parsed a decimal point\n        number_type = token_type::value_float;\n        switch (get())\n        {\n            case '0':\n            case '1':\n            case '2':\n            case '3':\n            case '4':\n            case '5':\n            case '6':\n            case '7':\n            case '8':\n            case '9':\n            {\n                add(current);\n                goto 
scan_number_decimal2;\n            }\n\n            default:\n            {\n                error_message = \"invalid number; expected digit after '.'\";\n                return token_type::parse_error;\n            }\n        }\n\nscan_number_decimal2:\n        // we just parsed at least one number after a decimal point\n        switch (get())\n        {\n            case '0':\n            case '1':\n            case '2':\n            case '3':\n            case '4':\n            case '5':\n            case '6':\n            case '7':\n            case '8':\n            case '9':\n            {\n                add(current);\n                goto scan_number_decimal2;\n            }\n\n            case 'e':\n            case 'E':\n            {\n                add(current);\n                goto scan_number_exponent;\n            }\n\n            default:\n                goto scan_number_done;\n        }\n\nscan_number_exponent:\n        // we just parsed an exponent\n        number_type = token_type::value_float;\n        switch (get())\n        {\n            case '+':\n            case '-':\n            {\n                add(current);\n                goto scan_number_sign;\n            }\n\n            case '0':\n            case '1':\n            case '2':\n            case '3':\n            case '4':\n            case '5':\n            case '6':\n            case '7':\n            case '8':\n            case '9':\n            {\n                add(current);\n                goto scan_number_any2;\n            }\n\n            default:\n            {\n                error_message =\n                    \"invalid number; expected '+', '-', or digit after exponent\";\n                return token_type::parse_error;\n            }\n        }\n\nscan_number_sign:\n        // we just parsed an exponent sign\n        switch (get())\n        {\n            case '0':\n            case '1':\n            case '2':\n            case '3':\n            case '4':\n   
         case '5':\n            case '6':\n            case '7':\n            case '8':\n            case '9':\n            {\n                add(current);\n                goto scan_number_any2;\n            }\n\n            default:\n            {\n                error_message = \"invalid number; expected digit after exponent sign\";\n                return token_type::parse_error;\n            }\n        }\n\nscan_number_any2:\n        // we just parsed a number after the exponent or exponent sign\n        switch (get())\n        {\n            case '0':\n            case '1':\n            case '2':\n            case '3':\n            case '4':\n            case '5':\n            case '6':\n            case '7':\n            case '8':\n            case '9':\n            {\n                add(current);\n                goto scan_number_any2;\n            }\n\n            default:\n                goto scan_number_done;\n        }\n\nscan_number_done:\n        // unget the character after the number (we only read it to know that\n        // we are done scanning a number)\n        unget();\n\n        char* endptr = nullptr; // NOLINT(cppcoreguidelines-pro-type-vararg,hicpp-vararg)\n        errno = 0;\n\n        // try to parse integers first and fall back to floats\n        if (number_type == token_type::value_unsigned)\n        {\n            const auto x = std::strtoull(token_buffer.data(), &endptr, 10);\n\n            // we checked the number format before\n            JSON_ASSERT(endptr == token_buffer.data() + token_buffer.size());\n\n            if (errno == 0)\n            {\n                value_unsigned = static_cast<number_unsigned_t>(x);\n                if (value_unsigned == x)\n                {\n                    return token_type::value_unsigned;\n                }\n            }\n        }\n        else if (number_type == token_type::value_integer)\n        {\n            const auto x = std::strtoll(token_buffer.data(), &endptr, 10);\n\n       
     // we checked the number format before\n            JSON_ASSERT(endptr == token_buffer.data() + token_buffer.size());\n\n            if (errno == 0)\n            {\n                value_integer = static_cast<number_integer_t>(x);\n                if (value_integer == x)\n                {\n                    return token_type::value_integer;\n                }\n            }\n        }\n\n        // this code is reached if we parse a floating-point number or if an\n        // integer conversion above failed\n        strtof(value_float, token_buffer.data(), &endptr);\n\n        // we checked the number format before\n        JSON_ASSERT(endptr == token_buffer.data() + token_buffer.size());\n\n        return token_type::value_float;\n    }\n\n    /*!\n    @param[in] literal_text  the literal text to expect\n    @param[in] length        the length of the passed literal text\n    @param[in] return_type   the token type to return on success\n    */\n    JSON_HEDLEY_NON_NULL(2)\n    token_type scan_literal(const char_type* literal_text, const std::size_t length,\n                            token_type return_type)\n    {\n        JSON_ASSERT(std::char_traits<char_type>::to_char_type(current) == literal_text[0]);\n        for (std::size_t i = 1; i < length; ++i)\n        {\n            if (JSON_HEDLEY_UNLIKELY(std::char_traits<char_type>::to_char_type(get()) != literal_text[i]))\n            {\n                error_message = \"invalid literal\";\n                return token_type::parse_error;\n            }\n        }\n        return return_type;\n    }\n\n    /////////////////////\n    // input management\n    /////////////////////\n\n    /// reset token_buffer; current character is beginning of token\n    void reset() noexcept\n    {\n        token_buffer.clear();\n        token_string.clear();\n        token_string.push_back(std::char_traits<char_type>::to_char_type(current));\n    }\n\n    /*\n    @brief get next character from the input\n\n    This function 
provides the interface to the used input adapter. It does\n    not throw in case the input reached EOF, but returns a\n    `std::char_traits<char>::eof()` in that case.  Stores the scanned characters\n    for use in error messages.\n\n    @return character read from the input\n    */\n    char_int_type get()\n    {\n        ++position.chars_read_total;\n        ++position.chars_read_current_line;\n\n        if (next_unget)\n        {\n            // just reset the next_unget variable and work with current\n            next_unget = false;\n        }\n        else\n        {\n            current = ia.get_character();\n        }\n\n        if (JSON_HEDLEY_LIKELY(current != std::char_traits<char_type>::eof()))\n        {\n            token_string.push_back(std::char_traits<char_type>::to_char_type(current));\n        }\n\n        if (current == '\\n')\n        {\n            ++position.lines_read;\n            position.chars_read_current_line = 0;\n        }\n\n        return current;\n    }\n\n    /*!\n    @brief unget current character (read it again on next get)\n\n    We implement unget by setting variable next_unget to true. The input is not\n    changed - we just simulate ungetting by modifying chars_read_total,\n    chars_read_current_line, and token_string. 
The next call to get() will\n    behave as if the unget character is read again.\n    */\n    void unget()\n    {\n        next_unget = true;\n\n        --position.chars_read_total;\n\n        // in case we \"unget\" a newline, we have to also decrement the lines_read\n        if (position.chars_read_current_line == 0)\n        {\n            if (position.lines_read > 0)\n            {\n                --position.lines_read;\n            }\n        }\n        else\n        {\n            --position.chars_read_current_line;\n        }\n\n        if (JSON_HEDLEY_LIKELY(current != std::char_traits<char_type>::eof()))\n        {\n            JSON_ASSERT(!token_string.empty());\n            token_string.pop_back();\n        }\n    }\n\n    /// add a character to token_buffer\n    void add(char_int_type c)\n    {\n        token_buffer.push_back(static_cast<typename string_t::value_type>(c));\n    }\n\n  public:\n    /////////////////////\n    // value getters\n    /////////////////////\n\n    /// return integer value\n    constexpr number_integer_t get_number_integer() const noexcept\n    {\n        return value_integer;\n    }\n\n    /// return unsigned integer value\n    constexpr number_unsigned_t get_number_unsigned() const noexcept\n    {\n        return value_unsigned;\n    }\n\n    /// return floating-point value\n    constexpr number_float_t get_number_float() const noexcept\n    {\n        return value_float;\n    }\n\n    /// return current string value (implicitly resets the token; useful only once)\n    string_t& get_string()\n    {\n        return token_buffer;\n    }\n\n    /////////////////////\n    // diagnostics\n    /////////////////////\n\n    /// return position of last read token\n    constexpr position_t get_position() const noexcept\n    {\n        return position;\n    }\n\n    /// return the last read token (for errors only).  
Will never contain EOF\n    /// (an arbitrary value that is not a valid char value, often -1), because\n    /// 255 may legitimately occur.  May contain NUL, which should be escaped.\n    std::string get_token_string() const\n    {\n        // escape control characters\n        std::string result;\n        for (const auto c : token_string)\n        {\n            if (static_cast<unsigned char>(c) <= '\\x1F')\n            {\n                // escape control characters\n                std::array<char, 9> cs{{}};\n                (std::snprintf)(cs.data(), cs.size(), \"<U+%.4X>\", static_cast<unsigned char>(c)); // NOLINT(cppcoreguidelines-pro-type-vararg,hicpp-vararg)\n                result += cs.data();\n            }\n            else\n            {\n                // add character as is\n                result.push_back(static_cast<std::string::value_type>(c));\n            }\n        }\n\n        return result;\n    }\n\n    /// return syntax error message\n    JSON_HEDLEY_RETURNS_NON_NULL\n    constexpr const char* get_error_message() const noexcept\n    {\n        return error_message;\n    }\n\n    /////////////////////\n    // actual scanner\n    /////////////////////\n\n    /*!\n    @brief skip the UTF-8 byte order mark\n    @return true iff there is no BOM or the correct BOM has been skipped\n    */\n    bool skip_bom()\n    {\n        if (get() == 0xEF)\n        {\n            // check if we completely parsed the BOM\n            return get() == 0xBB && get() == 0xBF;\n        }\n\n        // the first character is not the beginning of the BOM; unget it to\n        // process it later\n        unget();\n        return true;\n    }\n\n    void skip_whitespace()\n    {\n        do\n        {\n            get();\n        }\n        while (current == ' ' || current == '\\t' || current == '\\n' || current == '\\r');\n    }\n\n    token_type scan()\n    {\n        // initially, skip the BOM\n        if (position.chars_read_total == 0 && !skip_bom())\n        
{\n            error_message = \"invalid BOM; must be 0xEF 0xBB 0xBF if given\";\n            return token_type::parse_error;\n        }\n\n        // read next character and ignore whitespace\n        skip_whitespace();\n\n        // ignore comments\n        while (ignore_comments && current == '/')\n        {\n            if (!scan_comment())\n            {\n                return token_type::parse_error;\n            }\n\n            // skip following whitespace\n            skip_whitespace();\n        }\n\n        switch (current)\n        {\n            // structural characters\n            case '[':\n                return token_type::begin_array;\n            case ']':\n                return token_type::end_array;\n            case '{':\n                return token_type::begin_object;\n            case '}':\n                return token_type::end_object;\n            case ':':\n                return token_type::name_separator;\n            case ',':\n                return token_type::value_separator;\n\n            // literals\n            case 't':\n            {\n                std::array<char_type, 4> true_literal = {{char_type('t'), char_type('r'), char_type('u'), char_type('e')}};\n                return scan_literal(true_literal.data(), true_literal.size(), token_type::literal_true);\n            }\n            case 'f':\n            {\n                std::array<char_type, 5> false_literal = {{char_type('f'), char_type('a'), char_type('l'), char_type('s'), char_type('e')}};\n                return scan_literal(false_literal.data(), false_literal.size(), token_type::literal_false);\n            }\n            case 'n':\n            {\n                std::array<char_type, 4> null_literal = {{char_type('n'), char_type('u'), char_type('l'), char_type('l')}};\n                return scan_literal(null_literal.data(), null_literal.size(), token_type::literal_null);\n            }\n\n            // string\n            case '\\\"':\n                
return scan_string();\n\n            // number\n            case '-':\n            case '0':\n            case '1':\n            case '2':\n            case '3':\n            case '4':\n            case '5':\n            case '6':\n            case '7':\n            case '8':\n            case '9':\n                return scan_number();\n\n            // end of input (the null byte is needed when parsing from\n            // string literals)\n            case '\\0':\n            case std::char_traits<char_type>::eof():\n                return token_type::end_of_input;\n\n            // error\n            default:\n                error_message = \"invalid literal\";\n                return token_type::parse_error;\n        }\n    }\n\n  private:\n    /// input adapter\n    InputAdapterType ia;\n\n    /// whether comments should be ignored (true) or signaled as errors (false)\n    const bool ignore_comments = false;\n\n    /// the current character\n    char_int_type current = std::char_traits<char_type>::eof();\n\n    /// whether the next get() call should just return current\n    bool next_unget = false;\n\n    /// the start position of the current token\n    position_t position {};\n\n    /// raw input token string (for error messages)\n    std::vector<char_type> token_string {};\n\n    /// buffer for variable-length tokens (numbers, strings)\n    string_t token_buffer {};\n\n    /// a description of occurred lexer errors\n    const char* error_message = \"\";\n\n    // number values\n    number_integer_t value_integer = 0;\n    number_unsigned_t value_unsigned = 0;\n    number_float_t value_float = 0;\n\n    /// the decimal point\n    const char_int_type decimal_point_char = '.';\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n// #include <nlohmann/detail/meta/is_sax.hpp>\n\n\n#include <cstdint> // size_t\n#include <utility> // declval\n#include <string> // string\n\n// #include 
<nlohmann/detail/meta/detected.hpp>\n\n// #include <nlohmann/detail/meta/type_traits.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\ntemplate<typename T>\nusing null_function_t = decltype(std::declval<T&>().null());\n\ntemplate<typename T>\nusing boolean_function_t =\n    decltype(std::declval<T&>().boolean(std::declval<bool>()));\n\ntemplate<typename T, typename Integer>\nusing number_integer_function_t =\n    decltype(std::declval<T&>().number_integer(std::declval<Integer>()));\n\ntemplate<typename T, typename Unsigned>\nusing number_unsigned_function_t =\n    decltype(std::declval<T&>().number_unsigned(std::declval<Unsigned>()));\n\ntemplate<typename T, typename Float, typename String>\nusing number_float_function_t = decltype(std::declval<T&>().number_float(\n                                    std::declval<Float>(), std::declval<const String&>()));\n\ntemplate<typename T, typename String>\nusing string_function_t =\n    decltype(std::declval<T&>().string(std::declval<String&>()));\n\ntemplate<typename T, typename Binary>\nusing binary_function_t =\n    decltype(std::declval<T&>().binary(std::declval<Binary&>()));\n\ntemplate<typename T>\nusing start_object_function_t =\n    decltype(std::declval<T&>().start_object(std::declval<std::size_t>()));\n\ntemplate<typename T, typename String>\nusing key_function_t =\n    decltype(std::declval<T&>().key(std::declval<String&>()));\n\ntemplate<typename T>\nusing end_object_function_t = decltype(std::declval<T&>().end_object());\n\ntemplate<typename T>\nusing start_array_function_t =\n    decltype(std::declval<T&>().start_array(std::declval<std::size_t>()));\n\ntemplate<typename T>\nusing end_array_function_t = decltype(std::declval<T&>().end_array());\n\ntemplate<typename T, typename Exception>\nusing parse_error_function_t = decltype(std::declval<T&>().parse_error(\n        std::declval<std::size_t>(), std::declval<const std::string&>(),\n        std::declval<const Exception&>()));\n\ntemplate<typename SAX, 
typename BasicJsonType>\nstruct is_sax\n{\n  private:\n    static_assert(is_basic_json<BasicJsonType>::value,\n                  \"BasicJsonType must be of type basic_json<...>\");\n\n    using number_integer_t = typename BasicJsonType::number_integer_t;\n    using number_unsigned_t = typename BasicJsonType::number_unsigned_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n    using string_t = typename BasicJsonType::string_t;\n    using binary_t = typename BasicJsonType::binary_t;\n    using exception_t = typename BasicJsonType::exception;\n\n  public:\n    static constexpr bool value =\n        is_detected_exact<bool, null_function_t, SAX>::value &&\n        is_detected_exact<bool, boolean_function_t, SAX>::value &&\n        is_detected_exact<bool, number_integer_function_t, SAX, number_integer_t>::value &&\n        is_detected_exact<bool, number_unsigned_function_t, SAX, number_unsigned_t>::value &&\n        is_detected_exact<bool, number_float_function_t, SAX, number_float_t, string_t>::value &&\n        is_detected_exact<bool, string_function_t, SAX, string_t>::value &&\n        is_detected_exact<bool, binary_function_t, SAX, binary_t>::value &&\n        is_detected_exact<bool, start_object_function_t, SAX>::value &&\n        is_detected_exact<bool, key_function_t, SAX, string_t>::value &&\n        is_detected_exact<bool, end_object_function_t, SAX>::value &&\n        is_detected_exact<bool, start_array_function_t, SAX>::value &&\n        is_detected_exact<bool, end_array_function_t, SAX>::value &&\n        is_detected_exact<bool, parse_error_function_t, SAX, exception_t>::value;\n};\n\ntemplate<typename SAX, typename BasicJsonType>\nstruct is_sax_static_asserts\n{\n  private:\n    static_assert(is_basic_json<BasicJsonType>::value,\n                  \"BasicJsonType must be of type basic_json<...>\");\n\n    using number_integer_t = typename BasicJsonType::number_integer_t;\n    using number_unsigned_t = typename 
BasicJsonType::number_unsigned_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n    using string_t = typename BasicJsonType::string_t;\n    using binary_t = typename BasicJsonType::binary_t;\n    using exception_t = typename BasicJsonType::exception;\n\n  public:\n    static_assert(is_detected_exact<bool, null_function_t, SAX>::value,\n                  \"Missing/invalid function: bool null()\");\n    static_assert(is_detected_exact<bool, boolean_function_t, SAX>::value,\n                  \"Missing/invalid function: bool boolean(bool)\");\n    static_assert(is_detected_exact<bool, boolean_function_t, SAX>::value,\n                  \"Missing/invalid function: bool boolean(bool)\");\n    static_assert(\n        is_detected_exact<bool, number_integer_function_t, SAX,\n        number_integer_t>::value,\n        \"Missing/invalid function: bool number_integer(number_integer_t)\");\n    static_assert(\n        is_detected_exact<bool, number_unsigned_function_t, SAX,\n        number_unsigned_t>::value,\n        \"Missing/invalid function: bool number_unsigned(number_unsigned_t)\");\n    static_assert(is_detected_exact<bool, number_float_function_t, SAX,\n                  number_float_t, string_t>::value,\n                  \"Missing/invalid function: bool number_float(number_float_t, const string_t&)\");\n    static_assert(\n        is_detected_exact<bool, string_function_t, SAX, string_t>::value,\n        \"Missing/invalid function: bool string(string_t&)\");\n    static_assert(\n        is_detected_exact<bool, binary_function_t, SAX, binary_t>::value,\n        \"Missing/invalid function: bool binary(binary_t&)\");\n    static_assert(is_detected_exact<bool, start_object_function_t, SAX>::value,\n                  \"Missing/invalid function: bool start_object(std::size_t)\");\n    static_assert(is_detected_exact<bool, key_function_t, SAX, string_t>::value,\n                  \"Missing/invalid function: bool key(string_t&)\");\n    
static_assert(is_detected_exact<bool, end_object_function_t, SAX>::value,\n                  \"Missing/invalid function: bool end_object()\");\n    static_assert(is_detected_exact<bool, start_array_function_t, SAX>::value,\n                  \"Missing/invalid function: bool start_array(std::size_t)\");\n    static_assert(is_detected_exact<bool, end_array_function_t, SAX>::value,\n                  \"Missing/invalid function: bool end_array()\");\n    static_assert(\n        is_detected_exact<bool, parse_error_function_t, SAX, exception_t>::value,\n        \"Missing/invalid function: bool parse_error(std::size_t, const \"\n        \"std::string&, const exception&)\");\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/value_t.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n\n/// how to treat CBOR tags\nenum class cbor_tag_handler_t\n{\n    error,  ///< throw a parse_error exception in case of a tag\n    ignore   ///< ignore tags\n};\n\n/*!\n@brief determine system byte order\n\n@return true if and only if system's byte order is little endian\n\n@note from https://stackoverflow.com/a/1001328/266378\n*/\nstatic inline bool little_endianess(int num = 1) noexcept\n{\n    return *reinterpret_cast<char*>(&num) == 1;\n}\n\n\n///////////////////\n// binary reader //\n///////////////////\n\n/*!\n@brief deserialization of CBOR, MessagePack, and UBJSON values\n*/\ntemplate<typename BasicJsonType, typename InputAdapterType, typename SAX = json_sax_dom_parser<BasicJsonType>>\nclass binary_reader\n{\n    using number_integer_t = typename BasicJsonType::number_integer_t;\n    using number_unsigned_t = typename BasicJsonType::number_unsigned_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n    using string_t = typename BasicJsonType::string_t;\n    using binary_t = typename BasicJsonType::binary_t;\n    using json_sax_t = SAX;\n    using char_type = typename InputAdapterType::char_type;\n    using char_int_type = 
typename std::char_traits<char_type>::int_type;\n\n  public:\n    /*!\n    @brief create a binary reader\n\n    @param[in] adapter  input adapter to read from\n    */\n    explicit binary_reader(InputAdapterType&& adapter) noexcept : ia(std::move(adapter))\n    {\n        (void)detail::is_sax_static_asserts<SAX, BasicJsonType> {};\n    }\n\n    // make class move-only\n    binary_reader(const binary_reader&) = delete;\n    binary_reader(binary_reader&&) = default; // NOLINT(hicpp-noexcept-move,performance-noexcept-move-constructor)\n    binary_reader& operator=(const binary_reader&) = delete;\n    binary_reader& operator=(binary_reader&&) = default; // NOLINT(hicpp-noexcept-move,performance-noexcept-move-constructor)\n    ~binary_reader() = default;\n\n    /*!\n    @param[in] format  the binary format to parse\n    @param[in] sax_    a SAX event processor\n    @param[in] strict  whether to expect the input to be consumed completely\n    @param[in] tag_handler  how to treat CBOR tags\n\n    @return whether parsing was successful\n    */\n    JSON_HEDLEY_NON_NULL(3)\n    bool sax_parse(const input_format_t format,\n                   json_sax_t* sax_,\n                   const bool strict = true,\n                   const cbor_tag_handler_t tag_handler = cbor_tag_handler_t::error)\n    {\n        sax = sax_;\n        bool result = false;\n\n        switch (format)\n        {\n            case input_format_t::bson:\n                result = parse_bson_internal();\n                break;\n\n            case input_format_t::cbor:\n                result = parse_cbor_internal(true, tag_handler);\n                break;\n\n            case input_format_t::msgpack:\n                result = parse_msgpack_internal();\n                break;\n\n            case input_format_t::ubjson:\n                result = parse_ubjson_internal();\n                break;\n\n            default:            // LCOV_EXCL_LINE\n                JSON_ASSERT(false); // 
NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert) LCOV_EXCL_LINE\n        }\n\n        // strict mode: next byte must be EOF\n        if (result && strict)\n        {\n            if (format == input_format_t::ubjson)\n            {\n                get_ignore_noop();\n            }\n            else\n            {\n                get();\n            }\n\n            if (JSON_HEDLEY_UNLIKELY(current != std::char_traits<char_type>::eof()))\n            {\n                return sax->parse_error(chars_read, get_token_string(),\n                                        parse_error::create(110, chars_read, exception_message(format, \"expected end of input; last byte: 0x\" + get_token_string(), \"value\"), BasicJsonType()));\n            }\n        }\n\n        return result;\n    }\n\n  private:\n    //////////\n    // BSON //\n    //////////\n\n    /*!\n    @brief Reads in a BSON-object and passes it to the SAX-parser.\n    @return whether a valid BSON-value was passed to the SAX parser\n    */\n    bool parse_bson_internal()\n    {\n        std::int32_t document_size{};\n        get_number<std::int32_t, true>(input_format_t::bson, document_size);\n\n        if (JSON_HEDLEY_UNLIKELY(!sax->start_object(std::size_t(-1))))\n        {\n            return false;\n        }\n\n        if (JSON_HEDLEY_UNLIKELY(!parse_bson_element_list(/*is_array*/false)))\n        {\n            return false;\n        }\n\n        return sax->end_object();\n    }\n\n    /*!\n    @brief Parses a C-style string from the BSON input.\n    @param[in,out] result  A reference to the string variable where the read\n                            string is to be stored.\n    @return `true` if the \\x00-byte indicating the end of the string was\n             encountered before the EOF; `false` indicates an unexpected EOF.\n    */\n    bool get_bson_cstr(string_t& result)\n    {\n        auto out = std::back_inserter(result);\n        while (true)\n        {\n            get();\n            if 
(JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::bson, \"cstring\")))\n            {\n                return false;\n            }\n            if (current == 0x00)\n            {\n                return true;\n            }\n            *out++ = static_cast<typename string_t::value_type>(current);\n        }\n    }\n\n    /*!\n    @brief Parses a zero-terminated string of length @a len from the BSON\n           input.\n    @param[in] len  The length (including the zero-byte at the end) of the\n                    string to be read.\n    @param[in,out] result  A reference to the string variable where the read\n                            string is to be stored.\n    @tparam NumberType The type of the length @a len\n    @pre len >= 1\n    @return `true` if the string was successfully parsed\n    */\n    template<typename NumberType>\n    bool get_bson_string(const NumberType len, string_t& result)\n    {\n        if (JSON_HEDLEY_UNLIKELY(len < 1))\n        {\n            auto last_token = get_token_string();\n            return sax->parse_error(chars_read, last_token, parse_error::create(112, chars_read, exception_message(input_format_t::bson, \"string length must be at least 1, is \" + std::to_string(len), \"string\"), BasicJsonType()));\n        }\n\n        return get_string(input_format_t::bson, len - static_cast<NumberType>(1), result) && get() != std::char_traits<char_type>::eof();\n    }\n\n    /*!\n    @brief Parses a byte array input of length @a len from the BSON input.\n    @param[in] len  The length of the byte array to be read.\n    @param[in,out] result  A reference to the binary variable where the read\n                            array is to be stored.\n    @tparam NumberType The type of the length @a len\n    @pre len >= 0\n    @return `true` if the byte array was successfully parsed\n    */\n    template<typename NumberType>\n    bool get_bson_binary(const NumberType len, binary_t& result)\n    {\n        if (JSON_HEDLEY_UNLIKELY(len < 0))\n     
   {\n            auto last_token = get_token_string();\n            return sax->parse_error(chars_read, last_token, parse_error::create(112, chars_read, exception_message(input_format_t::bson, \"byte array length cannot be negative, is \" + std::to_string(len), \"binary\"), BasicJsonType()));\n        }\n\n        // All BSON binary values have a subtype\n        std::uint8_t subtype{};\n        get_number<std::uint8_t>(input_format_t::bson, subtype);\n        result.set_subtype(subtype);\n\n        return get_binary(input_format_t::bson, len, result);\n    }\n\n    /*!\n    @brief Read a BSON document element of the given @a element_type.\n    @param[in] element_type The BSON element type, c.f. http://bsonspec.org/spec.html\n    @param[in] element_type_parse_position The position in the input stream,\n               where the `element_type` was read.\n    @warning Not all BSON element types are supported yet. An unsupported\n             @a element_type will give rise to a parse_error.114:\n             Unsupported BSON record type 0x...\n    @return whether a valid BSON-object/array was passed to the SAX parser\n    */\n    bool parse_bson_element_internal(const char_int_type element_type,\n                                     const std::size_t element_type_parse_position)\n    {\n        switch (element_type)\n        {\n            case 0x01: // double\n            {\n                double number{};\n                return get_number<double, true>(input_format_t::bson, number) && sax->number_float(static_cast<number_float_t>(number), \"\");\n            }\n\n            case 0x02: // string\n            {\n                std::int32_t len{};\n                string_t value;\n                return get_number<std::int32_t, true>(input_format_t::bson, len) && get_bson_string(len, value) && sax->string(value);\n            }\n\n            case 0x03: // object\n            {\n                return parse_bson_internal();\n            }\n\n            case 0x04: 
// array\n            {\n                return parse_bson_array();\n            }\n\n            case 0x05: // binary\n            {\n                std::int32_t len{};\n                binary_t value;\n                return get_number<std::int32_t, true>(input_format_t::bson, len) && get_bson_binary(len, value) && sax->binary(value);\n            }\n\n            case 0x08: // boolean\n            {\n                return sax->boolean(get() != 0);\n            }\n\n            case 0x0A: // null\n            {\n                return sax->null();\n            }\n\n            case 0x10: // int32\n            {\n                std::int32_t value{};\n                return get_number<std::int32_t, true>(input_format_t::bson, value) && sax->number_integer(value);\n            }\n\n            case 0x12: // int64\n            {\n                std::int64_t value{};\n                return get_number<std::int64_t, true>(input_format_t::bson, value) && sax->number_integer(value);\n            }\n\n            default: // anything else not supported (yet)\n            {\n                std::array<char, 3> cr{{}};\n                (std::snprintf)(cr.data(), cr.size(), \"%.2hhX\", static_cast<unsigned char>(element_type)); // NOLINT(cppcoreguidelines-pro-type-vararg,hicpp-vararg)\n                return sax->parse_error(element_type_parse_position, std::string(cr.data()), parse_error::create(114, element_type_parse_position, \"Unsupported BSON record type 0x\" + std::string(cr.data()), BasicJsonType()));\n            }\n        }\n    }\n\n    /*!\n    @brief Read a BSON element list (as specified in the BSON-spec)\n\n    The same binary layout is used for objects and arrays, hence it must be\n    indicated with the argument @a is_array which one is expected\n    (true --> array, false --> object).\n\n    @param[in] is_array Determines if the element list being read is to be\n                        treated as an object (@a is_array == false), or as an\n             
           array (@a is_array == true).\n    @return whether a valid BSON-object/array was passed to the SAX parser\n    */\n    bool parse_bson_element_list(const bool is_array)\n    {\n        string_t key;\n\n        while (auto element_type = get())\n        {\n            if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::bson, \"element list\")))\n            {\n                return false;\n            }\n\n            const std::size_t element_type_parse_position = chars_read;\n            if (JSON_HEDLEY_UNLIKELY(!get_bson_cstr(key)))\n            {\n                return false;\n            }\n\n            if (!is_array && !sax->key(key))\n            {\n                return false;\n            }\n\n            if (JSON_HEDLEY_UNLIKELY(!parse_bson_element_internal(element_type, element_type_parse_position)))\n            {\n                return false;\n            }\n\n            // get_bson_cstr only appends\n            key.clear();\n        }\n\n        return true;\n    }\n\n    /*!\n    @brief Reads an array from the BSON input and passes it to the SAX-parser.\n    @return whether a valid BSON-array was passed to the SAX parser\n    */\n    bool parse_bson_array()\n    {\n        std::int32_t document_size{};\n        get_number<std::int32_t, true>(input_format_t::bson, document_size);\n\n        if (JSON_HEDLEY_UNLIKELY(!sax->start_array(std::size_t(-1))))\n        {\n            return false;\n        }\n\n        if (JSON_HEDLEY_UNLIKELY(!parse_bson_element_list(/*is_array*/true)))\n        {\n            return false;\n        }\n\n        return sax->end_array();\n    }\n\n    //////////\n    // CBOR //\n    //////////\n\n    /*!\n    @param[in] get_char  whether a new character should be retrieved from the\n                         input (true) or whether the last read character should\n                         be considered instead (false)\n    @param[in] tag_handler how CBOR tags should be treated\n\n    @return whether a valid 
CBOR value was passed to the SAX parser\n    */\n    bool parse_cbor_internal(const bool get_char,\n                             const cbor_tag_handler_t tag_handler)\n    {\n        switch (get_char ? get() : current)\n        {\n            // EOF\n            case std::char_traits<char_type>::eof():\n                return unexpect_eof(input_format_t::cbor, \"value\");\n\n            // Integer 0x00..0x17 (0..23)\n            case 0x00:\n            case 0x01:\n            case 0x02:\n            case 0x03:\n            case 0x04:\n            case 0x05:\n            case 0x06:\n            case 0x07:\n            case 0x08:\n            case 0x09:\n            case 0x0A:\n            case 0x0B:\n            case 0x0C:\n            case 0x0D:\n            case 0x0E:\n            case 0x0F:\n            case 0x10:\n            case 0x11:\n            case 0x12:\n            case 0x13:\n            case 0x14:\n            case 0x15:\n            case 0x16:\n            case 0x17:\n                return sax->number_unsigned(static_cast<number_unsigned_t>(current));\n\n            case 0x18: // Unsigned integer (one-byte uint8_t follows)\n            {\n                std::uint8_t number{};\n                return get_number(input_format_t::cbor, number) && sax->number_unsigned(number);\n            }\n\n            case 0x19: // Unsigned integer (two-byte uint16_t follows)\n            {\n                std::uint16_t number{};\n                return get_number(input_format_t::cbor, number) && sax->number_unsigned(number);\n            }\n\n            case 0x1A: // Unsigned integer (four-byte uint32_t follows)\n            {\n                std::uint32_t number{};\n                return get_number(input_format_t::cbor, number) && sax->number_unsigned(number);\n            }\n\n            case 0x1B: // Unsigned integer (eight-byte uint64_t follows)\n            {\n                std::uint64_t number{};\n                return get_number(input_format_t::cbor, 
number) && sax->number_unsigned(number);\n            }\n\n            // Negative integer -1-0x00..-1-0x17 (-1..-24)\n            case 0x20:\n            case 0x21:\n            case 0x22:\n            case 0x23:\n            case 0x24:\n            case 0x25:\n            case 0x26:\n            case 0x27:\n            case 0x28:\n            case 0x29:\n            case 0x2A:\n            case 0x2B:\n            case 0x2C:\n            case 0x2D:\n            case 0x2E:\n            case 0x2F:\n            case 0x30:\n            case 0x31:\n            case 0x32:\n            case 0x33:\n            case 0x34:\n            case 0x35:\n            case 0x36:\n            case 0x37:\n                return sax->number_integer(static_cast<std::int8_t>(0x20 - 1 - current));\n\n            case 0x38: // Negative integer (one-byte uint8_t follows)\n            {\n                std::uint8_t number{};\n                return get_number(input_format_t::cbor, number) && sax->number_integer(static_cast<number_integer_t>(-1) - number);\n            }\n\n            case 0x39: // Negative integer -1-n (two-byte uint16_t follows)\n            {\n                std::uint16_t number{};\n                return get_number(input_format_t::cbor, number) && sax->number_integer(static_cast<number_integer_t>(-1) - number);\n            }\n\n            case 0x3A: // Negative integer -1-n (four-byte uint32_t follows)\n            {\n                std::uint32_t number{};\n                return get_number(input_format_t::cbor, number) && sax->number_integer(static_cast<number_integer_t>(-1) - number);\n            }\n\n            case 0x3B: // Negative integer -1-n (eight-byte uint64_t follows)\n            {\n                std::uint64_t number{};\n                return get_number(input_format_t::cbor, number) && sax->number_integer(static_cast<number_integer_t>(-1)\n                        - static_cast<number_integer_t>(number));\n            }\n\n            // Binary data 
(0x00..0x17 bytes follow)\n            case 0x40:\n            case 0x41:\n            case 0x42:\n            case 0x43:\n            case 0x44:\n            case 0x45:\n            case 0x46:\n            case 0x47:\n            case 0x48:\n            case 0x49:\n            case 0x4A:\n            case 0x4B:\n            case 0x4C:\n            case 0x4D:\n            case 0x4E:\n            case 0x4F:\n            case 0x50:\n            case 0x51:\n            case 0x52:\n            case 0x53:\n            case 0x54:\n            case 0x55:\n            case 0x56:\n            case 0x57:\n            case 0x58: // Binary data (one-byte uint8_t for n follows)\n            case 0x59: // Binary data (two-byte uint16_t for n follow)\n            case 0x5A: // Binary data (four-byte uint32_t for n follow)\n            case 0x5B: // Binary data (eight-byte uint64_t for n follow)\n            case 0x5F: // Binary data (indefinite length)\n            {\n                binary_t b;\n                return get_cbor_binary(b) && sax->binary(b);\n            }\n\n            // UTF-8 string (0x00..0x17 bytes follow)\n            case 0x60:\n            case 0x61:\n            case 0x62:\n            case 0x63:\n            case 0x64:\n            case 0x65:\n            case 0x66:\n            case 0x67:\n            case 0x68:\n            case 0x69:\n            case 0x6A:\n            case 0x6B:\n            case 0x6C:\n            case 0x6D:\n            case 0x6E:\n            case 0x6F:\n            case 0x70:\n            case 0x71:\n            case 0x72:\n            case 0x73:\n            case 0x74:\n            case 0x75:\n            case 0x76:\n            case 0x77:\n            case 0x78: // UTF-8 string (one-byte uint8_t for n follows)\n            case 0x79: // UTF-8 string (two-byte uint16_t for n follow)\n            case 0x7A: // UTF-8 string (four-byte uint32_t for n follow)\n            case 0x7B: // UTF-8 string (eight-byte uint64_t for n 
follow)\n            case 0x7F: // UTF-8 string (indefinite length)\n            {\n                string_t s;\n                return get_cbor_string(s) && sax->string(s);\n            }\n\n            // array (0x00..0x17 data items follow)\n            case 0x80:\n            case 0x81:\n            case 0x82:\n            case 0x83:\n            case 0x84:\n            case 0x85:\n            case 0x86:\n            case 0x87:\n            case 0x88:\n            case 0x89:\n            case 0x8A:\n            case 0x8B:\n            case 0x8C:\n            case 0x8D:\n            case 0x8E:\n            case 0x8F:\n            case 0x90:\n            case 0x91:\n            case 0x92:\n            case 0x93:\n            case 0x94:\n            case 0x95:\n            case 0x96:\n            case 0x97:\n                return get_cbor_array(static_cast<std::size_t>(static_cast<unsigned int>(current) & 0x1Fu), tag_handler);\n\n            case 0x98: // array (one-byte uint8_t for n follows)\n            {\n                std::uint8_t len{};\n                return get_number(input_format_t::cbor, len) && get_cbor_array(static_cast<std::size_t>(len), tag_handler);\n            }\n\n            case 0x99: // array (two-byte uint16_t for n follow)\n            {\n                std::uint16_t len{};\n                return get_number(input_format_t::cbor, len) && get_cbor_array(static_cast<std::size_t>(len), tag_handler);\n            }\n\n            case 0x9A: // array (four-byte uint32_t for n follow)\n            {\n                std::uint32_t len{};\n                return get_number(input_format_t::cbor, len) && get_cbor_array(static_cast<std::size_t>(len), tag_handler);\n            }\n\n            case 0x9B: // array (eight-byte uint64_t for n follow)\n            {\n                std::uint64_t len{};\n                return get_number(input_format_t::cbor, len) && get_cbor_array(static_cast<std::size_t>(len), tag_handler);\n            }\n\n        
    case 0x9F: // array (indefinite length)\n                return get_cbor_array(std::size_t(-1), tag_handler);\n\n            // map (0x00..0x17 pairs of data items follow)\n            case 0xA0:\n            case 0xA1:\n            case 0xA2:\n            case 0xA3:\n            case 0xA4:\n            case 0xA5:\n            case 0xA6:\n            case 0xA7:\n            case 0xA8:\n            case 0xA9:\n            case 0xAA:\n            case 0xAB:\n            case 0xAC:\n            case 0xAD:\n            case 0xAE:\n            case 0xAF:\n            case 0xB0:\n            case 0xB1:\n            case 0xB2:\n            case 0xB3:\n            case 0xB4:\n            case 0xB5:\n            case 0xB6:\n            case 0xB7:\n                return get_cbor_object(static_cast<std::size_t>(static_cast<unsigned int>(current) & 0x1Fu), tag_handler);\n\n            case 0xB8: // map (one-byte uint8_t for n follows)\n            {\n                std::uint8_t len{};\n                return get_number(input_format_t::cbor, len) && get_cbor_object(static_cast<std::size_t>(len), tag_handler);\n            }\n\n            case 0xB9: // map (two-byte uint16_t for n follow)\n            {\n                std::uint16_t len{};\n                return get_number(input_format_t::cbor, len) && get_cbor_object(static_cast<std::size_t>(len), tag_handler);\n            }\n\n            case 0xBA: // map (four-byte uint32_t for n follow)\n            {\n                std::uint32_t len{};\n                return get_number(input_format_t::cbor, len) && get_cbor_object(static_cast<std::size_t>(len), tag_handler);\n            }\n\n            case 0xBB: // map (eight-byte uint64_t for n follow)\n            {\n                std::uint64_t len{};\n                return get_number(input_format_t::cbor, len) && get_cbor_object(static_cast<std::size_t>(len), tag_handler);\n            }\n\n            case 0xBF: // map (indefinite length)\n                return 
get_cbor_object(std::size_t(-1), tag_handler);\n\n            case 0xC6: // tagged item\n            case 0xC7:\n            case 0xC8:\n            case 0xC9:\n            case 0xCA:\n            case 0xCB:\n            case 0xCC:\n            case 0xCD:\n            case 0xCE:\n            case 0xCF:\n            case 0xD0:\n            case 0xD1:\n            case 0xD2:\n            case 0xD3:\n            case 0xD4:\n            case 0xD8: // tagged item (1 bytes follow)\n            case 0xD9: // tagged item (2 bytes follow)\n            case 0xDA: // tagged item (4 bytes follow)\n            case 0xDB: // tagged item (8 bytes follow)\n            {\n                switch (tag_handler)\n                {\n                    case cbor_tag_handler_t::error:\n                    {\n                        auto last_token = get_token_string();\n                        return sax->parse_error(chars_read, last_token, parse_error::create(112, chars_read, exception_message(input_format_t::cbor, \"invalid byte: 0x\" + last_token, \"value\"), BasicJsonType()));\n                    }\n\n                    case cbor_tag_handler_t::ignore:\n                    {\n                        switch (current)\n                        {\n                            case 0xD8:\n                            {\n                                std::uint8_t len{};\n                                get_number(input_format_t::cbor, len);\n                                break;\n                            }\n                            case 0xD9:\n                            {\n                                std::uint16_t len{};\n                                get_number(input_format_t::cbor, len);\n                                break;\n                            }\n                            case 0xDA:\n                            {\n                                std::uint32_t len{};\n                                get_number(input_format_t::cbor, len);\n                     
           break;\n                            }\n                            case 0xDB:\n                            {\n                                std::uint64_t len{};\n                                get_number(input_format_t::cbor, len);\n                                break;\n                            }\n                            default:\n                                break;\n                        }\n                        return parse_cbor_internal(true, tag_handler);\n                    }\n\n                    default:                 // LCOV_EXCL_LINE\n                        JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert) LCOV_EXCL_LINE\n                        return false;        // LCOV_EXCL_LINE\n                }\n            }\n\n            case 0xF4: // false\n                return sax->boolean(false);\n\n            case 0xF5: // true\n                return sax->boolean(true);\n\n            case 0xF6: // null\n                return sax->null();\n\n            case 0xF9: // Half-Precision Float (two-byte IEEE 754)\n            {\n                const auto byte1_raw = get();\n                if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::cbor, \"number\")))\n                {\n                    return false;\n                }\n                const auto byte2_raw = get();\n                if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::cbor, \"number\")))\n                {\n                    return false;\n                }\n\n                const auto byte1 = static_cast<unsigned char>(byte1_raw);\n                const auto byte2 = static_cast<unsigned char>(byte2_raw);\n\n                // code from RFC 7049, Appendix D, Figure 3:\n                // As half-precision floating-point numbers were only added\n                // to IEEE 754 in 2008, today's programming platforms often\n                // still only have limited support for them. 
It is very\n                // easy to include at least decoding support for them even\n                // without such support. An example of a small decoder for\n                // half-precision floating-point numbers in the C language\n                // is shown in Fig. 3.\n                const auto half = static_cast<unsigned int>((byte1 << 8u) + byte2);\n                const double val = [&half]\n                {\n                    const int exp = (half >> 10u) & 0x1Fu;\n                    const unsigned int mant = half & 0x3FFu;\n                    JSON_ASSERT(0 <= exp&& exp <= 32);\n                    JSON_ASSERT(mant <= 1024);\n                    switch (exp)\n                    {\n                        case 0:\n                            return std::ldexp(mant, -24);\n                        case 31:\n                            return (mant == 0)\n                            ? std::numeric_limits<double>::infinity()\n                            : std::numeric_limits<double>::quiet_NaN();\n                        default:\n                            return std::ldexp(mant + 1024, exp - 25);\n                    }\n                }();\n                return sax->number_float((half & 0x8000u) != 0\n                                         ? 
static_cast<number_float_t>(-val)\n                                         : static_cast<number_float_t>(val), \"\");\n            }\n\n            case 0xFA: // Single-Precision Float (four-byte IEEE 754)\n            {\n                float number{};\n                return get_number(input_format_t::cbor, number) && sax->number_float(static_cast<number_float_t>(number), \"\");\n            }\n\n            case 0xFB: // Double-Precision Float (eight-byte IEEE 754)\n            {\n                double number{};\n                return get_number(input_format_t::cbor, number) && sax->number_float(static_cast<number_float_t>(number), \"\");\n            }\n\n            default: // anything else (0xFF is handled inside the other types)\n            {\n                auto last_token = get_token_string();\n                return sax->parse_error(chars_read, last_token, parse_error::create(112, chars_read, exception_message(input_format_t::cbor, \"invalid byte: 0x\" + last_token, \"value\"), BasicJsonType()));\n            }\n        }\n    }\n\n    /*!\n    @brief reads a CBOR string\n\n    This function first reads starting bytes to determine the expected\n    string length and then copies this number of bytes into a string.\n    Additionally, CBOR's strings with indefinite lengths are supported.\n\n    @param[out] result  created string\n\n    @return whether string creation completed\n    */\n    bool get_cbor_string(string_t& result)\n    {\n        if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::cbor, \"string\")))\n        {\n            return false;\n        }\n\n        switch (current)\n        {\n            // UTF-8 string (0x00..0x17 bytes follow)\n            case 0x60:\n            case 0x61:\n            case 0x62:\n            case 0x63:\n            case 0x64:\n            case 0x65:\n            case 0x66:\n            case 0x67:\n            case 0x68:\n            case 0x69:\n            case 0x6A:\n            case 0x6B:\n            
case 0x6C:\n            case 0x6D:\n            case 0x6E:\n            case 0x6F:\n            case 0x70:\n            case 0x71:\n            case 0x72:\n            case 0x73:\n            case 0x74:\n            case 0x75:\n            case 0x76:\n            case 0x77:\n            {\n                return get_string(input_format_t::cbor, static_cast<unsigned int>(current) & 0x1Fu, result);\n            }\n\n            case 0x78: // UTF-8 string (one-byte uint8_t for n follows)\n            {\n                std::uint8_t len{};\n                return get_number(input_format_t::cbor, len) && get_string(input_format_t::cbor, len, result);\n            }\n\n            case 0x79: // UTF-8 string (two-byte uint16_t for n follow)\n            {\n                std::uint16_t len{};\n                return get_number(input_format_t::cbor, len) && get_string(input_format_t::cbor, len, result);\n            }\n\n            case 0x7A: // UTF-8 string (four-byte uint32_t for n follow)\n            {\n                std::uint32_t len{};\n                return get_number(input_format_t::cbor, len) && get_string(input_format_t::cbor, len, result);\n            }\n\n            case 0x7B: // UTF-8 string (eight-byte uint64_t for n follow)\n            {\n                std::uint64_t len{};\n                return get_number(input_format_t::cbor, len) && get_string(input_format_t::cbor, len, result);\n            }\n\n            case 0x7F: // UTF-8 string (indefinite length)\n            {\n                while (get() != 0xFF)\n                {\n                    string_t chunk;\n                    if (!get_cbor_string(chunk))\n                    {\n                        return false;\n                    }\n                    result.append(chunk);\n                }\n                return true;\n            }\n\n            default:\n            {\n                auto last_token = get_token_string();\n                return sax->parse_error(chars_read, 
last_token, parse_error::create(113, chars_read, exception_message(input_format_t::cbor, \"expected length specification (0x60-0x7B) or indefinite string type (0x7F); last byte: 0x\" + last_token, \"string\"), BasicJsonType()));\n            }\n        }\n    }\n\n    /*!\n    @brief reads a CBOR byte array\n\n    This function first reads starting bytes to determine the expected\n    byte array length and then copies this number of bytes into the byte array.\n    Additionally, CBOR's byte arrays with indefinite lengths are supported.\n\n    @param[out] result  created byte array\n\n    @return whether byte array creation completed\n    */\n    bool get_cbor_binary(binary_t& result)\n    {\n        if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::cbor, \"binary\")))\n        {\n            return false;\n        }\n\n        switch (current)\n        {\n            // Binary data (0x00..0x17 bytes follow)\n            case 0x40:\n            case 0x41:\n            case 0x42:\n            case 0x43:\n            case 0x44:\n            case 0x45:\n            case 0x46:\n            case 0x47:\n            case 0x48:\n            case 0x49:\n            case 0x4A:\n            case 0x4B:\n            case 0x4C:\n            case 0x4D:\n            case 0x4E:\n            case 0x4F:\n            case 0x50:\n            case 0x51:\n            case 0x52:\n            case 0x53:\n            case 0x54:\n            case 0x55:\n            case 0x56:\n            case 0x57:\n            {\n                return get_binary(input_format_t::cbor, static_cast<unsigned int>(current) & 0x1Fu, result);\n            }\n\n            case 0x58: // Binary data (one-byte uint8_t for n follows)\n            {\n                std::uint8_t len{};\n                return get_number(input_format_t::cbor, len) &&\n                       get_binary(input_format_t::cbor, len, result);\n            }\n\n            case 0x59: // Binary data (two-byte uint16_t for n follow)\n       
     {\n                std::uint16_t len{};\n                return get_number(input_format_t::cbor, len) &&\n                       get_binary(input_format_t::cbor, len, result);\n            }\n\n            case 0x5A: // Binary data (four-byte uint32_t for n follow)\n            {\n                std::uint32_t len{};\n                return get_number(input_format_t::cbor, len) &&\n                       get_binary(input_format_t::cbor, len, result);\n            }\n\n            case 0x5B: // Binary data (eight-byte uint64_t for n follow)\n            {\n                std::uint64_t len{};\n                return get_number(input_format_t::cbor, len) &&\n                       get_binary(input_format_t::cbor, len, result);\n            }\n\n            case 0x5F: // Binary data (indefinite length)\n            {\n                while (get() != 0xFF)\n                {\n                    binary_t chunk;\n                    if (!get_cbor_binary(chunk))\n                    {\n                        return false;\n                    }\n                    result.insert(result.end(), chunk.begin(), chunk.end());\n                }\n                return true;\n            }\n\n            default:\n            {\n                auto last_token = get_token_string();\n                return sax->parse_error(chars_read, last_token, parse_error::create(113, chars_read, exception_message(input_format_t::cbor, \"expected length specification (0x40-0x5B) or indefinite binary array type (0x5F); last byte: 0x\" + last_token, \"binary\"), BasicJsonType()));\n            }\n        }\n    }\n\n    /*!\n    @param[in] len  the length of the array or std::size_t(-1) for an\n                    array of indefinite size\n    @param[in] tag_handler how CBOR tags should be treated\n    @return whether array creation completed\n    */\n    bool get_cbor_array(const std::size_t len,\n                        const cbor_tag_handler_t tag_handler)\n    {\n        if 
(JSON_HEDLEY_UNLIKELY(!sax->start_array(len)))\n        {\n            return false;\n        }\n\n        if (len != std::size_t(-1))\n        {\n            for (std::size_t i = 0; i < len; ++i)\n            {\n                if (JSON_HEDLEY_UNLIKELY(!parse_cbor_internal(true, tag_handler)))\n                {\n                    return false;\n                }\n            }\n        }\n        else\n        {\n            while (get() != 0xFF)\n            {\n                if (JSON_HEDLEY_UNLIKELY(!parse_cbor_internal(false, tag_handler)))\n                {\n                    return false;\n                }\n            }\n        }\n\n        return sax->end_array();\n    }\n\n    /*!\n    @param[in] len  the length of the object or std::size_t(-1) for an\n                    object of indefinite size\n    @param[in] tag_handler how CBOR tags should be treated\n    @return whether object creation completed\n    */\n    bool get_cbor_object(const std::size_t len,\n                         const cbor_tag_handler_t tag_handler)\n    {\n        if (JSON_HEDLEY_UNLIKELY(!sax->start_object(len)))\n        {\n            return false;\n        }\n\n        string_t key;\n        if (len != std::size_t(-1))\n        {\n            for (std::size_t i = 0; i < len; ++i)\n            {\n                get();\n                if (JSON_HEDLEY_UNLIKELY(!get_cbor_string(key) || !sax->key(key)))\n                {\n                    return false;\n                }\n\n                if (JSON_HEDLEY_UNLIKELY(!parse_cbor_internal(true, tag_handler)))\n                {\n                    return false;\n                }\n                key.clear();\n            }\n        }\n        else\n        {\n            while (get() != 0xFF)\n            {\n                if (JSON_HEDLEY_UNLIKELY(!get_cbor_string(key) || !sax->key(key)))\n                {\n                    return false;\n                }\n\n                if 
(JSON_HEDLEY_UNLIKELY(!parse_cbor_internal(true, tag_handler)))\n                {\n                    return false;\n                }\n                key.clear();\n            }\n        }\n\n        return sax->end_object();\n    }\n\n    /////////////\n    // MsgPack //\n    /////////////\n\n    /*!\n    @return whether a valid MessagePack value was passed to the SAX parser\n    */\n    bool parse_msgpack_internal()\n    {\n        switch (get())\n        {\n            // EOF\n            case std::char_traits<char_type>::eof():\n                return unexpect_eof(input_format_t::msgpack, \"value\");\n\n            // positive fixint\n            case 0x00:\n            case 0x01:\n            case 0x02:\n            case 0x03:\n            case 0x04:\n            case 0x05:\n            case 0x06:\n            case 0x07:\n            case 0x08:\n            case 0x09:\n            case 0x0A:\n            case 0x0B:\n            case 0x0C:\n            case 0x0D:\n            case 0x0E:\n            case 0x0F:\n            case 0x10:\n            case 0x11:\n            case 0x12:\n            case 0x13:\n            case 0x14:\n            case 0x15:\n            case 0x16:\n            case 0x17:\n            case 0x18:\n            case 0x19:\n            case 0x1A:\n            case 0x1B:\n            case 0x1C:\n            case 0x1D:\n            case 0x1E:\n            case 0x1F:\n            case 0x20:\n            case 0x21:\n            case 0x22:\n            case 0x23:\n            case 0x24:\n            case 0x25:\n            case 0x26:\n            case 0x27:\n            case 0x28:\n            case 0x29:\n            case 0x2A:\n            case 0x2B:\n            case 0x2C:\n            case 0x2D:\n            case 0x2E:\n            case 0x2F:\n            case 0x30:\n            case 0x31:\n            case 0x32:\n            case 0x33:\n            case 0x34:\n            case 0x35:\n            case 0x36:\n            case 0x37:\n     
       case 0x38:\n            case 0x39:\n            case 0x3A:\n            case 0x3B:\n            case 0x3C:\n            case 0x3D:\n            case 0x3E:\n            case 0x3F:\n            case 0x40:\n            case 0x41:\n            case 0x42:\n            case 0x43:\n            case 0x44:\n            case 0x45:\n            case 0x46:\n            case 0x47:\n            case 0x48:\n            case 0x49:\n            case 0x4A:\n            case 0x4B:\n            case 0x4C:\n            case 0x4D:\n            case 0x4E:\n            case 0x4F:\n            case 0x50:\n            case 0x51:\n            case 0x52:\n            case 0x53:\n            case 0x54:\n            case 0x55:\n            case 0x56:\n            case 0x57:\n            case 0x58:\n            case 0x59:\n            case 0x5A:\n            case 0x5B:\n            case 0x5C:\n            case 0x5D:\n            case 0x5E:\n            case 0x5F:\n            case 0x60:\n            case 0x61:\n            case 0x62:\n            case 0x63:\n            case 0x64:\n            case 0x65:\n            case 0x66:\n            case 0x67:\n            case 0x68:\n            case 0x69:\n            case 0x6A:\n            case 0x6B:\n            case 0x6C:\n            case 0x6D:\n            case 0x6E:\n            case 0x6F:\n            case 0x70:\n            case 0x71:\n            case 0x72:\n            case 0x73:\n            case 0x74:\n            case 0x75:\n            case 0x76:\n            case 0x77:\n            case 0x78:\n            case 0x79:\n            case 0x7A:\n            case 0x7B:\n            case 0x7C:\n            case 0x7D:\n            case 0x7E:\n            case 0x7F:\n                return sax->number_unsigned(static_cast<number_unsigned_t>(current));\n\n            // fixmap\n            case 0x80:\n            case 0x81:\n            case 0x82:\n            case 0x83:\n            case 0x84:\n            case 0x85:\n            case 
0x86:\n            case 0x87:\n            case 0x88:\n            case 0x89:\n            case 0x8A:\n            case 0x8B:\n            case 0x8C:\n            case 0x8D:\n            case 0x8E:\n            case 0x8F:\n                return get_msgpack_object(static_cast<std::size_t>(static_cast<unsigned int>(current) & 0x0Fu));\n\n            // fixarray\n            case 0x90:\n            case 0x91:\n            case 0x92:\n            case 0x93:\n            case 0x94:\n            case 0x95:\n            case 0x96:\n            case 0x97:\n            case 0x98:\n            case 0x99:\n            case 0x9A:\n            case 0x9B:\n            case 0x9C:\n            case 0x9D:\n            case 0x9E:\n            case 0x9F:\n                return get_msgpack_array(static_cast<std::size_t>(static_cast<unsigned int>(current) & 0x0Fu));\n\n            // fixstr\n            case 0xA0:\n            case 0xA1:\n            case 0xA2:\n            case 0xA3:\n            case 0xA4:\n            case 0xA5:\n            case 0xA6:\n            case 0xA7:\n            case 0xA8:\n            case 0xA9:\n            case 0xAA:\n            case 0xAB:\n            case 0xAC:\n            case 0xAD:\n            case 0xAE:\n            case 0xAF:\n            case 0xB0:\n            case 0xB1:\n            case 0xB2:\n            case 0xB3:\n            case 0xB4:\n            case 0xB5:\n            case 0xB6:\n            case 0xB7:\n            case 0xB8:\n            case 0xB9:\n            case 0xBA:\n            case 0xBB:\n            case 0xBC:\n            case 0xBD:\n            case 0xBE:\n            case 0xBF:\n            case 0xD9: // str 8\n            case 0xDA: // str 16\n            case 0xDB: // str 32\n            {\n                string_t s;\n                return get_msgpack_string(s) && sax->string(s);\n            }\n\n            case 0xC0: // nil\n                return sax->null();\n\n            case 0xC2: // false\n                
return sax->boolean(false);\n\n            case 0xC3: // true\n                return sax->boolean(true);\n\n            case 0xC4: // bin 8\n            case 0xC5: // bin 16\n            case 0xC6: // bin 32\n            case 0xC7: // ext 8\n            case 0xC8: // ext 16\n            case 0xC9: // ext 32\n            case 0xD4: // fixext 1\n            case 0xD5: // fixext 2\n            case 0xD6: // fixext 4\n            case 0xD7: // fixext 8\n            case 0xD8: // fixext 16\n            {\n                binary_t b;\n                return get_msgpack_binary(b) && sax->binary(b);\n            }\n\n            case 0xCA: // float 32\n            {\n                float number{};\n                return get_number(input_format_t::msgpack, number) && sax->number_float(static_cast<number_float_t>(number), \"\");\n            }\n\n            case 0xCB: // float 64\n            {\n                double number{};\n                return get_number(input_format_t::msgpack, number) && sax->number_float(static_cast<number_float_t>(number), \"\");\n            }\n\n            case 0xCC: // uint 8\n            {\n                std::uint8_t number{};\n                return get_number(input_format_t::msgpack, number) && sax->number_unsigned(number);\n            }\n\n            case 0xCD: // uint 16\n            {\n                std::uint16_t number{};\n                return get_number(input_format_t::msgpack, number) && sax->number_unsigned(number);\n            }\n\n            case 0xCE: // uint 32\n            {\n                std::uint32_t number{};\n                return get_number(input_format_t::msgpack, number) && sax->number_unsigned(number);\n            }\n\n            case 0xCF: // uint 64\n            {\n                std::uint64_t number{};\n                return get_number(input_format_t::msgpack, number) && sax->number_unsigned(number);\n            }\n\n            case 0xD0: // int 8\n            {\n                std::int8_t 
number{};\n                return get_number(input_format_t::msgpack, number) && sax->number_integer(number);\n            }\n\n            case 0xD1: // int 16\n            {\n                std::int16_t number{};\n                return get_number(input_format_t::msgpack, number) && sax->number_integer(number);\n            }\n\n            case 0xD2: // int 32\n            {\n                std::int32_t number{};\n                return get_number(input_format_t::msgpack, number) && sax->number_integer(number);\n            }\n\n            case 0xD3: // int 64\n            {\n                std::int64_t number{};\n                return get_number(input_format_t::msgpack, number) && sax->number_integer(number);\n            }\n\n            case 0xDC: // array 16\n            {\n                std::uint16_t len{};\n                return get_number(input_format_t::msgpack, len) && get_msgpack_array(static_cast<std::size_t>(len));\n            }\n\n            case 0xDD: // array 32\n            {\n                std::uint32_t len{};\n                return get_number(input_format_t::msgpack, len) && get_msgpack_array(static_cast<std::size_t>(len));\n            }\n\n            case 0xDE: // map 16\n            {\n                std::uint16_t len{};\n                return get_number(input_format_t::msgpack, len) && get_msgpack_object(static_cast<std::size_t>(len));\n            }\n\n            case 0xDF: // map 32\n            {\n                std::uint32_t len{};\n                return get_number(input_format_t::msgpack, len) && get_msgpack_object(static_cast<std::size_t>(len));\n            }\n\n            // negative fixint\n            case 0xE0:\n            case 0xE1:\n            case 0xE2:\n            case 0xE3:\n            case 0xE4:\n            case 0xE5:\n            case 0xE6:\n            case 0xE7:\n            case 0xE8:\n            case 0xE9:\n            case 0xEA:\n            case 0xEB:\n            case 0xEC:\n            
case 0xED:\n            case 0xEE:\n            case 0xEF:\n            case 0xF0:\n            case 0xF1:\n            case 0xF2:\n            case 0xF3:\n            case 0xF4:\n            case 0xF5:\n            case 0xF6:\n            case 0xF7:\n            case 0xF8:\n            case 0xF9:\n            case 0xFA:\n            case 0xFB:\n            case 0xFC:\n            case 0xFD:\n            case 0xFE:\n            case 0xFF:\n                return sax->number_integer(static_cast<std::int8_t>(current));\n\n            default: // anything else\n            {\n                auto last_token = get_token_string();\n                return sax->parse_error(chars_read, last_token, parse_error::create(112, chars_read, exception_message(input_format_t::msgpack, \"invalid byte: 0x\" + last_token, \"value\"), BasicJsonType()));\n            }\n        }\n    }\n\n    /*!\n    @brief reads a MessagePack string\n\n    This function first reads starting bytes to determine the expected\n    string length and then copies this number of bytes into a string.\n\n    @param[out] result  created string\n\n    @return whether string creation completed\n    */\n    bool get_msgpack_string(string_t& result)\n    {\n        if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::msgpack, \"string\")))\n        {\n            return false;\n        }\n\n        switch (current)\n        {\n            // fixstr\n            case 0xA0:\n            case 0xA1:\n            case 0xA2:\n            case 0xA3:\n            case 0xA4:\n            case 0xA5:\n            case 0xA6:\n            case 0xA7:\n            case 0xA8:\n            case 0xA9:\n            case 0xAA:\n            case 0xAB:\n            case 0xAC:\n            case 0xAD:\n            case 0xAE:\n            case 0xAF:\n            case 0xB0:\n            case 0xB1:\n            case 0xB2:\n            case 0xB3:\n            case 0xB4:\n            case 0xB5:\n            case 0xB6:\n            case 
0xB7:\n            case 0xB8:\n            case 0xB9:\n            case 0xBA:\n            case 0xBB:\n            case 0xBC:\n            case 0xBD:\n            case 0xBE:\n            case 0xBF:\n            {\n                return get_string(input_format_t::msgpack, static_cast<unsigned int>(current) & 0x1Fu, result);\n            }\n\n            case 0xD9: // str 8\n            {\n                std::uint8_t len{};\n                return get_number(input_format_t::msgpack, len) && get_string(input_format_t::msgpack, len, result);\n            }\n\n            case 0xDA: // str 16\n            {\n                std::uint16_t len{};\n                return get_number(input_format_t::msgpack, len) && get_string(input_format_t::msgpack, len, result);\n            }\n\n            case 0xDB: // str 32\n            {\n                std::uint32_t len{};\n                return get_number(input_format_t::msgpack, len) && get_string(input_format_t::msgpack, len, result);\n            }\n\n            default:\n            {\n                auto last_token = get_token_string();\n                return sax->parse_error(chars_read, last_token, parse_error::create(113, chars_read, exception_message(input_format_t::msgpack, \"expected length specification (0xA0-0xBF, 0xD9-0xDB); last byte: 0x\" + last_token, \"string\"), BasicJsonType()));\n            }\n        }\n    }\n\n    /*!\n    @brief reads a MessagePack byte array\n\n    This function first reads starting bytes to determine the expected\n    byte array length and then copies this number of bytes into a byte array.\n\n    @param[out] result  created byte array\n\n    @return whether byte array creation completed\n    */\n    bool get_msgpack_binary(binary_t& result)\n    {\n        // helper function to set the subtype\n        auto assign_and_return_true = [&result](std::int8_t subtype)\n        {\n            result.set_subtype(static_cast<std::uint8_t>(subtype));\n            return true;\n        
};\n\n        switch (current)\n        {\n            case 0xC4: // bin 8\n            {\n                std::uint8_t len{};\n                return get_number(input_format_t::msgpack, len) &&\n                       get_binary(input_format_t::msgpack, len, result);\n            }\n\n            case 0xC5: // bin 16\n            {\n                std::uint16_t len{};\n                return get_number(input_format_t::msgpack, len) &&\n                       get_binary(input_format_t::msgpack, len, result);\n            }\n\n            case 0xC6: // bin 32\n            {\n                std::uint32_t len{};\n                return get_number(input_format_t::msgpack, len) &&\n                       get_binary(input_format_t::msgpack, len, result);\n            }\n\n            case 0xC7: // ext 8\n            {\n                std::uint8_t len{};\n                std::int8_t subtype{};\n                return get_number(input_format_t::msgpack, len) &&\n                       get_number(input_format_t::msgpack, subtype) &&\n                       get_binary(input_format_t::msgpack, len, result) &&\n                       assign_and_return_true(subtype);\n            }\n\n            case 0xC8: // ext 16\n            {\n                std::uint16_t len{};\n                std::int8_t subtype{};\n                return get_number(input_format_t::msgpack, len) &&\n                       get_number(input_format_t::msgpack, subtype) &&\n                       get_binary(input_format_t::msgpack, len, result) &&\n                       assign_and_return_true(subtype);\n            }\n\n            case 0xC9: // ext 32\n            {\n                std::uint32_t len{};\n                std::int8_t subtype{};\n                return get_number(input_format_t::msgpack, len) &&\n                       get_number(input_format_t::msgpack, subtype) &&\n                       get_binary(input_format_t::msgpack, len, result) &&\n                       
assign_and_return_true(subtype);\n            }\n\n            case 0xD4: // fixext 1\n            {\n                std::int8_t subtype{};\n                return get_number(input_format_t::msgpack, subtype) &&\n                       get_binary(input_format_t::msgpack, 1, result) &&\n                       assign_and_return_true(subtype);\n            }\n\n            case 0xD5: // fixext 2\n            {\n                std::int8_t subtype{};\n                return get_number(input_format_t::msgpack, subtype) &&\n                       get_binary(input_format_t::msgpack, 2, result) &&\n                       assign_and_return_true(subtype);\n            }\n\n            case 0xD6: // fixext 4\n            {\n                std::int8_t subtype{};\n                return get_number(input_format_t::msgpack, subtype) &&\n                       get_binary(input_format_t::msgpack, 4, result) &&\n                       assign_and_return_true(subtype);\n            }\n\n            case 0xD7: // fixext 8\n            {\n                std::int8_t subtype{};\n                return get_number(input_format_t::msgpack, subtype) &&\n                       get_binary(input_format_t::msgpack, 8, result) &&\n                       assign_and_return_true(subtype);\n            }\n\n            case 0xD8: // fixext 16\n            {\n                std::int8_t subtype{};\n                return get_number(input_format_t::msgpack, subtype) &&\n                       get_binary(input_format_t::msgpack, 16, result) &&\n                       assign_and_return_true(subtype);\n            }\n\n            default:           // LCOV_EXCL_LINE\n                return false;  // LCOV_EXCL_LINE\n        }\n    }\n\n    /*!\n    @param[in] len  the length of the array\n    @return whether array creation completed\n    */\n    bool get_msgpack_array(const std::size_t len)\n    {\n        if (JSON_HEDLEY_UNLIKELY(!sax->start_array(len)))\n        {\n            return false;\n        
}\n\n        for (std::size_t i = 0; i < len; ++i)\n        {\n            if (JSON_HEDLEY_UNLIKELY(!parse_msgpack_internal()))\n            {\n                return false;\n            }\n        }\n\n        return sax->end_array();\n    }\n\n    /*!\n    @param[in] len  the length of the object\n    @return whether object creation completed\n    */\n    bool get_msgpack_object(const std::size_t len)\n    {\n        if (JSON_HEDLEY_UNLIKELY(!sax->start_object(len)))\n        {\n            return false;\n        }\n\n        string_t key;\n        for (std::size_t i = 0; i < len; ++i)\n        {\n            get();\n            if (JSON_HEDLEY_UNLIKELY(!get_msgpack_string(key) || !sax->key(key)))\n            {\n                return false;\n            }\n\n            if (JSON_HEDLEY_UNLIKELY(!parse_msgpack_internal()))\n            {\n                return false;\n            }\n            key.clear();\n        }\n\n        return sax->end_object();\n    }\n\n    ////////////\n    // UBJSON //\n    ////////////\n\n    /*!\n    @param[in] get_char  whether a new character should be retrieved from the\n                         input (true, default) or whether the last read\n                         character should be considered instead\n\n    @return whether a valid UBJSON value was passed to the SAX parser\n    */\n    bool parse_ubjson_internal(const bool get_char = true)\n    {\n        return get_ubjson_value(get_char ? 
get_ignore_noop() : current);\n    }\n\n    /*!\n    @brief reads a UBJSON string\n\n    This function is either called after reading the 'S' byte explicitly\n    indicating a string, or in case of an object key where the 'S' byte can be\n    left out.\n\n    @param[out] result   created string\n    @param[in] get_char  whether a new character should be retrieved from the\n                         input (true, default) or whether the last read\n                         character should be considered instead\n\n    @return whether string creation completed\n    */\n    bool get_ubjson_string(string_t& result, const bool get_char = true)\n    {\n        if (get_char)\n        {\n            get();  // TODO(niels): may we ignore N here?\n        }\n\n        if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::ubjson, \"value\")))\n        {\n            return false;\n        }\n\n        switch (current)\n        {\n            case 'U':\n            {\n                std::uint8_t len{};\n                return get_number(input_format_t::ubjson, len) && get_string(input_format_t::ubjson, len, result);\n            }\n\n            case 'i':\n            {\n                std::int8_t len{};\n                return get_number(input_format_t::ubjson, len) && get_string(input_format_t::ubjson, len, result);\n            }\n\n            case 'I':\n            {\n                std::int16_t len{};\n                return get_number(input_format_t::ubjson, len) && get_string(input_format_t::ubjson, len, result);\n            }\n\n            case 'l':\n            {\n                std::int32_t len{};\n                return get_number(input_format_t::ubjson, len) && get_string(input_format_t::ubjson, len, result);\n            }\n\n            case 'L':\n            {\n                std::int64_t len{};\n                return get_number(input_format_t::ubjson, len) && get_string(input_format_t::ubjson, len, result);\n            }\n\n            default:\n        
        auto last_token = get_token_string();\n                return sax->parse_error(chars_read, last_token, parse_error::create(113, chars_read, exception_message(input_format_t::ubjson, \"expected length type specification (U, i, I, l, L); last byte: 0x\" + last_token, \"string\"), BasicJsonType()));\n        }\n    }\n\n    /*!\n    @param[out] result  determined size\n    @return whether size determination completed\n    */\n    bool get_ubjson_size_value(std::size_t& result)\n    {\n        switch (get_ignore_noop())\n        {\n            case 'U':\n            {\n                std::uint8_t number{};\n                if (JSON_HEDLEY_UNLIKELY(!get_number(input_format_t::ubjson, number)))\n                {\n                    return false;\n                }\n                result = static_cast<std::size_t>(number);\n                return true;\n            }\n\n            case 'i':\n            {\n                std::int8_t number{};\n                if (JSON_HEDLEY_UNLIKELY(!get_number(input_format_t::ubjson, number)))\n                {\n                    return false;\n                }\n                result = static_cast<std::size_t>(number); // NOLINT(bugprone-signed-char-misuse,cert-str34-c): number is not a char\n                return true;\n            }\n\n            case 'I':\n            {\n                std::int16_t number{};\n                if (JSON_HEDLEY_UNLIKELY(!get_number(input_format_t::ubjson, number)))\n                {\n                    return false;\n                }\n                result = static_cast<std::size_t>(number);\n                return true;\n            }\n\n            case 'l':\n            {\n                std::int32_t number{};\n                if (JSON_HEDLEY_UNLIKELY(!get_number(input_format_t::ubjson, number)))\n                {\n                    return false;\n                }\n                result = static_cast<std::size_t>(number);\n                return true;\n            }\n\n 
           case 'L':\n            {\n                std::int64_t number{};\n                if (JSON_HEDLEY_UNLIKELY(!get_number(input_format_t::ubjson, number)))\n                {\n                    return false;\n                }\n                result = static_cast<std::size_t>(number);\n                return true;\n            }\n\n            default:\n            {\n                auto last_token = get_token_string();\n                return sax->parse_error(chars_read, last_token, parse_error::create(113, chars_read, exception_message(input_format_t::ubjson, \"expected length type specification (U, i, I, l, L) after '#'; last byte: 0x\" + last_token, \"size\"), BasicJsonType()));\n            }\n        }\n    }\n\n    /*!\n    @brief determine the type and size for a container\n\n    In the optimized UBJSON format, a type and a size can be provided to allow\n    for a more compact representation.\n\n    @param[out] result  pair of the size and the type\n\n    @return whether pair creation completed\n    */\n    bool get_ubjson_size_type(std::pair<std::size_t, char_int_type>& result)\n    {\n        result.first = string_t::npos; // size\n        result.second = 0; // type\n\n        get_ignore_noop();\n\n        if (current == '$')\n        {\n            result.second = get();  // must not ignore 'N', because 'N' may be the type\n            if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::ubjson, \"type\")))\n            {\n                return false;\n            }\n\n            get_ignore_noop();\n            if (JSON_HEDLEY_UNLIKELY(current != '#'))\n            {\n                if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::ubjson, \"value\")))\n                {\n                    return false;\n                }\n                auto last_token = get_token_string();\n                return sax->parse_error(chars_read, last_token, parse_error::create(112, chars_read, exception_message(input_format_t::ubjson, \"expected '#' 
after type information; last byte: 0x\" + last_token, \"size\"), BasicJsonType()));\n            }\n\n            return get_ubjson_size_value(result.first);\n        }\n\n        if (current == '#')\n        {\n            return get_ubjson_size_value(result.first);\n        }\n\n        return true;\n    }\n\n    /*!\n    @param prefix  the previously read or set type prefix\n    @return whether value creation completed\n    */\n    bool get_ubjson_value(const char_int_type prefix)\n    {\n        switch (prefix)\n        {\n            case std::char_traits<char_type>::eof():  // EOF\n                return unexpect_eof(input_format_t::ubjson, \"value\");\n\n            case 'T':  // true\n                return sax->boolean(true);\n            case 'F':  // false\n                return sax->boolean(false);\n\n            case 'Z':  // null\n                return sax->null();\n\n            case 'U':\n            {\n                std::uint8_t number{};\n                return get_number(input_format_t::ubjson, number) && sax->number_unsigned(number);\n            }\n\n            case 'i':\n            {\n                std::int8_t number{};\n                return get_number(input_format_t::ubjson, number) && sax->number_integer(number);\n            }\n\n            case 'I':\n            {\n                std::int16_t number{};\n                return get_number(input_format_t::ubjson, number) && sax->number_integer(number);\n            }\n\n            case 'l':\n            {\n                std::int32_t number{};\n                return get_number(input_format_t::ubjson, number) && sax->number_integer(number);\n            }\n\n            case 'L':\n            {\n                std::int64_t number{};\n                return get_number(input_format_t::ubjson, number) && sax->number_integer(number);\n            }\n\n            case 'd':\n            {\n                float number{};\n                return get_number(input_format_t::ubjson, 
number) && sax->number_float(static_cast<number_float_t>(number), \"\");\n            }\n\n            case 'D':\n            {\n                double number{};\n                return get_number(input_format_t::ubjson, number) && sax->number_float(static_cast<number_float_t>(number), \"\");\n            }\n\n            case 'H':\n            {\n                return get_ubjson_high_precision_number();\n            }\n\n            case 'C':  // char\n            {\n                get();\n                if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::ubjson, \"char\")))\n                {\n                    return false;\n                }\n                if (JSON_HEDLEY_UNLIKELY(current > 127))\n                {\n                    auto last_token = get_token_string();\n                    return sax->parse_error(chars_read, last_token, parse_error::create(113, chars_read, exception_message(input_format_t::ubjson, \"byte after 'C' must be in range 0x00..0x7F; last byte: 0x\" + last_token, \"char\"), BasicJsonType()));\n                }\n                string_t s(1, static_cast<typename string_t::value_type>(current));\n                return sax->string(s);\n            }\n\n            case 'S':  // string\n            {\n                string_t s;\n                return get_ubjson_string(s) && sax->string(s);\n            }\n\n            case '[':  // array\n                return get_ubjson_array();\n\n            case '{':  // object\n                return get_ubjson_object();\n\n            default: // anything else\n            {\n                auto last_token = get_token_string();\n                return sax->parse_error(chars_read, last_token, parse_error::create(112, chars_read, exception_message(input_format_t::ubjson, \"invalid byte: 0x\" + last_token, \"value\"), BasicJsonType()));\n            }\n        }\n    }\n\n    /*!\n    @return whether array creation completed\n    */\n    bool get_ubjson_array()\n    {\n        
std::pair<std::size_t, char_int_type> size_and_type;\n        if (JSON_HEDLEY_UNLIKELY(!get_ubjson_size_type(size_and_type)))\n        {\n            return false;\n        }\n\n        if (size_and_type.first != string_t::npos)\n        {\n            if (JSON_HEDLEY_UNLIKELY(!sax->start_array(size_and_type.first)))\n            {\n                return false;\n            }\n\n            if (size_and_type.second != 0)\n            {\n                if (size_and_type.second != 'N')\n                {\n                    for (std::size_t i = 0; i < size_and_type.first; ++i)\n                    {\n                        if (JSON_HEDLEY_UNLIKELY(!get_ubjson_value(size_and_type.second)))\n                        {\n                            return false;\n                        }\n                    }\n                }\n            }\n            else\n            {\n                for (std::size_t i = 0; i < size_and_type.first; ++i)\n                {\n                    if (JSON_HEDLEY_UNLIKELY(!parse_ubjson_internal()))\n                    {\n                        return false;\n                    }\n                }\n            }\n        }\n        else\n        {\n            if (JSON_HEDLEY_UNLIKELY(!sax->start_array(std::size_t(-1))))\n            {\n                return false;\n            }\n\n            while (current != ']')\n            {\n                if (JSON_HEDLEY_UNLIKELY(!parse_ubjson_internal(false)))\n                {\n                    return false;\n                }\n                get_ignore_noop();\n            }\n        }\n\n        return sax->end_array();\n    }\n\n    /*!\n    @return whether object creation completed\n    */\n    bool get_ubjson_object()\n    {\n        std::pair<std::size_t, char_int_type> size_and_type;\n        if (JSON_HEDLEY_UNLIKELY(!get_ubjson_size_type(size_and_type)))\n        {\n            return false;\n        }\n\n        string_t key;\n        if (size_and_type.first != 
string_t::npos)\n        {\n            if (JSON_HEDLEY_UNLIKELY(!sax->start_object(size_and_type.first)))\n            {\n                return false;\n            }\n\n            if (size_and_type.second != 0)\n            {\n                for (std::size_t i = 0; i < size_and_type.first; ++i)\n                {\n                    if (JSON_HEDLEY_UNLIKELY(!get_ubjson_string(key) || !sax->key(key)))\n                    {\n                        return false;\n                    }\n                    if (JSON_HEDLEY_UNLIKELY(!get_ubjson_value(size_and_type.second)))\n                    {\n                        return false;\n                    }\n                    key.clear();\n                }\n            }\n            else\n            {\n                for (std::size_t i = 0; i < size_and_type.first; ++i)\n                {\n                    if (JSON_HEDLEY_UNLIKELY(!get_ubjson_string(key) || !sax->key(key)))\n                    {\n                        return false;\n                    }\n                    if (JSON_HEDLEY_UNLIKELY(!parse_ubjson_internal()))\n                    {\n                        return false;\n                    }\n                    key.clear();\n                }\n            }\n        }\n        else\n        {\n            if (JSON_HEDLEY_UNLIKELY(!sax->start_object(std::size_t(-1))))\n            {\n                return false;\n            }\n\n            while (current != '}')\n            {\n                if (JSON_HEDLEY_UNLIKELY(!get_ubjson_string(key, false) || !sax->key(key)))\n                {\n                    return false;\n                }\n                if (JSON_HEDLEY_UNLIKELY(!parse_ubjson_internal()))\n                {\n                    return false;\n                }\n                get_ignore_noop();\n                key.clear();\n            }\n        }\n\n        return sax->end_object();\n    }\n\n    // Note, no reader for UBJSON binary types is implemented 
because they do\n    // not exist\n\n    bool get_ubjson_high_precision_number()\n    {\n        // get size of following number string\n        std::size_t size{};\n        auto res = get_ubjson_size_value(size);\n        if (JSON_HEDLEY_UNLIKELY(!res))\n        {\n            return res;\n        }\n\n        // get number string\n        std::vector<char> number_vector;\n        for (std::size_t i = 0; i < size; ++i)\n        {\n            get();\n            if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(input_format_t::ubjson, \"number\")))\n            {\n                return false;\n            }\n            number_vector.push_back(static_cast<char>(current));\n        }\n\n        // parse number string\n        using ia_type = decltype(detail::input_adapter(number_vector));\n        auto number_lexer = detail::lexer<BasicJsonType, ia_type>(detail::input_adapter(number_vector), false);\n        const auto result_number = number_lexer.scan();\n        const auto number_string = number_lexer.get_token_string();\n        const auto result_remainder = number_lexer.scan();\n\n        using token_type = typename detail::lexer_base<BasicJsonType>::token_type;\n\n        if (JSON_HEDLEY_UNLIKELY(result_remainder != token_type::end_of_input))\n        {\n            return sax->parse_error(chars_read, number_string, parse_error::create(115, chars_read, exception_message(input_format_t::ubjson, \"invalid number text: \" + number_lexer.get_token_string(), \"high-precision number\"), BasicJsonType()));\n        }\n\n        switch (result_number)\n        {\n            case token_type::value_integer:\n                return sax->number_integer(number_lexer.get_number_integer());\n            case token_type::value_unsigned:\n                return sax->number_unsigned(number_lexer.get_number_unsigned());\n            case token_type::value_float:\n                return sax->number_float(number_lexer.get_number_float(), std::move(number_string));\n            default:\n   
             return sax->parse_error(chars_read, number_string, parse_error::create(115, chars_read, exception_message(input_format_t::ubjson, \"invalid number text: \" + number_lexer.get_token_string(), \"high-precision number\"), BasicJsonType()));\n        }\n    }\n\n    ///////////////////////\n    // Utility functions //\n    ///////////////////////\n\n    /*!\n    @brief get next character from the input\n\n    This function provides the interface to the used input adapter. It does\n    not throw in case the input reached EOF, but returns a negative-valued\n    `std::char_traits<char_type>::eof()` in that case.\n\n    @return character read from the input\n    */\n    char_int_type get()\n    {\n        ++chars_read;\n        return current = ia.get_character();\n    }\n\n    /*!\n    @return character read from the input after ignoring all 'N' entries\n    */\n    char_int_type get_ignore_noop()\n    {\n        do\n        {\n            get();\n        }\n        while (current == 'N');\n\n        return current;\n    }\n\n    /*!\n    @brief read a number from the input\n\n    @tparam NumberType the type of the number\n    @param[in] format   the current format (for diagnostics)\n    @param[out] result  number of type @a NumberType\n\n    @return whether conversion completed\n\n    @note This function needs to respect the system's endianness, because\n          bytes in CBOR, MessagePack, and UBJSON are stored in network order\n          (big endian) and therefore need reordering on little endian systems.\n    */\n    template<typename NumberType, bool InputIsLittleEndian = false>\n    bool get_number(const input_format_t format, NumberType& result)\n    {\n        // step 1: read input into array with system's byte order\n        std::array<std::uint8_t, sizeof(NumberType)> vec{};\n        for (std::size_t i = 0; i < sizeof(NumberType); ++i)\n        {\n            get();\n            if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(format, \"number\")))\n            
{\n                return false;\n            }\n\n            // reverse byte order prior to conversion if necessary\n            if (is_little_endian != InputIsLittleEndian)\n            {\n                vec[sizeof(NumberType) - i - 1] = static_cast<std::uint8_t>(current);\n            }\n            else\n            {\n                vec[i] = static_cast<std::uint8_t>(current); // LCOV_EXCL_LINE\n            }\n        }\n\n        // step 2: convert array into number of type T and return\n        std::memcpy(&result, vec.data(), sizeof(NumberType));\n        return true;\n    }\n\n    /*!\n    @brief create a string by reading characters from the input\n\n    @tparam NumberType the type of the number\n    @param[in] format the current format (for diagnostics)\n    @param[in] len number of characters to read\n    @param[out] result string created by reading @a len bytes\n\n    @return whether string creation completed\n\n    @note We can not reserve @a len bytes for the result, because @a len\n          may be too large. 
Usually, @ref unexpect_eof() detects the end of\n          the input before we run out of string memory.\n    */\n    template<typename NumberType>\n    bool get_string(const input_format_t format,\n                    const NumberType len,\n                    string_t& result)\n    {\n        bool success = true;\n        for (NumberType i = 0; i < len; i++)\n        {\n            get();\n            if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(format, \"string\")))\n            {\n                success = false;\n                break;\n            }\n            result.push_back(static_cast<typename string_t::value_type>(current));\n        }\n        return success;\n    }\n\n    /*!\n    @brief create a byte array by reading bytes from the input\n\n    @tparam NumberType the type of the number\n    @param[in] format the current format (for diagnostics)\n    @param[in] len number of bytes to read\n    @param[out] result byte array created by reading @a len bytes\n\n    @return whether byte array creation completed\n\n    @note We can not reserve @a len bytes for the result, because @a len\n          may be too large. 
Usually, @ref unexpect_eof() detects the end of\n          the input before we run out of memory.\n    */\n    template<typename NumberType>\n    bool get_binary(const input_format_t format,\n                    const NumberType len,\n                    binary_t& result)\n    {\n        bool success = true;\n        for (NumberType i = 0; i < len; i++)\n        {\n            get();\n            if (JSON_HEDLEY_UNLIKELY(!unexpect_eof(format, \"binary\")))\n            {\n                success = false;\n                break;\n            }\n            result.push_back(static_cast<std::uint8_t>(current));\n        }\n        return success;\n    }\n\n    /*!\n    @param[in] format   the current format (for diagnostics)\n    @param[in] context  further context information (for diagnostics)\n    @return whether the last read character is not EOF\n    */\n    JSON_HEDLEY_NON_NULL(3)\n    bool unexpect_eof(const input_format_t format, const char* context) const\n    {\n        if (JSON_HEDLEY_UNLIKELY(current == std::char_traits<char_type>::eof()))\n        {\n            return sax->parse_error(chars_read, \"<end of file>\",\n                                    parse_error::create(110, chars_read, exception_message(format, \"unexpected end of input\", context), BasicJsonType()));\n        }\n        return true;\n    }\n\n    /*!\n    @return a string representation of the last read byte\n    */\n    std::string get_token_string() const\n    {\n        std::array<char, 3> cr{{}};\n        (std::snprintf)(cr.data(), cr.size(), \"%.2hhX\", static_cast<unsigned char>(current)); // NOLINT(cppcoreguidelines-pro-type-vararg,hicpp-vararg)\n        return std::string{cr.data()};\n    }\n\n    /*!\n    @param[in] format   the current format\n    @param[in] detail   a detailed error message\n    @param[in] context  further context information\n    @return a message string to use in the parse_error exceptions\n    */\n    std::string exception_message(const input_format_t 
format,\n                                  const std::string& detail,\n                                  const std::string& context) const\n    {\n        std::string error_msg = \"syntax error while parsing \";\n\n        switch (format)\n        {\n            case input_format_t::cbor:\n                error_msg += \"CBOR\";\n                break;\n\n            case input_format_t::msgpack:\n                error_msg += \"MessagePack\";\n                break;\n\n            case input_format_t::ubjson:\n                error_msg += \"UBJSON\";\n                break;\n\n            case input_format_t::bson:\n                error_msg += \"BSON\";\n                break;\n\n            default:            // LCOV_EXCL_LINE\n                JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert) LCOV_EXCL_LINE\n        }\n\n        return error_msg + \" \" + context + \": \" + detail;\n    }\n\n  private:\n    /// input adapter\n    InputAdapterType ia;\n\n    /// the current character\n    char_int_type current = std::char_traits<char_type>::eof();\n\n    /// the number of characters read\n    std::size_t chars_read = 0;\n\n    /// whether we can assume little endianness\n    const bool is_little_endian = little_endianess();\n\n    /// the SAX parser\n    json_sax_t* sax = nullptr;\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/input/input_adapters.hpp>\n\n// #include <nlohmann/detail/input/lexer.hpp>\n\n// #include <nlohmann/detail/input/parser.hpp>\n\n\n#include <cmath> // isfinite\n#include <cstdint> // uint8_t\n#include <functional> // function\n#include <string> // string\n#include <utility> // move\n#include <vector> // vector\n\n// #include <nlohmann/detail/exceptions.hpp>\n\n// #include <nlohmann/detail/input/input_adapters.hpp>\n\n// #include <nlohmann/detail/input/json_sax.hpp>\n\n// #include <nlohmann/detail/input/lexer.hpp>\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n// 
#include <nlohmann/detail/meta/is_sax.hpp>\n\n// #include <nlohmann/detail/value_t.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n////////////\n// parser //\n////////////\n\nenum class parse_event_t : uint8_t\n{\n    /// the parser read `{` and started to process a JSON object\n    object_start,\n    /// the parser read `}` and finished processing a JSON object\n    object_end,\n    /// the parser read `[` and started to process a JSON array\n    array_start,\n    /// the parser read `]` and finished processing a JSON array\n    array_end,\n    /// the parser read a key of a value in an object\n    key,\n    /// the parser finished reading a JSON value\n    value\n};\n\ntemplate<typename BasicJsonType>\nusing parser_callback_t =\n    std::function<bool(int /*depth*/, parse_event_t /*event*/, BasicJsonType& /*parsed*/)>;\n\n/*!\n@brief syntax analysis\n\nThis class implements a recursive descent parser.\n*/\ntemplate<typename BasicJsonType, typename InputAdapterType>\nclass parser\n{\n    using number_integer_t = typename BasicJsonType::number_integer_t;\n    using number_unsigned_t = typename BasicJsonType::number_unsigned_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n    using string_t = typename BasicJsonType::string_t;\n    using lexer_t = lexer<BasicJsonType, InputAdapterType>;\n    using token_type = typename lexer_t::token_type;\n\n  public:\n    /// a parser reading from an input adapter\n    explicit parser(InputAdapterType&& adapter,\n                    const parser_callback_t<BasicJsonType> cb = nullptr,\n                    const bool allow_exceptions_ = true,\n                    const bool skip_comments = false)\n        : callback(cb)\n        , m_lexer(std::move(adapter), skip_comments)\n        , allow_exceptions(allow_exceptions_)\n    {\n        // read first token\n        get_token();\n    }\n\n    /*!\n    @brief public parser interface\n\n    @param[in] strict      whether to expect the last token to be EOF\n  
  @param[in,out] result  parsed JSON value\n\n    @throw parse_error.101 in case of an unexpected token\n    @throw parse_error.102 if to_unicode fails or surrogate error\n    @throw parse_error.103 if to_unicode fails\n    */\n    void parse(const bool strict, BasicJsonType& result)\n    {\n        if (callback)\n        {\n            json_sax_dom_callback_parser<BasicJsonType> sdp(result, callback, allow_exceptions);\n            sax_parse_internal(&sdp);\n\n            // in strict mode, input must be completely read\n            if (strict && (get_token() != token_type::end_of_input))\n            {\n                sdp.parse_error(m_lexer.get_position(),\n                                m_lexer.get_token_string(),\n                                parse_error::create(101, m_lexer.get_position(),\n                                                    exception_message(token_type::end_of_input, \"value\"), BasicJsonType()));\n            }\n\n            // in case of an error, return discarded value\n            if (sdp.is_errored())\n            {\n                result = value_t::discarded;\n                return;\n            }\n\n            // set top-level value to null if it was discarded by the callback\n            // function\n            if (result.is_discarded())\n            {\n                result = nullptr;\n            }\n        }\n        else\n        {\n            json_sax_dom_parser<BasicJsonType> sdp(result, allow_exceptions);\n            sax_parse_internal(&sdp);\n\n            // in strict mode, input must be completely read\n            if (strict && (get_token() != token_type::end_of_input))\n            {\n                sdp.parse_error(m_lexer.get_position(),\n                                m_lexer.get_token_string(),\n                                parse_error::create(101, m_lexer.get_position(), exception_message(token_type::end_of_input, \"value\"), BasicJsonType()));\n            }\n\n            // in case of an error, 
return discarded value\n            if (sdp.is_errored())\n            {\n                result = value_t::discarded;\n                return;\n            }\n        }\n\n        result.assert_invariant();\n    }\n\n    /*!\n    @brief public accept interface\n\n    @param[in] strict  whether to expect the last token to be EOF\n    @return whether the input is a proper JSON text\n    */\n    bool accept(const bool strict = true)\n    {\n        json_sax_acceptor<BasicJsonType> sax_acceptor;\n        return sax_parse(&sax_acceptor, strict);\n    }\n\n    template<typename SAX>\n    JSON_HEDLEY_NON_NULL(2)\n    bool sax_parse(SAX* sax, const bool strict = true)\n    {\n        (void)detail::is_sax_static_asserts<SAX, BasicJsonType> {};\n        const bool result = sax_parse_internal(sax);\n\n        // strict mode: next byte must be EOF\n        if (result && strict && (get_token() != token_type::end_of_input))\n        {\n            return sax->parse_error(m_lexer.get_position(),\n                                    m_lexer.get_token_string(),\n                                    parse_error::create(101, m_lexer.get_position(), exception_message(token_type::end_of_input, \"value\"), BasicJsonType()));\n        }\n\n        return result;\n    }\n\n  private:\n    template<typename SAX>\n    JSON_HEDLEY_NON_NULL(2)\n    bool sax_parse_internal(SAX* sax)\n    {\n        // stack to remember the hierarchy of structured values we are parsing\n        // true = array; false = object\n        std::vector<bool> states;\n        // value to avoid a goto (see comment where set to true)\n        bool skip_to_state_evaluation = false;\n\n        while (true)\n        {\n            if (!skip_to_state_evaluation)\n            {\n                // invariant: get_token() was called before each iteration\n                switch (last_token)\n                {\n                    case token_type::begin_object:\n                    {\n                        if 
(JSON_HEDLEY_UNLIKELY(!sax->start_object(std::size_t(-1))))\n                        {\n                            return false;\n                        }\n\n                        // closing } -> we are done\n                        if (get_token() == token_type::end_object)\n                        {\n                            if (JSON_HEDLEY_UNLIKELY(!sax->end_object()))\n                            {\n                                return false;\n                            }\n                            break;\n                        }\n\n                        // parse key\n                        if (JSON_HEDLEY_UNLIKELY(last_token != token_type::value_string))\n                        {\n                            return sax->parse_error(m_lexer.get_position(),\n                                                    m_lexer.get_token_string(),\n                                                    parse_error::create(101, m_lexer.get_position(), exception_message(token_type::value_string, \"object key\"), BasicJsonType()));\n                        }\n                        if (JSON_HEDLEY_UNLIKELY(!sax->key(m_lexer.get_string())))\n                        {\n                            return false;\n                        }\n\n                        // parse separator (:)\n                        if (JSON_HEDLEY_UNLIKELY(get_token() != token_type::name_separator))\n                        {\n                            return sax->parse_error(m_lexer.get_position(),\n                                                    m_lexer.get_token_string(),\n                                                    parse_error::create(101, m_lexer.get_position(), exception_message(token_type::name_separator, \"object separator\"), BasicJsonType()));\n                        }\n\n                        // remember we are now inside an object\n                        states.push_back(false);\n\n                        // parse values\n                        
get_token();\n                        continue;\n                    }\n\n                    case token_type::begin_array:\n                    {\n                        if (JSON_HEDLEY_UNLIKELY(!sax->start_array(std::size_t(-1))))\n                        {\n                            return false;\n                        }\n\n                        // closing ] -> we are done\n                        if (get_token() == token_type::end_array)\n                        {\n                            if (JSON_HEDLEY_UNLIKELY(!sax->end_array()))\n                            {\n                                return false;\n                            }\n                            break;\n                        }\n\n                        // remember we are now inside an array\n                        states.push_back(true);\n\n                        // parse values (no need to call get_token)\n                        continue;\n                    }\n\n                    case token_type::value_float:\n                    {\n                        const auto res = m_lexer.get_number_float();\n\n                        if (JSON_HEDLEY_UNLIKELY(!std::isfinite(res)))\n                        {\n                            return sax->parse_error(m_lexer.get_position(),\n                                                    m_lexer.get_token_string(),\n                                                    out_of_range::create(406, \"number overflow parsing '\" + m_lexer.get_token_string() + \"'\", BasicJsonType()));\n                        }\n\n                        if (JSON_HEDLEY_UNLIKELY(!sax->number_float(res, m_lexer.get_string())))\n                        {\n                            return false;\n                        }\n\n                        break;\n                    }\n\n                    case token_type::literal_false:\n                    {\n                        if (JSON_HEDLEY_UNLIKELY(!sax->boolean(false)))\n                        
{\n                            return false;\n                        }\n                        break;\n                    }\n\n                    case token_type::literal_null:\n                    {\n                        if (JSON_HEDLEY_UNLIKELY(!sax->null()))\n                        {\n                            return false;\n                        }\n                        break;\n                    }\n\n                    case token_type::literal_true:\n                    {\n                        if (JSON_HEDLEY_UNLIKELY(!sax->boolean(true)))\n                        {\n                            return false;\n                        }\n                        break;\n                    }\n\n                    case token_type::value_integer:\n                    {\n                        if (JSON_HEDLEY_UNLIKELY(!sax->number_integer(m_lexer.get_number_integer())))\n                        {\n                            return false;\n                        }\n                        break;\n                    }\n\n                    case token_type::value_string:\n                    {\n                        if (JSON_HEDLEY_UNLIKELY(!sax->string(m_lexer.get_string())))\n                        {\n                            return false;\n                        }\n                        break;\n                    }\n\n                    case token_type::value_unsigned:\n                    {\n                        if (JSON_HEDLEY_UNLIKELY(!sax->number_unsigned(m_lexer.get_number_unsigned())))\n                        {\n                            return false;\n                        }\n                        break;\n                    }\n\n                    case token_type::parse_error:\n                    {\n                        // using \"uninitialized\" to avoid \"expected\" message\n                        return sax->parse_error(m_lexer.get_position(),\n                                                
m_lexer.get_token_string(),\n                                                parse_error::create(101, m_lexer.get_position(), exception_message(token_type::uninitialized, \"value\"), BasicJsonType()));\n                    }\n\n                    default: // the last token was unexpected\n                    {\n                        return sax->parse_error(m_lexer.get_position(),\n                                                m_lexer.get_token_string(),\n                                                parse_error::create(101, m_lexer.get_position(), exception_message(token_type::literal_or_value, \"value\"), BasicJsonType()));\n                    }\n                }\n            }\n            else\n            {\n                skip_to_state_evaluation = false;\n            }\n\n            // we reached this line after we successfully parsed a value\n            if (states.empty())\n            {\n                // empty stack: we reached the end of the hierarchy: done\n                return true;\n            }\n\n            if (states.back())  // array\n            {\n                // comma -> next value\n                if (get_token() == token_type::value_separator)\n                {\n                    // parse a new value\n                    get_token();\n                    continue;\n                }\n\n                // closing ]\n                if (JSON_HEDLEY_LIKELY(last_token == token_type::end_array))\n                {\n                    if (JSON_HEDLEY_UNLIKELY(!sax->end_array()))\n                    {\n                        return false;\n                    }\n\n                    // We are done with this array. 
Before we can parse a\n                    // new value, we need to evaluate the new state first.\n                    // By setting skip_to_state_evaluation to true, the next\n                    // loop iteration skips the value switch and jumps\n                    // straight to the state evaluation that follows it.\n                    JSON_ASSERT(!states.empty());\n                    states.pop_back();\n                    skip_to_state_evaluation = true;\n                    continue;\n                }\n\n                return sax->parse_error(m_lexer.get_position(),\n                                        m_lexer.get_token_string(),\n                                        parse_error::create(101, m_lexer.get_position(), exception_message(token_type::end_array, \"array\"), BasicJsonType()));\n            }\n\n            // states.back() is false -> object\n\n            // comma -> next value\n            if (get_token() == token_type::value_separator)\n            {\n                // parse key\n                if (JSON_HEDLEY_UNLIKELY(get_token() != token_type::value_string))\n                {\n                    return sax->parse_error(m_lexer.get_position(),\n                                            m_lexer.get_token_string(),\n                                            parse_error::create(101, m_lexer.get_position(), exception_message(token_type::value_string, \"object key\"), BasicJsonType()));\n                }\n\n                if (JSON_HEDLEY_UNLIKELY(!sax->key(m_lexer.get_string())))\n                {\n                    return false;\n                }\n\n                // parse separator (:)\n                if (JSON_HEDLEY_UNLIKELY(get_token() != token_type::name_separator))\n                {\n                    return sax->parse_error(m_lexer.get_position(),\n                                            m_lexer.get_token_string(),\n                                            parse_error::create(101, m_lexer.get_position(), exception_message(token_type::name_separator, \"object separator\"), 
BasicJsonType()));\n                }\n\n                // parse values\n                get_token();\n                continue;\n            }\n\n            // closing }\n            if (JSON_HEDLEY_LIKELY(last_token == token_type::end_object))\n            {\n                if (JSON_HEDLEY_UNLIKELY(!sax->end_object()))\n                {\n                    return false;\n                }\n\n                // We are done with this object. Before we can parse a\n                // new value, we need to evaluate the new state first.\n                // By setting skip_to_state_evaluation to true, the next\n                // loop iteration skips the value switch and jumps\n                // straight to the state evaluation that follows it.\n                JSON_ASSERT(!states.empty());\n                states.pop_back();\n                skip_to_state_evaluation = true;\n                continue;\n            }\n\n            return sax->parse_error(m_lexer.get_position(),\n                                    m_lexer.get_token_string(),\n                                    parse_error::create(101, m_lexer.get_position(), exception_message(token_type::end_object, \"object\"), BasicJsonType()));\n        }\n    }\n\n    /// get next token from lexer\n    token_type get_token()\n    {\n        return last_token = m_lexer.scan();\n    }\n\n    std::string exception_message(const token_type expected, const std::string& context)\n    {\n        std::string error_msg = \"syntax error \";\n\n        if (!context.empty())\n        {\n            error_msg += \"while parsing \" + context + \" \";\n        }\n\n        error_msg += \"- \";\n\n        if (last_token == token_type::parse_error)\n        {\n            error_msg += std::string(m_lexer.get_error_message()) + \"; last read: '\" +\n                         m_lexer.get_token_string() + \"'\";\n        }\n        else\n        {\n            error_msg += \"unexpected \" + std::string(lexer_t::token_type_name(last_token));\n        }\n\n        if (expected != 
token_type::uninitialized)\n        {\n            error_msg += \"; expected \" + std::string(lexer_t::token_type_name(expected));\n        }\n\n        return error_msg;\n    }\n\n  private:\n    /// callback function\n    const parser_callback_t<BasicJsonType> callback = nullptr;\n    /// the type of the last read token\n    token_type last_token = token_type::uninitialized;\n    /// the lexer\n    lexer_t m_lexer;\n    /// whether to throw exceptions in case of errors\n    const bool allow_exceptions = true;\n};\n\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/iterators/internal_iterator.hpp>\n\n\n// #include <nlohmann/detail/iterators/primitive_iterator.hpp>\n\n\n#include <cstddef> // ptrdiff_t\n#include <limits>  // numeric_limits\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n/*\n@brief an iterator for primitive JSON types\n\nThis class models an iterator for primitive JSON types (boolean, number,\nstring). Its only purpose is to allow the iterator/const_iterator classes\nto \"iterate\" over primitive values. Internally, the iterator is modeled by\na `difference_type` variable. 
Value begin_value (`0`) models the begin,\nend_value (`1`) models past the end.\n*/\nclass primitive_iterator_t\n{\n  private:\n    using difference_type = std::ptrdiff_t;\n    static constexpr difference_type begin_value = 0;\n    static constexpr difference_type end_value = begin_value + 1;\n\n  JSON_PRIVATE_UNLESS_TESTED:\n    /// iterator as signed integer type\n    difference_type m_it = (std::numeric_limits<std::ptrdiff_t>::min)();\n\n  public:\n    constexpr difference_type get_value() const noexcept\n    {\n        return m_it;\n    }\n\n    /// set iterator to a defined beginning\n    void set_begin() noexcept\n    {\n        m_it = begin_value;\n    }\n\n    /// set iterator to a defined past the end\n    void set_end() noexcept\n    {\n        m_it = end_value;\n    }\n\n    /// return whether the iterator can be dereferenced\n    constexpr bool is_begin() const noexcept\n    {\n        return m_it == begin_value;\n    }\n\n    /// return whether the iterator is at end\n    constexpr bool is_end() const noexcept\n    {\n        return m_it == end_value;\n    }\n\n    friend constexpr bool operator==(primitive_iterator_t lhs, primitive_iterator_t rhs) noexcept\n    {\n        return lhs.m_it == rhs.m_it;\n    }\n\n    friend constexpr bool operator<(primitive_iterator_t lhs, primitive_iterator_t rhs) noexcept\n    {\n        return lhs.m_it < rhs.m_it;\n    }\n\n    primitive_iterator_t operator+(difference_type n) noexcept\n    {\n        auto result = *this;\n        result += n;\n        return result;\n    }\n\n    friend constexpr difference_type operator-(primitive_iterator_t lhs, primitive_iterator_t rhs) noexcept\n    {\n        return lhs.m_it - rhs.m_it;\n    }\n\n    primitive_iterator_t& operator++() noexcept\n    {\n        ++m_it;\n        return *this;\n    }\n\n    primitive_iterator_t const operator++(int) noexcept // NOLINT(readability-const-return-type)\n    {\n        auto result = *this;\n        ++m_it;\n        return result;\n    
}\n\n    primitive_iterator_t& operator--() noexcept\n    {\n        --m_it;\n        return *this;\n    }\n\n    primitive_iterator_t const operator--(int) noexcept // NOLINT(readability-const-return-type)\n    {\n        auto result = *this;\n        --m_it;\n        return result;\n    }\n\n    primitive_iterator_t& operator+=(difference_type n) noexcept\n    {\n        m_it += n;\n        return *this;\n    }\n\n    primitive_iterator_t& operator-=(difference_type n) noexcept\n    {\n        m_it -= n;\n        return *this;\n    }\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n/*!\n@brief an iterator value\n\n@note This structure could easily be a union, but MSVC currently does not allow\nunions members with complex constructors, see https://github.com/nlohmann/json/pull/105.\n*/\ntemplate<typename BasicJsonType> struct internal_iterator\n{\n    /// iterator for JSON objects\n    typename BasicJsonType::object_t::iterator object_iterator {};\n    /// iterator for JSON arrays\n    typename BasicJsonType::array_t::iterator array_iterator {};\n    /// generic iterator for all other types\n    primitive_iterator_t primitive_iterator {};\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/iterators/iter_impl.hpp>\n\n\n#include <iterator> // iterator, random_access_iterator_tag, bidirectional_iterator_tag, advance, next\n#include <type_traits> // conditional, is_const, remove_const\n\n// #include <nlohmann/detail/exceptions.hpp>\n\n// #include <nlohmann/detail/iterators/internal_iterator.hpp>\n\n// #include <nlohmann/detail/iterators/primitive_iterator.hpp>\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n// #include <nlohmann/detail/meta/cpp_future.hpp>\n\n// #include <nlohmann/detail/meta/type_traits.hpp>\n\n// #include <nlohmann/detail/value_t.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n// forward declare, to be able to friend it later on\ntemplate<typename 
IteratorType> class iteration_proxy;\ntemplate<typename IteratorType> class iteration_proxy_value;\n\n/*!\n@brief a template for a bidirectional iterator for the @ref basic_json class\nThis class implements both iterators (iterator and const_iterator) for the\n@ref basic_json class.\n@note An iterator is called *initialized* when a pointer to a JSON value has\n      been set (e.g., by a constructor or a copy assignment). If the iterator is\n      default-constructed, it is *uninitialized* and most methods are undefined.\n      **The library uses assertions to detect calls on uninitialized iterators.**\n@requirement The class satisfies the following concept requirements:\n- [BidirectionalIterator](https://en.cppreference.com/w/cpp/named_req/BidirectionalIterator):\n  The iterator can be moved in both directions (i.e.\n  incremented and decremented).\n@since version 1.0.0, simplified in version 2.0.9, changed to bidirectional\n       iterators in version 3.0.0 (see https://github.com/nlohmann/json/issues/593)\n*/\ntemplate<typename BasicJsonType>\nclass iter_impl\n{\n    /// the iterator with BasicJsonType of different const-ness\n    using other_iter_impl = iter_impl<typename std::conditional<std::is_const<BasicJsonType>::value, typename std::remove_const<BasicJsonType>::type, const BasicJsonType>::type>;\n    /// allow basic_json to access private members\n    friend other_iter_impl;\n    friend BasicJsonType;\n    friend iteration_proxy<iter_impl>;\n    friend iteration_proxy_value<iter_impl>;\n\n    using object_t = typename BasicJsonType::object_t;\n    using array_t = typename BasicJsonType::array_t;\n    // make sure BasicJsonType is basic_json or const basic_json\n    static_assert(is_basic_json<typename std::remove_const<BasicJsonType>::type>::value,\n                  \"iter_impl only accepts (const) basic_json\");\n\n  public:\n\n    /// The std::iterator class template (used as a base class to provide typedefs) is deprecated in 
C++17.\n    /// The C++ Standard has never required user-defined iterators to derive from std::iterator.\n    /// A user-defined iterator should provide publicly accessible typedefs named\n    /// iterator_category, value_type, difference_type, pointer, and reference.\n    /// Note that value_type is required to be non-const, even for constant iterators.\n    using iterator_category = std::bidirectional_iterator_tag;\n\n    /// the type of the values when the iterator is dereferenced\n    using value_type = typename BasicJsonType::value_type;\n    /// a type to represent differences between iterators\n    using difference_type = typename BasicJsonType::difference_type;\n    /// defines a pointer to the type iterated over (value_type)\n    using pointer = typename std::conditional<std::is_const<BasicJsonType>::value,\n          typename BasicJsonType::const_pointer,\n          typename BasicJsonType::pointer>::type;\n    /// defines a reference to the type iterated over (value_type)\n    using reference =\n        typename std::conditional<std::is_const<BasicJsonType>::value,\n        typename BasicJsonType::const_reference,\n        typename BasicJsonType::reference>::type;\n\n    iter_impl() = default;\n    ~iter_impl() = default;\n    iter_impl(iter_impl&&) noexcept = default;\n    iter_impl& operator=(iter_impl&&) noexcept = default;\n\n    /*!\n    @brief constructor for a given JSON instance\n    @param[in] object  pointer to a JSON object for this iterator\n    @pre object != nullptr\n    @post The iterator is initialized; i.e. 
`m_object != nullptr`.\n    */\n    explicit iter_impl(pointer object) noexcept : m_object(object)\n    {\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n            {\n                m_it.object_iterator = typename object_t::iterator();\n                break;\n            }\n\n            case value_t::array:\n            {\n                m_it.array_iterator = typename array_t::iterator();\n                break;\n            }\n\n            default:\n            {\n                m_it.primitive_iterator = primitive_iterator_t();\n                break;\n            }\n        }\n    }\n\n    /*!\n    @note The conventional copy constructor and copy assignment are implicitly\n          defined. Combined with the following converting constructor and\n          assignment, they support: (1) copy from iterator to iterator, (2)\n          copy from const iterator to const iterator, and (3) conversion from\n          iterator to const iterator. However conversion from const iterator\n          to iterator is not defined.\n    */\n\n    /*!\n    @brief const copy constructor\n    @param[in] other const iterator to copy from\n    @note This copy constructor had to be defined explicitly to circumvent a bug\n          occurring on msvc v19.0 compiler (VS 2015) debug build. 
For more\n          information refer to: https://github.com/nlohmann/json/issues/1608\n    */\n    iter_impl(const iter_impl<const BasicJsonType>& other) noexcept\n        : m_object(other.m_object), m_it(other.m_it)\n    {}\n\n    /*!\n    @brief converting assignment\n    @param[in] other const iterator to copy from\n    @return const/non-const iterator\n    @note It is not checked whether @a other is initialized.\n    */\n    iter_impl& operator=(const iter_impl<const BasicJsonType>& other) noexcept\n    {\n        if (&other != this)\n        {\n            m_object = other.m_object;\n            m_it = other.m_it;\n        }\n        return *this;\n    }\n\n    /*!\n    @brief converting constructor\n    @param[in] other  non-const iterator to copy from\n    @note It is not checked whether @a other is initialized.\n    */\n    iter_impl(const iter_impl<typename std::remove_const<BasicJsonType>::type>& other) noexcept\n        : m_object(other.m_object), m_it(other.m_it)\n    {}\n\n    /*!\n    @brief converting assignment\n    @param[in] other  non-const iterator to copy from\n    @return const/non-const iterator\n    @note It is not checked whether @a other is initialized.\n    */\n    iter_impl& operator=(const iter_impl<typename std::remove_const<BasicJsonType>::type>& other) noexcept // NOLINT(cert-oop54-cpp)\n    {\n        m_object = other.m_object;\n        m_it = other.m_it;\n        return *this;\n    }\n\n  JSON_PRIVATE_UNLESS_TESTED:\n    /*!\n    @brief set the iterator to the first value\n    @pre The iterator is initialized; i.e. 
`m_object != nullptr`.\n    */\n    void set_begin() noexcept\n    {\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n            {\n                m_it.object_iterator = m_object->m_value.object->begin();\n                break;\n            }\n\n            case value_t::array:\n            {\n                m_it.array_iterator = m_object->m_value.array->begin();\n                break;\n            }\n\n            case value_t::null:\n            {\n                // set to end so begin()==end() is true: null is empty\n                m_it.primitive_iterator.set_end();\n                break;\n            }\n\n            default:\n            {\n                m_it.primitive_iterator.set_begin();\n                break;\n            }\n        }\n    }\n\n    /*!\n    @brief set the iterator past the last value\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    void set_end() noexcept\n    {\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n            {\n                m_it.object_iterator = m_object->m_value.object->end();\n                break;\n            }\n\n            case value_t::array:\n            {\n                m_it.array_iterator = m_object->m_value.array->end();\n                break;\n            }\n\n            default:\n            {\n                m_it.primitive_iterator.set_end();\n                break;\n            }\n        }\n    }\n\n  public:\n    /*!\n    @brief return a reference to the value pointed to by the iterator\n    @pre The iterator is initialized; i.e. 
`m_object != nullptr`.\n    */\n    reference operator*() const\n    {\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n            {\n                JSON_ASSERT(m_it.object_iterator != m_object->m_value.object->end());\n                return m_it.object_iterator->second;\n            }\n\n            case value_t::array:\n            {\n                JSON_ASSERT(m_it.array_iterator != m_object->m_value.array->end());\n                return *m_it.array_iterator;\n            }\n\n            case value_t::null:\n                JSON_THROW(invalid_iterator::create(214, \"cannot get value\", *m_object));\n\n            default:\n            {\n                if (JSON_HEDLEY_LIKELY(m_it.primitive_iterator.is_begin()))\n                {\n                    return *m_object;\n                }\n\n                JSON_THROW(invalid_iterator::create(214, \"cannot get value\", *m_object));\n            }\n        }\n    }\n\n    /*!\n    @brief dereference the iterator\n    @pre The iterator is initialized; i.e. 
`m_object != nullptr`.\n    */\n    pointer operator->() const\n    {\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n            {\n                JSON_ASSERT(m_it.object_iterator != m_object->m_value.object->end());\n                return &(m_it.object_iterator->second);\n            }\n\n            case value_t::array:\n            {\n                JSON_ASSERT(m_it.array_iterator != m_object->m_value.array->end());\n                return &*m_it.array_iterator;\n            }\n\n            default:\n            {\n                if (JSON_HEDLEY_LIKELY(m_it.primitive_iterator.is_begin()))\n                {\n                    return m_object;\n                }\n\n                JSON_THROW(invalid_iterator::create(214, \"cannot get value\", *m_object));\n            }\n        }\n    }\n\n    /*!\n    @brief post-increment (it++)\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    iter_impl const operator++(int) // NOLINT(readability-const-return-type)\n    {\n        auto result = *this;\n        ++(*this);\n        return result;\n    }\n\n    /*!\n    @brief pre-increment (++it)\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    iter_impl& operator++()\n    {\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n            {\n                std::advance(m_it.object_iterator, 1);\n                break;\n            }\n\n            case value_t::array:\n            {\n                std::advance(m_it.array_iterator, 1);\n                break;\n            }\n\n            default:\n            {\n                ++m_it.primitive_iterator;\n                break;\n            }\n        }\n\n        return *this;\n    }\n\n    /*!\n    @brief post-decrement (it--)\n    @pre The iterator is initialized; i.e. 
`m_object != nullptr`.\n    */\n    iter_impl const operator--(int) // NOLINT(readability-const-return-type)\n    {\n        auto result = *this;\n        --(*this);\n        return result;\n    }\n\n    /*!\n    @brief pre-decrement (--it)\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    iter_impl& operator--()\n    {\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n            {\n                std::advance(m_it.object_iterator, -1);\n                break;\n            }\n\n            case value_t::array:\n            {\n                std::advance(m_it.array_iterator, -1);\n                break;\n            }\n\n            default:\n            {\n                --m_it.primitive_iterator;\n                break;\n            }\n        }\n\n        return *this;\n    }\n\n    /*!\n    @brief comparison: equal\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    template < typename IterImpl, detail::enable_if_t < (std::is_same<IterImpl, iter_impl>::value || std::is_same<IterImpl, other_iter_impl>::value), std::nullptr_t > = nullptr >\n    bool operator==(const IterImpl& other) const\n    {\n        // if objects are not the same, the comparison is undefined\n        if (JSON_HEDLEY_UNLIKELY(m_object != other.m_object))\n        {\n            JSON_THROW(invalid_iterator::create(212, \"cannot compare iterators of different containers\", *m_object));\n        }\n\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n                return (m_it.object_iterator == other.m_it.object_iterator);\n\n            case value_t::array:\n                return (m_it.array_iterator == other.m_it.array_iterator);\n\n            default:\n                return (m_it.primitive_iterator == other.m_it.primitive_iterator);\n        }\n    }\n\n    /*!\n    @brief 
comparison: not equal\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    template < typename IterImpl, detail::enable_if_t < (std::is_same<IterImpl, iter_impl>::value || std::is_same<IterImpl, other_iter_impl>::value), std::nullptr_t > = nullptr >\n    bool operator!=(const IterImpl& other) const\n    {\n        return !operator==(other);\n    }\n\n    /*!\n    @brief comparison: smaller\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    bool operator<(const iter_impl& other) const\n    {\n        // if objects are not the same, the comparison is undefined\n        if (JSON_HEDLEY_UNLIKELY(m_object != other.m_object))\n        {\n            JSON_THROW(invalid_iterator::create(212, \"cannot compare iterators of different containers\", *m_object));\n        }\n\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n                JSON_THROW(invalid_iterator::create(213, \"cannot compare order of object iterators\", *m_object));\n\n            case value_t::array:\n                return (m_it.array_iterator < other.m_it.array_iterator);\n\n            default:\n                return (m_it.primitive_iterator < other.m_it.primitive_iterator);\n        }\n    }\n\n    /*!\n    @brief comparison: less than or equal\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    bool operator<=(const iter_impl& other) const\n    {\n        return !other.operator < (*this);\n    }\n\n    /*!\n    @brief comparison: greater than\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    bool operator>(const iter_impl& other) const\n    {\n        return !operator<=(other);\n    }\n\n    /*!\n    @brief comparison: greater than or equal\n    @pre The iterator is initialized; i.e. 
`m_object != nullptr`.\n    */\n    bool operator>=(const iter_impl& other) const\n    {\n        return !operator<(other);\n    }\n\n    /*!\n    @brief add to iterator\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    iter_impl& operator+=(difference_type i)\n    {\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n                JSON_THROW(invalid_iterator::create(209, \"cannot use offsets with object iterators\", *m_object));\n\n            case value_t::array:\n            {\n                std::advance(m_it.array_iterator, i);\n                break;\n            }\n\n            default:\n            {\n                m_it.primitive_iterator += i;\n                break;\n            }\n        }\n\n        return *this;\n    }\n\n    /*!\n    @brief subtract from iterator\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    iter_impl& operator-=(difference_type i)\n    {\n        return operator+=(-i);\n    }\n\n    /*!\n    @brief add to iterator\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    iter_impl operator+(difference_type i) const\n    {\n        auto result = *this;\n        result += i;\n        return result;\n    }\n\n    /*!\n    @brief addition of distance and iterator\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    friend iter_impl operator+(difference_type i, const iter_impl& it)\n    {\n        auto result = it;\n        result += i;\n        return result;\n    }\n\n    /*!\n    @brief subtract from iterator\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    iter_impl operator-(difference_type i) const\n    {\n        auto result = *this;\n        result -= i;\n        return result;\n    }\n\n    /*!\n    @brief return difference\n    @pre The iterator is initialized; i.e. 
`m_object != nullptr`.\n    */\n    difference_type operator-(const iter_impl& other) const\n    {\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n                JSON_THROW(invalid_iterator::create(209, \"cannot use offsets with object iterators\", *m_object));\n\n            case value_t::array:\n                return m_it.array_iterator - other.m_it.array_iterator;\n\n            default:\n                return m_it.primitive_iterator - other.m_it.primitive_iterator;\n        }\n    }\n\n    /*!\n    @brief access to successor\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    reference operator[](difference_type n) const\n    {\n        JSON_ASSERT(m_object != nullptr);\n\n        switch (m_object->m_type)\n        {\n            case value_t::object:\n                JSON_THROW(invalid_iterator::create(208, \"cannot use operator[] for object iterators\", *m_object));\n\n            case value_t::array:\n                return *std::next(m_it.array_iterator, n);\n\n            case value_t::null:\n                JSON_THROW(invalid_iterator::create(214, \"cannot get value\", *m_object));\n\n            default:\n            {\n                if (JSON_HEDLEY_LIKELY(m_it.primitive_iterator.get_value() == -n))\n                {\n                    return *m_object;\n                }\n\n                JSON_THROW(invalid_iterator::create(214, \"cannot get value\", *m_object));\n            }\n        }\n    }\n\n    /*!\n    @brief return the key of an object iterator\n    @pre The iterator is initialized; i.e. 
`m_object != nullptr`.\n    */\n    const typename object_t::key_type& key() const\n    {\n        JSON_ASSERT(m_object != nullptr);\n\n        if (JSON_HEDLEY_LIKELY(m_object->is_object()))\n        {\n            return m_it.object_iterator->first;\n        }\n\n        JSON_THROW(invalid_iterator::create(207, \"cannot use key() for non-object iterators\", *m_object));\n    }\n\n    /*!\n    @brief return the value of an iterator\n    @pre The iterator is initialized; i.e. `m_object != nullptr`.\n    */\n    reference value() const\n    {\n        return operator*();\n    }\n\n  JSON_PRIVATE_UNLESS_TESTED:\n    /// associated JSON instance\n    pointer m_object = nullptr;\n    /// the actual iterator of the associated instance\n    internal_iterator<typename std::remove_const<BasicJsonType>::type> m_it {};\n};\n} // namespace detail\n} // namespace nlohmann\n\n// #include <nlohmann/detail/iterators/iteration_proxy.hpp>\n\n// #include <nlohmann/detail/iterators/json_reverse_iterator.hpp>\n\n\n#include <cstddef> // ptrdiff_t\n#include <iterator> // reverse_iterator\n#include <utility> // declval\n\nnamespace nlohmann\n{\nnamespace detail\n{\n//////////////////////\n// reverse_iterator //\n//////////////////////\n\n/*!\n@brief a template for a reverse iterator class\n\n@tparam Base the base iterator type to reverse. 
Valid types are @ref\niterator (to create @ref reverse_iterator) and @ref const_iterator (to\ncreate @ref const_reverse_iterator).\n\n@requirement The class satisfies the following concept requirements:\n-\n[BidirectionalIterator](https://en.cppreference.com/w/cpp/named_req/BidirectionalIterator):\n  The iterator that can be moved can be moved in both directions (i.e.\n  incremented and decremented).\n- [OutputIterator](https://en.cppreference.com/w/cpp/named_req/OutputIterator):\n  It is possible to write to the pointed-to element (only if @a Base is\n  @ref iterator).\n\n@since version 1.0.0\n*/\ntemplate<typename Base>\nclass json_reverse_iterator : public std::reverse_iterator<Base>\n{\n  public:\n    using difference_type = std::ptrdiff_t;\n    /// shortcut to the reverse iterator adapter\n    using base_iterator = std::reverse_iterator<Base>;\n    /// the reference type for the pointed-to element\n    using reference = typename Base::reference;\n\n    /// create reverse iterator from iterator\n    explicit json_reverse_iterator(const typename base_iterator::iterator_type& it) noexcept\n        : base_iterator(it) {}\n\n    /// create reverse iterator from base class\n    explicit json_reverse_iterator(const base_iterator& it) noexcept : base_iterator(it) {}\n\n    /// post-increment (it++)\n    json_reverse_iterator const operator++(int) // NOLINT(readability-const-return-type)\n    {\n        return static_cast<json_reverse_iterator>(base_iterator::operator++(1));\n    }\n\n    /// pre-increment (++it)\n    json_reverse_iterator& operator++()\n    {\n        return static_cast<json_reverse_iterator&>(base_iterator::operator++());\n    }\n\n    /// post-decrement (it--)\n    json_reverse_iterator const operator--(int) // NOLINT(readability-const-return-type)\n    {\n        return static_cast<json_reverse_iterator>(base_iterator::operator--(1));\n    }\n\n    /// pre-decrement (--it)\n    json_reverse_iterator& operator--()\n    {\n        return 
static_cast<json_reverse_iterator&>(base_iterator::operator--());\n    }\n\n    /// add to iterator\n    json_reverse_iterator& operator+=(difference_type i)\n    {\n        return static_cast<json_reverse_iterator&>(base_iterator::operator+=(i));\n    }\n\n    /// add to iterator\n    json_reverse_iterator operator+(difference_type i) const\n    {\n        return static_cast<json_reverse_iterator>(base_iterator::operator+(i));\n    }\n\n    /// subtract from iterator\n    json_reverse_iterator operator-(difference_type i) const\n    {\n        return static_cast<json_reverse_iterator>(base_iterator::operator-(i));\n    }\n\n    /// return difference\n    difference_type operator-(const json_reverse_iterator& other) const\n    {\n        return base_iterator(*this) - base_iterator(other);\n    }\n\n    /// access to successor\n    reference operator[](difference_type n) const\n    {\n        return *(this->operator+(n));\n    }\n\n    /// return the key of an object iterator\n    auto key() const -> decltype(std::declval<Base>().key())\n    {\n        auto it = --this->base();\n        return it.key();\n    }\n\n    /// return the value of an iterator\n    reference value() const\n    {\n        auto it = --this->base();\n        return it.operator * ();\n    }\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/iterators/primitive_iterator.hpp>\n\n// #include <nlohmann/detail/json_pointer.hpp>\n\n\n#include <algorithm> // all_of\n#include <cctype> // isdigit\n#include <limits> // max\n#include <numeric> // accumulate\n#include <string> // string\n#include <utility> // move\n#include <vector> // vector\n\n// #include <nlohmann/detail/exceptions.hpp>\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n// #include <nlohmann/detail/string_escape.hpp>\n\n// #include <nlohmann/detail/value_t.hpp>\n\n\nnamespace nlohmann\n{\ntemplate<typename BasicJsonType>\nclass json_pointer\n{\n    // allow basic_json to access private members\n    
NLOHMANN_BASIC_JSON_TPL_DECLARATION\n    friend class basic_json;\n\n  public:\n    /*!\n    @brief create JSON pointer\n\n    Create a JSON pointer according to the syntax described in\n    [Section 3 of RFC6901](https://tools.ietf.org/html/rfc6901#section-3).\n\n    @param[in] s  string representing the JSON pointer; if omitted, the empty\n                  string is assumed which references the whole JSON value\n\n    @throw parse_error.107 if the given JSON pointer @a s is nonempty and does\n                           not begin with a slash (`/`); see example below\n\n    @throw parse_error.108 if a tilde (`~`) in the given JSON pointer @a s is\n    not followed by `0` (representing `~`) or `1` (representing `/`); see\n    example below\n\n    @liveexample{The example shows the construction of several valid JSON pointers\n    as well as the exceptional behavior.,json_pointer}\n\n    @since version 2.0.0\n    */\n    explicit json_pointer(const std::string& s = \"\")\n        : reference_tokens(split(s))\n    {}\n\n    /*!\n    @brief return a string representation of the JSON pointer\n\n    @invariant For each JSON pointer `ptr`, it holds:\n    @code {.cpp}\n    ptr == json_pointer(ptr.to_string());\n    @endcode\n\n    @return a string representation of the JSON pointer\n\n    @liveexample{The example shows the result of `to_string`.,json_pointer__to_string}\n\n    @since version 2.0.0\n    */\n    std::string to_string() const\n    {\n        return std::accumulate(reference_tokens.begin(), reference_tokens.end(),\n                               std::string{},\n                               [](const std::string & a, const std::string & b)\n        {\n            return a + \"/\" + detail::escape(b);\n        });\n    }\n\n    /// @copydoc to_string()\n    operator std::string() const\n    {\n        return to_string();\n    }\n\n    /*!\n    @brief append another JSON pointer at the end of this JSON pointer\n\n    @param[in] ptr  JSON pointer to append\n    
@return JSON pointer with @a ptr appended\n\n    @liveexample{The example shows the usage of `operator/=`.,json_pointer__operator_add}\n\n    @complexity Linear in the length of @a ptr.\n\n    @sa see @ref operator/=(std::string) to append a reference token\n    @sa see @ref operator/=(std::size_t) to append an array index\n    @sa see @ref operator/(const json_pointer&, const json_pointer&) for a binary operator\n\n    @since version 3.6.0\n    */\n    json_pointer& operator/=(const json_pointer& ptr)\n    {\n        reference_tokens.insert(reference_tokens.end(),\n                                ptr.reference_tokens.begin(),\n                                ptr.reference_tokens.end());\n        return *this;\n    }\n\n    /*!\n    @brief append an unescaped reference token at the end of this JSON pointer\n\n    @param[in] token  reference token to append\n    @return JSON pointer with @a token appended without escaping @a token\n\n    @liveexample{The example shows the usage of `operator/=`.,json_pointer__operator_add}\n\n    @complexity Amortized constant.\n\n    @sa see @ref operator/=(const json_pointer&) to append a JSON pointer\n    @sa see @ref operator/=(std::size_t) to append an array index\n    @sa see @ref operator/(const json_pointer&, std::size_t) for a binary operator\n\n    @since version 3.6.0\n    */\n    json_pointer& operator/=(std::string token)\n    {\n        push_back(std::move(token));\n        return *this;\n    }\n\n    /*!\n    @brief append an array index at the end of this JSON pointer\n\n    @param[in] array_idx  array index to append\n    @return JSON pointer with @a array_idx appended\n\n    @liveexample{The example shows the usage of `operator/=`.,json_pointer__operator_add}\n\n    @complexity Amortized constant.\n\n    @sa see @ref operator/=(const json_pointer&) to append a JSON pointer\n    @sa see @ref operator/=(std::string) to append a reference token\n    @sa see @ref operator/(const json_pointer&, std::string) for a binary 
operator\n\n    @since version 3.6.0\n    */\n    json_pointer& operator/=(std::size_t array_idx)\n    {\n        return *this /= std::to_string(array_idx);\n    }\n\n    /*!\n    @brief create a new JSON pointer by appending the right JSON pointer at the end of the left JSON pointer\n\n    @param[in] lhs  JSON pointer\n    @param[in] rhs  JSON pointer\n    @return a new JSON pointer with @a rhs appended to @a lhs\n\n    @liveexample{The example shows the usage of `operator/`.,json_pointer__operator_add_binary}\n\n    @complexity Linear in the length of @a lhs and @a rhs.\n\n    @sa see @ref operator/=(const json_pointer&) to append a JSON pointer\n\n    @since version 3.6.0\n    */\n    friend json_pointer operator/(const json_pointer& lhs,\n                                  const json_pointer& rhs)\n    {\n        return json_pointer(lhs) /= rhs;\n    }\n\n    /*!\n    @brief create a new JSON pointer by appending the unescaped token at the end of the JSON pointer\n\n    @param[in] ptr  JSON pointer\n    @param[in] token  reference token\n    @return a new JSON pointer with unescaped @a token appended to @a ptr\n\n    @liveexample{The example shows the usage of `operator/`.,json_pointer__operator_add_binary}\n\n    @complexity Linear in the length of @a ptr.\n\n    @sa see @ref operator/=(std::string) to append a reference token\n\n    @since version 3.6.0\n    */\n    friend json_pointer operator/(const json_pointer& ptr, std::string token) // NOLINT(performance-unnecessary-value-param)\n    {\n        return json_pointer(ptr) /= std::move(token);\n    }\n\n    /*!\n    @brief create a new JSON pointer by appending the array-index-token at the end of the JSON pointer\n\n    @param[in] ptr  JSON pointer\n    @param[in] array_idx  array index\n    @return a new JSON pointer with @a array_idx appended to @a ptr\n\n    @liveexample{The example shows the usage of `operator/`.,json_pointer__operator_add_binary}\n\n    @complexity Linear in the length of @a ptr.\n\n    
@sa see @ref operator/=(std::size_t) to append an array index\n\n    @since version 3.6.0\n    */\n    friend json_pointer operator/(const json_pointer& ptr, std::size_t array_idx)\n    {\n        return json_pointer(ptr) /= array_idx;\n    }\n\n    /*!\n    @brief returns the parent of this JSON pointer\n\n    @return parent of this JSON pointer; in case this JSON pointer is the root,\n            the root itself is returned\n\n    @complexity Linear in the length of the JSON pointer.\n\n    @liveexample{The example shows the result of `parent_pointer` for different\n    JSON Pointers.,json_pointer__parent_pointer}\n\n    @since version 3.6.0\n    */\n    json_pointer parent_pointer() const\n    {\n        if (empty())\n        {\n            return *this;\n        }\n\n        json_pointer res = *this;\n        res.pop_back();\n        return res;\n    }\n\n    /*!\n    @brief remove last reference token\n\n    @pre not `empty()`\n\n    @liveexample{The example shows the usage of `pop_back`.,json_pointer__pop_back}\n\n    @complexity Constant.\n\n    @throw out_of_range.405 if JSON pointer has no parent\n\n    @since version 3.6.0\n    */\n    void pop_back()\n    {\n        if (JSON_HEDLEY_UNLIKELY(empty()))\n        {\n            JSON_THROW(detail::out_of_range::create(405, \"JSON pointer has no parent\", BasicJsonType()));\n        }\n\n        reference_tokens.pop_back();\n    }\n\n    /*!\n    @brief return last reference token\n\n    @pre not `empty()`\n    @return last reference token\n\n    @liveexample{The example shows the usage of `back`.,json_pointer__back}\n\n    @complexity Constant.\n\n    @throw out_of_range.405 if JSON pointer has no parent\n\n    @since version 3.6.0\n    */\n    const std::string& back() const\n    {\n        if (JSON_HEDLEY_UNLIKELY(empty()))\n        {\n            JSON_THROW(detail::out_of_range::create(405, \"JSON pointer has no parent\", BasicJsonType()));\n        }\n\n        return reference_tokens.back();\n    }\n\n   
 /*!\n    @brief append an unescaped token at the end of the reference pointer\n\n    @param[in] token  token to add\n\n    @complexity Amortized constant.\n\n    @liveexample{The example shows the result of `push_back` for different\n    JSON Pointers.,json_pointer__push_back}\n\n    @since version 3.6.0\n    */\n    void push_back(const std::string& token)\n    {\n        reference_tokens.push_back(token);\n    }\n\n    /// @copydoc push_back(const std::string&)\n    void push_back(std::string&& token)\n    {\n        reference_tokens.push_back(std::move(token));\n    }\n\n    /*!\n    @brief return whether pointer points to the root document\n\n    @return true iff the JSON pointer points to the root document\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n    @liveexample{The example shows the result of `empty` for different JSON\n    Pointers.,json_pointer__empty}\n\n    @since version 3.6.0\n    */\n    bool empty() const noexcept\n    {\n        return reference_tokens.empty();\n    }\n\n  private:\n    /*!\n    @param[in] s  reference token to be converted into an array index\n\n    @return integer representation of @a s\n\n    @throw parse_error.106  if an array index begins with '0'\n    @throw parse_error.109  if an array index does not begin with a digit\n    @throw out_of_range.404 if string @a s could not be converted to an integer\n    @throw out_of_range.410 if an array index exceeds size_type\n    */\n    static typename BasicJsonType::size_type array_index(const std::string& s)\n    {\n        using size_type = typename BasicJsonType::size_type;\n\n        // error condition (cf. RFC 6901, Sect. 4)\n        if (JSON_HEDLEY_UNLIKELY(s.size() > 1 && s[0] == '0'))\n        {\n            JSON_THROW(detail::parse_error::create(106, 0, \"array index '\" + s + \"' must not begin with '0'\", BasicJsonType()));\n        }\n\n        // error condition (cf. RFC 6901, Sect. 
4)\n        if (JSON_HEDLEY_UNLIKELY(s.size() > 1 && !(s[0] >= '1' && s[0] <= '9')))\n        {\n            JSON_THROW(detail::parse_error::create(109, 0, \"array index '\" + s + \"' is not a number\", BasicJsonType()));\n        }\n\n        std::size_t processed_chars = 0;\n        unsigned long long res = 0;  // NOLINT(runtime/int)\n        JSON_TRY\n        {\n            res = std::stoull(s, &processed_chars);\n        }\n        JSON_CATCH(std::out_of_range&)\n        {\n            JSON_THROW(detail::out_of_range::create(404, \"unresolved reference token '\" + s + \"'\", BasicJsonType()));\n        }\n\n        // check if the string was completely read\n        if (JSON_HEDLEY_UNLIKELY(processed_chars != s.size()))\n        {\n            JSON_THROW(detail::out_of_range::create(404, \"unresolved reference token '\" + s + \"'\", BasicJsonType()));\n        }\n\n        // only triggered on special platforms (like 32bit), see also\n        // https://github.com/nlohmann/json/pull/2203\n        if (res >= static_cast<unsigned long long>((std::numeric_limits<size_type>::max)()))  // NOLINT(runtime/int)\n        {\n            JSON_THROW(detail::out_of_range::create(410, \"array index \" + s + \" exceeds size_type\", BasicJsonType())); // LCOV_EXCL_LINE\n        }\n\n        return static_cast<size_type>(res);\n    }\n\n  JSON_PRIVATE_UNLESS_TESTED:\n    json_pointer top() const\n    {\n        if (JSON_HEDLEY_UNLIKELY(empty()))\n        {\n            JSON_THROW(detail::out_of_range::create(405, \"JSON pointer has no parent\", BasicJsonType()));\n        }\n\n        json_pointer result = *this;\n        result.reference_tokens = {reference_tokens[0]};\n        return result;\n    }\n\n  private:\n    /*!\n    @brief create and return a reference to the pointed to value\n\n    @complexity Linear in the number of reference tokens.\n\n    @throw parse_error.109 if array index is not a number\n    @throw type_error.313 if value cannot be unflattened\n    */\n    
BasicJsonType& get_and_create(BasicJsonType& j) const\n    {\n        auto* result = &j;\n\n        // in case no reference tokens exist, return a reference to the JSON value\n        // j which will be overwritten by a primitive value\n        for (const auto& reference_token : reference_tokens)\n        {\n            switch (result->type())\n            {\n                case detail::value_t::null:\n                {\n                    if (reference_token == \"0\")\n                    {\n                        // start a new array if reference token is 0\n                        result = &result->operator[](0);\n                    }\n                    else\n                    {\n                        // start a new object otherwise\n                        result = &result->operator[](reference_token);\n                    }\n                    break;\n                }\n\n                case detail::value_t::object:\n                {\n                    // create an entry in the object\n                    result = &result->operator[](reference_token);\n                    break;\n                }\n\n                case detail::value_t::array:\n                {\n                    // create an entry in the array\n                    result = &result->operator[](array_index(reference_token));\n                    break;\n                }\n\n                /*\n                The following code is only reached if there exists a reference\n                token _and_ the current value is primitive. 
In this case, we have\n                an error situation, because primitive values may only occur as\n                single value; that is, with an empty list of reference tokens.\n                */\n                default:\n                    JSON_THROW(detail::type_error::create(313, \"invalid value to unflatten\", j));\n            }\n        }\n\n        return *result;\n    }\n\n    /*!\n    @brief return a reference to the pointed to value\n\n    @note This version does not throw if a value is not present, but tries to\n          create nested values instead. For instance, calling this function\n          with pointer `\"/this/that\"` on a null value is equivalent to calling\n          `operator[](\"this\").operator[](\"that\")` on that value, effectively\n          changing the null value to an object.\n\n    @param[in] ptr  a JSON value\n\n    @return reference to the JSON value pointed to by the JSON pointer\n\n    @complexity Linear in the length of the JSON pointer.\n\n    @throw parse_error.106   if an array index begins with '0'\n    @throw parse_error.109   if an array index was not a number\n    @throw out_of_range.404  if the JSON pointer can not be resolved\n    */\n    BasicJsonType& get_unchecked(BasicJsonType* ptr) const\n    {\n        for (const auto& reference_token : reference_tokens)\n        {\n            // convert null values to arrays or objects before continuing\n            if (ptr->is_null())\n            {\n                // check if reference token is a number\n                const bool nums =\n                    std::all_of(reference_token.begin(), reference_token.end(),\n                                [](const unsigned char x)\n                {\n                    return std::isdigit(x);\n                });\n\n                // change value to array for numbers or \"-\" or to object otherwise\n                *ptr = (nums || reference_token == \"-\")\n                       ? 
detail::value_t::array\n                       : detail::value_t::object;\n            }\n\n            switch (ptr->type())\n            {\n                case detail::value_t::object:\n                {\n                    // use unchecked object access\n                    ptr = &ptr->operator[](reference_token);\n                    break;\n                }\n\n                case detail::value_t::array:\n                {\n                    if (reference_token == \"-\")\n                    {\n                        // explicitly treat \"-\" as index beyond the end\n                        ptr = &ptr->operator[](ptr->m_value.array->size());\n                    }\n                    else\n                    {\n                        // convert array index to number; unchecked access\n                        ptr = &ptr->operator[](array_index(reference_token));\n                    }\n                    break;\n                }\n\n                default:\n                    JSON_THROW(detail::out_of_range::create(404, \"unresolved reference token '\" + reference_token + \"'\", *ptr));\n            }\n        }\n\n        return *ptr;\n    }\n\n    /*!\n    @throw parse_error.106   if an array index begins with '0'\n    @throw parse_error.109   if an array index was not a number\n    @throw out_of_range.402  if the array index '-' is used\n    @throw out_of_range.404  if the JSON pointer can not be resolved\n    */\n    BasicJsonType& get_checked(BasicJsonType* ptr) const\n    {\n        for (const auto& reference_token : reference_tokens)\n        {\n            switch (ptr->type())\n            {\n                case detail::value_t::object:\n                {\n                    // note: at performs range check\n                    ptr = &ptr->at(reference_token);\n                    break;\n                }\n\n                case detail::value_t::array:\n                {\n                    if (JSON_HEDLEY_UNLIKELY(reference_token == 
\"-\"))\n                    {\n                        // \"-\" always fails the range check\n                        JSON_THROW(detail::out_of_range::create(402,\n                                                                \"array index '-' (\" + std::to_string(ptr->m_value.array->size()) +\n                                                                \") is out of range\", *ptr));\n                    }\n\n                    // note: at performs range check\n                    ptr = &ptr->at(array_index(reference_token));\n                    break;\n                }\n\n                default:\n                    JSON_THROW(detail::out_of_range::create(404, \"unresolved reference token '\" + reference_token + \"'\", *ptr));\n            }\n        }\n\n        return *ptr;\n    }\n\n    /*!\n    @brief return a const reference to the pointed to value\n\n    @param[in] ptr  a JSON value\n\n    @return const reference to the JSON value pointed to by the JSON\n    pointer\n\n    @throw parse_error.106   if an array index begins with '0'\n    @throw parse_error.109   if an array index was not a number\n    @throw out_of_range.402  if the array index '-' is used\n    @throw out_of_range.404  if the JSON pointer can not be resolved\n    */\n    const BasicJsonType& get_unchecked(const BasicJsonType* ptr) const\n    {\n        for (const auto& reference_token : reference_tokens)\n        {\n            switch (ptr->type())\n            {\n                case detail::value_t::object:\n                {\n                    // use unchecked object access\n                    ptr = &ptr->operator[](reference_token);\n                    break;\n                }\n\n                case detail::value_t::array:\n                {\n                    if (JSON_HEDLEY_UNLIKELY(reference_token == \"-\"))\n                    {\n                        // \"-\" cannot be used for const access\n                        JSON_THROW(detail::out_of_range::create(402, 
\"array index '-' (\" + std::to_string(ptr->m_value.array->size()) + \") is out of range\", *ptr));\n                    }\n\n                    // use unchecked array access\n                    ptr = &ptr->operator[](array_index(reference_token));\n                    break;\n                }\n\n                default:\n                    JSON_THROW(detail::out_of_range::create(404, \"unresolved reference token '\" + reference_token + \"'\", *ptr));\n            }\n        }\n\n        return *ptr;\n    }\n\n    /*!\n    @throw parse_error.106   if an array index begins with '0'\n    @throw parse_error.109   if an array index was not a number\n    @throw out_of_range.402  if the array index '-' is used\n    @throw out_of_range.404  if the JSON pointer can not be resolved\n    */\n    const BasicJsonType& get_checked(const BasicJsonType* ptr) const\n    {\n        for (const auto& reference_token : reference_tokens)\n        {\n            switch (ptr->type())\n            {\n                case detail::value_t::object:\n                {\n                    // note: at performs range check\n                    ptr = &ptr->at(reference_token);\n                    break;\n                }\n\n                case detail::value_t::array:\n                {\n                    if (JSON_HEDLEY_UNLIKELY(reference_token == \"-\"))\n                    {\n                        // \"-\" always fails the range check\n                        JSON_THROW(detail::out_of_range::create(402,\n                                                                \"array index '-' (\" + std::to_string(ptr->m_value.array->size()) +\n                                                                \") is out of range\", *ptr));\n                    }\n\n                    // note: at performs range check\n                    ptr = &ptr->at(array_index(reference_token));\n                    break;\n                }\n\n                default:\n                    
JSON_THROW(detail::out_of_range::create(404, \"unresolved reference token '\" + reference_token + \"'\", *ptr));\n            }\n        }\n\n        return *ptr;\n    }\n\n    /*!\n    @throw parse_error.106   if an array index begins with '0'\n    @throw parse_error.109   if an array index was not a number\n    */\n    bool contains(const BasicJsonType* ptr) const\n    {\n        for (const auto& reference_token : reference_tokens)\n        {\n            switch (ptr->type())\n            {\n                case detail::value_t::object:\n                {\n                    if (!ptr->contains(reference_token))\n                    {\n                        // we did not find the key in the object\n                        return false;\n                    }\n\n                    ptr = &ptr->operator[](reference_token);\n                    break;\n                }\n\n                case detail::value_t::array:\n                {\n                    if (JSON_HEDLEY_UNLIKELY(reference_token == \"-\"))\n                    {\n                        // \"-\" always fails the range check\n                        return false;\n                    }\n                    if (JSON_HEDLEY_UNLIKELY(reference_token.size() == 1 && !(\"0\" <= reference_token && reference_token <= \"9\")))\n                    {\n                        // invalid char\n                        return false;\n                    }\n                    if (JSON_HEDLEY_UNLIKELY(reference_token.size() > 1))\n                    {\n                        if (JSON_HEDLEY_UNLIKELY(!('1' <= reference_token[0] && reference_token[0] <= '9')))\n                        {\n                            // first char should be between '1' and '9'\n                            return false;\n                        }\n                        for (std::size_t i = 1; i < reference_token.size(); i++)\n                        {\n                            if (JSON_HEDLEY_UNLIKELY(!('0' <= 
reference_token[i] && reference_token[i] <= '9')))\n                            {\n                                // other char should be between '0' and '9'\n                                return false;\n                            }\n                        }\n                    }\n\n                    const auto idx = array_index(reference_token);\n                    if (idx >= ptr->size())\n                    {\n                        // index out of range\n                        return false;\n                    }\n\n                    ptr = &ptr->operator[](idx);\n                    break;\n                }\n\n                default:\n                {\n                    // we do not expect primitive values if there is still a\n                    // reference token to process\n                    return false;\n                }\n            }\n        }\n\n        // no reference token left means we found the referenced value\n        return true;\n    }\n\n    /*!\n    @brief split the string input to reference tokens\n\n    @note This function is only called by the json_pointer constructor.\n          All exceptions below are documented there.\n\n    @throw parse_error.107  if the pointer is not empty and does not begin with '/'\n    @throw parse_error.108  if character '~' is not followed by '0' or '1'\n    */\n    static std::vector<std::string> split(const std::string& reference_string)\n    {\n        std::vector<std::string> result;\n\n        // special case: empty reference string -> no reference tokens\n        if (reference_string.empty())\n        {\n            return result;\n        }\n\n        // check if nonempty reference string begins with slash\n        if (JSON_HEDLEY_UNLIKELY(reference_string[0] != '/'))\n        {\n            JSON_THROW(detail::parse_error::create(107, 1, \"JSON pointer must be empty or begin with '/' - was: '\" + reference_string + \"'\", BasicJsonType()));\n        }\n\n        // extract the reference 
tokens:\n        // - slash: position of the last read slash (or end of string)\n        // - start: position after the previous slash\n        for (\n            // search for the first slash after the first character\n            std::size_t slash = reference_string.find_first_of('/', 1),\n            // set the beginning of the first reference token\n            start = 1;\n            // we can stop if start == 0 (if slash == std::string::npos)\n            start != 0;\n            // set the beginning of the next reference token\n            // (will eventually be 0 if slash == std::string::npos)\n            start = (slash == std::string::npos) ? 0 : slash + 1,\n            // find next slash\n            slash = reference_string.find_first_of('/', start))\n        {\n            // use the text between the beginning of the reference token\n            // (start) and the last slash (slash).\n            auto reference_token = reference_string.substr(start, slash - start);\n\n            // check reference tokens are properly escaped\n            for (std::size_t pos = reference_token.find_first_of('~');\n                    pos != std::string::npos;\n                    pos = reference_token.find_first_of('~', pos + 1))\n            {\n                JSON_ASSERT(reference_token[pos] == '~');\n\n                // ~ must be followed by 0 or 1\n                if (JSON_HEDLEY_UNLIKELY(pos == reference_token.size() - 1 ||\n                                         (reference_token[pos + 1] != '0' &&\n                                          reference_token[pos + 1] != '1')))\n                {\n                    JSON_THROW(detail::parse_error::create(108, 0, \"escape character '~' must be followed with '0' or '1'\", BasicJsonType()));\n                }\n            }\n\n            // finally, store the reference token\n            detail::unescape(reference_token);\n            result.push_back(reference_token);\n        }\n\n        return result;\n    
}\n\n  private:\n    /*!\n    @param[in] reference_string  the reference string to the current value\n    @param[in] value             the value to consider\n    @param[in,out] result        the result object to insert values to\n\n    @note Empty objects or arrays are flattened to `null`.\n    */\n    static void flatten(const std::string& reference_string,\n                        const BasicJsonType& value,\n                        BasicJsonType& result)\n    {\n        switch (value.type())\n        {\n            case detail::value_t::array:\n            {\n                if (value.m_value.array->empty())\n                {\n                    // flatten empty array as null\n                    result[reference_string] = nullptr;\n                }\n                else\n                {\n                    // iterate array and use index as reference string\n                    for (std::size_t i = 0; i < value.m_value.array->size(); ++i)\n                    {\n                        flatten(reference_string + \"/\" + std::to_string(i),\n                                value.m_value.array->operator[](i), result);\n                    }\n                }\n                break;\n            }\n\n            case detail::value_t::object:\n            {\n                if (value.m_value.object->empty())\n                {\n                    // flatten empty object as null\n                    result[reference_string] = nullptr;\n                }\n                else\n                {\n                    // iterate object and use keys as reference string\n                    for (const auto& element : *value.m_value.object)\n                    {\n                        flatten(reference_string + \"/\" + detail::escape(element.first), element.second, result);\n                    }\n                }\n                break;\n            }\n\n            default:\n            {\n                // add primitive value with its reference string\n       
         result[reference_string] = value;\n                break;\n            }\n        }\n    }\n\n    /*!\n    @param[in] value  flattened JSON\n\n    @return unflattened JSON\n\n    @throw parse_error.109 if array index is not a number\n    @throw type_error.314  if value is not an object\n    @throw type_error.315  if object values are not primitive\n    @throw type_error.313  if value cannot be unflattened\n    */\n    static BasicJsonType\n    unflatten(const BasicJsonType& value)\n    {\n        if (JSON_HEDLEY_UNLIKELY(!value.is_object()))\n        {\n            JSON_THROW(detail::type_error::create(314, \"only objects can be unflattened\", value));\n        }\n\n        BasicJsonType result;\n\n        // iterate the JSON object values\n        for (const auto& element : *value.m_value.object)\n        {\n            if (JSON_HEDLEY_UNLIKELY(!element.second.is_primitive()))\n            {\n                JSON_THROW(detail::type_error::create(315, \"values in object must be primitive\", element.second));\n            }\n\n            // assign value to reference pointed to by JSON pointer; Note that if\n            // the JSON pointer is \"\" (i.e., points to the whole value), function\n            // get_and_create returns a reference to result itself. 
 An assignment\n            // will then create a primitive value.\n            json_pointer(element.first).get_and_create(result) = element.second;\n        }\n\n        return result;\n    }\n\n    /*!\n    @brief compares two JSON pointers for equality\n\n    @param[in] lhs  JSON pointer to compare\n    @param[in] rhs  JSON pointer to compare\n    @return whether @a lhs is equal to @a rhs\n\n    @complexity Linear in the length of the JSON pointer\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n    */\n    friend bool operator==(json_pointer const& lhs,\n                           json_pointer const& rhs) noexcept\n    {\n        return lhs.reference_tokens == rhs.reference_tokens;\n    }\n\n    /*!\n    @brief compares two JSON pointers for inequality\n\n    @param[in] lhs  JSON pointer to compare\n    @param[in] rhs  JSON pointer to compare\n    @return whether @a lhs is not equal to @a rhs\n\n    @complexity Linear in the length of the JSON pointer\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n    */\n    friend bool operator!=(json_pointer const& lhs,\n                           json_pointer const& rhs) noexcept\n    {\n        return !(lhs == rhs);\n    }\n\n    /// the reference tokens\n    std::vector<std::string> reference_tokens;\n};\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/json_ref.hpp>\n\n\n#include <initializer_list>\n#include <utility>\n\n// #include <nlohmann/detail/meta/type_traits.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\ntemplate<typename BasicJsonType>\nclass json_ref\n{\n  public:\n    using value_type = BasicJsonType;\n\n    json_ref(value_type&& value)\n        : owned_value(std::move(value))\n    {}\n\n    json_ref(const value_type& value)\n        : value_ref(&value)\n    {}\n\n    json_ref(std::initializer_list<json_ref> init)\n        : owned_value(init)\n    {}\n\n    template <\n        class... 
Args,\n        enable_if_t<std::is_constructible<value_type, Args...>::value, int> = 0 >\n    json_ref(Args && ... args)\n        : owned_value(std::forward<Args>(args)...)\n    {}\n\n    // class should be movable only\n    json_ref(json_ref&&) noexcept = default;\n    json_ref(const json_ref&) = delete;\n    json_ref& operator=(const json_ref&) = delete;\n    json_ref& operator=(json_ref&&) = delete;\n    ~json_ref() = default;\n\n    value_type moved_or_copied() const\n    {\n        if (value_ref == nullptr)\n        {\n            return std::move(owned_value);\n        }\n        return *value_ref;\n    }\n\n    value_type const& operator*() const\n    {\n        return value_ref ? *value_ref : owned_value;\n    }\n\n    value_type const* operator->() const\n    {\n        return &** this;\n    }\n\n  private:\n    mutable value_type owned_value = nullptr;\n    value_type const* value_ref = nullptr;\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n// #include <nlohmann/detail/string_escape.hpp>\n\n// #include <nlohmann/detail/meta/cpp_future.hpp>\n\n// #include <nlohmann/detail/meta/type_traits.hpp>\n\n// #include <nlohmann/detail/output/binary_writer.hpp>\n\n\n#include <algorithm> // reverse\n#include <array> // array\n#include <cmath> // isnan, isinf\n#include <cstdint> // uint8_t, uint16_t, uint32_t, uint64_t\n#include <cstring> // memcpy\n#include <limits> // numeric_limits\n#include <string> // string\n#include <utility> // move\n\n// #include <nlohmann/detail/input/binary_reader.hpp>\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n// #include <nlohmann/detail/output/output_adapters.hpp>\n\n\n#include <algorithm> // copy\n#include <cstddef> // size_t\n#include <ios> // streamsize\n#include <iterator> // back_inserter\n#include <memory> // shared_ptr, make_shared\n#include <ostream> // basic_ostream\n#include <string> // basic_string\n#include <vector> // vector\n// #include 
<nlohmann/detail/macro_scope.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n/// abstract output adapter interface\ntemplate<typename CharType> struct output_adapter_protocol\n{\n    virtual void write_character(CharType c) = 0;\n    virtual void write_characters(const CharType* s, std::size_t length) = 0;\n    virtual ~output_adapter_protocol() = default;\n\n    output_adapter_protocol() = default;\n    output_adapter_protocol(const output_adapter_protocol&) = default;\n    output_adapter_protocol(output_adapter_protocol&&) noexcept = default;\n    output_adapter_protocol& operator=(const output_adapter_protocol&) = default;\n    output_adapter_protocol& operator=(output_adapter_protocol&&) noexcept = default;\n};\n\n/// a type to simplify interfaces\ntemplate<typename CharType>\nusing output_adapter_t = std::shared_ptr<output_adapter_protocol<CharType>>;\n\n/// output adapter for byte vectors\ntemplate<typename CharType>\nclass output_vector_adapter : public output_adapter_protocol<CharType>\n{\n  public:\n    explicit output_vector_adapter(std::vector<CharType>& vec) noexcept\n        : v(vec)\n    {}\n\n    void write_character(CharType c) override\n    {\n        v.push_back(c);\n    }\n\n    JSON_HEDLEY_NON_NULL(2)\n    void write_characters(const CharType* s, std::size_t length) override\n    {\n        std::copy(s, s + length, std::back_inserter(v));\n    }\n\n  private:\n    std::vector<CharType>& v;\n};\n\n/// output adapter for output streams\ntemplate<typename CharType>\nclass output_stream_adapter : public output_adapter_protocol<CharType>\n{\n  public:\n    explicit output_stream_adapter(std::basic_ostream<CharType>& s) noexcept\n        : stream(s)\n    {}\n\n    void write_character(CharType c) override\n    {\n        stream.put(c);\n    }\n\n    JSON_HEDLEY_NON_NULL(2)\n    void write_characters(const CharType* s, std::size_t length) override\n    {\n        stream.write(s, static_cast<std::streamsize>(length));\n    }\n\n  private:\n    
std::basic_ostream<CharType>& stream;\n};\n\n/// output adapter for basic_string\ntemplate<typename CharType, typename StringType = std::basic_string<CharType>>\nclass output_string_adapter : public output_adapter_protocol<CharType>\n{\n  public:\n    explicit output_string_adapter(StringType& s) noexcept\n        : str(s)\n    {}\n\n    void write_character(CharType c) override\n    {\n        str.push_back(c);\n    }\n\n    JSON_HEDLEY_NON_NULL(2)\n    void write_characters(const CharType* s, std::size_t length) override\n    {\n        str.append(s, length);\n    }\n\n  private:\n    StringType& str;\n};\n\ntemplate<typename CharType, typename StringType = std::basic_string<CharType>>\nclass output_adapter\n{\n  public:\n    output_adapter(std::vector<CharType>& vec)\n        : oa(std::make_shared<output_vector_adapter<CharType>>(vec)) {}\n\n    output_adapter(std::basic_ostream<CharType>& s)\n        : oa(std::make_shared<output_stream_adapter<CharType>>(s)) {}\n\n    output_adapter(StringType& s)\n        : oa(std::make_shared<output_string_adapter<CharType, StringType>>(s)) {}\n\n    operator output_adapter_t<CharType>()\n    {\n        return oa;\n    }\n\n  private:\n    output_adapter_t<CharType> oa = nullptr;\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n///////////////////\n// binary writer //\n///////////////////\n\n/*!\n@brief serialization to CBOR and MessagePack values\n*/\ntemplate<typename BasicJsonType, typename CharType>\nclass binary_writer\n{\n    using string_t = typename BasicJsonType::string_t;\n    using binary_t = typename BasicJsonType::binary_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n\n  public:\n    /*!\n    @brief create a binary writer\n\n    @param[in] adapter  output adapter to write to\n    */\n    explicit binary_writer(output_adapter_t<CharType> adapter) : oa(std::move(adapter))\n    {\n        JSON_ASSERT(oa);\n    }\n\n    /*!\n    @param[in] 
j  JSON value to serialize\n    @pre       j.type() == value_t::object\n    */\n    void write_bson(const BasicJsonType& j)\n    {\n        switch (j.type())\n        {\n            case value_t::object:\n            {\n                write_bson_object(*j.m_value.object);\n                break;\n            }\n\n            default:\n            {\n                JSON_THROW(type_error::create(317, \"to serialize to BSON, top-level type must be object, but is \" + std::string(j.type_name()), j));;\n            }\n        }\n    }\n\n    /*!\n    @param[in] j  JSON value to serialize\n    */\n    void write_cbor(const BasicJsonType& j)\n    {\n        switch (j.type())\n        {\n            case value_t::null:\n            {\n                oa->write_character(to_char_type(0xF6));\n                break;\n            }\n\n            case value_t::boolean:\n            {\n                oa->write_character(j.m_value.boolean\n                                    ? to_char_type(0xF5)\n                                    : to_char_type(0xF4));\n                break;\n            }\n\n            case value_t::number_integer:\n            {\n                if (j.m_value.number_integer >= 0)\n                {\n                    // CBOR does not differentiate between positive signed\n                    // integers and unsigned integers. 
Therefore, we used the\n                    // code from the value_t::number_unsigned case here.\n                    if (j.m_value.number_integer <= 0x17)\n                    {\n                        write_number(static_cast<std::uint8_t>(j.m_value.number_integer));\n                    }\n                    else if (j.m_value.number_integer <= (std::numeric_limits<std::uint8_t>::max)())\n                    {\n                        oa->write_character(to_char_type(0x18));\n                        write_number(static_cast<std::uint8_t>(j.m_value.number_integer));\n                    }\n                    else if (j.m_value.number_integer <= (std::numeric_limits<std::uint16_t>::max)())\n                    {\n                        oa->write_character(to_char_type(0x19));\n                        write_number(static_cast<std::uint16_t>(j.m_value.number_integer));\n                    }\n                    else if (j.m_value.number_integer <= (std::numeric_limits<std::uint32_t>::max)())\n                    {\n                        oa->write_character(to_char_type(0x1A));\n                        write_number(static_cast<std::uint32_t>(j.m_value.number_integer));\n                    }\n                    else\n                    {\n                        oa->write_character(to_char_type(0x1B));\n                        write_number(static_cast<std::uint64_t>(j.m_value.number_integer));\n                    }\n                }\n                else\n                {\n                    // The conversions below encode the sign in the first\n                    // byte, and the value is converted to a positive number.\n                    const auto positive_number = -1 - j.m_value.number_integer;\n                    if (j.m_value.number_integer >= -24)\n                    {\n                        write_number(static_cast<std::uint8_t>(0x20 + positive_number));\n                    }\n                    else if (positive_number <= 
(std::numeric_limits<std::uint8_t>::max)())\n                    {\n                        oa->write_character(to_char_type(0x38));\n                        write_number(static_cast<std::uint8_t>(positive_number));\n                    }\n                    else if (positive_number <= (std::numeric_limits<std::uint16_t>::max)())\n                    {\n                        oa->write_character(to_char_type(0x39));\n                        write_number(static_cast<std::uint16_t>(positive_number));\n                    }\n                    else if (positive_number <= (std::numeric_limits<std::uint32_t>::max)())\n                    {\n                        oa->write_character(to_char_type(0x3A));\n                        write_number(static_cast<std::uint32_t>(positive_number));\n                    }\n                    else\n                    {\n                        oa->write_character(to_char_type(0x3B));\n                        write_number(static_cast<std::uint64_t>(positive_number));\n                    }\n                }\n                break;\n            }\n\n            case value_t::number_unsigned:\n            {\n                if (j.m_value.number_unsigned <= 0x17)\n                {\n                    write_number(static_cast<std::uint8_t>(j.m_value.number_unsigned));\n                }\n                else if (j.m_value.number_unsigned <= (std::numeric_limits<std::uint8_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x18));\n                    write_number(static_cast<std::uint8_t>(j.m_value.number_unsigned));\n                }\n                else if (j.m_value.number_unsigned <= (std::numeric_limits<std::uint16_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x19));\n                    write_number(static_cast<std::uint16_t>(j.m_value.number_unsigned));\n                }\n                else if (j.m_value.number_unsigned <= 
(std::numeric_limits<std::uint32_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x1A));\n                    write_number(static_cast<std::uint32_t>(j.m_value.number_unsigned));\n                }\n                else\n                {\n                    oa->write_character(to_char_type(0x1B));\n                    write_number(static_cast<std::uint64_t>(j.m_value.number_unsigned));\n                }\n                break;\n            }\n\n            case value_t::number_float:\n            {\n                if (std::isnan(j.m_value.number_float))\n                {\n                    // NaN is 0xf97e00 in CBOR\n                    oa->write_character(to_char_type(0xF9));\n                    oa->write_character(to_char_type(0x7E));\n                    oa->write_character(to_char_type(0x00));\n                }\n                else if (std::isinf(j.m_value.number_float))\n                {\n                    // Infinity is 0xf97c00, -Infinity is 0xf9fc00\n                    oa->write_character(to_char_type(0xf9));\n                    oa->write_character(j.m_value.number_float > 0 ? 
to_char_type(0x7C) : to_char_type(0xFC));\n                    oa->write_character(to_char_type(0x00));\n                }\n                else\n                {\n                    write_compact_float(j.m_value.number_float, detail::input_format_t::cbor);\n                }\n                break;\n            }\n\n            case value_t::string:\n            {\n                // step 1: write control byte and the string length\n                const auto N = j.m_value.string->size();\n                if (N <= 0x17)\n                {\n                    write_number(static_cast<std::uint8_t>(0x60 + N));\n                }\n                else if (N <= (std::numeric_limits<std::uint8_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x78));\n                    write_number(static_cast<std::uint8_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint16_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x79));\n                    write_number(static_cast<std::uint16_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint32_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x7A));\n                    write_number(static_cast<std::uint32_t>(N));\n                }\n                // LCOV_EXCL_START\n                else if (N <= (std::numeric_limits<std::uint64_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x7B));\n                    write_number(static_cast<std::uint64_t>(N));\n                }\n                // LCOV_EXCL_STOP\n\n                // step 2: write the string\n                oa->write_characters(\n                    reinterpret_cast<const CharType*>(j.m_value.string->c_str()),\n                    j.m_value.string->size());\n                break;\n            }\n\n            case value_t::array:\n            {\n                
// step 1: write control byte and the array size\n                const auto N = j.m_value.array->size();\n                if (N <= 0x17)\n                {\n                    write_number(static_cast<std::uint8_t>(0x80 + N));\n                }\n                else if (N <= (std::numeric_limits<std::uint8_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x98));\n                    write_number(static_cast<std::uint8_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint16_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x99));\n                    write_number(static_cast<std::uint16_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint32_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x9A));\n                    write_number(static_cast<std::uint32_t>(N));\n                }\n                // LCOV_EXCL_START\n                else if (N <= (std::numeric_limits<std::uint64_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x9B));\n                    write_number(static_cast<std::uint64_t>(N));\n                }\n                // LCOV_EXCL_STOP\n\n                // step 2: write each element\n                for (const auto& el : *j.m_value.array)\n                {\n                    write_cbor(el);\n                }\n                break;\n            }\n\n            case value_t::binary:\n            {\n                if (j.m_value.binary->has_subtype())\n                {\n                    write_number(static_cast<std::uint8_t>(0xd8));\n                    write_number(j.m_value.binary->subtype());\n                }\n\n                // step 1: write control byte and the binary array size\n                const auto N = j.m_value.binary->size();\n                if (N <= 0x17)\n                {\n                    
write_number(static_cast<std::uint8_t>(0x40 + N));\n                }\n                else if (N <= (std::numeric_limits<std::uint8_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x58));\n                    write_number(static_cast<std::uint8_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint16_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x59));\n                    write_number(static_cast<std::uint16_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint32_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x5A));\n                    write_number(static_cast<std::uint32_t>(N));\n                }\n                // LCOV_EXCL_START\n                else if (N <= (std::numeric_limits<std::uint64_t>::max)())\n                {\n                    oa->write_character(to_char_type(0x5B));\n                    write_number(static_cast<std::uint64_t>(N));\n                }\n                // LCOV_EXCL_STOP\n\n                // step 2: write each element\n                oa->write_characters(\n                    reinterpret_cast<const CharType*>(j.m_value.binary->data()),\n                    N);\n\n                break;\n            }\n\n            case value_t::object:\n            {\n                // step 1: write control byte and the object size\n                const auto N = j.m_value.object->size();\n                if (N <= 0x17)\n                {\n                    write_number(static_cast<std::uint8_t>(0xA0 + N));\n                }\n                else if (N <= (std::numeric_limits<std::uint8_t>::max)())\n                {\n                    oa->write_character(to_char_type(0xB8));\n                    write_number(static_cast<std::uint8_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint16_t>::max)())\n                {\n  
                  oa->write_character(to_char_type(0xB9));\n                    write_number(static_cast<std::uint16_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint32_t>::max)())\n                {\n                    oa->write_character(to_char_type(0xBA));\n                    write_number(static_cast<std::uint32_t>(N));\n                }\n                // LCOV_EXCL_START\n                else if (N <= (std::numeric_limits<std::uint64_t>::max)())\n                {\n                    oa->write_character(to_char_type(0xBB));\n                    write_number(static_cast<std::uint64_t>(N));\n                }\n                // LCOV_EXCL_STOP\n\n                // step 2: write each element\n                for (const auto& el : *j.m_value.object)\n                {\n                    write_cbor(el.first);\n                    write_cbor(el.second);\n                }\n                break;\n            }\n\n            default:\n                break;\n        }\n    }\n\n    /*!\n    @param[in] j  JSON value to serialize\n    */\n    void write_msgpack(const BasicJsonType& j)\n    {\n        switch (j.type())\n        {\n            case value_t::null: // nil\n            {\n                oa->write_character(to_char_type(0xC0));\n                break;\n            }\n\n            case value_t::boolean: // true and false\n            {\n                oa->write_character(j.m_value.boolean\n                                    ? to_char_type(0xC3)\n                                    : to_char_type(0xC2));\n                break;\n            }\n\n            case value_t::number_integer:\n            {\n                if (j.m_value.number_integer >= 0)\n                {\n                    // MessagePack does not differentiate between positive\n                    // signed integers and unsigned integers. 
Therefore, we used\n                    // the code from the value_t::number_unsigned case here.\n                    if (j.m_value.number_unsigned < 128)\n                    {\n                        // positive fixnum\n                        write_number(static_cast<std::uint8_t>(j.m_value.number_integer));\n                    }\n                    else if (j.m_value.number_unsigned <= (std::numeric_limits<std::uint8_t>::max)())\n                    {\n                        // uint 8\n                        oa->write_character(to_char_type(0xCC));\n                        write_number(static_cast<std::uint8_t>(j.m_value.number_integer));\n                    }\n                    else if (j.m_value.number_unsigned <= (std::numeric_limits<std::uint16_t>::max)())\n                    {\n                        // uint 16\n                        oa->write_character(to_char_type(0xCD));\n                        write_number(static_cast<std::uint16_t>(j.m_value.number_integer));\n                    }\n                    else if (j.m_value.number_unsigned <= (std::numeric_limits<std::uint32_t>::max)())\n                    {\n                        // uint 32\n                        oa->write_character(to_char_type(0xCE));\n                        write_number(static_cast<std::uint32_t>(j.m_value.number_integer));\n                    }\n                    else if (j.m_value.number_unsigned <= (std::numeric_limits<std::uint64_t>::max)())\n                    {\n                        // uint 64\n                        oa->write_character(to_char_type(0xCF));\n                        write_number(static_cast<std::uint64_t>(j.m_value.number_integer));\n                    }\n                }\n                else\n                {\n                    if (j.m_value.number_integer >= -32)\n                    {\n                        // negative fixnum\n                        write_number(static_cast<std::int8_t>(j.m_value.number_integer));\n         
           }\n                    else if (j.m_value.number_integer >= (std::numeric_limits<std::int8_t>::min)() &&\n                             j.m_value.number_integer <= (std::numeric_limits<std::int8_t>::max)())\n                    {\n                        // int 8\n                        oa->write_character(to_char_type(0xD0));\n                        write_number(static_cast<std::int8_t>(j.m_value.number_integer));\n                    }\n                    else if (j.m_value.number_integer >= (std::numeric_limits<std::int16_t>::min)() &&\n                             j.m_value.number_integer <= (std::numeric_limits<std::int16_t>::max)())\n                    {\n                        // int 16\n                        oa->write_character(to_char_type(0xD1));\n                        write_number(static_cast<std::int16_t>(j.m_value.number_integer));\n                    }\n                    else if (j.m_value.number_integer >= (std::numeric_limits<std::int32_t>::min)() &&\n                             j.m_value.number_integer <= (std::numeric_limits<std::int32_t>::max)())\n                    {\n                        // int 32\n                        oa->write_character(to_char_type(0xD2));\n                        write_number(static_cast<std::int32_t>(j.m_value.number_integer));\n                    }\n                    else if (j.m_value.number_integer >= (std::numeric_limits<std::int64_t>::min)() &&\n                             j.m_value.number_integer <= (std::numeric_limits<std::int64_t>::max)())\n                    {\n                        // int 64\n                        oa->write_character(to_char_type(0xD3));\n                        write_number(static_cast<std::int64_t>(j.m_value.number_integer));\n                    }\n                }\n                break;\n            }\n\n            case value_t::number_unsigned:\n            {\n                if (j.m_value.number_unsigned < 128)\n                {\n                  
  // positive fixnum\n                    write_number(static_cast<std::uint8_t>(j.m_value.number_integer));\n                }\n                else if (j.m_value.number_unsigned <= (std::numeric_limits<std::uint8_t>::max)())\n                {\n                    // uint 8\n                    oa->write_character(to_char_type(0xCC));\n                    write_number(static_cast<std::uint8_t>(j.m_value.number_integer));\n                }\n                else if (j.m_value.number_unsigned <= (std::numeric_limits<std::uint16_t>::max)())\n                {\n                    // uint 16\n                    oa->write_character(to_char_type(0xCD));\n                    write_number(static_cast<std::uint16_t>(j.m_value.number_integer));\n                }\n                else if (j.m_value.number_unsigned <= (std::numeric_limits<std::uint32_t>::max)())\n                {\n                    // uint 32\n                    oa->write_character(to_char_type(0xCE));\n                    write_number(static_cast<std::uint32_t>(j.m_value.number_integer));\n                }\n                else if (j.m_value.number_unsigned <= (std::numeric_limits<std::uint64_t>::max)())\n                {\n                    // uint 64\n                    oa->write_character(to_char_type(0xCF));\n                    write_number(static_cast<std::uint64_t>(j.m_value.number_integer));\n                }\n                break;\n            }\n\n            case value_t::number_float:\n            {\n                write_compact_float(j.m_value.number_float, detail::input_format_t::msgpack);\n                break;\n            }\n\n            case value_t::string:\n            {\n                // step 1: write control byte and the string length\n                const auto N = j.m_value.string->size();\n                if (N <= 31)\n                {\n                    // fixstr\n                    write_number(static_cast<std::uint8_t>(0xA0 | N));\n                }\n         
       else if (N <= (std::numeric_limits<std::uint8_t>::max)())\n                {\n                    // str 8\n                    oa->write_character(to_char_type(0xD9));\n                    write_number(static_cast<std::uint8_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint16_t>::max)())\n                {\n                    // str 16\n                    oa->write_character(to_char_type(0xDA));\n                    write_number(static_cast<std::uint16_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint32_t>::max)())\n                {\n                    // str 32\n                    oa->write_character(to_char_type(0xDB));\n                    write_number(static_cast<std::uint32_t>(N));\n                }\n\n                // step 2: write the string\n                oa->write_characters(\n                    reinterpret_cast<const CharType*>(j.m_value.string->c_str()),\n                    j.m_value.string->size());\n                break;\n            }\n\n            case value_t::array:\n            {\n                // step 1: write control byte and the array size\n                const auto N = j.m_value.array->size();\n                if (N <= 15)\n                {\n                    // fixarray\n                    write_number(static_cast<std::uint8_t>(0x90 | N));\n                }\n                else if (N <= (std::numeric_limits<std::uint16_t>::max)())\n                {\n                    // array 16\n                    oa->write_character(to_char_type(0xDC));\n                    write_number(static_cast<std::uint16_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint32_t>::max)())\n                {\n                    // array 32\n                    oa->write_character(to_char_type(0xDD));\n                    write_number(static_cast<std::uint32_t>(N));\n                }\n\n                // step 2: write each 
element\n                for (const auto& el : *j.m_value.array)\n                {\n                    write_msgpack(el);\n                }\n                break;\n            }\n\n            case value_t::binary:\n            {\n                // step 0: determine if the binary type has a set subtype to\n                // determine whether or not to use the ext or fixext types\n                const bool use_ext = j.m_value.binary->has_subtype();\n\n                // step 1: write control byte and the byte string length\n                const auto N = j.m_value.binary->size();\n                if (N <= (std::numeric_limits<std::uint8_t>::max)())\n                {\n                    std::uint8_t output_type{};\n                    bool fixed = true;\n                    if (use_ext)\n                    {\n                        switch (N)\n                        {\n                            case 1:\n                                output_type = 0xD4; // fixext 1\n                                break;\n                            case 2:\n                                output_type = 0xD5; // fixext 2\n                                break;\n                            case 4:\n                                output_type = 0xD6; // fixext 4\n                                break;\n                            case 8:\n                                output_type = 0xD7; // fixext 8\n                                break;\n                            case 16:\n                                output_type = 0xD8; // fixext 16\n                                break;\n                            default:\n                                output_type = 0xC7; // ext 8\n                                fixed = false;\n                                break;\n                        }\n\n                    }\n                    else\n                    {\n                        output_type = 0xC4; // bin 8\n                        fixed = false;\n             
       }\n\n                    oa->write_character(to_char_type(output_type));\n                    if (!fixed)\n                    {\n                        write_number(static_cast<std::uint8_t>(N));\n                    }\n                }\n                else if (N <= (std::numeric_limits<std::uint16_t>::max)())\n                {\n                    std::uint8_t output_type = use_ext\n                                               ? 0xC8 // ext 16\n                                               : 0xC5; // bin 16\n\n                    oa->write_character(to_char_type(output_type));\n                    write_number(static_cast<std::uint16_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint32_t>::max)())\n                {\n                    std::uint8_t output_type = use_ext\n                                               ? 0xC9 // ext 32\n                                               : 0xC6; // bin 32\n\n                    oa->write_character(to_char_type(output_type));\n                    write_number(static_cast<std::uint32_t>(N));\n                }\n\n                // step 1.5: if this is an ext type, write the subtype\n                if (use_ext)\n                {\n                    write_number(static_cast<std::int8_t>(j.m_value.binary->subtype()));\n                }\n\n                // step 2: write the byte string\n                oa->write_characters(\n                    reinterpret_cast<const CharType*>(j.m_value.binary->data()),\n                    N);\n\n                break;\n            }\n\n            case value_t::object:\n            {\n                // step 1: write control byte and the object size\n                const auto N = j.m_value.object->size();\n                if (N <= 15)\n                {\n                    // fixmap\n                    write_number(static_cast<std::uint8_t>(0x80 | (N & 0xF)));\n                }\n                else if (N <= 
(std::numeric_limits<std::uint16_t>::max)())\n                {\n                    // map 16\n                    oa->write_character(to_char_type(0xDE));\n                    write_number(static_cast<std::uint16_t>(N));\n                }\n                else if (N <= (std::numeric_limits<std::uint32_t>::max)())\n                {\n                    // map 32\n                    oa->write_character(to_char_type(0xDF));\n                    write_number(static_cast<std::uint32_t>(N));\n                }\n\n                // step 2: write each element\n                for (const auto& el : *j.m_value.object)\n                {\n                    write_msgpack(el.first);\n                    write_msgpack(el.second);\n                }\n                break;\n            }\n\n            default:\n                break;\n        }\n    }\n\n    /*!\n    @param[in] j  JSON value to serialize\n    @param[in] use_count   whether to use '#' prefixes (optimized format)\n    @param[in] use_type    whether to use '$' prefixes (optimized format)\n    @param[in] add_prefix  whether prefixes need to be used for this value\n    */\n    void write_ubjson(const BasicJsonType& j, const bool use_count,\n                      const bool use_type, const bool add_prefix = true)\n    {\n        switch (j.type())\n        {\n            case value_t::null:\n            {\n                if (add_prefix)\n                {\n                    oa->write_character(to_char_type('Z'));\n                }\n                break;\n            }\n\n            case value_t::boolean:\n            {\n                if (add_prefix)\n                {\n                    oa->write_character(j.m_value.boolean\n                                        ? 
to_char_type('T')\n                                        : to_char_type('F'));\n                }\n                break;\n            }\n\n            case value_t::number_integer:\n            {\n                write_number_with_ubjson_prefix(j.m_value.number_integer, add_prefix);\n                break;\n            }\n\n            case value_t::number_unsigned:\n            {\n                write_number_with_ubjson_prefix(j.m_value.number_unsigned, add_prefix);\n                break;\n            }\n\n            case value_t::number_float:\n            {\n                write_number_with_ubjson_prefix(j.m_value.number_float, add_prefix);\n                break;\n            }\n\n            case value_t::string:\n            {\n                if (add_prefix)\n                {\n                    oa->write_character(to_char_type('S'));\n                }\n                write_number_with_ubjson_prefix(j.m_value.string->size(), true);\n                oa->write_characters(\n                    reinterpret_cast<const CharType*>(j.m_value.string->c_str()),\n                    j.m_value.string->size());\n                break;\n            }\n\n            case value_t::array:\n            {\n                if (add_prefix)\n                {\n                    oa->write_character(to_char_type('['));\n                }\n\n                bool prefix_required = true;\n                if (use_type && !j.m_value.array->empty())\n                {\n                    JSON_ASSERT(use_count);\n                    const CharType first_prefix = ubjson_prefix(j.front());\n                    const bool same_prefix = std::all_of(j.begin() + 1, j.end(),\n                                                         [this, first_prefix](const BasicJsonType & v)\n                    {\n                        return ubjson_prefix(v) == first_prefix;\n                    });\n\n                    if (same_prefix)\n                    {\n                        
prefix_required = false;\n                        oa->write_character(to_char_type('$'));\n                        oa->write_character(first_prefix);\n                    }\n                }\n\n                if (use_count)\n                {\n                    oa->write_character(to_char_type('#'));\n                    write_number_with_ubjson_prefix(j.m_value.array->size(), true);\n                }\n\n                for (const auto& el : *j.m_value.array)\n                {\n                    write_ubjson(el, use_count, use_type, prefix_required);\n                }\n\n                if (!use_count)\n                {\n                    oa->write_character(to_char_type(']'));\n                }\n\n                break;\n            }\n\n            case value_t::binary:\n            {\n                if (add_prefix)\n                {\n                    oa->write_character(to_char_type('['));\n                }\n\n                if (use_type && !j.m_value.binary->empty())\n                {\n                    JSON_ASSERT(use_count);\n                    oa->write_character(to_char_type('$'));\n                    oa->write_character('U');\n                }\n\n                if (use_count)\n                {\n                    oa->write_character(to_char_type('#'));\n                    write_number_with_ubjson_prefix(j.m_value.binary->size(), true);\n                }\n\n                if (use_type)\n                {\n                    oa->write_characters(\n                        reinterpret_cast<const CharType*>(j.m_value.binary->data()),\n                        j.m_value.binary->size());\n                }\n                else\n                {\n                    for (size_t i = 0; i < j.m_value.binary->size(); ++i)\n                    {\n                        oa->write_character(to_char_type('U'));\n                        oa->write_character(j.m_value.binary->data()[i]);\n                    }\n                }\n\n        
        if (!use_count)\n                {\n                    oa->write_character(to_char_type(']'));\n                }\n\n                break;\n            }\n\n            case value_t::object:\n            {\n                if (add_prefix)\n                {\n                    oa->write_character(to_char_type('{'));\n                }\n\n                bool prefix_required = true;\n                if (use_type && !j.m_value.object->empty())\n                {\n                    JSON_ASSERT(use_count);\n                    const CharType first_prefix = ubjson_prefix(j.front());\n                    const bool same_prefix = std::all_of(j.begin(), j.end(),\n                                                         [this, first_prefix](const BasicJsonType & v)\n                    {\n                        return ubjson_prefix(v) == first_prefix;\n                    });\n\n                    if (same_prefix)\n                    {\n                        prefix_required = false;\n                        oa->write_character(to_char_type('$'));\n                        oa->write_character(first_prefix);\n                    }\n                }\n\n                if (use_count)\n                {\n                    oa->write_character(to_char_type('#'));\n                    write_number_with_ubjson_prefix(j.m_value.object->size(), true);\n                }\n\n                for (const auto& el : *j.m_value.object)\n                {\n                    write_number_with_ubjson_prefix(el.first.size(), true);\n                    oa->write_characters(\n                        reinterpret_cast<const CharType*>(el.first.c_str()),\n                        el.first.size());\n                    write_ubjson(el.second, use_count, use_type, prefix_required);\n                }\n\n                if (!use_count)\n                {\n                    oa->write_character(to_char_type('}'));\n                }\n\n                break;\n            }\n\n      
      default:\n                break;\n        }\n    }\n\n  private:\n    //////////\n    // BSON //\n    //////////\n\n    /*!\n    @return The size of a BSON document entry header, including the id marker\n            and the entry name size (and its null-terminator).\n    */\n    static std::size_t calc_bson_entry_header_size(const string_t& name, const BasicJsonType& j)\n    {\n        const auto it = name.find(static_cast<typename string_t::value_type>(0));\n        if (JSON_HEDLEY_UNLIKELY(it != BasicJsonType::string_t::npos))\n        {\n            JSON_THROW(out_of_range::create(409, \"BSON key cannot contain code point U+0000 (at byte \" + std::to_string(it) + \")\", j));\n        }\n\n        return /*id*/ 1ul + name.size() + /*zero-terminator*/1u;\n    }\n\n    /*!\n    @brief Writes the given @a element_type and @a name to the output adapter\n    */\n    void write_bson_entry_header(const string_t& name,\n                                 const std::uint8_t element_type)\n    {\n        oa->write_character(to_char_type(element_type)); // boolean\n        oa->write_characters(\n            reinterpret_cast<const CharType*>(name.c_str()),\n            name.size() + 1u);\n    }\n\n    /*!\n    @brief Writes a BSON element with key @a name and boolean value @a value\n    */\n    void write_bson_boolean(const string_t& name,\n                            const bool value)\n    {\n        write_bson_entry_header(name, 0x08);\n        oa->write_character(value ? 
to_char_type(0x01) : to_char_type(0x00));\n    }\n\n    /*!\n    @brief Writes a BSON element with key @a name and double value @a value\n    */\n    void write_bson_double(const string_t& name,\n                           const double value)\n    {\n        write_bson_entry_header(name, 0x01);\n        write_number<double, true>(value);\n    }\n\n    /*!\n    @return The size of the BSON-encoded string in @a value\n    */\n    static std::size_t calc_bson_string_size(const string_t& value)\n    {\n        return sizeof(std::int32_t) + value.size() + 1ul;\n    }\n\n    /*!\n    @brief Writes a BSON element with key @a name and string value @a value\n    */\n    void write_bson_string(const string_t& name,\n                           const string_t& value)\n    {\n        write_bson_entry_header(name, 0x02);\n\n        write_number<std::int32_t, true>(static_cast<std::int32_t>(value.size() + 1ul));\n        oa->write_characters(\n            reinterpret_cast<const CharType*>(value.c_str()),\n            value.size() + 1);\n    }\n\n    /*!\n    @brief Writes a BSON element with key @a name and null value\n    */\n    void write_bson_null(const string_t& name)\n    {\n        write_bson_entry_header(name, 0x0A);\n    }\n\n    /*!\n    @return The size of the BSON-encoded integer @a value\n    */\n    static std::size_t calc_bson_integer_size(const std::int64_t value)\n    {\n        return (std::numeric_limits<std::int32_t>::min)() <= value && value <= (std::numeric_limits<std::int32_t>::max)()\n               ? 
sizeof(std::int32_t)\n               : sizeof(std::int64_t);\n    }\n\n    /*!\n    @brief Writes a BSON element with key @a name and integer @a value\n    */\n    void write_bson_integer(const string_t& name,\n                            const std::int64_t value)\n    {\n        if ((std::numeric_limits<std::int32_t>::min)() <= value && value <= (std::numeric_limits<std::int32_t>::max)())\n        {\n            write_bson_entry_header(name, 0x10); // int32\n            write_number<std::int32_t, true>(static_cast<std::int32_t>(value));\n        }\n        else\n        {\n            write_bson_entry_header(name, 0x12); // int64\n            write_number<std::int64_t, true>(static_cast<std::int64_t>(value));\n        }\n    }\n\n    /*!\n    @return The size of the BSON-encoded unsigned integer in @a j\n    */\n    static constexpr std::size_t calc_bson_unsigned_size(const std::uint64_t value) noexcept\n    {\n        return (value <= static_cast<std::uint64_t>((std::numeric_limits<std::int32_t>::max)()))\n               ? 
sizeof(std::int32_t)\n               : sizeof(std::int64_t);\n    }\n\n    /*!\n    @brief Writes a BSON element with key @a name and unsigned @a value\n    */\n    void write_bson_unsigned(const string_t& name,\n                             const BasicJsonType& j)\n    {\n        if (j.m_value.number_unsigned <= static_cast<std::uint64_t>((std::numeric_limits<std::int32_t>::max)()))\n        {\n            write_bson_entry_header(name, 0x10 /* int32 */);\n            write_number<std::int32_t, true>(static_cast<std::int32_t>(j.m_value.number_unsigned));\n        }\n        else if (j.m_value.number_unsigned <= static_cast<std::uint64_t>((std::numeric_limits<std::int64_t>::max)()))\n        {\n            write_bson_entry_header(name, 0x12 /* int64 */);\n            write_number<std::int64_t, true>(static_cast<std::int64_t>(j.m_value.number_unsigned));\n        }\n        else\n        {\n            JSON_THROW(out_of_range::create(407, \"integer number \" + std::to_string(j.m_value.number_unsigned) + \" cannot be represented by BSON as it does not fit int64\", j));\n        }\n    }\n\n    /*!\n    @brief Writes a BSON element with key @a name and object @a value\n    */\n    void write_bson_object_entry(const string_t& name,\n                                 const typename BasicJsonType::object_t& value)\n    {\n        write_bson_entry_header(name, 0x03); // object\n        write_bson_object(value);\n    }\n\n    /*!\n    @return The size of the BSON-encoded array @a value\n    */\n    static std::size_t calc_bson_array_size(const typename BasicJsonType::array_t& value)\n    {\n        std::size_t array_index = 0ul;\n\n        const std::size_t embedded_document_size = std::accumulate(std::begin(value), std::end(value), std::size_t(0), [&array_index](std::size_t result, const typename BasicJsonType::array_t::value_type & el)\n        {\n            return result + calc_bson_element_size(std::to_string(array_index++), el);\n        });\n\n        return 
sizeof(std::int32_t) + embedded_document_size + 1ul;\n    }\n\n    /*!\n    @return The size of the BSON-encoded binary array @a value\n    */\n    static std::size_t calc_bson_binary_size(const typename BasicJsonType::binary_t& value)\n    {\n        return sizeof(std::int32_t) + value.size() + 1ul;\n    }\n\n    /*!\n    @brief Writes a BSON element with key @a name and array @a value\n    */\n    void write_bson_array(const string_t& name,\n                          const typename BasicJsonType::array_t& value)\n    {\n        write_bson_entry_header(name, 0x04); // array\n        write_number<std::int32_t, true>(static_cast<std::int32_t>(calc_bson_array_size(value)));\n\n        std::size_t array_index = 0ul;\n\n        for (const auto& el : value)\n        {\n            write_bson_element(std::to_string(array_index++), el);\n        }\n\n        oa->write_character(to_char_type(0x00));\n    }\n\n    /*!\n    @brief Writes a BSON element with key @a name and binary value @a value\n    */\n    void write_bson_binary(const string_t& name,\n                           const binary_t& value)\n    {\n        write_bson_entry_header(name, 0x05);\n\n        write_number<std::int32_t, true>(static_cast<std::int32_t>(value.size()));\n        write_number(value.has_subtype() ? 
value.subtype() : std::uint8_t(0x00));\n\n        oa->write_characters(reinterpret_cast<const CharType*>(value.data()), value.size());\n    }\n\n    /*!\n    @brief Calculates the size necessary to serialize the JSON value @a j with its @a name\n    @return The calculated size for the BSON document entry for @a j with the given @a name.\n    */\n    static std::size_t calc_bson_element_size(const string_t& name,\n            const BasicJsonType& j)\n    {\n        const auto header_size = calc_bson_entry_header_size(name, j);\n        switch (j.type())\n        {\n            case value_t::object:\n                return header_size + calc_bson_object_size(*j.m_value.object);\n\n            case value_t::array:\n                return header_size + calc_bson_array_size(*j.m_value.array);\n\n            case value_t::binary:\n                return header_size + calc_bson_binary_size(*j.m_value.binary);\n\n            case value_t::boolean:\n                return header_size + 1ul;\n\n            case value_t::number_float:\n                return header_size + 8ul;\n\n            case value_t::number_integer:\n                return header_size + calc_bson_integer_size(j.m_value.number_integer);\n\n            case value_t::number_unsigned:\n                return header_size + calc_bson_unsigned_size(j.m_value.number_unsigned);\n\n            case value_t::string:\n                return header_size + calc_bson_string_size(*j.m_value.string);\n\n            case value_t::null:\n                return header_size + 0ul;\n\n            // LCOV_EXCL_START\n            default:\n                JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert)\n                return 0ul;\n                // LCOV_EXCL_STOP\n        }\n    }\n\n    /*!\n    @brief Serializes the JSON value @a j to BSON and associates it with the\n           key @a name.\n    @param name The name to associate with the JSON entity @a j within the\n                current 
BSON document\n    */\n    void write_bson_element(const string_t& name,\n                            const BasicJsonType& j)\n    {\n        switch (j.type())\n        {\n            case value_t::object:\n                return write_bson_object_entry(name, *j.m_value.object);\n\n            case value_t::array:\n                return write_bson_array(name, *j.m_value.array);\n\n            case value_t::binary:\n                return write_bson_binary(name, *j.m_value.binary);\n\n            case value_t::boolean:\n                return write_bson_boolean(name, j.m_value.boolean);\n\n            case value_t::number_float:\n                return write_bson_double(name, j.m_value.number_float);\n\n            case value_t::number_integer:\n                return write_bson_integer(name, j.m_value.number_integer);\n\n            case value_t::number_unsigned:\n                return write_bson_unsigned(name, j);\n\n            case value_t::string:\n                return write_bson_string(name, *j.m_value.string);\n\n            case value_t::null:\n                return write_bson_null(name);\n\n            // LCOV_EXCL_START\n            default:\n                JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert)\n                return;\n                // LCOV_EXCL_STOP\n        }\n    }\n\n    /*!\n    @brief Calculates the size of the BSON serialization of the given\n           JSON-object @a j.\n    @param[in] value  JSON value to serialize\n    @pre       value.type() == value_t::object\n    */\n    static std::size_t calc_bson_object_size(const typename BasicJsonType::object_t& value)\n    {\n        std::size_t document_size = std::accumulate(value.begin(), value.end(), std::size_t(0),\n                                    [](size_t result, const typename BasicJsonType::object_t::value_type & el)\n        {\n            return result += calc_bson_element_size(el.first, el.second);\n        });\n\n        return 
sizeof(std::int32_t) + document_size + 1ul;\n    }\n\n    /*!\n    @param[in] value  JSON value to serialize\n    @pre       value.type() == value_t::object\n    */\n    void write_bson_object(const typename BasicJsonType::object_t& value)\n    {\n        write_number<std::int32_t, true>(static_cast<std::int32_t>(calc_bson_object_size(value)));\n\n        for (const auto& el : value)\n        {\n            write_bson_element(el.first, el.second);\n        }\n\n        oa->write_character(to_char_type(0x00));\n    }\n\n    //////////\n    // CBOR //\n    //////////\n\n    static constexpr CharType get_cbor_float_prefix(float /*unused*/)\n    {\n        return to_char_type(0xFA);  // Single-Precision Float\n    }\n\n    static constexpr CharType get_cbor_float_prefix(double /*unused*/)\n    {\n        return to_char_type(0xFB);  // Double-Precision Float\n    }\n\n    /////////////\n    // MsgPack //\n    /////////////\n\n    static constexpr CharType get_msgpack_float_prefix(float /*unused*/)\n    {\n        return to_char_type(0xCA);  // float 32\n    }\n\n    static constexpr CharType get_msgpack_float_prefix(double /*unused*/)\n    {\n        return to_char_type(0xCB);  // float 64\n    }\n\n    ////////////\n    // UBJSON //\n    ////////////\n\n    // UBJSON: write number (floating point)\n    template<typename NumberType, typename std::enable_if<\n                 std::is_floating_point<NumberType>::value, int>::type = 0>\n    void write_number_with_ubjson_prefix(const NumberType n,\n                                         const bool add_prefix)\n    {\n        if (add_prefix)\n        {\n            oa->write_character(get_ubjson_float_prefix(n));\n        }\n        write_number(n);\n    }\n\n    // UBJSON: write number (unsigned integer)\n    template<typename NumberType, typename std::enable_if<\n                 std::is_unsigned<NumberType>::value, int>::type = 0>\n    void write_number_with_ubjson_prefix(const NumberType n,\n                            
             const bool add_prefix)\n    {\n        if (n <= static_cast<std::uint64_t>((std::numeric_limits<std::int8_t>::max)()))\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('i'));  // int8\n            }\n            write_number(static_cast<std::uint8_t>(n));\n        }\n        else if (n <= (std::numeric_limits<std::uint8_t>::max)())\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('U'));  // uint8\n            }\n            write_number(static_cast<std::uint8_t>(n));\n        }\n        else if (n <= static_cast<std::uint64_t>((std::numeric_limits<std::int16_t>::max)()))\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('I'));  // int16\n            }\n            write_number(static_cast<std::int16_t>(n));\n        }\n        else if (n <= static_cast<std::uint64_t>((std::numeric_limits<std::int32_t>::max)()))\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('l'));  // int32\n            }\n            write_number(static_cast<std::int32_t>(n));\n        }\n        else if (n <= static_cast<std::uint64_t>((std::numeric_limits<std::int64_t>::max)()))\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('L'));  // int64\n            }\n            write_number(static_cast<std::int64_t>(n));\n        }\n        else\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('H'));  // high-precision number\n            }\n\n            const auto number = BasicJsonType(n).dump();\n            write_number_with_ubjson_prefix(number.size(), true);\n            for (std::size_t i = 0; i < number.size(); ++i)\n            {\n                oa->write_character(to_char_type(static_cast<std::uint8_t>(number[i])));\n            }\n        }\n    
}\n\n    // UBJSON: write number (signed integer)\n    template < typename NumberType, typename std::enable_if <\n                   std::is_signed<NumberType>::value&&\n                   !std::is_floating_point<NumberType>::value, int >::type = 0 >\n    void write_number_with_ubjson_prefix(const NumberType n,\n                                         const bool add_prefix)\n    {\n        if ((std::numeric_limits<std::int8_t>::min)() <= n && n <= (std::numeric_limits<std::int8_t>::max)())\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('i'));  // int8\n            }\n            write_number(static_cast<std::int8_t>(n));\n        }\n        else if (static_cast<std::int64_t>((std::numeric_limits<std::uint8_t>::min)()) <= n && n <= static_cast<std::int64_t>((std::numeric_limits<std::uint8_t>::max)()))\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('U'));  // uint8\n            }\n            write_number(static_cast<std::uint8_t>(n));\n        }\n        else if ((std::numeric_limits<std::int16_t>::min)() <= n && n <= (std::numeric_limits<std::int16_t>::max)())\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('I'));  // int16\n            }\n            write_number(static_cast<std::int16_t>(n));\n        }\n        else if ((std::numeric_limits<std::int32_t>::min)() <= n && n <= (std::numeric_limits<std::int32_t>::max)())\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('l'));  // int32\n            }\n            write_number(static_cast<std::int32_t>(n));\n        }\n        else if ((std::numeric_limits<std::int64_t>::min)() <= n && n <= (std::numeric_limits<std::int64_t>::max)())\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('L'));  // int64\n            }\n            
write_number(static_cast<std::int64_t>(n));\n        }\n        // LCOV_EXCL_START\n        else\n        {\n            if (add_prefix)\n            {\n                oa->write_character(to_char_type('H'));  // high-precision number\n            }\n\n            const auto number = BasicJsonType(n).dump();\n            write_number_with_ubjson_prefix(number.size(), true);\n            for (std::size_t i = 0; i < number.size(); ++i)\n            {\n                oa->write_character(to_char_type(static_cast<std::uint8_t>(number[i])));\n            }\n        }\n        // LCOV_EXCL_STOP\n    }\n\n    /*!\n    @brief determine the type prefix of container values\n    */\n    CharType ubjson_prefix(const BasicJsonType& j) const noexcept\n    {\n        switch (j.type())\n        {\n            case value_t::null:\n                return 'Z';\n\n            case value_t::boolean:\n                return j.m_value.boolean ? 'T' : 'F';\n\n            case value_t::number_integer:\n            {\n                if ((std::numeric_limits<std::int8_t>::min)() <= j.m_value.number_integer && j.m_value.number_integer <= (std::numeric_limits<std::int8_t>::max)())\n                {\n                    return 'i';\n                }\n                if ((std::numeric_limits<std::uint8_t>::min)() <= j.m_value.number_integer && j.m_value.number_integer <= (std::numeric_limits<std::uint8_t>::max)())\n                {\n                    return 'U';\n                }\n                if ((std::numeric_limits<std::int16_t>::min)() <= j.m_value.number_integer && j.m_value.number_integer <= (std::numeric_limits<std::int16_t>::max)())\n                {\n                    return 'I';\n                }\n                if ((std::numeric_limits<std::int32_t>::min)() <= j.m_value.number_integer && j.m_value.number_integer <= (std::numeric_limits<std::int32_t>::max)())\n                {\n                    return 'l';\n                }\n                if 
((std::numeric_limits<std::int64_t>::min)() <= j.m_value.number_integer && j.m_value.number_integer <= (std::numeric_limits<std::int64_t>::max)())\n                {\n                    return 'L';\n                }\n                // anything else is treated as high-precision number\n                return 'H'; // LCOV_EXCL_LINE\n            }\n\n            case value_t::number_unsigned:\n            {\n                if (j.m_value.number_unsigned <= static_cast<std::uint64_t>((std::numeric_limits<std::int8_t>::max)()))\n                {\n                    return 'i';\n                }\n                if (j.m_value.number_unsigned <= static_cast<std::uint64_t>((std::numeric_limits<std::uint8_t>::max)()))\n                {\n                    return 'U';\n                }\n                if (j.m_value.number_unsigned <= static_cast<std::uint64_t>((std::numeric_limits<std::int16_t>::max)()))\n                {\n                    return 'I';\n                }\n                if (j.m_value.number_unsigned <= static_cast<std::uint64_t>((std::numeric_limits<std::int32_t>::max)()))\n                {\n                    return 'l';\n                }\n                if (j.m_value.number_unsigned <= static_cast<std::uint64_t>((std::numeric_limits<std::int64_t>::max)()))\n                {\n                    return 'L';\n                }\n                // anything else is treated as high-precision number\n                return 'H'; // LCOV_EXCL_LINE\n            }\n\n            case value_t::number_float:\n                return get_ubjson_float_prefix(j.m_value.number_float);\n\n            case value_t::string:\n                return 'S';\n\n            case value_t::array: // fallthrough\n            case value_t::binary:\n                return '[';\n\n            case value_t::object:\n                return '{';\n\n            default:  // discarded values\n                return 'N';\n        }\n    }\n\n    static constexpr CharType 
get_ubjson_float_prefix(float /*unused*/)\n    {\n        return 'd';  // float 32\n    }\n\n    static constexpr CharType get_ubjson_float_prefix(double /*unused*/)\n    {\n        return 'D';  // float 64\n    }\n\n    ///////////////////////\n    // Utility functions //\n    ///////////////////////\n\n    /*\n    @brief write a number to the output\n    @param[in] n number of type @a NumberType\n    @tparam NumberType the type of the number\n    @tparam OutputIsLittleEndian Set to true if output data is\n                                 required to be little endian\n\n    @note This function needs to respect the system's endianness, because bytes\n          in CBOR, MessagePack, and UBJSON are stored in network order (big\n          endian) and therefore need reordering on little endian systems.\n    */\n    template<typename NumberType, bool OutputIsLittleEndian = false>\n    void write_number(const NumberType n)\n    {\n        // step 1: copy the number into an array of sizeof(NumberType) bytes\n        std::array<CharType, sizeof(NumberType)> vec{};\n        std::memcpy(vec.data(), &n, sizeof(NumberType));\n\n        // step 2: write array to output (with possible reordering)\n        if (is_little_endian != OutputIsLittleEndian)\n        {\n            // reverse byte order prior to conversion if necessary\n            std::reverse(vec.begin(), vec.end());\n        }\n\n        oa->write_characters(vec.data(), sizeof(NumberType));\n    }\n\n    void write_compact_float(const number_float_t n, detail::input_format_t format)\n    {\n        if (static_cast<double>(n) >= static_cast<double>(std::numeric_limits<float>::lowest()) &&\n                static_cast<double>(n) <= static_cast<double>((std::numeric_limits<float>::max)()) &&\n                static_cast<double>(static_cast<float>(n)) == static_cast<double>(n))\n        {\n            oa->write_character(format == detail::input_format_t::cbor\n                                ? 
get_cbor_float_prefix(static_cast<float>(n))\n                                : get_msgpack_float_prefix(static_cast<float>(n)));\n            write_number(static_cast<float>(n));\n        }\n        else\n        {\n            oa->write_character(format == detail::input_format_t::cbor\n                                ? get_cbor_float_prefix(n)\n                                : get_msgpack_float_prefix(n));\n            write_number(n);\n        }\n    }\n\n  public:\n    // The following to_char_type functions implement the conversion\n    // between uint8_t and CharType. In case CharType is not unsigned,\n    // such a conversion is required to allow values greater than 127.\n    // See <https://github.com/nlohmann/json/issues/1286> for a discussion.\n    template < typename C = CharType,\n               enable_if_t < std::is_signed<C>::value && std::is_signed<char>::value > * = nullptr >\n    static constexpr CharType to_char_type(std::uint8_t x) noexcept\n    {\n        return *reinterpret_cast<char*>(&x);\n    }\n\n    template < typename C = CharType,\n               enable_if_t < std::is_signed<C>::value && std::is_unsigned<char>::value > * = nullptr >\n    static CharType to_char_type(std::uint8_t x) noexcept\n    {\n        static_assert(sizeof(std::uint8_t) == sizeof(CharType), \"size of CharType must be equal to std::uint8_t\");\n        static_assert(std::is_trivial<CharType>::value, \"CharType must be trivial\");\n        CharType result;\n        std::memcpy(&result, &x, sizeof(x));\n        return result;\n    }\n\n    template<typename C = CharType,\n             enable_if_t<std::is_unsigned<C>::value>* = nullptr>\n    static constexpr CharType to_char_type(std::uint8_t x) noexcept\n    {\n        return x;\n    }\n\n    template < typename InputCharType, typename C = CharType,\n               enable_if_t <\n                   std::is_signed<C>::value &&\n                   std::is_signed<char>::value &&\n                   std::is_same<char, 
typename std::remove_cv<InputCharType>::type>::value\n                   > * = nullptr >\n    static constexpr CharType to_char_type(InputCharType x) noexcept\n    {\n        return x;\n    }\n\n  private:\n    /// whether we can assume little endianness\n    const bool is_little_endian = little_endianess();\n\n    /// the output\n    output_adapter_t<CharType> oa = nullptr;\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/output/output_adapters.hpp>\n\n// #include <nlohmann/detail/output/serializer.hpp>\n\n\n#include <algorithm> // reverse, remove, fill, find, none_of\n#include <array> // array\n#include <clocale> // localeconv, lconv\n#include <cmath> // labs, isfinite, isnan, signbit\n#include <cstddef> // size_t, ptrdiff_t\n#include <cstdint> // uint8_t\n#include <cstdio> // snprintf\n#include <limits> // numeric_limits\n#include <string> // string, char_traits\n#include <type_traits> // is_same\n#include <utility> // move\n\n// #include <nlohmann/detail/conversions/to_chars.hpp>\n\n\n#include <array> // array\n#include <cmath>   // signbit, isfinite\n#include <cstdint> // intN_t, uintN_t\n#include <cstring> // memcpy, memmove\n#include <limits> // numeric_limits\n#include <type_traits> // conditional\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n\n/*!\n@brief implements the Grisu2 algorithm for binary to decimal floating-point\nconversion.\n\nThis implementation is a slightly modified version of the reference\nimplementation which may be obtained from\nhttp://florian.loitsch.com/publications (bench.tar.gz).\n\nThe code is distributed under the MIT license, Copyright (c) 2009 Florian Loitsch.\n\nFor a detailed description of the algorithm see:\n\n[1] Loitsch, \"Printing Floating-Point Numbers Quickly and Accurately with\n    Integers\", Proceedings of the ACM SIGPLAN 2010 Conference on Programming\n    Language Design and Implementation, PLDI 2010\n[2] Burger, Dybvig, 
\"Printing Floating-Point Numbers Quickly and Accurately\",\n    Proceedings of the ACM SIGPLAN 1996 Conference on Programming Language\n    Design and Implementation, PLDI 1996\n*/\nnamespace dtoa_impl\n{\n\ntemplate<typename Target, typename Source>\nTarget reinterpret_bits(const Source source)\n{\n    static_assert(sizeof(Target) == sizeof(Source), \"size mismatch\");\n\n    Target target;\n    std::memcpy(&target, &source, sizeof(Source));\n    return target;\n}\n\nstruct diyfp // f * 2^e\n{\n    static constexpr int kPrecision = 64; // = q\n\n    std::uint64_t f = 0;\n    int e = 0;\n\n    constexpr diyfp(std::uint64_t f_, int e_) noexcept : f(f_), e(e_) {}\n\n    /*!\n    @brief returns x - y\n    @pre x.e == y.e and x.f >= y.f\n    */\n    static diyfp sub(const diyfp& x, const diyfp& y) noexcept\n    {\n        JSON_ASSERT(x.e == y.e);\n        JSON_ASSERT(x.f >= y.f);\n\n        return {x.f - y.f, x.e};\n    }\n\n    /*!\n    @brief returns x * y\n    @note The result is rounded. 
(Only the upper q bits are returned.)\n    */\n    static diyfp mul(const diyfp& x, const diyfp& y) noexcept\n    {\n        static_assert(kPrecision == 64, \"internal error\");\n\n        // Computes:\n        //  f = round((x.f * y.f) / 2^q)\n        //  e = x.e + y.e + q\n\n        // Emulate the 64-bit * 64-bit multiplication:\n        //\n        // p = u * v\n        //   = (u_lo + 2^32 u_hi) (v_lo + 2^32 v_hi)\n        //   = (u_lo v_lo         ) + 2^32 ((u_lo v_hi         ) + (u_hi v_lo         )) + 2^64 (u_hi v_hi         )\n        //   = (p0                ) + 2^32 ((p1                ) + (p2                )) + 2^64 (p3                )\n        //   = (p0_lo + 2^32 p0_hi) + 2^32 ((p1_lo + 2^32 p1_hi) + (p2_lo + 2^32 p2_hi)) + 2^64 (p3                )\n        //   = (p0_lo             ) + 2^32 (p0_hi + p1_lo + p2_lo                      ) + 2^64 (p1_hi + p2_hi + p3)\n        //   = (p0_lo             ) + 2^32 (Q                                          ) + 2^64 (H                 )\n        //   = (p0_lo             ) + 2^32 (Q_lo + 2^32 Q_hi                           ) + 2^64 (H                 )\n        //\n        // (Since Q might be larger than 2^32 - 1)\n        //\n        //   = (p0_lo + 2^32 Q_lo) + 2^64 (Q_hi + H)\n        //\n        // (Q_hi + H does not overflow a 64-bit int)\n        //\n        //   = p_lo + 2^64 p_hi\n\n        const std::uint64_t u_lo = x.f & 0xFFFFFFFFu;\n        const std::uint64_t u_hi = x.f >> 32u;\n        const std::uint64_t v_lo = y.f & 0xFFFFFFFFu;\n        const std::uint64_t v_hi = y.f >> 32u;\n\n        const std::uint64_t p0 = u_lo * v_lo;\n        const std::uint64_t p1 = u_lo * v_hi;\n        const std::uint64_t p2 = u_hi * v_lo;\n        const std::uint64_t p3 = u_hi * v_hi;\n\n        const std::uint64_t p0_hi = p0 >> 32u;\n        const std::uint64_t p1_lo = p1 & 0xFFFFFFFFu;\n        const std::uint64_t p1_hi = p1 >> 32u;\n        const std::uint64_t p2_lo = p2 & 0xFFFFFFFFu;\n        const 
std::uint64_t p2_hi = p2 >> 32u;\n\n        std::uint64_t Q = p0_hi + p1_lo + p2_lo;\n\n        // The full product might now be computed as\n        //\n        // p_hi = p3 + p2_hi + p1_hi + (Q >> 32)\n        // p_lo = p0_lo + (Q << 32)\n        //\n        // But in this particular case here, the full p_lo is not required.\n        // Effectively we only need to add the highest bit in p_lo to p_hi (and\n        // Q_hi + 1 does not overflow).\n\n        Q += std::uint64_t{1} << (64u - 32u - 1u); // round, ties up\n\n        const std::uint64_t h = p3 + p2_hi + p1_hi + (Q >> 32u);\n\n        return {h, x.e + y.e + 64};\n    }\n\n    /*!\n    @brief normalize x such that the significand is >= 2^(q-1)\n    @pre x.f != 0\n    */\n    static diyfp normalize(diyfp x) noexcept\n    {\n        JSON_ASSERT(x.f != 0);\n\n        while ((x.f >> 63u) == 0)\n        {\n            x.f <<= 1u;\n            x.e--;\n        }\n\n        return x;\n    }\n\n    /*!\n    @brief normalize x such that the result has the exponent E\n    @pre e >= x.e and the upper e - x.e bits of x.f must be zero.\n    */\n    static diyfp normalize_to(const diyfp& x, const int target_exponent) noexcept\n    {\n        const int delta = x.e - target_exponent;\n\n        JSON_ASSERT(delta >= 0);\n        JSON_ASSERT(((x.f << delta) >> delta) == x.f);\n\n        return {x.f << delta, target_exponent};\n    }\n};\n\nstruct boundaries\n{\n    diyfp w;\n    diyfp minus;\n    diyfp plus;\n};\n\n/*!\nCompute the (normalized) diyfp representing the input number 'value' and its\nboundaries.\n\n@pre value must be finite and positive\n*/\ntemplate<typename FloatType>\nboundaries compute_boundaries(FloatType value)\n{\n    JSON_ASSERT(std::isfinite(value));\n    JSON_ASSERT(value > 0);\n\n    // Convert the IEEE representation into a diyfp.\n    //\n    // If v is denormal:\n    //      value = 0.F * 2^(1 - bias) = (          F) * 2^(1 - bias - (p-1))\n    // If v is normalized:\n    //      value = 1.F * 2^(E 
- bias) = (2^(p-1) + F) * 2^(E - bias - (p-1))\n\n    static_assert(std::numeric_limits<FloatType>::is_iec559,\n                  \"internal error: dtoa_short requires an IEEE-754 floating-point implementation\");\n\n    constexpr int      kPrecision = std::numeric_limits<FloatType>::digits; // = p (includes the hidden bit)\n    constexpr int      kBias      = std::numeric_limits<FloatType>::max_exponent - 1 + (kPrecision - 1);\n    constexpr int      kMinExp    = 1 - kBias;\n    constexpr std::uint64_t kHiddenBit = std::uint64_t{1} << (kPrecision - 1); // = 2^(p-1)\n\n    using bits_type = typename std::conditional<kPrecision == 24, std::uint32_t, std::uint64_t >::type;\n\n    const auto bits = static_cast<std::uint64_t>(reinterpret_bits<bits_type>(value));\n    const std::uint64_t E = bits >> (kPrecision - 1);\n    const std::uint64_t F = bits & (kHiddenBit - 1);\n\n    const bool is_denormal = E == 0;\n    const diyfp v = is_denormal\n                    ? diyfp(F, kMinExp)\n                    : diyfp(F + kHiddenBit, static_cast<int>(E) - kBias);\n\n    // Compute the boundaries m- and m+ of the floating-point value\n    // v = f * 2^e.\n    //\n    // Determine v- and v+, the floating-point predecessor and successor of v,\n    // respectively.\n    //\n    //      v- = v - 2^e        if f != 2^(p-1) or e == e_min                (A)\n    //         = v - 2^(e-1)    if f == 2^(p-1) and e > e_min                (B)\n    //\n    //      v+ = v + 2^e\n    //\n    // Let m- = (v- + v) / 2 and m+ = (v + v+) / 2. 
All real numbers _strictly_\n    // between m- and m+ round to v, regardless of how the input rounding\n    // algorithm breaks ties.\n    //\n    //      ---+-------------+-------------+-------------+-------------+---  (A)\n    //         v-            m-            v             m+            v+\n    //\n    //      -----------------+------+------+-------------+-------------+---  (B)\n    //                       v-     m-     v             m+            v+\n\n    const bool lower_boundary_is_closer = F == 0 && E > 1;\n    const diyfp m_plus = diyfp(2 * v.f + 1, v.e - 1);\n    const diyfp m_minus = lower_boundary_is_closer\n                          ? diyfp(4 * v.f - 1, v.e - 2)  // (B)\n                          : diyfp(2 * v.f - 1, v.e - 1); // (A)\n\n    // Determine the normalized w+ = m+.\n    const diyfp w_plus = diyfp::normalize(m_plus);\n\n    // Determine w- = m- such that e_(w-) = e_(w+).\n    const diyfp w_minus = diyfp::normalize_to(m_minus, w_plus.e);\n\n    return {diyfp::normalize(v), w_minus, w_plus};\n}\n\n// Given normalized diyfp w, Grisu needs to find a (normalized) cached\n// power-of-ten c, such that the exponent of the product c * w = f * 2^e lies\n// within a certain range [alpha, gamma] (Definition 3.2 from [1])\n//\n//      alpha <= e = e_c + e_w + q <= gamma\n//\n// or\n//\n//      f_c * f_w * 2^alpha <= f_c 2^(e_c) * f_w 2^(e_w) * 2^q\n//                          <= f_c * f_w * 2^gamma\n//\n// Since c and w are normalized, i.e. 2^(q-1) <= f < 2^q, this implies\n//\n//      2^(q-1) * 2^(q-1) * 2^alpha <= c * w * 2^q < 2^q * 2^q * 2^gamma\n//\n// or\n//\n//      2^(q - 2 + alpha) <= c * w < 2^(q + gamma)\n//\n// The choice of (alpha,gamma) determines the size of the table and the form of\n// the digit generation procedure. 
Using (alpha,gamma)=(-60,-32) works out well\n// in practice:\n//\n// The idea is to cut the number c * w = f * 2^e into two parts, which can be\n// processed independently: An integral part p1, and a fractional part p2:\n//\n//      f * 2^e = ( (f div 2^-e) * 2^-e + (f mod 2^-e) ) * 2^e\n//              = (f div 2^-e) + (f mod 2^-e) * 2^e\n//              = p1 + p2 * 2^e\n//\n// The conversion of p1 into decimal form requires a series of divisions and\n// modulos by (a power of) 10. These operations are faster for 32-bit than for\n// 64-bit integers, so p1 should ideally fit into a 32-bit integer. This can be\n// achieved by choosing\n//\n//      -e >= 32   or   e <= -32 := gamma\n//\n// In order to convert the fractional part\n//\n//      p2 * 2^e = p2 / 2^-e = d[-1] / 10^1 + d[-2] / 10^2 + ...\n//\n// into decimal form, the fraction is repeatedly multiplied by 10 and the digits\n// d[-i] are extracted in order:\n//\n//      (10 * p2) div 2^-e = d[-1]\n//      (10 * p2) mod 2^-e = d[-2] / 10^1 + ...\n//\n// The multiplication by 10 must not overflow. 
It is sufficient to choose\n//\n//      10 * p2 < 16 * p2 = 2^4 * p2 <= 2^64.\n//\n// Since p2 = f mod 2^-e < 2^-e,\n//\n//      -e <= 60   or   e >= -60 := alpha\n\nconstexpr int kAlpha = -60;\nconstexpr int kGamma = -32;\n\nstruct cached_power // c = f * 2^e ~= 10^k\n{\n    std::uint64_t f;\n    int e;\n    int k;\n};\n\n/*!\nFor a normalized diyfp w = f * 2^e, this function returns a (normalized) cached\npower-of-ten c = f_c * 2^e_c, such that the exponent of the product w * c\nsatisfies (Definition 3.2 from [1])\n\n     alpha <= e_c + e + q <= gamma.\n*/\ninline cached_power get_cached_power_for_binary_exponent(int e)\n{\n    // Now\n    //\n    //      alpha <= e_c + e + q <= gamma                                    (1)\n    //      ==> f_c * 2^alpha <= c * 2^e * 2^q\n    //\n    // and since the c's are normalized, 2^(q-1) <= f_c,\n    //\n    //      ==> 2^(q - 1 + alpha) <= c * 2^(e + q)\n    //      ==> 2^(alpha - e - 1) <= c\n    //\n    // If c were an exact power of ten, i.e. c = 10^k, one may determine k as\n    //\n    //      k = ceil( log_10( 2^(alpha - e - 1) ) )\n    //        = ceil( (alpha - e - 1) * log_10(2) )\n    //\n    // From the paper:\n    // \"In theory the result of the procedure could be wrong since c is rounded,\n    //  and the computation itself is approximated [...]. 
In practice, however,\n    //  this simple function is sufficient.\"\n    //\n    // For IEEE double precision floating-point numbers converted into\n    // normalized diyfp's w = f * 2^e, with q = 64,\n    //\n    //      e >= -1022      (min IEEE exponent)\n    //           -52        (p - 1)\n    //           -52        (p - 1, possibly normalize denormal IEEE numbers)\n    //           -11        (normalize the diyfp)\n    //         = -1137\n    //\n    // and\n    //\n    //      e <= +1023      (max IEEE exponent)\n    //           -52        (p - 1)\n    //           -11        (normalize the diyfp)\n    //         = 960\n    //\n    // This binary exponent range [-1137,960] results in a decimal exponent\n    // range [-307,324]. One does not need to store a cached power for each\n    // k in this range. For each such k it suffices to find a cached power\n    // such that the exponent of the product lies in [alpha,gamma].\n    // This implies that the difference of the decimal exponents of adjacent\n    // table entries must be less than or equal to\n    //\n    //      floor( (gamma - alpha) * log_10(2) ) = 8.\n    //\n    // (A smaller distance gamma-alpha would require a larger table.)\n\n    // NB:\n    // Actually this function returns c, such that -60 <= e_c + e + 64 <= -34.\n\n    constexpr int kCachedPowersMinDecExp = -300;\n    constexpr int kCachedPowersDecStep = 8;\n\n    static constexpr std::array<cached_power, 79> kCachedPowers =\n    {\n        {\n            { 0xAB70FE17C79AC6CA, -1060, -300 },\n            { 0xFF77B1FCBEBCDC4F, -1034, -292 },\n            { 0xBE5691EF416BD60C, -1007, -284 },\n            { 0x8DD01FAD907FFC3C,  -980, -276 },\n            { 0xD3515C2831559A83,  -954, -268 },\n            { 0x9D71AC8FADA6C9B5,  -927, -260 },\n            { 0xEA9C227723EE8BCB,  -901, -252 },\n            { 0xAECC49914078536D,  -874, -244 },\n            { 0x823C12795DB6CE57,  -847, -236 },\n            { 0xC21094364DFB5637,  -821, -228 },\n     
       { 0x9096EA6F3848984F,  -794, -220 },\n            { 0xD77485CB25823AC7,  -768, -212 },\n            { 0xA086CFCD97BF97F4,  -741, -204 },\n            { 0xEF340A98172AACE5,  -715, -196 },\n            { 0xB23867FB2A35B28E,  -688, -188 },\n            { 0x84C8D4DFD2C63F3B,  -661, -180 },\n            { 0xC5DD44271AD3CDBA,  -635, -172 },\n            { 0x936B9FCEBB25C996,  -608, -164 },\n            { 0xDBAC6C247D62A584,  -582, -156 },\n            { 0xA3AB66580D5FDAF6,  -555, -148 },\n            { 0xF3E2F893DEC3F126,  -529, -140 },\n            { 0xB5B5ADA8AAFF80B8,  -502, -132 },\n            { 0x87625F056C7C4A8B,  -475, -124 },\n            { 0xC9BCFF6034C13053,  -449, -116 },\n            { 0x964E858C91BA2655,  -422, -108 },\n            { 0xDFF9772470297EBD,  -396, -100 },\n            { 0xA6DFBD9FB8E5B88F,  -369,  -92 },\n            { 0xF8A95FCF88747D94,  -343,  -84 },\n            { 0xB94470938FA89BCF,  -316,  -76 },\n            { 0x8A08F0F8BF0F156B,  -289,  -68 },\n            { 0xCDB02555653131B6,  -263,  -60 },\n            { 0x993FE2C6D07B7FAC,  -236,  -52 },\n            { 0xE45C10C42A2B3B06,  -210,  -44 },\n            { 0xAA242499697392D3,  -183,  -36 },\n            { 0xFD87B5F28300CA0E,  -157,  -28 },\n            { 0xBCE5086492111AEB,  -130,  -20 },\n            { 0x8CBCCC096F5088CC,  -103,  -12 },\n            { 0xD1B71758E219652C,   -77,   -4 },\n            { 0x9C40000000000000,   -50,    4 },\n            { 0xE8D4A51000000000,   -24,   12 },\n            { 0xAD78EBC5AC620000,     3,   20 },\n            { 0x813F3978F8940984,    30,   28 },\n            { 0xC097CE7BC90715B3,    56,   36 },\n            { 0x8F7E32CE7BEA5C70,    83,   44 },\n            { 0xD5D238A4ABE98068,   109,   52 },\n            { 0x9F4F2726179A2245,   136,   60 },\n            { 0xED63A231D4C4FB27,   162,   68 },\n            { 0xB0DE65388CC8ADA8,   189,   76 },\n            { 0x83C7088E1AAB65DB,   216,   84 },\n            { 0xC45D1DF942711D9A,   242,   92 },\n     
       { 0x924D692CA61BE758,   269,  100 },\n            { 0xDA01EE641A708DEA,   295,  108 },\n            { 0xA26DA3999AEF774A,   322,  116 },\n            { 0xF209787BB47D6B85,   348,  124 },\n            { 0xB454E4A179DD1877,   375,  132 },\n            { 0x865B86925B9BC5C2,   402,  140 },\n            { 0xC83553C5C8965D3D,   428,  148 },\n            { 0x952AB45CFA97A0B3,   455,  156 },\n            { 0xDE469FBD99A05FE3,   481,  164 },\n            { 0xA59BC234DB398C25,   508,  172 },\n            { 0xF6C69A72A3989F5C,   534,  180 },\n            { 0xB7DCBF5354E9BECE,   561,  188 },\n            { 0x88FCF317F22241E2,   588,  196 },\n            { 0xCC20CE9BD35C78A5,   614,  204 },\n            { 0x98165AF37B2153DF,   641,  212 },\n            { 0xE2A0B5DC971F303A,   667,  220 },\n            { 0xA8D9D1535CE3B396,   694,  228 },\n            { 0xFB9B7CD9A4A7443C,   720,  236 },\n            { 0xBB764C4CA7A44410,   747,  244 },\n            { 0x8BAB8EEFB6409C1A,   774,  252 },\n            { 0xD01FEF10A657842C,   800,  260 },\n            { 0x9B10A4E5E9913129,   827,  268 },\n            { 0xE7109BFBA19C0C9D,   853,  276 },\n            { 0xAC2820D9623BF429,   880,  284 },\n            { 0x80444B5E7AA7CF85,   907,  292 },\n            { 0xBF21E44003ACDD2D,   933,  300 },\n            { 0x8E679C2F5E44FF8F,   960,  308 },\n            { 0xD433179D9C8CB841,   986,  316 },\n            { 0x9E19DB92B4E31BA9,  1013,  324 },\n        }\n    };\n\n    // This computation gives exactly the same results for k as\n    //      k = ceil((kAlpha - e - 1) * 0.30102999566398114)\n    // for |e| <= 1500, but doesn't require floating-point operations.\n    // NB: log_10(2) ~= 78913 / 2^18\n    JSON_ASSERT(e >= -1500);\n    JSON_ASSERT(e <=  1500);\n    const int f = kAlpha - e - 1;\n    const int k = (f * 78913) / (1 << 18) + static_cast<int>(f > 0);\n\n    const int index = (-kCachedPowersMinDecExp + k + (kCachedPowersDecStep - 1)) / kCachedPowersDecStep;\n    JSON_ASSERT(index 
>= 0);\n    JSON_ASSERT(static_cast<std::size_t>(index) < kCachedPowers.size());\n\n    const cached_power cached = kCachedPowers[static_cast<std::size_t>(index)];\n    JSON_ASSERT(kAlpha <= cached.e + e + 64);\n    JSON_ASSERT(kGamma >= cached.e + e + 64);\n\n    return cached;\n}\n\n/*!\nFor n != 0, returns k, such that pow10 := 10^(k-1) <= n < 10^k.\nFor n == 0, returns 1 and sets pow10 := 1.\n*/\ninline int find_largest_pow10(const std::uint32_t n, std::uint32_t& pow10)\n{\n    // LCOV_EXCL_START\n    if (n >= 1000000000)\n    {\n        pow10 = 1000000000;\n        return 10;\n    }\n    // LCOV_EXCL_STOP\n    if (n >= 100000000)\n    {\n        pow10 = 100000000;\n        return  9;\n    }\n    if (n >= 10000000)\n    {\n        pow10 = 10000000;\n        return  8;\n    }\n    if (n >= 1000000)\n    {\n        pow10 = 1000000;\n        return  7;\n    }\n    if (n >= 100000)\n    {\n        pow10 = 100000;\n        return  6;\n    }\n    if (n >= 10000)\n    {\n        pow10 = 10000;\n        return  5;\n    }\n    if (n >= 1000)\n    {\n        pow10 = 1000;\n        return  4;\n    }\n    if (n >= 100)\n    {\n        pow10 = 100;\n        return  3;\n    }\n    if (n >= 10)\n    {\n        pow10 = 10;\n        return  2;\n    }\n\n    pow10 = 1;\n    return 1;\n}\n\ninline void grisu2_round(char* buf, int len, std::uint64_t dist, std::uint64_t delta,\n                         std::uint64_t rest, std::uint64_t ten_k)\n{\n    JSON_ASSERT(len >= 1);\n    JSON_ASSERT(dist <= delta);\n    JSON_ASSERT(rest <= delta);\n    JSON_ASSERT(ten_k > 0);\n\n    //               <--------------------------- delta ---->\n    //                                  <---- dist --------->\n    // --------------[------------------+-------------------]--------------\n    //               M-                 w                   M+\n    //\n    //                                  ten_k\n    //                                <------>\n    //                                       <---- 
rest ---->\n    // --------------[------------------+----+--------------]--------------\n    //                                  w    V\n    //                                       = buf * 10^k\n    //\n    // ten_k represents a unit-in-the-last-place in the decimal representation\n    // stored in buf.\n    // Decrement buf by ten_k while this takes buf closer to w.\n\n    // The tests are written in this order to avoid overflow in unsigned\n    // integer arithmetic.\n\n    while (rest < dist\n            && delta - rest >= ten_k\n            && (rest + ten_k < dist || dist - rest > rest + ten_k - dist))\n    {\n        JSON_ASSERT(buf[len - 1] != '0');\n        buf[len - 1]--;\n        rest += ten_k;\n    }\n}\n\n/*!\nGenerates V = buffer * 10^decimal_exponent, such that M- <= V <= M+.\nM- and M+ must be normalized and share the same exponent -60 <= e <= -32.\n*/\ninline void grisu2_digit_gen(char* buffer, int& length, int& decimal_exponent,\n                             diyfp M_minus, diyfp w, diyfp M_plus)\n{\n    static_assert(kAlpha >= -60, \"internal error\");\n    static_assert(kGamma <= -32, \"internal error\");\n\n    // Generates the digits (and the exponent) of a decimal floating-point\n    // number V = buffer * 10^decimal_exponent in the range [M-, M+]. 
The diyfp's\n    // w, M- and M+ share the same exponent e, which satisfies alpha <= e <= gamma.\n    //\n    //               <--------------------------- delta ---->\n    //                                  <---- dist --------->\n    // --------------[------------------+-------------------]--------------\n    //               M-                 w                   M+\n    //\n    // Grisu2 generates the digits of M+ from left to right and stops as soon as\n    // V is in [M-,M+].\n\n    JSON_ASSERT(M_plus.e >= kAlpha);\n    JSON_ASSERT(M_plus.e <= kGamma);\n\n    std::uint64_t delta = diyfp::sub(M_plus, M_minus).f; // (significand of (M+ - M-), implicit exponent is e)\n    std::uint64_t dist  = diyfp::sub(M_plus, w      ).f; // (significand of (M+ - w ), implicit exponent is e)\n\n    // Split M+ = f * 2^e into two parts p1 and p2 (note: e < 0):\n    //\n    //      M+ = f * 2^e\n    //         = ((f div 2^-e) * 2^-e + (f mod 2^-e)) * 2^e\n    //         = ((p1        ) * 2^-e + (p2        )) * 2^e\n    //         = p1 + p2 * 2^e\n\n    const diyfp one(std::uint64_t{1} << -M_plus.e, M_plus.e);\n\n    auto p1 = static_cast<std::uint32_t>(M_plus.f >> -one.e); // p1 = f div 2^-e (Since -e >= 32, p1 fits into a 32-bit int.)\n    std::uint64_t p2 = M_plus.f & (one.f - 1);                    // p2 = f mod 2^-e\n\n    // 1)\n    //\n    // Generate the digits of the integral part p1 = d[n-1]...d[1]d[0]\n\n    JSON_ASSERT(p1 > 0);\n\n    std::uint32_t pow10{};\n    const int k = find_largest_pow10(p1, pow10);\n\n    //      10^(k-1) <= p1 < 10^k, pow10 = 10^(k-1)\n    //\n    //      p1 = (p1 div 10^(k-1)) * 10^(k-1) + (p1 mod 10^(k-1))\n    //         = (d[k-1]         ) * 10^(k-1) + (p1 mod 10^(k-1))\n    //\n    //      M+ = p1                                             + p2 * 2^e\n    //         = d[k-1] * 10^(k-1) + (p1 mod 10^(k-1))          + p2 * 2^e\n    //         = d[k-1] * 10^(k-1) + ((p1 mod 10^(k-1)) * 2^-e + p2) * 2^e\n    //         = d[k-1] * 10^(k-1) + 
(                         rest) * 2^e\n    //\n    // Now generate the digits d[n] of p1 from left to right (n = k-1,...,0)\n    //\n    //      p1 = d[k-1]...d[n] * 10^n + d[n-1]...d[0]\n    //\n    // but stop as soon as\n    //\n    //      rest * 2^e = (d[n-1]...d[0] * 2^-e + p2) * 2^e <= delta * 2^e\n\n    int n = k;\n    while (n > 0)\n    {\n        // Invariants:\n        //      M+ = buffer * 10^n + (p1 + p2 * 2^e)    (buffer = 0 for n = k)\n        //      pow10 = 10^(n-1) <= p1 < 10^n\n        //\n        const std::uint32_t d = p1 / pow10;  // d = p1 div 10^(n-1)\n        const std::uint32_t r = p1 % pow10;  // r = p1 mod 10^(n-1)\n        //\n        //      M+ = buffer * 10^n + (d * 10^(n-1) + r) + p2 * 2^e\n        //         = (buffer * 10 + d) * 10^(n-1) + (r + p2 * 2^e)\n        //\n        JSON_ASSERT(d <= 9);\n        buffer[length++] = static_cast<char>('0' + d); // buffer := buffer * 10 + d\n        //\n        //      M+ = buffer * 10^(n-1) + (r + p2 * 2^e)\n        //\n        p1 = r;\n        n--;\n        //\n        //      M+ = buffer * 10^n + (p1 + p2 * 2^e)\n        //      pow10 = 10^n\n        //\n\n        // Now check if enough digits have been generated.\n        // Compute\n        //\n        //      p1 + p2 * 2^e = (p1 * 2^-e + p2) * 2^e = rest * 2^e\n        //\n        // Note:\n        // Since rest and delta share the same exponent e, it suffices to\n        // compare the significands.\n        const std::uint64_t rest = (std::uint64_t{p1} << -one.e) + p2;\n        if (rest <= delta)\n        {\n            // V = buffer * 10^n, with M- <= V <= M+.\n\n            decimal_exponent += n;\n\n            // We may now just stop. 
But instead look if the buffer could be\n            // decremented to bring V closer to w.\n            //\n            // pow10 = 10^n is now 1 ulp in the decimal representation V.\n            // The rounding procedure works with diyfp's with an implicit\n            // exponent of e.\n            //\n            //      10^n = (10^n * 2^-e) * 2^e = ulp * 2^e\n            //\n            const std::uint64_t ten_n = std::uint64_t{pow10} << -one.e;\n            grisu2_round(buffer, length, dist, delta, rest, ten_n);\n\n            return;\n        }\n\n        pow10 /= 10;\n        //\n        //      pow10 = 10^(n-1) <= p1 < 10^n\n        // Invariants restored.\n    }\n\n    // 2)\n    //\n    // The digits of the integral part have been generated:\n    //\n    //      M+ = d[k-1]...d[1]d[0] + p2 * 2^e\n    //         = buffer            + p2 * 2^e\n    //\n    // Now generate the digits of the fractional part p2 * 2^e.\n    //\n    // Note:\n    // No decimal point is generated: the exponent is adjusted instead.\n    //\n    // p2 actually represents the fraction\n    //\n    //      p2 * 2^e\n    //          = p2 / 2^-e\n    //          = d[-1] / 10^1 + d[-2] / 10^2 + ...\n    //\n    // Now generate the digits d[-m] of p2 from left to right (m = 1,2,...)\n    //\n    //      p2 * 2^e = d[-1]d[-2]...d[-m] * 10^-m\n    //                      + 10^-m * (d[-m-1] / 10^1 + d[-m-2] / 10^2 + ...)\n    //\n    // using\n    //\n    //      10^m * p2 = ((10^m * p2) div 2^-e) * 2^-e + ((10^m * p2) mod 2^-e)\n    //                = (                   d) * 2^-e + (                   r)\n    //\n    // or\n    //      10^m * p2 * 2^e = d + r * 2^e\n    //\n    // i.e.\n    //\n    //      M+ = buffer + p2 * 2^e\n    //         = buffer + 10^-m * (d + r * 2^e)\n    //         = (buffer * 10^m + d) * 10^-m + 10^-m * r * 2^e\n    //\n    // and stop as soon as 10^-m * r * 2^e <= delta * 2^e\n\n    JSON_ASSERT(p2 > delta);\n\n    int m = 0;\n    for (;;)\n    {\n        // 
Invariant:\n        //      M+ = buffer * 10^-m + 10^-m * (d[-m-1] / 10 + d[-m-2] / 10^2 + ...) * 2^e\n        //         = buffer * 10^-m + 10^-m * (p2                                 ) * 2^e\n        //         = buffer * 10^-m + 10^-m * (1/10 * (10 * p2)                   ) * 2^e\n        //         = buffer * 10^-m + 10^-m * (1/10 * ((10*p2 div 2^-e) * 2^-e + (10*p2 mod 2^-e)) * 2^e\n        //\n        JSON_ASSERT(p2 <= (std::numeric_limits<std::uint64_t>::max)() / 10);\n        p2 *= 10;\n        const std::uint64_t d = p2 >> -one.e;     // d = (10 * p2) div 2^-e\n        const std::uint64_t r = p2 & (one.f - 1); // r = (10 * p2) mod 2^-e\n        //\n        //      M+ = buffer * 10^-m + 10^-m * (1/10 * (d * 2^-e + r) * 2^e\n        //         = buffer * 10^-m + 10^-m * (1/10 * (d + r * 2^e))\n        //         = (buffer * 10 + d) * 10^(-m-1) + 10^(-m-1) * r * 2^e\n        //\n        JSON_ASSERT(d <= 9);\n        buffer[length++] = static_cast<char>('0' + d); // buffer := buffer * 10 + d\n        //\n        //      M+ = buffer * 10^(-m-1) + 10^(-m-1) * r * 2^e\n        //\n        p2 = r;\n        m++;\n        //\n        //      M+ = buffer * 10^-m + 10^-m * p2 * 2^e\n        // Invariant restored.\n\n        // Check if enough digits have been generated.\n        //\n        //      10^-m * p2 * 2^e <= delta * 2^e\n        //              p2 * 2^e <= 10^m * delta * 2^e\n        //                    p2 <= 10^m * delta\n        delta *= 10;\n        dist  *= 10;\n        if (p2 <= delta)\n        {\n            break;\n        }\n    }\n\n    // V = buffer * 10^-m, with M- <= V <= M+.\n\n    decimal_exponent -= m;\n\n    // 1 ulp in the decimal representation is now 10^-m.\n    // Since delta and dist are now scaled by 10^m, we need to do the\n    // same with ulp in order to keep the units in sync.\n    //\n    //      10^m * 10^-m = 1 = 2^-e * 2^e = ten_m * 2^e\n    //\n    const std::uint64_t ten_m = one.f;\n    grisu2_round(buffer, length, dist, 
delta, p2, ten_m);\n\n    // By construction this algorithm generates the shortest possible decimal\n    // number (Loitsch, Theorem 6.2) which rounds back to w.\n    // For an input number of precision p, at least\n    //\n    //      N = 1 + ceil(p * log_10(2))\n    //\n    // decimal digits are sufficient to identify all binary floating-point\n    // numbers (Matula, \"In-and-Out conversions\").\n    // This implies that the algorithm does not produce more than N decimal\n    // digits.\n    //\n    //      N = 17 for p = 53 (IEEE double precision)\n    //      N = 9  for p = 24 (IEEE single precision)\n}\n\n/*!\nv = buf * 10^decimal_exponent\nlen is the length of the buffer (number of decimal digits)\nThe buffer must be large enough, i.e. >= max_digits10.\n*/\nJSON_HEDLEY_NON_NULL(1)\ninline void grisu2(char* buf, int& len, int& decimal_exponent,\n                   diyfp m_minus, diyfp v, diyfp m_plus)\n{\n    JSON_ASSERT(m_plus.e == m_minus.e);\n    JSON_ASSERT(m_plus.e == v.e);\n\n    //  --------(-----------------------+-----------------------)--------    (A)\n    //          m-                      v                       m+\n    //\n    //  --------------------(-----------+-----------------------)--------    (B)\n    //                      m-          v                       m+\n    //\n    // First scale v (and m- and m+) such that the exponent is in the range\n    // [alpha, gamma].\n\n    const cached_power cached = get_cached_power_for_binary_exponent(m_plus.e);\n\n    const diyfp c_minus_k(cached.f, cached.e); // = c ~= 10^-k\n\n    // The exponent of the products is = v.e + c_minus_k.e + q and is in the range [alpha,gamma]\n    const diyfp w       = diyfp::mul(v,       c_minus_k);\n    const diyfp w_minus = diyfp::mul(m_minus, c_minus_k);\n    const diyfp w_plus  = diyfp::mul(m_plus,  c_minus_k);\n\n    //  ----(---+---)---------------(---+---)---------------(---+---)----\n    //          w-                      w                       w+\n    //   
       = c*m-                  = c*v                   = c*m+\n    //\n    // diyfp::mul rounds its result and c_minus_k is approximated too. w, w- and\n    // w+ are now off by a small amount.\n    // In fact:\n    //\n    //      w - v * 10^k < 1 ulp\n    //\n    // To account for this inaccuracy, add resp. subtract 1 ulp.\n    //\n    //  --------+---[---------------(---+---)---------------]---+--------\n    //          w-  M-                  w                   M+  w+\n    //\n    // Now any number in [M-, M+] (bounds included) will round to w when input,\n    // regardless of how the input rounding algorithm breaks ties.\n    //\n    // And digit_gen generates the shortest possible such number in [M-, M+].\n    // Note that this does not mean that Grisu2 always generates the shortest\n    // possible number in the interval (m-, m+).\n    const diyfp M_minus(w_minus.f + 1, w_minus.e);\n    const diyfp M_plus (w_plus.f  - 1, w_plus.e );\n\n    decimal_exponent = -cached.k; // = -(-k) = k\n\n    grisu2_digit_gen(buf, len, decimal_exponent, M_minus, w, M_plus);\n}\n\n/*!\nv = buf * 10^decimal_exponent\nlen is the length of the buffer (number of decimal digits)\nThe buffer must be large enough, i.e. >= max_digits10.\n*/\ntemplate<typename FloatType>\nJSON_HEDLEY_NON_NULL(1)\nvoid grisu2(char* buf, int& len, int& decimal_exponent, FloatType value)\n{\n    static_assert(diyfp::kPrecision >= std::numeric_limits<FloatType>::digits + 3,\n                  \"internal error: not enough precision\");\n\n    JSON_ASSERT(std::isfinite(value));\n    JSON_ASSERT(value > 0);\n\n    // If the neighbors (and boundaries) of 'value' are always computed for double-precision\n    // numbers, all float's can be recovered using strtod (and strtof). 
However, the resulting\n    // decimal representations are not exactly \"short\".\n    //\n    // The documentation for 'std::to_chars' (https://en.cppreference.com/w/cpp/utility/to_chars)\n    // says \"value is converted to a string as if by std::sprintf in the default (\"C\") locale\"\n    // and since sprintf promotes float's to double's, I think this is exactly what 'std::to_chars'\n    // does.\n    // On the other hand, the documentation for 'std::to_chars' requires that \"parsing the\n    // representation using the corresponding std::from_chars function recovers value exactly\". That\n    // indicates that single precision floating-point numbers should be recovered using\n    // 'std::strtof'.\n    //\n    // NB: If the neighbors are computed for single-precision numbers, there is a single float\n    //     (7.0385307e-26f) which can't be recovered using strtod. The resulting double precision\n    //     value is off by 1 ulp.\n#if 0\n    const boundaries w = compute_boundaries(static_cast<double>(value));\n#else\n    const boundaries w = compute_boundaries(value);\n#endif\n\n    grisu2(buf, len, decimal_exponent, w.minus, w.w, w.plus);\n}\n\n/*!\n@brief appends a decimal representation of e to buf\n@return a pointer to the element following the exponent.\n@pre -1000 < e < 1000\n*/\nJSON_HEDLEY_NON_NULL(1)\nJSON_HEDLEY_RETURNS_NON_NULL\ninline char* append_exponent(char* buf, int e)\n{\n    JSON_ASSERT(e > -1000);\n    JSON_ASSERT(e <  1000);\n\n    if (e < 0)\n    {\n        e = -e;\n        *buf++ = '-';\n    }\n    else\n    {\n        *buf++ = '+';\n    }\n\n    auto k = static_cast<std::uint32_t>(e);\n    if (k < 10)\n    {\n        // Always print at least two digits in the exponent.\n        // This is for compatibility with printf(\"%g\").\n        *buf++ = '0';\n        *buf++ = static_cast<char>('0' + k);\n    }\n    else if (k < 100)\n    {\n        *buf++ = static_cast<char>('0' + k / 10);\n        k %= 10;\n        *buf++ = 
static_cast<char>('0' + k);\n    }\n    else\n    {\n        *buf++ = static_cast<char>('0' + k / 100);\n        k %= 100;\n        *buf++ = static_cast<char>('0' + k / 10);\n        k %= 10;\n        *buf++ = static_cast<char>('0' + k);\n    }\n\n    return buf;\n}\n\n/*!\n@brief prettify v = buf * 10^decimal_exponent\n\nIf v is in the range [10^min_exp, 10^max_exp) it will be printed in fixed-point\nnotation. Otherwise it will be printed in exponential notation.\n\n@pre min_exp < 0\n@pre max_exp > 0\n*/\nJSON_HEDLEY_NON_NULL(1)\nJSON_HEDLEY_RETURNS_NON_NULL\ninline char* format_buffer(char* buf, int len, int decimal_exponent,\n                           int min_exp, int max_exp)\n{\n    JSON_ASSERT(min_exp < 0);\n    JSON_ASSERT(max_exp > 0);\n\n    const int k = len;\n    const int n = len + decimal_exponent;\n\n    // v = buf * 10^(n-k)\n    // k is the length of the buffer (number of decimal digits)\n    // n is the position of the decimal point relative to the start of the buffer.\n\n    if (k <= n && n <= max_exp)\n    {\n        // digits[000]\n        // len <= max_exp + 2\n\n        std::memset(buf + k, '0', static_cast<size_t>(n) - static_cast<size_t>(k));\n        // Make it look like a floating-point number (#362, #378)\n        buf[n + 0] = '.';\n        buf[n + 1] = '0';\n        return buf + (static_cast<size_t>(n) + 2);\n    }\n\n    if (0 < n && n <= max_exp)\n    {\n        // dig.its\n        // len <= max_digits10 + 1\n\n        JSON_ASSERT(k > n);\n\n        std::memmove(buf + (static_cast<size_t>(n) + 1), buf + n, static_cast<size_t>(k) - static_cast<size_t>(n));\n        buf[n] = '.';\n        return buf + (static_cast<size_t>(k) + 1U);\n    }\n\n    if (min_exp < n && n <= 0)\n    {\n        // 0.[000]digits\n        // len <= 2 + (-min_exp - 1) + max_digits10\n\n        std::memmove(buf + (2 + static_cast<size_t>(-n)), buf, static_cast<size_t>(k));\n        buf[0] = '0';\n        buf[1] = '.';\n        std::memset(buf + 2, '0', 
static_cast<size_t>(-n));\n        return buf + (2U + static_cast<size_t>(-n) + static_cast<size_t>(k));\n    }\n\n    if (k == 1)\n    {\n        // dE+123\n        // len <= 1 + 5\n\n        buf += 1;\n    }\n    else\n    {\n        // d.igitsE+123\n        // len <= max_digits10 + 1 + 5\n\n        std::memmove(buf + 2, buf + 1, static_cast<size_t>(k) - 1);\n        buf[1] = '.';\n        buf += 1 + static_cast<size_t>(k);\n    }\n\n    *buf++ = 'e';\n    return append_exponent(buf, n - 1);\n}\n\n} // namespace dtoa_impl\n\n/*!\n@brief generates a decimal representation of the floating-point number value in [first, last).\n\nThe format of the resulting decimal representation is similar to printf's %g\nformat. Returns an iterator pointing past-the-end of the decimal representation.\n\n@note The input number must be finite, i.e. NaN's and Inf's are not supported.\n@note The buffer must be large enough.\n@note The result is NOT null-terminated.\n*/\ntemplate<typename FloatType>\nJSON_HEDLEY_NON_NULL(1, 2)\nJSON_HEDLEY_RETURNS_NON_NULL\nchar* to_chars(char* first, const char* last, FloatType value)\n{\n    static_cast<void>(last); // maybe unused - fix warning\n    JSON_ASSERT(std::isfinite(value));\n\n    // Use signbit(value) instead of (value < 0) since signbit works for -0.\n    if (std::signbit(value))\n    {\n        value = -value;\n        *first++ = '-';\n    }\n\n    if (value == 0) // +-0\n    {\n        *first++ = '0';\n        // Make it look like a floating-point number (#362, #378)\n        *first++ = '.';\n        *first++ = '0';\n        return first;\n    }\n\n    JSON_ASSERT(last - first >= std::numeric_limits<FloatType>::max_digits10);\n\n    // Compute v = buffer * 10^decimal_exponent.\n    // The decimal digits are stored in the buffer, which needs to be interpreted\n    // as an unsigned decimal integer.\n    // len is the length of the buffer, i.e. 
the number of decimal digits.\n    int len = 0;\n    int decimal_exponent = 0;\n    dtoa_impl::grisu2(first, len, decimal_exponent, value);\n\n    JSON_ASSERT(len <= std::numeric_limits<FloatType>::max_digits10);\n\n    // Format the buffer like printf(\"%.*g\", prec, value)\n    constexpr int kMinExp = -4;\n    // Use digits10 here to increase compatibility with version 2.\n    constexpr int kMaxExp = std::numeric_limits<FloatType>::digits10;\n\n    JSON_ASSERT(last - first >= kMaxExp + 2);\n    JSON_ASSERT(last - first >= 2 + (-kMinExp - 1) + std::numeric_limits<FloatType>::max_digits10);\n    JSON_ASSERT(last - first >= std::numeric_limits<FloatType>::max_digits10 + 6);\n\n    return dtoa_impl::format_buffer(first, len, decimal_exponent, kMinExp, kMaxExp);\n}\n\n} // namespace detail\n} // namespace nlohmann\n\n// #include <nlohmann/detail/exceptions.hpp>\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n// #include <nlohmann/detail/meta/cpp_future.hpp>\n\n// #include <nlohmann/detail/output/binary_writer.hpp>\n\n// #include <nlohmann/detail/output/output_adapters.hpp>\n\n// #include <nlohmann/detail/value_t.hpp>\n\n\nnamespace nlohmann\n{\nnamespace detail\n{\n///////////////////\n// serialization //\n///////////////////\n\n/// how to treat decoding errors\nenum class error_handler_t\n{\n    strict,  ///< throw a type_error exception in case of invalid UTF-8\n    replace, ///< replace invalid UTF-8 sequences with U+FFFD\n    ignore   ///< ignore invalid UTF-8 sequences\n};\n\ntemplate<typename BasicJsonType>\nclass serializer\n{\n    using string_t = typename BasicJsonType::string_t;\n    using number_float_t = typename BasicJsonType::number_float_t;\n    using number_integer_t = typename BasicJsonType::number_integer_t;\n    using number_unsigned_t = typename BasicJsonType::number_unsigned_t;\n    using binary_char_t = typename BasicJsonType::binary_t::value_type;\n    static constexpr std::uint8_t UTF8_ACCEPT = 0;\n    static constexpr std::uint8_t 
UTF8_REJECT = 1;\n\n  public:\n    /*!\n    @param[in] s  output stream to serialize to\n    @param[in] ichar  indentation character to use\n    @param[in] error_handler_  how to react on decoding errors\n    */\n    serializer(output_adapter_t<char> s, const char ichar,\n               error_handler_t error_handler_ = error_handler_t::strict)\n        : o(std::move(s))\n        , loc(std::localeconv())\n        , thousands_sep(loc->thousands_sep == nullptr ? '\\0' : std::char_traits<char>::to_char_type(* (loc->thousands_sep)))\n        , decimal_point(loc->decimal_point == nullptr ? '\\0' : std::char_traits<char>::to_char_type(* (loc->decimal_point)))\n        , indent_char(ichar)\n        , indent_string(512, indent_char)\n        , error_handler(error_handler_)\n    {}\n\n    // delete because of pointer members\n    serializer(const serializer&) = delete;\n    serializer& operator=(const serializer&) = delete;\n    serializer(serializer&&) = delete;\n    serializer& operator=(serializer&&) = delete;\n    ~serializer() = default;\n\n    /*!\n    @brief internal implementation of the serialization function\n\n    This function is called by the public member function dump and organizes\n    the serialization internally. The indentation level is propagated as\n    additional parameter. 
In case of arrays and objects, the function is\n    called recursively.\n\n    - strings and object keys are escaped using `escape_string()`\n    - integer numbers are converted implicitly via `operator<<`\n    - floating-point numbers are converted to a string using `\"%g\"` format\n    - binary values are serialized as objects containing the subtype and the\n      byte array\n\n    @param[in] val               value to serialize\n    @param[in] pretty_print      whether the output shall be pretty-printed\n    @param[in] ensure_ascii If @a ensure_ascii is true, all non-ASCII characters\n    in the output are escaped with `\\uXXXX` sequences, and the result consists\n    of ASCII characters only.\n    @param[in] indent_step       the indent level\n    @param[in] current_indent    the current indent level (only used internally)\n    */\n    void dump(const BasicJsonType& val,\n              const bool pretty_print,\n              const bool ensure_ascii,\n              const unsigned int indent_step,\n              const unsigned int current_indent = 0)\n    {\n        switch (val.m_type)\n        {\n            case value_t::object:\n            {\n                if (val.m_value.object->empty())\n                {\n                    o->write_characters(\"{}\", 2);\n                    return;\n                }\n\n                if (pretty_print)\n                {\n                    o->write_characters(\"{\\n\", 2);\n\n                    // variable to hold indentation for recursive calls\n                    const auto new_indent = current_indent + indent_step;\n                    if (JSON_HEDLEY_UNLIKELY(indent_string.size() < new_indent))\n                    {\n                        indent_string.resize(indent_string.size() * 2, ' ');\n                    }\n\n                    // first n-1 elements\n                    auto i = val.m_value.object->cbegin();\n                    for (std::size_t cnt = 0; cnt < val.m_value.object->size() - 1; ++cnt, 
++i)\n                    {\n                        o->write_characters(indent_string.c_str(), new_indent);\n                        o->write_character('\\\"');\n                        dump_escaped(i->first, ensure_ascii);\n                        o->write_characters(\"\\\": \", 3);\n                        dump(i->second, true, ensure_ascii, indent_step, new_indent);\n                        o->write_characters(\",\\n\", 2);\n                    }\n\n                    // last element\n                    JSON_ASSERT(i != val.m_value.object->cend());\n                    JSON_ASSERT(std::next(i) == val.m_value.object->cend());\n                    o->write_characters(indent_string.c_str(), new_indent);\n                    o->write_character('\\\"');\n                    dump_escaped(i->first, ensure_ascii);\n                    o->write_characters(\"\\\": \", 3);\n                    dump(i->second, true, ensure_ascii, indent_step, new_indent);\n\n                    o->write_character('\\n');\n                    o->write_characters(indent_string.c_str(), current_indent);\n                    o->write_character('}');\n                }\n                else\n                {\n                    o->write_character('{');\n\n                    // first n-1 elements\n                    auto i = val.m_value.object->cbegin();\n                    for (std::size_t cnt = 0; cnt < val.m_value.object->size() - 1; ++cnt, ++i)\n                    {\n                        o->write_character('\\\"');\n                        dump_escaped(i->first, ensure_ascii);\n                        o->write_characters(\"\\\":\", 2);\n                        dump(i->second, false, ensure_ascii, indent_step, current_indent);\n                        o->write_character(',');\n                    }\n\n                    // last element\n                    JSON_ASSERT(i != val.m_value.object->cend());\n                    JSON_ASSERT(std::next(i) == val.m_value.object->cend());\n  
                  o->write_character('\\\"');\n                    dump_escaped(i->first, ensure_ascii);\n                    o->write_characters(\"\\\":\", 2);\n                    dump(i->second, false, ensure_ascii, indent_step, current_indent);\n\n                    o->write_character('}');\n                }\n\n                return;\n            }\n\n            case value_t::array:\n            {\n                if (val.m_value.array->empty())\n                {\n                    o->write_characters(\"[]\", 2);\n                    return;\n                }\n\n                if (pretty_print)\n                {\n                    o->write_characters(\"[\\n\", 2);\n\n                    // variable to hold indentation for recursive calls\n                    const auto new_indent = current_indent + indent_step;\n                    if (JSON_HEDLEY_UNLIKELY(indent_string.size() < new_indent))\n                    {\n                        indent_string.resize(indent_string.size() * 2, ' ');\n                    }\n\n                    // first n-1 elements\n                    for (auto i = val.m_value.array->cbegin();\n                            i != val.m_value.array->cend() - 1; ++i)\n                    {\n                        o->write_characters(indent_string.c_str(), new_indent);\n                        dump(*i, true, ensure_ascii, indent_step, new_indent);\n                        o->write_characters(\",\\n\", 2);\n                    }\n\n                    // last element\n                    JSON_ASSERT(!val.m_value.array->empty());\n                    o->write_characters(indent_string.c_str(), new_indent);\n                    dump(val.m_value.array->back(), true, ensure_ascii, indent_step, new_indent);\n\n                    o->write_character('\\n');\n                    o->write_characters(indent_string.c_str(), current_indent);\n                    o->write_character(']');\n                }\n                else\n             
   {\n                    o->write_character('[');\n\n                    // first n-1 elements\n                    for (auto i = val.m_value.array->cbegin();\n                            i != val.m_value.array->cend() - 1; ++i)\n                    {\n                        dump(*i, false, ensure_ascii, indent_step, current_indent);\n                        o->write_character(',');\n                    }\n\n                    // last element\n                    JSON_ASSERT(!val.m_value.array->empty());\n                    dump(val.m_value.array->back(), false, ensure_ascii, indent_step, current_indent);\n\n                    o->write_character(']');\n                }\n\n                return;\n            }\n\n            case value_t::string:\n            {\n                o->write_character('\\\"');\n                dump_escaped(*val.m_value.string, ensure_ascii);\n                o->write_character('\\\"');\n                return;\n            }\n\n            case value_t::binary:\n            {\n                if (pretty_print)\n                {\n                    o->write_characters(\"{\\n\", 2);\n\n                    // variable to hold indentation for recursive calls\n                    const auto new_indent = current_indent + indent_step;\n                    if (JSON_HEDLEY_UNLIKELY(indent_string.size() < new_indent))\n                    {\n                        indent_string.resize(indent_string.size() * 2, ' ');\n                    }\n\n                    o->write_characters(indent_string.c_str(), new_indent);\n\n                    o->write_characters(\"\\\"bytes\\\": [\", 10);\n\n                    if (!val.m_value.binary->empty())\n                    {\n                        for (auto i = val.m_value.binary->cbegin();\n                                i != val.m_value.binary->cend() - 1; ++i)\n                        {\n                            dump_integer(*i);\n                            o->write_characters(\", \", 
2);\n                        }\n                        dump_integer(val.m_value.binary->back());\n                    }\n\n                    o->write_characters(\"],\\n\", 3);\n                    o->write_characters(indent_string.c_str(), new_indent);\n\n                    o->write_characters(\"\\\"subtype\\\": \", 11);\n                    if (val.m_value.binary->has_subtype())\n                    {\n                        dump_integer(val.m_value.binary->subtype());\n                    }\n                    else\n                    {\n                        o->write_characters(\"null\", 4);\n                    }\n                    o->write_character('\\n');\n                    o->write_characters(indent_string.c_str(), current_indent);\n                    o->write_character('}');\n                }\n                else\n                {\n                    o->write_characters(\"{\\\"bytes\\\":[\", 10);\n\n                    if (!val.m_value.binary->empty())\n                    {\n                        for (auto i = val.m_value.binary->cbegin();\n                                i != val.m_value.binary->cend() - 1; ++i)\n                        {\n                            dump_integer(*i);\n                            o->write_character(',');\n                        }\n                        dump_integer(val.m_value.binary->back());\n                    }\n\n                    o->write_characters(\"],\\\"subtype\\\":\", 12);\n                    if (val.m_value.binary->has_subtype())\n                    {\n                        dump_integer(val.m_value.binary->subtype());\n                        o->write_character('}');\n                    }\n                    else\n                    {\n                        o->write_characters(\"null}\", 5);\n                    }\n                }\n                return;\n            }\n\n            case value_t::boolean:\n            {\n                if (val.m_value.boolean)\n         
       {\n                    o->write_characters(\"true\", 4);\n                }\n                else\n                {\n                    o->write_characters(\"false\", 5);\n                }\n                return;\n            }\n\n            case value_t::number_integer:\n            {\n                dump_integer(val.m_value.number_integer);\n                return;\n            }\n\n            case value_t::number_unsigned:\n            {\n                dump_integer(val.m_value.number_unsigned);\n                return;\n            }\n\n            case value_t::number_float:\n            {\n                dump_float(val.m_value.number_float);\n                return;\n            }\n\n            case value_t::discarded:\n            {\n                o->write_characters(\"<discarded>\", 11);\n                return;\n            }\n\n            case value_t::null:\n            {\n                o->write_characters(\"null\", 4);\n                return;\n            }\n\n            default:            // LCOV_EXCL_LINE\n                JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert) LCOV_EXCL_LINE\n        }\n    }\n\n  JSON_PRIVATE_UNLESS_TESTED:\n    /*!\n    @brief dump escaped string\n\n    Escape a string by replacing certain special characters by a sequence of an\n    escape character (backslash) and another character and other control\n    characters by a sequence of \"\\u\" followed by a four-digit hex\n    representation. 
The escaped string is written to output stream @a o.\n\n    @param[in] s  the string to escape\n    @param[in] ensure_ascii  whether to escape non-ASCII characters with\n                             \\uXXXX sequences\n\n    @complexity Linear in the length of string @a s.\n    */\n    void dump_escaped(const string_t& s, const bool ensure_ascii)\n    {\n        std::uint32_t codepoint{};\n        std::uint8_t state = UTF8_ACCEPT;\n        std::size_t bytes = 0;  // number of bytes written to string_buffer\n\n        // number of bytes written at the point of the last valid byte\n        std::size_t bytes_after_last_accept = 0;\n        std::size_t undumped_chars = 0;\n\n        for (std::size_t i = 0; i < s.size(); ++i)\n        {\n            const auto byte = static_cast<uint8_t>(s[i]);\n\n            switch (decode(state, codepoint, byte))\n            {\n                case UTF8_ACCEPT:  // decode found a new code point\n                {\n                    switch (codepoint)\n                    {\n                        case 0x08: // backspace\n                        {\n                            string_buffer[bytes++] = '\\\\';\n                            string_buffer[bytes++] = 'b';\n                            break;\n                        }\n\n                        case 0x09: // horizontal tab\n                        {\n                            string_buffer[bytes++] = '\\\\';\n                            string_buffer[bytes++] = 't';\n                            break;\n                        }\n\n                        case 0x0A: // newline\n                        {\n                            string_buffer[bytes++] = '\\\\';\n                            string_buffer[bytes++] = 'n';\n                            break;\n                        }\n\n                        case 0x0C: // formfeed\n                        {\n                            string_buffer[bytes++] = '\\\\';\n                            string_buffer[bytes++] 
= 'f';\n                            break;\n                        }\n\n                        case 0x0D: // carriage return\n                        {\n                            string_buffer[bytes++] = '\\\\';\n                            string_buffer[bytes++] = 'r';\n                            break;\n                        }\n\n                        case 0x22: // quotation mark\n                        {\n                            string_buffer[bytes++] = '\\\\';\n                            string_buffer[bytes++] = '\\\"';\n                            break;\n                        }\n\n                        case 0x5C: // reverse solidus\n                        {\n                            string_buffer[bytes++] = '\\\\';\n                            string_buffer[bytes++] = '\\\\';\n                            break;\n                        }\n\n                        default:\n                        {\n                            // escape control characters (0x00..0x1F) or, if\n                            // ensure_ascii parameter is used, non-ASCII characters\n                            if ((codepoint <= 0x1F) || (ensure_ascii && (codepoint >= 0x7F)))\n                            {\n                                if (codepoint <= 0xFFFF)\n                                {\n                                    // NOLINTNEXTLINE(cppcoreguidelines-pro-type-vararg,hicpp-vararg)\n                                    (std::snprintf)(string_buffer.data() + bytes, 7, \"\\\\u%04x\",\n                                                    static_cast<std::uint16_t>(codepoint));\n                                    bytes += 6;\n                                }\n                                else\n                                {\n                                    // NOLINTNEXTLINE(cppcoreguidelines-pro-type-vararg,hicpp-vararg)\n                                    (std::snprintf)(string_buffer.data() + bytes, 13, \"\\\\u%04x\\\\u%04x\",\n       
                                             static_cast<std::uint16_t>(0xD7C0u + (codepoint >> 10u)),\n                                                    static_cast<std::uint16_t>(0xDC00u + (codepoint & 0x3FFu)));\n                                    bytes += 12;\n                                }\n                            }\n                            else\n                            {\n                                // copy byte to buffer (all previous bytes\n                                // have been copied in the default case above)\n                                string_buffer[bytes++] = s[i];\n                            }\n                            break;\n                        }\n                    }\n\n                    // write buffer and reset index; there must be 13 bytes\n                    // left, as this is the maximal number of bytes to be\n                    // written (\"\\uxxxx\\uxxxx\\0\") for one code point\n                    if (string_buffer.size() - bytes < 13)\n                    {\n                        o->write_characters(string_buffer.data(), bytes);\n                        bytes = 0;\n                    }\n\n                    // remember the byte position of this accept\n                    bytes_after_last_accept = bytes;\n                    undumped_chars = 0;\n                    break;\n                }\n\n                case UTF8_REJECT:  // decode found invalid UTF-8 byte\n                {\n                    switch (error_handler)\n                    {\n                        case error_handler_t::strict:\n                        {\n                            std::string sn(3, '\\0');\n                            // NOLINTNEXTLINE(cppcoreguidelines-pro-type-vararg,hicpp-vararg)\n                            (std::snprintf)(&sn[0], sn.size(), \"%.2X\", byte);\n                            JSON_THROW(type_error::create(316, \"invalid UTF-8 byte at index \" + std::to_string(i) + \": 0x\" + sn, 
BasicJsonType()));\n                        }\n\n                        case error_handler_t::ignore:\n                        case error_handler_t::replace:\n                        {\n                            // in case we saw this character the first time, we\n                            // would like to read it again, because the byte\n                            // may be OK for itself, but just not OK for the\n                            // previous sequence\n                            if (undumped_chars > 0)\n                            {\n                                --i;\n                            }\n\n                            // reset length buffer to the last accepted index;\n                            // thus removing/ignoring the invalid characters\n                            bytes = bytes_after_last_accept;\n\n                            if (error_handler == error_handler_t::replace)\n                            {\n                                // add a replacement character\n                                if (ensure_ascii)\n                                {\n                                    string_buffer[bytes++] = '\\\\';\n                                    string_buffer[bytes++] = 'u';\n                                    string_buffer[bytes++] = 'f';\n                                    string_buffer[bytes++] = 'f';\n                                    string_buffer[bytes++] = 'f';\n                                    string_buffer[bytes++] = 'd';\n                                }\n                                else\n                                {\n                                    string_buffer[bytes++] = detail::binary_writer<BasicJsonType, char>::to_char_type('\\xEF');\n                                    string_buffer[bytes++] = detail::binary_writer<BasicJsonType, char>::to_char_type('\\xBF');\n                                    string_buffer[bytes++] = detail::binary_writer<BasicJsonType, 
char>::to_char_type('\\xBD');\n                                }\n\n                                // write buffer and reset index; there must be 13 bytes\n                                // left, as this is the maximal number of bytes to be\n                                // written (\"\\uxxxx\\uxxxx\\0\") for one code point\n                                if (string_buffer.size() - bytes < 13)\n                                {\n                                    o->write_characters(string_buffer.data(), bytes);\n                                    bytes = 0;\n                                }\n\n                                bytes_after_last_accept = bytes;\n                            }\n\n                            undumped_chars = 0;\n\n                            // continue processing the string\n                            state = UTF8_ACCEPT;\n                            break;\n                        }\n\n                        default:            // LCOV_EXCL_LINE\n                            JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert) LCOV_EXCL_LINE\n                    }\n                    break;\n                }\n\n                default:  // decode found yet incomplete multi-byte code point\n                {\n                    if (!ensure_ascii)\n                    {\n                        // code point will not be escaped - copy byte to buffer\n                        string_buffer[bytes++] = s[i];\n                    }\n                    ++undumped_chars;\n                    break;\n                }\n            }\n        }\n\n        // we finished processing the string\n        if (JSON_HEDLEY_LIKELY(state == UTF8_ACCEPT))\n        {\n            // write buffer\n            if (bytes > 0)\n            {\n                o->write_characters(string_buffer.data(), bytes);\n            }\n        }\n        else\n        {\n            // we finish reading, but do not accept: string 
was incomplete\n            switch (error_handler)\n            {\n                case error_handler_t::strict:\n                {\n                    std::string sn(3, '\\0');\n                    // NOLINTNEXTLINE(cppcoreguidelines-pro-type-vararg,hicpp-vararg)\n                    (std::snprintf)(&sn[0], sn.size(), \"%.2X\", static_cast<std::uint8_t>(s.back()));\n                    JSON_THROW(type_error::create(316, \"incomplete UTF-8 string; last byte: 0x\" + sn, BasicJsonType()));\n                }\n\n                case error_handler_t::ignore:\n                {\n                    // write all accepted bytes\n                    o->write_characters(string_buffer.data(), bytes_after_last_accept);\n                    break;\n                }\n\n                case error_handler_t::replace:\n                {\n                    // write all accepted bytes\n                    o->write_characters(string_buffer.data(), bytes_after_last_accept);\n                    // add a replacement character\n                    if (ensure_ascii)\n                    {\n                        o->write_characters(\"\\\\ufffd\", 6);\n                    }\n                    else\n                    {\n                        o->write_characters(\"\\xEF\\xBF\\xBD\", 3);\n                    }\n                    break;\n                }\n\n                default:            // LCOV_EXCL_LINE\n                    JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert) LCOV_EXCL_LINE\n            }\n        }\n    }\n\n  private:\n    /*!\n    @brief count digits\n\n    Count the number of decimal (base 10) digits for an input unsigned integer.\n\n    @param[in] x  unsigned integer number to count its digits\n    @return    number of decimal digits\n    */\n    inline unsigned int count_digits(number_unsigned_t x) noexcept\n    {\n        unsigned int n_digits = 1;\n        for (;;)\n        {\n            if (x < 10)\n            {\n  
              return n_digits;\n            }\n            if (x < 100)\n            {\n                return n_digits + 1;\n            }\n            if (x < 1000)\n            {\n                return n_digits + 2;\n            }\n            if (x < 10000)\n            {\n                return n_digits + 3;\n            }\n            x = x / 10000u;\n            n_digits += 4;\n        }\n    }\n\n    /*!\n    @brief dump an integer\n\n    Dump a given integer to output stream @a o. Works internally with\n    @a number_buffer.\n\n    @param[in] x  integer number (signed or unsigned) to dump\n    @tparam NumberType either @a number_integer_t or @a number_unsigned_t\n    */\n    template < typename NumberType, detail::enable_if_t <\n                   std::is_same<NumberType, number_unsigned_t>::value ||\n                   std::is_same<NumberType, number_integer_t>::value ||\n                   std::is_same<NumberType, binary_char_t>::value,\n                   int > = 0 >\n    void dump_integer(NumberType x)\n    {\n        static constexpr std::array<std::array<char, 2>, 100> digits_to_99\n        {\n            {\n                {{'0', '0'}}, {{'0', '1'}}, {{'0', '2'}}, {{'0', '3'}}, {{'0', '4'}}, {{'0', '5'}}, {{'0', '6'}}, {{'0', '7'}}, {{'0', '8'}}, {{'0', '9'}},\n                {{'1', '0'}}, {{'1', '1'}}, {{'1', '2'}}, {{'1', '3'}}, {{'1', '4'}}, {{'1', '5'}}, {{'1', '6'}}, {{'1', '7'}}, {{'1', '8'}}, {{'1', '9'}},\n                {{'2', '0'}}, {{'2', '1'}}, {{'2', '2'}}, {{'2', '3'}}, {{'2', '4'}}, {{'2', '5'}}, {{'2', '6'}}, {{'2', '7'}}, {{'2', '8'}}, {{'2', '9'}},\n                {{'3', '0'}}, {{'3', '1'}}, {{'3', '2'}}, {{'3', '3'}}, {{'3', '4'}}, {{'3', '5'}}, {{'3', '6'}}, {{'3', '7'}}, {{'3', '8'}}, {{'3', '9'}},\n                {{'4', '0'}}, {{'4', '1'}}, {{'4', '2'}}, {{'4', '3'}}, {{'4', '4'}}, {{'4', '5'}}, {{'4', '6'}}, {{'4', '7'}}, {{'4', '8'}}, {{'4', '9'}},\n                {{'5', '0'}}, {{'5', '1'}}, {{'5', '2'}}, {{'5', '3'}}, 
{{'5', '4'}}, {{'5', '5'}}, {{'5', '6'}}, {{'5', '7'}}, {{'5', '8'}}, {{'5', '9'}},\n                {{'6', '0'}}, {{'6', '1'}}, {{'6', '2'}}, {{'6', '3'}}, {{'6', '4'}}, {{'6', '5'}}, {{'6', '6'}}, {{'6', '7'}}, {{'6', '8'}}, {{'6', '9'}},\n                {{'7', '0'}}, {{'7', '1'}}, {{'7', '2'}}, {{'7', '3'}}, {{'7', '4'}}, {{'7', '5'}}, {{'7', '6'}}, {{'7', '7'}}, {{'7', '8'}}, {{'7', '9'}},\n                {{'8', '0'}}, {{'8', '1'}}, {{'8', '2'}}, {{'8', '3'}}, {{'8', '4'}}, {{'8', '5'}}, {{'8', '6'}}, {{'8', '7'}}, {{'8', '8'}}, {{'8', '9'}},\n                {{'9', '0'}}, {{'9', '1'}}, {{'9', '2'}}, {{'9', '3'}}, {{'9', '4'}}, {{'9', '5'}}, {{'9', '6'}}, {{'9', '7'}}, {{'9', '8'}}, {{'9', '9'}},\n            }\n        };\n\n        // special case for \"0\"\n        if (x == 0)\n        {\n            o->write_character('0');\n            return;\n        }\n\n        // use a pointer to fill the buffer\n        auto buffer_ptr = number_buffer.begin(); // NOLINT(llvm-qualified-auto,readability-qualified-auto,cppcoreguidelines-pro-type-vararg,hicpp-vararg)\n\n        const bool is_negative = std::is_same<NumberType, number_integer_t>::value && !(x >= 0); // see issue #755\n        number_unsigned_t abs_value;\n\n        unsigned int n_chars{};\n\n        if (is_negative)\n        {\n            *buffer_ptr = '-';\n            abs_value = remove_sign(static_cast<number_integer_t>(x));\n\n            // account one more byte for the minus sign\n            n_chars = 1 + count_digits(abs_value);\n        }\n        else\n        {\n            abs_value = static_cast<number_unsigned_t>(x);\n            n_chars = count_digits(abs_value);\n        }\n\n        // spare 1 byte for '\\0'\n        JSON_ASSERT(n_chars < number_buffer.size() - 1);\n\n        // jump to the end to generate the string from backward\n        // so we later avoid reversing the result\n        buffer_ptr += n_chars;\n\n        // Fast int2ascii implementation inspired by \"Fastware\" talk 
by Andrei Alexandrescu\n        // See: https://www.youtube.com/watch?v=o4-CwDo2zpg\n        while (abs_value >= 100)\n        {\n            const auto digits_index = static_cast<unsigned>((abs_value % 100));\n            abs_value /= 100;\n            *(--buffer_ptr) = digits_to_99[digits_index][1];\n            *(--buffer_ptr) = digits_to_99[digits_index][0];\n        }\n\n        if (abs_value >= 10)\n        {\n            const auto digits_index = static_cast<unsigned>(abs_value);\n            *(--buffer_ptr) = digits_to_99[digits_index][1];\n            *(--buffer_ptr) = digits_to_99[digits_index][0];\n        }\n        else\n        {\n            *(--buffer_ptr) = static_cast<char>('0' + abs_value);\n        }\n\n        o->write_characters(number_buffer.data(), n_chars);\n    }\n\n    /*!\n    @brief dump a floating-point number\n\n    Dump a given floating-point number to output stream @a o. Works internally\n    with @a number_buffer.\n\n    @param[in] x  floating-point number to dump\n    */\n    void dump_float(number_float_t x)\n    {\n        // NaN / inf\n        if (!std::isfinite(x))\n        {\n            o->write_characters(\"null\", 4);\n            return;\n        }\n\n        // If number_float_t is an IEEE-754 single or double precision number,\n        // use the Grisu2 algorithm to produce short numbers which are\n        // guaranteed to round-trip, using strtof and strtod, resp.\n        //\n        // NB: The test below works if <long double> == <double>.\n        static constexpr bool is_ieee_single_or_double\n            = (std::numeric_limits<number_float_t>::is_iec559 && std::numeric_limits<number_float_t>::digits == 24 && std::numeric_limits<number_float_t>::max_exponent == 128) ||\n              (std::numeric_limits<number_float_t>::is_iec559 && std::numeric_limits<number_float_t>::digits == 53 && std::numeric_limits<number_float_t>::max_exponent == 1024);\n\n        dump_float(x, std::integral_constant<bool, 
is_ieee_single_or_double>());\n    }\n\n    void dump_float(number_float_t x, std::true_type /*is_ieee_single_or_double*/)\n    {\n        auto* begin = number_buffer.data();\n        auto* end = ::nlohmann::detail::to_chars(begin, begin + number_buffer.size(), x);\n\n        o->write_characters(begin, static_cast<size_t>(end - begin));\n    }\n\n    void dump_float(number_float_t x, std::false_type /*is_ieee_single_or_double*/)\n    {\n        // get number of digits for a float -> text -> float round-trip\n        static constexpr auto d = std::numeric_limits<number_float_t>::max_digits10;\n\n        // the actual conversion\n        // NOLINTNEXTLINE(cppcoreguidelines-pro-type-vararg,hicpp-vararg)\n        std::ptrdiff_t len = (std::snprintf)(number_buffer.data(), number_buffer.size(), \"%.*g\", d, x);\n\n        // negative value indicates an error\n        JSON_ASSERT(len > 0);\n        // check if buffer was large enough\n        JSON_ASSERT(static_cast<std::size_t>(len) < number_buffer.size());\n\n        // erase thousands separator\n        if (thousands_sep != '\\0')\n        {\n            auto* const end = std::remove(number_buffer.begin(),\n                                          number_buffer.begin() + len, thousands_sep);\n            std::fill(end, number_buffer.end(), '\\0');\n            JSON_ASSERT((end - number_buffer.begin()) <= len);\n            len = (end - number_buffer.begin());\n        }\n\n        // convert decimal point to '.'\n        if (decimal_point != '\\0' && decimal_point != '.')\n        {\n            auto* const dec_pos = std::find(number_buffer.begin(), number_buffer.end(), decimal_point);\n            if (dec_pos != number_buffer.end())\n            {\n                *dec_pos = '.';\n            }\n        }\n\n        o->write_characters(number_buffer.data(), static_cast<std::size_t>(len));\n\n        // determine if need to append \".0\"\n        const bool value_is_int_like =\n            
std::none_of(number_buffer.begin(), number_buffer.begin() + len + 1,\n                         [](char c)\n        {\n            return c == '.' || c == 'e';\n        });\n\n        if (value_is_int_like)\n        {\n            o->write_characters(\".0\", 2);\n        }\n    }\n\n    /*!\n    @brief check whether a string is UTF-8 encoded\n\n    The function checks each byte of a string whether it is UTF-8 encoded. The\n    result of the check is stored in the @a state parameter. The function must\n    be called initially with state 0 (accept). State 1 means the string must\n    be rejected, because the current byte is not allowed. If the string is\n    completely processed, but the state is non-zero, the string ended\n    prematurely; that is, the last byte indicated more bytes should have\n    followed.\n\n    @param[in,out] state  the state of the decoding\n    @param[in,out] codep  codepoint (valid only if resulting state is UTF8_ACCEPT)\n    @param[in] byte       next byte to decode\n    @return               new state\n\n    @note The function has been edited: a std::array is used.\n\n    @copyright Copyright (c) 2008-2009 Bjoern Hoehrmann <bjoern@hoehrmann.de>\n    @sa http://bjoern.hoehrmann.de/utf-8/decoder/dfa/\n    */\n    static std::uint8_t decode(std::uint8_t& state, std::uint32_t& codep, const std::uint8_t byte) noexcept\n    {\n        static const std::array<std::uint8_t, 400> utf8d =\n        {\n            {\n                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 00..1F\n                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 20..3F\n                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 40..5F\n                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 60..7F\n                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, // 80..9F\n                7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, // A0..BF\n                8, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, // C0..DF\n                0xA, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x4, 0x3, 0x3, // E0..EF\n                0xB, 0x6, 0x6, 0x6, 0x5, 0x8, 0x8, 0x8, 0x8, 0x8, 0x8, 0x8, 0x8, 0x8, 0x8, 0x8, // F0..FF\n                0x0, 0x1, 0x2, 0x3, 0x5, 0x8, 0x7, 0x1, 0x1, 0x1, 0x4, 0x6, 0x1, 0x1, 0x1, 0x1, // s0..s0\n                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, // s1..s2\n                1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, // s3..s4\n                1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 3, 1, 1, 1, 1, 1, 1, // s5..s6\n                1, 3, 1, 1, 1, 1, 1, 3, 1, 3, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 // s7..s8\n            }\n        };\n\n        JSON_ASSERT(byte < utf8d.size());\n        const std::uint8_t type = utf8d[byte];\n\n        codep = (state != UTF8_ACCEPT)\n                ? 
(byte & 0x3fu) | (codep << 6u)\n                : (0xFFu >> type) & (byte);\n\n        std::size_t index = 256u + static_cast<size_t>(state) * 16u + static_cast<size_t>(type);\n        JSON_ASSERT(index < 400);\n        state = utf8d[index];\n        return state;\n    }\n\n    /*\n     * Overload to make the compiler happy while it is instantiating\n     * dump_integer for number_unsigned_t.\n     * Must never be called.\n     */\n    number_unsigned_t remove_sign(number_unsigned_t x)\n    {\n        JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert) LCOV_EXCL_LINE\n        return x; // LCOV_EXCL_LINE\n    }\n\n    /*\n     * Helper function for dump_integer\n     *\n     * This function takes a negative signed integer and returns its absolute\n     * value as unsigned integer. The plus/minus shuffling is necessary as we can\n     * not directly remove the sign of an arbitrary signed integer as the\n     * absolute values of INT_MIN and INT_MAX are usually not the same. 
See\n     * #1708 for details.\n     */\n    inline number_unsigned_t remove_sign(number_integer_t x) noexcept\n    {\n        JSON_ASSERT(x < 0 && x < (std::numeric_limits<number_integer_t>::max)()); // NOLINT(misc-redundant-expression)\n        return static_cast<number_unsigned_t>(-(x + 1)) + 1;\n    }\n\n  private:\n    /// the output of the serializer\n    output_adapter_t<char> o = nullptr;\n\n    /// a (hopefully) large enough character buffer\n    std::array<char, 64> number_buffer{{}};\n\n    /// the locale\n    const std::lconv* loc = nullptr;\n    /// the locale's thousand separator character\n    const char thousands_sep = '\\0';\n    /// the locale's decimal point character\n    const char decimal_point = '\\0';\n\n    /// string buffer\n    std::array<char, 512> string_buffer{{}};\n\n    /// the indentation character\n    const char indent_char;\n    /// the indentation string\n    string_t indent_string;\n\n    /// error_handler how to react on decoding errors\n    const error_handler_t error_handler;\n};\n}  // namespace detail\n}  // namespace nlohmann\n\n// #include <nlohmann/detail/value_t.hpp>\n\n// #include <nlohmann/json_fwd.hpp>\n\n// #include <nlohmann/ordered_map.hpp>\n\n\n#include <functional> // less\n#include <initializer_list> // initializer_list\n#include <iterator> // input_iterator_tag, iterator_traits\n#include <memory> // allocator\n#include <stdexcept> // for out_of_range\n#include <type_traits> // enable_if, is_convertible\n#include <utility> // pair\n#include <vector> // vector\n\n// #include <nlohmann/detail/macro_scope.hpp>\n\n\nnamespace nlohmann\n{\n\n/// ordered_map: a minimal map-like container that preserves insertion order\n/// for use within nlohmann::basic_json<ordered_map>\ntemplate <class Key, class T, class IgnoredLess = std::less<Key>,\n          class Allocator = std::allocator<std::pair<const Key, T>>>\n                  struct ordered_map : std::vector<std::pair<const Key, T>, Allocator>\n{\n    using key_type = 
Key;\n    using mapped_type = T;\n    using Container = std::vector<std::pair<const Key, T>, Allocator>;\n    using typename Container::iterator;\n    using typename Container::const_iterator;\n    using typename Container::size_type;\n    using typename Container::value_type;\n\n    // Explicit constructors instead of `using Container::Container`\n    // otherwise older compilers choke on it (GCC <= 5.5, xcode <= 9.4)\n    ordered_map(const Allocator& alloc = Allocator()) : Container{alloc} {}\n    template <class It>\n    ordered_map(It first, It last, const Allocator& alloc = Allocator())\n        : Container{first, last, alloc} {}\n    ordered_map(std::initializer_list<value_type> init, const Allocator& alloc = Allocator() )\n        : Container{init, alloc} {}\n\n    std::pair<iterator, bool> emplace(const key_type& key, T&& t)\n    {\n        for (auto it = this->begin(); it != this->end(); ++it)\n        {\n            if (it->first == key)\n            {\n                return {it, false};\n            }\n        }\n        Container::emplace_back(key, std::move(t));\n        return {--this->end(), true};\n    }\n\n    T& operator[](const Key& key)\n    {\n        return emplace(key, T{}).first->second;\n    }\n\n    const T& operator[](const Key& key) const\n    {\n        return at(key);\n    }\n\n    T& at(const Key& key)\n    {\n        for (auto it = this->begin(); it != this->end(); ++it)\n        {\n            if (it->first == key)\n            {\n                return it->second;\n            }\n        }\n\n        JSON_THROW(std::out_of_range(\"key not found\"));\n    }\n\n    const T& at(const Key& key) const\n    {\n        for (auto it = this->begin(); it != this->end(); ++it)\n        {\n            if (it->first == key)\n            {\n                return it->second;\n            }\n        }\n\n        JSON_THROW(std::out_of_range(\"key not found\"));\n    }\n\n    size_type erase(const Key& key)\n    {\n        for (auto it = this->begin(); it != 
this->end(); ++it)\n        {\n            if (it->first == key)\n            {\n                // Since we cannot move const Keys, re-construct them in place\n                for (auto next = it; ++next != this->end(); ++it)\n                {\n                    it->~value_type(); // Destroy but keep allocation\n                    new (&*it) value_type{std::move(*next)};\n                }\n                Container::pop_back();\n                return 1;\n            }\n        }\n        return 0;\n    }\n\n    iterator erase(iterator pos)\n    {\n        auto it = pos;\n\n        // Since we cannot move const Keys, re-construct them in place\n        for (auto next = it; ++next != this->end(); ++it)\n        {\n            it->~value_type(); // Destroy but keep allocation\n            new (&*it) value_type{std::move(*next)};\n        }\n        Container::pop_back();\n        return pos;\n    }\n\n    size_type count(const Key& key) const\n    {\n        for (auto it = this->begin(); it != this->end(); ++it)\n        {\n            if (it->first == key)\n            {\n                return 1;\n            }\n        }\n        return 0;\n    }\n\n    iterator find(const Key& key)\n    {\n        for (auto it = this->begin(); it != this->end(); ++it)\n        {\n            if (it->first == key)\n            {\n                return it;\n            }\n        }\n        return Container::end();\n    }\n\n    const_iterator find(const Key& key) const\n    {\n        for (auto it = this->begin(); it != this->end(); ++it)\n        {\n            if (it->first == key)\n            {\n                return it;\n            }\n        }\n        return Container::end();\n    }\n\n    std::pair<iterator, bool> insert( value_type&& value )\n    {\n        return emplace(value.first, std::move(value.second));\n    }\n\n    std::pair<iterator, bool> insert( const value_type& value )\n    {\n        for (auto it = this->begin(); it != this->end(); ++it)\n        
{\n            if (it->first == value.first)\n            {\n                return {it, false};\n            }\n        }\n        Container::push_back(value);\n        return {--this->end(), true};\n    }\n\n    template<typename InputIt>\n    using require_input_iter = typename std::enable_if<std::is_convertible<typename std::iterator_traits<InputIt>::iterator_category,\n            std::input_iterator_tag>::value>::type;\n\n    template<typename InputIt, typename = require_input_iter<InputIt>>\n    void insert(InputIt first, InputIt last)\n    {\n        for (auto it = first; it != last; ++it)\n        {\n            insert(*it);\n        }\n    }\n};\n\n}  // namespace nlohmann\n\n\n#if defined(JSON_HAS_CPP_17)\n    #include <string_view>\n#endif\n\n/*!\n@brief namespace for Niels Lohmann\n@see https://github.com/nlohmann\n@since version 1.0.0\n*/\nnamespace nlohmann\n{\n\n/*!\n@brief a class to store JSON values\n\n@tparam ObjectType type for JSON objects (`std::map` by default; will be used\nin @ref object_t)\n@tparam ArrayType type for JSON arrays (`std::vector` by default; will be used\nin @ref array_t)\n@tparam StringType type for JSON strings and object keys (`std::string` by\ndefault; will be used in @ref string_t)\n@tparam BooleanType type for JSON booleans (`bool` by default; will be used\nin @ref boolean_t)\n@tparam NumberIntegerType type for JSON integer numbers (`int64_t` by\ndefault; will be used in @ref number_integer_t)\n@tparam NumberUnsignedType type for JSON unsigned integer numbers (@c\n`uint64_t` by default; will be used in @ref number_unsigned_t)\n@tparam NumberFloatType type for JSON floating-point numbers (`double` by\ndefault; will be used in @ref number_float_t)\n@tparam BinaryType type for packed binary data for compatibility with binary\nserialization formats (`std::vector<std::uint8_t>` by default; will be used in\n@ref binary_t)\n@tparam AllocatorType type of the allocator to use (`std::allocator` by\ndefault)\n@tparam 
JSONSerializer the serializer to resolve internal calls to `to_json()`\nand `from_json()` (@ref adl_serializer by default)\n\n@requirement The class satisfies the following concept requirements:\n- Basic\n - [DefaultConstructible](https://en.cppreference.com/w/cpp/named_req/DefaultConstructible):\n   JSON values can be default constructed. The result will be a JSON null\n   value.\n - [MoveConstructible](https://en.cppreference.com/w/cpp/named_req/MoveConstructible):\n   A JSON value can be constructed from an rvalue argument.\n - [CopyConstructible](https://en.cppreference.com/w/cpp/named_req/CopyConstructible):\n   A JSON value can be copy-constructed from an lvalue expression.\n - [MoveAssignable](https://en.cppreference.com/w/cpp/named_req/MoveAssignable):\n   A JSON value can be assigned from an rvalue argument.\n - [CopyAssignable](https://en.cppreference.com/w/cpp/named_req/CopyAssignable):\n   A JSON value can be copy-assigned from an lvalue expression.\n - [Destructible](https://en.cppreference.com/w/cpp/named_req/Destructible):\n   JSON values can be destructed.\n- Layout\n - [StandardLayoutType](https://en.cppreference.com/w/cpp/named_req/StandardLayoutType):\n   JSON values have\n   [standard layout](https://en.cppreference.com/w/cpp/language/data_members#Standard_layout):\n   All non-static data members are private and standard layout types, the\n   class has no virtual functions or (virtual) base classes.\n- Library-wide\n - [EqualityComparable](https://en.cppreference.com/w/cpp/named_req/EqualityComparable):\n   JSON values can be compared with `==`, see @ref\n   operator==(const_reference,const_reference).\n - [LessThanComparable](https://en.cppreference.com/w/cpp/named_req/LessThanComparable):\n   JSON values can be compared with `<`, see @ref\n   operator<(const_reference,const_reference).\n - [Swappable](https://en.cppreference.com/w/cpp/named_req/Swappable):\n   Any JSON lvalue or rvalue can be swapped with any lvalue or rvalue of\n   other 
compatible types, using unqualified function call @ref swap().\n - [NullablePointer](https://en.cppreference.com/w/cpp/named_req/NullablePointer):\n   JSON values can be compared against `std::nullptr_t` objects which are used\n   to model the `null` value.\n- Container\n - [Container](https://en.cppreference.com/w/cpp/named_req/Container):\n   JSON values can be used like STL containers and provide iterator access.\n - [ReversibleContainer](https://en.cppreference.com/w/cpp/named_req/ReversibleContainer):\n   JSON values can be used like STL containers and provide reverse iterator\n   access.\n\n@invariant The member variables @a m_value and @a m_type have the following\nrelationship:\n- If `m_type == value_t::object`, then `m_value.object != nullptr`.\n- If `m_type == value_t::array`, then `m_value.array != nullptr`.\n- If `m_type == value_t::string`, then `m_value.string != nullptr`.\nThe invariants are checked by member function assert_invariant().\n\n@internal\n@note ObjectType trick from https://stackoverflow.com/a/9860911\n@endinternal\n\n@see [RFC 7159: The JavaScript Object Notation (JSON) Data Interchange\nFormat](http://rfc7159.net/rfc7159)\n\n@since version 1.0.0\n\n@nosubgrouping\n*/\nNLOHMANN_BASIC_JSON_TPL_DECLARATION\nclass basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-special-member-functions)\n{\n  private:\n    template<detail::value_t> friend struct detail::external_constructor;\n    friend ::nlohmann::json_pointer<basic_json>;\n\n    template<typename BasicJsonType, typename InputType>\n    friend class ::nlohmann::detail::parser;\n    friend ::nlohmann::detail::serializer<basic_json>;\n    template<typename BasicJsonType>\n    friend class ::nlohmann::detail::iter_impl;\n    template<typename BasicJsonType, typename CharType>\n    friend class ::nlohmann::detail::binary_writer;\n    template<typename BasicJsonType, typename InputType, typename SAX>\n    friend class ::nlohmann::detail::binary_reader;\n    
template<typename BasicJsonType>\n    friend class ::nlohmann::detail::json_sax_dom_parser;\n    template<typename BasicJsonType>\n    friend class ::nlohmann::detail::json_sax_dom_callback_parser;\n    friend class ::nlohmann::detail::exception;\n\n    /// workaround type for MSVC\n    using basic_json_t = NLOHMANN_BASIC_JSON_TPL;\n\n  JSON_PRIVATE_UNLESS_TESTED:\n    // convenience aliases for types residing in namespace detail;\n    using lexer = ::nlohmann::detail::lexer_base<basic_json>;\n\n    template<typename InputAdapterType>\n    static ::nlohmann::detail::parser<basic_json, InputAdapterType> parser(\n        InputAdapterType adapter,\n        detail::parser_callback_t<basic_json>cb = nullptr,\n        const bool allow_exceptions = true,\n        const bool ignore_comments = false\n                                 )\n    {\n        return ::nlohmann::detail::parser<basic_json, InputAdapterType>(std::move(adapter),\n                std::move(cb), allow_exceptions, ignore_comments);\n    }\n\n  private:\n    using primitive_iterator_t = ::nlohmann::detail::primitive_iterator_t;\n    template<typename BasicJsonType>\n    using internal_iterator = ::nlohmann::detail::internal_iterator<BasicJsonType>;\n    template<typename BasicJsonType>\n    using iter_impl = ::nlohmann::detail::iter_impl<BasicJsonType>;\n    template<typename Iterator>\n    using iteration_proxy = ::nlohmann::detail::iteration_proxy<Iterator>;\n    template<typename Base> using json_reverse_iterator = ::nlohmann::detail::json_reverse_iterator<Base>;\n\n    template<typename CharType>\n    using output_adapter_t = ::nlohmann::detail::output_adapter_t<CharType>;\n\n    template<typename InputType>\n    using binary_reader = ::nlohmann::detail::binary_reader<basic_json, InputType>;\n    template<typename CharType> using binary_writer = ::nlohmann::detail::binary_writer<basic_json, CharType>;\n\n  JSON_PRIVATE_UNLESS_TESTED:\n    using serializer = 
::nlohmann::detail::serializer<basic_json>;\n\n  public:\n    using value_t = detail::value_t;\n    /// JSON Pointer, see @ref nlohmann::json_pointer\n    using json_pointer = ::nlohmann::json_pointer<basic_json>;\n    template<typename T, typename SFINAE>\n    using json_serializer = JSONSerializer<T, SFINAE>;\n    /// how to treat decoding errors\n    using error_handler_t = detail::error_handler_t;\n    /// how to treat CBOR tags\n    using cbor_tag_handler_t = detail::cbor_tag_handler_t;\n    /// helper type for initializer lists of basic_json values\n    using initializer_list_t = std::initializer_list<detail::json_ref<basic_json>>;\n\n    using input_format_t = detail::input_format_t;\n    /// SAX interface type, see @ref nlohmann::json_sax\n    using json_sax_t = json_sax<basic_json>;\n\n    ////////////////\n    // exceptions //\n    ////////////////\n\n    /// @name exceptions\n    /// Classes to implement user-defined exceptions.\n    /// @{\n\n    /// @copydoc detail::exception\n    using exception = detail::exception;\n    /// @copydoc detail::parse_error\n    using parse_error = detail::parse_error;\n    /// @copydoc detail::invalid_iterator\n    using invalid_iterator = detail::invalid_iterator;\n    /// @copydoc detail::type_error\n    using type_error = detail::type_error;\n    /// @copydoc detail::out_of_range\n    using out_of_range = detail::out_of_range;\n    /// @copydoc detail::other_error\n    using other_error = detail::other_error;\n\n    /// @}\n\n\n    /////////////////////\n    // container types //\n    /////////////////////\n\n    /// @name container types\n    /// The canonic container types to use @ref basic_json like any other STL\n    /// container.\n    /// @{\n\n    /// the type of elements in a basic_json container\n    using value_type = basic_json;\n\n    /// the type of an element reference\n    using reference = value_type&;\n    /// the type of an element const reference\n    using const_reference = const value_type&;\n\n   
 /// a type to represent differences between iterators\n    using difference_type = std::ptrdiff_t;\n    /// a type to represent container sizes\n    using size_type = std::size_t;\n\n    /// the allocator type\n    using allocator_type = AllocatorType<basic_json>;\n\n    /// the type of an element pointer\n    using pointer = typename std::allocator_traits<allocator_type>::pointer;\n    /// the type of an element const pointer\n    using const_pointer = typename std::allocator_traits<allocator_type>::const_pointer;\n\n    /// an iterator for a basic_json container\n    using iterator = iter_impl<basic_json>;\n    /// a const iterator for a basic_json container\n    using const_iterator = iter_impl<const basic_json>;\n    /// a reverse iterator for a basic_json container\n    using reverse_iterator = json_reverse_iterator<typename basic_json::iterator>;\n    /// a const reverse iterator for a basic_json container\n    using const_reverse_iterator = json_reverse_iterator<typename basic_json::const_iterator>;\n\n    /// @}\n\n\n    /*!\n    @brief returns the allocator associated with the container\n    */\n    static allocator_type get_allocator()\n    {\n        return allocator_type();\n    }\n\n    /*!\n    @brief returns version information on the library\n\n    This function returns a JSON object with information about the library,\n    including the version number and information on the platform and compiler.\n\n    @return JSON object holding version information\n    key         | description\n    ----------- | ---------------\n    `compiler`  | Information on the used compiler. 
It is an object with the following keys: `c++` (the used C++ standard), `family` (the compiler family; possible values are `clang`, `icc`, `gcc`, `ilecpp`, `msvc`, `pgcpp`, `sunpro`, and `unknown`), and `version` (the compiler version).\n    `copyright` | The copyright line for the library as string.\n    `name`      | The name of the library as string.\n    `platform`  | The used platform as string. Possible values are `win32`, `linux`, `apple`, `unix`, and `unknown`.\n    `url`       | The URL of the project as string.\n    `version`   | The version of the library. It is an object with the following keys: `major`, `minor`, and `patch` as defined by [Semantic Versioning](http://semver.org), and `string` (the version string).\n\n    @liveexample{The following code shows an example output of the `meta()`\n    function.,meta}\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes to any JSON value.\n\n    @complexity Constant.\n\n    @since 2.1.0\n    */\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json meta()\n    {\n        basic_json result;\n\n        result[\"copyright\"] = \"(C) 2013-2021 Niels Lohmann\";\n        result[\"name\"] = \"JSON for Modern C++\";\n        result[\"url\"] = \"https://github.com/nlohmann/json\";\n        result[\"version\"][\"string\"] =\n            std::to_string(NLOHMANN_JSON_VERSION_MAJOR) + \".\" +\n            std::to_string(NLOHMANN_JSON_VERSION_MINOR) + \".\" +\n            std::to_string(NLOHMANN_JSON_VERSION_PATCH);\n        result[\"version\"][\"major\"] = NLOHMANN_JSON_VERSION_MAJOR;\n        result[\"version\"][\"minor\"] = NLOHMANN_JSON_VERSION_MINOR;\n        result[\"version\"][\"patch\"] = NLOHMANN_JSON_VERSION_PATCH;\n\n#ifdef _WIN32\n        result[\"platform\"] = \"win32\";\n#elif defined __linux__\n        result[\"platform\"] = \"linux\";\n#elif defined __APPLE__\n        result[\"platform\"] = \"apple\";\n#elif defined __unix__\n        result[\"platform\"] = 
\"unix\";\n#else\n        result[\"platform\"] = \"unknown\";\n#endif\n\n#if defined(__ICC) || defined(__INTEL_COMPILER)\n        result[\"compiler\"] = {{\"family\", \"icc\"}, {\"version\", __INTEL_COMPILER}};\n#elif defined(__clang__)\n        result[\"compiler\"] = {{\"family\", \"clang\"}, {\"version\", __clang_version__}};\n#elif defined(__GNUC__) || defined(__GNUG__)\n        result[\"compiler\"] = {{\"family\", \"gcc\"}, {\"version\", std::to_string(__GNUC__) + \".\" + std::to_string(__GNUC_MINOR__) + \".\" + std::to_string(__GNUC_PATCHLEVEL__)}};\n#elif defined(__HP_cc) || defined(__HP_aCC)\n        result[\"compiler\"] = \"hp\"\n#elif defined(__IBMCPP__)\n        result[\"compiler\"] = {{\"family\", \"ilecpp\"}, {\"version\", __IBMCPP__}};\n#elif defined(_MSC_VER)\n        result[\"compiler\"] = {{\"family\", \"msvc\"}, {\"version\", _MSC_VER}};\n#elif defined(__PGI)\n        result[\"compiler\"] = {{\"family\", \"pgcpp\"}, {\"version\", __PGI}};\n#elif defined(__SUNPRO_CC)\n        result[\"compiler\"] = {{\"family\", \"sunpro\"}, {\"version\", __SUNPRO_CC}};\n#else\n        result[\"compiler\"] = {{\"family\", \"unknown\"}, {\"version\", \"unknown\"}};\n#endif\n\n#ifdef __cplusplus\n        result[\"compiler\"][\"c++\"] = std::to_string(__cplusplus);\n#else\n        result[\"compiler\"][\"c++\"] = \"unknown\";\n#endif\n        return result;\n    }\n\n\n    ///////////////////////////\n    // JSON value data types //\n    ///////////////////////////\n\n    /// @name JSON value data types\n    /// The data types to store a JSON value. 
These types are derived from\n    /// the template arguments passed to class @ref basic_json.\n    /// @{\n\n#if defined(JSON_HAS_CPP_14)\n    // Use transparent comparator if possible, combined with perfect forwarding\n    // on find() and count() calls prevents unnecessary string construction.\n    using object_comparator_t = std::less<>;\n#else\n    using object_comparator_t = std::less<StringType>;\n#endif\n\n    /*!\n    @brief a type for an object\n\n    [RFC 7159](http://rfc7159.net/rfc7159) describes JSON objects as follows:\n    > An object is an unordered collection of zero or more name/value pairs,\n    > where a name is a string and a value is a string, number, boolean, null,\n    > object, or array.\n\n    To store objects in C++, a type is defined by the template parameters\n    described below.\n\n    @tparam ObjectType  the container to store objects (e.g., `std::map` or\n    `std::unordered_map`)\n    @tparam StringType the type of the keys or names (e.g., `std::string`).\n    The comparison function `std::less<StringType>` is used to order elements\n    inside the container.\n    @tparam AllocatorType the allocator to use for objects (e.g.,\n    `std::allocator`)\n\n    #### Default type\n\n    With the default values for @a ObjectType (`std::map`), @a StringType\n    (`std::string`), and @a AllocatorType (`std::allocator`), the default\n    value for @a object_t is:\n\n    @code {.cpp}\n    std::map<\n      std::string, // key_type\n      basic_json, // value_type\n      std::less<std::string>, // key_compare\n      std::allocator<std::pair<const std::string, basic_json>> // allocator_type\n    >\n    @endcode\n\n    #### Behavior\n\n    The choice of @a object_t influences the behavior of the JSON class. 
With\n    the default type, objects have the following behavior:\n\n    - When all names are unique, objects will be interoperable in the sense\n      that all software implementations receiving that object will agree on\n      the name-value mappings.\n    - When the names within an object are not unique, it is unspecified which\n      one of the values for a given key will be chosen. For instance,\n      `{\"key\": 2, \"key\": 1}` could be equal to either `{\"key\": 1}` or\n      `{\"key\": 2}`.\n    - Internally, name/value pairs are stored in lexicographical order of the\n      names. Objects will also be serialized (see @ref dump) in this order.\n      For instance, `{\"b\": 1, \"a\": 2}` and `{\"a\": 2, \"b\": 1}` will be stored\n      and serialized as `{\"a\": 2, \"b\": 1}`.\n    - When comparing objects, the order of the name/value pairs is irrelevant.\n      This makes objects interoperable in the sense that they will not be\n      affected by these differences. For instance, `{\"b\": 1, \"a\": 2}` and\n      `{\"a\": 2, \"b\": 1}` will be treated as equal.\n\n    #### Limits\n\n    [RFC 7159](http://rfc7159.net/rfc7159) specifies:\n    > An implementation may set limits on the maximum depth of nesting.\n\n    In this class, the object's limit of nesting is not explicitly constrained.\n    However, a maximum depth of nesting may be introduced by the compiler or\n    runtime environment. A theoretical limit can be queried by calling the\n    @ref max_size function of a JSON object.\n\n    #### Storage\n\n    Objects are stored as pointers in a @ref basic_json type. That is, for any\n    access to object values, a pointer of type `object_t*` must be\n    dereferenced.\n\n    @sa see @ref array_t -- type for an array value\n\n    @since version 1.0.0\n\n    @note The order name/value pairs are added to the object is *not*\n    preserved by the library. 
Therefore, iterating an object may return\n    name/value pairs in a different order than they were originally stored. In\n    fact, keys will be traversed in alphabetical order as `std::map` with\n    `std::less` is used by default. Please note this behavior conforms to [RFC\n    7159](http://rfc7159.net/rfc7159), because any order implements the\n    specified \"unordered\" nature of JSON objects.\n    */\n    using object_t = ObjectType<StringType,\n          basic_json,\n          object_comparator_t,\n          AllocatorType<std::pair<const StringType,\n          basic_json>>>;\n\n    /*!\n    @brief a type for an array\n\n    [RFC 7159](http://rfc7159.net/rfc7159) describes JSON arrays as follows:\n    > An array is an ordered sequence of zero or more values.\n\n    To store objects in C++, a type is defined by the template parameters\n    explained below.\n\n    @tparam ArrayType  container type to store arrays (e.g., `std::vector` or\n    `std::list`)\n    @tparam AllocatorType allocator to use for arrays (e.g., `std::allocator`)\n\n    #### Default type\n\n    With the default values for @a ArrayType (`std::vector`) and @a\n    AllocatorType (`std::allocator`), the default value for @a array_t is:\n\n    @code {.cpp}\n    std::vector<\n      basic_json, // value_type\n      std::allocator<basic_json> // allocator_type\n    >\n    @endcode\n\n    #### Limits\n\n    [RFC 7159](http://rfc7159.net/rfc7159) specifies:\n    > An implementation may set limits on the maximum depth of nesting.\n\n    In this class, the array's limit of nesting is not explicitly constrained.\n    However, a maximum depth of nesting may be introduced by the compiler or\n    runtime environment. A theoretical limit can be queried by calling the\n    @ref max_size function of a JSON array.\n\n    #### Storage\n\n    Arrays are stored as pointers in a @ref basic_json type. 
That is, for any\n    access to array values, a pointer of type `array_t*` must be dereferenced.\n\n    @sa see @ref object_t -- type for an object value\n\n    @since version 1.0.0\n    */\n    using array_t = ArrayType<basic_json, AllocatorType<basic_json>>;\n\n    /*!\n    @brief a type for a string\n\n    [RFC 7159](http://rfc7159.net/rfc7159) describes JSON strings as follows:\n    > A string is a sequence of zero or more Unicode characters.\n\n    To store objects in C++, a type is defined by the template parameter\n    described below. Unicode values are split by the JSON class into\n    byte-sized characters during deserialization.\n\n    @tparam StringType  the container to store strings (e.g., `std::string`).\n    Note this container is used for keys/names in objects, see @ref object_t.\n\n    #### Default type\n\n    With the default values for @a StringType (`std::string`), the default\n    value for @a string_t is:\n\n    @code {.cpp}\n    std::string\n    @endcode\n\n    #### Encoding\n\n    Strings are stored in UTF-8 encoding. Therefore, functions like\n    `std::string::size()` or `std::string::length()` return the number of\n    bytes in the string rather than the number of characters or glyphs.\n\n    #### String comparison\n\n    [RFC 7159](http://rfc7159.net/rfc7159) states:\n    > Software implementations are typically required to test names of object\n    > members for equality. Implementations that transform the textual\n    > representation into sequences of Unicode code units and then perform the\n    > comparison numerically, code unit by code unit, are interoperable in the\n    > sense that implementations will agree in all cases on equality or\n    > inequality of two strings. 
For example, implementations that compare\n    > strings with escaped characters unconverted may incorrectly find that\n    > `\"a\\\\b\"` and `\"a\\u005Cb\"` are not equal.\n\n    This implementation is interoperable as it does compare strings code unit\n    by code unit.\n\n    #### Storage\n\n    String values are stored as pointers in a @ref basic_json type. That is,\n    for any access to string values, a pointer of type `string_t*` must be\n    dereferenced.\n\n    @since version 1.0.0\n    */\n    using string_t = StringType;\n\n    /*!\n    @brief a type for a boolean\n\n    [RFC 7159](http://rfc7159.net/rfc7159) implicitly describes a boolean as a\n    type which differentiates the two literals `true` and `false`.\n\n    To store objects in C++, a type is defined by the template parameter @a\n    BooleanType which chooses the type to use.\n\n    #### Default type\n\n    With the default values for @a BooleanType (`bool`), the default value for\n    @a boolean_t is:\n\n    @code {.cpp}\n    bool\n    @endcode\n\n    #### Storage\n\n    Boolean values are stored directly inside a @ref basic_json type.\n\n    @since version 1.0.0\n    */\n    using boolean_t = BooleanType;\n\n    /*!\n    @brief a type for a number (integer)\n\n    [RFC 7159](http://rfc7159.net/rfc7159) describes numbers as follows:\n    > The representation of numbers is similar to that used in most\n    > programming languages. A number is represented in base 10 using decimal\n    > digits. It contains an integer component that may be prefixed with an\n    > optional minus sign, which may be followed by a fraction part and/or an\n    > exponent part. Leading zeros are not allowed. (...) 
Numeric values that\n    > cannot be represented in the grammar below (such as Infinity and NaN)\n    > are not permitted.\n\n    This description includes both integer and floating-point numbers.\n    However, C++ allows more precise storage if it is known whether the number\n    is a signed integer, an unsigned integer or a floating-point number.\n    Therefore, three different types, @ref number_integer_t, @ref\n    number_unsigned_t and @ref number_float_t are used.\n\n    To store integer numbers in C++, a type is defined by the template\n    parameter @a NumberIntegerType which chooses the type to use.\n\n    #### Default type\n\n    With the default values for @a NumberIntegerType (`int64_t`), the default\n    value for @a number_integer_t is:\n\n    @code {.cpp}\n    int64_t\n    @endcode\n\n    #### Default behavior\n\n    - The restriction about leading zeros is not enforced in C++. Instead,\n      leading zeros in integer literals cause them to be interpreted as octal\n      numbers. Internally, the value will be stored as a decimal number. For\n      instance, the C++ integer literal `010` will be serialized to `8`.\n      During deserialization, leading zeros yield an error.\n    - Not-a-number (NaN) values will be serialized to `null`.\n\n    #### Limits\n\n    [RFC 7159](http://rfc7159.net/rfc7159) specifies:\n    > An implementation may set limits on the range and precision of numbers.\n\n    When the default type is used, the maximal integer number that can be\n    stored is `9223372036854775807` (INT64_MAX) and the minimal integer number\n    that can be stored is `-9223372036854775808` (INT64_MIN). Integer numbers\n    that are out of range will yield over/underflow when used in a\n    constructor. 
During deserialization, integer numbers that are too large or too small\n    will automatically be stored as @ref number_unsigned_t or @ref\n    number_float_t.\n\n    [RFC 7159](http://rfc7159.net/rfc7159) further states:\n    > Note that when such software is used, numbers that are integers and are\n    > in the range \\f$[-2^{53}+1, 2^{53}-1]\\f$ are interoperable in the sense\n    > that implementations will agree exactly on their numeric values.\n\n    As this range is a subrange of the exactly supported range [INT64_MIN,\n    INT64_MAX], this class's integer type is interoperable.\n\n    #### Storage\n\n    Integer number values are stored directly inside a @ref basic_json type.\n\n    @sa see @ref number_float_t -- type for number values (floating-point)\n\n    @sa see @ref number_unsigned_t -- type for number values (unsigned integer)\n\n    @since version 1.0.0\n    */\n    using number_integer_t = NumberIntegerType;\n\n    /*!\n    @brief a type for a number (unsigned)\n\n    [RFC 7159](http://rfc7159.net/rfc7159) describes numbers as follows:\n    > The representation of numbers is similar to that used in most\n    > programming languages. A number is represented in base 10 using decimal\n    > digits. It contains an integer component that may be prefixed with an\n    > optional minus sign, which may be followed by a fraction part and/or an\n    > exponent part. Leading zeros are not allowed. (...) 
Numeric values that\n    > cannot be represented in the grammar below (such as Infinity and NaN)\n    > are not permitted.\n\n    This description includes both integer and floating-point numbers.\n    However, C++ allows more precise storage if it is known whether the number\n    is a signed integer, an unsigned integer or a floating-point number.\n    Therefore, three different types, @ref number_integer_t, @ref\n    number_unsigned_t and @ref number_float_t are used.\n\n    To store unsigned integer numbers in C++, a type is defined by the\n    template parameter @a NumberUnsignedType which chooses the type to use.\n\n    #### Default type\n\n    With the default values for @a NumberUnsignedType (`uint64_t`), the\n    default value for @a number_unsigned_t is:\n\n    @code {.cpp}\n    uint64_t\n    @endcode\n\n    #### Default behavior\n\n    - The restriction about leading zeros is not enforced in C++. Instead,\n      leading zeros in integer literals cause them to be interpreted as octal\n      numbers. Internally, the value will be stored as a decimal number. For\n      instance, the C++ integer literal `010` will be serialized to `8`.\n      During deserialization, leading zeros yield an error.\n    - Not-a-number (NaN) values will be serialized to `null`.\n\n    #### Limits\n\n    [RFC 7159](http://rfc7159.net/rfc7159) specifies:\n    > An implementation may set limits on the range and precision of numbers.\n\n    When the default type is used, the maximal integer number that can be\n    stored is `18446744073709551615` (UINT64_MAX) and the minimal integer\n    number that can be stored is `0`. Integer numbers that are out of range\n    will yield over/underflow when used in a constructor. 
During\n    deserialization, integer numbers that are too large or too small will\n    automatically be stored as @ref number_integer_t or @ref number_float_t.\n\n    [RFC 7159](http://rfc7159.net/rfc7159) further states:\n    > Note that when such software is used, numbers that are integers and are\n    > in the range \\f$[-2^{53}+1, 2^{53}-1]\\f$ are interoperable in the sense\n    > that implementations will agree exactly on their numeric values.\n\n    As this range is a subrange (when considered in conjunction with the\n    number_integer_t type) of the exactly supported range [0, UINT64_MAX],\n    this class's integer type is interoperable.\n\n    #### Storage\n\n    Integer number values are stored directly inside a @ref basic_json type.\n\n    @sa see @ref number_float_t -- type for number values (floating-point)\n    @sa see @ref number_integer_t -- type for number values (integer)\n\n    @since version 2.0.0\n    */\n    using number_unsigned_t = NumberUnsignedType;\n\n    /*!\n    @brief a type for a number (floating-point)\n\n    [RFC 7159](http://rfc7159.net/rfc7159) describes numbers as follows:\n    > The representation of numbers is similar to that used in most\n    > programming languages. A number is represented in base 10 using decimal\n    > digits. It contains an integer component that may be prefixed with an\n    > optional minus sign, which may be followed by a fraction part and/or an\n    > exponent part. Leading zeros are not allowed. (...) 
Numeric values that\n    > cannot be represented in the grammar below (such as Infinity and NaN)\n    > are not permitted.\n\n    This description includes both integer and floating-point numbers.\n    However, C++ allows more precise storage if it is known whether the number\n    is a signed integer, an unsigned integer or a floating-point number.\n    Therefore, three different types, @ref number_integer_t, @ref\n    number_unsigned_t and @ref number_float_t are used.\n\n    To store floating-point numbers in C++, a type is defined by the template\n    parameter @a NumberFloatType which chooses the type to use.\n\n    #### Default type\n\n    With the default values for @a NumberFloatType (`double`), the default\n    value for @a number_float_t is:\n\n    @code {.cpp}\n    double\n    @endcode\n\n    #### Default behavior\n\n    - The restriction about leading zeros is not enforced in C++. Instead,\n      leading zeros in floating-point literals will be ignored. Internally,\n      the value will be stored as a decimal number. For instance, the C++\n      floating-point literal `01.2` will be serialized to `1.2`. During\n      deserialization, leading zeros yield an error.\n    - Not-a-number (NaN) values will be serialized to `null`.\n\n    #### Limits\n\n    [RFC 7159](http://rfc7159.net/rfc7159) states:\n    > This specification allows implementations to set limits on the range and\n    > precision of numbers accepted. Since software that implements IEEE\n    > 754-2008 binary64 (double precision) numbers is generally available and\n    > widely used, good interoperability can be achieved by implementations\n    > that expect no more precision or range than these provide, in the sense\n    > that implementations will approximate JSON numbers within the expected\n    > precision.\n\n    This implementation follows exactly this approach, as it uses double\n    precision floating-point numbers. 
Note that values smaller than\n    `-1.79769313486232e+308` and values greater than `1.79769313486232e+308`\n    will be stored as NaN internally and be serialized to `null`.\n\n    #### Storage\n\n    Floating-point number values are stored directly inside a @ref basic_json\n    type.\n\n    @sa see @ref number_integer_t -- type for number values (integer)\n\n    @sa see @ref number_unsigned_t -- type for number values (unsigned integer)\n\n    @since version 1.0.0\n    */\n    using number_float_t = NumberFloatType;\n\n    /*!\n    @brief a type for a packed binary type\n\n    This type is designed to carry binary data that appears in various\n    serialized formats, such as CBOR's Major Type 2, MessagePack's bin, and\n    BSON's generic binary subtype. This type is NOT a part of standard JSON and\n    exists solely for compatibility with these binary types. As such, it is\n    simply defined as an ordered sequence of zero or more byte values.\n\n    Additionally, as an implementation detail, the subtype of the binary data is\n    carried around as a `std::uint8_t`, which is compatible with both of the\n    binary data formats that use binary subtyping (though the two formats\n    number their subtypes incompatibly, and it is up to the user to\n    translate between them).\n\n    [CBOR's RFC 7049](https://tools.ietf.org/html/rfc7049) describes this type\n    as:\n    > Major type 2: a byte string. 
The string's length in bytes is represented\n    > following the rules for positive integers (major type 0).\n\n    [MessagePack's documentation on the bin type\n    family](https://github.com/msgpack/msgpack/blob/master/spec.md#bin-format-family)\n    describes this type as:\n    > Bin format family stores an byte array in 2, 3, or 5 bytes of extra bytes\n    > in addition to the size of the byte array.\n\n    [BSON's specifications](http://bsonspec.org/spec.html) describe several\n    binary types; however, this type is intended to represent the generic binary\n    type which has the description:\n    > Generic binary subtype - This is the most commonly used binary subtype and\n    > should be the 'default' for drivers and tools.\n\n    None of these impose any limitations on the internal representation other\n    than requiring that the basic unit of storage be some type of array whose\n    parts are decomposable into bytes.\n\n    The default representation of this binary format is a\n    `std::vector<std::uint8_t>`, which is a very common way to represent a byte\n    array in modern C++.\n\n    #### Default type\n\n    The default value for @a BinaryType is `std::vector<std::uint8_t>`.\n\n    #### Storage\n\n    Binary arrays are stored as pointers in a @ref basic_json type. That is,\n    for any access to binary values, a pointer of the type `binary_t*` must be\n    dereferenced.\n\n    #### Notes on subtypes\n\n    - CBOR\n       - Binary values are represented as byte strings. Subtypes are not\n         supported and are ignored when CBOR is written.\n    - MessagePack\n       - If a subtype is given and the binary array contains exactly 1, 2, 4, 8,\n         or 16 elements, the fixext family (fixext1, fixext2, fixext4, fixext8,\n         fixext16) is used. 
For other sizes, the ext family (ext8, ext16, ext32) is used.\n         The subtype is then added as a signed 8-bit integer.\n       - If no subtype is given, the bin family (bin8, bin16, bin32) is used.\n    - BSON\n       - If a subtype is given, it is used and added as an unsigned 8-bit integer.\n       - If no subtype is given, the generic binary subtype 0x00 is used.\n\n    @sa see @ref binary -- create a binary array\n\n    @since version 3.8.0\n    */\n    using binary_t = nlohmann::byte_container_with_subtype<BinaryType>;\n    /// @}\n\n  private:\n\n    /// helper for exception-safe object creation\n    template<typename T, typename... Args>\n    JSON_HEDLEY_RETURNS_NON_NULL\n    static T* create(Args&& ... args)\n    {\n        AllocatorType<T> alloc;\n        using AllocatorTraits = std::allocator_traits<AllocatorType<T>>;\n\n        auto deleter = [&](T * obj)\n        {\n            AllocatorTraits::deallocate(alloc, obj, 1);\n        };\n        std::unique_ptr<T, decltype(deleter)> obj(AllocatorTraits::allocate(alloc, 1), deleter);\n        AllocatorTraits::construct(alloc, obj.get(), std::forward<Args>(args)...);\n        JSON_ASSERT(obj != nullptr);\n        return obj.release();\n    }\n\n    ////////////////////////\n    // JSON value storage //\n    ////////////////////////\n\n  JSON_PRIVATE_UNLESS_TESTED:\n    /*!\n    @brief a JSON value\n\n    The actual storage for a JSON value of the @ref basic_json class. 
This\n    union combines the different storage types for the JSON value types\n    defined in @ref value_t.\n\n    JSON type | value_t type    | used type\n    --------- | --------------- | ------------------------\n    object    | object          | pointer to @ref object_t\n    array     | array           | pointer to @ref array_t\n    string    | string          | pointer to @ref string_t\n    boolean   | boolean         | @ref boolean_t\n    number    | number_integer  | @ref number_integer_t\n    number    | number_unsigned | @ref number_unsigned_t\n    number    | number_float    | @ref number_float_t\n    binary    | binary          | pointer to @ref binary_t\n    null      | null            | *no value is stored*\n\n    @note Variable-length types (objects, arrays, and strings) are stored as\n    pointers. The size of the union should not exceed 64 bits if the default\n    value types are used.\n\n    @since version 1.0.0\n    */\n    union json_value\n    {\n        /// object (stored with pointer to save storage)\n        object_t* object;\n        /// array (stored with pointer to save storage)\n        array_t* array;\n        /// string (stored with pointer to save storage)\n        string_t* string;\n        /// binary (stored with pointer to save storage)\n        binary_t* binary;\n        /// boolean\n        boolean_t boolean;\n        /// number (integer)\n        number_integer_t number_integer;\n        /// number (unsigned integer)\n        number_unsigned_t number_unsigned;\n        /// number (floating-point)\n        number_float_t number_float;\n\n        /// default constructor (for null values)\n        json_value() = default;\n        /// constructor for booleans\n        json_value(boolean_t v) noexcept : boolean(v) {}\n        /// constructor for numbers (integer)\n        json_value(number_integer_t v) noexcept : number_integer(v) {}\n        /// constructor for numbers (unsigned)\n        json_value(number_unsigned_t v) noexcept : 
number_unsigned(v) {}\n        /// constructor for numbers (floating-point)\n        json_value(number_float_t v) noexcept : number_float(v) {}\n        /// constructor for empty values of a given type\n        json_value(value_t t)\n        {\n            switch (t)\n            {\n                case value_t::object:\n                {\n                    object = create<object_t>();\n                    break;\n                }\n\n                case value_t::array:\n                {\n                    array = create<array_t>();\n                    break;\n                }\n\n                case value_t::string:\n                {\n                    string = create<string_t>(\"\");\n                    break;\n                }\n\n                case value_t::binary:\n                {\n                    binary = create<binary_t>();\n                    break;\n                }\n\n                case value_t::boolean:\n                {\n                    boolean = boolean_t(false);\n                    break;\n                }\n\n                case value_t::number_integer:\n                {\n                    number_integer = number_integer_t(0);\n                    break;\n                }\n\n                case value_t::number_unsigned:\n                {\n                    number_unsigned = number_unsigned_t(0);\n                    break;\n                }\n\n                case value_t::number_float:\n                {\n                    number_float = number_float_t(0.0);\n                    break;\n                }\n\n                case value_t::null:\n                {\n                    object = nullptr;  // silence warning, see #821\n                    break;\n                }\n\n                default:\n                {\n                    object = nullptr;  // silence warning, see #821\n                    if (JSON_HEDLEY_UNLIKELY(t == value_t::null))\n                    {\n                        
JSON_THROW(other_error::create(500, \"961c151d2e87f2686a955a9be24d316f1362bf21 3.9.1\", basic_json())); // LCOV_EXCL_LINE\n                    }\n                    break;\n                }\n            }\n        }\n\n        /// constructor for strings\n        json_value(const string_t& value)\n        {\n            string = create<string_t>(value);\n        }\n\n        /// constructor for rvalue strings\n        json_value(string_t&& value)\n        {\n            string = create<string_t>(std::move(value));\n        }\n\n        /// constructor for objects\n        json_value(const object_t& value)\n        {\n            object = create<object_t>(value);\n        }\n\n        /// constructor for rvalue objects\n        json_value(object_t&& value)\n        {\n            object = create<object_t>(std::move(value));\n        }\n\n        /// constructor for arrays\n        json_value(const array_t& value)\n        {\n            array = create<array_t>(value);\n        }\n\n        /// constructor for rvalue arrays\n        json_value(array_t&& value)\n        {\n            array = create<array_t>(std::move(value));\n        }\n\n        /// constructor for binary arrays\n        json_value(const typename binary_t::container_type& value)\n        {\n            binary = create<binary_t>(value);\n        }\n\n        /// constructor for rvalue binary arrays\n        json_value(typename binary_t::container_type&& value)\n        {\n            binary = create<binary_t>(std::move(value));\n        }\n\n        /// constructor for binary arrays (internal type)\n        json_value(const binary_t& value)\n        {\n            binary = create<binary_t>(value);\n        }\n\n        /// constructor for rvalue binary arrays (internal type)\n        json_value(binary_t&& value)\n        {\n            binary = create<binary_t>(std::move(value));\n        }\n\n        void destroy(value_t t) noexcept\n        {\n            // flatten the current json_value to a 
heap-allocated stack\n            std::vector<basic_json> stack;\n\n            // move the top-level items to stack\n            if (t == value_t::array)\n            {\n                stack.reserve(array->size());\n                std::move(array->begin(), array->end(), std::back_inserter(stack));\n            }\n            else if (t == value_t::object)\n            {\n                stack.reserve(object->size());\n                for (auto&& it : *object)\n                {\n                    stack.push_back(std::move(it.second));\n                }\n            }\n\n            while (!stack.empty())\n            {\n                // move the last item to a local variable to be processed\n                basic_json current_item(std::move(stack.back()));\n                stack.pop_back();\n\n                // if current_item is array/object, move\n                // its children to the stack to be processed later\n                if (current_item.is_array())\n                {\n                    std::move(current_item.m_value.array->begin(), current_item.m_value.array->end(),\n                              std::back_inserter(stack));\n\n                    current_item.m_value.array->clear();\n                }\n                else if (current_item.is_object())\n                {\n                    for (auto&& it : *current_item.m_value.object)\n                    {\n                        stack.push_back(std::move(it.second));\n                    }\n\n                    current_item.m_value.object->clear();\n                }\n\n                // it is now safe for current_item to be destructed\n                // since it no longer has any children\n            }\n\n            switch (t)\n            {\n                case value_t::object:\n                {\n                    AllocatorType<object_t> alloc;\n                    std::allocator_traits<decltype(alloc)>::destroy(alloc, object);\n                    
std::allocator_traits<decltype(alloc)>::deallocate(alloc, object, 1);\n                    break;\n                }\n\n                case value_t::array:\n                {\n                    AllocatorType<array_t> alloc;\n                    std::allocator_traits<decltype(alloc)>::destroy(alloc, array);\n                    std::allocator_traits<decltype(alloc)>::deallocate(alloc, array, 1);\n                    break;\n                }\n\n                case value_t::string:\n                {\n                    AllocatorType<string_t> alloc;\n                    std::allocator_traits<decltype(alloc)>::destroy(alloc, string);\n                    std::allocator_traits<decltype(alloc)>::deallocate(alloc, string, 1);\n                    break;\n                }\n\n                case value_t::binary:\n                {\n                    AllocatorType<binary_t> alloc;\n                    std::allocator_traits<decltype(alloc)>::destroy(alloc, binary);\n                    std::allocator_traits<decltype(alloc)>::deallocate(alloc, binary, 1);\n                    break;\n                }\n\n                default:\n                {\n                    break;\n                }\n            }\n        }\n    };\n\n  private:\n    /*!\n    @brief checks the class invariants\n\n    This function asserts the class invariants. It needs to be called at the\n    end of every constructor to make sure that created objects respect the\n    invariant. 
Furthermore, it has to be called each time the type of a JSON\n    value is changed, because the invariant expresses a relationship between\n    @a m_type and @a m_value.\n\n    In addition, the parent relation is checked for arrays and objects: If\n    @a check_parents is true and the value is an array or object, then the\n    container's elements must have the current value as parent.\n\n    @param[in] check_parents  whether the parent relation should be checked.\n               The value is true by default and should only be set to false\n               during destruction of objects when the invariant does not\n               need to hold.\n    */\n    void assert_invariant(bool check_parents = true) const noexcept\n    {\n        JSON_ASSERT(m_type != value_t::object || m_value.object != nullptr);\n        JSON_ASSERT(m_type != value_t::array || m_value.array != nullptr);\n        JSON_ASSERT(m_type != value_t::string || m_value.string != nullptr);\n        JSON_ASSERT(m_type != value_t::binary || m_value.binary != nullptr);\n\n#if JSON_DIAGNOSTICS\n        JSON_TRY\n        {\n            // cppcheck-suppress assertWithSideEffect\n            JSON_ASSERT(!check_parents || !is_structured() || std::all_of(begin(), end(), [this](const basic_json & j)\n            {\n                return j.m_parent == this;\n            }));\n        }\n        JSON_CATCH(...) 
{} // LCOV_EXCL_LINE\n#else\n        static_cast<void>(check_parents);\n#endif\n    }\n\n    void set_parents()\n    {\n#if JSON_DIAGNOSTICS\n        switch (m_type)\n        {\n            case value_t::array:\n            {\n                for (auto& element : *m_value.array)\n                {\n                    element.m_parent = this;\n                }\n                break;\n            }\n\n            case value_t::object:\n            {\n                for (auto& element : *m_value.object)\n                {\n                    element.second.m_parent = this;\n                }\n                break;\n            }\n\n            default:\n                break;\n        }\n#endif\n    }\n\n    iterator set_parents(iterator it, typename iterator::difference_type count)\n    {\n#if JSON_DIAGNOSTICS\n        for (typename iterator::difference_type i = 0; i < count; ++i)\n        {\n            (it + i)->m_parent = this;\n        }\n#else\n        static_cast<void>(count);\n#endif\n        return it;\n    }\n\n    reference set_parent(reference j)\n    {\n#if JSON_DIAGNOSTICS\n        j.m_parent = this;\n#else\n        static_cast<void>(j);\n#endif\n        return j;\n    }\n\n  public:\n    //////////////////////////\n    // JSON parser callback //\n    //////////////////////////\n\n    /*!\n    @brief parser event types\n\n    The parser callback distinguishes the following events:\n    - `object_start`: the parser read `{` and started to process a JSON object\n    - `key`: the parser read a key of a value in an object\n    - `object_end`: the parser read `}` and finished processing a JSON object\n    - `array_start`: the parser read `[` and started to process a JSON array\n    - `array_end`: the parser read `]` and finished processing a JSON array\n    - `value`: the parser finished reading a JSON value\n\n    @image html callback_events.png \"Example when certain parse events are triggered\"\n\n    @sa see @ref parser_callback_t for more 
information and examples\n    */\n    using parse_event_t = detail::parse_event_t;\n\n    /*!\n    @brief per-element parser callback type\n\n    With a parser callback function, the result of parsing a JSON text can be\n    influenced. When passed to @ref parse, it is called on certain events\n    (passed as @ref parse_event_t via parameter @a event) with a set recursion\n    depth @a depth and context JSON value @a parsed. The return value of the\n    callback function is a boolean indicating whether the element that emitted\n    the callback shall be kept or not.\n\n    We distinguish six scenarios (determined by the event type) in which the\n    callback function can be called. The following table describes the values\n    of the parameters @a depth, @a event, and @a parsed.\n\n    parameter @a event | description | parameter @a depth | parameter @a parsed\n    ------------------ | ----------- | ------------------ | -------------------\n    parse_event_t::object_start | the parser read `{` and started to process a JSON object | depth of the parent of the JSON object | a JSON value with type discarded\n    parse_event_t::key | the parser read a key of a value in an object | depth of the currently parsed JSON object | a JSON string containing the key\n    parse_event_t::object_end | the parser read `}` and finished processing a JSON object | depth of the parent of the JSON object | the parsed JSON object\n    parse_event_t::array_start | the parser read `[` and started to process a JSON array | depth of the parent of the JSON array | a JSON value with type discarded\n    parse_event_t::array_end | the parser read `]` and finished processing a JSON array | depth of the parent of the JSON array | the parsed JSON array\n    parse_event_t::value | the parser finished reading a JSON value | depth of the value | the parsed JSON value\n\n    @image html callback_events.png \"Example when certain parse events are triggered\"\n\n    Discarding a value (i.e., returning 
`false`) has different effects\n    depending on the context in which the function was called:\n\n    - Discarded values in structured types are skipped. That is, the parser\n      will behave as if the discarded value was never read.\n    - If a value outside a structured type is skipped, it is replaced\n      with `null`. This case happens if the top-level element is skipped.\n\n    @param[in] depth  the depth of the recursion during parsing\n\n    @param[in] event  an event of type parse_event_t indicating the context in\n    which the callback function has been called\n\n    @param[in,out] parsed  the current intermediate parse result; note that\n    writing to this value has no effect for parse_event_t::key events\n\n    @return Whether the JSON value which called the function during parsing\n    should be kept (`true`) or not (`false`). In the latter case, it is either\n    skipped completely or replaced by an empty discarded object.\n\n    @sa see @ref parse for examples\n\n    @since version 1.0.0\n    */\n    using parser_callback_t = detail::parser_callback_t<basic_json>;\n\n    //////////////////\n    // constructors //\n    //////////////////\n\n    /// @name constructors and destructors\n    /// Constructors of class @ref basic_json, copy/move constructor, copy\n    /// assignment, static functions creating objects, and the destructor.\n    /// @{\n\n    /*!\n    @brief create an empty value with a given type\n\n    Create an empty JSON value with a given type. 
The value will be default\n    initialized with an empty value which depends on the type:\n\n    Value type  | initial value\n    ----------- | -------------\n    null        | `null`\n    boolean     | `false`\n    string      | `\"\"`\n    number      | `0`\n    object      | `{}`\n    array       | `[]`\n    binary      | empty array\n\n    @param[in] v  the type of the value to create\n\n    @complexity Constant.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes to any JSON value.\n\n    @liveexample{The following code shows the constructor for different @ref\n    value_t values,basic_json__value_t}\n\n    @sa see @ref clear() -- restores the postcondition of this constructor\n\n    @since version 1.0.0\n    */\n    basic_json(const value_t v)\n        : m_type(v), m_value(v)\n    {\n        assert_invariant();\n    }\n\n    /*!\n    @brief create a null object\n\n    Create a `null` JSON value. It either takes a null pointer as parameter\n    (explicitly creating `null`) or no parameter (implicitly creating `null`).\n    The passed null pointer itself is not read -- it is only used to choose\n    the right constructor.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this constructor never throws\n    exceptions.\n\n    @liveexample{The following code shows the constructor with and without a\n    null pointer parameter.,basic_json__nullptr_t}\n\n    @since version 1.0.0\n    */\n    basic_json(std::nullptr_t = nullptr) noexcept\n        : basic_json(value_t::null)\n    {\n        assert_invariant();\n    }\n\n    /*!\n    @brief create a JSON value\n\n    This is a \"catch all\" constructor for all compatible JSON types; that is,\n    types for which a `to_json()` method exists. 
The constructor forwards the\n    parameter @a val to that method (to `json_serializer<U>::to_json` method\n    with `U = uncvref_t<CompatibleType>`, to be exact).\n\n    Template type @a CompatibleType includes, but is not limited to, the\n    following types:\n    - **arrays**: @ref array_t and all kinds of compatible containers such as\n      `std::vector`, `std::deque`, `std::list`, `std::forward_list`,\n      `std::array`, `std::valarray`, `std::set`, `std::unordered_set`,\n      `std::multiset`, and `std::unordered_multiset` with a `value_type` from\n      which a @ref basic_json value can be constructed.\n    - **objects**: @ref object_t and all kinds of compatible associative\n      containers such as `std::map`, `std::unordered_map`, `std::multimap`,\n      and `std::unordered_multimap` with a `key_type` compatible with\n      @ref string_t and a `value_type` from which a @ref basic_json value can\n      be constructed.\n    - **strings**: @ref string_t, string literals, and all compatible string\n      containers can be used.\n    - **numbers**: @ref number_integer_t, @ref number_unsigned_t,\n      @ref number_float_t, and all convertible number types such as `int`,\n      `size_t`, `int64_t`, `float` or `double` can be used.\n    - **boolean**: @ref boolean_t / `bool` can be used.\n    - **binary**: @ref binary_t / `std::vector<uint8_t>` may be used;\n      unfortunately, because string literals cannot be distinguished from binary\n      character arrays by the C++ type system, all types compatible with `const\n      char*` will be directed to the string constructor instead.  
This is both\n      for backwards compatibility, and due to the fact that a binary type is not\n      a standard JSON type.\n\n    See the examples below.\n\n    @tparam CompatibleType a type such that:\n    - @a CompatibleType is not derived from `std::istream`,\n    - @a CompatibleType is not @ref basic_json (to avoid hijacking copy/move\n         constructors),\n    - @a CompatibleType is not a different @ref basic_json type (i.e. with different template arguments)\n    - @a CompatibleType is not a @ref basic_json nested type (e.g.,\n         @ref json_pointer, @ref iterator, etc ...)\n    - `json_serializer<U>` has a `to_json(basic_json_t&, CompatibleType&&)` method\n\n    @tparam U = `uncvref_t<CompatibleType>`\n\n    @param[in] val the value to be forwarded to the respective constructor\n\n    @complexity Usually linear in the size of the passed @a val, also\n                depending on the implementation of the called `to_json()`\n                method.\n\n    @exceptionsafety Depends on the called constructor. 
For types directly\n    supported by the library (i.e., all types for which no `to_json()` function\n    was provided), strong guarantee holds: if an exception is thrown, there are\n    no changes to any JSON value.\n\n    @liveexample{The following code shows the constructor with several\n    compatible types.,basic_json__CompatibleType}\n\n    @since version 2.1.0\n    */\n    template < typename CompatibleType,\n               typename U = detail::uncvref_t<CompatibleType>,\n               detail::enable_if_t <\n                   !detail::is_basic_json<U>::value && detail::is_compatible_type<basic_json_t, U>::value, int > = 0 >\n    basic_json(CompatibleType && val) noexcept(noexcept( // NOLINT(bugprone-forwarding-reference-overload,bugprone-exception-escape)\n                JSONSerializer<U>::to_json(std::declval<basic_json_t&>(),\n                                           std::forward<CompatibleType>(val))))\n    {\n        JSONSerializer<U>::to_json(*this, std::forward<CompatibleType>(val));\n        set_parents();\n        assert_invariant();\n    }\n\n    /*!\n    @brief create a JSON value from an existing one\n\n    This is a constructor for existing @ref basic_json types.\n    It does not hijack copy/move constructors, since the parameter has different\n    template arguments than the current ones.\n\n    The constructor tries to convert the internal @ref m_value of the parameter.\n\n    @tparam BasicJsonType a type such that:\n    - @a BasicJsonType is a @ref basic_json type.\n    - @a BasicJsonType has different template arguments than @ref basic_json_t.\n\n    @param[in] val the @ref basic_json value to be converted.\n\n    @complexity Usually linear in the size of the passed @a val, also\n                depending on the implementation of the called `to_json()`\n                method.\n\n    @exceptionsafety Depends on the called constructor. 
For types directly\n    supported by the library (i.e., all types for which no `to_json()` function\n    was provided), strong guarantee holds: if an exception is thrown, there are\n    no changes to any JSON value.\n\n    @since version 3.2.0\n    */\n    template < typename BasicJsonType,\n               detail::enable_if_t <\n                   detail::is_basic_json<BasicJsonType>::value&& !std::is_same<basic_json, BasicJsonType>::value, int > = 0 >\n    basic_json(const BasicJsonType& val)\n    {\n        using other_boolean_t = typename BasicJsonType::boolean_t;\n        using other_number_float_t = typename BasicJsonType::number_float_t;\n        using other_number_integer_t = typename BasicJsonType::number_integer_t;\n        using other_number_unsigned_t = typename BasicJsonType::number_unsigned_t;\n        using other_string_t = typename BasicJsonType::string_t;\n        using other_object_t = typename BasicJsonType::object_t;\n        using other_array_t = typename BasicJsonType::array_t;\n        using other_binary_t = typename BasicJsonType::binary_t;\n\n        switch (val.type())\n        {\n            case value_t::boolean:\n                JSONSerializer<other_boolean_t>::to_json(*this, val.template get<other_boolean_t>());\n                break;\n            case value_t::number_float:\n                JSONSerializer<other_number_float_t>::to_json(*this, val.template get<other_number_float_t>());\n                break;\n            case value_t::number_integer:\n                JSONSerializer<other_number_integer_t>::to_json(*this, val.template get<other_number_integer_t>());\n                break;\n            case value_t::number_unsigned:\n                JSONSerializer<other_number_unsigned_t>::to_json(*this, val.template get<other_number_unsigned_t>());\n                break;\n            case value_t::string:\n                JSONSerializer<other_string_t>::to_json(*this, val.template get_ref<const other_string_t&>());\n                
break;\n            case value_t::object:\n                JSONSerializer<other_object_t>::to_json(*this, val.template get_ref<const other_object_t&>());\n                break;\n            case value_t::array:\n                JSONSerializer<other_array_t>::to_json(*this, val.template get_ref<const other_array_t&>());\n                break;\n            case value_t::binary:\n                JSONSerializer<other_binary_t>::to_json(*this, val.template get_ref<const other_binary_t&>());\n                break;\n            case value_t::null:\n                *this = nullptr;\n                break;\n            case value_t::discarded:\n                m_type = value_t::discarded;\n                break;\n            default:            // LCOV_EXCL_LINE\n                JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert) LCOV_EXCL_LINE\n        }\n        set_parents();\n        assert_invariant();\n    }\n\n    /*!\n    @brief create a container (array or object) from an initializer list\n\n    Creates a JSON value of type array or object from the passed initializer\n    list @a init. In case @a type_deduction is `true` (default), the type of\n    the JSON value to be created is deduced from the initializer list @a init\n    according to the following rules:\n\n    1. If the list is empty, an empty JSON object value `{}` is created.\n    2. If the list consists of pairs whose first element is a string, a JSON\n       object value is created where the first elements of the pairs are\n       treated as keys and the second elements as values.\n    3. In all other cases, an array is created.\n\n    The rules aim to create the best fit between a C++ initializer list and\n    JSON values. The rationale is as follows:\n\n    1. The empty initializer list is written as `{}`, which is exactly an empty\n       JSON object.\n    2. C++ has no way of describing mapped types other than as a list of\n       pairs. 
As JSON requires that keys be of type string, rule 2 is the\n       weakest constraint one can impose on initializer lists to interpret them\n       as an object.\n    3. In all other cases, the initializer list could not be interpreted as\n       JSON object type, so interpreting it as JSON array type is safe.\n\n    With the rules described above, the following JSON values cannot be\n    expressed by an initializer list:\n\n    - the empty array (`[]`): use @ref array(initializer_list_t)\n      with an empty initializer list in this case\n    - arrays whose elements satisfy rule 2: use @ref\n      array(initializer_list_t) with the same initializer list\n      in this case\n\n    @note When used without parentheses around an empty initializer list, @ref\n    basic_json() is called instead of this function, yielding the JSON null\n    value.\n\n    @param[in] init  initializer list with JSON values\n\n    @param[in] type_deduction internal parameter; when set to `true`, the type\n    of the JSON value is deduced from the initializer list @a init; when set\n    to `false`, the type provided via @a manual_type is forced. This mode is\n    used by the functions @ref array(initializer_list_t) and\n    @ref object(initializer_list_t).\n\n    @param[in] manual_type internal parameter; when @a type_deduction is set\n    to `false`, the created JSON value will use the provided type (only @ref\n    value_t::array and @ref value_t::object are valid); when @a type_deduction\n    is set to `true`, this parameter has no effect\n\n    @throw type_error.301 if @a type_deduction is `false`, @a manual_type is\n    `value_t::object`, but @a init contains an element which is not a pair\n    whose first element is a string. In this case, the constructor could not\n    create an object. Had @a type_deduction been `true`, an array\n    would have been created. 
See @ref object(initializer_list_t)\n    for an example.\n\n    @complexity Linear in the size of the initializer list @a init.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes to any JSON value.\n\n    @liveexample{The example below shows how JSON values are created from\n    initializer lists.,basic_json__list_init_t}\n\n    @sa see @ref array(initializer_list_t) -- create a JSON array\n    value from an initializer list\n    @sa see @ref object(initializer_list_t) -- create a JSON object\n    value from an initializer list\n\n    @since version 1.0.0\n    */\n    basic_json(initializer_list_t init,\n               bool type_deduction = true,\n               value_t manual_type = value_t::array)\n    {\n        // check if each element is an array with two elements whose first\n        // element is a string\n        bool is_an_object = std::all_of(init.begin(), init.end(),\n                                        [](const detail::json_ref<basic_json>& element_ref)\n        {\n            return element_ref->is_array() && element_ref->size() == 2 && (*element_ref)[0].is_string();\n        });\n\n        // adjust type if type deduction is not wanted\n        if (!type_deduction)\n        {\n            // if array is wanted, do not create an object though possible\n            if (manual_type == value_t::array)\n            {\n                is_an_object = false;\n            }\n\n            // if object is wanted but impossible, throw an exception\n            if (JSON_HEDLEY_UNLIKELY(manual_type == value_t::object && !is_an_object))\n            {\n                JSON_THROW(type_error::create(301, \"cannot create object from initializer list\", basic_json()));\n            }\n        }\n\n        if (is_an_object)\n        {\n            // the initializer list is a list of pairs -> create object\n            m_type = value_t::object;\n            m_value = value_t::object;\n\n            for (auto& element_ref : 
init)\n            {\n                auto element = element_ref.moved_or_copied();\n                m_value.object->emplace(\n                    std::move(*((*element.m_value.array)[0].m_value.string)),\n                    std::move((*element.m_value.array)[1]));\n            }\n        }\n        else\n        {\n            // the initializer list describes an array -> create array\n            m_type = value_t::array;\n            m_value.array = create<array_t>(init.begin(), init.end());\n        }\n\n        set_parents();\n        assert_invariant();\n    }\n\n    /*!\n    @brief explicitly create a binary array (without subtype)\n\n    Creates a JSON binary array value from a given binary container. Binary\n    values are part of various binary formats, such as CBOR, MessagePack, and\n    BSON. This constructor is used to create a value for serialization to those\n    formats.\n\n    @note Note, this function exists because of the difficulty in correctly\n    specifying the correct template overload in the standard value ctor, as both\n    JSON arrays and JSON binary arrays are backed with some form of a\n    `std::vector`. 
Because JSON binary arrays are a non-standard extension, it\n    was decided that it would be best to prevent automatic initialization of a\n    binary array type, for backwards compatibility and so it does not happen by\n    accident.\n\n    @param[in] init container containing bytes to use as binary type\n\n    @return JSON binary array value\n\n    @complexity Linear in the size of @a init.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes to any JSON value.\n\n    @since version 3.8.0\n    */\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json binary(const typename binary_t::container_type& init)\n    {\n        auto res = basic_json();\n        res.m_type = value_t::binary;\n        res.m_value = init;\n        return res;\n    }\n\n    /*!\n    @brief explicitly create a binary array (with subtype)\n\n    Creates a JSON binary array value from a given binary container. Binary\n    values are part of various binary formats, such as CBOR, MessagePack, and\n    BSON. This constructor is used to create a value for serialization to those\n    formats.\n\n    @note Note, this function exists because of the difficulty in correctly\n    specifying the correct template overload in the standard value ctor, as both\n    JSON arrays and JSON binary arrays are backed with some form of a\n    `std::vector`. 
Because JSON binary arrays are a non-standard extension, it\n    was decided that it would be best to prevent automatic initialization of a\n    binary array type, for backwards compatibility and so it does not happen by\n    accident.\n\n    @param[in] init container containing bytes to use as binary type\n    @param[in] subtype subtype to use in MessagePack and BSON\n\n    @return JSON binary array value\n\n    @complexity Linear in the size of @a init.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes to any JSON value.\n\n    @since version 3.8.0\n    */\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json binary(const typename binary_t::container_type& init, std::uint8_t subtype)\n    {\n        auto res = basic_json();\n        res.m_type = value_t::binary;\n        res.m_value = binary_t(init, subtype);\n        return res;\n    }\n\n    /// @copydoc binary(const typename binary_t::container_type&)\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json binary(typename binary_t::container_type&& init)\n    {\n        auto res = basic_json();\n        res.m_type = value_t::binary;\n        res.m_value = std::move(init);\n        return res;\n    }\n\n    /// @copydoc binary(const typename binary_t::container_type&, std::uint8_t)\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json binary(typename binary_t::container_type&& init, std::uint8_t subtype)\n    {\n        auto res = basic_json();\n        res.m_type = value_t::binary;\n        res.m_value = binary_t(std::move(init), subtype);\n        return res;\n    }\n\n    /*!\n    @brief explicitly create an array from an initializer list\n\n    Creates a JSON array value from a given initializer list. That is, given a\n    list of values `a, b, c`, creates the JSON value `[a, b, c]`. 
If the\n    initializer list is empty, the empty array `[]` is created.\n\n    @note This function is only needed to express two edge cases that cannot\n    be realized with the initializer list constructor (@ref\n    basic_json(initializer_list_t, bool, value_t)). These cases\n    are:\n    1. creating an array whose elements are all pairs whose first element is a\n    string -- in this case, the initializer list constructor would create an\n    object, taking the first elements as keys\n    2. creating an empty array -- passing the empty initializer list to the\n    initializer list constructor yields an empty object\n\n    @param[in] init  initializer list with JSON values to create an array from\n    (optional)\n\n    @return JSON array value\n\n    @complexity Linear in the size of @a init.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes to any JSON value.\n\n    @liveexample{The following code shows an example for the `array`\n    function.,array}\n\n    @sa see @ref basic_json(initializer_list_t, bool, value_t) --\n    create a JSON value from an initializer list\n    @sa see @ref object(initializer_list_t) -- create a JSON object\n    value from an initializer list\n\n    @since version 1.0.0\n    */\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json array(initializer_list_t init = {})\n    {\n        return basic_json(init, false, value_t::array);\n    }\n\n    /*!\n    @brief explicitly create an object from an initializer list\n\n    Creates a JSON object value from a given initializer list. The initializer\n    lists elements must be pairs, and their first elements must be strings. If\n    the initializer list is empty, the empty object `{}` is created.\n\n    @note This function is only added for symmetry reasons. In contrast to the\n    related function @ref array(initializer_list_t), there are\n    no cases which can only be expressed by this function. 
That is, any\n    initializer list @a init can also be passed to the initializer list\n    constructor @ref basic_json(initializer_list_t, bool, value_t).\n\n    @param[in] init  initializer list to create an object from (optional)\n\n    @return JSON object value\n\n    @throw type_error.301 if @a init is not a list of pairs whose first\n    elements are strings. In this case, no object can be created. When such a\n    value is passed to @ref basic_json(initializer_list_t, bool, value_t),\n    an array would have been created from the passed initializer list @a init.\n    See example below.\n\n    @complexity Linear in the size of @a init.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes to any JSON value.\n\n    @liveexample{The following code shows an example for the `object`\n    function.,object}\n\n    @sa see @ref basic_json(initializer_list_t, bool, value_t) --\n    create a JSON value from an initializer list\n    @sa see @ref array(initializer_list_t) -- create a JSON array\n    value from an initializer list\n\n    @since version 1.0.0\n    */\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json object(initializer_list_t init = {})\n    {\n        return basic_json(init, false, value_t::object);\n    }\n\n    /*!\n    @brief construct an array with count copies of given value\n\n    Constructs a JSON array value by creating @a cnt copies of a passed value.\n    In case @a cnt is `0`, an empty array is created.\n\n    @param[in] cnt  the number of JSON copies of @a val to create\n    @param[in] val  the JSON value to copy\n\n    @post `std::distance(begin(),end()) == cnt` holds.\n\n    @complexity Linear in @a cnt.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes to any JSON value.\n\n    @liveexample{The following code shows examples for the @ref\n    basic_json(size_type\\, const basic_json&)\n    constructor.,basic_json__size_type_basic_json}\n\n    @since 
version 1.0.0\n    */\n    basic_json(size_type cnt, const basic_json& val)\n        : m_type(value_t::array)\n    {\n        m_value.array = create<array_t>(cnt, val);\n        set_parents();\n        assert_invariant();\n    }\n\n    /*!\n    @brief construct a JSON container given an iterator range\n\n    Constructs the JSON value with the contents of the range `[first, last)`.\n    The semantics depends on the different types a JSON value can have:\n    - In case of a null type, invalid_iterator.206 is thrown.\n    - In case of other primitive types (number, boolean, or string), @a first\n      must be `begin()` and @a last must be `end()`. In this case, the value is\n      copied. Otherwise, invalid_iterator.204 is thrown.\n    - In case of structured types (array, object), the constructor behaves as\n      similar versions for `std::vector` or `std::map`; that is, a JSON array\n      or object is constructed from the values in the range.\n\n    @tparam InputIT an input iterator type (@ref iterator or @ref\n    const_iterator)\n\n    @param[in] first begin of the range to copy from (included)\n    @param[in] last end of the range to copy from (excluded)\n\n    @pre Iterators @a first and @a last must be initialized. **This\n         precondition is enforced with an assertion (see warning).** If\n         assertions are switched off, a violation of this precondition yields\n         undefined behavior.\n\n    @pre Range `[first, last)` is valid. Usually, this precondition cannot be\n         checked efficiently. Only certain edge cases are detected; see the\n         description of the exceptions below. 
A violation of this precondition\n         yields undefined behavior.\n\n    @warning A precondition is enforced with a runtime assertion that will\n             result in calling `std::abort` if this precondition is not met.\n             Assertions can be disabled by defining `NDEBUG` at compile time.\n             See https://en.cppreference.com/w/cpp/error/assert for more\n             information.\n\n    @throw invalid_iterator.201 if iterators @a first and @a last are not\n    compatible (i.e., do not belong to the same JSON value). In this case,\n    the range `[first, last)` is undefined.\n    @throw invalid_iterator.204 if iterators @a first and @a last belong to a\n    primitive type (number, boolean, or string), but @a first does not point\n    to the first element any more. In this case, the range `[first, last)` is\n    undefined. See example code below.\n    @throw invalid_iterator.206 if iterators @a first and @a last belong to a\n    null value. In this case, the range `[first, last)` is undefined.\n\n    @complexity Linear in distance between @a first and @a last.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes to any JSON value.\n\n    @liveexample{The example below shows several ways to create JSON values by\n    specifying a subrange with iterators.,basic_json__InputIt_InputIt}\n\n    @since version 1.0.0\n    */\n    template < class InputIT, typename std::enable_if <\n                   std::is_same<InputIT, typename basic_json_t::iterator>::value ||\n                   std::is_same<InputIT, typename basic_json_t::const_iterator>::value, int >::type = 0 >\n    basic_json(InputIT first, InputIT last)\n    {\n        JSON_ASSERT(first.m_object != nullptr);\n        JSON_ASSERT(last.m_object != nullptr);\n\n        // make sure iterator fits the current value\n        if (JSON_HEDLEY_UNLIKELY(first.m_object != last.m_object))\n        {\n            JSON_THROW(invalid_iterator::create(201, 
\"iterators are not compatible\", basic_json()));\n        }\n\n        // copy type from first iterator\n        m_type = first.m_object->m_type;\n\n        // check if iterator range is complete for primitive values\n        switch (m_type)\n        {\n            case value_t::boolean:\n            case value_t::number_float:\n            case value_t::number_integer:\n            case value_t::number_unsigned:\n            case value_t::string:\n            {\n                if (JSON_HEDLEY_UNLIKELY(!first.m_it.primitive_iterator.is_begin()\n                                         || !last.m_it.primitive_iterator.is_end()))\n                {\n                    JSON_THROW(invalid_iterator::create(204, \"iterators out of range\", *first.m_object));\n                }\n                break;\n            }\n\n            default:\n                break;\n        }\n\n        switch (m_type)\n        {\n            case value_t::number_integer:\n            {\n                m_value.number_integer = first.m_object->m_value.number_integer;\n                break;\n            }\n\n            case value_t::number_unsigned:\n            {\n                m_value.number_unsigned = first.m_object->m_value.number_unsigned;\n                break;\n            }\n\n            case value_t::number_float:\n            {\n                m_value.number_float = first.m_object->m_value.number_float;\n                break;\n            }\n\n            case value_t::boolean:\n            {\n                m_value.boolean = first.m_object->m_value.boolean;\n                break;\n            }\n\n            case value_t::string:\n            {\n                m_value = *first.m_object->m_value.string;\n                break;\n            }\n\n            case value_t::object:\n            {\n                m_value.object = create<object_t>(first.m_it.object_iterator,\n                                                  last.m_it.object_iterator);\n                
break;\n            }\n\n            case value_t::array:\n            {\n                m_value.array = create<array_t>(first.m_it.array_iterator,\n                                                last.m_it.array_iterator);\n                break;\n            }\n\n            case value_t::binary:\n            {\n                m_value = *first.m_object->m_value.binary;\n                break;\n            }\n\n            default:\n                JSON_THROW(invalid_iterator::create(206, \"cannot construct with iterators from \" + std::string(first.m_object->type_name()), *first.m_object));\n        }\n\n        set_parents();\n        assert_invariant();\n    }\n\n\n    ///////////////////////////////////////\n    // other constructors and destructor //\n    ///////////////////////////////////////\n\n    template<typename JsonRef,\n             detail::enable_if_t<detail::conjunction<detail::is_json_ref<JsonRef>,\n                                 std::is_same<typename JsonRef::value_type, basic_json>>::value, int> = 0 >\n    basic_json(const JsonRef& ref) : basic_json(ref.moved_or_copied()) {}\n\n    /*!\n    @brief copy constructor\n\n    Creates a copy of a given JSON value.\n\n    @param[in] other  the JSON value to copy\n\n    @post `*this == other`\n\n    @complexity Linear in the size of @a other.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes to any JSON value.\n\n    @requirement This function helps `basic_json` satisfy the\n    [Container](https://en.cppreference.com/w/cpp/named_req/Container)\n    requirements:\n    - The complexity is linear.\n    - As a postcondition, it holds: `other == basic_json(other)`.\n\n    @liveexample{The following code shows an example for the copy\n    constructor.,basic_json__basic_json}\n\n    @since version 1.0.0\n    */\n    basic_json(const basic_json& other)\n        : m_type(other.m_type)\n    {\n        // check that passed value is valid\n        
other.assert_invariant();\n\n        switch (m_type)\n        {\n            case value_t::object:\n            {\n                m_value = *other.m_value.object;\n                break;\n            }\n\n            case value_t::array:\n            {\n                m_value = *other.m_value.array;\n                break;\n            }\n\n            case value_t::string:\n            {\n                m_value = *other.m_value.string;\n                break;\n            }\n\n            case value_t::boolean:\n            {\n                m_value = other.m_value.boolean;\n                break;\n            }\n\n            case value_t::number_integer:\n            {\n                m_value = other.m_value.number_integer;\n                break;\n            }\n\n            case value_t::number_unsigned:\n            {\n                m_value = other.m_value.number_unsigned;\n                break;\n            }\n\n            case value_t::number_float:\n            {\n                m_value = other.m_value.number_float;\n                break;\n            }\n\n            case value_t::binary:\n            {\n                m_value = *other.m_value.binary;\n                break;\n            }\n\n            default:\n                break;\n        }\n\n        set_parents();\n        assert_invariant();\n    }\n\n    /*!\n    @brief move constructor\n\n    Move constructor. Constructs a JSON value with the contents of the given\n    value @a other using move semantics. 
It \"steals\" the resources from @a\n    other and leaves it as a JSON null value.\n\n    @param[in,out] other  value to move to this object\n\n    @post `*this` has the same value as @a other before the call.\n    @post @a other is a JSON null value.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this constructor never throws\n    exceptions.\n\n    @requirement This function helps `basic_json` satisfy the\n    [MoveConstructible](https://en.cppreference.com/w/cpp/named_req/MoveConstructible)\n    requirements.\n\n    @liveexample{The code below shows the move constructor explicitly called\n    via std::move.,basic_json__moveconstructor}\n\n    @since version 1.0.0\n    */\n    basic_json(basic_json&& other) noexcept\n        : m_type(std::move(other.m_type)),\n          m_value(std::move(other.m_value))\n    {\n        // check that passed value is valid\n        other.assert_invariant(false);\n\n        // invalidate payload\n        other.m_type = value_t::null;\n        other.m_value = {};\n\n        set_parents();\n        assert_invariant();\n    }\n\n    /*!\n    @brief copy assignment\n\n    Copy assignment operator. Copies a JSON value via the \"copy and swap\"\n    strategy: It is expressed in terms of the copy constructor, destructor,\n    and the `swap()` member function.\n\n    @param[in] other  value to copy from\n\n    @complexity Linear.\n\n    @requirement This function helps `basic_json` satisfy the\n    [Container](https://en.cppreference.com/w/cpp/named_req/Container)\n    requirements:\n    - The complexity is linear.\n\n    @liveexample{The code below shows an example for the copy assignment. It\n    creates a copy of value `a` which is then swapped with `b`. 
Finally\\, the\n    copy of `a` (which is the null value after the swap) is\n    destroyed.,basic_json__copyassignment}\n\n    @since version 1.0.0\n    */\n    basic_json& operator=(basic_json other) noexcept (\n        std::is_nothrow_move_constructible<value_t>::value&&\n        std::is_nothrow_move_assignable<value_t>::value&&\n        std::is_nothrow_move_constructible<json_value>::value&&\n        std::is_nothrow_move_assignable<json_value>::value\n    )\n    {\n        // check that passed value is valid\n        other.assert_invariant();\n\n        using std::swap;\n        swap(m_type, other.m_type);\n        swap(m_value, other.m_value);\n\n        set_parents();\n        assert_invariant();\n        return *this;\n    }\n\n    /*!\n    @brief destructor\n\n    Destroys the JSON value and frees all allocated memory.\n\n    @complexity Linear.\n\n    @requirement This function helps `basic_json` satisfy the\n    [Container](https://en.cppreference.com/w/cpp/named_req/Container)\n    requirements:\n    - The complexity is linear.\n    - All stored elements are destroyed and all memory is freed.\n\n    @since version 1.0.0\n    */\n    ~basic_json() noexcept\n    {\n        assert_invariant(false);\n        m_value.destroy(m_type);\n    }\n\n    /// @}\n\n  public:\n    ///////////////////////\n    // object inspection //\n    ///////////////////////\n\n    /// @name object inspection\n    /// Functions to inspect the type of a JSON value.\n    /// @{\n\n    /*!\n    @brief serialization\n\n    Serialization function for JSON values. The function tries to mimic\n    Python's `json.dumps()` function, and currently supports its @a indent\n    and @a ensure_ascii parameters.\n\n    @param[in] indent If indent is nonnegative, then array elements and object\n    members will be pretty-printed with that indent level. An indent level of\n    `0` will only insert newlines. 
`-1` (the default) selects the most compact\n    representation.\n    @param[in] indent_char The character to use for indentation if @a indent is\n    greater than `0`. The default is ` ` (space).\n    @param[in] ensure_ascii If @a ensure_ascii is true, all non-ASCII characters\n    in the output are escaped with `\\uXXXX` sequences, and the result consists\n    of ASCII characters only.\n    @param[in] error_handler  how to react to decoding errors; there are three\n    possible values: `strict` (throws an exception in case a decoding error\n    occurs; default), `replace` (replace invalid UTF-8 sequences with U+FFFD),\n    and `ignore` (ignore invalid UTF-8 sequences during serialization; all\n    bytes are copied to the output unchanged).\n\n    @return string containing the serialization of the JSON value\n\n    @throw type_error.316 if a string stored inside the JSON value is not\n                          UTF-8 encoded and @a error_handler is set to strict\n\n    @note Binary values are serialized as an object containing two keys:\n      - \"bytes\": an array of bytes as integers\n      - \"subtype\": the subtype as integer or \"null\" if the binary has no subtype\n\n    @complexity Linear.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes in the JSON value.\n\n    @liveexample{The following example shows the effect of different @a indent\\,\n    @a indent_char\\, and @a ensure_ascii parameters on the result of the\n    serialization.,dump}\n\n    @see https://docs.python.org/2/library/json.html#json.dump\n\n    @since version 1.0.0; indentation character @a indent_char, option\n           @a ensure_ascii and exceptions added in version 3.0.0; error\n           handlers added in version 3.4.0; serialization of binary values added\n           in version 3.8.0.\n    */\n    string_t dump(const int indent = -1,\n                  const char indent_char = ' ',\n                  const bool ensure_ascii = false,\n         
         const error_handler_t error_handler = error_handler_t::strict) const\n    {\n        string_t result;\n        serializer s(detail::output_adapter<char, string_t>(result), indent_char, error_handler);\n\n        if (indent >= 0)\n        {\n            s.dump(*this, true, ensure_ascii, static_cast<unsigned int>(indent));\n        }\n        else\n        {\n            s.dump(*this, false, ensure_ascii, 0);\n        }\n\n        return result;\n    }\n\n    /*!\n    @brief return the type of the JSON value (explicit)\n\n    Return the type of the JSON value as a value from the @ref value_t\n    enumeration.\n\n    @return the type of the JSON value\n            Value type                | return value\n            ------------------------- | -------------------------\n            null                      | value_t::null\n            boolean                   | value_t::boolean\n            string                    | value_t::string\n            number (integer)          | value_t::number_integer\n            number (unsigned integer) | value_t::number_unsigned\n            number (floating-point)   | value_t::number_float\n            object                    | value_t::object\n            array                     | value_t::array\n            binary                    | value_t::binary\n            discarded                 | value_t::discarded\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `type()` for all JSON\n    types.,type}\n\n    @sa see @ref operator value_t() -- return the type of the JSON value (implicit)\n    @sa see @ref type_name() -- return the type as string\n\n    @since version 1.0.0\n    */\n    constexpr value_t type() const noexcept\n    {\n        return m_type;\n    }\n\n    /*!\n    @brief return whether type is primitive\n\n    This function returns true if and only if the JSON type is primitive\n    
(string, number, boolean, null, or binary).\n\n    @return `true` if type is primitive (string, number, boolean, null, or\n    binary), `false` otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_primitive()` for all JSON\n    types.,is_primitive}\n\n    @sa see @ref is_structured() -- returns whether JSON value is structured\n    @sa see @ref is_null() -- returns whether JSON value is `null`\n    @sa see @ref is_string() -- returns whether JSON value is a string\n    @sa see @ref is_boolean() -- returns whether JSON value is a boolean\n    @sa see @ref is_number() -- returns whether JSON value is a number\n    @sa see @ref is_binary() -- returns whether JSON value is a binary array\n\n    @since version 1.0.0\n    */\n    constexpr bool is_primitive() const noexcept\n    {\n        return is_null() || is_string() || is_boolean() || is_number() || is_binary();\n    }\n\n    /*!\n    @brief return whether type is structured\n\n    This function returns true if and only if the JSON type is structured\n    (array or object).\n\n    @return `true` if type is structured (array or object), `false` otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_structured()` for all JSON\n    types.,is_structured}\n\n    @sa see @ref is_primitive() -- returns whether value is primitive\n    @sa see @ref is_array() -- returns whether value is an array\n    @sa see @ref is_object() -- returns whether value is an object\n\n    @since version 1.0.0\n    */\n    constexpr bool is_structured() const noexcept\n    {\n        return is_array() || is_object();\n    }\n\n    /*!\n    @brief return whether value is null\n\n    This function returns true if and only if the JSON value is null.\n\n    @return `true` if type is null, `false` 
otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_null()` for all JSON\n    types.,is_null}\n\n    @since version 1.0.0\n    */\n    constexpr bool is_null() const noexcept\n    {\n        return m_type == value_t::null;\n    }\n\n    /*!\n    @brief return whether value is a boolean\n\n    This function returns true if and only if the JSON value is a boolean.\n\n    @return `true` if type is boolean, `false` otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_boolean()` for all JSON\n    types.,is_boolean}\n\n    @since version 1.0.0\n    */\n    constexpr bool is_boolean() const noexcept\n    {\n        return m_type == value_t::boolean;\n    }\n\n    /*!\n    @brief return whether value is a number\n\n    This function returns true if and only if the JSON value is a number. 
This\n    includes both integer (signed and unsigned) and floating-point values.\n\n    @return `true` if type is number (regardless of whether integer, unsigned\n    integer, or floating-point), `false` otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_number()` for all JSON\n    types.,is_number}\n\n    @sa see @ref is_number_integer() -- check if value is an integer or unsigned\n    integer number\n    @sa see @ref is_number_unsigned() -- check if value is an unsigned integer\n    number\n    @sa see @ref is_number_float() -- check if value is a floating-point number\n\n    @since version 1.0.0\n    */\n    constexpr bool is_number() const noexcept\n    {\n        return is_number_integer() || is_number_float();\n    }\n\n    /*!\n    @brief return whether value is an integer number\n\n    This function returns true if and only if the JSON value is a signed or\n    unsigned integer number. 
This excludes floating-point values.\n\n    @return `true` if type is an integer or unsigned integer number, `false`\n    otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_number_integer()` for all\n    JSON types.,is_number_integer}\n\n    @sa see @ref is_number() -- check if value is a number\n    @sa see @ref is_number_unsigned() -- check if value is an unsigned integer\n    number\n    @sa see @ref is_number_float() -- check if value is a floating-point number\n\n    @since version 1.0.0\n    */\n    constexpr bool is_number_integer() const noexcept\n    {\n        return m_type == value_t::number_integer || m_type == value_t::number_unsigned;\n    }\n\n    /*!\n    @brief return whether value is an unsigned integer number\n\n    This function returns true if and only if the JSON value is an unsigned\n    integer number. This excludes floating-point and signed integer values.\n\n    @return `true` if type is an unsigned integer number, `false` otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_number_unsigned()` for all\n    JSON types.,is_number_unsigned}\n\n    @sa see @ref is_number() -- check if value is a number\n    @sa see @ref is_number_integer() -- check if value is an integer or unsigned\n    integer number\n    @sa see @ref is_number_float() -- check if value is a floating-point number\n\n    @since version 2.0.0\n    */\n    constexpr bool is_number_unsigned() const noexcept\n    {\n        return m_type == value_t::number_unsigned;\n    }\n\n    /*!\n    @brief return whether value is a floating-point number\n\n    This function returns true if and only if the JSON value is a\n    floating-point number. 
This excludes signed and unsigned integer values.\n\n    @return `true` if type is a floating-point number, `false` otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_number_float()` for all\n    JSON types.,is_number_float}\n\n    @sa see @ref is_number() -- check if value is number\n    @sa see @ref is_number_integer() -- check if value is an integer number\n    @sa see @ref is_number_unsigned() -- check if value is an unsigned integer\n    number\n\n    @since version 1.0.0\n    */\n    constexpr bool is_number_float() const noexcept\n    {\n        return m_type == value_t::number_float;\n    }\n\n    /*!\n    @brief return whether value is an object\n\n    This function returns true if and only if the JSON value is an object.\n\n    @return `true` if type is object, `false` otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_object()` for all JSON\n    types.,is_object}\n\n    @since version 1.0.0\n    */\n    constexpr bool is_object() const noexcept\n    {\n        return m_type == value_t::object;\n    }\n\n    /*!\n    @brief return whether value is an array\n\n    This function returns true if and only if the JSON value is an array.\n\n    @return `true` if type is array, `false` otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_array()` for all JSON\n    types.,is_array}\n\n    @since version 1.0.0\n    */\n    constexpr bool is_array() const noexcept\n    {\n        return m_type == value_t::array;\n    }\n\n    /*!\n    @brief return whether value is a string\n\n    This function returns true if and only if the JSON value is a string.\n\n    
@return `true` if type is string, `false` otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_string()` for all JSON\n    types.,is_string}\n\n    @since version 1.0.0\n    */\n    constexpr bool is_string() const noexcept\n    {\n        return m_type == value_t::string;\n    }\n\n    /*!\n    @brief return whether value is a binary array\n\n    This function returns true if and only if the JSON value is a binary array.\n\n    @return `true` if type is binary array, `false` otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_binary()` for all JSON\n    types.,is_binary}\n\n    @since version 3.8.0\n    */\n    constexpr bool is_binary() const noexcept\n    {\n        return m_type == value_t::binary;\n    }\n\n    /*!\n    @brief return whether value is discarded\n\n    This function returns true if and only if the JSON value was discarded\n    during parsing with a callback function (see @ref parser_callback_t).\n\n    @note This function will always be `false` for JSON values after parsing.\n    That is, discarded values can only occur during parsing, but will be\n    removed when inside a structured value or replaced by null in other cases.\n\n    @return `true` if type is discarded, `false` otherwise.\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies `is_discarded()` for all JSON\n    types.,is_discarded}\n\n    @since version 1.0.0\n    */\n    constexpr bool is_discarded() const noexcept\n    {\n        return m_type == value_t::discarded;\n    }\n\n    /*!\n    @brief return the type of the JSON value (implicit)\n\n    Implicitly return the type of the JSON value 
as a value from the @ref\n    value_t enumeration.\n\n    @return the type of the JSON value\n\n    @complexity Constant.\n\n    @exceptionsafety No-throw guarantee: this member function never throws\n    exceptions.\n\n    @liveexample{The following code exemplifies the @ref value_t operator for\n    all JSON types.,operator__value_t}\n\n    @sa see @ref type() -- return the type of the JSON value (explicit)\n    @sa see @ref type_name() -- return the type as string\n\n    @since version 1.0.0\n    */\n    constexpr operator value_t() const noexcept\n    {\n        return m_type;\n    }\n\n    /// @}\n\n  private:\n    //////////////////\n    // value access //\n    //////////////////\n\n    /// get a boolean (explicit)\n    boolean_t get_impl(boolean_t* /*unused*/) const\n    {\n        if (JSON_HEDLEY_LIKELY(is_boolean()))\n        {\n            return m_value.boolean;\n        }\n\n        JSON_THROW(type_error::create(302, \"type must be boolean, but is \" + std::string(type_name()), *this));\n    }\n\n    /// get a pointer to the value (object)\n    object_t* get_impl_ptr(object_t* /*unused*/) noexcept\n    {\n        return is_object() ? m_value.object : nullptr;\n    }\n\n    /// get a pointer to the value (object)\n    constexpr const object_t* get_impl_ptr(const object_t* /*unused*/) const noexcept\n    {\n        return is_object() ? m_value.object : nullptr;\n    }\n\n    /// get a pointer to the value (array)\n    array_t* get_impl_ptr(array_t* /*unused*/) noexcept\n    {\n        return is_array() ? m_value.array : nullptr;\n    }\n\n    /// get a pointer to the value (array)\n    constexpr const array_t* get_impl_ptr(const array_t* /*unused*/) const noexcept\n    {\n        return is_array() ? m_value.array : nullptr;\n    }\n\n    /// get a pointer to the value (string)\n    string_t* get_impl_ptr(string_t* /*unused*/) noexcept\n    {\n        return is_string() ? 
m_value.string : nullptr;\n    }\n\n    /// get a pointer to the value (string)\n    constexpr const string_t* get_impl_ptr(const string_t* /*unused*/) const noexcept\n    {\n        return is_string() ? m_value.string : nullptr;\n    }\n\n    /// get a pointer to the value (boolean)\n    boolean_t* get_impl_ptr(boolean_t* /*unused*/) noexcept\n    {\n        return is_boolean() ? &m_value.boolean : nullptr;\n    }\n\n    /// get a pointer to the value (boolean)\n    constexpr const boolean_t* get_impl_ptr(const boolean_t* /*unused*/) const noexcept\n    {\n        return is_boolean() ? &m_value.boolean : nullptr;\n    }\n\n    /// get a pointer to the value (integer number)\n    number_integer_t* get_impl_ptr(number_integer_t* /*unused*/) noexcept\n    {\n        return is_number_integer() ? &m_value.number_integer : nullptr;\n    }\n\n    /// get a pointer to the value (integer number)\n    constexpr const number_integer_t* get_impl_ptr(const number_integer_t* /*unused*/) const noexcept\n    {\n        return is_number_integer() ? &m_value.number_integer : nullptr;\n    }\n\n    /// get a pointer to the value (unsigned number)\n    number_unsigned_t* get_impl_ptr(number_unsigned_t* /*unused*/) noexcept\n    {\n        return is_number_unsigned() ? &m_value.number_unsigned : nullptr;\n    }\n\n    /// get a pointer to the value (unsigned number)\n    constexpr const number_unsigned_t* get_impl_ptr(const number_unsigned_t* /*unused*/) const noexcept\n    {\n        return is_number_unsigned() ? &m_value.number_unsigned : nullptr;\n    }\n\n    /// get a pointer to the value (floating-point number)\n    number_float_t* get_impl_ptr(number_float_t* /*unused*/) noexcept\n    {\n        return is_number_float() ? &m_value.number_float : nullptr;\n    }\n\n    /// get a pointer to the value (floating-point number)\n    constexpr const number_float_t* get_impl_ptr(const number_float_t* /*unused*/) const noexcept\n    {\n        return is_number_float() ? 
&m_value.number_float : nullptr;\n    }\n\n    /// get a pointer to the value (binary)\n    binary_t* get_impl_ptr(binary_t* /*unused*/) noexcept\n    {\n        return is_binary() ? m_value.binary : nullptr;\n    }\n\n    /// get a pointer to the value (binary)\n    constexpr const binary_t* get_impl_ptr(const binary_t* /*unused*/) const noexcept\n    {\n        return is_binary() ? m_value.binary : nullptr;\n    }\n\n    /*!\n    @brief helper function to implement get_ref()\n\n    This function helps to implement get_ref() without code duplication for\n    const and non-const overloads.\n\n    @tparam ThisType will be deduced as `basic_json` or `const basic_json`\n\n    @throw type_error.303 if ReferenceType does not match the underlying value\n    type of the current JSON value\n    */\n    template<typename ReferenceType, typename ThisType>\n    static ReferenceType get_ref_impl(ThisType& obj)\n    {\n        // delegate the call to get_ptr<>()\n        auto* ptr = obj.template get_ptr<typename std::add_pointer<ReferenceType>::type>();\n\n        if (JSON_HEDLEY_LIKELY(ptr != nullptr))\n        {\n            return *ptr;\n        }\n\n        JSON_THROW(type_error::create(303, \"incompatible ReferenceType for get_ref, actual type is \" + std::string(obj.type_name()), obj));\n    }\n\n  public:\n    /// @name value access\n    /// Direct access to the stored value of a JSON value.\n    /// @{\n\n    /*!\n    @brief get special-case overload\n\n    This overload avoids a lot of template boilerplate; it can be seen as the\n    identity method.\n\n    @tparam BasicJsonType == @ref basic_json\n\n    @return a copy of *this\n\n    @complexity Constant.\n\n    @since version 2.1.0\n    */\n    template<typename BasicJsonType, detail::enable_if_t<\n                 std::is_same<typename std::remove_const<BasicJsonType>::type, basic_json_t>::value,\n                 int> = 0>\n    basic_json get() const\n    {\n        return *this;\n    }\n\n    /*!\n    @brief get special-case 
overload\n\n    This overload converts the current @ref basic_json into a different\n    @ref basic_json type.\n\n    @tparam BasicJsonType == @ref basic_json\n\n    @return a copy of *this, converted into @a BasicJsonType\n\n    @complexity Depends on the implementation of the called `from_json()`\n                method.\n\n    @since version 3.2.0\n    */\n    template < typename BasicJsonType, detail::enable_if_t <\n                   !std::is_same<BasicJsonType, basic_json>::value&&\n                   detail::is_basic_json<BasicJsonType>::value, int > = 0 >\n    BasicJsonType get() const\n    {\n        return *this;\n    }\n\n    /*!\n    @brief get a value (explicit)\n\n    Explicit type conversion between the JSON value and a compatible value\n    which is [CopyConstructible](https://en.cppreference.com/w/cpp/named_req/CopyConstructible)\n    and [DefaultConstructible](https://en.cppreference.com/w/cpp/named_req/DefaultConstructible).\n    The value is converted by calling the @ref json_serializer<ValueType>\n    `from_json()` method.\n\n    The function is equivalent to executing\n    @code {.cpp}\n    ValueType ret;\n    JSONSerializer<ValueType>::from_json(*this, ret);\n    return ret;\n    @endcode\n\n    This overload is chosen if:\n    - @a ValueType is not @ref basic_json,\n    - @ref json_serializer<ValueType> has a `from_json()` method of the form\n      `void from_json(const basic_json&, ValueType&)`, and\n    - @ref json_serializer<ValueType> does not have a `from_json()` method of\n      the form `ValueType from_json(const basic_json&)`\n\n    @tparam ValueTypeCV the provided value type\n    @tparam ValueType the returned value type\n\n    @return copy of the JSON value, converted to @a ValueType\n\n    @throw what @ref json_serializer<ValueType> `from_json()` method throws\n\n    @liveexample{The example below shows several conversions from JSON values\n    to other types. 
There are a few things to note: (1) Floating-point numbers can\n    be converted to integers\\, (2) A JSON array can be converted to a standard\n    `std::vector<short>`\\, (3) A JSON object can be converted to C++\n    associative containers such as `std::unordered_map<std::string\\,\n    json>`.,get__ValueType_const}\n\n    @since version 2.1.0\n    */\n    template < typename ValueTypeCV, typename ValueType = detail::uncvref_t<ValueTypeCV>,\n               detail::enable_if_t <\n                   !detail::is_basic_json<ValueType>::value &&\n                   detail::has_from_json<basic_json_t, ValueType>::value &&\n                   !detail::has_non_default_from_json<basic_json_t, ValueType>::value,\n                   int > = 0 >\n    ValueType get() const noexcept(noexcept(\n                                       JSONSerializer<ValueType>::from_json(std::declval<const basic_json_t&>(), std::declval<ValueType&>())))\n    {\n        // we cannot static_assert on ValueTypeCV being non-const, because\n        // there is support for get<const basic_json_t>(), which is why we\n        // still need the uncvref\n        static_assert(!std::is_reference<ValueTypeCV>::value,\n                      \"get() cannot be used with reference types, you might want to use get_ref()\");\n        static_assert(std::is_default_constructible<ValueType>::value,\n                      \"types must be DefaultConstructible when used with get()\");\n\n        ValueType ret{};\n        JSONSerializer<ValueType>::from_json(*this, ret);\n        return ret;\n    }\n\n    /*!\n    @brief get a value (explicit); special case\n\n    Explicit type conversion between the JSON value and a compatible value\n    which is **not** [CopyConstructible](https://en.cppreference.com/w/cpp/named_req/CopyConstructible)\n    and **not** [DefaultConstructible](https://en.cppreference.com/w/cpp/named_req/DefaultConstructible).\n    The value is converted by calling the @ref json_serializer<ValueType>\n    
`from_json()` method.\n\n    The function is equivalent to executing\n    @code {.cpp}\n    return JSONSerializer<ValueType>::from_json(*this);\n    @endcode\n\n    This overload is chosen if:\n    - @a ValueType is not @ref basic_json and\n    - @ref json_serializer<ValueType> has a `from_json()` method of the form\n      `ValueType from_json(const basic_json&)`\n\n    @note If @ref json_serializer<ValueType> has both overloads of\n    `from_json()`, this one is chosen.\n\n    @tparam ValueTypeCV the provided value type\n    @tparam ValueType the returned value type\n\n    @return copy of the JSON value, converted to @a ValueType\n\n    @throw what @ref json_serializer<ValueType> `from_json()` method throws\n\n    @since version 2.1.0\n    */\n    template < typename ValueTypeCV, typename ValueType = detail::uncvref_t<ValueTypeCV>,\n               detail::enable_if_t < !std::is_same<basic_json_t, ValueType>::value &&\n                                     detail::has_non_default_from_json<basic_json_t, ValueType>::value,\n                                     int > = 0 >\n    ValueType get() const noexcept(noexcept(\n                                       JSONSerializer<ValueType>::from_json(std::declval<const basic_json_t&>())))\n    {\n        static_assert(!std::is_reference<ValueTypeCV>::value,\n                      \"get() cannot be used with reference types, you might want to use get_ref()\");\n        return JSONSerializer<ValueType>::from_json(*this);\n    }\n\n    /*!\n    @brief get a value (explicit)\n\n    Explicit type conversion between the JSON value and a compatible value.\n    The value is filled into the input parameter by calling the @ref json_serializer<ValueType>\n    `from_json()` method.\n\n    The function is equivalent to executing\n    @code {.cpp}\n    ValueType v;\n    JSONSerializer<ValueType>::from_json(*this, v);\n    @endcode\n\n    This overload is chosen if:\n    - @a ValueType is not @ref basic_json,\n    - @ref 
json_serializer<ValueType> has a `from_json()` method of the form\n      `void from_json(const basic_json&, ValueType&)`.\n\n    @tparam ValueType the input parameter type.\n\n    @return the input parameter, allowing calls to be chained.\n\n    @throw what @ref json_serializer<ValueType> `from_json()` method throws\n\n    @liveexample{The example below shows several conversions from JSON values\n    to other types. There are a few things to note: (1) Floating-point numbers can\n    be converted to integers\\, (2) A JSON array can be converted to a standard\n    `std::vector<short>`\\, (3) A JSON object can be converted to C++\n    associative containers such as `std::unordered_map<std::string\\,\n    json>`.,get_to}\n\n    @since version 3.3.0\n    */\n    template < typename ValueType,\n               detail::enable_if_t <\n                   !detail::is_basic_json<ValueType>::value&&\n                   detail::has_from_json<basic_json_t, ValueType>::value,\n                   int > = 0 >\n    ValueType & get_to(ValueType& v) const noexcept(noexcept(\n                JSONSerializer<ValueType>::from_json(std::declval<const basic_json_t&>(), v)))\n    {\n        JSONSerializer<ValueType>::from_json(*this, v);\n        return v;\n    }\n\n    // specialization to allow calling get_to with a basic_json value\n    // see https://github.com/nlohmann/json/issues/2175\n    template<typename ValueType,\n             detail::enable_if_t <\n                 detail::is_basic_json<ValueType>::value,\n                 int> = 0>\n    ValueType & get_to(ValueType& v) const\n    {\n        v = *this;\n        return v;\n    }\n\n    template <\n        typename T, std::size_t N,\n        typename Array = T (&)[N], // NOLINT(cppcoreguidelines-avoid-c-arrays,hicpp-avoid-c-arrays,modernize-avoid-c-arrays)\n        detail::enable_if_t <\n            detail::has_from_json<basic_json_t, Array>::value, int > = 0 >\n    Array get_to(T (&v)[N]) const // 
NOLINT(cppcoreguidelines-avoid-c-arrays,hicpp-avoid-c-arrays,modernize-avoid-c-arrays)\n    noexcept(noexcept(JSONSerializer<Array>::from_json(\n                          std::declval<const basic_json_t&>(), v)))\n    {\n        JSONSerializer<Array>::from_json(*this, v);\n        return v;\n    }\n\n\n    /*!\n    @brief get a pointer value (implicit)\n\n    Implicit pointer access to the internally stored JSON value. No copies are\n    made.\n\n    @warning Writing data to the pointee of the result yields an undefined\n    state.\n\n    @tparam PointerType pointer type; must be a pointer to @ref array_t, @ref\n    object_t, @ref string_t, @ref boolean_t, @ref number_integer_t,\n    @ref number_unsigned_t, or @ref number_float_t. Enforced by a static\n    assertion.\n\n    @return pointer to the internally stored JSON value if the requested\n    pointer type @a PointerType matches the JSON value; `nullptr` otherwise\n\n    @complexity Constant.\n\n    @liveexample{The example below shows how pointers to internal values of a\n    JSON value can be requested. 
Note that no type conversions are made and a\n    `nullptr` is returned if the value and the requested pointer type do not\n    match.,get_ptr}\n\n    @since version 1.0.0\n    */\n    template<typename PointerType, typename std::enable_if<\n                 std::is_pointer<PointerType>::value, int>::type = 0>\n    auto get_ptr() noexcept -> decltype(std::declval<basic_json_t&>().get_impl_ptr(std::declval<PointerType>()))\n    {\n        // delegate the call to get_impl_ptr<>()\n        return get_impl_ptr(static_cast<PointerType>(nullptr));\n    }\n\n    /*!\n    @brief get a pointer value (implicit)\n    @copydoc get_ptr()\n    */\n    template < typename PointerType, typename std::enable_if <\n                   std::is_pointer<PointerType>::value&&\n                   std::is_const<typename std::remove_pointer<PointerType>::type>::value, int >::type = 0 >\n    constexpr auto get_ptr() const noexcept -> decltype(std::declval<const basic_json_t&>().get_impl_ptr(std::declval<PointerType>()))\n    {\n        // delegate the call to get_impl_ptr<>() const\n        return get_impl_ptr(static_cast<PointerType>(nullptr));\n    }\n\n    /*!\n    @brief get a pointer value (explicit)\n\n    Explicit pointer access to the internally stored JSON value. No copies are\n    made.\n\n    @warning The pointer becomes invalid if the underlying JSON object\n    changes.\n\n    @tparam PointerType pointer type; must be a pointer to @ref array_t, @ref\n    object_t, @ref string_t, @ref boolean_t, @ref number_integer_t,\n    @ref number_unsigned_t, or @ref number_float_t.\n\n    @return pointer to the internally stored JSON value if the requested\n    pointer type @a PointerType matches the JSON value; `nullptr` otherwise\n\n    @complexity Constant.\n\n    @liveexample{The example below shows how pointers to internal values of a\n    JSON value can be requested. 
Note that no type conversions are made and a\n    `nullptr` is returned if the value and the requested pointer type do not\n    match.,get__PointerType}\n\n    @sa see @ref get_ptr() for explicit pointer-member access\n\n    @since version 1.0.0\n    */\n    template<typename PointerType, typename std::enable_if<\n                 std::is_pointer<PointerType>::value, int>::type = 0>\n    auto get() noexcept -> decltype(std::declval<basic_json_t&>().template get_ptr<PointerType>())\n    {\n        // delegate the call to get_ptr\n        return get_ptr<PointerType>();\n    }\n\n    /*!\n    @brief get a pointer value (explicit)\n    @copydoc get()\n    */\n    template<typename PointerType, typename std::enable_if<\n                 std::is_pointer<PointerType>::value, int>::type = 0>\n    constexpr auto get() const noexcept -> decltype(std::declval<const basic_json_t&>().template get_ptr<PointerType>())\n    {\n        // delegate the call to get_ptr\n        return get_ptr<PointerType>();\n    }\n\n    /*!\n    @brief get a reference value (implicit)\n\n    Implicit reference access to the internally stored JSON value. No copies\n    are made.\n\n    @warning Writing data to the referent of the result yields an undefined\n    state.\n\n    @tparam ReferenceType reference type; must be a reference to @ref array_t,\n    @ref object_t, @ref string_t, @ref boolean_t, @ref number_integer_t, or\n    @ref number_float_t. 
Enforced by static assertion.\n\n    @return reference to the internally stored JSON value if the requested\n    reference type @a ReferenceType matches the JSON value; throws\n    type_error.303 otherwise\n\n    @throw type_error.303 if the passed type @a ReferenceType is incompatible\n    with the stored JSON value; see example below\n\n    @complexity Constant.\n\n    @liveexample{The example shows several calls to `get_ref()`.,get_ref}\n\n    @since version 1.1.0\n    */\n    template<typename ReferenceType, typename std::enable_if<\n                 std::is_reference<ReferenceType>::value, int>::type = 0>\n    ReferenceType get_ref()\n    {\n        // delegate call to get_ref_impl\n        return get_ref_impl<ReferenceType>(*this);\n    }\n\n    /*!\n    @brief get a reference value (implicit)\n    @copydoc get_ref()\n    */\n    template < typename ReferenceType, typename std::enable_if <\n                   std::is_reference<ReferenceType>::value&&\n                   std::is_const<typename std::remove_reference<ReferenceType>::type>::value, int >::type = 0 >\n    ReferenceType get_ref() const\n    {\n        // delegate call to get_ref_impl\n        return get_ref_impl<ReferenceType>(*this);\n    }\n\n    /*!\n    @brief get a value (implicit)\n\n    Implicit type conversion between the JSON value and a compatible value.\n    The call is realized by calling @ref get() const.\n\n    @tparam ValueType non-pointer type compatible with the JSON value, for\n    instance `int` for JSON integer numbers, `bool` for JSON booleans, or\n    `std::vector` types for JSON arrays. 
The character type of @ref string_t\n    as well as an initializer list of this type is excluded to avoid\n    ambiguities as these types implicitly convert to `std::string`.\n\n    @return copy of the JSON value, converted to type @a ValueType\n\n    @throw type_error.302 if the passed type @a ValueType is incompatible\n    with the JSON value type (e.g., the JSON value is of type boolean, but a\n    string is requested); see example below\n\n    @complexity Linear in the size of the JSON value.\n\n    @liveexample{The example below shows several conversions from JSON values\n    to other types. There are a few things to note: (1) Floating-point numbers can\n    be converted to integers\\, (2) A JSON array can be converted to a standard\n    `std::vector<short>`\\, (3) A JSON object can be converted to C++\n    associative containers such as `std::unordered_map<std::string\\,\n    json>`.,operator__ValueType}\n\n    @since version 1.0.0\n    */\n    template < typename ValueType, typename std::enable_if <\n                   !std::is_pointer<ValueType>::value&&\n                   !std::is_same<ValueType, detail::json_ref<basic_json>>::value&&\n                   !std::is_same<ValueType, typename string_t::value_type>::value&&\n                   !detail::is_basic_json<ValueType>::value\n                   && !std::is_same<ValueType, std::initializer_list<typename string_t::value_type>>::value\n#if defined(JSON_HAS_CPP_17) && (defined(__GNUC__) || (defined(_MSC_VER) && _MSC_VER >= 1910 && _MSC_VER <= 1914))\n                   && !std::is_same<ValueType, typename std::string_view>::value\n#endif\n                   && detail::is_detected<detail::get_template_function, const basic_json_t&, ValueType>::value\n                   , int >::type = 0 >\n    JSON_EXPLICIT operator ValueType() const\n    {\n        // delegate the call to get<>() const\n        return get<ValueType>();\n    }\n\n    /*!\n    @return reference to the binary value\n\n    @throw type_error.302 if 
the value is not binary\n\n    @sa see @ref is_binary() to check if the value is binary\n\n    @since version 3.8.0\n    */\n    binary_t& get_binary()\n    {\n        if (!is_binary())\n        {\n            JSON_THROW(type_error::create(302, \"type must be binary, but is \" + std::string(type_name()), *this));\n        }\n\n        return *get_ptr<binary_t*>();\n    }\n\n    /// @copydoc get_binary()\n    const binary_t& get_binary() const\n    {\n        if (!is_binary())\n        {\n            JSON_THROW(type_error::create(302, \"type must be binary, but is \" + std::string(type_name()), *this));\n        }\n\n        return *get_ptr<const binary_t*>();\n    }\n\n    /// @}\n\n\n    ////////////////////\n    // element access //\n    ////////////////////\n\n    /// @name element access\n    /// Access to the JSON value.\n    /// @{\n\n    /*!\n    @brief access specified array element with bounds checking\n\n    Returns a reference to the element at specified location @a idx, with\n    bounds checking.\n\n    @param[in] idx  index of the element to access\n\n    @return reference to the element at index @a idx\n\n    @throw type_error.304 if the JSON value is not an array; in this case,\n    calling `at` with an index makes no sense. See example below.\n    @throw out_of_range.401 if the index @a idx is out of range of the array;\n    that is, `idx >= size()`. See example below.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes in the JSON value.\n\n    @complexity Constant.\n\n    @since version 1.0.0\n\n    @liveexample{The example below shows how array elements can be read and\n    written using `at()`. 
It also demonstrates the different exceptions that\n    can be thrown.,at__size_type}\n    */\n    reference at(size_type idx)\n    {\n        // at only works for arrays\n        if (JSON_HEDLEY_LIKELY(is_array()))\n        {\n            JSON_TRY\n            {\n                return set_parent(m_value.array->at(idx));\n            }\n            JSON_CATCH (std::out_of_range&)\n            {\n                // create better exception explanation\n                JSON_THROW(out_of_range::create(401, \"array index \" + std::to_string(idx) + \" is out of range\", *this));\n            }\n        }\n        else\n        {\n            JSON_THROW(type_error::create(304, \"cannot use at() with \" + std::string(type_name()), *this));\n        }\n    }\n\n    /*!\n    @brief access specified array element with bounds checking\n\n    Returns a const reference to the element at specified location @a idx,\n    with bounds checking.\n\n    @param[in] idx  index of the element to access\n\n    @return const reference to the element at index @a idx\n\n    @throw type_error.304 if the JSON value is not an array; in this case,\n    calling `at` with an index makes no sense. See example below.\n    @throw out_of_range.401 if the index @a idx is out of range of the array;\n    that is, `idx >= size()`. See example below.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes in the JSON value.\n\n    @complexity Constant.\n\n    @since version 1.0.0\n\n    @liveexample{The example below shows how array elements can be read using\n    `at()`. 
It also demonstrates the different exceptions that can be thrown.,\n    at__size_type_const}\n    */\n    const_reference at(size_type idx) const\n    {\n        // at only works for arrays\n        if (JSON_HEDLEY_LIKELY(is_array()))\n        {\n            JSON_TRY\n            {\n                return m_value.array->at(idx);\n            }\n            JSON_CATCH (std::out_of_range&)\n            {\n                // create better exception explanation\n                JSON_THROW(out_of_range::create(401, \"array index \" + std::to_string(idx) + \" is out of range\", *this));\n            }\n        }\n        else\n        {\n            JSON_THROW(type_error::create(304, \"cannot use at() with \" + std::string(type_name()), *this));\n        }\n    }\n\n    /*!\n    @brief access specified object element with bounds checking\n\n    Returns a reference to the element with the specified key @a key, with\n    bounds checking.\n\n    @param[in] key  key of the element to access\n\n    @return reference to the element at key @a key\n\n    @throw type_error.304 if the JSON value is not an object; in this case,\n    calling `at` with a key makes no sense. See example below.\n    @throw out_of_range.403 if the key @a key is not stored in the object;\n    that is, `find(key) == end()`. See example below.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes in the JSON value.\n\n    @complexity Logarithmic in the size of the container.\n\n    @sa see @ref operator[](const typename object_t::key_type&) for unchecked\n    access by reference\n    @sa see @ref value() for access by value with a default value\n\n    @since version 1.0.0\n\n    @liveexample{The example below shows how object elements can be read and\n    written using `at()`. 
It also demonstrates the different exceptions that\n    can be thrown.,at__object_t_key_type}\n    */\n    reference at(const typename object_t::key_type& key)\n    {\n        // at only works for objects\n        if (JSON_HEDLEY_LIKELY(is_object()))\n        {\n            JSON_TRY\n            {\n                return set_parent(m_value.object->at(key));\n            }\n            JSON_CATCH (std::out_of_range&)\n            {\n                // create better exception explanation\n                JSON_THROW(out_of_range::create(403, \"key '\" + key + \"' not found\", *this));\n            }\n        }\n        else\n        {\n            JSON_THROW(type_error::create(304, \"cannot use at() with \" + std::string(type_name()), *this));\n        }\n    }\n\n    /*!\n    @brief access specified object element with bounds checking\n\n    Returns a const reference to the element with the specified key @a key,\n    with bounds checking.\n\n    @param[in] key  key of the element to access\n\n    @return const reference to the element at key @a key\n\n    @throw type_error.304 if the JSON value is not an object; in this case,\n    calling `at` with a key makes no sense. See example below.\n    @throw out_of_range.403 if the key @a key is not stored in the object;\n    that is, `find(key) == end()`. See example below.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes in the JSON value.\n\n    @complexity Logarithmic in the size of the container.\n\n    @sa see @ref operator[](const typename object_t::key_type&) for unchecked\n    access by reference\n    @sa see @ref value() for access by value with a default value\n\n    @since version 1.0.0\n\n    @liveexample{The example below shows how object elements can be read using\n    `at()`. 
It also demonstrates the different exceptions that can be thrown.,\n    at__object_t_key_type_const}\n    */\n    const_reference at(const typename object_t::key_type& key) const\n    {\n        // at only works for objects\n        if (JSON_HEDLEY_LIKELY(is_object()))\n        {\n            JSON_TRY\n            {\n                return m_value.object->at(key);\n            }\n            JSON_CATCH (std::out_of_range&)\n            {\n                // create better exception explanation\n                JSON_THROW(out_of_range::create(403, \"key '\" + key + \"' not found\", *this));\n            }\n        }\n        else\n        {\n            JSON_THROW(type_error::create(304, \"cannot use at() with \" + std::string(type_name()), *this));\n        }\n    }\n\n    /*!\n    @brief access specified array element\n\n    Returns a reference to the element at specified location @a idx.\n\n    @note If @a idx is beyond the range of the array (i.e., `idx >= size()`),\n    then the array is silently filled up with `null` values to make `idx` a\n    valid reference to the last stored element.\n\n    @param[in] idx  index of the element to access\n\n    @return reference to the element at index @a idx\n\n    @throw type_error.305 if the JSON value is not an array or null; in these\n    cases, using the [] operator with an index makes no sense.\n\n    @complexity Constant if @a idx is in the range of the array. Otherwise\n    linear in `idx - size()`.\n\n    @liveexample{The example below shows how array elements can be read and\n    written using the `[]` operator. 
Note the addition of `null`\n    values.,operatorarray__size_type}\n\n    @since version 1.0.0\n    */\n    reference operator[](size_type idx)\n    {\n        // implicitly convert null value to an empty array\n        if (is_null())\n        {\n            m_type = value_t::array;\n            m_value.array = create<array_t>();\n            assert_invariant();\n        }\n\n        // operator[] only works for arrays\n        if (JSON_HEDLEY_LIKELY(is_array()))\n        {\n            // fill up array with null values if given idx is outside range\n            if (idx >= m_value.array->size())\n            {\n#if JSON_DIAGNOSTICS\n                // remember array size before resizing\n                const auto previous_size = m_value.array->size();\n#endif\n                m_value.array->resize(idx + 1);\n\n#if JSON_DIAGNOSTICS\n                // set parent for values added above\n                set_parents(begin() + static_cast<typename iterator::difference_type>(previous_size), static_cast<typename iterator::difference_type>(idx + 1 - previous_size));\n#endif\n            }\n\n            return m_value.array->operator[](idx);\n        }\n\n        JSON_THROW(type_error::create(305, \"cannot use operator[] with a numeric argument with \" + std::string(type_name()), *this));\n    }\n\n    /*!\n    @brief access specified array element\n\n    Returns a const reference to the element at specified location @a idx.\n\n    @param[in] idx  index of the element to access\n\n    @return const reference to the element at index @a idx\n\n    @throw type_error.305 if the JSON value is not an array; in that case,\n    using the [] operator with an index makes no sense.\n\n    @complexity Constant.\n\n    @liveexample{The example below shows how array elements can be read using\n    the `[]` operator.,operatorarray__size_type_const}\n\n    @since version 1.0.0\n    */\n    const_reference operator[](size_type idx) const\n    {\n        // const operator[] only works for 
arrays\n        if (JSON_HEDLEY_LIKELY(is_array()))\n        {\n            return m_value.array->operator[](idx);\n        }\n\n        JSON_THROW(type_error::create(305, \"cannot use operator[] with a numeric argument with \" + std::string(type_name()), *this));\n    }\n\n    /*!\n    @brief access specified object element\n\n    Returns a reference to the element with the specified key @a key.\n\n    @note If @a key is not found in the object, then it is silently added to\n    the object and filled with a `null` value to make `key` a valid reference.\n    In case the value was `null` before, it is converted to an object.\n\n    @param[in] key  key of the element to access\n\n    @return reference to the element at key @a key\n\n    @throw type_error.305 if the JSON value is not an object or null; in these\n    cases, using the [] operator with a key makes no sense.\n\n    @complexity Logarithmic in the size of the container.\n\n    @liveexample{The example below shows how object elements can be read and\n    written using the `[]` operator.,operatorarray__key_type}\n\n    @sa see @ref at(const typename object_t::key_type&) for access by reference\n    with range checking\n    @sa see @ref value() for access by value with a default value\n\n    @since version 1.0.0\n    */\n    reference operator[](const typename object_t::key_type& key)\n    {\n        // implicitly convert null value to an empty object\n        if (is_null())\n        {\n            m_type = value_t::object;\n            m_value.object = create<object_t>();\n            assert_invariant();\n        }\n\n        // operator[] only works for objects\n        if (JSON_HEDLEY_LIKELY(is_object()))\n        {\n            return set_parent(m_value.object->operator[](key));\n        }\n\n        JSON_THROW(type_error::create(305, \"cannot use operator[] with a string argument with \" + std::string(type_name()), *this));\n    }\n\n    /*!\n    @brief read-only access specified object element\n\n    
Returns a const reference to the element with the specified key @a key. No\n    bounds checking is performed.\n\n    @warning If the element with key @a key does not exist, the behavior is\n    undefined.\n\n    @param[in] key  key of the element to access\n\n    @return const reference to the element at key @a key\n\n    @pre The element with key @a key must exist. **This precondition is\n         enforced with an assertion.**\n\n    @throw type_error.305 if the JSON value is not an object; in that case,\n    using the [] operator with a key makes no sense.\n\n    @complexity Logarithmic in the size of the container.\n\n    @liveexample{The example below shows how object elements can be read using\n    the `[]` operator.,operatorarray__key_type_const}\n\n    @sa see @ref at(const typename object_t::key_type&) for access by reference\n    with range checking\n    @sa see @ref value() for access by value with a default value\n\n    @since version 1.0.0\n    */\n    const_reference operator[](const typename object_t::key_type& key) const\n    {\n        // const operator[] only works for objects\n        if (JSON_HEDLEY_LIKELY(is_object()))\n        {\n            JSON_ASSERT(m_value.object->find(key) != m_value.object->end());\n            return m_value.object->find(key)->second;\n        }\n\n        JSON_THROW(type_error::create(305, \"cannot use operator[] with a string argument with \" + std::string(type_name()), *this));\n    }\n\n    /*!\n    @brief access specified object element\n\n    Returns a reference to the element with the specified key @a key.\n\n    @note If @a key is not found in the object, then it is silently added to\n    the object and filled with a `null` value to make `key` a valid reference.\n    In case the value was `null` before, it is converted to an object.\n\n    @param[in] key  key of the element to access\n\n    @return reference to the element at key @a key\n\n    @throw type_error.305 if the JSON value is not an object or null; in 
these\n    cases, using the [] operator with a key makes no sense.\n\n    @complexity Logarithmic in the size of the container.\n\n    @liveexample{The example below shows how object elements can be read and\n    written using the `[]` operator.,operatorarray__key_type}\n\n    @sa see @ref at(const typename object_t::key_type&) for access by reference\n    with range checking\n    @sa see @ref value() for access by value with a default value\n\n    @since version 1.1.0\n    */\n    template<typename T>\n    JSON_HEDLEY_NON_NULL(2)\n    reference operator[](T* key)\n    {\n        // implicitly convert null to object\n        if (is_null())\n        {\n            m_type = value_t::object;\n            m_value = value_t::object;\n            assert_invariant();\n        }\n\n        // operator[] only works for objects\n        if (JSON_HEDLEY_LIKELY(is_object()))\n        {\n            return set_parent(m_value.object->operator[](key));\n        }\n\n        JSON_THROW(type_error::create(305, \"cannot use operator[] with a string argument with \" + std::string(type_name()), *this));\n    }\n\n    /*!\n    @brief read-only access specified object element\n\n    Returns a const reference to the element with the specified key @a key. No\n    bounds checking is performed.\n\n    @warning If the element with key @a key does not exist, the behavior is\n    undefined.\n\n    @param[in] key  key of the element to access\n\n    @return const reference to the element at key @a key\n\n    @pre The element with key @a key must exist. 
**This precondition is\n         enforced with an assertion.**\n\n    @throw type_error.305 if the JSON value is not an object; in that case,\n    using the [] operator with a key makes no sense.\n\n    @complexity Logarithmic in the size of the container.\n\n    @liveexample{The example below shows how object elements can be read using\n    the `[]` operator.,operatorarray__key_type_const}\n\n    @sa see @ref at(const typename object_t::key_type&) for access by reference\n    with range checking\n    @sa see @ref value() for access by value with a default value\n\n    @since version 1.1.0\n    */\n    template<typename T>\n    JSON_HEDLEY_NON_NULL(2)\n    const_reference operator[](T* key) const\n    {\n        // const operator[] only works for objects\n        if (JSON_HEDLEY_LIKELY(is_object()))\n        {\n            JSON_ASSERT(m_value.object->find(key) != m_value.object->end());\n            return m_value.object->find(key)->second;\n        }\n\n        JSON_THROW(type_error::create(305, \"cannot use operator[] with a string argument with \" + std::string(type_name()), *this));\n    }\n\n    /*!\n    @brief access specified object element with default value\n\n    Returns either a copy of an object's element at the specified key @a key\n    or a given default value if no element with key @a key exists.\n\n    The function is basically equivalent to executing\n    @code {.cpp}\n    try {\n        return at(key);\n    } catch(out_of_range) {\n        return default_value;\n    }\n    @endcode\n\n    @note Unlike @ref at(const typename object_t::key_type&), this function\n    does not throw if the given key @a key was not found.\n\n    @note Unlike @ref operator[](const typename object_t::key_type& key), this\n    function does not implicitly add an element to the position defined by @a\n    key. 
This function is also applicable to const objects.\n\n    @param[in] key  key of the element to access\n    @param[in] default_value  the value to return if @a key is not found\n\n    @tparam ValueType type compatible with JSON values, for instance `int` for\n    JSON integer numbers, `bool` for JSON booleans, or `std::vector` types for\n    JSON arrays. Note that the type of the expected value at @a key and the default\n    value @a default_value must be compatible.\n\n    @return copy of the element at key @a key or @a default_value if @a key\n    is not found\n\n    @throw type_error.302 if @a default_value does not match the type of the\n    value at @a key\n    @throw type_error.306 if the JSON value is not an object; in that case,\n    using `value()` with a key makes no sense.\n\n    @complexity Logarithmic in the size of the container.\n\n    @liveexample{The example below shows how object elements can be queried\n    with a default value.,basic_json__value}\n\n    @sa see @ref at(const typename object_t::key_type&) for access by reference\n    with range checking\n    @sa see @ref operator[](const typename object_t::key_type&) for unchecked\n    access by reference\n\n    @since version 1.0.0\n    */\n    // using std::is_convertible in a std::enable_if will fail when using explicit conversions\n    template < class ValueType, typename std::enable_if <\n                   detail::is_getable<basic_json_t, ValueType>::value\n                   && !std::is_same<value_t, ValueType>::value, int >::type = 0 >\n    ValueType value(const typename object_t::key_type& key, const ValueType& default_value) const\n    {\n        // value() only works for objects\n        if (JSON_HEDLEY_LIKELY(is_object()))\n        {\n            // if key is found, return its value; otherwise, return the given default value\n            const auto it = find(key);\n            if (it != end())\n            {\n                return it->template get<ValueType>();\n            }\n\n            
return default_value;\n        }\n\n        JSON_THROW(type_error::create(306, \"cannot use value() with \" + std::string(type_name()), *this));\n    }\n\n    /*!\n    @brief overload for a default value of type const char*\n    @copydoc basic_json::value(const typename object_t::key_type&, const ValueType&) const\n    */\n    string_t value(const typename object_t::key_type& key, const char* default_value) const\n    {\n        return value(key, string_t(default_value));\n    }\n\n    /*!\n    @brief access specified object element via JSON Pointer with default value\n\n    Returns either a copy of the element referenced by the JSON pointer @a ptr\n    or a given default value if @a ptr does not resolve to a value.\n\n    The function is basically equivalent to executing\n    @code {.cpp}\n    try {\n        return at(ptr);\n    } catch(out_of_range) {\n        return default_value;\n    }\n    @endcode\n\n    @note Unlike @ref at(const json_pointer&), this function does not throw\n    if @a ptr does not resolve to a value.\n\n    @param[in] ptr  a JSON pointer to the element to access\n    @param[in] default_value  the value to return if @a ptr does not resolve to a value\n\n    @tparam ValueType type compatible with JSON values, for instance `int` for\n    JSON integer numbers, `bool` for JSON booleans, or `std::vector` types for\n    JSON arrays. 
Note that the type of the expected value at @a ptr and the default\n    value @a default_value must be compatible.\n\n    @return copy of the element at @a ptr or @a default_value if @a ptr\n    does not resolve to a value\n\n    @throw type_error.302 if @a default_value does not match the type of the\n    value at @a ptr\n    @throw type_error.306 if the JSON value is not an object; in that case,\n    using `value()` with a JSON pointer makes no sense.\n\n    @complexity Logarithmic in the size of the container.\n\n    @liveexample{The example below shows how object elements can be queried\n    with a default value.,basic_json__value_ptr}\n\n    @sa see @ref operator[](const json_pointer&) for unchecked access by reference\n\n    @since version 2.0.2\n    */\n    template<class ValueType, typename std::enable_if<\n                 detail::is_getable<basic_json_t, ValueType>::value, int>::type = 0>\n    ValueType value(const json_pointer& ptr, const ValueType& default_value) const\n    {\n        // value() only works for objects\n        if (JSON_HEDLEY_LIKELY(is_object()))\n        {\n            // if the pointer resolves to a value, return it; otherwise, use the default value\n            JSON_TRY\n            {\n                return ptr.get_checked(this).template get<ValueType>();\n            }\n            JSON_INTERNAL_CATCH (out_of_range&)\n            {\n                return default_value;\n            }\n        }\n\n        JSON_THROW(type_error::create(306, \"cannot use value() with \" + std::string(type_name()), *this));\n    }\n\n    /*!\n    @brief overload for a default value of type const char*\n    @copydoc basic_json::value(const json_pointer&, const ValueType&) const\n    */\n    JSON_HEDLEY_NON_NULL(3)\n    string_t value(const json_pointer& ptr, const char* default_value) const\n    {\n        return value(ptr, string_t(default_value));\n    }\n\n    /*!\n    @brief access the first element\n\n    Returns a reference to the first element in the container. 
For a JSON\n    container `c`, the expression `c.front()` is equivalent to `*c.begin()`.\n\n    @return In case of a structured type (array or object), a reference to the\n    first element is returned. In case of number, string, boolean, or binary\n    values, a reference to the value is returned.\n\n    @complexity Constant.\n\n    @pre The JSON value must not be `null` (would throw `invalid_iterator.214`)\n    or an empty array or object (undefined behavior, **guarded by\n    assertions**).\n    @post The JSON value remains unchanged.\n\n    @throw invalid_iterator.214 when called on a `null` value\n\n    @liveexample{The following code shows an example for `front()`.,front}\n\n    @sa see @ref back() -- access the last element\n\n    @since version 1.0.0\n    */\n    reference front()\n    {\n        return *begin();\n    }\n\n    /*!\n    @copydoc basic_json::front()\n    */\n    const_reference front() const\n    {\n        return *cbegin();\n    }\n\n    /*!\n    @brief access the last element\n\n    Returns a reference to the last element in the container. For a JSON\n    container `c`, the expression `c.back()` is equivalent to\n    @code {.cpp}\n    auto tmp = c.end();\n    --tmp;\n    return *tmp;\n    @endcode\n\n    @return In case of a structured type (array or object), a reference to the\n    last element is returned. In case of number, string, boolean, or binary\n    values, a reference to the value is returned.\n\n    @complexity Constant.\n\n    @pre The JSON value must not be `null` (would throw `invalid_iterator.214`)\n    or an empty array or object (undefined behavior, **guarded by\n    assertions**).\n    @post The JSON value remains unchanged.\n\n    @throw invalid_iterator.214 when called on a `null` value. 
See example\n    below.\n\n    @liveexample{The following code shows an example for `back()`.,back}\n\n    @sa see @ref front() -- access the first element\n\n    @since version 1.0.0\n    */\n    reference back()\n    {\n        auto tmp = end();\n        --tmp;\n        return *tmp;\n    }\n\n    /*!\n    @copydoc basic_json::back()\n    */\n    const_reference back() const\n    {\n        auto tmp = cend();\n        --tmp;\n        return *tmp;\n    }\n\n    /*!\n    @brief remove element given an iterator\n\n    Removes the element specified by iterator @a pos. The iterator @a pos must\n    be valid and dereferenceable. Thus the `end()` iterator (which is valid,\n    but is not dereferenceable) cannot be used as a value for @a pos.\n\n    If called on a primitive type other than `null`, the resulting JSON value\n    will be `null`.\n\n    @param[in] pos iterator to the element to remove\n    @return Iterator following the last removed element. If the iterator @a\n    pos refers to the last element, the `end()` iterator is returned.\n\n    @tparam IteratorType an @ref iterator or @ref const_iterator\n\n    @post Invalidates iterators and references at or after the point of the\n    erase, including the `end()` iterator.\n\n    @throw type_error.307 if called on a `null` value; example: `\"cannot use\n    erase() with null\"`\n    @throw invalid_iterator.202 if called on an iterator which does not belong\n    to the current JSON value; example: `\"iterator does not fit current\n    value\"`\n    @throw invalid_iterator.205 if called on a primitive type with invalid\n    iterator (i.e., any iterator which is not `begin()`); example: `\"iterator\n    out of range\"`\n\n    @complexity The complexity depends on the type:\n    - objects: amortized constant\n    - arrays: linear in distance between @a pos and the end of the container\n    - strings and binary: linear in the length of the member\n    - other types: constant\n\n    @liveexample{The example shows the 
result of `erase()` for different JSON\n    types.,erase__IteratorType}\n\n    @sa see @ref erase(IteratorType, IteratorType) -- removes the elements in\n    the given range\n    @sa see @ref erase(const typename object_t::key_type&) -- removes the element\n    from an object at the given key\n    @sa see @ref erase(const size_type) -- removes the element from an array at\n    the given index\n\n    @since version 1.0.0\n    */\n    template < class IteratorType, typename std::enable_if <\n                   std::is_same<IteratorType, typename basic_json_t::iterator>::value ||\n                   std::is_same<IteratorType, typename basic_json_t::const_iterator>::value, int >::type\n               = 0 >\n    IteratorType erase(IteratorType pos)\n    {\n        // make sure iterator fits the current value\n        if (JSON_HEDLEY_UNLIKELY(this != pos.m_object))\n        {\n            JSON_THROW(invalid_iterator::create(202, \"iterator does not fit current value\", *this));\n        }\n\n        IteratorType result = end();\n\n        switch (m_type)\n        {\n            case value_t::boolean:\n            case value_t::number_float:\n            case value_t::number_integer:\n            case value_t::number_unsigned:\n            case value_t::string:\n            case value_t::binary:\n            {\n                if (JSON_HEDLEY_UNLIKELY(!pos.m_it.primitive_iterator.is_begin()))\n                {\n                    JSON_THROW(invalid_iterator::create(205, \"iterator out of range\", *this));\n                }\n\n                if (is_string())\n                {\n                    AllocatorType<string_t> alloc;\n                    std::allocator_traits<decltype(alloc)>::destroy(alloc, m_value.string);\n                    std::allocator_traits<decltype(alloc)>::deallocate(alloc, m_value.string, 1);\n                    m_value.string = nullptr;\n                }\n                else if (is_binary())\n                {\n                    
AllocatorType<binary_t> alloc;\n                    std::allocator_traits<decltype(alloc)>::destroy(alloc, m_value.binary);\n                    std::allocator_traits<decltype(alloc)>::deallocate(alloc, m_value.binary, 1);\n                    m_value.binary = nullptr;\n                }\n\n                m_type = value_t::null;\n                assert_invariant();\n                break;\n            }\n\n            case value_t::object:\n            {\n                result.m_it.object_iterator = m_value.object->erase(pos.m_it.object_iterator);\n                break;\n            }\n\n            case value_t::array:\n            {\n                result.m_it.array_iterator = m_value.array->erase(pos.m_it.array_iterator);\n                break;\n            }\n\n            default:\n                JSON_THROW(type_error::create(307, \"cannot use erase() with \" + std::string(type_name()), *this));\n        }\n\n        return result;\n    }\n\n    /*!\n    @brief remove elements given an iterator range\n\n    Removes the elements specified by the range `[first; last)`. The iterator\n    @a first does not need to be dereferenceable if `first == last`: erasing\n    an empty range is a no-op.\n\n    If called on a primitive type other than `null`, the resulting JSON value\n    will be `null`.\n\n    @param[in] first iterator to the beginning of the range to remove\n    @param[in] last iterator past the end of the range to remove\n    @return Iterator following the last removed element. 
If @a\n    last equals `end()`, the `end()` iterator is returned.\n\n    @tparam IteratorType an @ref iterator or @ref const_iterator\n\n    @post Invalidates iterators and references at or after the point of the\n    erase, including the `end()` iterator.\n\n    @throw type_error.307 if called on a `null` value; example: `\"cannot use\n    erase() with null\"`\n    @throw invalid_iterator.203 if called on iterators which do not belong\n    to the current JSON value; example: `\"iterators do not fit current value\"`\n    @throw invalid_iterator.204 if called on a primitive type with invalid\n    iterators (i.e., if `first != begin()` or `last != end()`); example:\n    `\"iterators out of range\"`\n\n    @complexity The complexity depends on the type:\n    - objects: `log(size()) + std::distance(first, last)`\n    - arrays: linear in the distance between @a first and @a last, plus linear\n      in the distance between @a last and end of the container\n    - strings and binary: linear in the length of the member\n    - other types: constant\n\n    @liveexample{The example shows the result of `erase()` for different JSON\n    types.,erase__IteratorType_IteratorType}\n\n    @sa see @ref erase(IteratorType) -- removes the element at a given position\n    @sa see @ref erase(const typename object_t::key_type&) -- removes the element\n    from an object at the given key\n    @sa see @ref erase(const size_type) -- removes the element from an array at\n    the given index\n\n    @since version 1.0.0\n    */\n    template < class IteratorType, typename std::enable_if <\n                   std::is_same<IteratorType, typename basic_json_t::iterator>::value ||\n                   std::is_same<IteratorType, typename basic_json_t::const_iterator>::value, int >::type\n               = 0 >\n    IteratorType erase(IteratorType first, IteratorType last)\n    {\n        // make sure iterators fit the current value\n        if (JSON_HEDLEY_UNLIKELY(this != 
first.m_object || this != last.m_object))\n        {\n            JSON_THROW(invalid_iterator::create(203, \"iterators do not fit current value\", *this));\n        }\n\n        IteratorType result = end();\n\n        switch (m_type)\n        {\n            case value_t::boolean:\n            case value_t::number_float:\n            case value_t::number_integer:\n            case value_t::number_unsigned:\n            case value_t::string:\n            case value_t::binary:\n            {\n                if (JSON_HEDLEY_LIKELY(!first.m_it.primitive_iterator.is_begin()\n                                       || !last.m_it.primitive_iterator.is_end()))\n                {\n                    JSON_THROW(invalid_iterator::create(204, \"iterators out of range\", *this));\n                }\n\n                if (is_string())\n                {\n                    AllocatorType<string_t> alloc;\n                    std::allocator_traits<decltype(alloc)>::destroy(alloc, m_value.string);\n                    std::allocator_traits<decltype(alloc)>::deallocate(alloc, m_value.string, 1);\n                    m_value.string = nullptr;\n                }\n                else if (is_binary())\n                {\n                    AllocatorType<binary_t> alloc;\n                    std::allocator_traits<decltype(alloc)>::destroy(alloc, m_value.binary);\n                    std::allocator_traits<decltype(alloc)>::deallocate(alloc, m_value.binary, 1);\n                    m_value.binary = nullptr;\n                }\n\n                m_type = value_t::null;\n                assert_invariant();\n                break;\n            }\n\n            case value_t::object:\n            {\n                result.m_it.object_iterator = m_value.object->erase(first.m_it.object_iterator,\n                                              last.m_it.object_iterator);\n                break;\n            }\n\n            case value_t::array:\n            {\n                
result.m_it.array_iterator = m_value.array->erase(first.m_it.array_iterator,\n                                             last.m_it.array_iterator);\n                break;\n            }\n\n            default:\n                JSON_THROW(type_error::create(307, \"cannot use erase() with \" + std::string(type_name()), *this));\n        }\n\n        return result;\n    }\n\n    /*!\n    @brief remove element from a JSON object given a key\n\n    Removes elements from a JSON object with the key value @a key.\n\n    @param[in] key value of the elements to remove\n\n    @return Number of elements removed. If @a ObjectType is the default\n    `std::map` type, the return value will always be `0` (@a key was not\n    found) or `1` (@a key was found).\n\n    @post References and iterators to the erased elements are invalidated.\n    Other references and iterators are not affected.\n\n    @throw type_error.307 when called on a type other than JSON object;\n    example: `\"cannot use erase() with null\"`\n\n    @complexity `log(size()) + count(key)`\n\n    @liveexample{The example shows the effect of `erase()`.,erase__key_type}\n\n    @sa see @ref erase(IteratorType) -- removes the element at a given position\n    @sa see @ref erase(IteratorType, IteratorType) -- removes the elements in\n    the given range\n    @sa see @ref erase(const size_type) -- removes the element from an array at\n    the given index\n\n    @since version 1.0.0\n    */\n    size_type erase(const typename object_t::key_type& key)\n    {\n        // this erase only works for objects\n        if (JSON_HEDLEY_LIKELY(is_object()))\n        {\n            return m_value.object->erase(key);\n        }\n\n        JSON_THROW(type_error::create(307, \"cannot use erase() with \" + std::string(type_name()), *this));\n    }\n\n    /*!\n    @brief remove element from a JSON array given an index\n\n    Removes element from a JSON array at the index @a idx.\n\n    @param[in] idx index of the element to remove\n\n   
 @throw type_error.307 when called on a type other than JSON array;\n    example: `\"cannot use erase() with null\"`\n    @throw out_of_range.401 when `idx >= size()`; example: `\"array index 17\n    is out of range\"`\n\n    @complexity Linear in the distance between @a idx and the end of the container.\n\n    @liveexample{The example shows the effect of `erase()`.,erase__size_type}\n\n    @sa see @ref erase(IteratorType) -- removes the element at a given position\n    @sa see @ref erase(IteratorType, IteratorType) -- removes the elements in\n    the given range\n    @sa see @ref erase(const typename object_t::key_type&) -- removes the element\n    from an object at the given key\n\n    @since version 1.0.0\n    */\n    void erase(const size_type idx)\n    {\n        // this erase only works for arrays\n        if (JSON_HEDLEY_LIKELY(is_array()))\n        {\n            if (JSON_HEDLEY_UNLIKELY(idx >= size()))\n            {\n                JSON_THROW(out_of_range::create(401, \"array index \" + std::to_string(idx) + \" is out of range\", *this));\n            }\n\n            m_value.array->erase(m_value.array->begin() + static_cast<difference_type>(idx));\n        }\n        else\n        {\n            JSON_THROW(type_error::create(307, \"cannot use erase() with \" + std::string(type_name()), *this));\n        }\n    }\n\n    /// @}\n\n\n    ////////////\n    // lookup //\n    ////////////\n\n    /// @name lookup\n    /// @{\n\n    /*!\n    @brief find an element in a JSON object\n\n    Finds an element in a JSON object with key equivalent to @a key. If the\n    element is not found or the JSON value is not an object, end() is\n    returned.\n\n    @note This method always returns @ref end() when executed on a JSON type\n          that is not an object.\n\n    @param[in] key key value of the element to search for.\n\n    @return Iterator to an element with key equivalent to @a key. 
If no such\n    element is found or the JSON value is not an object, the past-the-end iterator (see\n    @ref end()) is returned.\n\n    @complexity Logarithmic in the size of the JSON object.\n\n    @liveexample{The example shows how `find()` is used.,find__key_type}\n\n    @sa see @ref contains(KeyT&&) const -- checks whether a key exists\n\n    @since version 1.0.0\n    */\n    template<typename KeyT>\n    iterator find(KeyT&& key)\n    {\n        auto result = end();\n\n        if (is_object())\n        {\n            result.m_it.object_iterator = m_value.object->find(std::forward<KeyT>(key));\n        }\n\n        return result;\n    }\n\n    /*!\n    @brief find an element in a JSON object\n    @copydoc find(KeyT&&)\n    */\n    template<typename KeyT>\n    const_iterator find(KeyT&& key) const\n    {\n        auto result = cend();\n\n        if (is_object())\n        {\n            result.m_it.object_iterator = m_value.object->find(std::forward<KeyT>(key));\n        }\n\n        return result;\n    }\n\n    /*!\n    @brief returns the number of occurrences of a key in a JSON object\n\n    Returns the number of elements with key @a key. If ObjectType is the\n    default `std::map` type, the return value will always be `0` (@a key was\n    not found) or `1` (@a key was found).\n\n    @note This method always returns `0` when executed on a JSON type that is\n          not an object.\n\n    @param[in] key key value of the element to count\n\n    @return Number of elements with key @a key. If the JSON value is not an\n    object, the return value will be `0`.\n\n    @complexity Logarithmic in the size of the JSON object.\n\n    @liveexample{The example shows how `count()` is used.,count}\n\n    @since version 1.0.0\n    */\n    template<typename KeyT>\n    size_type count(KeyT&& key) const\n    {\n        // return 0 for all nonobject types\n        return is_object() ? 
m_value.object->count(std::forward<KeyT>(key)) : 0;\n    }\n\n    /*!\n    @brief check the existence of an element in a JSON object\n\n    Check whether an element exists in a JSON object with key equivalent to\n    @a key. If the element is not found or the JSON value is not an object,\n    false is returned.\n\n    @note This method always returns false when executed on a JSON type\n          that is not an object.\n\n    @param[in] key key value to check for existence.\n\n    @return true if an element with specified @a key exists. If no\n    element with such a key is found or the JSON value is not an object,\n    false is returned.\n\n    @complexity Logarithmic in the size of the JSON object.\n\n    @liveexample{The following code shows an example for `contains()`.,contains}\n\n    @sa see @ref find(KeyT&&) -- returns an iterator to an object element\n    @sa see @ref contains(const json_pointer&) const -- checks the existence of a JSON pointer\n\n    @since version 3.6.0\n    */\n    template < typename KeyT, typename std::enable_if <\n                   !std::is_same<typename std::decay<KeyT>::type, json_pointer>::value, int >::type = 0 >\n    bool contains(KeyT && key) const\n    {\n        return is_object() && m_value.object->find(std::forward<KeyT>(key)) != m_value.object->end();\n    }\n\n    /*!\n    @brief check the existence of an element in a JSON object given a JSON pointer\n\n    Check whether the given JSON pointer @a ptr can be resolved in the current\n    JSON value.\n\n    @note This method can be executed on any JSON value type.\n\n    @param[in] ptr JSON pointer to check for existence.\n\n    @return true if the JSON pointer can be resolved to a stored value, false\n    otherwise.\n\n    @post If `j.contains(ptr)` returns true, it is safe to call `j[ptr]`.\n\n    @throw parse_error.106   if an array index begins with '0'\n    @throw parse_error.109   if an array index was not a number\n\n    @complexity Logarithmic in the size of the 
JSON object.\n\n    @liveexample{The following code shows an example for `contains()`.,contains_json_pointer}\n\n    @sa see @ref contains(KeyT &&) const -- checks the existence of a key\n\n    @since version 3.7.0\n    */\n    bool contains(const json_pointer& ptr) const\n    {\n        return ptr.contains(this);\n    }\n\n    /// @}\n\n\n    ///////////////\n    // iterators //\n    ///////////////\n\n    /// @name iterators\n    /// @{\n\n    /*!\n    @brief returns an iterator to the first element\n\n    Returns an iterator to the first element.\n\n    @image html range-begin-end.svg \"Illustration from cppreference.com\"\n\n    @return iterator to the first element\n\n    @complexity Constant.\n\n    @requirement This function helps `basic_json` satisfy the\n    [Container](https://en.cppreference.com/w/cpp/named_req/Container)\n    requirements:\n    - The complexity is constant.\n\n    @liveexample{The following code shows an example for `begin()`.,begin}\n\n    @sa see @ref cbegin() -- returns a const iterator to the beginning\n    @sa see @ref end() -- returns an iterator to the end\n    @sa see @ref cend() -- returns a const iterator to the end\n\n    @since version 1.0.0\n    */\n    iterator begin() noexcept\n    {\n        iterator result(this);\n        result.set_begin();\n        return result;\n    }\n\n    /*!\n    @copydoc basic_json::cbegin()\n    */\n    const_iterator begin() const noexcept\n    {\n        return cbegin();\n    }\n\n    /*!\n    @brief returns a const iterator to the first element\n\n    Returns a const iterator to the first element.\n\n    @image html range-begin-end.svg \"Illustration from cppreference.com\"\n\n    @return const iterator to the first element\n\n    @complexity Constant.\n\n    @requirement This function helps `basic_json` satisfy the\n    [Container](https://en.cppreference.com/w/cpp/named_req/Container)\n    requirements:\n    - The complexity is constant.\n    - Has the semantics of `const_cast<const 
basic_json&>(*this).begin()`.\n\n    @liveexample{The following code shows an example for `cbegin()`.,cbegin}\n\n    @sa see @ref begin() -- returns an iterator to the beginning\n    @sa see @ref end() -- returns an iterator to the end\n    @sa see @ref cend() -- returns a const iterator to the end\n\n    @since version 1.0.0\n    */\n    const_iterator cbegin() const noexcept\n    {\n        const_iterator result(this);\n        result.set_begin();\n        return result;\n    }\n\n    /*!\n    @brief returns an iterator to one past the last element\n\n    Returns an iterator to one past the last element.\n\n    @image html range-begin-end.svg \"Illustration from cppreference.com\"\n\n    @return iterator one past the last element\n\n    @complexity Constant.\n\n    @requirement This function helps `basic_json` satisfy the\n    [Container](https://en.cppreference.com/w/cpp/named_req/Container)\n    requirements:\n    - The complexity is constant.\n\n    @liveexample{The following code shows an example for `end()`.,end}\n\n    @sa see @ref cend() -- returns a const iterator to the end\n    @sa see @ref begin() -- returns an iterator to the beginning\n    @sa see @ref cbegin() -- returns a const iterator to the beginning\n\n    @since version 1.0.0\n    */\n    iterator end() noexcept\n    {\n        iterator result(this);\n        result.set_end();\n        return result;\n    }\n\n    /*!\n    @copydoc basic_json::cend()\n    */\n    const_iterator end() const noexcept\n    {\n        return cend();\n    }\n\n    /*!\n    @brief returns a const iterator to one past the last element\n\n    Returns a const iterator to one past the last element.\n\n    @image html range-begin-end.svg \"Illustration from cppreference.com\"\n\n    @return const iterator one past the last element\n\n    @complexity Constant.\n\n    @requirement This function helps `basic_json` satisfy the\n    [Container](https://en.cppreference.com/w/cpp/named_req/Container)\n    requirements:\n  
  - The complexity is constant.\n    - Has the semantics of `const_cast<const basic_json&>(*this).end()`.\n\n    @liveexample{The following code shows an example for `cend()`.,cend}\n\n    @sa see @ref end() -- returns an iterator to the end\n    @sa see @ref begin() -- returns an iterator to the beginning\n    @sa see @ref cbegin() -- returns a const iterator to the beginning\n\n    @since version 1.0.0\n    */\n    const_iterator cend() const noexcept\n    {\n        const_iterator result(this);\n        result.set_end();\n        return result;\n    }\n\n    /*!\n    @brief returns an iterator to the reverse-beginning\n\n    Returns an iterator to the reverse-beginning; that is, the last element.\n\n    @image html range-rbegin-rend.svg \"Illustration from cppreference.com\"\n\n    @complexity Constant.\n\n    @requirement This function helps `basic_json` satisfy the\n    [ReversibleContainer](https://en.cppreference.com/w/cpp/named_req/ReversibleContainer)\n    requirements:\n    - The complexity is constant.\n    - Has the semantics of `reverse_iterator(end())`.\n\n    @liveexample{The following code shows an example for `rbegin()`.,rbegin}\n\n    @sa see @ref crbegin() -- returns a const reverse iterator to the beginning\n    @sa see @ref rend() -- returns a reverse iterator to the end\n    @sa see @ref crend() -- returns a const reverse iterator to the end\n\n    @since version 1.0.0\n    */\n    reverse_iterator rbegin() noexcept\n    {\n        return reverse_iterator(end());\n    }\n\n    /*!\n    @copydoc basic_json::crbegin()\n    */\n    const_reverse_iterator rbegin() const noexcept\n    {\n        return crbegin();\n    }\n\n    /*!\n    @brief returns an iterator to the reverse-end\n\n    Returns an iterator to the reverse-end; that is, one before the first\n    element.\n\n    @image html range-rbegin-rend.svg \"Illustration from cppreference.com\"\n\n    @complexity Constant.\n\n    @requirement This function helps `basic_json` satisfy the\n 
   [ReversibleContainer](https://en.cppreference.com/w/cpp/named_req/ReversibleContainer)\n    requirements:\n    - The complexity is constant.\n    - Has the semantics of `reverse_iterator(begin())`.\n\n    @liveexample{The following code shows an example for `rend()`.,rend}\n\n    @sa see @ref crend() -- returns a const reverse iterator to the end\n    @sa see @ref rbegin() -- returns a reverse iterator to the beginning\n    @sa see @ref crbegin() -- returns a const reverse iterator to the beginning\n\n    @since version 1.0.0\n    */\n    reverse_iterator rend() noexcept\n    {\n        return reverse_iterator(begin());\n    }\n\n    /*!\n    @copydoc basic_json::crend()\n    */\n    const_reverse_iterator rend() const noexcept\n    {\n        return crend();\n    }\n\n    /*!\n    @brief returns a const reverse iterator to the last element\n\n    Returns a const reverse iterator to the reverse-beginning; that is, the last\n    element.\n\n    @image html range-rbegin-rend.svg \"Illustration from cppreference.com\"\n\n    @complexity Constant.\n\n    @requirement This function helps `basic_json` satisfy the\n    [ReversibleContainer](https://en.cppreference.com/w/cpp/named_req/ReversibleContainer)\n    requirements:\n    - The complexity is constant.\n    - Has the semantics of `const_cast<const basic_json&>(*this).rbegin()`.\n\n    @liveexample{The following code shows an example for `crbegin()`.,crbegin}\n\n    @sa see @ref rbegin() -- returns a reverse iterator to the beginning\n    @sa see @ref rend() -- returns a reverse iterator to the end\n    @sa see @ref crend() -- returns a const reverse iterator to the end\n\n    @since version 1.0.0\n    */\n    const_reverse_iterator crbegin() const noexcept\n    {\n        return const_reverse_iterator(cend());\n    }\n\n    /*!\n    @brief returns a const reverse iterator to one before the first\n\n    Returns a const reverse iterator to the reverse-end; that is, one before\n    the first element.\n\n    @image html 
range-rbegin-rend.svg \"Illustration from cppreference.com\"\n\n    @complexity Constant.\n\n    @requirement This function helps `basic_json` satisfy the\n    [ReversibleContainer](https://en.cppreference.com/w/cpp/named_req/ReversibleContainer)\n    requirements:\n    - The complexity is constant.\n    - Has the semantics of `const_cast<const basic_json&>(*this).rend()`.\n\n    @liveexample{The following code shows an example for `crend()`.,crend}\n\n    @sa see @ref rend() -- returns a reverse iterator to the end\n    @sa see @ref rbegin() -- returns a reverse iterator to the beginning\n    @sa see @ref crbegin() -- returns a const reverse iterator to the beginning\n\n    @since version 1.0.0\n    */\n    const_reverse_iterator crend() const noexcept\n    {\n        return const_reverse_iterator(cbegin());\n    }\n\n  public:\n    /*!\n    @brief wrapper to access iterator member functions in range-based for\n\n    This function allows access to @ref iterator::key() and @ref\n    iterator::value() during range-based for loops. 
In these loops, a\n    reference to the JSON values is returned, so there is no access to the\n    underlying iterator.\n\n    For loop without iterator_wrapper:\n\n    @code{cpp}\n    for (auto it = j_object.begin(); it != j_object.end(); ++it)\n    {\n        std::cout << \"key: \" << it.key() << \", value:\" << it.value() << '\\n';\n    }\n    @endcode\n\n    Range-based for loop without iterator proxy:\n\n    @code{cpp}\n    for (auto it : j_object)\n    {\n        // \"it\" is of type json::reference and has no key() member\n        std::cout << \"value: \" << it << '\\n';\n    }\n    @endcode\n\n    Range-based for loop with iterator proxy:\n\n    @code{cpp}\n    for (auto it : json::iterator_wrapper(j_object))\n    {\n        std::cout << \"key: \" << it.key() << \", value:\" << it.value() << '\\n';\n    }\n    @endcode\n\n    @note When iterating over an array, `key()` will return the index of the\n          element as a string (see example).\n\n    @param[in] ref  reference to a JSON value\n    @return iteration proxy object wrapping @a ref with an interface to use in\n            range-based for loops\n\n    @liveexample{The following code shows how the wrapper is used,iterator_wrapper}\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes in the JSON value.\n\n    @complexity Constant.\n\n    @note The name of this function is not yet final and may change in the\n    future.\n\n    @deprecated This function is deprecated and will be removed in\n                version 4.0.0 of the library. 
Please use @ref items() instead;\n                that is, replace `json::iterator_wrapper(j)` with `j.items()`.\n    */\n    JSON_HEDLEY_DEPRECATED_FOR(3.1.0, items())\n    static iteration_proxy<iterator> iterator_wrapper(reference ref) noexcept\n    {\n        return ref.items();\n    }\n\n    /*!\n    @copydoc iterator_wrapper(reference)\n    */\n    JSON_HEDLEY_DEPRECATED_FOR(3.1.0, items())\n    static iteration_proxy<const_iterator> iterator_wrapper(const_reference ref) noexcept\n    {\n        return ref.items();\n    }\n\n    /*!\n    @brief helper to access iterator member functions in range-based for\n\n    This function allows access to @ref iterator::key() and @ref\n    iterator::value() during range-based for loops. In these loops, a\n    reference to the JSON values is returned, so there is no access to the\n    underlying iterator.\n\n    For loop without `items()` function:\n\n    @code{cpp}\n    for (auto it = j_object.begin(); it != j_object.end(); ++it)\n    {\n        std::cout << \"key: \" << it.key() << \", value:\" << it.value() << '\\n';\n    }\n    @endcode\n\n    Range-based for loop without `items()` function:\n\n    @code{cpp}\n    for (auto it : j_object)\n    {\n        // \"it\" is of type json::reference and has no key() member\n        std::cout << \"value: \" << it << '\\n';\n    }\n    @endcode\n\n    Range-based for loop with `items()` function:\n\n    @code{cpp}\n    for (auto& el : j_object.items())\n    {\n        std::cout << \"key: \" << el.key() << \", value:\" << el.value() << '\\n';\n    }\n    @endcode\n\n    The `items()` function also allows the use of\n    [structured bindings](https://en.cppreference.com/w/cpp/language/structured_binding)\n    (C++17):\n\n    @code{cpp}\n    for (auto& [key, val] : j_object.items())\n    {\n        std::cout << \"key: \" << key << \", value:\" << val << '\\n';\n    }\n    @endcode\n\n    @note When iterating over an array, `key()` will return the index of the\n          element as 
string (see example). For primitive types (e.g., numbers),\n          `key()` returns an empty string.\n\n    @warning Using `items()` on temporary objects is dangerous. Make sure the\n             object's lifetime exceeds the iteration. See\n             <https://github.com/nlohmann/json/issues/2040> for more\n             information.\n\n    @return iteration proxy object wrapping `*this` with an interface to use in\n            range-based for loops\n\n    @liveexample{The following code shows how the function is used.,items}\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes in the JSON value.\n\n    @complexity Constant.\n\n    @since version 3.1.0, structured bindings support since 3.5.0.\n    */\n    iteration_proxy<iterator> items() noexcept\n    {\n        return iteration_proxy<iterator>(*this);\n    }\n\n    /*!\n    @copydoc items()\n    */\n    iteration_proxy<const_iterator> items() const noexcept\n    {\n        return iteration_proxy<const_iterator>(*this);\n    }\n\n    /// @}\n\n\n    //////////////\n    // capacity //\n    //////////////\n\n    /// @name capacity\n    /// @{\n\n    /*!\n    @brief checks whether the container is empty.\n\n    Checks if a JSON value has no elements (i.e. 
whether its @ref size is `0`).\n\n    @return The return value depends on the different types and is\n            defined as follows:\n            Value type  | return value\n            ----------- | -------------\n            null        | `true`\n            boolean     | `false`\n            string      | `false`\n            number      | `false`\n            binary      | `false`\n            object      | result of function `object_t::empty()`\n            array       | result of function `array_t::empty()`\n\n    @liveexample{The following code uses `empty()` to check if a JSON\n    object contains any elements.,empty}\n\n    @complexity Constant, as long as @ref array_t and @ref object_t satisfy\n    the Container concept; that is, their `empty()` functions have constant\n    complexity.\n\n    @iterators No changes.\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n    @note This function does not return whether a string stored as JSON value\n    is empty; it returns whether the JSON container itself is empty, which is\n    false in the case of a string.\n\n    @requirement This function helps `basic_json` satisfy the\n    [Container](https://en.cppreference.com/w/cpp/named_req/Container)\n    requirements:\n    - The complexity is constant.\n    - Has the semantics of `begin() == end()`.\n\n    @sa see @ref size() -- returns the number of elements\n\n    @since version 1.0.0\n    */\n    bool empty() const noexcept\n    {\n        switch (m_type)\n        {\n            case value_t::null:\n            {\n                // null values are empty\n                return true;\n            }\n\n            case value_t::array:\n            {\n                // delegate call to array_t::empty()\n                return m_value.array->empty();\n            }\n\n            case value_t::object:\n            {\n                // delegate call to object_t::empty()\n                return m_value.object->empty();\n        
    }\n\n            default:\n            {\n                // all other types are nonempty\n                return false;\n            }\n        }\n    }\n\n    /*!\n    @brief returns the number of elements\n\n    Returns the number of elements in a JSON value.\n\n    @return The return value depends on the different types and is\n            defined as follows:\n            Value type  | return value\n            ----------- | -------------\n            null        | `0`\n            boolean     | `1`\n            string      | `1`\n            number      | `1`\n            binary      | `1`\n            object      | result of function `object_t::size()`\n            array       | result of function `array_t::size()`\n\n    @liveexample{The following code calls `size()` on the different value\n    types.,size}\n\n    @complexity Constant, as long as @ref array_t and @ref object_t satisfy\n    the Container concept; that is, their `size()` functions have constant\n    complexity.\n\n    @iterators No changes.\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n    @note This function does not return the length of a string stored as JSON\n    value; it returns the number of elements in the JSON value, which is 1 in\n    the case of a string.\n\n    @requirement This function helps `basic_json` satisfy the\n    [Container](https://en.cppreference.com/w/cpp/named_req/Container)\n    requirements:\n    - The complexity is constant.\n    - Has the semantics of `std::distance(begin(), end())`.\n\n    @sa see @ref empty() -- checks whether the container is empty\n    @sa see @ref max_size() -- returns the maximal number of elements\n\n    @since version 1.0.0\n    */\n    size_type size() const noexcept\n    {\n        switch (m_type)\n        {\n            case value_t::null:\n            {\n                // null values are empty\n                return 0;\n            }\n\n            case value_t::array:\n            {\n         
       // delegate call to array_t::size()\n                return m_value.array->size();\n            }\n\n            case value_t::object:\n            {\n                // delegate call to object_t::size()\n                return m_value.object->size();\n            }\n\n            default:\n            {\n                // all other types have size 1\n                return 1;\n            }\n        }\n    }\n\n    /*!\n    @brief returns the maximum possible number of elements\n\n    Returns the maximum number of elements a JSON value is able to hold due to\n    system or library implementation limitations, i.e., the largest value that\n    `std::distance(begin(), end())` can take for the JSON value.\n\n    @return The return value depends on the different types and is\n            defined as follows:\n            Value type  | return value\n            ----------- | -------------\n            null        | `0` (same as `size()`)\n            boolean     | `1` (same as `size()`)\n            string      | `1` (same as `size()`)\n            number      | `1` (same as `size()`)\n            binary      | `1` (same as `size()`)\n            object      | result of function `object_t::max_size()`\n            array       | result of function `array_t::max_size()`\n\n    @liveexample{The following code calls `max_size()` on the different value\n    types. 
Note the output is implementation specific.,max_size}\n\n    @complexity Constant, as long as @ref array_t and @ref object_t satisfy\n    the Container concept; that is, their `max_size()` functions have constant\n    complexity.\n\n    @iterators No changes.\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n    @requirement This function helps `basic_json` satisfying the\n    [Container](https://en.cppreference.com/w/cpp/named_req/Container)\n    requirements:\n    - The complexity is constant.\n    - Has the semantics of returning `b.size()` where `b` is the largest\n      possible JSON value.\n\n    @sa see @ref size() -- returns the number of elements\n\n    @since version 1.0.0\n    */\n    size_type max_size() const noexcept\n    {\n        switch (m_type)\n        {\n            case value_t::array:\n            {\n                // delegate call to array_t::max_size()\n                return m_value.array->max_size();\n            }\n\n            case value_t::object:\n            {\n                // delegate call to object_t::max_size()\n                return m_value.object->max_size();\n            }\n\n            default:\n            {\n                // all other types have max_size() == size()\n                return size();\n            }\n        }\n    }\n\n    /// @}\n\n\n    ///////////////\n    // modifiers //\n    ///////////////\n\n    /// @name modifiers\n    /// @{\n\n    /*!\n    @brief clears the contents\n\n    Clears the content of a JSON value and resets it to the default value as\n    if @ref basic_json(value_t) would have been called with the current value\n    type from @ref type():\n\n    Value type  | initial value\n    ----------- | -------------\n    null        | `null`\n    boolean     | `false`\n    string      | `\"\"`\n    number      | `0`\n    binary      | An empty byte vector\n    object      | `{}`\n    array       | `[]`\n\n    @post Has the same effect as calling\n    @code 
{.cpp}\n    *this = basic_json(type());\n    @endcode\n\n    @liveexample{The example below shows the effect of `clear()` on different\n    JSON types.,clear}\n\n    @complexity Linear in the size of the JSON value.\n\n    @iterators All iterators, pointers and references related to this container\n               are invalidated.\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n    @sa see @ref basic_json(value_t) -- constructor that creates an object with the\n        same value as calling `clear()`\n\n    @since version 1.0.0\n    */\n    void clear() noexcept\n    {\n        switch (m_type)\n        {\n            case value_t::number_integer:\n            {\n                m_value.number_integer = 0;\n                break;\n            }\n\n            case value_t::number_unsigned:\n            {\n                m_value.number_unsigned = 0;\n                break;\n            }\n\n            case value_t::number_float:\n            {\n                m_value.number_float = 0.0;\n                break;\n            }\n\n            case value_t::boolean:\n            {\n                m_value.boolean = false;\n                break;\n            }\n\n            case value_t::string:\n            {\n                m_value.string->clear();\n                break;\n            }\n\n            case value_t::binary:\n            {\n                m_value.binary->clear();\n                break;\n            }\n\n            case value_t::array:\n            {\n                m_value.array->clear();\n                break;\n            }\n\n            case value_t::object:\n            {\n                m_value.object->clear();\n                break;\n            }\n\n            default:\n                break;\n        }\n    }\n\n    /*!\n    @brief add an object to an array\n\n    Appends the given element @a val to the end of the JSON value. 
If the\n    function is called on a JSON null value, an empty array is created before\n    appending @a val.\n\n    @param[in] val the value to add to the JSON array\n\n    @throw type_error.308 when called on a type other than JSON array or\n    null; example: `\"cannot use push_back() with number\"`\n\n    @complexity Amortized constant.\n\n    @liveexample{The example shows how `push_back()` and `+=` can be used to\n    add elements to a JSON array. Note how the `null` value was silently\n    converted to a JSON array.,push_back}\n\n    @since version 1.0.0\n    */\n    void push_back(basic_json&& val)\n    {\n        // push_back only works for null objects or arrays\n        if (JSON_HEDLEY_UNLIKELY(!(is_null() || is_array())))\n        {\n            JSON_THROW(type_error::create(308, \"cannot use push_back() with \" + std::string(type_name()), *this));\n        }\n\n        // transform null object into an array\n        if (is_null())\n        {\n            m_type = value_t::array;\n            m_value = value_t::array;\n            assert_invariant();\n        }\n\n        // add element to array (move semantics)\n        m_value.array->push_back(std::move(val));\n        set_parent(m_value.array->back());\n        // if val is moved from, basic_json move constructor marks it null so we do not call the destructor\n    }\n\n    /*!\n    @brief add an object to an array\n    @copydoc push_back(basic_json&&)\n    */\n    reference operator+=(basic_json&& val)\n    {\n        push_back(std::move(val));\n        return *this;\n    }\n\n    /*!\n    @brief add an object to an array\n    @copydoc push_back(basic_json&&)\n    */\n    void push_back(const basic_json& val)\n    {\n        // push_back only works for null objects or arrays\n        if (JSON_HEDLEY_UNLIKELY(!(is_null() || is_array())))\n        {\n            JSON_THROW(type_error::create(308, \"cannot use push_back() with \" + std::string(type_name()), *this));\n        }\n\n        // transform 
null object into an array\n        if (is_null())\n        {\n            m_type = value_t::array;\n            m_value = value_t::array;\n            assert_invariant();\n        }\n\n        // add element to array\n        m_value.array->push_back(val);\n        set_parent(m_value.array->back());\n    }\n\n    /*!\n    @brief add an object to an array\n    @copydoc push_back(basic_json&&)\n    */\n    reference operator+=(const basic_json& val)\n    {\n        push_back(val);\n        return *this;\n    }\n\n    /*!\n    @brief add an object to an object\n\n    Inserts the given element @a val to the JSON object. If the function is\n    called on a JSON null value, an empty object is created before inserting\n    @a val.\n\n    @param[in] val the value to add to the JSON object\n\n    @throw type_error.308 when called on a type other than JSON object or\n    null; example: `\"cannot use push_back() with number\"`\n\n    @complexity Logarithmic in the size of the container, O(log(`size()`)).\n\n    @liveexample{The example shows how `push_back()` and `+=` can be used to\n    add elements to a JSON object. 
Note how the `null` value was silently\n    converted to a JSON object.,push_back__object_t__value}\n\n    @since version 1.0.0\n    */\n    void push_back(const typename object_t::value_type& val)\n    {\n        // push_back only works for null objects or objects\n        if (JSON_HEDLEY_UNLIKELY(!(is_null() || is_object())))\n        {\n            JSON_THROW(type_error::create(308, \"cannot use push_back() with \" + std::string(type_name()), *this));\n        }\n\n        // transform null object into an object\n        if (is_null())\n        {\n            m_type = value_t::object;\n            m_value = value_t::object;\n            assert_invariant();\n        }\n\n        // add element to object\n        auto res = m_value.object->insert(val);\n        set_parent(res.first->second);\n    }\n\n    /*!\n    @brief add an object to an object\n    @copydoc push_back(const typename object_t::value_type&)\n    */\n    reference operator+=(const typename object_t::value_type& val)\n    {\n        push_back(val);\n        return *this;\n    }\n\n    /*!\n    @brief add an object to an object\n\n    This function allows using `push_back` with an initializer list. If\n\n    1. the current value is an object,\n    2. the initializer list @a init contains exactly two elements, and\n    3. the first element of @a init is a string,\n\n    @a init is converted into an object element and added using\n    @ref push_back(const typename object_t::value_type&). 
Otherwise, @a init\n    is converted to a JSON value and added using @ref push_back(basic_json&&).\n\n    @param[in] init  an initializer list\n\n    @complexity Linear in the size of the initializer list @a init.\n\n    @note This function is required to resolve an ambiguous overload error,\n          because pairs like `{\"key\", \"value\"}` can be interpreted both as\n          `object_t::value_type` and as `std::initializer_list<basic_json>`, see\n          https://github.com/nlohmann/json/issues/235 for more information.\n\n    @liveexample{The example shows how initializer lists are treated as\n    objects when possible.,push_back__initializer_list}\n    */\n    void push_back(initializer_list_t init)\n    {\n        if (is_object() && init.size() == 2 && (*init.begin())->is_string())\n        {\n            basic_json&& key = init.begin()->moved_or_copied();\n            push_back(typename object_t::value_type(\n                          std::move(key.get_ref<string_t&>()), (init.begin() + 1)->moved_or_copied()));\n        }\n        else\n        {\n            push_back(basic_json(init));\n        }\n    }\n\n    /*!\n    @brief add an object to an object\n    @copydoc push_back(initializer_list_t)\n    */\n    reference operator+=(initializer_list_t init)\n    {\n        push_back(init);\n        return *this;\n    }\n\n    /*!\n    @brief add an object to an array\n\n    Creates a JSON value from the passed parameters @a args and appends it to\n    the end of the JSON value. 
If the function is called on a JSON null value, an empty array\n    is created before appending the value created from @a args.\n\n    @param[in] args arguments to forward to a constructor of @ref basic_json\n    @tparam Args compatible types to create a @ref basic_json object\n\n    @return reference to the inserted element\n\n    @throw type_error.311 when called on a type other than JSON array or\n    null; example: `\"cannot use emplace_back() with number\"`\n\n    @complexity Amortized constant.\n\n    @liveexample{The example shows how `emplace_back()` can be used to add\n    elements to a JSON array. Note how the `null` value was silently converted\n    to a JSON array.,emplace_back}\n\n    @since version 2.0.8, returns reference since 3.7.0\n    */\n    template<class... Args>\n    reference emplace_back(Args&& ... args)\n    {\n        // emplace_back only works for null objects or arrays\n        if (JSON_HEDLEY_UNLIKELY(!(is_null() || is_array())))\n        {\n            JSON_THROW(type_error::create(311, \"cannot use emplace_back() with \" + std::string(type_name()), *this));\n        }\n\n        // transform null object into an array\n        if (is_null())\n        {\n            m_type = value_t::array;\n            m_value = value_t::array;\n            assert_invariant();\n        }\n\n        // add element to array (perfect forwarding)\n#ifdef JSON_HAS_CPP_17\n        return set_parent(m_value.array->emplace_back(std::forward<Args>(args)...));\n#else\n        m_value.array->emplace_back(std::forward<Args>(args)...);\n        return set_parent(m_value.array->back());\n#endif\n    }\n\n    /*!\n    @brief add an object to an object if key does not exist\n\n    Inserts a new element into a JSON object constructed in-place with the\n    given @a args if there is no element with the key in the container. 
If the\n    function is called on a JSON null value, an empty object is created before\n    appending the value created from @a args.\n\n    @param[in] args arguments to forward to a constructor of @ref basic_json\n    @tparam Args compatible types to create a @ref basic_json object\n\n    @return a pair consisting of an iterator to the inserted element, or the\n            already-existing element if no insertion happened, and a bool\n            denoting whether the insertion took place.\n\n    @throw type_error.311 when called on a type other than JSON object or\n    null; example: `\"cannot use emplace() with number\"`\n\n    @complexity Logarithmic in the size of the container, O(log(`size()`)).\n\n    @liveexample{The example shows how `emplace()` can be used to add elements\n    to a JSON object. Note how the `null` value was silently converted to a\n    JSON object. Further note how no value is added if there was already one\n    value stored with the same key.,emplace}\n\n    @since version 2.0.8\n    */\n    template<class... Args>\n    std::pair<iterator, bool> emplace(Args&& ... 
args)\n    {\n        // emplace only works for null objects or objects\n        if (JSON_HEDLEY_UNLIKELY(!(is_null() || is_object())))\n        {\n            JSON_THROW(type_error::create(311, \"cannot use emplace() with \" + std::string(type_name()), *this));\n        }\n\n        // transform null object into an object\n        if (is_null())\n        {\n            m_type = value_t::object;\n            m_value = value_t::object;\n            assert_invariant();\n        }\n\n        // add element to object (perfect forwarding)\n        auto res = m_value.object->emplace(std::forward<Args>(args)...);\n        set_parent(res.first->second);\n\n        // create result iterator and set iterator to the result of emplace\n        auto it = begin();\n        it.m_it.object_iterator = res.first;\n\n        // return pair of iterator and boolean\n        return {it, res.second};\n    }\n\n    /// Helper for insertion of an iterator\n    /// @note: This uses std::distance to support GCC 4.8,\n    ///        see https://github.com/nlohmann/json/pull/1257\n    template<typename... Args>\n    iterator insert_iterator(const_iterator pos, Args&& ... 
args)\n    {\n        iterator result(this);\n        JSON_ASSERT(m_value.array != nullptr);\n\n        auto insert_pos = std::distance(m_value.array->begin(), pos.m_it.array_iterator);\n        m_value.array->insert(pos.m_it.array_iterator, std::forward<Args>(args)...);\n        result.m_it.array_iterator = m_value.array->begin() + insert_pos;\n\n        // This could have been written as:\n        // result.m_it.array_iterator = m_value.array->insert(pos.m_it.array_iterator, cnt, val);\n        // but the return value of insert is missing in GCC 4.8, so it is written this way instead.\n\n        return result;\n    }\n\n    /*!\n    @brief inserts element\n\n    Inserts element @a val before iterator @a pos.\n\n    @param[in] pos iterator before which the content will be inserted; may be\n    the end() iterator\n    @param[in] val element to insert\n    @return iterator pointing to the inserted @a val.\n\n    @throw type_error.309 if called on JSON values other than arrays;\n    example: `\"cannot use insert() with string\"`\n    @throw invalid_iterator.202 if @a pos is not an iterator of *this;\n    example: `\"iterator does not fit current value\"`\n\n    @complexity Constant plus linear in the distance between @a pos and end of\n    the container.\n\n    @liveexample{The example shows how `insert()` is used.,insert}\n\n    @since version 1.0.0\n    */\n    iterator insert(const_iterator pos, const basic_json& val)\n    {\n        // insert only works for arrays\n        if (JSON_HEDLEY_LIKELY(is_array()))\n        {\n            // check if iterator pos fits to this JSON value\n            if (JSON_HEDLEY_UNLIKELY(pos.m_object != this))\n            {\n                JSON_THROW(invalid_iterator::create(202, \"iterator does not fit current value\", *this));\n            }\n\n            // insert to array and return iterator\n            return set_parents(insert_iterator(pos, val), static_cast<typename iterator::difference_type>(1));\n        }\n\n        
JSON_THROW(type_error::create(309, \"cannot use insert() with \" + std::string(type_name()), *this));\n    }\n\n    /*!\n    @brief inserts element\n    @copydoc insert(const_iterator, const basic_json&)\n    */\n    iterator insert(const_iterator pos, basic_json&& val)\n    {\n        return insert(pos, val);\n    }\n\n    /*!\n    @brief inserts elements\n\n    Inserts @a cnt copies of @a val before iterator @a pos.\n\n    @param[in] pos iterator before which the content will be inserted; may be\n    the end() iterator\n    @param[in] cnt number of copies of @a val to insert\n    @param[in] val element to insert\n    @return iterator pointing to the first element inserted, or @a pos if\n    `cnt==0`\n\n    @throw type_error.309 if called on JSON values other than arrays; example:\n    `\"cannot use insert() with string\"`\n    @throw invalid_iterator.202 if @a pos is not an iterator of *this;\n    example: `\"iterator does not fit current value\"`\n\n    @complexity Linear in @a cnt plus linear in the distance between @a pos\n    and end of the container.\n\n    @liveexample{The example shows how `insert()` is used.,insert__count}\n\n    @since version 1.0.0\n    */\n    iterator insert(const_iterator pos, size_type cnt, const basic_json& val)\n    {\n        // insert only works for arrays\n        if (JSON_HEDLEY_LIKELY(is_array()))\n        {\n            // check if iterator pos fits to this JSON value\n            if (JSON_HEDLEY_UNLIKELY(pos.m_object != this))\n            {\n                JSON_THROW(invalid_iterator::create(202, \"iterator does not fit current value\", *this));\n            }\n\n            // insert to array and return iterator\n            return set_parents(insert_iterator(pos, cnt, val), static_cast<typename iterator::difference_type>(cnt));\n        }\n\n        JSON_THROW(type_error::create(309, \"cannot use insert() with \" + std::string(type_name()), *this));\n    }\n\n    /*!\n    @brief inserts elements\n\n    Inserts elements 
from range `[first, last)` before iterator @a pos.\n\n    @param[in] pos iterator before which the content will be inserted; may be\n    the end() iterator\n    @param[in] first begin of the range of elements to insert\n    @param[in] last end of the range of elements to insert\n\n    @throw type_error.309 if called on JSON values other than arrays; example:\n    `\"cannot use insert() with string\"`\n    @throw invalid_iterator.202 if @a pos is not an iterator of *this;\n    example: `\"iterator does not fit current value\"`\n    @throw invalid_iterator.210 if @a first and @a last do not belong to the\n    same JSON value; example: `\"iterators do not fit\"`\n    @throw invalid_iterator.211 if @a first or @a last are iterators into\n    container for which insert is called; example: `\"passed iterators may not\n    belong to container\"`\n\n    @return iterator pointing to the first element inserted, or @a pos if\n    `first==last`\n\n    @complexity Linear in `std::distance(first, last)` plus linear in the\n    distance between @a pos and end of the container.\n\n    @liveexample{The example shows how `insert()` is used.,insert__range}\n\n    @since version 1.0.0\n    */\n    iterator insert(const_iterator pos, const_iterator first, const_iterator last)\n    {\n        // insert only works for arrays\n        if (JSON_HEDLEY_UNLIKELY(!is_array()))\n        {\n            JSON_THROW(type_error::create(309, \"cannot use insert() with \" + std::string(type_name()), *this));\n        }\n\n        // check if iterator pos fits to this JSON value\n        if (JSON_HEDLEY_UNLIKELY(pos.m_object != this))\n        {\n            JSON_THROW(invalid_iterator::create(202, \"iterator does not fit current value\", *this));\n        }\n\n        // check if range iterators belong to the same JSON object\n        if (JSON_HEDLEY_UNLIKELY(first.m_object != last.m_object))\n        {\n            JSON_THROW(invalid_iterator::create(210, \"iterators do not fit\", *this));\n        
}\n\n        if (JSON_HEDLEY_UNLIKELY(first.m_object == this))\n        {\n            JSON_THROW(invalid_iterator::create(211, \"passed iterators may not belong to container\", *this));\n        }\n\n        // insert to array and return iterator\n        return set_parents(insert_iterator(pos, first.m_it.array_iterator, last.m_it.array_iterator), std::distance(first, last));\n    }\n\n    /*!\n    @brief inserts elements\n\n    Inserts elements from initializer list @a ilist before iterator @a pos.\n\n    @param[in] pos iterator before which the content will be inserted; may be\n    the end() iterator\n    @param[in] ilist initializer list to insert the values from\n\n    @throw type_error.309 if called on JSON values other than arrays; example:\n    `\"cannot use insert() with string\"`\n    @throw invalid_iterator.202 if @a pos is not an iterator of *this;\n    example: `\"iterator does not fit current value\"`\n\n    @return iterator pointing to the first element inserted, or @a pos if\n    `ilist` is empty\n\n    @complexity Linear in `ilist.size()` plus linear in the distance between\n    @a pos and end of the container.\n\n    @liveexample{The example shows how `insert()` is used.,insert__ilist}\n\n    @since version 1.0.0\n    */\n    iterator insert(const_iterator pos, initializer_list_t ilist)\n    {\n        // insert only works for arrays\n        if (JSON_HEDLEY_UNLIKELY(!is_array()))\n        {\n            JSON_THROW(type_error::create(309, \"cannot use insert() with \" + std::string(type_name()), *this));\n        }\n\n        // check if iterator pos fits to this JSON value\n        if (JSON_HEDLEY_UNLIKELY(pos.m_object != this))\n        {\n            JSON_THROW(invalid_iterator::create(202, \"iterator does not fit current value\", *this));\n        }\n\n        // insert to array and return iterator\n        return set_parents(insert_iterator(pos, ilist.begin(), ilist.end()), static_cast<typename iterator::difference_type>(ilist.size()));\n    
}\n\n    /*!\n    @brief inserts elements\n\n    Inserts elements from range `[first, last)`.\n\n    @param[in] first begin of the range of elements to insert\n    @param[in] last end of the range of elements to insert\n\n    @throw type_error.309 if called on JSON values other than objects; example:\n    `\"cannot use insert() with string\"`\n    @throw invalid_iterator.202 if iterator @a first or @a last does not\n    point to an object; example: `\"iterators first and last must point to\n    objects\"`\n    @throw invalid_iterator.210 if @a first and @a last do not belong to the\n    same JSON value; example: `\"iterators do not fit\"`\n\n    @complexity Logarithmic: `O(N*log(size() + N))`, where `N` is the number\n    of elements to insert.\n\n    @liveexample{The example shows how `insert()` is used.,insert__range_object}\n\n    @since version 3.0.0\n    */\n    void insert(const_iterator first, const_iterator last)\n    {\n        // insert only works for objects\n        if (JSON_HEDLEY_UNLIKELY(!is_object()))\n        {\n            JSON_THROW(type_error::create(309, \"cannot use insert() with \" + std::string(type_name()), *this));\n        }\n\n        // check if range iterators belong to the same JSON object\n        if (JSON_HEDLEY_UNLIKELY(first.m_object != last.m_object))\n        {\n            JSON_THROW(invalid_iterator::create(210, \"iterators do not fit\", *this));\n        }\n\n        // passed iterators must belong to objects\n        if (JSON_HEDLEY_UNLIKELY(!first.m_object->is_object()))\n        {\n            JSON_THROW(invalid_iterator::create(202, \"iterators first and last must point to objects\", *this));\n        }\n\n        m_value.object->insert(first.m_it.object_iterator, last.m_it.object_iterator);\n    }\n\n    /*!\n    @brief updates a JSON object from another object, overwriting existing keys\n\n    Inserts all values from JSON object @a j and overwrites existing keys.\n\n    @param[in] j  JSON object to read values 
from\n\n    @throw type_error.312 if called on JSON values other than objects; example:\n    `\"cannot use update() with string\"`\n\n    @complexity O(N*log(size() + N)), where N is the number of elements to\n                insert.\n\n    @liveexample{The example shows how `update()` is used.,update}\n\n    @sa https://docs.python.org/3.6/library/stdtypes.html#dict.update\n\n    @since version 3.0.0\n    */\n    void update(const_reference j)\n    {\n        // implicitly convert null value to an empty object\n        if (is_null())\n        {\n            m_type = value_t::object;\n            m_value.object = create<object_t>();\n            assert_invariant();\n        }\n\n        if (JSON_HEDLEY_UNLIKELY(!is_object()))\n        {\n            JSON_THROW(type_error::create(312, \"cannot use update() with \" + std::string(type_name()), *this));\n        }\n        if (JSON_HEDLEY_UNLIKELY(!j.is_object()))\n        {\n            JSON_THROW(type_error::create(312, \"cannot use update() with \" + std::string(j.type_name()), *this));\n        }\n\n        for (auto it = j.cbegin(); it != j.cend(); ++it)\n        {\n            m_value.object->operator[](it.key()) = it.value();\n        }\n    }\n\n    /*!\n    @brief updates a JSON object from another object, overwriting existing keys\n\n    Inserts all values from range `[first, last)` and overwrites existing\n    keys.\n\n    @param[in] first begin of the range of elements to insert\n    @param[in] last end of the range of elements to insert\n\n    @throw type_error.312 if called on JSON values other than objects; example:\n    `\"cannot use update() with string\"`\n    @throw invalid_iterator.202 if iterator @a first or @a last does not\n    point to an object; example: `\"iterators first and last must point to\n    objects\"`\n    @throw invalid_iterator.210 if @a first and @a last do not belong to the\n    same JSON value; example: `\"iterators do not fit\"`\n\n    @complexity O(N*log(size() + N)), 
where N is the number of elements to\n                insert.\n\n    @liveexample{The example shows how `update()` is used.,update__range}\n\n    @sa https://docs.python.org/3.6/library/stdtypes.html#dict.update\n\n    @since version 3.0.0\n    */\n    void update(const_iterator first, const_iterator last)\n    {\n        // implicitly convert null value to an empty object\n        if (is_null())\n        {\n            m_type = value_t::object;\n            m_value.object = create<object_t>();\n            assert_invariant();\n        }\n\n        if (JSON_HEDLEY_UNLIKELY(!is_object()))\n        {\n            JSON_THROW(type_error::create(312, \"cannot use update() with \" + std::string(type_name()), *this));\n        }\n\n        // check if range iterators belong to the same JSON object\n        if (JSON_HEDLEY_UNLIKELY(first.m_object != last.m_object))\n        {\n            JSON_THROW(invalid_iterator::create(210, \"iterators do not fit\", *this));\n        }\n\n        // passed iterators must belong to objects\n        if (JSON_HEDLEY_UNLIKELY(!first.m_object->is_object()\n                                 || !last.m_object->is_object()))\n        {\n            JSON_THROW(invalid_iterator::create(202, \"iterators first and last must point to objects\", *this));\n        }\n\n        for (auto it = first; it != last; ++it)\n        {\n            m_value.object->operator[](it.key()) = it.value();\n        }\n    }\n\n    /*!\n    @brief exchanges the values\n\n    Exchanges the contents of the JSON value with those of @a other. Does not\n    invoke any move, copy, or swap operations on individual elements. All\n    iterators and references remain valid. 
The past-the-end iterator is\n    invalidated.\n\n    @param[in,out] other JSON value to exchange the contents with\n\n    @complexity Constant.\n\n    @liveexample{The example below shows how JSON values can be swapped with\n    `swap()`.,swap__reference}\n\n    @since version 1.0.0\n    */\n    void swap(reference other) noexcept (\n        std::is_nothrow_move_constructible<value_t>::value&&\n        std::is_nothrow_move_assignable<value_t>::value&&\n        std::is_nothrow_move_constructible<json_value>::value&&\n        std::is_nothrow_move_assignable<json_value>::value\n    )\n    {\n        std::swap(m_type, other.m_type);\n        std::swap(m_value, other.m_value);\n\n        set_parents();\n        other.set_parents();\n        assert_invariant();\n    }\n\n    /*!\n    @brief exchanges the values\n\n    Exchanges the contents of the JSON value from @a left with those of @a right. Does not\n    invoke any move, copy, or swap operations on individual elements. All\n    iterators and references remain valid. The past-the-end iterator is\n    invalidated. Implemented as a friend function callable via ADL.\n\n    @param[in,out] left JSON value to exchange the contents with\n    @param[in,out] right JSON value to exchange the contents with\n\n    @complexity Constant.\n\n    @liveexample{The example below shows how JSON values can be swapped with\n    `swap()`.,swap__reference}\n\n    @since version 1.0.0\n    */\n    friend void swap(reference left, reference right) noexcept (\n        std::is_nothrow_move_constructible<value_t>::value&&\n        std::is_nothrow_move_assignable<value_t>::value&&\n        std::is_nothrow_move_constructible<json_value>::value&&\n        std::is_nothrow_move_assignable<json_value>::value\n    )\n    {\n        left.swap(right);\n    }\n\n    /*!\n    @brief exchanges the values\n\n    Exchanges the contents of a JSON array with those of @a other. Does not\n    invoke any move, copy, or swap operations on individual elements. 
All\n    iterators and references remain valid. The past-the-end iterator is\n    invalidated.\n\n    @param[in,out] other array to exchange the contents with\n\n    @throw type_error.310 when JSON value is not an array; example: `\"cannot\n    use swap() with string\"`\n\n    @complexity Constant.\n\n    @liveexample{The example below shows how arrays can be swapped with\n    `swap()`.,swap__array_t}\n\n    @since version 1.0.0\n    */\n    void swap(array_t& other) // NOLINT(bugprone-exception-escape)\n    {\n        // swap only works for arrays\n        if (JSON_HEDLEY_LIKELY(is_array()))\n        {\n            std::swap(*(m_value.array), other);\n        }\n        else\n        {\n            JSON_THROW(type_error::create(310, \"cannot use swap() with \" + std::string(type_name()), *this));\n        }\n    }\n\n    /*!\n    @brief exchanges the values\n\n    Exchanges the contents of a JSON object with those of @a other. Does not\n    invoke any move, copy, or swap operations on individual elements. All\n    iterators and references remain valid. The past-the-end iterator is\n    invalidated.\n\n    @param[in,out] other object to exchange the contents with\n\n    @throw type_error.310 when JSON value is not an object; example:\n    `\"cannot use swap() with string\"`\n\n    @complexity Constant.\n\n    @liveexample{The example below shows how objects can be swapped with\n    `swap()`.,swap__object_t}\n\n    @since version 1.0.0\n    */\n    void swap(object_t& other) // NOLINT(bugprone-exception-escape)\n    {\n        // swap only works for objects\n        if (JSON_HEDLEY_LIKELY(is_object()))\n        {\n            std::swap(*(m_value.object), other);\n        }\n        else\n        {\n            JSON_THROW(type_error::create(310, \"cannot use swap() with \" + std::string(type_name()), *this));\n        }\n    }\n\n    /*!\n    @brief exchanges the values\n\n    Exchanges the contents of a JSON string with those of @a other. 
Does not\n    invoke any move, copy, or swap operations on individual elements. All\n    iterators and references remain valid. The past-the-end iterator is\n    invalidated.\n\n    @param[in,out] other string to exchange the contents with\n\n    @throw type_error.310 when JSON value is not a string; example: `\"cannot\n    use swap() with boolean\"`\n\n    @complexity Constant.\n\n    @liveexample{The example below shows how strings can be swapped with\n    `swap()`.,swap__string_t}\n\n    @since version 1.0.0\n    */\n    void swap(string_t& other) // NOLINT(bugprone-exception-escape)\n    {\n        // swap only works for strings\n        if (JSON_HEDLEY_LIKELY(is_string()))\n        {\n            std::swap(*(m_value.string), other);\n        }\n        else\n        {\n            JSON_THROW(type_error::create(310, \"cannot use swap() with \" + std::string(type_name()), *this));\n        }\n    }\n\n    /*!\n    @brief exchanges the values\n\n    Exchanges the contents of a JSON binary value with those of @a other. Does not\n    invoke any move, copy, or swap operations on individual elements. All\n    iterators and references remain valid. 
The past-the-end iterator is\n    invalidated.\n\n    @param[in,out] other binary to exchange the contents with\n\n    @throw type_error.310 when JSON value is not a string; example: `\"cannot\n    use swap() with boolean\"`\n\n    @complexity Constant.\n\n    @liveexample{The example below shows how strings can be swapped with\n    `swap()`.,swap__binary_t}\n\n    @since version 3.8.0\n    */\n    void swap(binary_t& other) // NOLINT(bugprone-exception-escape)\n    {\n        // swap only works for strings\n        if (JSON_HEDLEY_LIKELY(is_binary()))\n        {\n            std::swap(*(m_value.binary), other);\n        }\n        else\n        {\n            JSON_THROW(type_error::create(310, \"cannot use swap() with \" + std::string(type_name()), *this));\n        }\n    }\n\n    /// @copydoc swap(binary_t&)\n    void swap(typename binary_t::container_type& other) // NOLINT(bugprone-exception-escape)\n    {\n        // swap only works for strings\n        if (JSON_HEDLEY_LIKELY(is_binary()))\n        {\n            std::swap(*(m_value.binary), other);\n        }\n        else\n        {\n            JSON_THROW(type_error::create(310, \"cannot use swap() with \" + std::string(type_name()), *this));\n        }\n    }\n\n    /// @}\n\n  public:\n    //////////////////////////////////////////\n    // lexicographical comparison operators //\n    //////////////////////////////////////////\n\n    /// @name lexicographical comparison operators\n    /// @{\n\n    /*!\n    @brief comparison: equal\n\n    Compares two JSON values for equality according to the following rules:\n    - Two JSON values are equal if (1) they are from the same type and (2)\n      their stored values are the same according to their respective\n      `operator==`.\n    - Integer and floating-point numbers are automatically converted before\n      comparison. 
Note that two NaN values are always treated as unequal.\n    - Two JSON null values are equal.\n\n    @note Floating-point numbers inside JSON values are compared with\n    `json::number_float_t::operator==` which is `double::operator==` by\n    default. To compare floating-point numbers while respecting an epsilon, an alternative\n    [comparison function](https://github.com/mariokonrad/marnav/blob/master/include/marnav/math/floatingpoint.hpp#L34-#L39)\n    could be used, for instance\n    @code {.cpp}\n    template<typename T, typename = typename std::enable_if<std::is_floating_point<T>::value, T>::type>\n    inline bool is_same(T a, T b, T epsilon = std::numeric_limits<T>::epsilon()) noexcept\n    {\n        return std::abs(a - b) <= epsilon;\n    }\n    @endcode\n    Alternatively, you can define your own equality function like this:\n    @code {.cpp}\n    bool my_equal(const_reference lhs, const_reference rhs)\n    {\n        const auto lhs_type = lhs.type();\n        const auto rhs_type = rhs.type();\n        if (lhs_type == rhs_type)\n        {\n            switch (lhs_type)\n            {\n                // self-defined case\n                case value_t::number_float:\n                    return std::abs(lhs.get<number_float_t>() - rhs.get<number_float_t>()) <= std::numeric_limits<number_float_t>::epsilon();\n                // other cases remain the same as in the original\n                ...\n            }\n        }\n        ...\n    }\n    @endcode\n\n    @note NaN values never compare equal to themselves or to other NaN values.\n\n    @param[in] lhs  first JSON value to consider\n    @param[in] rhs  second JSON value to consider\n    @return whether the values @a lhs and @a rhs are equal\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n    @complexity Linear.\n\n    @liveexample{The example demonstrates comparing several JSON\n    types.,operator__equal}\n\n    @since version 1.0.0\n    */\n    friend bool operator==(const_reference lhs, const_reference rhs) noexcept\n    {\n        const auto lhs_type = lhs.type();\n        const auto rhs_type = rhs.type();\n\n        if 
(lhs_type == rhs_type)\n        {\n            switch (lhs_type)\n            {\n                case value_t::array:\n                    return *lhs.m_value.array == *rhs.m_value.array;\n\n                case value_t::object:\n                    return *lhs.m_value.object == *rhs.m_value.object;\n\n                case value_t::null:\n                    return true;\n\n                case value_t::string:\n                    return *lhs.m_value.string == *rhs.m_value.string;\n\n                case value_t::boolean:\n                    return lhs.m_value.boolean == rhs.m_value.boolean;\n\n                case value_t::number_integer:\n                    return lhs.m_value.number_integer == rhs.m_value.number_integer;\n\n                case value_t::number_unsigned:\n                    return lhs.m_value.number_unsigned == rhs.m_value.number_unsigned;\n\n                case value_t::number_float:\n                    return lhs.m_value.number_float == rhs.m_value.number_float;\n\n                case value_t::binary:\n                    return *lhs.m_value.binary == *rhs.m_value.binary;\n\n                default:\n                    return false;\n            }\n        }\n        else if (lhs_type == value_t::number_integer && rhs_type == value_t::number_float)\n        {\n            return static_cast<number_float_t>(lhs.m_value.number_integer) == rhs.m_value.number_float;\n        }\n        else if (lhs_type == value_t::number_float && rhs_type == value_t::number_integer)\n        {\n            return lhs.m_value.number_float == static_cast<number_float_t>(rhs.m_value.number_integer);\n        }\n        else if (lhs_type == value_t::number_unsigned && rhs_type == value_t::number_float)\n        {\n            return static_cast<number_float_t>(lhs.m_value.number_unsigned) == rhs.m_value.number_float;\n        }\n        else if (lhs_type == value_t::number_float && rhs_type == value_t::number_unsigned)\n        {\n            return 
lhs.m_value.number_float == static_cast<number_float_t>(rhs.m_value.number_unsigned);\n        }\n        else if (lhs_type == value_t::number_unsigned && rhs_type == value_t::number_integer)\n        {\n            return static_cast<number_integer_t>(lhs.m_value.number_unsigned) == rhs.m_value.number_integer;\n        }\n        else if (lhs_type == value_t::number_integer && rhs_type == value_t::number_unsigned)\n        {\n            return lhs.m_value.number_integer == static_cast<number_integer_t>(rhs.m_value.number_unsigned);\n        }\n\n        return false;\n    }\n\n    /*!\n    @brief comparison: equal\n    @copydoc operator==(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator==(const_reference lhs, ScalarType rhs) noexcept\n    {\n        return lhs == basic_json(rhs);\n    }\n\n    /*!\n    @brief comparison: equal\n    @copydoc operator==(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator==(ScalarType lhs, const_reference rhs) noexcept\n    {\n        return basic_json(lhs) == rhs;\n    }\n\n    /*!\n    @brief comparison: not equal\n\n    Compares two JSON values for inequality by calculating `not (lhs == rhs)`.\n\n    @param[in] lhs  first JSON value to consider\n    @param[in] rhs  second JSON value to consider\n    @return whether the values @a lhs and @a rhs are not equal\n\n    @complexity Linear.\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n    @liveexample{The example demonstrates comparing several JSON\n    types.,operator__notequal}\n\n    @since version 1.0.0\n    */\n    friend bool operator!=(const_reference lhs, const_reference rhs) noexcept\n    {\n        return !(lhs == rhs);\n    }\n\n    /*!\n    @brief 
comparison: not equal\n    @copydoc operator!=(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator!=(const_reference lhs, ScalarType rhs) noexcept\n    {\n        return lhs != basic_json(rhs);\n    }\n\n    /*!\n    @brief comparison: not equal\n    @copydoc operator!=(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator!=(ScalarType lhs, const_reference rhs) noexcept\n    {\n        return basic_json(lhs) != rhs;\n    }\n\n    /*!\n    @brief comparison: less than\n\n    Compares whether one JSON value @a lhs is less than another JSON value @a\n    rhs according to the following rules:\n    - If @a lhs and @a rhs have the same type, the values are compared using\n      the default `<` operator.\n    - Integer and floating-point numbers are automatically converted before\n      comparison.\n    - In case @a lhs and @a rhs have different types, the values are ignored\n      and the order of the types is considered, see\n      @ref operator<(const value_t, const value_t).\n\n    @param[in] lhs  first JSON value to consider\n    @param[in] rhs  second JSON value to consider\n    @return whether @a lhs is less than @a rhs\n\n    @complexity Linear.\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n    @liveexample{The example demonstrates comparing several JSON\n    types.,operator__less}\n\n    @since version 1.0.0\n    */\n    friend bool operator<(const_reference lhs, const_reference rhs) noexcept\n    {\n        const auto lhs_type = lhs.type();\n        const auto rhs_type = rhs.type();\n\n        if (lhs_type == rhs_type)\n        {\n            switch (lhs_type)\n            {\n                case value_t::array:\n                    // note 
parentheses are necessary, see\n                    // https://github.com/nlohmann/json/issues/1530\n                    return (*lhs.m_value.array) < (*rhs.m_value.array);\n\n                case value_t::object:\n                    return (*lhs.m_value.object) < (*rhs.m_value.object);\n\n                case value_t::null:\n                    return false;\n\n                case value_t::string:\n                    return (*lhs.m_value.string) < (*rhs.m_value.string);\n\n                case value_t::boolean:\n                    return (lhs.m_value.boolean) < (rhs.m_value.boolean);\n\n                case value_t::number_integer:\n                    return (lhs.m_value.number_integer) < (rhs.m_value.number_integer);\n\n                case value_t::number_unsigned:\n                    return (lhs.m_value.number_unsigned) < (rhs.m_value.number_unsigned);\n\n                case value_t::number_float:\n                    return (lhs.m_value.number_float) < (rhs.m_value.number_float);\n\n                case value_t::binary:\n                    return (*lhs.m_value.binary) < (*rhs.m_value.binary);\n\n                default:\n                    return false;\n            }\n        }\n        else if (lhs_type == value_t::number_integer && rhs_type == value_t::number_float)\n        {\n            return static_cast<number_float_t>(lhs.m_value.number_integer) < rhs.m_value.number_float;\n        }\n        else if (lhs_type == value_t::number_float && rhs_type == value_t::number_integer)\n        {\n            return lhs.m_value.number_float < static_cast<number_float_t>(rhs.m_value.number_integer);\n        }\n        else if (lhs_type == value_t::number_unsigned && rhs_type == value_t::number_float)\n        {\n            return static_cast<number_float_t>(lhs.m_value.number_unsigned) < rhs.m_value.number_float;\n        }\n        else if (lhs_type == value_t::number_float && rhs_type == value_t::number_unsigned)\n        {\n            return 
lhs.m_value.number_float < static_cast<number_float_t>(rhs.m_value.number_unsigned);\n        }\n        else if (lhs_type == value_t::number_integer && rhs_type == value_t::number_unsigned)\n        {\n            return lhs.m_value.number_integer < static_cast<number_integer_t>(rhs.m_value.number_unsigned);\n        }\n        else if (lhs_type == value_t::number_unsigned && rhs_type == value_t::number_integer)\n        {\n            return static_cast<number_integer_t>(lhs.m_value.number_unsigned) < rhs.m_value.number_integer;\n        }\n\n        // We only reach this line if we cannot compare values. In that case,\n        // we compare types. Note we have to call the operator explicitly,\n        // because MSVC has problems otherwise.\n        return operator<(lhs_type, rhs_type);\n    }\n\n    /*!\n    @brief comparison: less than\n    @copydoc operator<(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator<(const_reference lhs, ScalarType rhs) noexcept\n    {\n        return lhs < basic_json(rhs);\n    }\n\n    /*!\n    @brief comparison: less than\n    @copydoc operator<(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator<(ScalarType lhs, const_reference rhs) noexcept\n    {\n        return basic_json(lhs) < rhs;\n    }\n\n    /*!\n    @brief comparison: less than or equal\n\n    Compares whether one JSON value @a lhs is less than or equal to another\n    JSON value by calculating `not (rhs < lhs)`.\n\n    @param[in] lhs  first JSON value to consider\n    @param[in] rhs  second JSON value to consider\n    @return whether @a lhs is less than or equal to @a rhs\n\n    @complexity Linear.\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n 
   @liveexample{The example demonstrates comparing several JSON\n    types.,operator__lessequal}\n\n    @since version 1.0.0\n    */\n    friend bool operator<=(const_reference lhs, const_reference rhs) noexcept\n    {\n        return !(rhs < lhs);\n    }\n\n    /*!\n    @brief comparison: less than or equal\n    @copydoc operator<=(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator<=(const_reference lhs, ScalarType rhs) noexcept\n    {\n        return lhs <= basic_json(rhs);\n    }\n\n    /*!\n    @brief comparison: less than or equal\n    @copydoc operator<=(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator<=(ScalarType lhs, const_reference rhs) noexcept\n    {\n        return basic_json(lhs) <= rhs;\n    }\n\n    /*!\n    @brief comparison: greater than\n\n    Compares whether one JSON value @a lhs is greater than another\n    JSON value by calculating `not (lhs <= rhs)`.\n\n    @param[in] lhs  first JSON value to consider\n    @param[in] rhs  second JSON value to consider\n    @return whether @a lhs is greater than @a rhs\n\n    @complexity Linear.\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n    @liveexample{The example demonstrates comparing several JSON\n    types.,operator__greater}\n\n    @since version 1.0.0\n    */\n    friend bool operator>(const_reference lhs, const_reference rhs) noexcept\n    {\n        return !(lhs <= rhs);\n    }\n\n    /*!\n    @brief comparison: greater than\n    @copydoc operator>(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator>(const_reference lhs, 
ScalarType rhs) noexcept\n    {\n        return lhs > basic_json(rhs);\n    }\n\n    /*!\n    @brief comparison: greater than\n    @copydoc operator>(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator>(ScalarType lhs, const_reference rhs) noexcept\n    {\n        return basic_json(lhs) > rhs;\n    }\n\n    /*!\n    @brief comparison: greater than or equal\n\n    Compares whether one JSON value @a lhs is greater than or equal to another\n    JSON value by calculating `not (lhs < rhs)`.\n\n    @param[in] lhs  first JSON value to consider\n    @param[in] rhs  second JSON value to consider\n    @return whether @a lhs is greater than or equal to @a rhs\n\n    @complexity Linear.\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n    @liveexample{The example demonstrates comparing several JSON\n    types.,operator__greaterequal}\n\n    @since version 1.0.0\n    */\n    friend bool operator>=(const_reference lhs, const_reference rhs) noexcept\n    {\n        return !(lhs < rhs);\n    }\n\n    /*!\n    @brief comparison: greater than or equal\n    @copydoc operator>=(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator>=(const_reference lhs, ScalarType rhs) noexcept\n    {\n        return lhs >= basic_json(rhs);\n    }\n\n    /*!\n    @brief comparison: greater than or equal\n    @copydoc operator>=(const_reference, const_reference)\n    */\n    template<typename ScalarType, typename std::enable_if<\n                 std::is_scalar<ScalarType>::value, int>::type = 0>\n    friend bool operator>=(ScalarType lhs, const_reference rhs) noexcept\n    {\n        return basic_json(lhs) >= rhs;\n    }\n\n    /// @}\n\n    ///////////////////\n    // serialization 
//\n    ///////////////////\n\n    /// @name serialization\n    /// @{\n\n    /*!\n    @brief serialize to stream\n\n    Serialize the given JSON value @a j to the output stream @a o. The JSON\n    value will be serialized using the @ref dump member function.\n\n    - The indentation of the output can be controlled with the member variable\n      `width` of the output stream @a o. For instance, using the manipulator\n      `std::setw(4)` on @a o sets the indentation level to `4` and the\n      serialization result is the same as calling `dump(4)`.\n\n    - The indentation character can be controlled with the member variable\n      `fill` of the output stream @a o. For instance, the manipulator\n      `std::setfill('\\\\t')` sets indentation to use a tab character rather than\n      the default space character.\n\n    @param[in,out] o  stream to serialize to\n    @param[in] j  JSON value to serialize\n\n    @return the stream @a o\n\n    @throw type_error.316 if a string stored inside the JSON value is not\n                          UTF-8 encoded\n\n    @complexity Linear.\n\n    @liveexample{The example below shows the serialization with different\n    parameters to `width` to adjust the indentation level.,operator_serialize}\n\n    @since version 1.0.0; indentation character added in version 3.0.0\n    */\n    friend std::ostream& operator<<(std::ostream& o, const basic_json& j)\n    {\n        // read width member and use it as indentation parameter if nonzero\n        const bool pretty_print = o.width() > 0;\n        const auto indentation = pretty_print ? 
o.width() : 0;\n\n        // reset width to 0 for subsequent calls to this stream\n        o.width(0);\n\n        // do the actual serialization\n        serializer s(detail::output_adapter<char>(o), o.fill());\n        s.dump(j, pretty_print, false, static_cast<unsigned int>(indentation));\n        return o;\n    }\n\n    /*!\n    @brief serialize to stream\n    @deprecated This stream operator is deprecated and will be removed in\n                version 4.0.0 of the library. Please use\n                @ref operator<<(std::ostream&, const basic_json&)\n                instead; that is, replace calls like `j >> o;` with `o << j;`.\n    @since version 1.0.0; deprecated since version 3.0.0\n    */\n    JSON_HEDLEY_DEPRECATED_FOR(3.0.0, operator<<(std::ostream&, const basic_json&))\n    friend std::ostream& operator>>(const basic_json& j, std::ostream& o)\n    {\n        return o << j;\n    }\n\n    /// @}\n\n\n    /////////////////////\n    // deserialization //\n    /////////////////////\n\n    /// @name deserialization\n    /// @{\n\n    /*!\n    @brief deserialize from a compatible input\n\n    @tparam InputType A compatible input, for instance\n    - an std::istream object\n    - a FILE pointer\n    - a C-style array of characters\n    - a pointer to a null-terminated string of single byte characters\n    - an object obj for which begin(obj) and end(obj) produce a valid pair of\n      iterators.\n\n    @param[in] i  input to read from\n    @param[in] cb  a parser callback function of type @ref parser_callback_t\n    which is used to control the deserialization by filtering unwanted values\n    (optional)\n    @param[in] allow_exceptions  whether to throw exceptions in case of a\n    parse error (optional, true by default)\n    @param[in] ignore_comments  whether comments should be ignored and treated\n    like whitespace (true) or yield a parse error (false); (optional, false by\n    default)\n\n    @return deserialized JSON value; in case of a parse error and\n 
           @a allow_exceptions set to `false`, the return value will be\n            value_t::discarded.\n\n    @throw parse_error.101 if a parse error occurs; example: `\"\"unexpected end\n    of input; expected string literal\"\"`\n    @throw parse_error.102 if to_unicode fails or surrogate error\n    @throw parse_error.103 if to_unicode fails\n\n    @complexity Linear in the length of the input. The parser is a predictive\n    LL(1) parser. The complexity can be higher if the parser callback function\n    @a cb or reading from the input @a i has a super-linear complexity.\n\n    @note A UTF-8 byte order mark is silently ignored.\n\n    @liveexample{The example below demonstrates the `parse()` function reading\n    from an array.,parse__array__parser_callback_t}\n\n    @liveexample{The example below demonstrates the `parse()` function with\n    and without callback function.,parse__string__parser_callback_t}\n\n    @liveexample{The example below demonstrates the `parse()` function with\n    and without callback function.,parse__istream__parser_callback_t}\n\n    @liveexample{The example below demonstrates the `parse()` function reading\n    from a contiguous container.,parse__contiguouscontainer__parser_callback_t}\n\n    @since version 2.0.3 (contiguous containers); version 3.9.0 added the option to\n    ignore comments.\n    */\n    template<typename InputType>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json parse(InputType&& i,\n                            const parser_callback_t cb = nullptr,\n                            const bool allow_exceptions = true,\n                            const bool ignore_comments = false)\n    {\n        basic_json result;\n        parser(detail::input_adapter(std::forward<InputType>(i)), cb, allow_exceptions, ignore_comments).parse(true, result);\n        return result;\n    }\n\n    /*!\n    @brief deserialize from a pair of character iterators\n\n    The value_type of the iterator must be an integral type with a size of 1, 2 
or\n    4 bytes, which will be interpreted respectively as UTF-8, UTF-16 and UTF-32.\n\n    @param[in] first iterator to start of character range\n    @param[in] last  iterator to end of character range\n    @param[in] cb  a parser callback function of type @ref parser_callback_t\n    which is used to control the deserialization by filtering unwanted values\n    (optional)\n    @param[in] allow_exceptions  whether to throw exceptions in case of a\n    parse error (optional, true by default)\n    @param[in] ignore_comments  whether comments should be ignored and treated\n    like whitespace (true) or yield a parse error (false); (optional, false by\n    default)\n\n    @return deserialized JSON value; in case of a parse error and\n            @a allow_exceptions set to `false`, the return value will be\n            value_t::discarded.\n\n    @throw parse_error.101 if a parse error occurs; example: `\"\"unexpected end\n    of input; expected string literal\"\"`\n    @throw parse_error.102 if to_unicode fails or surrogate error\n    @throw parse_error.103 if to_unicode fails\n    */\n    template<typename IteratorType>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json parse(IteratorType first,\n                            IteratorType last,\n                            const parser_callback_t cb = nullptr,\n                            const bool allow_exceptions = true,\n                            const bool ignore_comments = false)\n    {\n        basic_json result;\n        parser(detail::input_adapter(std::move(first), std::move(last)), cb, allow_exceptions, ignore_comments).parse(true, result);\n        return result;\n    }\n\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    JSON_HEDLEY_DEPRECATED_FOR(3.8.0, parse(ptr, ptr + len))\n    static basic_json parse(detail::span_input_adapter&& i,\n                            const parser_callback_t cb = nullptr,\n                            const bool allow_exceptions = true,\n                            const bool 
ignore_comments = false)\n    {\n        basic_json result;\n        parser(i.get(), cb, allow_exceptions, ignore_comments).parse(true, result);\n        return result;\n    }\n\n    /*!\n    @brief check if the input is valid JSON\n\n    Unlike the @ref parse(InputType&&, const parser_callback_t,const bool)\n    function, this function neither throws an exception in case of invalid JSON\n    input (i.e., a parse error) nor creates diagnostic information.\n\n    @tparam InputType A compatible input, for instance\n    - an std::istream object\n    - a FILE pointer\n    - a C-style array of characters\n    - a pointer to a null-terminated string of single byte characters\n    - an object obj for which begin(obj) and end(obj) produce a valid pair of\n      iterators.\n\n    @param[in] i input to read from\n    @param[in] ignore_comments  whether comments should be ignored and treated\n    like whitespace (true) or yield a parse error (false); (optional, false by\n    default)\n\n    @return Whether the input read from @a i is valid JSON.\n\n    @complexity Linear in the length of the input. 
The parser is a predictive\n    LL(1) parser.\n\n    @note A UTF-8 byte order mark is silently ignored.\n\n    @liveexample{The example below demonstrates the `accept()` function reading\n    from a string.,accept__string}\n    */\n    template<typename InputType>\n    static bool accept(InputType&& i,\n                       const bool ignore_comments = false)\n    {\n        return parser(detail::input_adapter(std::forward<InputType>(i)), nullptr, false, ignore_comments).accept(true);\n    }\n\n    template<typename IteratorType>\n    static bool accept(IteratorType first, IteratorType last,\n                       const bool ignore_comments = false)\n    {\n        return parser(detail::input_adapter(std::move(first), std::move(last)), nullptr, false, ignore_comments).accept(true);\n    }\n\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    JSON_HEDLEY_DEPRECATED_FOR(3.8.0, accept(ptr, ptr + len))\n    static bool accept(detail::span_input_adapter&& i,\n                       const bool ignore_comments = false)\n    {\n        return parser(i.get(), nullptr, false, ignore_comments).accept(true);\n    }\n\n    /*!\n    @brief generate SAX events\n\n    The SAX event listener must follow the interface of @ref json_sax.\n\n    This function reads from a compatible input. 
Examples are:\n    - an std::istream object\n    - a FILE pointer\n    - a C-style array of characters\n    - a pointer to a null-terminated string of single byte characters\n    - an object obj for which begin(obj) and end(obj) produce a valid pair of\n      iterators.\n\n    @param[in] i  input to read from\n    @param[in,out] sax  SAX event listener\n    @param[in] format  the format to parse (JSON, CBOR, MessagePack, or UBJSON)\n    @param[in] strict  whether the input has to be consumed completely\n    @param[in] ignore_comments  whether comments should be ignored and treated\n    like whitespace (true) or yield a parse error (false); (optional, false by\n    default); only applies to the JSON file format.\n\n    @return return value of the last processed SAX event\n\n    @throw parse_error.101 if a parse error occurs; example: `\"\"unexpected end\n    of input; expected string literal\"\"`\n    @throw parse_error.102 if to_unicode fails or surrogate error\n    @throw parse_error.103 if to_unicode fails\n\n    @complexity Linear in the length of the input. The parser is a predictive\n    LL(1) parser. The complexity can be higher if the SAX consumer @a sax has\n    a super-linear complexity.\n\n    @note A UTF-8 byte order mark is silently ignored.\n\n    @liveexample{The example below demonstrates the `sax_parse()` function\n    reading from a string and processing the events with a user-defined SAX\n    event consumer.,sax_parse}\n\n    @since version 3.2.0\n    */\n    template <typename InputType, typename SAX>\n    JSON_HEDLEY_NON_NULL(2)\n    static bool sax_parse(InputType&& i, SAX* sax,\n                          input_format_t format = input_format_t::json,\n                          const bool strict = true,\n                          const bool ignore_comments = false)\n    {\n        auto ia = detail::input_adapter(std::forward<InputType>(i));\n        return format == input_format_t::json\n               ? 
parser(std::move(ia), nullptr, true, ignore_comments).sax_parse(sax, strict)\n               : detail::binary_reader<basic_json, decltype(ia), SAX>(std::move(ia)).sax_parse(format, sax, strict);\n    }\n\n    template<class IteratorType, class SAX>\n    JSON_HEDLEY_NON_NULL(3)\n    static bool sax_parse(IteratorType first, IteratorType last, SAX* sax,\n                          input_format_t format = input_format_t::json,\n                          const bool strict = true,\n                          const bool ignore_comments = false)\n    {\n        auto ia = detail::input_adapter(std::move(first), std::move(last));\n        return format == input_format_t::json\n               ? parser(std::move(ia), nullptr, true, ignore_comments).sax_parse(sax, strict)\n               : detail::binary_reader<basic_json, decltype(ia), SAX>(std::move(ia)).sax_parse(format, sax, strict);\n    }\n\n    template <typename SAX>\n    JSON_HEDLEY_DEPRECATED_FOR(3.8.0, sax_parse(ptr, ptr + len, ...))\n    JSON_HEDLEY_NON_NULL(2)\n    static bool sax_parse(detail::span_input_adapter&& i, SAX* sax,\n                          input_format_t format = input_format_t::json,\n                          const bool strict = true,\n                          const bool ignore_comments = false)\n    {\n        auto ia = i.get();\n        return format == input_format_t::json\n               // NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)\n               ? parser(std::move(ia), nullptr, true, ignore_comments).sax_parse(sax, strict)\n               // NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)\n               : detail::binary_reader<basic_json, decltype(ia), SAX>(std::move(ia)).sax_parse(format, sax, strict);\n    }\n\n    /*!\n    @brief deserialize from stream\n    @deprecated This stream operator is deprecated and will be removed in\n                version 4.0.0 of the library. 
Please use\n                @ref operator>>(std::istream&, basic_json&)\n                instead; that is, replace calls like `j << i;` with `i >> j;`.\n    @since version 1.0.0; deprecated since version 3.0.0\n    */\n    JSON_HEDLEY_DEPRECATED_FOR(3.0.0, operator>>(std::istream&, basic_json&))\n    friend std::istream& operator<<(basic_json& j, std::istream& i)\n    {\n        return operator>>(i, j);\n    }\n\n    /*!\n    @brief deserialize from stream\n\n    Deserializes an input stream to a JSON value.\n\n    @param[in,out] i  input stream to read a serialized JSON value from\n    @param[in,out] j  JSON value to write the deserialized input to\n\n    @throw parse_error.101 in case of an unexpected token\n    @throw parse_error.102 if to_unicode fails or surrogate error\n    @throw parse_error.103 if to_unicode fails\n\n    @complexity Linear in the length of the input. The parser is a predictive\n    LL(1) parser.\n\n    @note A UTF-8 byte order mark is silently ignored.\n\n    @liveexample{The example below shows how a JSON value is constructed by\n    reading a serialization from a stream.,operator_deserialize}\n\n    @sa parse(std::istream&, const parser_callback_t) for a variant with a\n    parser callback function to filter values while parsing\n\n    @since version 1.0.0\n    */\n    friend std::istream& operator>>(std::istream& i, basic_json& j)\n    {\n        parser(detail::input_adapter(i)).parse(false, j);\n        return i;\n    }\n\n    /// @}\n\n    ///////////////////////////\n    // convenience functions //\n    ///////////////////////////\n\n    /*!\n    @brief return the type as string\n\n    Returns the type name as string to be used in error messages - usually to\n    indicate that a function was called on a wrong JSON type.\n\n    @return a string representation of the @a m_type member:\n            Value type  | return value\n            ----------- | -------------\n            null        | `\"null\"`\n            boolean     | 
`\"boolean\"`\n            string      | `\"string\"`\n            number      | `\"number\"` (for all number types)\n            object      | `\"object\"`\n            array       | `\"array\"`\n            binary      | `\"binary\"`\n            discarded   | `\"discarded\"`\n\n    @exceptionsafety No-throw guarantee: this function never throws exceptions.\n\n    @complexity Constant.\n\n    @liveexample{The following code exemplifies `type_name()` for all JSON\n    types.,type_name}\n\n    @sa see @ref type() -- return the type of the JSON value\n    @sa see @ref operator value_t() -- return the type of the JSON value (implicit)\n\n    @since version 1.0.0, public since 2.1.0, `const char*` and `noexcept`\n    since 3.0.0\n    */\n    JSON_HEDLEY_RETURNS_NON_NULL\n    const char* type_name() const noexcept\n    {\n        {\n            switch (m_type)\n            {\n                case value_t::null:\n                    return \"null\";\n                case value_t::object:\n                    return \"object\";\n                case value_t::array:\n                    return \"array\";\n                case value_t::string:\n                    return \"string\";\n                case value_t::boolean:\n                    return \"boolean\";\n                case value_t::binary:\n                    return \"binary\";\n                case value_t::discarded:\n                    return \"discarded\";\n                default:\n                    return \"number\";\n            }\n        }\n    }\n\n\n  JSON_PRIVATE_UNLESS_TESTED:\n    //////////////////////\n    // member variables //\n    //////////////////////\n\n    /// the type of the current element\n    value_t m_type = value_t::null;\n\n    /// the value of the current element\n    json_value m_value = {};\n\n#if JSON_DIAGNOSTICS\n    /// a pointer to a parent value (for debugging purposes)\n    basic_json* m_parent = nullptr;\n#endif\n\n    //////////////////////////////////////////\n    // 
binary serialization/deserialization //\n    //////////////////////////////////////////\n\n    /// @name binary serialization/deserialization support\n    /// @{\n\n  public:\n    /*!\n    @brief create a CBOR serialization of a given JSON value\n\n    Serializes a given JSON value @a j to a byte vector using the CBOR (Concise\n    Binary Object Representation) serialization format. CBOR is a binary\n    serialization format which aims to be more compact than JSON itself, yet\n    more efficient to parse.\n\n    The library uses the following mapping from JSON values types to\n    CBOR types according to the CBOR specification (RFC 7049):\n\n    JSON value type | value/range                                | CBOR type                          | first byte\n    --------------- | ------------------------------------------ | ---------------------------------- | ---------------\n    null            | `null`                                     | Null                               | 0xF6\n    boolean         | `true`                                     | True                               | 0xF5\n    boolean         | `false`                                    | False                              | 0xF4\n    number_integer  | -9223372036854775808..-2147483649          | Negative integer (8 bytes follow)  | 0x3B\n    number_integer  | -2147483648..-32769                        | Negative integer (4 bytes follow)  | 0x3A\n    number_integer  | -32768..-129                               | Negative integer (2 bytes follow)  | 0x39\n    number_integer  | -128..-25                                  | Negative integer (1 byte follow)   | 0x38\n    number_integer  | -24..-1                                    | Negative integer                   | 0x20..0x37\n    number_integer  | 0..23                                      | Integer                            | 0x00..0x17\n    number_integer  | 24..255                                    | Unsigned integer (1 byte follow)   | 0x18\n 
   number_integer  | 256..65535                                 | Unsigned integer (2 bytes follow)  | 0x19\n    number_integer  | 65536..4294967295                          | Unsigned integer (4 bytes follow)  | 0x1A\n    number_integer  | 4294967296..18446744073709551615           | Unsigned integer (8 bytes follow)  | 0x1B\n    number_unsigned | 0..23                                      | Integer                            | 0x00..0x17\n    number_unsigned | 24..255                                    | Unsigned integer (1 byte follow)   | 0x18\n    number_unsigned | 256..65535                                 | Unsigned integer (2 bytes follow)  | 0x19\n    number_unsigned | 65536..4294967295                          | Unsigned integer (4 bytes follow)  | 0x1A\n    number_unsigned | 4294967296..18446744073709551615           | Unsigned integer (8 bytes follow)  | 0x1B\n    number_float    | *any value representable by a float*       | Single-Precision Float             | 0xFA\n    number_float    | *any value NOT representable by a float*   | Double-Precision Float             | 0xFB\n    string          | *length*: 0..23                            | UTF-8 string                       | 0x60..0x77\n    string          | *length*: 24..255                          | UTF-8 string (1 byte follow)       | 0x78\n    string          | *length*: 256..65535                       | UTF-8 string (2 bytes follow)      | 0x79\n    string          | *length*: 65536..4294967295                | UTF-8 string (4 bytes follow)      | 0x7A\n    string          | *length*: 4294967296..18446744073709551615 | UTF-8 string (8 bytes follow)      | 0x7B\n    array           | *size*: 0..23                              | array                              | 0x80..0x97\n    array           | *size*: 24..255                            | array (1 byte follow)              | 0x98\n    array           | *size*: 256..65535                         | array (2 bytes follow)             | 0x99\n   
 array           | *size*: 65536..4294967295                  | array (4 bytes follow)             | 0x9A\n    array           | *size*: 4294967296..18446744073709551615   | array (8 bytes follow)             | 0x9B\n    object          | *size*: 0..23                              | map                                | 0xA0..0xB7\n    object          | *size*: 24..255                            | map (1 byte follow)                | 0xB8\n    object          | *size*: 256..65535                         | map (2 bytes follow)               | 0xB9\n    object          | *size*: 65536..4294967295                  | map (4 bytes follow)               | 0xBA\n    object          | *size*: 4294967296..18446744073709551615   | map (8 bytes follow)               | 0xBB\n    binary          | *size*: 0..23                              | byte string                        | 0x40..0x57\n    binary          | *size*: 24..255                            | byte string (1 byte follow)        | 0x58\n    binary          | *size*: 256..65535                         | byte string (2 bytes follow)       | 0x59\n    binary          | *size*: 65536..4294967295                  | byte string (4 bytes follow)       | 0x5A\n    binary          | *size*: 4294967296..18446744073709551615   | byte string (8 bytes follow)       | 0x5B\n\n    @note The mapping is **complete** in the sense that any JSON value type\n          can be converted to a CBOR value.\n\n    @note If NaN or Infinity are stored inside a JSON number, they are\n          serialized properly. 
This behavior differs from the @ref dump()\n          function which serializes NaN or Infinity to `null`.\n\n    @note The following CBOR types are not used in the conversion:\n          - UTF-8 strings terminated by \"break\" (0x7F)\n          - arrays terminated by \"break\" (0x9F)\n          - maps terminated by \"break\" (0xBF)\n          - byte strings terminated by \"break\" (0x5F)\n          - date/time (0xC0..0xC1)\n          - bignum (0xC2..0xC3)\n          - decimal fraction (0xC4)\n          - bigfloat (0xC5)\n          - expected conversions (0xD5..0xD7)\n          - simple values (0xE0..0xF3, 0xF8)\n          - undefined (0xF7)\n          - half-precision floats (0xF9)\n          - break (0xFF)\n\n    @param[in] j  JSON value to serialize\n    @return CBOR serialization as byte vector\n\n    @complexity Linear in the size of the JSON value @a j.\n\n    @liveexample{The example shows the serialization of a JSON value to a byte\n    vector in CBOR format.,to_cbor}\n\n    @sa http://cbor.io\n    @sa see @ref from_cbor(InputType&&, const bool, const bool, const cbor_tag_handler_t) for the\n        analogous deserialization\n    @sa see @ref to_msgpack(const basic_json&) for the related MessagePack format\n    @sa see @ref to_ubjson(const basic_json&, const bool, const bool) for the\n             related UBJSON format\n\n    @since version 2.0.9; compact representation of floating-point numbers\n           since version 3.8.0\n    */\n    static std::vector<uint8_t> to_cbor(const basic_json& j)\n    {\n        std::vector<uint8_t> result;\n        to_cbor(j, result);\n        return result;\n    }\n\n    static void to_cbor(const basic_json& j, detail::output_adapter<uint8_t> o)\n    {\n        binary_writer<uint8_t>(o).write_cbor(j);\n    }\n\n    static void to_cbor(const basic_json& j, detail::output_adapter<char> o)\n    {\n        binary_writer<char>(o).write_cbor(j);\n    }\n\n    /*!\n    @brief create a MessagePack serialization of a given JSON 
value\n\n    Serializes a given JSON value @a j to a byte vector using the MessagePack\n    serialization format. MessagePack is a binary serialization format which\n    aims to be more compact than JSON itself, yet more efficient to parse.\n\n    The library uses the following mapping from JSON values types to\n    MessagePack types according to the MessagePack specification:\n\n    JSON value type | value/range                       | MessagePack type | first byte\n    --------------- | --------------------------------- | ---------------- | ----------\n    null            | `null`                            | nil              | 0xC0\n    boolean         | `true`                            | true             | 0xC3\n    boolean         | `false`                           | false            | 0xC2\n    number_integer  | -9223372036854775808..-2147483649 | int64            | 0xD3\n    number_integer  | -2147483648..-32769               | int32            | 0xD2\n    number_integer  | -32768..-129                      | int16            | 0xD1\n    number_integer  | -128..-33                         | int8             | 0xD0\n    number_integer  | -32..-1                           | negative fixint  | 0xE0..0xFF\n    number_integer  | 0..127                            | positive fixint  | 0x00..0x7F\n    number_integer  | 128..255                          | uint 8           | 0xCC\n    number_integer  | 256..65535                        | uint 16          | 0xCD\n    number_integer  | 65536..4294967295                 | uint 32          | 0xCE\n    number_integer  | 4294967296..18446744073709551615  | uint 64          | 0xCF\n    number_unsigned | 0..127                            | positive fixint  | 0x00..0x7F\n    number_unsigned | 128..255                          | uint 8           | 0xCC\n    number_unsigned | 256..65535                        | uint 16          | 0xCD\n    number_unsigned | 65536..4294967295                 | uint 32          | 0xCE\n    
number_unsigned | 4294967296..18446744073709551615  | uint 64          | 0xCF\n    number_float    | *any value representable by a float*     | float 32 | 0xCA\n    number_float    | *any value NOT representable by a float* | float 64 | 0xCB\n    string          | *length*: 0..31                   | fixstr           | 0xA0..0xBF\n    string          | *length*: 32..255                 | str 8            | 0xD9\n    string          | *length*: 256..65535              | str 16           | 0xDA\n    string          | *length*: 65536..4294967295       | str 32           | 0xDB\n    array           | *size*: 0..15                     | fixarray         | 0x90..0x9F\n    array           | *size*: 16..65535                 | array 16         | 0xDC\n    array           | *size*: 65536..4294967295         | array 32         | 0xDD\n    object          | *size*: 0..15                     | fix map          | 0x80..0x8F\n    object          | *size*: 16..65535                 | map 16           | 0xDE\n    object          | *size*: 65536..4294967295         | map 32           | 0xDF\n    binary          | *size*: 0..255                    | bin 8            | 0xC4\n    binary          | *size*: 256..65535                | bin 16           | 0xC5\n    binary          | *size*: 65536..4294967295         | bin 32           | 0xC6\n\n    @note The mapping is **complete** in the sense that any JSON value type\n          can be converted to a MessagePack value.\n\n    @note The following values can **not** be converted to a MessagePack value:\n          - strings with more than 4294967295 bytes\n          - byte strings with more than 4294967295 bytes\n          - arrays with more than 4294967295 elements\n          - objects with more than 4294967295 elements\n\n    @note Any MessagePack output created @ref to_msgpack can be successfully\n          parsed by @ref from_msgpack.\n\n    @note If NaN or Infinity are stored inside a JSON number, they are\n          serialized 
properly. This behavior differs from the @ref dump()\n          function which serializes NaN or Infinity to `null`.\n\n    @param[in] j  JSON value to serialize\n    @return MessagePack serialization as byte vector\n\n    @complexity Linear in the size of the JSON value @a j.\n\n    @liveexample{The example shows the serialization of a JSON value to a byte\n    vector in MessagePack format.,to_msgpack}\n\n    @sa http://msgpack.org\n    @sa see @ref from_msgpack for the analogous deserialization\n    @sa see @ref to_cbor(const basic_json&) for the related CBOR format\n    @sa see @ref to_ubjson(const basic_json&, const bool, const bool) for the\n             related UBJSON format\n\n    @since version 2.0.9\n    */\n    static std::vector<uint8_t> to_msgpack(const basic_json& j)\n    {\n        std::vector<uint8_t> result;\n        to_msgpack(j, result);\n        return result;\n    }\n\n    static void to_msgpack(const basic_json& j, detail::output_adapter<uint8_t> o)\n    {\n        binary_writer<uint8_t>(o).write_msgpack(j);\n    }\n\n    static void to_msgpack(const basic_json& j, detail::output_adapter<char> o)\n    {\n        binary_writer<char>(o).write_msgpack(j);\n    }\n\n    /*!\n    @brief create a UBJSON serialization of a given JSON value\n\n    Serializes a given JSON value @a j to a byte vector using the UBJSON\n    (Universal Binary JSON) serialization format. 
UBJSON aims to be more compact\n    than JSON itself, yet more efficient to parse.\n\n    The library uses the following mapping from JSON values types to\n    UBJSON types according to the UBJSON specification:\n\n    JSON value type | value/range                       | UBJSON type | marker\n    --------------- | --------------------------------- | ----------- | ------\n    null            | `null`                            | null        | `Z`\n    boolean         | `true`                            | true        | `T`\n    boolean         | `false`                           | false       | `F`\n    number_integer  | -9223372036854775808..-2147483649 | int64       | `L`\n    number_integer  | -2147483648..-32769               | int32       | `l`\n    number_integer  | -32768..-129                      | int16       | `I`\n    number_integer  | -128..127                         | int8        | `i`\n    number_integer  | 128..255                          | uint8       | `U`\n    number_integer  | 256..32767                        | int16       | `I`\n    number_integer  | 32768..2147483647                 | int32       | `l`\n    number_integer  | 2147483648..9223372036854775807   | int64       | `L`\n    number_unsigned | 0..127                            | int8        | `i`\n    number_unsigned | 128..255                          | uint8       | `U`\n    number_unsigned | 256..32767                        | int16       | `I`\n    number_unsigned | 32768..2147483647                 | int32       | `l`\n    number_unsigned | 2147483648..9223372036854775807   | int64       | `L`\n    number_unsigned | 9223372036854775808..18446744073709551615 | high-precision | `H`\n    number_float    | *any value*                       | float64     | `D`\n    string          | *with shortest length indicator*  | string      | `S`\n    array           | *see notes on optimized format*   | array       | `[`\n    object          | *see notes on optimized format*   | map         | `{`\n\n   
 @note The mapping is **complete** in the sense that any JSON value type\n          can be converted to a UBJSON value.\n\n    @note The following values can **not** be converted to a UBJSON value:\n          - strings with more than 9223372036854775807 bytes (theoretical)\n\n    @note The following markers are not used in the conversion:\n          - `N`: no-op values are not created.\n          - `C`: single-byte strings are serialized with `S` markers.\n\n    @note Any UBJSON output created by @ref to_ubjson can be successfully parsed\n          by @ref from_ubjson.\n\n    @note If NaN or Infinity are stored inside a JSON number, they are\n          serialized properly. This behavior differs from the @ref dump()\n          function which serializes NaN or Infinity to `null`.\n\n    @note The optimized formats for containers are supported: Parameter\n          @a use_size adds size information to the beginning of a container and\n          removes the closing marker. Parameter @a use_type further checks\n          whether all elements of a container have the same type and adds the\n          type marker to the beginning of the container. The @a use_type\n          parameter must only be used together with @a use_size = true. Note\n          that @a use_size = true alone may result in larger representations -\n          the benefit of this parameter is that the receiving side is\n          immediately informed of the number of elements of the container.\n\n    @note If the JSON data contains the binary type, the value stored is a list\n          of integers, as suggested by the UBJSON documentation.  
In particular,\n          this means that serialization and the deserialization of a JSON\n          containing binary values into UBJSON and back will result in a\n          different JSON object.\n\n    @param[in] j  JSON value to serialize\n    @param[in] use_size  whether to add size annotations to container types\n    @param[in] use_type  whether to add type annotations to container types\n                         (must be combined with @a use_size = true)\n    @return UBJSON serialization as byte vector\n\n    @complexity Linear in the size of the JSON value @a j.\n\n    @liveexample{The example shows the serialization of a JSON value to a byte\n    vector in UBJSON format.,to_ubjson}\n\n    @sa http://ubjson.org\n    @sa see @ref from_ubjson(InputType&&, const bool, const bool) for the\n        analogous deserialization\n    @sa see @ref to_cbor(const basic_json&) for the related CBOR format\n    @sa see @ref to_msgpack(const basic_json&) for the related MessagePack format\n\n    @since version 3.1.0\n    */\n    static std::vector<uint8_t> to_ubjson(const basic_json& j,\n                                          const bool use_size = false,\n                                          const bool use_type = false)\n    {\n        std::vector<uint8_t> result;\n        to_ubjson(j, result, use_size, use_type);\n        return result;\n    }\n\n    static void to_ubjson(const basic_json& j, detail::output_adapter<uint8_t> o,\n                          const bool use_size = false, const bool use_type = false)\n    {\n        binary_writer<uint8_t>(o).write_ubjson(j, use_size, use_type);\n    }\n\n    static void to_ubjson(const basic_json& j, detail::output_adapter<char> o,\n                          const bool use_size = false, const bool use_type = false)\n    {\n        binary_writer<char>(o).write_ubjson(j, use_size, use_type);\n    }\n\n\n    /*!\n    @brief Serializes the given JSON object `j` to BSON and returns a vector\n           containing the 
corresponding BSON-representation.\n\n    BSON (Binary JSON) is a binary format in which zero or more ordered key/value pairs are\n    stored as a single entity (a so-called document).\n\n    The library uses the following mapping from JSON values types to BSON types:\n\n    JSON value type | value/range                       | BSON type   | marker\n    --------------- | --------------------------------- | ----------- | ------\n    null            | `null`                            | null        | 0x0A\n    boolean         | `true`, `false`                   | boolean     | 0x08\n    number_integer  | -9223372036854775808..-2147483649 | int64       | 0x12\n    number_integer  | -2147483648..2147483647           | int32       | 0x10\n    number_integer  | 2147483648..9223372036854775807   | int64       | 0x12\n    number_unsigned | 0..2147483647                     | int32       | 0x10\n    number_unsigned | 2147483648..9223372036854775807   | int64       | 0x12\n    number_unsigned | 9223372036854775808..18446744073709551615| --   | --\n    number_float    | *any value*                       | double      | 0x01\n    string          | *any value*                       | string      | 0x02\n    array           | *any value*                       | document    | 0x04\n    object          | *any value*                       | document    | 0x03\n    binary          | *any value*                       | binary      | 0x05\n\n    @warning The mapping is **incomplete**, since only JSON-objects (and things\n    contained therein) can be serialized to BSON.\n    Also, integers larger than 9223372036854775807 cannot be serialized to BSON,\n    and the keys may not contain U+0000, since they are serialized as\n    zero-terminated C strings.\n\n    @throw out_of_range.407  if `j.is_number_unsigned() && j.get<std::uint64_t>() > 9223372036854775807`\n    @throw out_of_range.409  if a key in `j` contains a NULL (U+0000)\n    @throw type_error.317    if `!j.is_object()`\n\n    
@pre The input `j` is required to be an object: `j.is_object() == true`.\n\n    @note Any BSON output created via @ref to_bson can be successfully parsed\n          by @ref from_bson.\n\n    @param[in] j  JSON value to serialize\n    @return BSON serialization as byte vector\n\n    @complexity Linear in the size of the JSON value @a j.\n\n    @liveexample{The example shows the serialization of a JSON value to a byte\n    vector in BSON format.,to_bson}\n\n    @sa http://bsonspec.org/spec.html\n    @sa see @ref from_bson(detail::input_adapter&&, const bool strict) for the\n        analogous deserialization\n    @sa see @ref to_ubjson(const basic_json&, const bool, const bool) for the\n             related UBJSON format\n    @sa see @ref to_cbor(const basic_json&) for the related CBOR format\n    @sa see @ref to_msgpack(const basic_json&) for the related MessagePack format\n    */\n    static std::vector<uint8_t> to_bson(const basic_json& j)\n    {\n        std::vector<uint8_t> result;\n        to_bson(j, result);\n        return result;\n    }\n\n    /*!\n    @brief Serializes the given JSON object `j` to BSON and forwards the\n           corresponding BSON-representation to the given output_adapter `o`.\n    @param j The JSON object to convert to BSON.\n    @param o The output adapter that receives the binary BSON representation.\n    @pre The input `j` shall be an object: `j.is_object() == true`\n    @sa see @ref to_bson(const basic_json&)\n    */\n    static void to_bson(const basic_json& j, detail::output_adapter<uint8_t> o)\n    {\n        binary_writer<uint8_t>(o).write_bson(j);\n    }\n\n    /*!\n    @copydoc to_bson(const basic_json&, detail::output_adapter<uint8_t>)\n    */\n    static void to_bson(const basic_json& j, detail::output_adapter<char> o)\n    {\n        binary_writer<char>(o).write_bson(j);\n    }\n\n\n    /*!\n    @brief create a JSON value from an input in CBOR format\n\n    Deserializes a given input @a i to a JSON value using the CBOR 
(Concise\n    Binary Object Representation) serialization format.\n\n    The library maps CBOR types to JSON value types as follows:\n\n    CBOR type              | JSON value type | first byte\n    ---------------------- | --------------- | ----------\n    Integer                | number_unsigned | 0x00..0x17\n    Unsigned integer       | number_unsigned | 0x18\n    Unsigned integer       | number_unsigned | 0x19\n    Unsigned integer       | number_unsigned | 0x1A\n    Unsigned integer       | number_unsigned | 0x1B\n    Negative integer       | number_integer  | 0x20..0x37\n    Negative integer       | number_integer  | 0x38\n    Negative integer       | number_integer  | 0x39\n    Negative integer       | number_integer  | 0x3A\n    Negative integer       | number_integer  | 0x3B\n    Byte string            | binary          | 0x40..0x57\n    Byte string            | binary          | 0x58\n    Byte string            | binary          | 0x59\n    Byte string            | binary          | 0x5A\n    Byte string            | binary          | 0x5B\n    UTF-8 string           | string          | 0x60..0x77\n    UTF-8 string           | string          | 0x78\n    UTF-8 string           | string          | 0x79\n    UTF-8 string           | string          | 0x7A\n    UTF-8 string           | string          | 0x7B\n    UTF-8 string           | string          | 0x7F\n    array                  | array           | 0x80..0x97\n    array                  | array           | 0x98\n    array                  | array           | 0x99\n    array                  | array           | 0x9A\n    array                  | array           | 0x9B\n    array                  | array           | 0x9F\n    map                    | object          | 0xA0..0xB7\n    map                    | object          | 0xB8\n    map                    | object          | 0xB9\n    map                    | object          | 0xBA\n    map                    | object          | 0xBB\n    map       
             | object          | 0xBF\n    False                  | `false`         | 0xF4\n    True                   | `true`          | 0xF5\n    Null                   | `null`          | 0xF6\n    Half-Precision Float   | number_float    | 0xF9\n    Single-Precision Float | number_float    | 0xFA\n    Double-Precision Float | number_float    | 0xFB\n\n    @warning The mapping is **incomplete** in the sense that not all CBOR\n             types can be converted to a JSON value. The following CBOR types\n             are not supported and will yield parse errors (parse_error.112):\n             - date/time (0xC0..0xC1)\n             - bignum (0xC2..0xC3)\n             - decimal fraction (0xC4)\n             - bigfloat (0xC5)\n             - expected conversions (0xD5..0xD7)\n             - simple values (0xE0..0xF3, 0xF8)\n             - undefined (0xF7)\n\n    @warning CBOR allows map keys of any type, whereas JSON only allows\n             strings as keys in object values. Therefore, CBOR maps with keys\n             other than UTF-8 strings are rejected (parse_error.113).\n\n    @note Any CBOR output created @ref to_cbor can be successfully parsed by\n          @ref from_cbor.\n\n    @param[in] i  an input in CBOR format convertible to an input adapter\n    @param[in] strict  whether to expect the input to be consumed until EOF\n                       (true by default)\n    @param[in] allow_exceptions  whether to throw exceptions in case of a\n    parse error (optional, true by default)\n    @param[in] tag_handler how to treat CBOR tags (optional, error by default)\n\n    @return deserialized JSON value; in case of a parse error and\n            @a allow_exceptions set to `false`, the return value will be\n            value_t::discarded.\n\n    @throw parse_error.110 if the given input ends prematurely or the end of\n    file was not reached when @a strict was set to true\n    @throw parse_error.112 if unsupported features from CBOR were\n    used in the 
given input @a i or if the input is not valid CBOR\n    @throw parse_error.113 if a string was expected as map key, but not found\n\n    @complexity Linear in the size of the input @a i.\n\n    @liveexample{The example shows the deserialization of a byte vector in CBOR\n    format to a JSON value.,from_cbor}\n\n    @sa http://cbor.io\n    @sa see @ref to_cbor(const basic_json&) for the analogous serialization\n    @sa see @ref from_msgpack(InputType&&, const bool, const bool) for the\n        related MessagePack format\n    @sa see @ref from_ubjson(InputType&&, const bool, const bool) for the\n        related UBJSON format\n\n    @since version 2.0.9; parameter @a start_index since 2.1.1; changed to\n           consume input adapters, removed start_index parameter, and added\n           @a strict parameter since 3.0.0; added @a allow_exceptions parameter\n           since 3.2.0; added @a tag_handler parameter since 3.9.0.\n    */\n    template<typename InputType>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json from_cbor(InputType&& i,\n                                const bool strict = true,\n                                const bool allow_exceptions = true,\n                                const cbor_tag_handler_t tag_handler = cbor_tag_handler_t::error)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = detail::input_adapter(std::forward<InputType>(i));\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::cbor, &sdp, strict, tag_handler);\n        return res ? 
result : basic_json(value_t::discarded);\n    }\n\n    /*!\n    @copydoc from_cbor(InputType&&, const bool, const bool, const cbor_tag_handler_t)\n    */\n    template<typename IteratorType>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json from_cbor(IteratorType first, IteratorType last,\n                                const bool strict = true,\n                                const bool allow_exceptions = true,\n                                const cbor_tag_handler_t tag_handler = cbor_tag_handler_t::error)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = detail::input_adapter(std::move(first), std::move(last));\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::cbor, &sdp, strict, tag_handler);\n        return res ? result : basic_json(value_t::discarded);\n    }\n\n    template<typename T>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    JSON_HEDLEY_DEPRECATED_FOR(3.8.0, from_cbor(ptr, ptr + len))\n    static basic_json from_cbor(const T* ptr, std::size_t len,\n                                const bool strict = true,\n                                const bool allow_exceptions = true,\n                                const cbor_tag_handler_t tag_handler = cbor_tag_handler_t::error)\n    {\n        return from_cbor(ptr, ptr + len, strict, allow_exceptions, tag_handler);\n    }\n\n\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    JSON_HEDLEY_DEPRECATED_FOR(3.8.0, from_cbor(ptr, ptr + len))\n    static basic_json from_cbor(detail::span_input_adapter&& i,\n                                const bool strict = true,\n                                const bool allow_exceptions = true,\n                                const cbor_tag_handler_t tag_handler = cbor_tag_handler_t::error)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = i.get();\n        // 
NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::cbor, &sdp, strict, tag_handler);\n        return res ? result : basic_json(value_t::discarded);\n    }\n\n    /*!\n    @brief create a JSON value from an input in MessagePack format\n\n    Deserializes a given input @a i to a JSON value using the MessagePack\n    serialization format.\n\n    The library maps MessagePack types to JSON value types as follows:\n\n    MessagePack type | JSON value type | first byte\n    ---------------- | --------------- | ----------\n    positive fixint  | number_unsigned | 0x00..0x7F\n    fixmap           | object          | 0x80..0x8F\n    fixarray         | array           | 0x90..0x9F\n    fixstr           | string          | 0xA0..0xBF\n    nil              | `null`          | 0xC0\n    false            | `false`         | 0xC2\n    true             | `true`          | 0xC3\n    float 32         | number_float    | 0xCA\n    float 64         | number_float    | 0xCB\n    uint 8           | number_unsigned | 0xCC\n    uint 16          | number_unsigned | 0xCD\n    uint 32          | number_unsigned | 0xCE\n    uint 64          | number_unsigned | 0xCF\n    int 8            | number_integer  | 0xD0\n    int 16           | number_integer  | 0xD1\n    int 32           | number_integer  | 0xD2\n    int 64           | number_integer  | 0xD3\n    str 8            | string          | 0xD9\n    str 16           | string          | 0xDA\n    str 32           | string          | 0xDB\n    array 16         | array           | 0xDC\n    array 32         | array           | 0xDD\n    map 16           | object          | 0xDE\n    map 32           | object          | 0xDF\n    bin 8            | binary          | 0xC4\n    bin 16           | binary          | 0xC5\n    bin 32           | binary          | 0xC6\n    ext 8            | binary          | 0xC7\n    ext 16           | binary  
        | 0xC8\n    ext 32           | binary          | 0xC9\n    fixext 1         | binary          | 0xD4\n    fixext 2         | binary          | 0xD5\n    fixext 4         | binary          | 0xD6\n    fixext 8         | binary          | 0xD7\n    fixext 16        | binary          | 0xD8\n    negative fixint  | number_integer  | 0xE0..0xFF\n\n    @note Any MessagePack output created by @ref to_msgpack can be successfully\n          parsed by @ref from_msgpack.\n\n    @param[in] i  an input in MessagePack format convertible to an input\n                  adapter\n    @param[in] strict  whether to expect the input to be consumed until EOF\n                       (true by default)\n    @param[in] allow_exceptions  whether to throw exceptions in case of a\n    parse error (optional, true by default)\n\n    @return deserialized JSON value; in case of a parse error and\n            @a allow_exceptions set to `false`, the return value will be\n            value_t::discarded.\n\n    @throw parse_error.110 if the given input ends prematurely or the end of\n    file was not reached when @a strict was set to true\n    @throw parse_error.112 if unsupported features from MessagePack were\n    used in the given input @a i or if the input is not valid MessagePack\n    @throw parse_error.113 if a string was expected as map key, but not found\n\n    @complexity Linear in the size of the input @a i.\n\n    @liveexample{The example shows the deserialization of a byte vector in\n    MessagePack format to a JSON value.,from_msgpack}\n\n    @sa http://msgpack.org\n    @sa see @ref to_msgpack(const basic_json&) for the analogous serialization\n    @sa see @ref from_cbor(InputType&&, const bool, const bool, const cbor_tag_handler_t) for the\n        related CBOR format\n    @sa see @ref from_ubjson(InputType&&, const bool, const bool) for\n        the related UBJSON format\n    @sa see @ref from_bson(InputType&&, const bool, const bool) for\n        the related BSON format\n\n    
@since version 2.0.9; parameter @a start_index since 2.1.1; changed to\n           consume input adapters, removed start_index parameter, and added\n           @a strict parameter since 3.0.0; added @a allow_exceptions parameter\n           since 3.2.0\n    */\n    template<typename InputType>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json from_msgpack(InputType&& i,\n                                   const bool strict = true,\n                                   const bool allow_exceptions = true)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = detail::input_adapter(std::forward<InputType>(i));\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::msgpack, &sdp, strict);\n        return res ? result : basic_json(value_t::discarded);\n    }\n\n    /*!\n    @copydoc from_msgpack(InputType&&, const bool, const bool)\n    */\n    template<typename IteratorType>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json from_msgpack(IteratorType first, IteratorType last,\n                                   const bool strict = true,\n                                   const bool allow_exceptions = true)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = detail::input_adapter(std::move(first), std::move(last));\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::msgpack, &sdp, strict);\n        return res ? 
result : basic_json(value_t::discarded);\n    }\n\n\n    template<typename T>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    JSON_HEDLEY_DEPRECATED_FOR(3.8.0, from_msgpack(ptr, ptr + len))\n    static basic_json from_msgpack(const T* ptr, std::size_t len,\n                                   const bool strict = true,\n                                   const bool allow_exceptions = true)\n    {\n        return from_msgpack(ptr, ptr + len, strict, allow_exceptions);\n    }\n\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    JSON_HEDLEY_DEPRECATED_FOR(3.8.0, from_msgpack(ptr, ptr + len))\n    static basic_json from_msgpack(detail::span_input_adapter&& i,\n                                   const bool strict = true,\n                                   const bool allow_exceptions = true)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = i.get();\n        // NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::msgpack, &sdp, strict);\n        return res ? 
result : basic_json(value_t::discarded);\n    }\n\n\n    /*!\n    @brief create a JSON value from an input in UBJSON format\n\n    Deserializes a given input @a i to a JSON value using the UBJSON (Universal\n    Binary JSON) serialization format.\n\n    The library maps UBJSON types to JSON value types as follows:\n\n    UBJSON type | JSON value type                         | marker\n    ----------- | --------------------------------------- | ------\n    no-op       | *no value, next value is read*          | `N`\n    null        | `null`                                  | `Z`\n    false       | `false`                                 | `F`\n    true        | `true`                                  | `T`\n    float32     | number_float                            | `d`\n    float64     | number_float                            | `D`\n    uint8       | number_unsigned                         | `U`\n    int8        | number_integer                          | `i`\n    int16       | number_integer                          | `I`\n    int32       | number_integer                          | `l`\n    int64       | number_integer                          | `L`\n    high-precision number | number_integer, number_unsigned, or number_float - depends on number string | `H`\n    string      | string                                  | `S`\n    char        | string                                  | `C`\n    array       | array (optimized values are supported)  | `[`\n    object      | object (optimized values are supported) | `{`\n\n    @note The mapping is **complete** in the sense that any UBJSON value can\n          be converted to a JSON value.\n\n    @param[in] i  an input in UBJSON format convertible to an input adapter\n    @param[in] strict  whether to expect the input to be consumed until EOF\n                       (true by default)\n    @param[in] allow_exceptions  whether to throw exceptions in case of a\n    parse error (optional, true by default)\n\n    @return 
deserialized JSON value; in case of a parse error and\n            @a allow_exceptions set to `false`, the return value will be\n            value_t::discarded.\n\n    @throw parse_error.110 if the given input ends prematurely or the end of\n    file was not reached when @a strict was set to true\n    @throw parse_error.112 if a parse error occurs\n    @throw parse_error.113 if a string could not be parsed successfully\n\n    @complexity Linear in the size of the input @a i.\n\n    @liveexample{The example shows the deserialization of a byte vector in\n    UBJSON format to a JSON value.,from_ubjson}\n\n    @sa http://ubjson.org\n    @sa see @ref to_ubjson(const basic_json&, const bool, const bool) for the\n             analogous serialization\n    @sa see @ref from_cbor(InputType&&, const bool, const bool, const cbor_tag_handler_t) for the\n        related CBOR format\n    @sa see @ref from_msgpack(InputType&&, const bool, const bool) for\n        the related MessagePack format\n    @sa see @ref from_bson(InputType&&, const bool, const bool) for\n        the related BSON format\n\n    @since version 3.1.0; added @a allow_exceptions parameter since 3.2.0\n    */\n    template<typename InputType>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json from_ubjson(InputType&& i,\n                                  const bool strict = true,\n                                  const bool allow_exceptions = true)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = detail::input_adapter(std::forward<InputType>(i));\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::ubjson, &sdp, strict);\n        return res ? 
result : basic_json(value_t::discarded);\n    }\n\n    /*!\n    @copydoc from_ubjson(InputType&&, const bool, const bool)\n    */\n    template<typename IteratorType>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json from_ubjson(IteratorType first, IteratorType last,\n                                  const bool strict = true,\n                                  const bool allow_exceptions = true)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = detail::input_adapter(std::move(first), std::move(last));\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::ubjson, &sdp, strict);\n        return res ? result : basic_json(value_t::discarded);\n    }\n\n    template<typename T>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    JSON_HEDLEY_DEPRECATED_FOR(3.8.0, from_ubjson(ptr, ptr + len))\n    static basic_json from_ubjson(const T* ptr, std::size_t len,\n                                  const bool strict = true,\n                                  const bool allow_exceptions = true)\n    {\n        return from_ubjson(ptr, ptr + len, strict, allow_exceptions);\n    }\n\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    JSON_HEDLEY_DEPRECATED_FOR(3.8.0, from_ubjson(ptr, ptr + len))\n    static basic_json from_ubjson(detail::span_input_adapter&& i,\n                                  const bool strict = true,\n                                  const bool allow_exceptions = true)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = i.get();\n        // NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::ubjson, &sdp, strict);\n        return res ? 
result : basic_json(value_t::discarded);\n    }\n\n\n    /*!\n    @brief create a JSON value from an input in BSON format\n\n    Deserializes a given input @a i to a JSON value using the BSON (Binary JSON)\n    serialization format.\n\n    The library maps BSON record types to JSON value types as follows:\n\n    BSON type       | BSON marker byte | JSON value type\n    --------------- | ---------------- | ---------------------------\n    double          | 0x01             | number_float\n    string          | 0x02             | string\n    document        | 0x03             | object\n    array           | 0x04             | array\n    binary          | 0x05             | binary\n    undefined       | 0x06             | still unsupported\n    ObjectId        | 0x07             | still unsupported\n    boolean         | 0x08             | boolean\n    UTC Date-Time   | 0x09             | still unsupported\n    null            | 0x0A             | null\n    Regular Expr.   | 0x0B             | still unsupported\n    DB Pointer      | 0x0C             | still unsupported\n    JavaScript Code | 0x0D             | still unsupported\n    Symbol          | 0x0E             | still unsupported\n    JavaScript Code w/ Scope | 0x0F    | still unsupported\n    int32           | 0x10             | number_integer\n    Timestamp       | 0x11             | still unsupported\n    int64           | 0x12             | number_integer\n    128-bit decimal float | 0x13       | still unsupported\n    Max Key         | 0x7F             | still unsupported\n    Min Key         | 0xFF             | still unsupported\n\n    @warning The mapping is **incomplete**. 
The unsupported mappings\n             are indicated in the table above.\n\n    @param[in] i  an input in BSON format convertible to an input adapter\n    @param[in] strict  whether to expect the input to be consumed until EOF\n                       (true by default)\n    @param[in] allow_exceptions  whether to throw exceptions in case of a\n    parse error (optional, true by default)\n\n    @return deserialized JSON value; in case of a parse error and\n            @a allow_exceptions set to `false`, the return value will be\n            value_t::discarded.\n\n    @throw parse_error.114 if an unsupported BSON record type is encountered\n\n    @complexity Linear in the size of the input @a i.\n\n    @liveexample{The example shows the deserialization of a byte vector in\n    BSON format to a JSON value.,from_bson}\n\n    @sa http://bsonspec.org/spec.html\n    @sa see @ref to_bson(const basic_json&) for the analogous serialization\n    @sa see @ref from_cbor(InputType&&, const bool, const bool, const cbor_tag_handler_t) for the\n        related CBOR format\n    @sa see @ref from_msgpack(InputType&&, const bool, const bool) for\n        the related MessagePack format\n    @sa see @ref from_ubjson(InputType&&, const bool, const bool) for the\n        related UBJSON format\n    */\n    template<typename InputType>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json from_bson(InputType&& i,\n                                const bool strict = true,\n                                const bool allow_exceptions = true)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = detail::input_adapter(std::forward<InputType>(i));\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::bson, &sdp, strict);\n        return res ? 
result : basic_json(value_t::discarded);\n    }\n\n    /*!\n    @copydoc from_bson(InputType&&, const bool, const bool)\n    */\n    template<typename IteratorType>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json from_bson(IteratorType first, IteratorType last,\n                                const bool strict = true,\n                                const bool allow_exceptions = true)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = detail::input_adapter(std::move(first), std::move(last));\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::bson, &sdp, strict);\n        return res ? result : basic_json(value_t::discarded);\n    }\n\n    template<typename T>\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    JSON_HEDLEY_DEPRECATED_FOR(3.8.0, from_bson(ptr, ptr + len))\n    static basic_json from_bson(const T* ptr, std::size_t len,\n                                const bool strict = true,\n                                const bool allow_exceptions = true)\n    {\n        return from_bson(ptr, ptr + len, strict, allow_exceptions);\n    }\n\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    JSON_HEDLEY_DEPRECATED_FOR(3.8.0, from_bson(ptr, ptr + len))\n    static basic_json from_bson(detail::span_input_adapter&& i,\n                                const bool strict = true,\n                                const bool allow_exceptions = true)\n    {\n        basic_json result;\n        detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);\n        auto ia = i.get();\n        // NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)\n        const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::bson, &sdp, strict);\n        return res ? 
result : basic_json(value_t::discarded);\n    }\n    /// @}\n\n    //////////////////////////\n    // JSON Pointer support //\n    //////////////////////////\n\n    /// @name JSON Pointer functions\n    /// @{\n\n    /*!\n    @brief access specified element via JSON Pointer\n\n    Uses a JSON pointer to retrieve a reference to the respective JSON value.\n    No bounds checking is performed. Similar to @ref operator[](const typename\n    object_t::key_type&), `null` values are created in arrays and objects if\n    necessary.\n\n    In particular:\n    - If the JSON pointer points to an object key that does not exist, it\n      is created and filled with a `null` value before a reference to it\n      is returned.\n    - If the JSON pointer points to an array index that does not exist, it\n      is created and filled with a `null` value before a reference to it\n      is returned. All indices between the current maximum and the given\n      index are also filled with `null`.\n    - The special value `-` is treated as a synonym for the index past the\n      end.\n\n    @param[in] ptr  a JSON pointer\n\n    @return reference to the element pointed to by @a ptr\n\n    @complexity Constant.\n\n    @throw parse_error.106   if an array index begins with '0'\n    @throw parse_error.109   if an array index was not a number\n    @throw out_of_range.404  if the JSON pointer cannot be resolved\n\n    @liveexample{The behavior is shown in the example.,operatorjson_pointer}\n\n    @since version 2.0.0\n    */\n    reference operator[](const json_pointer& ptr)\n    {\n        return ptr.get_unchecked(this);\n    }\n\n    /*!\n    @brief access specified element via JSON Pointer\n\n    Uses a JSON pointer to retrieve a reference to the respective JSON value.\n    No bounds checking is performed. The function does not change the JSON\n    value; no `null` values are created. 
In particular, the special value\n    `-` yields an exception.\n\n    @param[in] ptr  JSON pointer to the desired element\n\n    @return const reference to the element pointed to by @a ptr\n\n    @complexity Constant.\n\n    @throw parse_error.106   if an array index begins with '0'\n    @throw parse_error.109   if an array index was not a number\n    @throw out_of_range.402  if the array index '-' is used\n    @throw out_of_range.404  if the JSON pointer cannot be resolved\n\n    @liveexample{The behavior is shown in the example.,operatorjson_pointer_const}\n\n    @since version 2.0.0\n    */\n    const_reference operator[](const json_pointer& ptr) const\n    {\n        return ptr.get_unchecked(this);\n    }\n\n    /*!\n    @brief access specified element via JSON Pointer\n\n    Returns a reference to the element at the specified JSON pointer @a ptr,\n    with bounds checking.\n\n    @param[in] ptr  JSON pointer to the desired element\n\n    @return reference to the element pointed to by @a ptr\n\n    @throw parse_error.106 if an array index in the passed JSON pointer @a ptr\n    begins with '0'. See example below.\n\n    @throw parse_error.109 if an array index in the passed JSON pointer @a ptr\n    is not a number. See example below.\n\n    @throw out_of_range.401 if an array index in the passed JSON pointer @a ptr\n    is out of range. See example below.\n\n    @throw out_of_range.402 if the array index '-' is used in the passed JSON\n    pointer @a ptr. As `at` provides checked access (and no elements are\n    implicitly inserted), the index '-' is always invalid. See example below.\n\n    @throw out_of_range.403 if the JSON pointer describes a key of an object\n    which cannot be found. 
See example below.\n\n    @throw out_of_range.404 if the JSON pointer @a ptr cannot be resolved.\n    See example below.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes in the JSON value.\n\n    @complexity Constant.\n\n    @since version 2.0.0\n\n    @liveexample{The behavior is shown in the example.,at_json_pointer}\n    */\n    reference at(const json_pointer& ptr)\n    {\n        return ptr.get_checked(this);\n    }\n\n    /*!\n    @brief access specified element via JSON Pointer\n\n    Returns a const reference to the element at the specified JSON pointer @a\n    ptr, with bounds checking.\n\n    @param[in] ptr  JSON pointer to the desired element\n\n    @return const reference to the element pointed to by @a ptr\n\n    @throw parse_error.106 if an array index in the passed JSON pointer @a ptr\n    begins with '0'. See example below.\n\n    @throw parse_error.109 if an array index in the passed JSON pointer @a ptr\n    is not a number. See example below.\n\n    @throw out_of_range.401 if an array index in the passed JSON pointer @a ptr\n    is out of range. See example below.\n\n    @throw out_of_range.402 if the array index '-' is used in the passed JSON\n    pointer @a ptr. As `at` provides checked access (and no elements are\n    implicitly inserted), the index '-' is always invalid. See example below.\n\n    @throw out_of_range.403 if the JSON pointer describes a key of an object\n    which cannot be found. 
See example below.\n\n    @throw out_of_range.404 if the JSON pointer @a ptr cannot be resolved.\n    See example below.\n\n    @exceptionsafety Strong guarantee: if an exception is thrown, there are no\n    changes in the JSON value.\n\n    @complexity Constant.\n\n    @since version 2.0.0\n\n    @liveexample{The behavior is shown in the example.,at_json_pointer_const}\n    */\n    const_reference at(const json_pointer& ptr) const\n    {\n        return ptr.get_checked(this);\n    }\n\n    /*!\n    @brief return flattened JSON value\n\n    The function creates a JSON object whose keys are JSON pointers (see [RFC\n    6901](https://tools.ietf.org/html/rfc6901)) and whose values are all\n    primitive. The original JSON value can be restored using the @ref\n    unflatten() function.\n\n    @return an object that maps JSON pointers to primitive values\n\n    @note Empty objects and arrays are flattened to `null` and will not be\n          reconstructed correctly by the @ref unflatten() function.\n\n    @complexity Linear in the size of the JSON value.\n\n    @liveexample{The following code shows how a JSON object is flattened to an\n    object whose keys consist of JSON pointers.,flatten}\n\n    @sa see @ref unflatten() for the reverse function\n\n    @since version 2.0.0\n    */\n    basic_json flatten() const\n    {\n        basic_json result(value_t::object);\n        json_pointer::flatten(\"\", *this, result);\n        return result;\n    }\n\n    /*!\n    @brief unflatten a previously flattened JSON value\n\n    The function restores the arbitrary nesting of a JSON value that has been\n    flattened before using the @ref flatten() function. The JSON value must\n    meet certain constraints:\n    1. The value must be an object.\n    2. The keys must be JSON pointers (see\n       [RFC 6901](https://tools.ietf.org/html/rfc6901))\n    3. 
The mapped values must be primitive JSON types.\n\n    @return the original JSON from a flattened version\n\n    @note Empty objects and arrays are flattened by @ref flatten() to `null`\n          values and cannot be unflattened to their original type. Apart from\n          this case, for a JSON value `j`, the following is always true:\n          `j == j.flatten().unflatten()`.\n\n    @complexity Linear in the size of the JSON value.\n\n    @throw type_error.314  if value is not an object\n    @throw type_error.315  if object values are not primitive\n\n    @liveexample{The following code shows how a flattened JSON object is\n    unflattened into the original nested JSON object.,unflatten}\n\n    @sa see @ref flatten() for the reverse function\n\n    @since version 2.0.0\n    */\n    basic_json unflatten() const\n    {\n        return json_pointer::unflatten(*this);\n    }\n\n    /// @}\n\n    //////////////////////////\n    // JSON Patch functions //\n    //////////////////////////\n\n    /// @name JSON Patch functions\n    /// @{\n\n    /*!\n    @brief applies a JSON patch\n\n    [JSON Patch](http://jsonpatch.com) defines a JSON document structure for\n    expressing a sequence of operations to apply to a JSON document. With\n    this function, a JSON Patch is applied to the current JSON value by\n    executing all operations from the patch.\n\n    @param[in] json_patch  JSON patch document\n    @return patched document\n\n    @note The application of a patch is atomic: Either all operations succeed\n          and the patched document is returned or an exception is thrown. 
In\n          any case, the original value is not changed: the patch is applied\n          to a copy of the value.\n\n    @throw parse_error.104 if the JSON patch does not consist of an array of\n    objects\n\n    @throw parse_error.105 if the JSON patch is malformed (e.g., mandatory\n    attributes are missing); example: `\"operation add must have member path\"`\n\n    @throw out_of_range.401 if an array index is out of range.\n\n    @throw out_of_range.403 if a JSON pointer inside the patch could not be\n    resolved successfully in the current JSON value; example: `\"key baz not\n    found\"`\n\n    @throw out_of_range.405 if JSON pointer has no parent (\"add\", \"remove\",\n    \"move\")\n\n    @throw other_error.501 if \"test\" operation was unsuccessful\n\n    @complexity Linear in the size of the JSON value and the length of the\n    JSON patch. As usually only a fraction of the JSON value is affected by\n    the patch, the complexity can usually be neglected.\n\n    @liveexample{The following code shows how a JSON patch is applied to a\n    value.,patch}\n\n    @sa see @ref diff -- create a JSON patch by comparing two JSON values\n\n    @sa [RFC 6902 (JSON Patch)](https://tools.ietf.org/html/rfc6902)\n    @sa [RFC 6901 (JSON Pointer)](https://tools.ietf.org/html/rfc6901)\n\n    @since version 2.0.0\n    */\n    basic_json patch(const basic_json& json_patch) const\n    {\n        // make a working copy to apply the patch to\n        basic_json result = *this;\n\n        // the valid JSON Patch operations\n        enum class patch_operations {add, remove, replace, move, copy, test, invalid};\n\n        const auto get_op = [](const std::string & op)\n        {\n            if (op == \"add\")\n            {\n                return patch_operations::add;\n            }\n            if (op == \"remove\")\n            {\n                return patch_operations::remove;\n            }\n            if (op == \"replace\")\n            {\n                return 
patch_operations::replace;\n            }\n            if (op == \"move\")\n            {\n                return patch_operations::move;\n            }\n            if (op == \"copy\")\n            {\n                return patch_operations::copy;\n            }\n            if (op == \"test\")\n            {\n                return patch_operations::test;\n            }\n\n            return patch_operations::invalid;\n        };\n\n        // wrapper for \"add\" operation; add value at ptr\n        const auto operation_add = [&result](json_pointer & ptr, basic_json val)\n        {\n            // adding to the root of the target document means replacing it\n            if (ptr.empty())\n            {\n                result = val;\n                return;\n            }\n\n            // make sure the top element of the pointer exists\n            json_pointer top_pointer = ptr.top();\n            if (top_pointer != ptr)\n            {\n                result.at(top_pointer);\n            }\n\n            // get reference to parent of JSON pointer ptr\n            const auto last_path = ptr.back();\n            ptr.pop_back();\n            basic_json& parent = result[ptr];\n\n            switch (parent.m_type)\n            {\n                case value_t::null:\n                case value_t::object:\n                {\n                    // use operator[] to add value\n                    parent[last_path] = val;\n                    break;\n                }\n\n                case value_t::array:\n                {\n                    if (last_path == \"-\")\n                    {\n                        // special case: append to back\n                        parent.push_back(val);\n                    }\n                    else\n                    {\n                        const auto idx = json_pointer::array_index(last_path);\n                        if (JSON_HEDLEY_UNLIKELY(idx > parent.size()))\n                        {\n                            
// avoid undefined behavior\n                            JSON_THROW(out_of_range::create(401, \"array index \" + std::to_string(idx) + \" is out of range\", parent));\n                        }\n\n                        // default case: insert at offset\n                        parent.insert(parent.begin() + static_cast<difference_type>(idx), val);\n                    }\n                    break;\n                }\n\n                // if there exists a parent it cannot be primitive\n                default:            // LCOV_EXCL_LINE\n                    JSON_ASSERT(false); // NOLINT(cert-dcl03-c,hicpp-static-assert,misc-static-assert) LCOV_EXCL_LINE\n            }\n        };\n\n        // wrapper for \"remove\" operation; remove value at ptr\n        const auto operation_remove = [this, &result](json_pointer & ptr)\n        {\n            // get reference to parent of JSON pointer ptr\n            const auto last_path = ptr.back();\n            ptr.pop_back();\n            basic_json& parent = result.at(ptr);\n\n            // remove child\n            if (parent.is_object())\n            {\n                // check that the key exists\n                auto it = parent.find(last_path);\n                if (JSON_HEDLEY_LIKELY(it != parent.end()))\n                {\n                    parent.erase(it);\n                }\n                else\n                {\n                    JSON_THROW(out_of_range::create(403, \"key '\" + last_path + \"' not found\", *this));\n                }\n            }\n            else if (parent.is_array())\n            {\n                // note erase performs range check\n                parent.erase(json_pointer::array_index(last_path));\n            }\n        };\n\n        // type check: top level value must be an array\n        if (JSON_HEDLEY_UNLIKELY(!json_patch.is_array()))\n        {\n            JSON_THROW(parse_error::create(104, 0, \"JSON patch must be an array of objects\", json_patch));\n        }\n\n        // 
iterate and apply the operations\n        for (const auto& val : json_patch)\n        {\n            // wrapper to get a value for an operation\n            const auto get_value = [&val](const std::string & op,\n                                          const std::string & member,\n                                          bool string_type) -> basic_json &\n            {\n                // find value\n                auto it = val.m_value.object->find(member);\n\n                // context-sensitive error message\n                const auto error_msg = (op == \"op\") ? \"operation\" : \"operation '\" + op + \"'\";\n\n                // check if desired value is present\n                if (JSON_HEDLEY_UNLIKELY(it == val.m_value.object->end()))\n                {\n                    // NOLINTNEXTLINE(performance-inefficient-string-concatenation)\n                    JSON_THROW(parse_error::create(105, 0, error_msg + \" must have member '\" + member + \"'\", val));\n                }\n\n                // check if result is of type string\n                if (JSON_HEDLEY_UNLIKELY(string_type && !it->second.is_string()))\n                {\n                    // NOLINTNEXTLINE(performance-inefficient-string-concatenation)\n                    JSON_THROW(parse_error::create(105, 0, error_msg + \" must have string member '\" + member + \"'\", val));\n                }\n\n                // no error: return value\n                return it->second;\n            };\n\n            // type check: every element of the array must be an object\n            if (JSON_HEDLEY_UNLIKELY(!val.is_object()))\n            {\n                JSON_THROW(parse_error::create(104, 0, \"JSON patch must be an array of objects\", val));\n            }\n\n            // collect mandatory members\n            const auto op = get_value(\"op\", \"op\", true).template get<std::string>();\n            const auto path = get_value(op, \"path\", true).template get<std::string>();\n            
json_pointer ptr(path);\n\n            switch (get_op(op))\n            {\n                case patch_operations::add:\n                {\n                    operation_add(ptr, get_value(\"add\", \"value\", false));\n                    break;\n                }\n\n                case patch_operations::remove:\n                {\n                    operation_remove(ptr);\n                    break;\n                }\n\n                case patch_operations::replace:\n                {\n                    // the \"path\" location must exist - use at()\n                    result.at(ptr) = get_value(\"replace\", \"value\", false);\n                    break;\n                }\n\n                case patch_operations::move:\n                {\n                    const auto from_path = get_value(\"move\", \"from\", true).template get<std::string>();\n                    json_pointer from_ptr(from_path);\n\n                    // the \"from\" location must exist - use at()\n                    basic_json v = result.at(from_ptr);\n\n                    // The move operation is functionally identical to a\n                    // \"remove\" operation on the \"from\" location, followed\n                    // immediately by an \"add\" operation at the target\n                    // location with the value that was just removed.\n                    operation_remove(from_ptr);\n                    operation_add(ptr, v);\n                    break;\n                }\n\n                case patch_operations::copy:\n                {\n                    const auto from_path = get_value(\"copy\", \"from\", true).template get<std::string>();\n                    const json_pointer from_ptr(from_path);\n\n                    // the \"from\" location must exist - use at()\n                    basic_json v = result.at(from_ptr);\n\n                    // The copy is functionally identical to an \"add\"\n                    // operation at the target location using the 
value\n                    // specified in the \"from\" member.\n                    operation_add(ptr, v);\n                    break;\n                }\n\n                case patch_operations::test:\n                {\n                    bool success = false;\n                    JSON_TRY\n                    {\n                        // check if \"value\" matches the one at \"path\"\n                        // the \"path\" location must exist - use at()\n                        success = (result.at(ptr) == get_value(\"test\", \"value\", false));\n                    }\n                    JSON_INTERNAL_CATCH (out_of_range&)\n                    {\n                        // ignore out of range errors: success remains false\n                    }\n\n                    // throw an exception if test fails\n                    if (JSON_HEDLEY_UNLIKELY(!success))\n                    {\n                        JSON_THROW(other_error::create(501, \"unsuccessful: \" + val.dump(), val));\n                    }\n\n                    break;\n                }\n\n                default:\n                {\n                    // op must be \"add\", \"remove\", \"replace\", \"move\", \"copy\", or\n                    // \"test\"\n                    JSON_THROW(parse_error::create(105, 0, \"operation value '\" + op + \"' is invalid\", val));\n                }\n            }\n        }\n\n        return result;\n    }\n\n    /*!\n    @brief creates a diff as a JSON patch\n\n    Creates a [JSON Patch](http://jsonpatch.com) so that value @a source can\n    be changed into the value @a target by calling @ref patch function.\n\n    @invariant For two JSON values @a source and @a target, the following code\n    yields always `true`:\n    @code {.cpp}\n    source.patch(diff(source, target)) == target;\n    @endcode\n\n    @note Currently, only `remove`, `add`, and `replace` operations are\n          generated.\n\n    @param[in] source  JSON value to compare from\n    
@param[in] target  JSON value to compare against\n    @param[in] path    helper value to create JSON pointers\n\n    @return a JSON patch to convert the @a source to @a target\n\n    @complexity Linear in the lengths of @a source and @a target.\n\n    @liveexample{The following code shows how a JSON patch is created as a\n    diff for two JSON values.,diff}\n\n    @sa see @ref patch -- apply a JSON patch\n    @sa see @ref merge_patch -- apply a JSON Merge Patch\n\n    @sa [RFC 6902 (JSON Patch)](https://tools.ietf.org/html/rfc6902)\n\n    @since version 2.0.0\n    */\n    JSON_HEDLEY_WARN_UNUSED_RESULT\n    static basic_json diff(const basic_json& source, const basic_json& target,\n                           const std::string& path = \"\")\n    {\n        // the patch\n        basic_json result(value_t::array);\n\n        // if the values are the same, return empty patch\n        if (source == target)\n        {\n            return result;\n        }\n\n        if (source.type() != target.type())\n        {\n            // different types: replace value\n            result.push_back(\n            {\n                {\"op\", \"replace\"}, {\"path\", path}, {\"value\", target}\n            });\n            return result;\n        }\n\n        switch (source.type())\n        {\n            case value_t::array:\n            {\n                // first pass: traverse common elements\n                std::size_t i = 0;\n                while (i < source.size() && i < target.size())\n                {\n                    // recursive call to compare array values at index i\n                    auto temp_diff = diff(source[i], target[i], path + \"/\" + std::to_string(i));\n                    result.insert(result.end(), temp_diff.begin(), temp_diff.end());\n                    ++i;\n                }\n\n                // i now reached the end of at least one array\n                // in a second pass, traverse the remaining elements\n\n                // remove my 
remaining elements\n                const auto end_index = static_cast<difference_type>(result.size());\n                while (i < source.size())\n                {\n                    // add operations in reverse order to avoid invalid\n                    // indices\n                    result.insert(result.begin() + end_index, object(\n                    {\n                        {\"op\", \"remove\"},\n                        {\"path\", path + \"/\" + std::to_string(i)}\n                    }));\n                    ++i;\n                }\n\n                // add other remaining elements\n                while (i < target.size())\n                {\n                    result.push_back(\n                    {\n                        {\"op\", \"add\"},\n                        {\"path\", path + \"/-\"},\n                        {\"value\", target[i]}\n                    });\n                    ++i;\n                }\n\n                break;\n            }\n\n            case value_t::object:\n            {\n                // first pass: traverse this object's elements\n                for (auto it = source.cbegin(); it != source.cend(); ++it)\n                {\n                    // escape the key name to be used in a JSON patch\n                    const auto path_key = path + \"/\" + detail::escape(it.key());\n\n                    if (target.find(it.key()) != target.end())\n                    {\n                        // recursive call to compare object values at key it\n                        auto temp_diff = diff(it.value(), target[it.key()], path_key);\n                        result.insert(result.end(), temp_diff.begin(), temp_diff.end());\n                    }\n                    else\n                    {\n                        // found a key that is not in o -> remove it\n                        result.push_back(object(\n                        {\n                            {\"op\", \"remove\"}, {\"path\", path_key}\n              
          }));\n                    }\n                }\n\n                // second pass: traverse other object's elements\n                for (auto it = target.cbegin(); it != target.cend(); ++it)\n                {\n                    if (source.find(it.key()) == source.end())\n                    {\n                        // found a key that is not in this -> add it\n                        const auto path_key = path + \"/\" + detail::escape(it.key());\n                        result.push_back(\n                        {\n                            {\"op\", \"add\"}, {\"path\", path_key},\n                            {\"value\", it.value()}\n                        });\n                    }\n                }\n\n                break;\n            }\n\n            default:\n            {\n                // both primitive type: replace value\n                result.push_back(\n                {\n                    {\"op\", \"replace\"}, {\"path\", path}, {\"value\", target}\n                });\n                break;\n            }\n        }\n\n        return result;\n    }\n\n    /// @}\n\n    ////////////////////////////////\n    // JSON Merge Patch functions //\n    ////////////////////////////////\n\n    /// @name JSON Merge Patch functions\n    /// @{\n\n    /*!\n    @brief applies a JSON Merge Patch\n\n    The merge patch format is primarily intended for use with the HTTP PATCH\n    method as a means of describing a set of modifications to a target\n    resource's content. 
This function applies a merge patch to the current\n    JSON value.\n\n    The function implements the following algorithm from Section 2 of\n    [RFC 7396 (JSON Merge Patch)](https://tools.ietf.org/html/rfc7396):\n\n    ```\n    define MergePatch(Target, Patch):\n      if Patch is an Object:\n        if Target is not an Object:\n          Target = {} // Ignore the contents and set it to an empty Object\n        for each Name/Value pair in Patch:\n          if Value is null:\n            if Name exists in Target:\n              remove the Name/Value pair from Target\n          else:\n            Target[Name] = MergePatch(Target[Name], Value)\n        return Target\n      else:\n        return Patch\n    ```\n\n    Thereby, `Target` is the current object; that is, the patch is applied to\n    the current value.\n\n    @param[in] apply_patch  the patch to apply\n\n    @complexity Linear in the lengths of @a patch.\n\n    @liveexample{The following code shows how a JSON Merge Patch is applied to\n    a JSON document.,merge_patch}\n\n    @sa see @ref patch -- apply a JSON patch\n    @sa [RFC 7396 (JSON Merge Patch)](https://tools.ietf.org/html/rfc7396)\n\n    @since version 3.0.0\n    */\n    void merge_patch(const basic_json& apply_patch)\n    {\n        if (apply_patch.is_object())\n        {\n            if (!is_object())\n            {\n                *this = object();\n            }\n            for (auto it = apply_patch.begin(); it != apply_patch.end(); ++it)\n            {\n                if (it.value().is_null())\n                {\n                    erase(it.key());\n                }\n                else\n                {\n                    operator[](it.key()).merge_patch(it.value());\n                }\n            }\n        }\n        else\n        {\n            *this = apply_patch;\n        }\n    }\n\n    /// @}\n};\n\n/*!\n@brief user-defined to_string function for JSON values\n\nThis function implements a user-defined to_string  for JSON 
objects.\n\n@param[in] j  a JSON object\n@return a std::string object\n*/\n\nNLOHMANN_BASIC_JSON_TPL_DECLARATION\nstd::string to_string(const NLOHMANN_BASIC_JSON_TPL& j)\n{\n    return j.dump();\n}\n} // namespace nlohmann\n\n///////////////////////\n// nonmember support //\n///////////////////////\n\n// specialization of std::swap, and std::hash\nnamespace std\n{\n\n/// hash value for JSON objects\ntemplate<>\nstruct hash<nlohmann::json>\n{\n    /*!\n    @brief return a hash value for a JSON object\n\n    @since version 1.0.0\n    */\n    std::size_t operator()(const nlohmann::json& j) const\n    {\n        return nlohmann::detail::hash(j);\n    }\n};\n\n/// specialization for std::less<value_t>\n/// @note: do not remove the space after '<',\n///        see https://github.com/nlohmann/json/pull/679\ntemplate<>\nstruct less<::nlohmann::detail::value_t>\n{\n    /*!\n    @brief compare two value_t enum values\n    @since version 3.0.0\n    */\n    bool operator()(nlohmann::detail::value_t lhs,\n                    nlohmann::detail::value_t rhs) const noexcept\n    {\n        return nlohmann::detail::operator<(lhs, rhs);\n    }\n};\n\n// C++20 prohibit function specialization in the std namespace.\n#ifndef JSON_HAS_CPP_20\n\n/*!\n@brief exchanges the values of two JSON objects\n\n@since version 1.0.0\n*/\ntemplate<>\ninline void swap<nlohmann::json>(nlohmann::json& j1, nlohmann::json& j2) noexcept( // NOLINT(readability-inconsistent-declaration-parameter-name)\n    is_nothrow_move_constructible<nlohmann::json>::value&&  // NOLINT(misc-redundant-expression)\n    is_nothrow_move_assignable<nlohmann::json>::value\n                              )\n{\n    j1.swap(j2);\n}\n\n#endif\n\n} // namespace std\n\n/*!\n@brief user-defined string literal for JSON values\n\nThis operator implements a user-defined string literal for JSON objects. 
It\ncan be used by adding `\"_json\"` to a string literal and returns a JSON object\nif no parse error occurred.\n\n@param[in] s  a string representation of a JSON object\n@param[in] n  the length of string @a s\n@return a JSON object\n\n@since version 1.0.0\n*/\nJSON_HEDLEY_NON_NULL(1)\ninline nlohmann::json operator \"\" _json(const char* s, std::size_t n)\n{\n    return nlohmann::json::parse(s, s + n);\n}\n\n/*!\n@brief user-defined string literal for JSON pointer\n\nThis operator implements a user-defined string literal for JSON Pointers. It\ncan be used by adding `\"_json_pointer\"` to a string literal and returns a JSON pointer\nobject if no parse error occurred.\n\n@param[in] s  a string representation of a JSON Pointer\n@param[in] n  the length of string @a s\n@return a JSON pointer object\n\n@since version 2.0.0\n*/\nJSON_HEDLEY_NON_NULL(1)\ninline nlohmann::json::json_pointer operator \"\" _json_pointer(const char* s, std::size_t n)\n{\n    return nlohmann::json::json_pointer(std::string(s, n));\n}\n\n// #include <nlohmann/detail/macro_unscope.hpp>\n\n\n// restore GCC/clang diagnostic settings\n#if defined(__clang__)\n    #pragma GCC diagnostic pop\n#endif\n\n// clean up\n#undef JSON_ASSERT\n#undef JSON_INTERNAL_CATCH\n#undef JSON_CATCH\n#undef JSON_THROW\n#undef JSON_TRY\n#undef JSON_PRIVATE_UNLESS_TESTED\n#undef JSON_HAS_CPP_14\n#undef JSON_HAS_CPP_17\n#undef NLOHMANN_BASIC_JSON_TPL_DECLARATION\n#undef NLOHMANN_BASIC_JSON_TPL\n#undef JSON_EXPLICIT\n\n// #include <nlohmann/thirdparty/hedley/hedley_undef.hpp>\n\n\n#undef JSON_HEDLEY_ALWAYS_INLINE\n#undef JSON_HEDLEY_ARM_VERSION\n#undef JSON_HEDLEY_ARM_VERSION_CHECK\n#undef JSON_HEDLEY_ARRAY_PARAM\n#undef JSON_HEDLEY_ASSUME\n#undef JSON_HEDLEY_BEGIN_C_DECLS\n#undef JSON_HEDLEY_CLANG_HAS_ATTRIBUTE\n#undef JSON_HEDLEY_CLANG_HAS_BUILTIN\n#undef JSON_HEDLEY_CLANG_HAS_CPP_ATTRIBUTE\n#undef JSON_HEDLEY_CLANG_HAS_DECLSPEC_DECLSPEC_ATTRIBUTE\n#undef JSON_HEDLEY_CLANG_HAS_EXTENSION\n#undef 
JSON_HEDLEY_CLANG_HAS_FEATURE\n#undef JSON_HEDLEY_CLANG_HAS_WARNING\n#undef JSON_HEDLEY_COMPCERT_VERSION\n#undef JSON_HEDLEY_COMPCERT_VERSION_CHECK\n#undef JSON_HEDLEY_CONCAT\n#undef JSON_HEDLEY_CONCAT3\n#undef JSON_HEDLEY_CONCAT3_EX\n#undef JSON_HEDLEY_CONCAT_EX\n#undef JSON_HEDLEY_CONST\n#undef JSON_HEDLEY_CONSTEXPR\n#undef JSON_HEDLEY_CONST_CAST\n#undef JSON_HEDLEY_CPP_CAST\n#undef JSON_HEDLEY_CRAY_VERSION\n#undef JSON_HEDLEY_CRAY_VERSION_CHECK\n#undef JSON_HEDLEY_C_DECL\n#undef JSON_HEDLEY_DEPRECATED\n#undef JSON_HEDLEY_DEPRECATED_FOR\n#undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_CAST_QUAL\n#undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_CPP98_COMPAT_WRAP_\n#undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_DEPRECATED\n#undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_CPP_ATTRIBUTES\n#undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNKNOWN_PRAGMAS\n#undef JSON_HEDLEY_DIAGNOSTIC_DISABLE_UNUSED_FUNCTION\n#undef JSON_HEDLEY_DIAGNOSTIC_POP\n#undef JSON_HEDLEY_DIAGNOSTIC_PUSH\n#undef JSON_HEDLEY_DMC_VERSION\n#undef JSON_HEDLEY_DMC_VERSION_CHECK\n#undef JSON_HEDLEY_EMPTY_BASES\n#undef JSON_HEDLEY_EMSCRIPTEN_VERSION\n#undef JSON_HEDLEY_EMSCRIPTEN_VERSION_CHECK\n#undef JSON_HEDLEY_END_C_DECLS\n#undef JSON_HEDLEY_FLAGS\n#undef JSON_HEDLEY_FLAGS_CAST\n#undef JSON_HEDLEY_GCC_HAS_ATTRIBUTE\n#undef JSON_HEDLEY_GCC_HAS_BUILTIN\n#undef JSON_HEDLEY_GCC_HAS_CPP_ATTRIBUTE\n#undef JSON_HEDLEY_GCC_HAS_DECLSPEC_ATTRIBUTE\n#undef JSON_HEDLEY_GCC_HAS_EXTENSION\n#undef JSON_HEDLEY_GCC_HAS_FEATURE\n#undef JSON_HEDLEY_GCC_HAS_WARNING\n#undef JSON_HEDLEY_GCC_NOT_CLANG_VERSION_CHECK\n#undef JSON_HEDLEY_GCC_VERSION\n#undef JSON_HEDLEY_GCC_VERSION_CHECK\n#undef JSON_HEDLEY_GNUC_HAS_ATTRIBUTE\n#undef JSON_HEDLEY_GNUC_HAS_BUILTIN\n#undef JSON_HEDLEY_GNUC_HAS_CPP_ATTRIBUTE\n#undef JSON_HEDLEY_GNUC_HAS_DECLSPEC_ATTRIBUTE\n#undef JSON_HEDLEY_GNUC_HAS_EXTENSION\n#undef JSON_HEDLEY_GNUC_HAS_FEATURE\n#undef JSON_HEDLEY_GNUC_HAS_WARNING\n#undef JSON_HEDLEY_GNUC_VERSION\n#undef JSON_HEDLEY_GNUC_VERSION_CHECK\n#undef 
JSON_HEDLEY_HAS_ATTRIBUTE\n#undef JSON_HEDLEY_HAS_BUILTIN\n#undef JSON_HEDLEY_HAS_CPP_ATTRIBUTE\n#undef JSON_HEDLEY_HAS_CPP_ATTRIBUTE_NS\n#undef JSON_HEDLEY_HAS_DECLSPEC_ATTRIBUTE\n#undef JSON_HEDLEY_HAS_EXTENSION\n#undef JSON_HEDLEY_HAS_FEATURE\n#undef JSON_HEDLEY_HAS_WARNING\n#undef JSON_HEDLEY_IAR_VERSION\n#undef JSON_HEDLEY_IAR_VERSION_CHECK\n#undef JSON_HEDLEY_IBM_VERSION\n#undef JSON_HEDLEY_IBM_VERSION_CHECK\n#undef JSON_HEDLEY_IMPORT\n#undef JSON_HEDLEY_INLINE\n#undef JSON_HEDLEY_INTEL_CL_VERSION\n#undef JSON_HEDLEY_INTEL_CL_VERSION_CHECK\n#undef JSON_HEDLEY_INTEL_VERSION\n#undef JSON_HEDLEY_INTEL_VERSION_CHECK\n#undef JSON_HEDLEY_IS_CONSTANT\n#undef JSON_HEDLEY_IS_CONSTEXPR_\n#undef JSON_HEDLEY_LIKELY\n#undef JSON_HEDLEY_MALLOC\n#undef JSON_HEDLEY_MCST_LCC_VERSION\n#undef JSON_HEDLEY_MCST_LCC_VERSION_CHECK\n#undef JSON_HEDLEY_MESSAGE\n#undef JSON_HEDLEY_MSVC_VERSION\n#undef JSON_HEDLEY_MSVC_VERSION_CHECK\n#undef JSON_HEDLEY_NEVER_INLINE\n#undef JSON_HEDLEY_NON_NULL\n#undef JSON_HEDLEY_NO_ESCAPE\n#undef JSON_HEDLEY_NO_RETURN\n#undef JSON_HEDLEY_NO_THROW\n#undef JSON_HEDLEY_NULL\n#undef JSON_HEDLEY_PELLES_VERSION\n#undef JSON_HEDLEY_PELLES_VERSION_CHECK\n#undef JSON_HEDLEY_PGI_VERSION\n#undef JSON_HEDLEY_PGI_VERSION_CHECK\n#undef JSON_HEDLEY_PREDICT\n#undef JSON_HEDLEY_PRINTF_FORMAT\n#undef JSON_HEDLEY_PRIVATE\n#undef JSON_HEDLEY_PUBLIC\n#undef JSON_HEDLEY_PURE\n#undef JSON_HEDLEY_REINTERPRET_CAST\n#undef JSON_HEDLEY_REQUIRE\n#undef JSON_HEDLEY_REQUIRE_CONSTEXPR\n#undef JSON_HEDLEY_REQUIRE_MSG\n#undef JSON_HEDLEY_RESTRICT\n#undef JSON_HEDLEY_RETURNS_NON_NULL\n#undef JSON_HEDLEY_SENTINEL\n#undef JSON_HEDLEY_STATIC_ASSERT\n#undef JSON_HEDLEY_STATIC_CAST\n#undef JSON_HEDLEY_STRINGIFY\n#undef JSON_HEDLEY_STRINGIFY_EX\n#undef JSON_HEDLEY_SUNPRO_VERSION\n#undef JSON_HEDLEY_SUNPRO_VERSION_CHECK\n#undef JSON_HEDLEY_TINYC_VERSION\n#undef JSON_HEDLEY_TINYC_VERSION_CHECK\n#undef JSON_HEDLEY_TI_ARMCL_VERSION\n#undef JSON_HEDLEY_TI_ARMCL_VERSION_CHECK\n#undef 
JSON_HEDLEY_TI_CL2000_VERSION\n#undef JSON_HEDLEY_TI_CL2000_VERSION_CHECK\n#undef JSON_HEDLEY_TI_CL430_VERSION\n#undef JSON_HEDLEY_TI_CL430_VERSION_CHECK\n#undef JSON_HEDLEY_TI_CL6X_VERSION\n#undef JSON_HEDLEY_TI_CL6X_VERSION_CHECK\n#undef JSON_HEDLEY_TI_CL7X_VERSION\n#undef JSON_HEDLEY_TI_CL7X_VERSION_CHECK\n#undef JSON_HEDLEY_TI_CLPRU_VERSION\n#undef JSON_HEDLEY_TI_CLPRU_VERSION_CHECK\n#undef JSON_HEDLEY_TI_VERSION\n#undef JSON_HEDLEY_TI_VERSION_CHECK\n#undef JSON_HEDLEY_UNAVAILABLE\n#undef JSON_HEDLEY_UNLIKELY\n#undef JSON_HEDLEY_UNPREDICTABLE\n#undef JSON_HEDLEY_UNREACHABLE\n#undef JSON_HEDLEY_UNREACHABLE_RETURN\n#undef JSON_HEDLEY_VERSION\n#undef JSON_HEDLEY_VERSION_DECODE_MAJOR\n#undef JSON_HEDLEY_VERSION_DECODE_MINOR\n#undef JSON_HEDLEY_VERSION_DECODE_REVISION\n#undef JSON_HEDLEY_VERSION_ENCODE\n#undef JSON_HEDLEY_WARNING\n#undef JSON_HEDLEY_WARN_UNUSED_RESULT\n#undef JSON_HEDLEY_WARN_UNUSED_RESULT_MSG\n#undef JSON_HEDLEY_FALL_THROUGH\n\n\n\n#endif  // INCLUDE_NLOHMANN_JSON_HPP_\n"
  },
  {
    "path": "src/m4/ax_check_opencl.m4",
    "content": "# Check if OpenCL is available and that it supports a CPU device.\n# The check for a CPU device is the same check that is performed\n# by opencl_create_device in ocl_utilities.c\nAC_DEFUN([AX_CHECK_OPENCL], [\n\tAC_SUBST(HAVE_OPENCL)\n\tHAVE_OPENCL=no\n\tAC_CHECK_HEADER([CL/opencl.h], [\n\t\tAC_CHECK_LIB([OpenCL], [clGetPlatformIDs], [\n\t\t\tSAVE_LIBS=$LIBS\n\t\t\tLIBS=\"$LIBS -lOpenCL\"\n\t\t\tAC_MSG_CHECKING([for OpenCL CPU device])\n\t\t\tAC_RUN_IFELSE([AC_LANG_PROGRAM(\n\t\t\t\t[[#include <CL/opencl.h>]], [[\n\tcl_platform_id platform;\n\tcl_device_id dev;\n\n\tif (clGetPlatformIDs(1, &platform, NULL) < 0)\n\t\treturn 1;\n\tif (clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &dev, NULL) < 0)\n\t\treturn 1;\n\t\t\t\t]])], [HAVE_OPENCL=yes])\n\t\t\tAC_MSG_RESULT($HAVE_OPENCL)\n\t\t\tLIBS=$SAVE_LIBS\n\t\t\t])])\n])\n"
  },
  {
    "path": "src/m4/ax_check_openmp.m4",
    "content": "# Check if $CC supports openmp.\nAC_DEFUN([AX_CHECK_OPENMP], [\n\tAC_SUBST(HAVE_OPENMP)\n\tHAVE_OPENMP=no\n\tAC_MSG_CHECKING([for OpenMP support by $CC])\n\techo | $CC -x c - -fsyntax-only -fopenmp -Werror >/dev/null 2>/dev/null\n\tif test $? -eq 0; then\n\t\tHAVE_OPENMP=yes\n\tfi\n\tAC_MSG_RESULT($HAVE_OPENMP)\n\n\tif test $HAVE_OPENMP = yes; then\n\t\tSAVE_CFLAGS=$CFLAGS\n\t\tCFLAGS=\"$CFLAGS -fopenmp\"\n\t\t# Using some version of clang, the value of \"m\" becomes zero\n\t\t# after the parallel for loop.\n\t\tAC_RUN_IFELSE([AC_LANG_PROGRAM([[\n\t\t#include <stdlib.h>\n\n\t\tstatic void f(int m, double A[m])\n\t\t{\n\t\t\t#pragma omp parallel for\n\t\t\tfor (int c0 = 0; c0 < m; c0 += 1)\n\t\t\t\tA[c0] = 0.;\n\t\t\tif (m != 100)\n\t\t\t\tabort();\n\t\t}\n\t\t]],[[\n\t\tdouble A[100];\n\n\t\tf(100, A);\n\t\t]])],[],[\n\t\t\tAC_MSG_NOTICE([OpenMP support broken, disabling])\n\t\t\tHAVE_OPENMP=no\n\t\t],[])\n\t\tCFLAGS=$SAVE_CFLAGS\n\tfi\n])\n"
  },
  {
    "path": "src/m4/ax_detect_git_head.m4",
    "content": "AC_DEFUN([AX_DETECT_GIT_HEAD], [\n\tAC_SUBST(GIT_HEAD_ID)\n\tAC_SUBST(GIT_HEAD)\n\tAC_SUBST(GIT_HEAD_VERSION)\n\tif test -f $srcdir/.git; then\n\t\tgitdir=`GIT_DIR=$srcdir/.git git rev-parse --git-dir`\n\t\tGIT_HEAD=\"$gitdir/index\"\n\t\tGIT_REPO=\"$gitdir\"\n\t\tGIT_HEAD_ID=`GIT_DIR=$GIT_REPO git describe --always`\n\telif test -f $srcdir/.git/HEAD; then\n\t\tGIT_HEAD=\"$srcdir/.git/index\"\n\t\tGIT_REPO=\"$srcdir/.git\"\n\t\tGIT_HEAD_ID=`GIT_DIR=$GIT_REPO git describe --always`\n\telif test -f $srcdir/GIT_HEAD_ID; then\n\t\tGIT_HEAD_ID=`cat $srcdir/GIT_HEAD_ID`\n\telse\n\t\tmysrcdir=`(cd $srcdir; pwd)`\n\t\thead=`basename $mysrcdir | sed -e 's/.*-//'`\n\t\thead2=`echo $head | sed -e 's/[^0-9a-f]//'`\n\t\thead3=`echo $head2 | sed -e 's/........................................//'`\n\t\tif test \"x$head3\" = \"x\" -a \"x$head\" = \"x$head2\"; then\n\t\t\tGIT_HEAD_ID=\"$head\"\n\t\telse\n\t\t\tGIT_HEAD_ID=\"UNKNOWN\"\n\t\tfi\n\tfi\n\tif test -z \"$GIT_REPO\" ; then\n\t\tGIT_HEAD_VERSION=\"$GIT_HEAD_ID\"\n\telse\n\t\tGIT_HEAD_VERSION=\"\\`GIT_DIR=$GIT_REPO git describe --always\\`\"\n\tfi\n])\n"
  },
  {
    "path": "src/m4/ax_submodule.m4",
    "content": "AC_DEFUN([_AX_SUBMODULE],\n[\n\nm4_if(m4_bregexp($3,|,choice),choice,\n\t[AC_ARG_WITH($2,\n\t\t[AS_HELP_STRING([--with-$1=$3],\n\t\t\t\t[Which $1 to use [default=$4]])])])\ncase \"system\" in\n$3)\n\tAC_ARG_WITH($2_prefix,\n\t\t    [AS_HELP_STRING([--with-$1-prefix=DIR],\n\t\t\t\t    [Prefix of $1 installation])])\n\tAC_ARG_WITH($2_exec_prefix,\n\t\t    [AS_HELP_STRING([--with-$1-exec-prefix=DIR],\n\t\t\t\t    [Exec prefix of $1 installation])])\nesac\nm4_if(m4_bregexp($3,build,build),build,\n\t[AC_ARG_WITH($2_builddir,\n\t\t[AS_HELP_STRING([--with-$1-builddir=DIR],\n\t\t\t\t[Location of $1 builddir])])])\nif test \"x$with_$2_prefix\" != \"x\" -a \"x$with_$2_exec_prefix\" = \"x\"; then\n\twith_$2_exec_prefix=$with_$2_prefix\nfi\nif test \"x$with_$2_prefix\" != \"x\" -o \"x$with_$2_exec_prefix\" != \"x\"; then\n\tif test \"x$with_$2\" != \"x\" -a \"x$with_$2\" != \"xsystem\"; then\n\t\tAC_MSG_ERROR([Setting $with_$2_prefix implies use of system $1])\n\tfi\n\twith_$2=\"system\"\nfi\nif test \"x$with_$2_builddir\" != \"x\"; then\n\tif test \"x$with_$2\" != \"x\" -a \"x$with_$2\" != \"xbuild\"; then\n\t\tAC_MSG_ERROR([Setting $with_$2_builddir implies use of build $1])\n\tfi\n\twith_$2=\"build\"\n\t$2_srcdir=`echo @abs_srcdir@ | $with_$2_builddir/config.status --file=-`\n\tAC_MSG_NOTICE($1 sources in $$2_srcdir)\nfi\nif test \"x$with_$2_exec_prefix\" != \"x\"; then\n\texport PKG_CONFIG_PATH=\"$with_$2_exec_prefix/lib/pkgconfig${PKG_CONFIG_PATH+:$PKG_CONFIG_PATH}\"\nfi\ncase \"$with_$2\" in\n$3)\n\t;;\n*)\n\tcase \"$4\" in\n\tbundled)\n\t\tif test -d $srcdir/.git -a \\\n\t\t\t-d $srcdir/$1 -a \\\n\t\t\t\"`cd $srcdir; git submodule status $1 | cut -c1`\" = '-'; then\n\t\t\tAC_MSG_WARN([git repo detected, but submodule $1 not initialized])\n\t\t\tAC_MSG_WARN([You may want to run])\n\t\t\tAC_MSG_WARN([\tgit submodule init])\n\t\t\tAC_MSG_WARN([\tgit submodule update])\n\t\t\tAC_MSG_WARN([\tsh autogen.sh])\n\t\tfi\n\t\tif test -f $srcdir/$1/configure; 
then\n\t\t\twith_$2=\"bundled\"\n\t\telse\n\t\t\tcase \"system\" in\n\t\t\t$3)\n\t\t\t\twith_$2=\"system\"\n\t\t\t\t;;\n\t\t\t*)\n\t\t\t\twith_$2=\"no\"\n\t\t\t\t;;\n\t\t\tesac\n\t\tfi\n\t\t;;\n\t*)\n\t\twith_$2=\"$4\"\n\t\t;;\n\tesac\n\t;;\nesac\nAC_MSG_CHECKING([which $1 to use])\nAC_MSG_RESULT($with_$2)\n\n])\n\nAC_DEFUN([AX_SUBMODULE], [\n\t_AX_SUBMODULE($1, m4_bpatsubst([$1],\n\t\t\t[[^_abcdefghijklmnopqrstuvwxyz0123456789]],[_]), $2, $3)\n])\n"
  },
  {
    "path": "src/main.cpp",
    "content": "#include <assert.h>\n#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <isl/ctx.h>\n#include <isl/id.h>\n#include <isl/val.h>\n#include <isl/set.h>\n#include <isl/union_set.h>\n#include <isl/union_map.h>\n#include <isl/aff.h>\n#include <isl/flow.h>\n#include <isl/options.h>\n#include <isl/schedule.h>\n#include <isl/ast.h>\n#include <isl/id_to_ast_expr.h>\n#include <isl/ast_build.h>\n#include <isl/arg.h>\n#include <pet.h>\n#include \"ppcg.h\"\n#include \"ppcg_options.h\"\n//#include \"cuda.h\"\n//#include \"opencl.h\"\n//#include \"cpu.h\"\n\n#include <iostream>\n\nusing namespace std;\n\nint main(int argc, char **argv)\n{\n\tint r;\n\n\tr = autosa_main_wrap(argc, argv);\n\n\treturn r;\n}\n"
  },
  {
    "path": "src/ocl_utilities.c",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n#include \"ocl_utilities.h\"\n\n/* Return the OpenCL error string for a given error number.\n */\nconst char *opencl_error_string(cl_int error)\n{\n\tint errorCount;\n\tint index;\n\n\tstatic const char *errorString[] = {\n\t\t[CL_SUCCESS] = \"CL_SUCCESS\",\n\t\t[-CL_DEVICE_NOT_FOUND] = \"CL_DEVICE_NOT_FOUND\",\n\t\t[-CL_DEVICE_NOT_AVAILABLE] = \"CL_DEVICE_NOT_AVAILABLE\",\n\t\t[-CL_COMPILER_NOT_AVAILABLE] = \"CL_COMPILER_NOT_AVAILABLE\",\n\t\t[-CL_MEM_OBJECT_ALLOCATION_FAILURE] =\n\t\t\t\"CL_MEM_OBJECT_ALLOCATION_FAILURE\",\n\t\t[-CL_OUT_OF_RESOURCES] = \"CL_OUT_OF_RESOURCES\",\n\t\t[-CL_OUT_OF_HOST_MEMORY] = \"CL_OUT_OF_HOST_MEMORY\",\n\t\t[-CL_PROFILING_INFO_NOT_AVAILABLE] =\n\t\t\t\"CL_PROFILING_INFO_NOT_AVAILABLE\",\n\t\t[-CL_MEM_COPY_OVERLAP] = \"CL_MEM_COPY_OVERLAP\",\n\t\t[-CL_IMAGE_FORMAT_MISMATCH] = \"CL_IMAGE_FORMAT_MISMATCH\",\n\t\t[-CL_IMAGE_FORMAT_NOT_SUPPORTED] =\n\t\t\t\"CL_IMAGE_FORMAT_NOT_SUPPORTED\",\n\t\t[-CL_BUILD_PROGRAM_FAILURE] = \"CL_BUILD_PROGRAM_FAILURE\",\n\t\t[-CL_MAP_FAILURE] = \"CL_MAP_FAILURE\",\n\t\t[-CL_INVALID_VALUE] = \"CL_INVALID_VALUE\",\n\t\t[-CL_INVALID_DEVICE_TYPE] = \"CL_INVALID_DEVICE_TYPE\",\n\t\t[-CL_INVALID_PLATFORM] = \"CL_INVALID_PLATFORM\",\n\t\t[-CL_INVALID_DEVICE] = \"CL_INVALID_DEVICE\",\n\t\t[-CL_INVALID_CONTEXT] = \"CL_INVALID_CONTEXT\",\n\t\t[-CL_INVALID_QUEUE_PROPERTIES] = \"CL_INVALID_QUEUE_PROPERTIES\",\n\t\t[-CL_INVALID_COMMAND_QUEUE] = \"CL_INVALID_COMMAND_QUEUE\",\n\t\t[-CL_INVALID_HOST_PTR] = \"CL_INVALID_HOST_PTR\",\n\t\t[-CL_INVALID_MEM_OBJECT] = \"CL_INVALID_MEM_OBJECT\",\n\t\t[-CL_INVALID_IMAGE_FORMAT_DESCRIPTOR] =\n\t\t\t\"CL_INVALID_IMAGE_FORMAT_DESCRIPTOR\",\n\t\t[-CL_INVALID_IMAGE_SIZE] = \"CL_INVALID_IMAGE_SIZE\",\n\t\t[-CL_INVALID_SAMPLER] = \"CL_INVALID_SAMPLER\",\n\t\t[-CL_INVALID_BINARY] = \"CL_INVALID_BINARY\",\n\t\t[-CL_INVALID_BUILD_OPTIONS] = \"CL_INVALID_BUILD_OPTIONS\",\n\t\t[-CL_INVALID_PROGRAM] = 
\"CL_INVALID_PROGRAM\",\n\t\t[-CL_INVALID_PROGRAM_EXECUTABLE] =\n\t\t\t\"CL_INVALID_PROGRAM_EXECUTABLE\",\n\t\t[-CL_INVALID_KERNEL_NAME] = \"CL_INVALID_KERNEL_NAME\",\n\t\t[-CL_INVALID_KERNEL_DEFINITION] =\n\t\t\t\"CL_INVALID_KERNEL_DEFINITION\",\n\t\t[-CL_INVALID_KERNEL] = \"CL_INVALID_KERNEL\",\n\t\t[-CL_INVALID_ARG_INDEX] = \"CL_INVALID_ARG_INDEX\",\n\t\t[-CL_INVALID_ARG_VALUE] = \"CL_INVALID_ARG_VALUE\",\n\t\t[-CL_INVALID_ARG_SIZE] = \"CL_INVALID_ARG_SIZE\",\n\t\t[-CL_INVALID_KERNEL_ARGS] = \"CL_INVALID_KERNEL_ARGS\",\n\t\t[-CL_INVALID_WORK_DIMENSION] = \"CL_INVALID_WORK_DIMENSION\",\n\t\t[-CL_INVALID_WORK_GROUP_SIZE] = \"CL_INVALID_WORK_GROUP_SIZE\",\n\t\t[-CL_INVALID_WORK_ITEM_SIZE] = \"CL_INVALID_WORK_ITEM_SIZE\",\n\t\t[-CL_INVALID_GLOBAL_OFFSET] = \"CL_INVALID_GLOBAL_OFFSET\",\n\t\t[-CL_INVALID_EVENT_WAIT_LIST] = \"CL_INVALID_EVENT_WAIT_LIST\",\n\t\t[-CL_INVALID_EVENT] = \"CL_INVALID_EVENT\",\n\t\t[-CL_INVALID_OPERATION] = \"CL_INVALID_OPERATION\",\n\t\t[-CL_INVALID_GL_OBJECT] = \"CL_INVALID_GL_OBJECT\",\n\t\t[-CL_INVALID_BUFFER_SIZE] = \"CL_INVALID_BUFFER_SIZE\",\n\t\t[-CL_INVALID_MIP_LEVEL] = \"CL_INVALID_MIP_LEVEL\",\n\t\t[-CL_INVALID_GLOBAL_WORK_SIZE] = \"CL_INVALID_GLOBAL_WORK_SIZE\",\n\t\t[-CL_INVALID_PROPERTY] = \"CL_INVALID_PROPERTY\"\n\t};\n\n\terrorCount = sizeof(errorString) / sizeof(errorString[0]);\n\tindex = -error;\n\n\treturn (index >= 0 && index < errorCount) ?\n\t\terrorString[index] : \"Unspecified Error\";\n}\n\n/* Find a GPU or a CPU associated with the first available platform.\n * If use_gpu is set, then this function first tries to look for a GPU\n * in the first available platform.\n * If this fails or if use_gpu is not set, then it tries to use the CPU.\n */\ncl_device_id opencl_create_device(int use_gpu)\n{\n\tcl_platform_id platform;\n\tcl_device_id dev;\n\tint err;\n\n\terr = clGetPlatformIDs(1, &platform, NULL);\n\tif (err < 0) {\n\t\tfprintf(stderr, \"Error %s while looking for a 
platform.\\n\",\n\t\t\t\topencl_error_string(err));\n\t\texit(1);\n\t}\n\n\terr = CL_DEVICE_NOT_FOUND;\n\tif (use_gpu)\n\t\terr = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &dev,\n\t\t\t\tNULL);\n\tif (err == CL_DEVICE_NOT_FOUND)\n\t\terr = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &dev,\n\t\t\t\tNULL);\n\tif (err < 0) {\n\t\tfprintf(stderr, \"Error %s while looking for a device.\\n\",\n\t\t\t\topencl_error_string(err));\n\t\texit(1);\n\t}\n\treturn dev;\n}\n\n/* Create an OpenCL program from a string and compile it.\n */\ncl_program opencl_build_program_from_string(cl_context ctx, cl_device_id dev,\n\tconst char *program_source, size_t program_size,\n\tconst char *opencl_options)\n{\n\tint err;\n\tcl_program program;\n\tchar *program_log;\n\tsize_t log_size;\n\n\tprogram = clCreateProgramWithSource(ctx, 1,\n\t\t\t&program_source, &program_size, &err);\n\tif (err < 0) {\n\t\tfprintf(stderr, \"Could not create the program\\n\");\n\t\texit(1);\n\t}\n\terr = clBuildProgram(program, 0, NULL, opencl_options, NULL, NULL);\n\tif (err < 0) {\n\t\tfprintf(stderr, \"Could not build the program.\\n\");\n\t\tclGetProgramBuildInfo(program, dev, CL_PROGRAM_BUILD_LOG, 0,\n\t\t\t\tNULL, &log_size);\n\t\tprogram_log = (char *) malloc(log_size + 1);\n\t\tprogram_log[log_size] = '\\0';\n\t\tclGetProgramBuildInfo(program, dev, CL_PROGRAM_BUILD_LOG,\n\t\t\t\tlog_size + 1, program_log, NULL);\n\t\tfprintf(stderr, \"%s\\n\", program_log);\n\t\tfree(program_log);\n\t\texit(1);\n\t}\n\treturn program;\n}\n\n/* Create an OpenCL program from a source file and compile it.\n */\ncl_program opencl_build_program_from_file(cl_context ctx, cl_device_id dev,\n\tconst char* filename, const char* opencl_options)\n{\n\tcl_program program;\n\tFILE *program_file;\n\tchar *program_source;\n\tsize_t program_size, read;\n\n\tprogram_file = fopen(filename, \"r\");\n\tif (program_file == NULL) {\n\t\tfprintf(stderr, \"Could not find the source file.\\n\");\n\t\texit(1);\n\t}\n\tfseek(program_file, 
0, SEEK_END);\n\tprogram_size = ftell(program_file);\n\trewind(program_file);\n\tprogram_source = (char *) malloc(program_size + 1);\n\tprogram_source[program_size] = '\\0';\n\tread = fread(program_source, sizeof(char), program_size, program_file);\n\tif (read != program_size) {\n\t\tfprintf(stderr, \"Error while reading the kernel.\\n\");\n\t\texit(1);\n\t}\n\tfclose(program_file);\n\n\tprogram = opencl_build_program_from_string(ctx, dev, program_source,\n\t\t\t\t\t\tprogram_size, opencl_options);\n\tfree(program_source);\n\n\treturn program;\n}\n"
  },
  {
    "path": "src/ocl_utilities.h",
    "content": "#ifndef OCL_UTILITIES_H\n#define OCL_UTILITIES_H\n\n#if defined(__APPLE__)\n#include <OpenCL/opencl.h>\n#else\n#include <CL/opencl.h>\n#endif\n\n/* Return the OpenCL error string for a given error number.\n */\nconst char *opencl_error_string(cl_int error);\n\n/* Find a GPU or a CPU associated with the first available platform.\n * If use_gpu is set, then this function first tries to look for a GPU\n * in the first available platform.\n * If this fails or if use_gpu is not set, then it tries to use the CPU.\n */\ncl_device_id opencl_create_device(int use_gpu);\n\n/* Create an OpenCL program from a string and compile it.\n */\ncl_program opencl_build_program_from_string(cl_context ctx, cl_device_id dev,\n\tconst char *program_source, size_t program_size,\n\tconst char *opencl_options);\n\n/* Create an OpenCL program from a source file and compile it.\n */\ncl_program opencl_build_program_from_file(cl_context ctx, cl_device_id dev,\n\tconst char *filename, const char *opencl_options);\n\n#endif\n"
  },
  {
    "path": "src/opencl_test.sh.in",
    "content": "#!/bin/sh\n\nkeep=no\n\nfor option; do\n\tcase \"$option\" in\n\t\t--keep)\n\t\t\tkeep=yes\n\t\t\t;;\n\tesac\ndone\n\nEXEEXT=@EXEEXT@\nVERSION=@GIT_HEAD_VERSION@\nCC=\"@CC@\"\nCFLAGS=\"--std=gnu99\"\nsrcdir=\"@srcdir@\"\n\nif [ $keep = \"yes\" ]; then\n\tOUTDIR=\"opencl_test.$VERSION\"\n\tmkdir \"$OUTDIR\" || exit 1\nelse\n\tif test \"x$TMPDIR\" = \"x\"; then\n\t\tTMPDIR=/tmp\n\tfi\n\tOUTDIR=`mktemp -d $TMPDIR/ppcg.XXXXXXXXXX` || exit 1\nfi\n\nrun_tests () {\n\tsubdir=$1\n\tppcg_options=$2\n\n\techo Test with PPCG options \\'$ppcg_options\\'\n\tmkdir ${OUTDIR}/${subdir} || exit 1\n\tfor i in $srcdir/tests/*.c; do\n\t\techo $i\n\t\tname=`basename $i`\n\t\tname=\"${name%.c}\"\n\t\tout_c=\"${OUTDIR}/${subdir}/$name.ppcg.c\"\n\t\tout=\"${OUTDIR}/${subdir}/$name.ppcg$EXEEXT\"\n\t\toptions=\"--target=opencl --opencl-no-use-gpu $ppcg_options\"\n\t\tfunctions=\"$srcdir/tests/${name}_opencl_functions.cl\"\n\t\tif test -f $functions; then\n\t\t\toptions=\"$options --opencl-include-file=$functions\"\n\t\t\toptions=\"$options --opencl-compiler-options=-I.\"\n\t\tfi\n\t\t./ppcg$EXEEXT $options $i -o \"$out_c\" || exit\n\t\t$CC $CFLAGS -I \"$srcdir\" \"$srcdir/ocl_utilities.c\" -lOpenCL \\\n\t\t\t-I. 
\"$out_c\" -o \"$out\" || exit\n\t\t$out || exit\n\tdone\n}\n\nrun_tests default\nrun_tests embed --opencl-embed-kernel-code\n\nfor i in $srcdir/examples/*.c; do\n\techo $i\n\tname=`basename $i`\n\tname=\"${name%.c}\"\n\texe_ref=\"${OUTDIR}/$name.ref$EXEEXT\"\n\tgen_ocl=\"${OUTDIR}/$name.ppcg.c\"\n\texe_ocl=\"${OUTDIR}/$name.ppcg$EXEEXT\"\n\toutput_ref=\"${OUTDIR}/$name.ref.out\"\n\toutput_ocl=\"${OUTDIR}/$name.ppcg.out\"\n\t$CC $CFLAGS $i -o $exe_ref || exit\n\t./ppcg$EXEEXT --target=opencl --opencl-no-use-gpu $i -o \"$gen_ocl\" || \\\n\t\texit\n\t$CC $CFLAGS -I \"$srcdir\" \"$srcdir/ocl_utilities.c\" -lOpenCL \\\n\t\t\"$gen_ocl\" -o \"$exe_ocl\" || exit\n\t$exe_ref > $output_ref || exit\n\t$exe_ocl > $output_ocl || exit\n\tcmp $output_ref $output_ocl || exit\ndone\n\nif [ $keep = \"no\" ]; then\n\trm -r \"${OUTDIR}\"\nfi\n"
  },
  {
    "path": "src/polybench_test.sh.in",
    "content": "#!/bin/sh\n\nkeep=no\nverbose=no\n\nfor option; do\n\tcase \"$option\" in\n\t\t--keep)\n\t\t\tkeep=yes\n\t\t\t;;\n\t\t--verbose)\n\t\t\tverbose=yes\n\t\t\t;;\n\tesac\ndone\n\nEXEEXT=@EXEEXT@\nDIR=@POLYBENCH_DIR@\nVERSION=@GIT_HEAD_VERSION@\nSIZE=-DMINI_DATASET\nCC=\"@CC@\"\nHAVE_OPENCL=@HAVE_OPENCL@\nHAVE_OPENMP=@HAVE_OPENMP@\nsrcdir=\"@srcdir@\"\nif [ $keep = \"yes\" ]; then\n\tOUTDIR=\"out.$VERSION\"\n\tmkdir \"$OUTDIR\" || exit 1\nelse\n\tif test \"x$TMPDIR\" = \"x\"; then\n\t\tTMPDIR=/tmp\n\tfi\n\tOUTDIR=`mktemp -d $TMPDIR/ppcg.XXXXXXXXXX` || exit 1\nfi\nCPPFLAGS=\"-DPOLYBENCH_USE_C99_PROTO -DPOLYBENCH_DUMP_ARRAYS\"\nCPPFLAGS=\"$CPPFLAGS $SIZE -I $DIR/utilities\"\nCFLAGS=\"-lm --std=gnu99\"\n\necho \"Running tests in folder ${OUTDIR}\"\n\nrun_tests () {\n\text=$1\n\n\tppcg_options=$2\n\tcc_options=$3\n\n\tif [ \"x$ppcg_options\" = \"x\" ]; then\n\t\tppcg_option_str=\"none\"\n\telse\n\t\tppcg_option_str=$ppcg_options\n\tfi\n\n\tif [ \"x$cc_options\" = \"x\" ]; then\n\t\tcc_option_str=\"none\"\n\telse\n\t\tcc_option_str=$cc_options\n\tfi\n\n\techo Test: $ext, ppcg options: $ppcg_option_str, CC options: $cc_option_str\n\tfor i in `cat $DIR/utilities/benchmark_list`; do\n\t\techo $i\n\t\tname=`basename $i`\n\t\tname=${name%.c}\n\t\tsource_opt=\"${OUTDIR}/$name.$ext.c\"\n\t\tprog_orig=${OUTDIR}/$name.orig${EXEEXT}\n\t\tprog_opt=${OUTDIR}/$name.$ext${EXEEXT}\n\t\toutput_orig=${OUTDIR}/$name.orig.out\n\t\toutput_opt=${OUTDIR}/$name.$ext.out\n\t\tdir=`dirname $i`\n\t\tif [ $verbose = \"yes\" ]; then\n\t\t\techo ./ppcg$EXEEXT -I $DIR/$dir $DIR/$i \\\n\t\t\t\t$CPPFLAGS -o $source_opt $ppcg_options\n\t\tfi\n\t\t./ppcg$EXEEXT -I $DIR/$dir $DIR/$i $CPPFLAGS \\\n\t\t\t-o $source_opt $ppcg_options || exit\n\t\t$CC -I $DIR/$dir $CPPFLAGS $DIR/$i -o $prog_orig \\\n\t\t\t$DIR/utilities/polybench.c $CFLAGS\n\t\t$prog_orig 2> $output_orig\n\t\tif [ $verbose = \"yes\" ]; then\n\t\t\techo $CC -I $DIR/$dir $CPPFLAGS $source_opt \\\n\t\t\t\t-o $prog_opt 
$DIR/utilities/polybench.c \\\n\t\t\t\t$CFLAGS $cc_options\n\t\tfi\n\t\t$CC -I $DIR/$dir $CPPFLAGS $source_opt -o $prog_opt \\\n\t\t\t$DIR/utilities/polybench.c $CFLAGS $cc_options || exit\n\n\t\t$prog_opt 2> $output_opt || exit\n\t\tcmp $output_orig $output_opt || exit\n\tdone\n}\n\nrun_tests ppcg \"--target=c --tile\"\nrun_tests ppcg_live \"--target=c --no-live-range-reordering --tile\"\n\n# Test OpenMP code, if compiler supports openmp\nif [ $HAVE_OPENMP = \"yes\" ]; then\n\trun_tests ppcg_omp \"--target=c --openmp\" -fopenmp\n\techo Introduced `grep -R 'omp parallel' \"${OUTDIR}\" | wc -l` '\"pragma omp parallel for\"'\nelse\n\techo Compiler does not support OpenMP. Skipping OpenMP tests.\nfi\n\nif [ $HAVE_OPENCL = \"yes\" ]; then\n\trun_tests ppcg_opencl \"--target=opencl --opencl-no-use-gpu\" \\\n\t\t\t\t\"-I $srcdir $srcdir/ocl_utilities.c -lOpenCL\"\nfi\n\nif [ $keep = \"no\" ]; then\n\trm -r \"${OUTDIR}\"\nfi\n"
  },
  {
    "path": "src/ppcg.c",
    "content": "/*\n * Copyright 2011      INRIA Saclay\n * Copyright 2013      Ecole Normale Superieure\n * Copyright 2015      Sven Verdoolaege\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege, INRIA Saclay - Ile-de-France,\n * Parc Club Orsay Universite, ZAC des vignes, 4 rue Jacques Monod,\n * 91893 Orsay, France\n * and Ecole Normale Superieure, 45 rue d'Ulm, 75230 Paris, France\n */\n\n#include <assert.h>\n#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <isl/ctx.h>\n#include <isl/id.h>\n#include <isl/val.h>\n#include <isl/set.h>\n#include <isl/union_set.h>\n#include <isl/union_map.h>\n#include <isl/space.h>\n#include <isl/aff.h>\n#include <isl/flow.h>\n#include <isl/options.h>\n#include <isl/schedule.h>\n#include <isl/ast.h>\n#include <isl/id_to_ast_expr.h>\n#include <isl/ast_build.h>\n#include <isl/schedule.h>\n#include <isl/constraint.h>\n#include <pet.h>\n#include <math.h>\n#include \"ppcg.h\"\n#include \"ppcg_options.h\"\n//#include \"cuda.h\"\n//#include \"opencl.h\"\n//#include \"cpu.h\"\n#include \"autosa_xilinx_hls_c.h\"\n#include \"autosa_intel_opencl.h\"\n#include \"autosa_catapult_hls_c.h\"\n#include \"autosa_tapa_cpp.h\"\n\n//#define _DEBUG\n\nstruct options {\n\tstruct pet_options *pet;\n\tstruct ppcg_options *ppcg;\n\tchar *input;\n\tchar *output;\n};\n\n//const char *ppcg_version(void);\n//static void print_version(void)\n//{\n//\tprintf(\"%s\", ppcg_version());\n//}\n\nISL_ARGS_START(struct options, options_args)\nISL_ARG_CHILD(struct options, pet, \"pet\", &pet_options_args, \"pet options\")\nISL_ARG_CHILD(struct options, ppcg, NULL, &ppcg_options_args, \"ppcg options\")\nISL_ARG_STR(struct options, output, 'o', NULL,\n\t\"filename\", NULL, \"output filename (c and opencl targets)\")\nISL_ARG_ARG(struct options, input, \"input\", NULL)\n//ISL_ARG_VERSION(print_version)\nISL_ARGS_END\n\nISL_ARG_DEF(options, struct options, options_args)\n\n/* Return a pointer to the final 
path component of \"filename\" or\n * to \"filename\" itself if it does not contain any components.\n */\nconst char *ppcg_base_name(const char *filename)\n{\n\tconst char *base;\n\n\tbase = strrchr(filename, '/');\n\tif (base)\n\t\treturn ++base;\n\telse\n\t\treturn filename;\n}\n\n/* Copy the base name of \"input\" to \"name\" and return its length.\n * \"name\" is not NULL terminated.\n *\n * In particular, remove all leading directory components and\n * the final extension, if any.\n */\nint ppcg_extract_base_name(char *name, const char *input)\n{\n\tconst char *base;\n\tconst char *ext;\n\tint len;\n\n\tbase = ppcg_base_name(input);\n\text = strrchr(base, '.');\n\tlen = ext ? ext - base : strlen(base);\n\n\tmemcpy(name, base, len);\n\n\treturn len;\n}\n\n/* Does \"scop\" refer to any arrays that are declared, but not\n * exposed to the code after the scop?\n */\nint ppcg_scop_any_hidden_declarations(struct ppcg_scop *scop)\n{\n\tint i;\n\n\tif (!scop)\n\t\treturn 0;\n\n\tfor (i = 0; i < scop->pet->n_array; ++i)\n\t\tif (scop->pet->arrays[i]->declared &&\n\t\t    !scop->pet->arrays[i]->exposed)\n\t\t\treturn 1;\n\n\treturn 0;\n}\n\n/* Collect all variable names that are in use in \"scop\".\n * In particular, collect all parameters in the context and\n * all the array names.\n * Store these names in an isl_id_to_ast_expr by mapping\n * them to a dummy value (0).\n */\nstatic __isl_give isl_id_to_ast_expr *collect_names(struct pet_scop *scop)\n{\n\tint i, n;\n\tisl_ctx *ctx;\n\tisl_ast_expr *zero;\n\tisl_id_to_ast_expr *names;\n\n\tctx = isl_set_get_ctx(scop->context);\n\n\tn = isl_set_dim(scop->context, isl_dim_param);\n\n\tnames = isl_id_to_ast_expr_alloc(ctx, n + scop->n_array);\n\tzero = isl_ast_expr_from_val(isl_val_zero(ctx));\n\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_id *id;\n\n\t\tid = isl_set_get_dim_id(scop->context, isl_dim_param, i);\n\t\tnames = isl_id_to_ast_expr_set(names,\n\t\t\t\t\t\tid, isl_ast_expr_copy(zero));\n\t}\n\n\tfor (i = 0; i < 
scop->n_array; ++i) {\n\t\tstruct pet_array *array = scop->arrays[i];\n\t\tisl_id *id;\n\n\t\tid = isl_set_get_tuple_id(array->extent);\n\t\tnames = isl_id_to_ast_expr_set(names,\n\t\t\t\t\t\tid, isl_ast_expr_copy(zero));\n\t}\n\n\tisl_ast_expr_free(zero);\n\n\treturn names;\n}\n\n/* Return an isl_id called \"prefix%d\", with \"%d\" set to \"i\".\n * If an isl_id with such a name already appears among the variable names\n * of \"scop\", then adjust the name to \"prefix%d_%d\".\n */\nstatic __isl_give isl_id *generate_name(struct ppcg_scop *scop,\n\tconst char *prefix, int i)\n{\n\tint j;\n\tchar name[23];\n\tisl_ctx *ctx;\n\tisl_id *id;\n\tint has_name;\n\n\tctx = isl_set_get_ctx(scop->context);\n\tsnprintf(name, sizeof(name), \"%s%d\", prefix, i);\n\tid = isl_id_alloc(ctx, name, NULL);\n\n\tj = 0;\n\twhile ((has_name = isl_id_to_ast_expr_has(scop->names, id)) == 1) {\n\t\tisl_id_free(id);\n\t\tsnprintf(name, sizeof(name), \"%s%d_%d\", prefix, i, j++);\n\t\tid = isl_id_alloc(ctx, name, NULL);\n\t}\n\n\treturn has_name < 0 ? 
isl_id_free(id) : id;\n}\n\n/* Return a list of \"n\" isl_ids of the form \"prefix%d\".\n * If an isl_id with such a name already appears among the variable names\n * of \"scop\", then adjust the name to \"prefix%d_%d\".\n */\n__isl_give isl_id_list *ppcg_scop_generate_names(struct ppcg_scop *scop,\n\tint n, const char *prefix)\n{\n\tint i;\n\tisl_ctx *ctx;\n\tisl_id_list *names;\n\n\tctx = isl_set_get_ctx(scop->context);\n\tnames = isl_id_list_alloc(ctx, n);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_id *id;\n\n\t\tid = generate_name(scop, prefix, i);\n\t\tnames = isl_id_list_add(names, id);\n\t}\n\n\treturn names;\n}\n\n/* Is \"stmt\" not a kill statement?\n */\nstatic int is_not_kill(struct pet_stmt *stmt)\n{\n\treturn !pet_stmt_is_kill(stmt);\n}\n\n/* Collect the iteration domains of the statements in \"scop\" that\n * satisfy \"pred\".\n */\nstatic __isl_give isl_union_set *collect_domains(struct pet_scop *scop,\n\tint (*pred)(struct pet_stmt *stmt))\n{\n\tint i;\n\tisl_set *domain_i;\n\tisl_union_set *domain;\n\n\tif (!scop)\n\t\treturn NULL;\n\n\tdomain = isl_union_set_empty(isl_set_get_space(scop->context));\n\n\tfor (i = 0; i < scop->n_stmt; ++i) {\n\t\tstruct pet_stmt *stmt = scop->stmts[i];\n\n\t\tif (!pred(stmt))\n\t\t\tcontinue;\n\n\t\tif (stmt->n_arg > 0)\n\t\t\tisl_die(isl_union_set_get_ctx(domain),\n\t\t\t\tisl_error_unsupported,\n\t\t\t\t\"data dependent conditions not supported\",\n\t\t\t\treturn isl_union_set_free(domain));\n\n\t\tdomain_i = isl_set_copy(scop->stmts[i]->domain);\n\t\tdomain = isl_union_set_add_set(domain, domain_i);\n\t}\n\n\treturn domain;\n}\n\n/* Collect the iteration domains of the statements in \"scop\",\n * skipping kill statements.\n */\nstatic __isl_give isl_union_set *collect_non_kill_domains(struct pet_scop *scop)\n{\n\treturn collect_domains(scop, &is_not_kill);\n}\n\n/* This function is used as a callback to pet_expr_foreach_call_expr\n * to detect if there is any call expression in the input expression.\n * Assign the 
value 1 to the integer that \"user\" points to and\n * abort the search since we have found what we were looking for.\n */\nstatic int set_has_call(__isl_keep pet_expr *expr, void *user)\n{\n\tint *has_call = user;\n\n\t*has_call = 1;\n\n\treturn -1;\n}\n\n/* Does \"expr\" contain any call expressions?\n */\nstatic int expr_has_call(__isl_keep pet_expr *expr)\n{\n\tint has_call = 0;\n\n\tif (pet_expr_foreach_call_expr(expr, &set_has_call, &has_call) < 0 &&\n\t    !has_call)\n\t\treturn -1;\n\n\treturn has_call;\n}\n\n/* This function is a callback for pet_tree_foreach_expr.\n * If \"expr\" contains any call (sub)expressions, then set *has_call\n * and abort the search.\n */\nstatic int check_call(__isl_keep pet_expr *expr, void *user)\n{\n\tint *has_call = user;\n\n\tif (expr_has_call(expr))\n\t\t*has_call = 1;\n\n\treturn *has_call ? -1 : 0;\n}\n\n/* Does \"stmt\" contain any call expressions?\n */\nstatic int has_call(struct pet_stmt *stmt)\n{\n\tint has_call = 0;\n\n\tif (pet_tree_foreach_expr(stmt->body, &check_call, &has_call) < 0 &&\n\t    !has_call)\n\t\treturn -1;\n\n\treturn has_call;\n}\n\n/* Collect the iteration domains of the statements in \"scop\"\n * that contain a call expression.\n */\nstatic __isl_give isl_union_set *collect_call_domains(struct pet_scop *scop)\n{\n\treturn collect_domains(scop, &has_call);\n}\n\n/* Given a union of \"tagged\" access relations of the form\n *\n *\t[S_i[...] -> R_j[]] -> A_k[...]\n *\n * project out the \"tags\" (R_j[]).\n * That is, return a union of relations of the form\n *\n *\tS_i[...] 
-> A_k[...]\n */\nstatic __isl_give isl_union_map *project_out_tags(\n\t__isl_take isl_union_map *umap)\n{\n\treturn isl_union_map_domain_factor_domain(umap);\n}\n\n/* Construct a function from tagged iteration domains to the corresponding\n * untagged iteration domains with as range of the wrapped map in the domain\n * the reference tags that appear in any of the reads, writes or kills.\n * Store the result in ps->tagger.\n *\n * For example, if the statement with iteration space S[i,j]\n * contains two array references R_1[] and R_2[], then ps->tagger will contain\n *\n *\t{ [S[i,j] -> R_1[]] -> S[i,j]; [S[i,j] -> R_2[]] -> S[i,j] }\n */\nstatic void compute_tagger(struct ppcg_scop *ps)\n{\n\tisl_union_map *tagged;\n\tisl_union_pw_multi_aff *tagger;\n\n\ttagged = isl_union_map_copy(ps->tagged_reads);\n\ttagged = isl_union_map_union(tagged,\n\t\t\t\tisl_union_map_copy(ps->tagged_may_writes));\n\ttagged = isl_union_map_union(tagged,\n\t\t\t\tisl_union_map_copy(ps->tagged_must_kills));\n\ttagged = isl_union_map_universe(tagged);\n\ttagged = isl_union_set_unwrap(isl_union_map_domain(tagged));\n\n\ttagger = isl_union_map_domain_map_union_pw_multi_aff(tagged);\n\n\tps->tagger = tagger;\n}\n\n/* Compute the live out accesses, i.e., the writes that are\n * potentially not killed by any kills or any other writes, and\n * store them in ps->live_out.\n *\n * We compute the \"dependence\" of any \"kill\" (an explicit kill\n * or a must write) on any may write.\n * The elements accessed by the may writes with a \"depending\" kill\n * also accessing the element are definitely killed.\n * The remaining may writes can potentially be live out.\n *\n * The result of the dependence analysis is\n *\n *\t{ IW -> [IK -> A] }\n *\n * with IW the instance of the write statement, IK the instance of kill\n * statement and A the element that was killed.\n * The range factor range is\n *\n *\t{ IW -> A }\n *\n * containing all such pairs for which there is a kill statement instance,\n * 
i.e., all pairs that have been killed.\n */\nstatic void compute_live_out(struct ppcg_scop *ps)\n{\n\tisl_schedule *schedule;\n\tisl_union_map *kills;\n\tisl_union_map *exposed;\n\tisl_union_map *covering;\n\tisl_union_access_info *access;\n\tisl_union_flow *flow;\n\n\tschedule = isl_schedule_copy(ps->schedule);\n\tkills = isl_union_map_union(isl_union_map_copy(ps->must_writes),\n\t\t\t\t    isl_union_map_copy(ps->must_kills));\n\taccess = isl_union_access_info_from_sink(kills);\n\taccess = isl_union_access_info_set_may_source(access,\n\t\t\t\t    isl_union_map_copy(ps->may_writes));\n\taccess = isl_union_access_info_set_schedule(access, schedule);\n\tflow = isl_union_access_info_compute_flow(access);\n\tcovering = isl_union_flow_get_full_may_dependence(flow);\n\tisl_union_flow_free(flow);\n\n\tcovering = isl_union_map_range_factor_range(covering);\n\texposed = isl_union_map_copy(ps->may_writes);\n\texposed = isl_union_map_subtract(exposed, covering);\n\tps->live_out = exposed;\n}\n\n/* Compute the tagged flow dependences and the live_in accesses and store\n * the results in ps->tagged_dep_flow and ps->live_in.\n *\n * Both must-writes and must-kills are allowed to kill dependences\n * from earlier writes to subsequent reads.\n * The must-kills are not included in the potential sources, though.\n * The flow dependences with a must-kill as source would\n * reflect possibly uninitialized reads.\n * No dependences need to be introduced to protect such reads\n * (other than those imposed by potential flows from may writes\n * that follow the kill).  
Those flow dependences are therefore not needed.\n * The dead code elimination also assumes\n * the flow sources are non-kill instances.\n */\nstatic void compute_tagged_flow_dep_only(struct ppcg_scop *ps)\n{\n\tisl_union_pw_multi_aff *tagger;\n\tisl_schedule *schedule;\n\tisl_union_map *live_in;\n\tisl_union_access_info *access;\n\tisl_union_flow *flow;\n\tisl_union_map *must_source;\n\tisl_union_map *kills;\n\tisl_union_map *tagged_flow;\n\n\ttagger = isl_union_pw_multi_aff_copy(ps->tagger);\n\tschedule = isl_schedule_copy(ps->schedule);\n\tschedule = isl_schedule_pullback_union_pw_multi_aff(schedule, tagger);\n\tkills = isl_union_map_copy(ps->tagged_must_kills);\n\tmust_source = isl_union_map_copy(ps->tagged_must_writes);\n\tkills = isl_union_map_union(kills, must_source);\n\taccess = isl_union_access_info_from_sink(\n\t\t\t\tisl_union_map_copy(ps->tagged_reads));\n\taccess = isl_union_access_info_set_kill(access, kills);\n\taccess = isl_union_access_info_set_may_source(access,\n\t\t\t\tisl_union_map_copy(ps->tagged_may_writes));\n\taccess = isl_union_access_info_set_schedule(access, schedule);\n\tflow = isl_union_access_info_compute_flow(access);\n\ttagged_flow = isl_union_flow_get_may_dependence(flow);\n\tps->tagged_dep_flow = tagged_flow;\n\tlive_in = isl_union_flow_get_may_no_source(flow);\n\tps->live_in = project_out_tags(live_in);\n\tisl_union_flow_free(flow);\n}\n\n/* Compute ps->dep_flow from ps->tagged_dep_flow\n * by projecting out the reference tags.\n */\nstatic void derive_flow_dep_from_tagged_flow_dep(struct ppcg_scop *ps)\n{\n\tps->dep_flow = isl_union_map_copy(ps->tagged_dep_flow);\n\tps->dep_flow = isl_union_map_factor_domain(ps->dep_flow);\n}\n\n/* Compute the flow dependences and the live_in accesses and store\n * the results in ps->dep_flow and ps->live_in.\n * A copy of the flow dependences, tagged with the reference tags\n * is stored in ps->tagged_dep_flow.\n *\n * We first compute ps->tagged_dep_flow, i.e., the tagged flow dependences\n * 
and then project out the tags.\n */\nstatic void compute_tagged_flow_dep(struct ppcg_scop *ps)\n{\n\tcompute_tagged_flow_dep_only(ps);\n\tderive_flow_dep_from_tagged_flow_dep(ps);\n}\n\n/* Compute the order dependences that prevent the potential live ranges\n * from overlapping.\n *\n * In particular, construct a union of relations\n *\n *\t[R[...] -> R_1[]] -> [W[...] -> R_2[]]\n *\n * where [R[...] -> R_1[]] is the range of one or more live ranges\n * (i.e., a read) and [W[...] -> R_2[]] is the domain of one or more\n * live ranges (i.e., a write).  Moreover, the read and the write\n * access the same memory element and the read occurs before the write\n * in the original schedule.\n * The scheduler allows some of these dependences to be violated, provided\n * the adjacent live ranges are all local (i.e., their domain and range\n * are mapped to the same point by the current schedule band).\n *\n * Note that if a live range is not local, then we need to make\n * sure it does not overlap with _any_ other live range, and not\n * just with the \"previous\" and/or the \"next\" live range.\n * We therefore add order dependences between reads and\n * _any_ later potential write.\n *\n * We also need to be careful about writes without a corresponding read.\n * They are already prevented from moving past non-local preceding\n * intervals, but we also need to prevent them from moving past non-local\n * following intervals.  
We therefore also add order dependences from\n * potential writes that do not appear in any intervals\n * to all later potential writes.\n * Note that dead code elimination should have removed most of these\n * dead writes, but the dead code elimination may not remove all dead writes,\n * so we need to consider them to be safe.\n *\n * The order dependences are computed by computing the \"dataflow\"\n * from the above unmatched writes and the reads to the may writes.\n * The unmatched writes and the reads are treated as may sources\n * such that they would not kill order dependences from earlier\n * such writes and reads.\n */\nstatic void compute_order_dependences(struct ppcg_scop *ps)\n{\n\tisl_union_map *reads;\n\tisl_union_map *shared_access;\n\tisl_union_set *matched;\n\tisl_union_map *unmatched;\n\tisl_union_pw_multi_aff *tagger;\n\tisl_schedule *schedule;\n\tisl_union_access_info *access;\n\tisl_union_flow *flow;\n\n\ttagger = isl_union_pw_multi_aff_copy(ps->tagger);\n\tschedule = isl_schedule_copy(ps->schedule);\n\tschedule = isl_schedule_pullback_union_pw_multi_aff(schedule, tagger);\n\treads = isl_union_map_copy(ps->tagged_reads);\n\tmatched = isl_union_map_domain(isl_union_map_copy(ps->tagged_dep_flow));\n\tunmatched = isl_union_map_copy(ps->tagged_may_writes);\n\tunmatched = isl_union_map_subtract_domain(unmatched, matched);\n\treads = isl_union_map_union(reads, unmatched);\n\taccess = isl_union_access_info_from_sink(\n\t\t\t\tisl_union_map_copy(ps->tagged_may_writes));\n\taccess = isl_union_access_info_set_may_source(access, reads);\n\taccess = isl_union_access_info_set_schedule(access, schedule);\n\tflow = isl_union_access_info_compute_flow(access);\n\tshared_access = isl_union_flow_get_may_dependence(flow);\n\tisl_union_flow_free(flow);\n\n\tps->tagged_dep_order = isl_union_map_copy(shared_access);\n\tps->dep_order = isl_union_map_factor_domain(shared_access);\n}\n\n/* Compute those validity dependences of the program represented by \"scop\"\n * that 
should be unconditionally enforced even when live-range reordering\n * is used.\n *\n * In particular, compute the external false dependences\n * as well as order dependences between sources with the same sink.\n * The anti-dependences are already taken care of by the order dependences.\n * The external false dependences are only used to ensure that live-in and\n * live-out data is not overwritten by any writes inside the scop.\n * The independences are removed from the external false dependences,\n * but not from the order dependences between sources with the same sink.\n *\n * In particular, the reads from live-in data need to precede any\n * later write to the same memory element.\n * As to live-out data, the last writes need to remain the last writes.\n * That is, any earlier write in the original schedule needs to precede\n * the last write to the same memory element in the computed schedule.\n * The possible last writes have been computed by compute_live_out.\n * They may include kills, but if the last access is a kill,\n * then the corresponding dependences will effectively be ignored\n * since we do not schedule any kill statements.\n *\n * Note that the set of live-in and live-out accesses may be\n * an overapproximation.  There may therefore be potential writes\n * before a live-in access and after a live-out access.\n *\n * In the presence of may-writes, there may be multiple live-ranges\n * with the same sink, accessing the same memory element.\n * The sources of these live-ranges need to be executed\n * in the same relative order as in the original program\n * since we do not know which of the may-writes will actually\n * perform a write.  
Consider all sources that share a sink and\n * that may write to the same memory element and compute\n * the order dependences among them.\n */\nstatic void compute_forced_dependences(struct ppcg_scop *ps)\n{\n\tisl_union_map *shared_access;\n\tisl_union_map *exposed;\n\tisl_union_map *live_in;\n\tisl_union_map *sink_access;\n\tisl_union_map *shared_sink;\n\tisl_union_access_info *access;\n\tisl_union_flow *flow;\n\tisl_schedule *schedule;\n\n\texposed = isl_union_map_copy(ps->live_out);\n\tschedule = isl_schedule_copy(ps->schedule);\n\taccess = isl_union_access_info_from_sink(exposed);\n\taccess = isl_union_access_info_set_may_source(access,\n\t\t\t\tisl_union_map_copy(ps->may_writes));\n\taccess = isl_union_access_info_set_schedule(access, schedule);\n\tflow = isl_union_access_info_compute_flow(access);\n\tshared_access = isl_union_flow_get_may_dependence(flow);\n\tisl_union_flow_free(flow);\n\tps->dep_forced = shared_access;\n\n\tschedule = isl_schedule_copy(ps->schedule);\n\taccess = isl_union_access_info_from_sink(\n\t\t\t\tisl_union_map_copy(ps->may_writes));\n\taccess = isl_union_access_info_set_may_source(access,\n\t\t\t\tisl_union_map_copy(ps->live_in));\n\taccess = isl_union_access_info_set_schedule(access, schedule);\n\tflow = isl_union_access_info_compute_flow(access);\n\tlive_in = isl_union_flow_get_may_dependence(flow);\n\tisl_union_flow_free(flow);\n\n\tps->dep_forced = isl_union_map_union(ps->dep_forced, live_in);\n\tps->dep_forced = isl_union_map_subtract(ps->dep_forced,\n\t\t\t\tisl_union_map_copy(ps->independence));\n\n\tschedule = isl_schedule_copy(ps->schedule);\n\tsink_access = isl_union_map_copy(ps->tagged_dep_flow);\n\tsink_access = isl_union_map_range_product(sink_access,\n\t\t\t\tisl_union_map_copy(ps->tagged_may_writes));\n\tsink_access = isl_union_map_domain_factor_domain(sink_access);\n\taccess = isl_union_access_info_from_sink(\n\t\t\t\tisl_union_map_copy(sink_access));\n\taccess = isl_union_access_info_set_may_source(access, 
sink_access);\n\taccess = isl_union_access_info_set_schedule(access, schedule);\n\tflow = isl_union_access_info_compute_flow(access);\n\tshared_sink = isl_union_flow_get_may_dependence(flow);\n\tisl_union_flow_free(flow);\n\tps->dep_forced = isl_union_map_union(ps->dep_forced, shared_sink);\n}\n\n/* Remove independence from the tagged flow dependences.\n * Since the user has guaranteed that source and sink of an independence\n * can be executed in any order, there cannot be a flow dependence\n * between them, so they can be removed from the set of flow dependences.\n * However, if the source of such a flow dependence is a must write,\n * then it may have killed other potential sources, which would have\n * to be recovered if we were to remove those flow dependences.\n * We therefore keep the flow dependences that originate in a must write,\n * even if it corresponds to a known independence.\n */\nstatic void remove_independences_from_tagged_flow(struct ppcg_scop *ps)\n{\n\tisl_union_map *tf;\n\tisl_union_set *indep;\n\tisl_union_set *mw;\n\n\ttf = isl_union_map_copy(ps->tagged_dep_flow);\n\ttf = isl_union_map_zip(tf);\n\tindep = isl_union_map_wrap(isl_union_map_copy(ps->independence));\n\ttf = isl_union_map_intersect_domain(tf, indep);\n\ttf = isl_union_map_zip(tf);\n\tmw = isl_union_map_domain(isl_union_map_copy(ps->tagged_must_writes));\n\ttf = isl_union_map_subtract_domain(tf, mw);\n\tps->tagged_dep_flow = isl_union_map_subtract(ps->tagged_dep_flow, tf);\n}\n\n/* Compute the dependences of the program represented by \"scop\"\n * in case live range reordering is allowed.\n *\n * We compute the actual live ranges and the corresponding order\n * false dependences.\n *\n * The independences are removed from the flow dependences\n * (provided the source is not a must-write) as well as\n * from the external false dependences (by compute_forced_dependences).\n */\nstatic void compute_live_range_reordering_dependences(struct ppcg_scop 
*ps)\n{\n\tcompute_tagged_flow_dep_only(ps);\n\tremove_independences_from_tagged_flow(ps);\n\tderive_flow_dep_from_tagged_flow_dep(ps);\n\tcompute_order_dependences(ps);\n\tcompute_forced_dependences(ps);\n}\n\n/* Compute the potential flow dependences and the potential live in\n * accesses.\n *\n * Both must-writes and must-kills are allowed to kill dependences\n * from earlier writes to subsequent reads, as in compute_tagged_flow_dep_only.\n */\nstatic void compute_flow_dep(struct ppcg_scop *ps)\n{\n\tisl_union_access_info *access;\n\tisl_union_flow *flow;\n\tisl_union_map *kills, *must_writes;\n\n\taccess = isl_union_access_info_from_sink(isl_union_map_copy(ps->reads));\n\tkills = isl_union_map_copy(ps->must_kills);\n\tmust_writes = isl_union_map_copy(ps->must_writes);\n\tkills = isl_union_map_union(kills, must_writes);\n\taccess = isl_union_access_info_set_kill(access, kills);\n\taccess = isl_union_access_info_set_may_source(access,\n\t\t\t\tisl_union_map_copy(ps->may_writes));\n\taccess = isl_union_access_info_set_schedule(access,\n\t\t\t\tisl_schedule_copy(ps->schedule));\n\tflow = isl_union_access_info_compute_flow(access);\n\n\tps->dep_flow = isl_union_flow_get_may_dependence(flow);\n\tps->live_in = isl_union_flow_get_may_no_source(flow);\n\tisl_union_flow_free(flow);\n}\n\n/* Check whether the read access passed in \"user\" is external with respect to\n * the flow dependence \"map\", i.e., whether the read reference is neither\n * the source nor the sink of \"map\".\n */\nstatic isl_bool is_external_access(__isl_keep isl_map *map, void *user) \n{\n  isl_map *read_access = (isl_map *)(user);\n  /* The read access is in the format of\n   * {[S1[] -> pet_ref1] -> A[]}\n   */\n  isl_space *read_access_space = isl_map_get_space(read_access);\n  /* Factor the read access to\n   * {pet_ref[] -> A[]}\n   */\n  read_access_space = isl_space_domain_factor_range(read_access_space);\n  const char *read_access_name = isl_space_get_tuple_name(read_access_space, isl_dim_in);\n\n  /* The flow dependence is in the format of\n   * {[S1[] -> pet_ref1] -> [S1[] -> 
pet_ref2]}\n   * We factor it to\n   * {pet_ref1[] -> pet_ref2[]}\n   */\n  isl_map *dep = isl_map_factor_range(isl_map_copy(map));\n  isl_space *dep_space = isl_map_get_space(dep);\n  const char *dep_src_name = isl_space_get_tuple_name(dep_space, isl_dim_in);\n  const char *dep_sink_name = isl_space_get_tuple_name(dep_space, isl_dim_out);\n  isl_map_free(dep);\n\n  /* Check whether the read access name equals either the source or the sink\n   * access name in the flow dependence.\n   */\n  if (!strcmp(read_access_name, dep_src_name) || !strcmp(read_access_name, dep_sink_name)) {\n    isl_space_free(read_access_space);\n    isl_space_free(dep_space);\n    return isl_bool_false;\n  } else {\n    isl_space_free(read_access_space);\n    isl_space_free(dep_space);   \n    return isl_bool_true;\n  }\n}\n\n/* This function takes the tagged access relation in the format of\n * {[S1[] -> pet_ref..] -> A[i,j]}\n * and returns the access matrix, whose entry (i, j) is the coefficient of\n * the j-th loop iterator in the i-th array index expression.\n */\nstatic __isl_give isl_mat *get_acc_mat_from_tagged_acc(__isl_keep isl_map *map) \n{\n  isl_map *acc = isl_map_domain_factor_domain(isl_map_copy(map));\n  /* Parameters and constant offsets are dropped. */\n  isl_mat *acc_mat = isl_mat_alloc(isl_map_get_ctx(acc), isl_map_dim(acc, isl_dim_out), isl_map_dim(acc, isl_dim_in));\n  /* Fill in the matrix. 
*/\n  assert(isl_map_n_basic_map(acc) == 1);\n  isl_basic_map_list *bmap_list = isl_map_get_basic_map_list(acc);\n  isl_basic_map *bmap = isl_basic_map_list_get_basic_map(bmap_list, 0);\n\n  isl_mat *eq_mat = isl_basic_map_equalities_matrix(bmap, isl_dim_out, isl_dim_in, isl_dim_div, isl_dim_param, isl_dim_cst);\n  isl_mat *ieq_mat = isl_basic_map_inequalities_matrix(bmap, isl_dim_out, isl_dim_in, isl_dim_div, isl_dim_param, isl_dim_cst);\n\n  for (int row = 0; row < isl_mat_rows(eq_mat); row++) {\n    isl_val *sum = isl_val_zero(isl_basic_map_get_ctx(bmap));\n    int index = -1;\n    for (int col = 0; col < isl_basic_map_dim(bmap, isl_dim_out); col++) {\n      sum = isl_val_add(sum, isl_val_abs(isl_mat_get_element_val(eq_mat, row, col)));\n      isl_val *mat_val = isl_mat_get_element_val(eq_mat, row, col);\n      if (isl_val_is_one(mat_val)) {\n        index = col;\n      }\n      isl_val_free(mat_val);\n    }\n    /* Only use equalities that involve exactly one output dimension\n     * with a coefficient of +1; skip all other rows.\n     */\n    if (!isl_val_is_one(sum) || index < 0) {\n      isl_val_free(sum);\n      continue;\n    }\n    for (int col = 0; col < isl_basic_map_dim(bmap, isl_dim_in); col++) {\n      isl_mat_set_element_val(acc_mat, index, col, isl_val_neg(isl_mat_get_element_val(eq_mat, row, col + isl_basic_map_dim(bmap, isl_dim_out))));\n    }\n    isl_val_free(sum);\n  }\n\n  isl_mat_free(eq_mat);\n  isl_mat_free(ieq_mat);\n  isl_map_free(acc);\n\n  isl_basic_map_list_free(bmap_list);\n  isl_basic_map_free(bmap);\n\n  return acc_mat;\n}\n\n/* There may be multiple solutions (basis vectors) in the null space.\n * This function picks one of them based on the heuristics below:\n * a dependence distance with a simpler pattern is preferred.\n *\n * We first count the non-zero components in each dependence vector\n * and keep those with the fewest non-zero components. 
\n * Then, among those with the same number of non-zero components,\n * we select the one with the lowest score, computed as:\n * sum(abs(ele_of_dep) * 2^(loop_depth)).\n * Since the weight grows with the loop depth, non-zero components at the\n * upper (outer) loop levels are favored: they are more likely to be carried\n * by the space loops.\n *\n * For T2S only:\n * In the second phase of tiled T2S code generation,\n * the coefficients at the space loop dimensions must be non-negative.\n * For now, any dependence vector with a negative coefficient is assigned\n * a score of -1 and is excluded from selection.\n *\n * Temporary: We only allow one non-zero component in the reuse vector to simplify\n * the generation of hardware. We may relax it in the future.\n */\nstatic int rar_sol_smart_pick(\n  __isl_keep isl_mat *mat, struct ppcg_scop *ps, int *n_candidates, int *n_default, int user_choice)\n{\n  int score[isl_mat_cols(mat)];\n  int depth = isl_mat_rows(mat);\n  int pick_idx = -1;\n  int min_score = 0;  \n  int min_non_zero_cnt = -1;\n  int non_zero_cnts[isl_mat_cols(mat)];\n\n  for (int c = 0; c < isl_mat_cols(mat); c++) {\n    int non_zero_cnt = 0;\n    for (int r = 0; r < isl_mat_rows(mat); r++) {\n      isl_val *val = isl_mat_get_element_val(mat, r, c);\n      long val_int = isl_val_get_num_si(val);\n      isl_val_free(val);\n      if (val_int != 0)\n        non_zero_cnt++;\n    }\n    non_zero_cnts[c] = non_zero_cnt;\n    if (min_non_zero_cnt == -1) {\n      min_non_zero_cnt = non_zero_cnt;    \n    } else {\n      if (non_zero_cnt < min_non_zero_cnt)\n        min_non_zero_cnt = non_zero_cnt;\n    }\n  }\n\n  /* Temporary: We only allow one non-zero component in the reuse vector to simplify\n   * the generation of hardware. 
We may relax it in the future.\n   */\n  if (min_non_zero_cnt > 1) {\n\treturn pick_idx;\n  }\n  \n  for (int c = 0; c < isl_mat_cols(mat); c++) {\n    score[c] = 0; \n    for (int r = 0; r < isl_mat_rows(mat); r++) {\n      isl_val *val = isl_mat_get_element_val(mat, r, c);\n      long val_int = isl_val_get_num_si(val);\n      score[c] += abs(val_int) * pow(2, r);    \n      isl_val_free(val);\n      if (ps->options->autosa->t2s_tile && \n\t\t\t\t\t\tps->options->autosa->t2s_tile_phase == 1) {\n        if (val_int < 0) {\n          score[c] = -1;\n          break;\n        }\n      }\n    }\n    if (score[c] >= 0 && non_zero_cnts[c] == min_non_zero_cnt) {\n\t  if (user_choice == -1) {\n\t    printf(\"[AutoSA] Candidate %d: \", *n_candidates);\n\t    isl_printer *p_tmp = isl_printer_to_file(isl_mat_get_ctx(mat), stdout);\n\t    isl_vec *sol_tmp = isl_vec_alloc(isl_mat_get_ctx(mat), isl_mat_rows(mat));\n\t    for (int r = 0; r < isl_mat_rows(mat); r++) {\n\t  \t  sol_tmp = isl_vec_set_element_val(sol_tmp, r, isl_mat_get_element_val(mat, r, c));\n\t    }\n\t    p_tmp = isl_printer_print_vec(p_tmp, sol_tmp);\t  \n\t    isl_printer_free(p_tmp);\n\t    isl_vec_free(sol_tmp);\n\t    printf(\"\\n\");\n\t\tif (pick_idx == -1) {\n          pick_idx = c;\n          min_score = score[c];\n\t\t  *n_default = *n_candidates;\n        } else {\n          if (min_score > score[c]) {\n            pick_idx = c;\n            min_score = score[c];\n\t\t    *n_default = *n_candidates;\n          }\n        }\n\t  }\telse {\n\t    if (user_choice == *n_candidates) {\n\t\t  pick_idx = c;\n\t      break;\n\t\t}\n\t  }\n\t  (*n_candidates)++;\n    }\n  }\n\n  return pick_idx;\n}\n\n/* Construct a pseudo RAR dependence that is an identity map of the read access. 
*/\nstatic __isl_give isl_map *construct_pseudo_dep_rar(__isl_keep isl_map *map)\n{\n\tisl_set *set;\n\n//#ifdef _DEBUG\n//\tDBGMAP(stdout, map, isl_map_get_ctx(map));\n//#endif\n\tset = isl_map_domain(isl_map_copy(map));\n\tisl_map *dep_map;\n\tdep_map = isl_set_identity(set);\n//#ifdef _DEBUG\n//\tDBGMAP(stdout, dep_map, isl_map_get_ctx(dep_map));\n//#endif\n\n\treturn dep_map;\n}\n\n/* Construct the RAR dependence based on the dependence vector in \"sol\" and the \n * access relation \"map\".\n */\nstatic __isl_give isl_map *construct_dep_rar(__isl_keep isl_vec *sol, \n\t__isl_keep isl_map *map) \n{\n  /* Build the space. */\n  isl_space *space = isl_map_get_space(map);\n  space = isl_space_domain(space);\n  isl_space *space_d = isl_space_factor_domain(isl_space_copy(space));\n  isl_space *space_r = isl_space_factor_range(isl_space_copy(space));\n\n  isl_space *space_d_d = isl_space_map_from_domain_and_range(space_d, isl_space_copy(space_d));\n  isl_space *space_r_r = isl_space_map_from_domain_and_range(space_r, isl_space_copy(space_r));\n\n  isl_space_free(space);\n  space = isl_space_product(space_d_d, space_r_r);\n  isl_map *dep_map = isl_map_universe(isl_space_copy(space));\n\n  /* Add the dep vector constraint. */\n  isl_local_space *ls = isl_local_space_from_space(space);\n  for (int i = 0; i < isl_vec_size(sol); i++) {\n    isl_constraint *cst = isl_constraint_alloc_equality(isl_local_space_copy(ls));\n    isl_constraint_set_coefficient_si(cst, isl_dim_in, i, 1);\n    isl_constraint_set_coefficient_si(cst, isl_dim_out, i, -1);\n    isl_constraint_set_constant_val(cst, isl_vec_get_element_val(sol, i));\n    dep_map = isl_map_add_constraint(dep_map, cst);\n  }\n\n  /* Add the iteration domain constraints. 
*/  \n  isl_set *domain = isl_map_domain(isl_map_copy(map));\n  isl_map *new_map = isl_map_from_domain_and_range(domain, isl_set_copy(domain));\n  dep_map = isl_map_intersect(dep_map, new_map);\n\n  isl_local_space_free(ls);\n\n  return dep_map;\n}\n\nstruct autosa_extract_size_data\n{\n  const char *type;\n  isl_set *res;\n};\n\n/* This function is called for each set in a union_set.\n * If the name of the set matches data->type, we store the\n * set in data->res.\n */\nstatic isl_stat extract_size_of_type(__isl_take isl_set *size, void *user)\n{\n  struct autosa_extract_size_data *data = (struct autosa_extract_size_data *)user;\n  const char *name;\n\n  name = isl_set_get_tuple_name(size);\n  if (name && !strcmp(name, data->type))\n  {\n    data->res = size;\n    return isl_stat_error;\n  }\n\n  isl_set_free(size);\n  return isl_stat_ok;\n}\n\nstatic __isl_give isl_set *extract_sa_sizes(__isl_keep isl_union_map *sizes,\n                                     const char *type)\n{\n  isl_space *space;\n  isl_set *dom;\n  isl_union_set *local_sizes;\n  struct autosa_extract_size_data data = {type, NULL};\n\n  if (!sizes)\n    return NULL;\n\n  space = isl_union_map_get_space(sizes);\n  space = isl_space_set_from_params(space);  \n  space = isl_space_set_tuple_name(space, isl_dim_set, \"kernel\");\n  dom = isl_set_universe(space);  \n\n  local_sizes = isl_union_set_apply(isl_union_set_from_set(dom),\n                                    isl_union_map_copy(sizes));\n  isl_union_set_foreach_set(local_sizes, &extract_size_of_type, &data);\n  isl_union_set_free(local_sizes);\n  return data.res;\n}\n\nstatic __isl_give isl_union_map *extract_sizes_from_str(isl_ctx *ctx, const char *str)\n{\n  if (!str)\n    return NULL;\n  return isl_union_map_read_from_str(ctx, str);\n}\n\nstatic int read_select_rar_dep_choices(struct ppcg_scop *ps, __isl_keep isl_map *map)\n{\n  /* Extract the reference name */\n  isl_set *domain = isl_map_domain(isl_map_copy(map));\n  isl_map *domain_map 
= isl_set_unwrap(domain);\n  isl_space *space = isl_map_get_space(domain_map);\n  isl_map_free(domain_map);  \n  const char *ref_name = isl_space_get_tuple_name(space, isl_dim_out);\n  isl_space_free(space);  \n  isl_union_map *sizes = extract_sizes_from_str(isl_map_get_ctx(map), ps->options->autosa->select_rar_dep);\n  isl_set *size = extract_sa_sizes(sizes, ref_name);\n  isl_union_map_free(sizes);\n  int ret = -1;\n  if (size) {\n    isl_val *v = isl_set_plain_get_val_if_fixed(size, isl_dim_set, 0);\n    ret = isl_val_get_num_si(v);\n\tisl_val_free(v);\t\n  }\n  isl_set_free(size);\n\n  return ret;\t\n}\n\n/* Build the RAR dependence for the given access \"map\".\n * First, we check whether the access is an external access (not associated with\n * any flow dependence). Next, we compute the null space of the access matrix.\n * At present, we take one of the solutions in the null space as the\n * RAR dependence for the given array access.\n */\nstatic isl_stat build_rar_dep(__isl_take isl_map *map, void *user) {\n  struct ppcg_scop *ps = (struct ppcg_scop *)(user);\n  isl_map *tagged_dep_rar;\n  /* Check whether the read access is an external access. */\n  isl_union_map *tagged_dep_flow = ps->tagged_dep_flow;\n  isl_bool is_external = isl_union_map_every_map(tagged_dep_flow, &is_external_access, map);\n  if (!is_external) {\n    isl_map_free(map);\n    return isl_stat_ok;\n  }\n\n  /* Take the access function and compute the null space */\n  isl_mat *acc_mat = get_acc_mat_from_tagged_acc(map); \n  isl_mat *acc_null_mat = isl_mat_right_kernel(acc_mat);\n  int nsol = isl_mat_cols(acc_null_mat);  \n  if (nsol > 0) {\n  \t/* Build the RAR dependence.\n   \t * TODO: Temporary solution. 
We will construct the RAR dep\n     * using one independent solution based on heuristics.\n     */\n\tint n_candidates = 0;\n\t{\n\t  printf(\"[AutoSA] Extract RAR dep for the array access: \");\n\t  isl_space *space = isl_map_get_space(map);\n\t  isl_map *map_tmp = isl_map_universe(space);\n\t  isl_printer *p_tmp = isl_printer_to_file(isl_map_get_ctx(map_tmp), stdout);\n\t  p_tmp = isl_printer_print_map(p_tmp, map_tmp);\n\t  isl_printer_free(p_tmp);\n\t  isl_map_free(map_tmp);\n\t  printf(\"\\n\");\t\t\t\t\t\t\n\t}\n\tint default_candidate = -1;\n    int col = rar_sol_smart_pick(acc_null_mat, ps, &n_candidates, &default_candidate, -1);\n\tif (col >= 0) {\n\t  /* Check whether the user has selected a candidate. */\n\t  int user_choice = read_select_rar_dep_choices(ps, map);\n      if (n_candidates > 1) {\n\t\tprintf(\"[AutoSA] Found more than one legal RAR deps. \");\n\t\tif (user_choice == -1)\n\t\t  printf(\"Candidate %d is used by default.\\n\", default_candidate);\n\t\telse {\n\t\t  printf(\"Candidate %d is used.\\n\", user_choice);\n\t\t  n_candidates = 0;\n\t\t  col = rar_sol_smart_pick(acc_null_mat, ps, &n_candidates, &default_candidate, user_choice);\n\t\t}\n\t  }\n\n      isl_vec *sol = isl_vec_alloc(isl_map_get_ctx(map), isl_mat_rows(acc_null_mat));\n      for (int row = 0; row < isl_mat_rows(acc_null_mat); row++) {\n        sol = isl_vec_set_element_val(sol, row, isl_mat_get_element_val(acc_null_mat, row, col));\n      }\n\t  //DBGVEC(stdout, sol, isl_vec_get_ctx(sol));\n      tagged_dep_rar = construct_dep_rar(sol, map);\n//\t  DBGMAP(stdout, tagged_dep_rar, isl_map_get_ctx(tagged_dep_rar));\n      isl_vec_free(sol);      \n\n\t  /* Check whether the dependence is empty. If so, we build an identity map\n\t   * that serves as a pseudo-dependence. 
\n\t   */\n\t  if (isl_map_is_empty(tagged_dep_rar)) {\n\t\tisl_map_free(tagged_dep_rar);\n\t\tcol = -1;\n\t  } \n\t}\n\n\tif (col < 0) {\n\t  tagged_dep_rar = construct_pseudo_dep_rar(map);\n\t}\n\n    ps->tagged_dep_rar = isl_union_map_union(ps->tagged_dep_rar, isl_union_map_from_map(tagged_dep_rar));\n  } else {\t\n\t/* Since there is no data reuse opportunity, we will build an identity map here. */\n\ttagged_dep_rar = construct_pseudo_dep_rar(map);\n\tps->tagged_dep_rar = isl_union_map_union(ps->tagged_dep_rar, isl_union_map_from_map(tagged_dep_rar));\n  }\n\n  isl_mat_free(acc_null_mat);\n  isl_map_free(map);\n  return isl_stat_ok;\n}\n\n/* Compute ps->dep_rar from ps->tagged_dep_rar\n * by projecting out the reference tags.\n */\nstatic void derive_rar_dep_from_tagged_rar_dep(struct ppcg_scop *ps)\n{\n  ps->dep_rar = isl_union_map_copy(ps->tagged_dep_rar);\n  ps->dep_rar = isl_union_map_factor_domain(ps->dep_rar);\n}\n\n/* Compute the tagged RAR dependences and store the results in\n * ps->tagged_dep_rar.\n */\nstatic void compute_tagged_rar_dep_only(struct ppcg_scop *ps)\n{\n  /* For each read access, if the read is an external read access,\n   * compute the null space of the access function, and\n   * construct the RAR deps based on an independent solution in the null space.\n   */\n  isl_union_map *tagged_reads = ps->tagged_reads;\n  isl_union_map_foreach_map(tagged_reads, &build_rar_dep, ps);\n}\n\n/* Compute the RAR dependence for each external read access.\n * The results are stored in ps->dep_rar.\n * A copy of the RAR dependences, tagged with the reference tags,\n * is stored in ps->tagged_dep_rar.\n *\n * We first compute ps->tagged_dep_rar, i.e., the tagged RAR dependences,\n * and then project out the tags.\n */\nstatic void compute_tagged_rar_dep(struct ppcg_scop *ps)\n{\n  isl_space *space = isl_union_map_get_space(ps->tagged_dep_flow);\n  ps->tagged_dep_rar = 
isl_union_map_empty(\n\t\t\tisl_space_set_alloc(isl_union_map_get_ctx(ps->tagged_dep_flow),\n        isl_space_dim(space, isl_dim_param), 0));\n  isl_space_free(space);\n  compute_tagged_rar_dep_only(ps);\n  derive_rar_dep_from_tagged_rar_dep(ps);\n}\n\n/* Compute the tagged WAW dependences, i.e., the potential dependences\n * between pairs of writes to the same memory element, and store them in\n * ps->tagged_dep_waw.\n * Both must-writes and must-kills are allowed to kill dependences\n * from earlier writes to subsequent writes.\n */\nstatic void compute_tagged_waw_dep_only(struct ppcg_scop *ps)\n{\n  isl_union_pw_multi_aff *tagger;\n  isl_schedule *schedule;\n  isl_union_map *kills;\n  isl_union_map *must_source;\n  isl_union_access_info *access;\n  isl_union_flow *flow;\n  isl_union_map *tagged_flow;\n\n  tagger = isl_union_pw_multi_aff_copy(ps->tagger);\n  schedule = isl_schedule_copy(ps->schedule);\n  schedule = isl_schedule_pullback_union_pw_multi_aff(schedule, tagger);\n  kills = isl_union_map_copy(ps->tagged_must_kills);\n  must_source = isl_union_map_copy(ps->tagged_must_writes);\n  kills = isl_union_map_union(kills, must_source);\n  access = isl_union_access_info_from_sink(\n      isl_union_map_copy(ps->tagged_may_writes));\n  access = isl_union_access_info_set_kill(access, kills);\n  access = isl_union_access_info_set_may_source(access, \n      isl_union_map_copy(ps->tagged_may_writes));\n  access = isl_union_access_info_set_schedule(access, schedule);\n  flow = isl_union_access_info_compute_flow(access);\n  tagged_flow = isl_union_flow_get_may_dependence(flow);\n  ps->tagged_dep_waw = tagged_flow;\n  isl_union_flow_free(flow);\n}\n\n/* Compute ps->dep_waw from ps->tagged_dep_waw\n * by projecting out the reference tags.\n */\nstatic void derive_waw_dep_from_tagged_waw_dep(struct ppcg_scop *ps)\n{\n  ps->dep_waw = isl_union_map_copy(ps->tagged_dep_waw);\n  ps->dep_waw = isl_union_map_factor_domain(ps->dep_waw);\n}\n\n/* Compute the WAW dependence for each intermediate write access.\n * The results are stored in ps->dep_waw.\n * A copy of the WAW dependences, tagged with the reference tags,\n * is stored in ps->tagged_dep_waw.\n *\n * We first compute ps->tagged_dep_waw, i.e., the tagged WAW dependences,\n * and then project out the tags. 
\n */\nstatic void compute_tagged_waw_dep(struct ppcg_scop *ps)\n{\n  compute_tagged_waw_dep_only(ps); \n  derive_waw_dep_from_tagged_waw_dep(ps);\n}\n\n/* Compute the dependences of the program represented by \"scop\".\n * Store the computed potential flow dependences\n * in scop->dep_flow and the reads with potentially no corresponding writes in\n * scop->live_in.\n * Store the potential live out accesses in scop->live_out.\n * Store the potential false (anti and output) dependences in scop->dep_false.\n *\n * If live range reordering is allowed, then we compute a separate\n * set of order dependences and a set of external false dependences\n * in compute_live_range_reordering_dependences.\n * \n * Extended by AutoSA: Add analysis for WAW and RAR dependences.\n */\nstatic void compute_dependences(struct ppcg_scop *scop)\n{\n\tisl_union_map *may_source;\n\tisl_union_access_info *access;\n\tisl_union_flow *flow;\n\n\tif (!scop)\n\t\treturn;\n\n\tcompute_live_out(scop);\n\n\tif (scop->options->live_range_reordering)\n\t\tcompute_live_range_reordering_dependences(scop);\n\telse if (scop->options->target != PPCG_TARGET_C)\n\t\tcompute_tagged_flow_dep(scop);\n\telse\n\t\tcompute_flow_dep(scop);\n\t\n\tmay_source = isl_union_map_union(isl_union_map_copy(scop->may_writes),\n\t\t\t\t\tisl_union_map_copy(scop->reads));\n\taccess = isl_union_access_info_from_sink(\n\t\t\t\tisl_union_map_copy(scop->may_writes));\n\t//access = isl_union_access_info_set_kill(access,\n\t//\t\t\tisl_union_map_copy(scop->must_writes));\n\taccess = isl_union_access_info_set_kill(access,\n\t\t\t\t\tisl_union_map_union(isl_union_map_copy(scop->must_writes), \n\t\t\t\t\t                    isl_union_map_copy(scop->must_kills)));\n\taccess = isl_union_access_info_set_may_source(access, may_source);\n\taccess = isl_union_access_info_set_schedule(access,\n\t\t\t\tisl_schedule_copy(scop->schedule));\n\tflow = isl_union_access_info_compute_flow(access);\n\n\tscop->dep_false = 
isl_union_flow_get_may_dependence(flow);\n\tscop->dep_false = isl_union_map_coalesce(scop->dep_false);\n\tisl_union_flow_free(flow);\n\n\t/* AutoSA Extended */\n\tif (scop->options->autosa->autosa) {\n\t\tcompute_tagged_rar_dep(scop);\n\t\tcompute_tagged_waw_dep(scop);\t\t\t\n\t}\n\t/* AutoSA Extended */\n}\n\n/* Eliminate dead code from ps->domain.\n *\n * In particular, intersect both ps->domain and the domain of\n * ps->schedule with the (parts of) iteration\n * domains that are needed to produce the output or for statement\n * iterations that call functions.\n * Also intersect the range of the dataflow dependences with\n * this domain such that the removed instances will no longer\n * be considered as targets of dataflow.\n *\n * We start with the iteration domains that call functions\n * and the set of iterations that last write to an array\n * (except those that are later killed).\n *\n * Then we add those statement iterations that produce\n * something needed by the \"live\" statements iterations.\n * We keep doing this until no more statement iterations can be added.\n * To ensure that the procedure terminates, we compute the affine\n * hull of the live iterations (bounded to the original iteration\n * domains) each time we have added extra iterations.\n */\nstatic void eliminate_dead_code(struct ppcg_scop *ps)\n{\n\tisl_union_set *live;\n\tisl_union_map *dep;\n\tisl_union_pw_multi_aff *tagger;\n\n\tlive = isl_union_map_domain(isl_union_map_copy(ps->live_out));\n\tif (!isl_union_set_is_empty(ps->call)) {\n\t\tlive = isl_union_set_union(live, isl_union_set_copy(ps->call));\n\t\tlive = isl_union_set_coalesce(live);\n\t}\n\n\tdep = isl_union_map_copy(ps->dep_flow);\n\tdep = isl_union_map_reverse(dep);\n\n\tfor (;;) {\n\t\tisl_union_set *extra;\n\n\t\textra = isl_union_set_apply(isl_union_set_copy(live),\n\t\t\t\t\t    isl_union_map_copy(dep));\n\t\tif (isl_union_set_is_subset(extra, live)) {\n\t\t\tisl_union_set_free(extra);\n\t\t\tbreak;\n\t\t}\n\n\t\tlive = 
isl_union_set_union(live, extra);\n\t\tlive = isl_union_set_affine_hull(live);\n\t\tlive = isl_union_set_intersect(live,\n\t\t\t\t\t    isl_union_set_copy(ps->domain));\n\t}\n\n\tisl_union_map_free(dep);\n\n\tps->domain = isl_union_set_intersect(ps->domain,\n\t\t\t\t\t\tisl_union_set_copy(live));\n\tps->schedule = isl_schedule_intersect_domain(ps->schedule,\n\t\t\t\t\t\tisl_union_set_copy(live));\n\tps->dep_flow = isl_union_map_intersect_range(ps->dep_flow,\n\t\t\t\t\t\tisl_union_set_copy(live));\n\ttagger = isl_union_pw_multi_aff_copy(ps->tagger);\n\tlive = isl_union_set_preimage_union_pw_multi_aff(live, tagger);\n\tps->tagged_dep_flow = isl_union_map_intersect_range(ps->tagged_dep_flow,\n\t\t\t\t\t\tlive);\n}\n\n/* Intersect \"set\" with the set described by \"str\", taking the NULL\n * string to represent the universal set.\n */\nstatic __isl_give isl_set *set_intersect_str(__isl_take isl_set *set,\n\tconst char *str)\n{\n\tisl_ctx *ctx;\n\tisl_set *set2;\n\n\tif (!str)\n\t\treturn set;\n\n\tctx = isl_set_get_ctx(set);\n\tset2 = isl_set_read_from_str(ctx, str);\n\tset = isl_set_intersect(set, set2);\n\n\treturn set;\n}\n\nstatic void *ppcg_scop_free(struct ppcg_scop *ps)\n{\n\tif (!ps)\n\t\treturn 
NULL;\n\n\tisl_set_free(ps->context);\n\tisl_union_set_free(ps->domain);\n\tisl_union_set_free(ps->call);\n\tisl_union_map_free(ps->tagged_reads);\n\tisl_union_map_free(ps->reads);\n\tisl_union_map_free(ps->live_in);\n\tisl_union_map_free(ps->tagged_may_writes);\n\tisl_union_map_free(ps->tagged_must_writes);\n\tisl_union_map_free(ps->may_writes);\n\tisl_union_map_free(ps->must_writes);\n\tisl_union_map_free(ps->live_out);\n\tisl_union_map_free(ps->tagged_must_kills);\n\tisl_union_map_free(ps->must_kills);\n\tisl_union_map_free(ps->tagged_dep_flow);\n\tisl_union_map_free(ps->dep_flow);\n\tisl_union_map_free(ps->dep_false);\n\tisl_union_map_free(ps->dep_forced);\n\tisl_union_map_free(ps->tagged_dep_order);\n\tisl_union_map_free(ps->dep_order);\n\tisl_schedule_free(ps->schedule);\n\tisl_union_pw_multi_aff_free(ps->tagger);\n\tisl_union_map_free(ps->independence);\n\tisl_id_to_ast_expr_free(ps->names);\n\t/* AutoSA Extended */\n\tisl_union_map_free(ps->tagged_dep_rar);\n\tisl_union_map_free(ps->dep_rar);\n\tisl_union_map_free(ps->tagged_dep_waw);\n\tisl_union_map_free(ps->dep_waw);\n\t/* AutoSA Extended */\n\n\tfree(ps);\n\n\treturn NULL;\n}\n\n/* Extract a ppcg_scop from a pet_scop.\n *\n * The constructed ppcg_scop refers to elements from the pet_scop\n * so the pet_scop should not be freed before the ppcg_scop.\n */\nstatic struct ppcg_scop *ppcg_scop_from_pet_scop(struct pet_scop *scop,\n\tstruct ppcg_options *options)\n{\n\tint i;\n\tisl_ctx *ctx;\n\tstruct ppcg_scop *ps;\n\n\tif (!scop)\n\t\treturn NULL;\n\n\tctx = isl_set_get_ctx(scop->context);\n\n\tps = isl_calloc_type(ctx, struct ppcg_scop);\n\tif (!ps)\n\t\treturn NULL;\n\n\tps->names = collect_names(scop);\n\tps->options = options;\n\tps->start = pet_loc_get_start(scop->loc);\n\tps->end = pet_loc_get_end(scop->loc);\n\tps->context = isl_set_copy(scop->context);\n\tps->context = set_intersect_str(ps->context, options->ctx);\n\tif (options->non_negative_parameters) {\n\t\tisl_space *space = 
isl_set_get_space(ps->context);\n\t\tisl_set *nn = isl_set_nat_universe(space);\n\t\tps->context = isl_set_intersect(ps->context, nn);\n\t}\n\tps->domain = collect_non_kill_domains(scop);\n\tps->call = collect_call_domains(scop);\n\tps->tagged_reads = pet_scop_get_tagged_may_reads(scop);\n\tps->reads = pet_scop_get_may_reads(scop);\n\tps->tagged_may_writes = pet_scop_get_tagged_may_writes(scop);\n\tps->may_writes = pet_scop_get_may_writes(scop);\n\tps->tagged_must_writes = pet_scop_get_tagged_must_writes(scop);\n\tps->must_writes = pet_scop_get_must_writes(scop);\n\tps->tagged_must_kills = pet_scop_get_tagged_must_kills(scop);\n\tps->must_kills = pet_scop_get_must_kills(scop);\n\tps->schedule = isl_schedule_copy(scop->schedule);\n\tps->pet = scop;\n\tps->independence = isl_union_map_empty(isl_set_get_space(ps->context));\n\tfor (i = 0; i < scop->n_independence; ++i)\n\t\tps->independence = isl_union_map_union(ps->independence,\n\t\t\tisl_union_map_copy(scop->independences[i]->filter));\n\n\tcompute_tagger(ps);\n\tcompute_dependences(ps);\n\teliminate_dead_code(ps);\n\n\tif (!ps->context || !ps->domain || !ps->call || !ps->reads ||\n\t    !ps->may_writes || !ps->must_writes || !ps->tagged_must_kills ||\n\t    !ps->must_kills || !ps->schedule || !ps->independence || !ps->names)\n\t\treturn ppcg_scop_free(ps);\n\n\treturn ps;\n}\n\n/* Internal data structure for ppcg_transform.\n */\nstruct ppcg_transform_data {\n\tstruct ppcg_options *options;\n\t__isl_give isl_printer *(*transform)(__isl_take isl_printer *p,\n\t\tstruct ppcg_scop *scop, void *user);\n\tvoid *user;\n};\n\n/* Should we print the original code?\n * That is, does \"scop\" involve any data dependent conditions or\n * nested expressions that cannot be handled by pet_stmt_build_ast_exprs?\n */\nstatic int print_original(struct pet_scop *scop, struct ppcg_options *options)\n{\n\tif (!pet_scop_can_build_ast_exprs(scop)) {\n\t\tif (options->debug->verbose)\n\t\t\tfprintf(stdout, \"Printing original code 
because \"\n\t\t\t\t\"some index expressions cannot currently \"\n\t\t\t\t\"be printed\\n\");\n\t\treturn 1;\n\t}\n\n\tif (pet_scop_has_data_dependent_conditions(scop)) {\n\t\tif (options->debug->verbose)\n\t\t\tfprintf(stdout, \"Printing original code because \"\n\t\t\t\t\"input involves data dependent conditions\\n\");\n\t\treturn 1;\n\t}\n\n\treturn 0;\n}\n\n/* Callback for pet_transform_C_source that transforms\n * the given pet_scop to a ppcg_scop before calling the\n * ppcg_transform callback.\n *\n * If \"scop\" contains any data dependent conditions or if we may\n * not be able to print the transformed program, then just print\n * the original code.\n */\nstatic __isl_give isl_printer *transform(__isl_take isl_printer *p,\n\tstruct pet_scop *scop, void *user)\n{\n\tstruct ppcg_transform_data *data = user;\n\tstruct ppcg_scop *ps;\n\n\tif (print_original(scop, data->options)) {\n\t\tp = pet_scop_print_original(scop, p);\n\t\tpet_scop_free(scop);\n\t\treturn p;\n\t}\n\n\tscop = pet_scop_align_params(scop);\n\tps = ppcg_scop_from_pet_scop(scop, data->options);\n\n\tp = data->transform(p, ps, data->user);\n\n\tppcg_scop_free(ps);\n\tpet_scop_free(scop);\n\n\treturn p;\n}\n\n/* Transform the C source file \"input\" by rewriting each scop\n * through a call to \"transform\".\n * The transformed C code is written to \"out\".\n *\n * This is a wrapper around pet_transform_C_source that transforms\n * the pet_scop to a ppcg_scop before calling \"fn\".\n */\nint ppcg_transform(isl_ctx *ctx, const char *input, FILE *out,\n\tstruct ppcg_options *options,\n\t__isl_give isl_printer *(*fn)(__isl_take isl_printer *p,\n\t\tstruct ppcg_scop *scop, void *user), void *user)\n{\n\tstruct ppcg_transform_data data = { options, fn, user };\n\treturn pet_transform_C_source(ctx, input, out, &transform, &data);\n}\n\n/* Check consistency of options.\n *\n * Return -1 on error.\n */\nstatic int check_options(isl_ctx *ctx)\n{\n\tstruct options *options;//\n\toptions = 
isl_ctx_peek_options(ctx, &options_args);\n\tif (!options)\n\t\tisl_die(ctx, isl_error_internal,\n\t\t\t\"unable to find options\", return -1);//\n\tif (options->ppcg->openmp &&\n\t    !isl_options_get_ast_build_atomic_upper_bound(ctx))\n\t\tisl_die(ctx, isl_error_invalid,\n\t\t\t\"OpenMP requires atomic bounds\", return -1);//\n\treturn 0;\n}\n\n//int main(int argc, char **argv)\n//{\n//\tint r;\n//\tisl_ctx *ctx;\n//\tstruct options *options;\n//\n//\toptions = options_new_with_defaults();\n//\tassert(options);\n//\n//\tctx = isl_ctx_alloc_with_options(&options_args, options);\n//\tppcg_options_set_target_defaults(options->ppcg);\n//\tisl_options_set_ast_build_detect_min_max(ctx, 1);\n//\tisl_options_set_ast_print_macro_once(ctx, 1);\n//\tisl_options_set_schedule_whole_component(ctx, 0);\n//\tisl_options_set_schedule_maximize_band_depth(ctx, 1);\n//\tisl_options_set_schedule_maximize_coincidence(ctx, 1);\n//\tpet_options_set_encapsulate_dynamic_control(ctx, 1);\n//\targc = options_parse(options, argc, argv, ISL_ARG_ALL);\n//\n//\tif (check_options(ctx) < 0)\n//\t\tr = EXIT_FAILURE;\n//\telse if (options->ppcg->target == PPCG_TARGET_CUDA)\n//\t\tr = generate_cuda(ctx, options->ppcg, options->input);\n//\telse if (options->ppcg->target == PPCG_TARGET_OPENCL)\n//\t\tr = generate_opencl(ctx, options->ppcg, options->input,\n//\t\t\t\toptions->output);\n//\telse\n//\t\tr = generate_cpu(ctx, options->ppcg, options->input,\n//\t\t\t\toptions->output);\n//\n//\tisl_ctx_free(ctx);\n//\n//\treturn r;\n//}\n\nint autosa_main_wrap(int argc, char **argv)\n{\n\tint r;\n\tisl_ctx *ctx;\n\tstruct options *options;\n\n\toptions = options_new_with_defaults();\n\tassert(options);\n\n\tctx = isl_ctx_alloc_with_options(&options_args, options);\n\tppcg_options_set_target_defaults(options->ppcg);\n\tisl_options_set_ast_build_detect_min_max(ctx, 1);\n\tisl_options_set_ast_print_macro_once(ctx, 1);\n\tisl_options_set_schedule_whole_component(ctx, 
0);\n\tisl_options_set_schedule_maximize_band_depth(ctx, 1);\n\tisl_options_set_schedule_maximize_coincidence(ctx, 1);\n\tpet_options_set_encapsulate_dynamic_control(ctx, 1);\n\targc = options_parse(options, argc, argv, ISL_ARG_ALL);\n\n\tif (check_options(ctx) < 0)\n\t\tr = EXIT_FAILURE;\n\t//else if (options->ppcg->target == PPCG_TARGET_CUDA)\n\t//\tr = generate_cuda(ctx, options->ppcg, options->input);\n\t//else if (options->ppcg->target == PPCG_TARGET_OPENCL)\n\t//\tr = generate_opencl(ctx, options->ppcg, options->input,\n\t//\t\t\toptions->output);\n\t//else if (options->ppcg->target == PPCG_TARGET_C)\n\t//\tr = generate_cpu(ctx, options->ppcg, options->input,\n\t//\t\t\toptions->output);\n\telse if (options->ppcg->target == AUTOSA_TARGET_XILINX_HLS_C)\n\t\tr = generate_autosa_xilinx_hls_c(ctx, options->ppcg, options->input);\n\telse if (options->ppcg->target == AUTOSA_TARGET_INTEL_OPENCL)\n\t\tr = generate_autosa_intel_opencl(ctx, options->ppcg, options->input);\n\telse if (options->ppcg->target == AUTOSA_TARGET_CATAPULT_HLS_C)\n\t\tr = generate_autosa_catapult_hls_c(ctx, options->ppcg, options->input);\n\telse if (options->ppcg->target == AUTOSA_TARGET_TAPA_CPP)\n\t\tr = generate_autosa_tapa_cpp(ctx, options->ppcg, options->input);\n//\telse if (options->ppcg->target == AUTOSA_TARGET_T2S)\n//\t  r = generate_autosa_t2s(ctx, options->ppcg, options->input, \n//\t\t\t\toptions->output); // TODO: To fix\n//\telse if (options->ppcg->target == AUTOSA_TARGET_C)\n//\t  r = generate_autosa_cpu(ctx, options->ppcg, options->input); // TODO: to fix\n\telse /* unsupported target: avoid returning an uninitialized r */\n\t\tr = EXIT_FAILURE;\n\n\tisl_ctx_free(ctx);\n\n\treturn r;\n}\n"
  },
  {
    "path": "src/ppcg.h",
    "content": "#ifndef PPCG_H\n#define PPCG_H\n\n#include <isl/schedule.h>\n#include <isl/set.h>\n#include <isl/id.h>\n#include <isl/union_set.h>\n#include <isl/union_map.h>\n#include <isl/id_to_ast_expr.h>\n#include <pet.h>\n\n#include \"ppcg_options.h\"\n\n#define _DEBUG\n\n#define DBGVAR(os, var)                                  \\\n  (os) << \"DBG: \" << __FILE__ << \"(\" << __LINE__ << \") \" \\\n       << #var << \" = [\" << (var) << \"]\" << std::endl;\n\n#define DBGSCHDNODE(os, node, ctx)                                    {\\\n  printf(\"%s(%d) Print schedule_node.\\n\", __FILE__, __LINE__);         \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_set_yaml_style(p_debug, ISL_YAML_STYLE_BLOCK); \\\n  p_debug = isl_printer_print_schedule_node(p_debug, node);            \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGSCHD(os, node, ctx)                                        {\\\n  printf(\"%s(%d) Print schedule.\\n\", __FILE__, __LINE__);              \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_set_yaml_style(p_debug, ISL_YAML_STYLE_BLOCK); \\\n  p_debug = isl_printer_print_schedule(p_debug, node);                 \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n} \n\n#define DBGSET(os, set, ctx)                                          {\\\n  printf(\"%s(%d) Print set.\\n\", __FILE__, __LINE__);                   \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_set(p_debug, set);                       \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGSPACE(os, space, ctx)                                      {\\\n  printf(\"%s(%d) Print space.\\n\", __FILE__, __LINE__);                 \\\n 
 isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_space(p_debug, space);                   \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGUSET(os, uset, ctx)                                        {\\\n  printf(\"%s(%d) Print union_set.\\n\", __FILE__, __LINE__);             \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_union_set(p_debug, uset);                \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGUMAP(os, umap, ctx)                                        {\\\n  printf(\"%s(%d) Print union_map.\\n\", __FILE__, __LINE__);             \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_union_map(p_debug, umap);                \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGMAP(os, map, ctx)                                          {\\\n  printf(\"%s(%d) Print map.\\n\", __FILE__, __LINE__);                   \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_map(p_debug, map);                       \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGBMAP(os, bmap, ctx)                                        {\\\n  printf(\"%s(%d) Print basic_map.\\n\", __FILE__, __LINE__);             \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_basic_map(p_debug, bmap);                \\\n  
p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGMA(os, ma, ctx)                                            {\\\n  printf(\"%s(%d) Print multi_aff.\\n\", __FILE__, __LINE__);             \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_multi_aff(p_debug, ma);                  \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGVEC(os, vec, ctx)                                          {\\\n  printf(\"%s(%d) Print vec.\\n\", __FILE__, __LINE__);                   \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_vec(p_debug, vec);                       \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGASTEXPR(os, astexpr, ctx)                                  {\\\n  printf(\"%s(%d) Print AST expr.\\n\", __FILE__, __LINE__);              \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_set_output_format(p_debug, ISL_FORMAT_C);      \\\n  p_debug = isl_printer_print_ast_expr(p_debug, astexpr);              \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGASTNODE(os, astnode, ctx)                                  {\\\n  printf(\"%s(%d) Print AST node.\\n\", __FILE__, __LINE__);              \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_set_output_format(p_debug, ISL_FORMAT_C);      \\\n  p_debug = isl_printer_print_ast_node(p_debug, astnode);              \\\n  
p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGMUPA(os, mupa, ctx)                                        {\\\n  printf(\"%s(%d) Print multi_union_pw_aff.\\n\", __FILE__, __LINE__);    \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_multi_union_pw_aff(p_debug, mupa);       \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGUPA(os, upa, ctx)                                          {\\\n  printf(\"%s(%d) Print union_pw_aff.\\n\", __FILE__, __LINE__);          \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_union_pw_aff(p_debug, upa);              \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGVAL(os, val, ctx)                                          {\\\n  printf(\"%s(%d) Print val.\\n\", __FILE__, __LINE__);                   \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_val(p_debug, val);                       \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#define DBGID(os, id, ctx)                                            {\\\n  printf(\"%s(%d) Print id.\\n\", __FILE__, __LINE__);                    \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_id(p_debug, id);                         \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 
\\\n}\n\n#define DBGPWQPOLY(os, pwqpoly, ctx)                                  {\\\n  printf(\"%s(%d) Print pw_qpolynomial.\\n\", __FILE__, __LINE__);        \\\n  isl_printer *p_debug = isl_printer_to_file(ctx, os);                 \\\n  p_debug = isl_printer_print_pw_qpolynomial(p_debug, pwqpoly);        \\\n  p_debug = isl_printer_print_str(p_debug, \"\\n\");                      \\\n  p_debug = isl_printer_free(p_debug);                                 \\\n}\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\n\tconst char *ppcg_base_name(const char *filename);\n\tint ppcg_extract_base_name(char *name, const char *input);\n\n\t/* Representation of the scop for use inside PPCG.\n *\n * \"options\" are the options specified by the user.\n * Some fields in this structure may depend on some of the options.\n *\n * \"start\" and \"end\" are file offsets of the corresponding program text.\n * \"context\" represents constraints on the parameters.\n * \"domain\" is the union of all iteration domains.\n * \"call\" contains the iteration domains of statements with a call expression.\n * \"reads\" contains all potential read accesses.\n * \"tagged_reads\" is the same as \"reads\", except that the domain is a wrapped\n *\trelation mapping an iteration domain to a reference identifier.\n * \"live_in\" contains the potential read accesses that potentially\n *\thave no corresponding writes in the scop.\n * \"may_writes\" contains all potential write accesses.\n * \"tagged_may_writes\" is the same as \"may_writes\", except that the domain\n *\tis a wrapped relation mapping an iteration domain\n *\tto a reference identifier.\n * \"must_writes\" contains all definite write accesses.\n * \"tagged_must_writes\" is the same as \"must_writes\", except that the domain\n *\tis a wrapped relation mapping an iteration domain\n *\tto a reference identifier.\n * \"live_out\" contains the potential write accesses that are potentially\n *\tnot killed by any kills or any other writes.\n * 
\"must_kills\" contains all definite kill accesses.\n * \"tagged_must_kills\" is the same as \"must_kills\", except that the domain\n *\tis a wrapped relation mapping an iteration domain\n *\tto a reference identifier.\n *\n * \"tagger\" maps tagged iteration domains to the corresponding untagged\n *\titeration domain.\n *\n * \"independence\" is the union of all independence filters.\n *\n * \"dep_flow\" represents the potential flow dependences.\n * \"tagged_dep_flow\" is the same as \"dep_flow\", except that both domain and\n *\trange are wrapped relations mapping an iteration domain to\n *\ta reference identifier.  May be NULL if not computed.\n * \"dep_false\" represents the potential false (anti and output) dependences.\n * \"dep_forced\" represents the validity constraints that should be enforced\n *\teven when live-range reordering is used.\n *\tIn particular, these constraints ensure that all live-in\n *\taccesses remain live-in and that all live-out accesses remain live-out\n *\tand that multiple potential sources for the same read are\n *\texecuted in the original order.\n * \"dep_order\"/\"tagged_dep_order\" represents the order dependences between\n *\tthe live range intervals in \"dep_flow\"/\"tagged_dep_flow\".\n *\tIt is only used if the live_range_reordering\n *\toption is set.  
Otherwise it is NULL.\n *\tIf \"dep_order\" is used, then \"dep_false\" only contains a limited\n *\tset of anti and output dependences.\n * \"schedule\" represents the (original) schedule.\n *\n * \"names\" contains all variable names that are in use by the scop.\n * The names are mapped to a dummy value.\n *\n * \"pet\" is the original pet_scop.\n */\n\tstruct ppcg_scop\n\t{\n\t\tstruct ppcg_options *options;\n\n\t\tunsigned start;\n\t\tunsigned end;\n\n\t\tisl_set *context;\n\t\tisl_union_set *domain;\n\t\tisl_union_set *call;\n\t\tisl_union_map *tagged_reads;\n\t\tisl_union_map *reads;\n\t\tisl_union_map *live_in;\n\t\tisl_union_map *tagged_may_writes;\n\t\tisl_union_map *may_writes;\n\t\tisl_union_map *tagged_must_writes;\n\t\tisl_union_map *must_writes;\n\t\tisl_union_map *live_out;\n\t\tisl_union_map *tagged_must_kills;\n\t\tisl_union_map *must_kills;\n\n\t\tisl_union_pw_multi_aff *tagger;\n\n\t\tisl_union_map *independence;\n\n\t\tisl_union_map *dep_flow;\n\t\tisl_union_map *tagged_dep_flow;\n\t\tisl_union_map *dep_false;\n\t\tisl_union_map *dep_forced;\n\t\tisl_union_map *dep_order;\n\t\tisl_union_map *tagged_dep_order;\n\t\tisl_schedule *schedule;\n\n\t\tisl_id_to_ast_expr *names;\n\n\t\tstruct pet_scop *pet;\n\n\t\t/* AutoSA Extended */\n\t\tisl_union_map *dep_rar;\n\t\tisl_union_map *tagged_dep_rar;\n\t\tisl_union_map *dep_waw;\n\t\tisl_union_map *tagged_dep_waw;\n\t\t/* AutoSA Extended */\n\t};\n\n\tint ppcg_scop_any_hidden_declarations(struct ppcg_scop *scop);\n\t__isl_give isl_id_list *ppcg_scop_generate_names(struct ppcg_scop *scop,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t int n, const char *prefix);\n\n\tint ppcg_transform(isl_ctx *ctx, const char *input, FILE *out,\n\t\t\t\t\t\t\t\t\t\t struct ppcg_options *options,\n\t\t\t\t\t\t\t\t\t\t __isl_give isl_printer *(*fn)(__isl_take isl_printer *p,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t struct ppcg_scop *scop, void *user),\n\t\t\t\t\t\t\t\t\t\t void *user);\n\n\tint 
autosa_main_wrap(int argc, char **argv);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "src/ppcg_files/cuda.c",
    "content": "/*\n * Copyright 2012      Ecole Normale Superieure\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege,\n * Ecole Normale Superieure, 45 rue d’Ulm, 75230 Paris, France\n */\n\n#include <isl/aff.h>\n#include <isl/ast.h>\n\n#include \"cuda_common.h\"\n#include \"cuda.h\"\n#include \"gpu.h\"\n#include \"gpu_print.h\"\n#include \"print.h\"\n#include \"util.h\"\n\nstatic __isl_give isl_printer *print_cuda_macros(__isl_take isl_printer *p)\n{\n\tconst char *macros =\n\t\t\"#define cudaCheckReturn(ret) \\\\\\n\"\n\t\t\"  do { \\\\\\n\"\n\t\t\"    cudaError_t cudaCheckReturn_e = (ret); \\\\\\n\"\n\t\t\"    if (cudaCheckReturn_e != cudaSuccess) { \\\\\\n\"\n\t\t\"      fprintf(stderr, \\\"CUDA error: %s\\\\n\\\", \"\n\t\t\"cudaGetErrorString(cudaCheckReturn_e)); \\\\\\n\"\n\t\t\"      fflush(stderr); \\\\\\n\"\n\t\t\"    } \\\\\\n\"\n\t\t\"    assert(cudaCheckReturn_e == cudaSuccess); \\\\\\n\"\n\t\t\"  } while(0)\\n\"\n\t\t\"#define cudaCheckKernel() \\\\\\n\"\n\t\t\"  do { \\\\\\n\"\n\t\t\"    cudaCheckReturn(cudaGetLastError()); \\\\\\n\"\n\t\t\"  } while(0)\\n\\n\";\n\n\tp = isl_printer_print_str(p, macros);\n\treturn p;\n}\n\n/* Print a declaration for the device array corresponding to \"array\" on \"p\".\n */\nstatic __isl_give isl_printer *declare_device_array(__isl_take isl_printer *p,\n\tstruct gpu_array_info *array)\n{\n\tint i;\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, array->type);\n\tp = isl_printer_print_str(p, \" \");\n\tif (!array->linearize && array->n_index > 1)\n\t\tp = isl_printer_print_str(p, \"(\");\n\tp = isl_printer_print_str(p, \"*dev_\");\n\tp = isl_printer_print_str(p, array->name);\n\tif (!array->linearize && array->n_index > 1) {\n\t\tp = isl_printer_print_str(p, \")\");\n\t\tfor (i = 1; i < array->n_index; i++) {\n\t\t\tisl_ast_expr *bound;\n\t\t\tbound = isl_ast_expr_get_op_arg(array->bound_expr,\n\t\t\t\t\t\t\t1 + i);\n\t\t\tp = isl_printer_print_str(p, 
\"[\");\n\t\t\tp = isl_printer_print_ast_expr(p, bound);\n\t\t\tp = isl_printer_print_str(p, \"]\");\n\t\t\tisl_ast_expr_free(bound);\n\t\t}\n\t}\n\tp = isl_printer_print_str(p, \";\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\nstatic __isl_give isl_printer *declare_device_arrays(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog)\n{\n\tint i;\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tif (!gpu_array_requires_device_allocation(&prog->array[i]))\n\t\t\tcontinue;\n\n\t\tp = declare_device_array(p, &prog->array[i]);\n\t}\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_end_line(p);\n\treturn p;\n}\n\nstatic __isl_give isl_printer *allocate_device_arrays(\n\t__isl_take isl_printer *p, struct gpu_prog *prog)\n{\n\tint i;\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tstruct gpu_array_info *array = &prog->array[i];\n\n\t\tif (!gpu_array_requires_device_allocation(&prog->array[i]))\n\t\t\tcontinue;\n\t\tp = ppcg_ast_expr_print_macros(array->bound_expr, p);\n\t\tp = isl_printer_start_line(p);\n\t\tp = isl_printer_print_str(p,\n\t\t\t\"cudaCheckReturn(cudaMalloc((void **) &dev_\");\n\t\tp = isl_printer_print_str(p, prog->array[i].name);\n\t\tp = isl_printer_print_str(p, \", \");\n\t\tp = gpu_array_info_print_size(p, &prog->array[i]);\n\t\tp = isl_printer_print_str(p, \"));\");\n\t\tp = isl_printer_end_line(p);\n\t}\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_end_line(p);\n\treturn p;\n}\n\nstatic __isl_give isl_printer *free_device_arrays(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog)\n{\n\tint i;\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tif (!gpu_array_requires_device_allocation(&prog->array[i]))\n\t\t\tcontinue;\n\t\tp = isl_printer_start_line(p);\n\t\tp = isl_printer_print_str(p, \"cudaCheckReturn(cudaFree(dev_\");\n\t\tp = isl_printer_print_str(p, prog->array[i].name);\n\t\tp = isl_printer_print_str(p, \"));\");\n\t\tp = isl_printer_end_line(p);\n\t}\n\n\treturn p;\n}\n\n/* Print code to \"p\" for copying \"array\" from the 
host to the device\n * in its entirety.  The bounds on the extent of \"array\" have\n * been precomputed in extract_array_info and are used in\n * gpu_array_info_print_size.\n */\nstatic __isl_give isl_printer *copy_array_to_device(__isl_take isl_printer *p,\n\tstruct gpu_array_info *array)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"cudaCheckReturn(cudaMemcpy(dev_\");\n\tp = isl_printer_print_str(p, array->name);\n\tp = isl_printer_print_str(p, \", \");\n\n\tif (gpu_array_is_scalar(array))\n\t\tp = isl_printer_print_str(p, \"&\");\n\tp = isl_printer_print_str(p, array->name);\n\tp = isl_printer_print_str(p, \", \");\n\n\tp = gpu_array_info_print_size(p, array);\n\tp = isl_printer_print_str(p, \", cudaMemcpyHostToDevice));\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n/* Print code to \"p\" for copying \"array\" back from the device to the host\n * in its entirety.  The bounds on the extent of \"array\" have\n * been precomputed in extract_array_info and are used in\n * gpu_array_info_print_size.\n */\nstatic __isl_give isl_printer *copy_array_from_device(\n\t__isl_take isl_printer *p, struct gpu_array_info *array)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"cudaCheckReturn(cudaMemcpy(\");\n\tif (gpu_array_is_scalar(array))\n\t\tp = isl_printer_print_str(p, \"&\");\n\tp = isl_printer_print_str(p, array->name);\n\tp = isl_printer_print_str(p, \", dev_\");\n\tp = isl_printer_print_str(p, array->name);\n\tp = isl_printer_print_str(p, \", \");\n\tp = gpu_array_info_print_size(p, array);\n\tp = isl_printer_print_str(p, \", cudaMemcpyDeviceToHost));\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\nstatic void print_reverse_list(FILE *out, int len, int *list)\n{\n\tint i;\n\n\tif (!out || len == 0)\n\t\treturn;\n\n\tfprintf(out, \"(\");\n\tfor (i = 0; i < len; ++i) {\n\t\tif (i)\n\t\t\tfprintf(out, \", \");\n\t\tfprintf(out, \"%d\", list[len - 1 - i]);\n\t}\n\tfprintf(out, \")\");\n}\n\n/* Print the 
effective grid size as a list of the sizes in each\n * dimension, from innermost to outermost.\n */\nstatic __isl_give isl_printer *print_grid_size(__isl_take isl_printer *p,\n\tstruct ppcg_kernel *kernel)\n{\n\tint i;\n\tint dim;\n\n\tdim = isl_multi_pw_aff_dim(kernel->grid_size, isl_dim_set);\n\tif (dim == 0)\n\t\treturn p;\n\n\tp = isl_printer_print_str(p, \"(\");\n\tfor (i = dim - 1; i >= 0; --i) {\n\t\tisl_ast_expr *bound;\n\n\t\tbound = isl_ast_expr_get_op_arg(kernel->grid_size_expr, 1 + i);\n\t\tp = isl_printer_print_ast_expr(p, bound);\n\t\tisl_ast_expr_free(bound);\n\n\t\tif (i > 0)\n\t\t\tp = isl_printer_print_str(p, \", \");\n\t}\n\n\tp = isl_printer_print_str(p, \")\");\n\n\treturn p;\n}\n\n/* Print the grid definition.\n */\nstatic __isl_give isl_printer *print_grid(__isl_take isl_printer *p,\n\tstruct ppcg_kernel *kernel)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"dim3 k\");\n\tp = isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p, \"_dimGrid\");\n\tp = print_grid_size(p, kernel);\n\tp = isl_printer_print_str(p, \";\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n/* Print the arguments to a kernel declaration or call.  
If \"types\" is set,\n * then print a declaration (including the types of the arguments).\n *\n * The arguments are printed in the following order\n * - the arrays accessed by the kernel\n * - the parameters\n * - the host loop iterators\n */\nstatic __isl_give isl_printer *print_kernel_arguments(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog, struct ppcg_kernel *kernel, int types)\n{\n\tint i, n;\n\tint first = 1;\n\tunsigned nparam;\n\tisl_space *space;\n\tconst char *type;\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tint required;\n\n\t\trequired = ppcg_kernel_requires_array_argument(kernel, i);\n\t\tif (required < 0)\n\t\t\treturn isl_printer_free(p);\n\t\tif (!required)\n\t\t\tcontinue;\n\n\t\tif (!first)\n\t\t\tp = isl_printer_print_str(p, \", \");\n\n\t\tif (types)\n\t\t\tp = gpu_array_info_print_declaration_argument(p,\n\t\t\t\t&prog->array[i], NULL);\n\t\telse\n\t\t\tp = gpu_array_info_print_call_argument(p,\n\t\t\t\t&prog->array[i]);\n\n\t\tfirst = 0;\n\t}\n\n\tspace = isl_union_set_get_space(kernel->arrays);\n\tnparam = isl_space_dim(space, isl_dim_param);\n\tfor (i = 0; i < nparam; ++i) {\n\t\tconst char *name;\n\n\t\tname = isl_space_get_dim_name(space, isl_dim_param, i);\n\n\t\tif (!first)\n\t\t\tp = isl_printer_print_str(p, \", \");\n\t\tif (types)\n\t\t\tp = isl_printer_print_str(p, \"int \");\n\t\tp = isl_printer_print_str(p, name);\n\n\t\tfirst = 0;\n\t}\n\tisl_space_free(space);\n\n\tn = isl_space_dim(kernel->space, isl_dim_set);\n\ttype = isl_options_get_ast_iterator_type(prog->ctx);\n\tfor (i = 0; i < n; ++i) {\n\t\tconst char *name;\n\n\t\tif (!first)\n\t\t\tp = isl_printer_print_str(p, \", \");\n\t\tname = isl_space_get_dim_name(kernel->space, isl_dim_set, i);\n\t\tif (types) {\n\t\t\tp = isl_printer_print_str(p, type);\n\t\t\tp = isl_printer_print_str(p, \" \");\n\t\t}\n\t\tp = isl_printer_print_str(p, name);\n\n\t\tfirst = 0;\n\t}\n\n\treturn p;\n}\n\n/* Print the header of the given kernel.\n */\nstatic __isl_give isl_printer 
*print_kernel_header(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog, struct ppcg_kernel *kernel)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"__global__ void kernel\");\n\tp = isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p, \"(\");\n\tp = print_kernel_arguments(p, prog, kernel, 1);\n\tp = isl_printer_print_str(p, \")\");\n\n\treturn p;\n}\n\n/* Print the header of the given kernel to both gen->cuda.kernel_h\n * and gen->cuda.kernel_c.\n */\nstatic void print_kernel_headers(struct gpu_prog *prog,\n\tstruct ppcg_kernel *kernel, struct cuda_info *cuda)\n{\n\tisl_printer *p;\n\n\tp = isl_printer_to_file(prog->ctx, cuda->kernel_h);\n\tp = isl_printer_set_output_format(p, ISL_FORMAT_C);\n\tp = print_kernel_header(p, prog, kernel);\n\tp = isl_printer_print_str(p, \";\");\n\tp = isl_printer_end_line(p);\n\tisl_printer_free(p);\n\n\tp = isl_printer_to_file(prog->ctx, cuda->kernel_c);\n\tp = isl_printer_set_output_format(p, ISL_FORMAT_C);\n\tp = print_kernel_header(p, prog, kernel);\n\tp = isl_printer_end_line(p);\n\tisl_printer_free(p);\n}\n\nstatic void print_indent(FILE *dst, int indent)\n{\n\tfprintf(dst, \"%*s\", indent, \"\");\n}\n\n/* Print a list of iterators of type \"type\" with names \"ids\" to \"out\".\n * Each iterator is assigned one of the cuda identifiers in cuda_dims.\n * In particular, the last iterator is assigned the x identifier\n * (the first in the list of cuda identifiers).\n */\nstatic void print_iterators(FILE *out, const char *type,\n\t__isl_keep isl_id_list *ids, const char *cuda_dims[])\n{\n\tint i, n;\n\n\tn = isl_id_list_n_id(ids);\n\tif (n <= 0)\n\t\treturn;\n\tprint_indent(out, 4);\n\tfprintf(out, \"%s \", type);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_id *id;\n\n\t\tif (i)\n\t\t\tfprintf(out, \", \");\n\t\tid = isl_id_list_get_id(ids, i);\n\t\tfprintf(out, \"%s = %s\", isl_id_get_name(id),\n\t\t\tcuda_dims[n - 1 - i]);\n\t\tisl_id_free(id);\n\t}\n\tfprintf(out, \";\\n\");\n}\n\nstatic void 
print_kernel_iterators(FILE *out, struct ppcg_kernel *kernel)\n{\n\tisl_ctx *ctx = isl_ast_node_get_ctx(kernel->tree);\n\tconst char *type;\n\tconst char *block_dims[] = { \"blockIdx.x\", \"blockIdx.y\" };\n\tconst char *thread_dims[] = { \"threadIdx.x\", \"threadIdx.y\",\n\t\t\t\t\t\"threadIdx.z\" };\n\n\ttype = isl_options_get_ast_iterator_type(ctx);\n\n\tprint_iterators(out, type, kernel->block_ids, block_dims);\n\tprint_iterators(out, type, kernel->thread_ids, thread_dims);\n}\n\nstatic __isl_give isl_printer *print_kernel_var(__isl_take isl_printer *p,\n\tstruct ppcg_kernel_var *var)\n{\n\tint j;\n\n\tp = isl_printer_start_line(p);\n\tif (var->type == ppcg_access_shared)\n\t\tp = isl_printer_print_str(p, \"__shared__ \");\n\tp = isl_printer_print_str(p, var->array->type);\n\tp = isl_printer_print_str(p, \" \");\n\tp = isl_printer_print_str(p,  var->name);\n\tfor (j = 0; j < var->array->n_index; ++j) {\n\t\tisl_val *v;\n\n\t\tp = isl_printer_print_str(p, \"[\");\n\t\tv = isl_vec_get_element_val(var->size, j);\n\t\tp = isl_printer_print_val(p, v);\n\t\tisl_val_free(v);\n\t\tp = isl_printer_print_str(p, \"]\");\n\t}\n\tp = isl_printer_print_str(p, \";\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\nstatic __isl_give isl_printer *print_kernel_vars(__isl_take isl_printer *p,\n\tstruct ppcg_kernel *kernel)\n{\n\tint i;\n\n\tfor (i = 0; i < kernel->n_var; ++i)\n\t\tp = print_kernel_var(p, &kernel->var[i]);\n\n\treturn p;\n}\n\n/* Print a sync statement.\n */\nstatic __isl_give isl_printer *print_sync(__isl_take isl_printer *p,\n\tstruct ppcg_kernel_stmt *stmt)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"__syncthreads();\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n/* This function is called for each user statement in the AST,\n * i.e., for each kernel body statement, copy statement or sync statement.\n */\nstatic __isl_give isl_printer *print_kernel_stmt(__isl_take isl_printer *p,\n\t__isl_take isl_ast_print_options 
*print_options,\n\t__isl_keep isl_ast_node *node, void *user)\n{\n\tisl_id *id;\n\tstruct ppcg_kernel_stmt *stmt;\n\n\tid = isl_ast_node_get_annotation(node);\n\tstmt = isl_id_get_user(id);\n\tisl_id_free(id);\n\n\tisl_ast_print_options_free(print_options);\n\n\tswitch (stmt->type) {\n\tcase ppcg_kernel_copy:\n\t\treturn ppcg_kernel_print_copy(p, stmt);\n\tcase ppcg_kernel_sync:\n\t\treturn print_sync(p, stmt);\n\tcase ppcg_kernel_domain:\n\t\treturn ppcg_kernel_print_domain(p, stmt);\n\t}\n\n\treturn p;\n}\n\nstatic void print_kernel(struct gpu_prog *prog, struct ppcg_kernel *kernel,\n\tstruct cuda_info *cuda)\n{\n\tisl_ctx *ctx = isl_ast_node_get_ctx(kernel->tree);\n\tisl_ast_print_options *print_options;\n\tisl_printer *p;\n\n\tprint_kernel_headers(prog, kernel, cuda);\n\tfprintf(cuda->kernel_c, \"{\\n\");\n\tprint_kernel_iterators(cuda->kernel_c, kernel);\n\n\tp = isl_printer_to_file(ctx, cuda->kernel_c);\n\tp = isl_printer_set_output_format(p, ISL_FORMAT_C);\n\tp = isl_printer_indent(p, 2);\n\n\tp = print_kernel_vars(p, kernel);\n\tp = isl_printer_end_line(p);\n\tp = ppcg_set_macro_names(p);\n\tp = gpu_print_macros(p, kernel->tree);\n\n\tprint_options = isl_ast_print_options_alloc(ctx);\n\tprint_options = isl_ast_print_options_set_print_user(print_options,\n\t\t\t\t\t\t    &print_kernel_stmt, NULL);\n\tp = isl_ast_node_print(kernel->tree, p, print_options);\n\tisl_printer_free(p);\n\n\tfprintf(cuda->kernel_c, \"}\\n\");\n}\n\n/* Print code for initializing the device for execution of the transformed\n * code.  
This includes declaring locally defined variables as well as\n * declaring and allocating the required copies of arrays on the device.\n */\nstatic __isl_give isl_printer *init_device(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog)\n{\n\tp = print_cuda_macros(p);\n\n\tp = gpu_print_local_declarations(p, prog);\n\tp = declare_device_arrays(p, prog);\n\tp = allocate_device_arrays(p, prog);\n\n\treturn p;\n}\n\n/* Print code for clearing the device after execution of the transformed code.\n * In particular, free the memory that was allocated on the device.\n */\nstatic __isl_give isl_printer *clear_device(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog)\n{\n\tp = free_device_arrays(p, prog);\n\n\treturn p;\n}\n\n/* Print a statement for copying an array to or from the device,\n * or for initializing or clearing the device.\n * The statement identifier of a copying node is called\n * \"to_device_<array name>\" or \"from_device_<array name>\" and\n * its user pointer points to the gpu_array_info of the array\n * that needs to be copied.\n * The node for initializing the device is called \"init_device\".\n * The node for clearing the device is called \"clear_device\".\n *\n * Extract the array (if any) from the identifier and call\n * init_device, clear_device, copy_array_to_device or copy_array_from_device.\n */\nstatic __isl_give isl_printer *print_device_node(__isl_take isl_printer *p,\n\t__isl_keep isl_ast_node *node, struct gpu_prog *prog)\n{\n\tisl_ast_expr *expr, *arg;\n\tisl_id *id;\n\tconst char *name;\n\tstruct gpu_array_info *array;\n\n\texpr = isl_ast_node_user_get_expr(node);\n\targ = isl_ast_expr_get_op_arg(expr, 0);\n\tid = isl_ast_expr_get_id(arg);\n\tname = isl_id_get_name(id);\n\tarray = isl_id_get_user(id);\n\tisl_id_free(id);\n\tisl_ast_expr_free(arg);\n\tisl_ast_expr_free(expr);\n\n\tif (!name)\n\t\treturn isl_printer_free(p);\n\tif (!strcmp(name, \"init_device\"))\n\t\treturn init_device(p, prog);\n\tif (!strcmp(name, 
\"clear_device\"))\n\t\treturn clear_device(p, prog);\n\tif (!array)\n\t\treturn isl_printer_free(p);\n\n\tif (!prefixcmp(name, \"to_device\"))\n\t\treturn copy_array_to_device(p, array);\n\telse\n\t\treturn copy_array_from_device(p, array);\n}\n\nstruct print_host_user_data {\n\tstruct cuda_info *cuda;\n\tstruct gpu_prog *prog;\n};\n\n/* Print the user statement of the host code to \"p\".\n *\n * The host code may contain original user statements, kernel launches,\n * statements that copy data to/from the device and statements\n * that initialize or clear the device.\n * The original user statements and the kernel launches have\n * an associated annotation, while the other statements do not.\n * The latter are handled by print_device_node.\n * The annotation on the user statements is called \"user\".\n *\n * In case of a kernel launch, print a block of statements that\n * defines the grid and the block and then launches the kernel.\n */\nstatic __isl_give isl_printer *print_host_user(__isl_take isl_printer *p,\n\t__isl_take isl_ast_print_options *print_options,\n\t__isl_keep isl_ast_node *node, void *user)\n{\n\tisl_id *id;\n\tint is_user;\n\tstruct ppcg_kernel *kernel;\n\tstruct ppcg_kernel_stmt *stmt;\n\tstruct print_host_user_data *data;\n\n\tisl_ast_print_options_free(print_options);\n\n\tdata = (struct print_host_user_data *) user;\n\n\tid = isl_ast_node_get_annotation(node);\n\tif (!id)\n\t\treturn print_device_node(p, node, data->prog);\n\n\tis_user = !strcmp(isl_id_get_name(id), \"user\");\n\tkernel = is_user ? NULL : isl_id_get_user(id);\n\tstmt = is_user ? 
isl_id_get_user(id) : NULL;\n\tisl_id_free(id);\n\n\tif (is_user)\n\t\treturn ppcg_kernel_print_domain(p, stmt);\n\n\tp = ppcg_start_block(p);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"dim3 k\");\n\tp = isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p, \"_dimBlock\");\n\tprint_reverse_list(isl_printer_get_file(p),\n\t\t\t\tkernel->n_block, kernel->block_dim);\n\tp = isl_printer_print_str(p, \";\");\n\tp = isl_printer_end_line(p);\n\n\tp = print_grid(p, kernel);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"kernel\");\n\tp = isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p, \" <<<k\");\n\tp = isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p, \"_dimGrid, k\");\n\tp = isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p, \"_dimBlock>>> (\");\n\tp = print_kernel_arguments(p, data->prog, kernel, 0);\n\tp = isl_printer_print_str(p, \");\");\n\tp = isl_printer_end_line(p);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"cudaCheckKernel();\");\n\tp = isl_printer_end_line(p);\n\n\tp = ppcg_end_block(p);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_end_line(p);\n\n\tprint_kernel(data->prog, kernel, data->cuda);\n\n\treturn p;\n}\n\nstatic __isl_give isl_printer *print_host_code(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog, __isl_keep isl_ast_node *tree,\n\tstruct cuda_info *cuda)\n{\n\tisl_ast_print_options *print_options;\n\tisl_ctx *ctx = isl_ast_node_get_ctx(tree);\n\tstruct print_host_user_data data = { cuda, prog };\n\n\tprint_options = isl_ast_print_options_alloc(ctx);\n\tprint_options = isl_ast_print_options_set_print_user(print_options,\n\t\t\t\t\t\t&print_host_user, &data);\n\n\tp = gpu_print_macros(p, tree);\n\tp = isl_ast_node_print(tree, p, print_options);\n\n\treturn p;\n}\n\n/* Given a gpu_prog \"prog\" and the corresponding transformed AST\n * \"tree\", print the entire CUDA code to \"p\".\n * 
\"types\" collects the types for which a definition has already\n * been printed.\n */\nstatic __isl_give isl_printer *print_cuda(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog, __isl_keep isl_ast_node *tree,\n\tstruct gpu_types *types, void *user)\n{\n\tstruct cuda_info *cuda = user;\n\tisl_printer *kernel;\n\n\tkernel = isl_printer_to_file(isl_printer_get_ctx(p), cuda->kernel_c);\n\tkernel = isl_printer_set_output_format(kernel, ISL_FORMAT_C);\n\tkernel = gpu_print_types(kernel, types, prog);\n\tisl_printer_free(kernel);\n\n\tif (!kernel)\n\t\treturn isl_printer_free(p);\n\n\tp = print_host_code(p, prog, tree, cuda);\n\n\treturn p;\n}\n\n/* Transform the code in the file called \"input\" by replacing\n * all scops by corresponding CUDA code.\n * The names of the output files are derived from \"input\".\n *\n * We let generate_gpu do all the hard work and then let it call\n * us back for printing the AST in print_cuda.\n *\n * To prepare for this printing, we first open the output files\n * and we close them after generate_gpu has finished.\n */\nint generate_cuda(isl_ctx *ctx, struct ppcg_options *options,\n\tconst char *input)\n{\n\tstruct cuda_info cuda;\n\tint r;\n\n\tcuda_open_files(&cuda, input);\n\n\tr = generate_gpu(ctx, input, cuda.host_c, options, &print_cuda, &cuda);\n\n\tcuda_close_files(&cuda);\n\n\treturn r;\n}\n"
  },
  {
    "path": "src/ppcg_files/cuda.h",
    "content": "#ifndef _CUDA_H\n#define _CUDA_H\n\n#include \"ppcg_options.h\"\n#include \"ppcg.h\"\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\n\tint generate_cuda(isl_ctx *ctx, struct ppcg_options *options,\n\t\t\t\t\t\t\t\t\t\tconst char *input);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "src/ppcg_files/cuda_common.c",
    "content": "/*\n * Copyright 2010      INRIA Saclay\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege, INRIA Saclay - Ile-de-France,\n * Parc Club Orsay Universite, ZAC des vignes, 4 rue Jacques Monod,\n * 91893 Orsay, France\n */\n\n#include <ctype.h>\n#include <limits.h>\n#include <string.h>\n\n#include \"cuda_common.h\"\n#include \"ppcg.h\"\n\n/* Open the host .cu file and the kernel .hu and .cu files for writing.\n * Add the necessary includes.\n */\nvoid cuda_open_files(struct cuda_info *info, const char *input)\n{\n    char name[PATH_MAX];\n    int len;\n\n    len = ppcg_extract_base_name(name, input);\n\n    strcpy(name + len, \"_host.cu\");\n    info->host_c = fopen(name, \"w\");\n\n    strcpy(name + len, \"_kernel.cu\");\n    info->kernel_c = fopen(name, \"w\");\n\n    strcpy(name + len, \"_kernel.hu\");\n    info->kernel_h = fopen(name, \"w\");\n    fprintf(info->host_c, \"#include <assert.h>\\n\");\n    fprintf(info->host_c, \"#include <stdio.h>\\n\");\n    fprintf(info->host_c, \"#include \\\"%s\\\"\\n\", name);\n    fprintf(info->kernel_c, \"#include \\\"%s\\\"\\n\", name);\n    fprintf(info->kernel_h, \"#include \\\"cuda.h\\\"\\n\\n\");\n}\n\n/* Close all output files.\n */\nvoid cuda_close_files(struct cuda_info *info)\n{\n    fclose(info->kernel_c);\n    fclose(info->kernel_h);\n    fclose(info->host_c);\n}\n"
  },
  {
    "path": "src/ppcg_files/cuda_common.h",
    "content": "#ifndef _CUDA_COMMON_H_\n#define _CUDA_COMMON_H_\n\n#include <stdio.h>\n\nstruct cuda_info\n{\n\tFILE *host_c;\n\tFILE *kernel_c;\n\tFILE *kernel_h;\n};\n\nvoid cuda_open_files(struct cuda_info *info, const char *input);\nvoid cuda_close_files(struct cuda_info *info);\n\n#endif\n"
  },
  {
    "path": "src/ppcg_files/gpu.c",
    "content": "/*\n * Copyright 2010-2011 INRIA Saclay\n * Copyright 2012-2013 Ecole Normale Superieure\n * Copyright 2015-2016 Sven Verdoolaege\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege, INRIA Saclay - Ile-de-France,\n * Parc Club Orsay Universite, ZAC des vignes, 4 rue Jacques Monod,\n * 91893 Orsay, France\n * and Ecole Normale Superieure, 45 rue d’Ulm, 75230 Paris, France\n */\n\n#include <stdlib.h>\n#include <string.h>\n\n#include <isl/polynomial.h>\n#include <isl/union_set.h>\n#include <isl/aff.h>\n#include <isl/ilp.h>\n#include <isl/flow.h>\n#include <isl/schedule.h>\n#include <isl/schedule_node.h>\n#include <isl/options.h>\n#include <isl/ast_build.h>\n\n#include \"cpu.h\"\n#include \"gpu.h\"\n#include \"gpu_array_tile.h\"\n#include \"gpu_group.h\"\n#include \"gpu_hybrid.h\"\n#include \"gpu_tree.h\"\n#include \"hybrid.h\"\n#include \"schedule.h\"\n#include \"ppcg_options.h\"\n#include \"print.h\"\n#include \"util.h\"\n\nstruct gpu_array_info;\n\n/* Return the name of the outer array (of structs) accessed by \"access\".\n */\nstatic const char *get_outer_array_name(__isl_keep isl_map *access)\n{\n\tisl_space *space;\n\tconst char *name;\n\n\tspace = isl_space_range(isl_map_get_space(access));\n\twhile (space && isl_space_is_wrapping(space))\n\t\tspace = isl_space_domain(isl_space_unwrap(space));\n\tname = isl_space_get_tuple_name(space, isl_dim_set);\n\tisl_space_free(space);\n\n\treturn name;\n}\n\n/* Collect all references to the given array and store pointers to them\n * in array->refs.\n */\nstatic isl_stat collect_references(struct gpu_prog *prog,\n\tstruct gpu_array_info *array)\n{\n\tint i;\n\tint n;\n\n\tn = 0;\n\tfor (i = 0; i < prog->n_stmts; ++i) {\n\t\tstruct gpu_stmt *stmt = &prog->stmts[i];\n\t\tstruct gpu_stmt_access *access;\n\n\t\tfor (access = stmt->accesses; access; access = access->next) {\n\t\t\tconst char *name;\n\t\t\tname = get_outer_array_name(access->access);\n\t\t\tif (name && 
!strcmp(array->name, name))\n\t\t\t\tn++;\n\t\t}\n\t}\n\n\tarray->refs = isl_alloc_array(prog->ctx, struct gpu_stmt_access *, n);\n\tif (!array->refs)\n\t\treturn isl_stat_error;\n\tarray->n_ref = n;\n\n\tn = 0;\n\tfor (i = 0; i < prog->n_stmts; ++i) {\n\t\tstruct gpu_stmt *stmt = &prog->stmts[i];\n\t\tstruct gpu_stmt_access *access;\n\n\t\tfor (access = stmt->accesses; access; access = access->next) {\n\t\t\tconst char *name;\n\t\t\tname = get_outer_array_name(access->access);\n\t\t\tif (!name || strcmp(array->name, name))\n\t\t\t\tcontinue;\n\n\t\t\tarray->refs[n++] = access;\n\t\t}\n\t}\n\n\treturn isl_stat_ok;\n}\n\n/* Compute and return the extent of \"array\", taking into account the set of\n * accessed elements.\n *\n * In particular, the extent in the outer dimension is taken\n * from \"accessed\", while the extents in the remaining dimensions\n * are taken from array->extent.\n *\n * The extent in the outer dimension cannot be taken from array->extent\n * because that may be unbounded.  
Furthermore, even if it is bounded,\n * it may be larger than the piece of the array that is being accessed.\n */\nstatic __isl_give isl_set *compute_extent(struct pet_array *array,\n\t__isl_keep isl_set *accessed)\n{\n\tint n_index;\n\tisl_id *id;\n\tisl_set *outer;\n\tisl_set *extent;\n\n\textent = isl_set_copy(array->extent);\n\n\tn_index = isl_set_dim(accessed, isl_dim_set);\n\tif (n_index == 0)\n\t\treturn extent;\n\n\textent = isl_set_project_out(extent, isl_dim_set, 0, 1);\n\touter = isl_set_copy(accessed);\n\touter = isl_set_project_out(outer, isl_dim_set, 1, n_index - 1);\n\textent = isl_set_flat_product(outer, extent);\n\tid = isl_set_get_tuple_id(accessed);\n\textent = isl_set_set_tuple_id(extent, id);\n\n\treturn extent;\n}\n\n/* Is the array \"array\" being extracted a read-only scalar?\n *\n * That is, is \"array\" a scalar that is never possibly written to.\n * An array containing structures is never considered to be a scalar.\n */\nstatic int is_read_only_scalar(struct gpu_array_info *array,\n\tstruct gpu_prog *prog)\n{\n\tisl_set *space;\n\tisl_union_map *write;\n\tint empty;\n\n\tif (array->has_compound_element)\n\t\treturn 0;\n\tif (array->n_index != 0)\n\t\treturn 0;\n\n\twrite = isl_union_map_copy(prog->may_write);\n\tspace = isl_set_universe(isl_space_copy(array->space));\n\twrite = isl_union_map_intersect_range(write,\n\t\t\t\t\t\tisl_union_set_from_set(space));\n\tempty = isl_union_map_is_empty(write);\n\tisl_union_map_free(write);\n\n\treturn empty;\n}\n\n/* Is \"array\" only accessed as individual, fixed elements?\n * That is, does each access to \"array\" access a single, fixed element?\n */\nstatic isl_bool only_fixed_element_accessed(struct gpu_array_info *array)\n{\n\tint i;\n\n\tfor (i = 0; i < array->n_ref; ++i)\n\t\tif (!array->refs[i]->fixed_element)\n\t\t\treturn isl_bool_false;\n\n\treturn isl_bool_true;\n}\n\n/* Compute bounds on the host array \"pa\" based on the corresponding\n * accessed elements in \"arrays\"\n * and collect 
all references to the array.\n * Store the results in \"info\".\n *\n * If the array is zero-dimensional and does not contain structures,\n * i.e., if the array is a scalar, we check whether it is read-only.\n * We also check whether the array is accessed at all.\n */\nstatic isl_stat extract_array_info(struct gpu_prog *prog,\n\tstruct gpu_array_info *info, struct pet_array *pa,\n\t__isl_keep isl_union_set *arrays)\n{\n\tint empty;\n\tconst char *name;\n\tint n_index;\n\tisl_multi_pw_aff *bounds;\n\tisl_set *accessed, *extent;\n\n\tn_index = isl_set_dim(pa->extent, isl_dim_set);\n\tname = isl_set_get_tuple_name(pa->extent);\n\n\tinfo->space = isl_set_get_space(pa->extent);\n\tinfo->name = strdup(name);\n\tinfo->n_index = n_index;\n\tinfo->linearize = prog->scop->options->linearize_device_arrays;\n\n\tinfo->type = strdup(pa->element_type);\n\tinfo->size = pa->element_size;\n\tinfo->local = pa->declared && !pa->exposed;\n\tinfo->has_compound_element = pa->element_is_record;\n\tinfo->read_only_scalar = is_read_only_scalar(info, prog);\n\n\tinfo->declared_extent = isl_set_copy(pa->extent);\n\taccessed = isl_union_set_extract_set(arrays,\n\t\t\t\t\t    isl_space_copy(info->space));\n\tempty = isl_set_is_empty(accessed);\n\textent = compute_extent(pa, accessed);\n\tisl_set_free(accessed);\n\tinfo->extent = extent;\n\tif (empty < 0)\n\t\treturn isl_stat_error;\n\tinfo->accessed = !empty;\n\tbounds = ppcg_size_from_extent(isl_set_copy(extent));\n\tbounds = isl_multi_pw_aff_gist(bounds, isl_set_copy(prog->context));\n\tif (!bounds)\n\t\treturn isl_stat_error;\n\tif (!isl_multi_pw_aff_is_cst(bounds))\n\t\tinfo->linearize = 1;\n\tinfo->bound = bounds;\n\n\tif (collect_references(prog, info) < 0)\n\t\treturn isl_stat_error;\n\tinfo->only_fixed_element = only_fixed_element_accessed(info);\n\n\treturn isl_stat_ok;\n}\n\n/* Remove independence from the order constraints \"order\" on array \"array\".\n * Since the pairs of iterations in the filter relation of an independence\n * 
are guaranteed to be completely independent by the user, there is\n * no need to ensure that live ranges are ordered along those pairs.\n * We make an exception for local variables, though, as the independence\n * guarantee does not apply to those.\n *\n * The order constraints are used in two places.\n * Those on scalars are used in check_scalar_live_ranges to check if\n * we need to force the scalar to be private.  Any non-local scalar\n * should not be forced scalar if it only appears in independent loops.\n * Those on non-scalars are added to the coincidence constraints\n * in compute_schedule because we do not support any array expansion.\n * Accesses to non-local arrays should not prevent a loop from being\n * considered coincident so we should indeed remove those constraints\n * from the order constraints.\n */\nstatic __isl_give isl_union_map *remove_independences(struct gpu_prog *prog,\n\tstruct gpu_array_info *array, __isl_take isl_union_map *order)\n{\n\tint i;\n\n\tfor (i = 0; i < prog->scop->pet->n_independence; ++i) {\n\t\tstruct pet_independence *pi = prog->scop->pet->independences[i];\n\t\tif (isl_union_set_contains(pi->local, array->space))\n\t\t\tcontinue;\n\n\t\torder = isl_union_map_subtract(order,\n\t\t\t\t\t\tisl_union_map_copy(pi->filter));\n\t}\n\n\treturn order;\n}\n\n/* For each array in \"prog\", store the (untagged) order dependences\n * derived from the array in array->dep_order.\n * In particular, consider all references that access the given array\n * and take the order dependences that have one of these references\n * as source.  
(Since an order dependence relates two references to\n * the same array, the target of these order dependences will also\n * be one of these references.)\n * Additionally, store the union of these array->dep_order relations\n * for all arrays that cannot be mapped to private memory in prog->array_order.\n */\nvoid collect_order_dependences(struct gpu_prog *prog)\n{\n\tint i;\n\tisl_space *space;\n\tisl_union_map *accesses;\n\n\tspace = isl_union_map_get_space(prog->read);\n\tprog->array_order = isl_union_map_empty(space);\n\n\taccesses = isl_union_map_copy(prog->scop->tagged_reads);\n\taccesses = isl_union_map_union(accesses,\n\t\t\t    isl_union_map_copy(prog->scop->tagged_may_writes));\n\taccesses = isl_union_map_universe(accesses);\n\taccesses = isl_union_map_apply_range(accesses,\n\t\t\t\t\t    isl_union_map_copy(prog->to_outer));\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tstruct gpu_array_info *array = &prog->array[i];\n\t\tisl_set *set;\n\t\tisl_union_set *uset;\n\t\tisl_union_map *order;\n\n\t\tset = isl_set_universe(isl_space_copy(array->space));\n\t\tuset = isl_union_set_from_set(set);\n\t\tuset = isl_union_map_domain(\n\t\t    isl_union_map_intersect_range(isl_union_map_copy(accesses),\n\t\t\t\t\t\t    uset));\n\t\torder = isl_union_map_copy(prog->scop->tagged_dep_order);\n\t\torder = isl_union_map_intersect_domain(order, uset);\n\t\torder = isl_union_map_zip(order);\n\t\torder = isl_union_set_unwrap(isl_union_map_domain(order));\n\t\torder = remove_independences(prog, array, order);\n\t\tarray->dep_order = order;\n\n\t\tif (gpu_array_can_be_private(array))\n\t\t\tcontinue;\n\n\t\tprog->array_order = isl_union_map_union(prog->array_order,\n\t\t\t\t\tisl_union_map_copy(array->dep_order));\n\t}\n\n\tisl_union_map_free(accesses);\n}\n\n/* Construct a gpu_array_info for each array referenced by prog->scop and\n * collect them in prog->array.\n *\n * The sizes are based on the extents and the set of possibly accessed\n * elements by \"prog\".\n * If there 
are any member accesses involved, then they are first mapped\n * to the outer arrays of structs.\n * Only extract gpu_array_info entries for these outer arrays.\n *\n * If we are allowing live range reordering, then also set\n * the dep_order field.  Otherwise leave it NULL.\n */\nstatic isl_stat collect_array_info(struct gpu_prog *prog)\n{\n\tint i;\n\tisl_stat r = isl_stat_ok;\n\tisl_union_set *arrays;\n\n\tprog->n_array = 0;\n\tprog->array = isl_calloc_array(prog->ctx,\n\t\t\t     struct gpu_array_info, prog->scop->pet->n_array);\n\tif (!prog->array)\n\t\treturn isl_stat_error;\n\n\tarrays = isl_union_map_range(isl_union_map_copy(prog->read));\n\tarrays = isl_union_set_union(arrays,\n\t\t    isl_union_map_range(isl_union_map_copy(prog->may_write)));\n\n\tarrays = isl_union_set_apply(arrays,\n\t\t\t\t\tisl_union_map_copy(prog->to_outer));\n\n\tarrays = isl_union_set_coalesce(arrays);\n\n\tfor (i = 0; i < prog->scop->pet->n_array; ++i) {\n\t\tisl_bool field;\n\n\t\tfield = isl_set_is_wrapping(prog->scop->pet->arrays[i]->extent);\n\t\tif (field < 0)\n\t\t\tbreak;\n\t\tif (field)\n\t\t\tcontinue;\n\t\tif (extract_array_info(prog, &prog->array[prog->n_array++],\n\t\t\t\t\tprog->scop->pet->arrays[i], arrays) < 0)\n\t\t\tr = isl_stat_error;\n\t}\n\tif (i < prog->scop->pet->n_array)\n\t\tr = isl_stat_error;\n\n\tisl_union_set_free(arrays);\n\n\tif (prog->scop->options->live_range_reordering)\n\t\tcollect_order_dependences(prog);\n\n\treturn r;\n}\n\nstatic void free_array_info(struct gpu_prog *prog)\n{\n\tint i;\n\n\tfor (i = 0; i < prog->n_array; ++i) 
{\n\t\tfree(prog->array[i].type);\n\t\tfree(prog->array[i].name);\n\t\tisl_multi_pw_aff_free(prog->array[i].bound);\n\t\tisl_ast_expr_free(prog->array[i].bound_expr);\n\t\tisl_space_free(prog->array[i].space);\n\t\tisl_set_free(prog->array[i].declared_extent);\n\t\tisl_set_free(prog->array[i].extent);\n\t\tisl_ast_expr_free(prog->array[i].declared_size);\n\t\tfree(prog->array[i].refs);\n\t\tisl_union_map_free(prog->array[i].dep_order);\n\t}\n\tfree(prog->array);\n}\n\n/* Check if a gpu array is a scalar.  A scalar is a value that is not stored\n * as an array or through a pointer reference, but as a single data element.\n * At the moment, scalars are represented as zero-dimensional arrays.\n * Note that the single data element may be an entire structure.\n */\nint gpu_array_is_scalar(struct gpu_array_info *array)\n{\n\treturn array->n_index == 0;\n}\n\n/* Can \"array\" be mapped to private memory?\n * That is, is it only accessed as individual elements with\n * constant index expressions?\n */\nisl_bool gpu_array_can_be_private(struct gpu_array_info *array)\n{\n\tif (!array)\n\t\treturn isl_bool_error;\n\treturn array->only_fixed_element;\n}\n\n/* Is \"array\" a read-only scalar?\n */\nint gpu_array_is_read_only_scalar(struct gpu_array_info *array)\n{\n\treturn array->read_only_scalar;\n}\n\n/* Does \"array\" need to be allocated on the device?\n * If it is a read-only scalar, then it will be passed as an argument\n * to the kernel and therefore does not require any allocation.\n * If this device memory is not accessed at all, then it does not\n * need to be allocated either.\n */\nint gpu_array_requires_device_allocation(struct gpu_array_info *array)\n{\n\tif (gpu_array_is_read_only_scalar(array))\n\t\treturn 0;\n\tif (!array->global)\n\t\treturn 0;\n\treturn 1;\n}\n\n/* Return the set of parameter values for which the array has a positive\n * size in all dimensions.\n * If the sizes are only valid for some parameter values, then those\n * constraints are also 
taken into account.\n */\n__isl_give isl_set *gpu_array_positive_size_guard(struct gpu_array_info *array)\n{\n\tint i;\n\tisl_space *space;\n\tisl_set *guard;\n\n\tif (!array)\n\t\treturn NULL;\n\n\tspace = isl_space_params(isl_space_copy(array->space));\n\tguard = isl_set_universe(space);\n\n\tfor (i = 0; i < array->n_index; ++i) {\n\t\tisl_pw_aff *bound;\n\t\tisl_set *guard_i, *zero;\n\n\t\tbound = isl_multi_pw_aff_get_pw_aff(array->bound, i);\n\t\tguard_i = isl_pw_aff_nonneg_set(isl_pw_aff_copy(bound));\n\t\tzero = isl_pw_aff_zero_set(bound);\n\t\tguard_i = isl_set_subtract(guard_i, zero);\n\t\tguard = isl_set_intersect(guard, guard_i);\n\t}\n\n\treturn guard;\n}\n\n/* Internal data structure for extract_size_of_type.\n * \"type\" specifies the name of the space that we want to extract.\n * \"res\" is used to store the subset of that space.\n */\nstruct ppcg_extract_size_data {\n\tconst char *type;\n\tisl_set *res;\n};\n\n/* This function is called for each set in a union_set.\n * If the name of the set matches data->type, we store the\n * set in data->res.\n */\nstatic isl_stat extract_size_of_type(__isl_take isl_set *size, void *user)\n{\n\tstruct ppcg_extract_size_data *data = user;\n\tconst char *name;\n\n\tname = isl_set_get_tuple_name(size);\n\tif (name && !strcmp(name, data->type)) {\n\t\tdata->res = size;\n\t\treturn isl_stat_error;\n\t}\n\n\tisl_set_free(size);\n\treturn isl_stat_ok;\n}\n\n/* Given a union map { kernel[i] -> *[...] 
},\n * return the range in the space called \"type\" for the kernel with\n * sequence number \"id\".\n */\nstatic __isl_give isl_set *extract_sizes(__isl_keep isl_union_map *sizes,\n\tconst char *type, int id)\n{\n\tisl_space *space;\n\tisl_set *dom;\n\tisl_union_set *local_sizes;\n\tstruct ppcg_extract_size_data data = { type, NULL };\n\n\tif (!sizes)\n\t\treturn NULL;\n\n\tspace = isl_union_map_get_space(sizes);\n\tspace = isl_space_set_from_params(space);\n\tspace = isl_space_add_dims(space, isl_dim_set, 1);\n\tspace = isl_space_set_tuple_name(space, isl_dim_set, \"kernel\");\n\tdom = isl_set_universe(space);\n\tdom = isl_set_fix_si(dom, isl_dim_set, 0, id);\n\n\tlocal_sizes = isl_union_set_apply(isl_union_set_from_set(dom),\n\t\t\t\t\tisl_union_map_copy(sizes));\n\tisl_union_set_foreach_set(local_sizes, &extract_size_of_type, &data);\n\tisl_union_set_free(local_sizes);\n\treturn data.res;\n}\n\n/* Given a singleton set, extract the first (at most *len) elements\n * of the single integer tuple into *sizes and update *len if needed.\n *\n * If \"set\" is NULL, then the \"sizes\" array is not updated.\n */\nstatic isl_stat read_sizes_from_set(__isl_take isl_set *set, int *sizes,\n\tint *len)\n{\n\tint i;\n\tint dim;\n\n\tif (!set)\n\t\treturn isl_stat_ok;\n\n\tdim = isl_set_dim(set, isl_dim_set);\n\tif (dim < *len)\n\t\t*len = dim;\n\n\tfor (i = 0; i < *len; ++i) {\n\t\tisl_val *v;\n\n\t\tv = isl_set_plain_get_val_if_fixed(set, isl_dim_set, i);\n\t\tif (!v)\n\t\t\tgoto error;\n\t\tsizes[i] = isl_val_get_num_si(v);\n\t\tisl_val_free(v);\n\t}\n\n\tisl_set_free(set);\n\treturn isl_stat_ok;\nerror:\n\tisl_set_free(set);\n\treturn isl_stat_error;\n}\n\n/* Add the map { kernel[id] -> type[sizes] } to gen->used_sizes,\n * if the option debug->dump_sizes is set.\n */\nstatic void set_used_sizes(struct gpu_gen *gen, const char *type, int id,\n\tint *sizes, int len)\n{\n\tint i;\n\tisl_space *space;\n\tisl_map *map;\n\n\tif 
(!gen->options->debug->dump_sizes)\n\t\treturn;\n\n\tspace = isl_union_map_get_space(gen->used_sizes);\n\tspace = isl_space_set_from_params(space);\n\tspace = isl_space_add_dims(space, isl_dim_set, 1);\n\tspace = isl_space_set_tuple_name(space, isl_dim_set, \"kernel\");\n\tspace = isl_space_from_domain(space);\n\tspace = isl_space_add_dims(space, isl_dim_out, len);\n\tspace = isl_space_set_tuple_name(space, isl_dim_out, type);\n\n\tmap = isl_map_universe(space);\n\tmap = isl_map_fix_si(map, isl_dim_in, 0, id);\n\tfor (i = 0; i < len; ++i)\n\t\tmap = isl_map_fix_si(map, isl_dim_out, i, sizes[i]);\n\n\tgen->used_sizes = isl_union_map_add_map(gen->used_sizes, map);\n}\n\n/* Extract user specified \"tile\" sizes from the \"sizes\" command line option,\n * defaulting to option->tile_size in each dimension.\n * *tile_len contains the maximum number of tile sizes needed.\n * Update *tile_len to the number of specified tile sizes, if any, and\n * return a pointer to the tile sizes (or NULL on error).\n * Add the effectively used sizes to gen->used_sizes.\n */\nstatic int *read_tile_sizes(struct gpu_gen *gen, int *tile_len)\n{\n\tint n;\n\tint *tile_size;\n\tisl_set *size;\n\n\ttile_size = isl_alloc_array(gen->ctx, int, *tile_len);\n\tif (!tile_size)\n\t\treturn NULL;\n\tfor (n = 0; n < *tile_len; ++n)\n\t\ttile_size[n] = gen->options->tile_size;\n\n\tsize = extract_sizes(gen->sizes, \"tile\", gen->kernel_id);\n\tif (read_sizes_from_set(size, tile_size, tile_len) < 0)\n\t\tgoto error;\n\tset_used_sizes(gen, \"tile\", gen->kernel_id, tile_size, *tile_len);\n\n\treturn tile_size;\nerror:\n\tfree(tile_size);\n\treturn NULL;\n}\n\n/* Extract user specified \"block\" sizes from the \"sizes\" command line option,\n * after filling in some potentially useful defaults.\n */\nstatic isl_stat read_block_sizes(struct ppcg_kernel *kernel,\n\t__isl_keep isl_union_map *sizes)\n{\n\tisl_set *size;\n\n\tif (kernel->n_block > 3)\n\t\tkernel->n_block = 3;\n\tswitch (kernel->n_block) 
{\n\tcase 1:\n\t\tkernel->block_dim[0] = 512;\n\t\tbreak;\n\tcase 2:\n\t\tkernel->block_dim[0] = 32;\n\t\tkernel->block_dim[1] = 16;\n\t\tbreak;\n\tdefault:\n\t\tkernel->block_dim[0] = 32;\n\t\tkernel->block_dim[1] = 4;\n\t\tkernel->block_dim[2] = 4;\n\t\tbreak;\n\t}\n\n\tsize = extract_sizes(sizes, \"block\", kernel->id);\n\treturn read_sizes_from_set(size, kernel->block_dim, &kernel->n_block);\n}\n\n/* Extract user specified \"grid\" sizes from the \"sizes\" command line option,\n * after filling in some potentially useful defaults.\n */\nstatic isl_stat read_grid_sizes(struct ppcg_kernel *kernel,\n\t__isl_keep isl_union_map *sizes)\n{\n\tisl_set *size;\n\n\tif (kernel->n_grid > 2)\n\t\tkernel->n_grid = 2;\n\tswitch (kernel->n_grid) {\n\tcase 1:\n\t\tkernel->grid_dim[0] = 32768;\n\t\tbreak;\n\tdefault:\n\t\tkernel->grid_dim[0] = 256;\n\t\tkernel->grid_dim[1] = 256;\n\t\tbreak;\n\t}\n\n\tsize = extract_sizes(sizes, \"grid\", kernel->id);\n\treturn read_sizes_from_set(size, kernel->grid_dim, &kernel->n_grid);\n}\n\n/* Extract user specified grid and block sizes from the gen->sizes\n * command line option after filling in some potentially useful defaults.\n * Store the extracted sizes in \"kernel\".\n * Add the effectively used sizes to gen->used_sizes.\n */\nstatic isl_stat read_grid_and_block_sizes(struct ppcg_kernel *kernel,\n\tstruct gpu_gen *gen)\n{\n\tif (read_block_sizes(kernel, gen->sizes) < 0)\n\t\treturn isl_stat_error;\n\tif (read_grid_sizes(kernel, gen->sizes) < 0)\n\t\treturn isl_stat_error;\n\tset_used_sizes(gen, \"block\", kernel->id,\n\t\t\t\t\t    kernel->block_dim, kernel->n_block);\n\tset_used_sizes(gen, \"grid\", kernel->id,\n\t\t\t\t\t    kernel->grid_dim, kernel->n_grid);\n\treturn isl_stat_ok;\n}\n\nstatic void *free_stmts(struct gpu_stmt *stmts, int n)\n{\n\tint i;\n\n\tif (!stmts)\n\t\treturn NULL;\n\n\tfor (i = 0; i < n; ++i) {\n\t\tstruct gpu_stmt_access *access, *next;\n\n\t\tfor (access = stmts[i].accesses; access; access = next) 
{\n\t\t\tnext = access->next;\n\t\t\tisl_id_free(access->ref_id);\n\t\t\tisl_map_free(access->access);\n\t\t\tisl_map_free(access->tagged_access);\n\t\t\tfree(access);\n\t\t}\n\n\t\tisl_id_free(stmts[i].id);\n\t}\n\tfree(stmts);\n\n\treturn NULL;\n}\n\n/* Add parameters p[i] with identifiers \"ids\" to \"set\",\n * with bounds to 0 <= p[i] < size[i].\n */\n__isl_give isl_set *add_bounded_parameters(__isl_take isl_set *set,\n\tint *size, __isl_keep isl_id_list *ids)\n{\n\tint i, len;\n\tunsigned nparam;\n\n\tlen = isl_id_list_n_id(ids);\n\tnparam = isl_set_dim(set, isl_dim_param);\n\tset = isl_set_add_dims(set, isl_dim_param, len);\n\n\tfor (i = 0; i < len; ++i) {\n\t\tisl_id *id;\n\n\t\tid = isl_id_list_get_id(ids, i);\n\t\tset = isl_set_set_dim_id(set, isl_dim_param, nparam + i, id);\n\t\tset = isl_set_lower_bound_si(set, isl_dim_param, nparam + i, 0);\n\t\tset = isl_set_upper_bound_si(set, isl_dim_param,\n\t\t\t\t\t    nparam + i, size[i] - 1);\n\t}\n\n\treturn set;\n}\n\n/* Add \"len\" parameters p[i] with identifiers \"ids\" and intersect \"set\"\n * with\n *\n *\t{ : 0 <= p[i] < size[i] }\n *\n * or an overapproximation.\n */\nstatic __isl_give isl_set *add_bounded_parameters_dynamic(\n\t__isl_take isl_set *set, __isl_keep isl_multi_pw_aff *size,\n\t__isl_keep isl_id_list *ids)\n{\n\tint i, len;\n\tunsigned nparam;\n\tisl_space *space;\n\tisl_local_space *ls;\n\n\tlen = isl_multi_pw_aff_dim(size, isl_dim_out);\n\tnparam = isl_set_dim(set, isl_dim_param);\n\tset = isl_set_add_dims(set, isl_dim_param, len);\n\n\tfor (i = 0; i < len; ++i) {\n\t\tisl_id *id;\n\n\t\tid = isl_id_list_get_id(ids, i);\n\t\tset = isl_set_set_dim_id(set, isl_dim_param, nparam + i, id);\n\t}\n\n\tspace = isl_space_params(isl_set_get_space(set));\n\tls = isl_local_space_from_space(space);\n\tfor (i = 0; i < len; ++i) {\n\t\tisl_pw_aff *param, *size_i, *zero;\n\t\tisl_set *bound;\n\n\t\tparam = isl_pw_aff_var_on_domain(isl_local_space_copy(ls),\n\t\t\t\t\t\tisl_dim_param, nparam + 
i);\n\n\t\tsize_i = isl_multi_pw_aff_get_pw_aff(size, i);\n\t\tbound = isl_pw_aff_lt_set(isl_pw_aff_copy(param), size_i);\n\t\tbound = isl_set_from_basic_set(isl_set_simple_hull(bound));\n\t\tset = isl_set_intersect_params(set, bound);\n\n\t\tzero = isl_pw_aff_zero_on_domain(isl_local_space_copy(ls));\n\t\tbound = isl_pw_aff_ge_set(param, zero);\n\t\tset = isl_set_intersect_params(set, bound);\n\t}\n\tisl_local_space_free(ls);\n\n\treturn set;\n}\n\n/* Return the union of all tagged access relations in the group.\n */\nstatic __isl_give isl_union_map *group_tagged_access_relation(\n\tstruct gpu_array_ref_group *group)\n{\n\tint i;\n\tisl_union_map *access;\n\n\taccess = isl_union_map_empty(isl_map_get_space(group->access));\n\tfor (i = 0; i < group->n_ref; ++i) {\n\t\tisl_map *map_i;\n\n\t\tmap_i = isl_map_copy(group->refs[i]->tagged_access);\n\t\taccess = isl_union_map_union(access,\n\t\t\t\t\t    isl_union_map_from_map(map_i));\n\t}\n\n\treturn access;\n}\n\n/* Return the extent of \"array\", recomputed from the bounds.\n * The recomputed extent may be simpler than the original extent.\n */\nstatic __isl_give isl_set *array_extent(struct gpu_array_info *array)\n{\n\tint i;\n\tisl_id *id;\n\tisl_space *space;\n\tisl_local_space *ls;\n\tisl_set *extent;\n\n\tid = isl_set_get_tuple_id(array->extent);\n\tspace = isl_set_get_space(array->extent);\n\textent = isl_set_universe(isl_space_copy(space));\n\tls = isl_local_space_from_space(space);\n\tfor (i = 0; i < array->n_index; ++i) {\n\t\tisl_pw_aff *bound;\n\t\tisl_aff *aff;\n\t\tisl_pw_aff *index;\n\t\tisl_set *lt;\n\n\t\textent = isl_set_lower_bound_si(extent, isl_dim_set, i, 0);\n\n\t\taff = isl_aff_var_on_domain(isl_local_space_copy(ls),\n\t\t\t\t\t\tisl_dim_set, i);\n\t\tindex = isl_pw_aff_from_aff(aff);\n\t\tbound = isl_multi_pw_aff_get_pw_aff(array->bound, i);\n\t\tbound = isl_pw_aff_from_range(bound);\n\t\tbound = isl_pw_aff_add_dims(bound, isl_dim_in, array->n_index);\n\t\tbound = 
isl_pw_aff_set_tuple_id(bound, isl_dim_in,\n\t\t\t\t\t\tisl_id_copy(id));\n\t\tlt = isl_pw_aff_lt_set(index, bound);\n\t\textent = isl_set_intersect(extent, lt);\n\t}\n\tisl_local_space_free(ls);\n\tisl_id_free(id);\n\n\treturn extent;\n}\n\n/* Return a map from the first group->shared_tile->depth dimensions\n * of the computed schedule to the array tile in\n * global memory that corresponds to the shared memory copy.\n *\n * In particular, return a map\n *\n *\t{ D[i] -> A[a] }\n *\n * with constraints\n *\n *\ttile_offset(i) <= a <= tile_offset(i) + tile_size - 1\t\t(1)\n *\n * and\n *\n *\t0 <= a <= array_size - 1\t\t\t\t\t(2)\n *\n * Note that if some stride has been detected (i.e., when\n * group->shared_tile->bound[i].shift is set), then a in (1) refers\n * to the shifted and scaled down version.\n *\n * Constraints (1) are obtained by mapping the size constraints on the\n * shared/private memory tile back to the access relation.\n * Constraints (2) are obtained from the (recomputed) extent.\n */\nstatic __isl_give isl_map *group_tile(struct gpu_array_ref_group *group)\n{\n\tint i;\n\tint n_index = group->array->n_index;\n\tisl_map *tile;\n\tisl_space *space;\n\tisl_set *local;\n\tisl_set *extent;\n\n\tspace = isl_multi_aff_get_space(group->shared_tile->tiling);\n\tspace = isl_space_range(space);\n\tlocal = isl_set_universe(space);\n\tfor (i = 0; i < n_index; ++i) {\n\t\tisl_val *bound;\n\n\t\tlocal = isl_set_lower_bound_si(local, isl_dim_set, i, 0);\n\t\tbound = isl_val_copy(group->shared_tile->bound[i].size);\n\t\tbound = isl_val_sub_ui(bound, 1);\n\t\tlocal = isl_set_upper_bound_val(local, isl_dim_set, i, bound);\n\t}\n\tlocal = isl_set_preimage_multi_aff(local,\n\t\t\t\tisl_multi_aff_copy(group->shared_tile->tiling));\n\ttile = isl_set_unwrap(local);\n\textent = array_extent(group->array);\n\ttile = isl_map_intersect_range(tile, extent);\n\n\treturn tile;\n}\n\n/* Given a mapping \"iterator_map\" from the AST schedule to a domain,\n * return the 
corresponding mapping from the AST schedule\n * to the outer kernel->copy_schedule_dim dimensions of\n * the schedule computed by PPCG for this kernel.\n *\n * Note that kernel->copy_schedule_dim is at least as large as\n * the largest depth of any array reference group associated to the kernel.\n * This is needed as the returned schedule is used to extract a mapping\n * to the outer tile->depth dimensions in transform_index.\n */\nstatic __isl_give isl_pw_multi_aff *compute_sched_to_copy(\n\tstruct ppcg_kernel *kernel, __isl_take isl_pw_multi_aff *iterator_map)\n{\n\tisl_union_pw_multi_aff *upma;\n\tisl_pw_multi_aff *pma;\n\tisl_space *space;\n\n\tspace = isl_space_range(isl_pw_multi_aff_get_space(iterator_map));\n\tspace = isl_space_from_domain(space);\n\tspace = isl_space_add_dims(space, isl_dim_out,\n\t\t\t\t\tkernel->copy_schedule_dim);\n\n\tupma = isl_union_pw_multi_aff_copy(kernel->copy_schedule);\n\tpma = isl_union_pw_multi_aff_extract_pw_multi_aff(upma, space);\n\tisl_union_pw_multi_aff_free(upma);\n\n\treturn isl_pw_multi_aff_pullback_pw_multi_aff(pma, iterator_map);\n}\n\n/* If max_shared_memory is not set to infinity (-1), then make\n * sure that the total amount of shared memory required by the\n * array reference groups mapped to shared memory by \"kernel\"\n * is no larger than this maximum.\n *\n * We apply a greedy approach and discard (keep in global memory)\n * those groups that would result in a total memory size that\n * is larger than the maximum.\n *\n * This function should be called after any function that may\n * affect the decision on whether to place a reference group\n * in private, shared or global memory.\n */\nstatic void check_shared_memory_bound(struct ppcg_kernel *kernel)\n{\n\tint i, j;\n\tisl_val *left, *size;\n\n\tif (kernel->options->max_shared_memory < 0)\n\t\treturn;\n\n\tleft = isl_val_int_from_si(kernel->ctx,\n\t\t\t\t    kernel->options->max_shared_memory);\n\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct 
gpu_local_array_info *local = &kernel->array[i];\n\n\t\tfor (j = 0; j < local->n_group; ++j) {\n\t\t\tstruct gpu_array_ref_group *group;\n\t\t\tenum ppcg_group_access_type type;\n\n\t\t\tgroup = local->groups[j];\n\t\t\ttype = gpu_array_ref_group_type(group);\n\t\t\tif (type != ppcg_access_shared)\n\t\t\t\tcontinue;\n\n\t\t\tsize = gpu_array_tile_size(group->shared_tile);\n\t\t\tsize = isl_val_mul_ui(size, local->array->size);\n\n\t\t\tif (isl_val_le(size, left)) {\n\t\t\t\tleft = isl_val_sub(left, size);\n\t\t\t\tcontinue;\n\t\t\t}\n\t\t\tisl_val_free(size);\n\n\t\t\tgroup->shared_tile =\n\t\t\t\t\tgpu_array_tile_free(group->shared_tile);\n\t\t}\n\t}\n\n\tisl_val_free(left);\n}\n\n/* Mark all arrays of \"kernel\" that have an array reference group\n * that is not mapped to private or shared memory as\n * accessing the corresponding global device memory.\n */\nstatic void mark_global_arrays(struct ppcg_kernel *kernel)\n{\n\tint i, j;\n\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct gpu_local_array_info *local = &kernel->array[i];\n\n\t\tif (local->global)\n\t\t\tcontinue;\n\t\tfor (j = 0; j < local->n_group; ++j) {\n\t\t\tif (gpu_array_ref_group_tile(local->groups[j]))\n\t\t\t\tcontinue;\n\n\t\t\tlocal->global = 1;\n\t\t\tlocal->array->global = 1;\n\t\t\tbreak;\n\t\t}\n\t}\n}\n\n/* Compute a tiling for all the array reference groups in \"kernel\".\n */\nstatic void compute_group_tilings(struct ppcg_kernel *kernel)\n{\n\tint i, j;\n\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct gpu_local_array_info *array = &kernel->array[i];\n\n\t\tfor (j = 0; j < array->n_group; ++j)\n\t\t\tgpu_array_ref_group_compute_tiling(array->groups[j]);\n\t}\n}\n\n/* Compute the effective grid size as a list of the sizes in each dimension.\n *\n * The grid size specified by the user or set by default\n * in read_grid_sizes() and applied by the block filter,\n * may be too large for the given code in the sense that\n * it may contain blocks that don't need to execute 
anything.\n * We therefore don't return this grid size, but instead the\n * smallest grid size that ensures that all blocks that actually\n * execute code are included in the grid.\n *\n * We first extract a description of the grid, i.e., the possible values\n * of the block ids, from the domain elements in \"domain\" and\n * kernel->block_filter.\n * The block ids are parameters in kernel->block_filter.\n * We simply need to change them into set dimensions.\n *\n * Then, for each block dimension, we compute the maximal value of the block id\n * and add one.\n */\nstatic __isl_give isl_multi_pw_aff *extract_grid_size(\n\tstruct ppcg_kernel *kernel, __isl_take isl_union_set *domain)\n{\n\tint i;\n\tisl_set *grid;\n\tisl_set *context;\n\tisl_multi_pw_aff *size;\n\n\tdomain = isl_union_set_intersect(domain,\n\t\t\t\t    isl_union_set_copy(kernel->block_filter));\n\tgrid = isl_union_set_params(domain);\n\tgrid = isl_set_from_params(grid);\n\tgrid = isl_set_add_dims(grid, isl_dim_set, kernel->n_grid);\n\tfor (i = 0; i < kernel->n_grid; ++i) {\n\t\tint pos;\n\t\tisl_id *id;\n\n\t\tif (!grid)\n\t\t\treturn NULL;\n\n\t\tid = isl_id_list_get_id(kernel->block_ids, i);\n\t\tpos = isl_set_find_dim_by_id(grid, isl_dim_param, id);\n\t\tisl_id_free(id);\n\t\tif (pos < 0)\n\t\t\tisl_die(isl_set_get_ctx(grid), isl_error_internal,\n\t\t\t\t\"missing constraints on block identifier\",\n\t\t\t\tgrid = isl_set_free(grid));\n\t\tgrid = isl_set_equate(grid, isl_dim_param, pos, isl_dim_set, i);\n\t\tgrid = isl_set_project_out(grid, isl_dim_param, pos, 1);\n\t}\n\n\tgrid = isl_set_coalesce(grid);\n\tsize = ppcg_size_from_extent(grid);\n\tcontext = isl_set_params(isl_set_copy(kernel->context));\n\treturn isl_multi_pw_aff_gist(size, context);\n}\n\n/* Compute the size of a fixed bounding box around the origin and \"set\",\n * where \"set\" is assumed to contain only non-negative elements,\n * and store the results in \"size\".\n * In particular, compute the maximal value of \"set\" in each 
direction\n * and add one.\n */\nstatic void extract_fixed_size(__isl_take isl_set *set, int *size)\n{\n\tint i, n;\n\tisl_local_space *ls;\n\tisl_aff *obj;\n\n\tn = isl_set_dim(set, isl_dim_set);\n\tls = isl_local_space_from_space(isl_set_get_space(set));\n\tobj = isl_aff_zero_on_domain(ls);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_val *max;\n\n\t\tobj = isl_aff_set_coefficient_si(obj, isl_dim_in, i, 1);\n\t\tmax = isl_set_max_val(set, obj);\n\t\tsize[i] = isl_val_get_num_si(max) + 1;\n\t\tisl_val_free(max);\n\t\tobj = isl_aff_set_coefficient_si(obj, isl_dim_in, i, 0);\n\t}\n\tisl_aff_free(obj);\n\tisl_set_free(set);\n}\n\n/* Compute the effective block size as a list of the sizes in each dimension\n * and store the sizes in kernel->block_dim.\n *\n * The block size specified by the user or set by default\n * in read_block_sizes() and applied by the thread filter,\n * may be too large for the given code in the sense that\n * it may contain threads that don't need to execute anything.\n * We therefore update this block size in kernel->block_dim\n * to the smallest block size that ensures that all threads\n * that actually execute code are included in the block.\n *\n * The set of possible values of the thread ids is obtained from\n * the domain elements \"domain\" and kernel->thread_filter.\n * The current implementation eliminates all parameters, ensuring\n * that the size is a fixed constant in each dimension.\n * In principle we could also compute parametric sizes.\n * We would have to make sure to project out all b%d and t%d parameters,\n * however.\n */\nstatic isl_stat extract_block_size(struct ppcg_kernel *kernel,\n\t__isl_take isl_union_set *domain)\n{\n\tint i;\n\tint nparam;\n\tisl_set *block;\n\n\tdomain = isl_union_set_intersect(domain,\n\t\t\t\t    isl_union_set_copy(kernel->thread_filter));\n\tblock = isl_union_set_params(domain);\n\tblock = isl_set_from_params(block);\n\tblock = isl_set_add_dims(block, isl_dim_set, kernel->n_block);\n\tfor (i = 0; i < 
kernel->n_block; ++i) {\n\t\tint pos;\n\t\tisl_id *id;\n\n\t\tif (!block)\n\t\t\treturn isl_stat_error;\n\n\t\tid = isl_id_list_get_id(kernel->thread_ids, i);\n\t\tpos = isl_set_find_dim_by_id(block, isl_dim_param, id);\n\t\tisl_id_free(id);\n\t\tif (pos < 0)\n\t\t\tisl_die(isl_set_get_ctx(block), isl_error_internal,\n\t\t\t\t\"missing constraints on thread identifier\",\n\t\t\t\tblock = isl_set_free(block));\n\t\tblock = isl_set_equate(block, isl_dim_param, pos,\n\t\t\t\t\tisl_dim_set, i);\n\t}\n\tnparam = isl_set_dim(block, isl_dim_param);\n\tblock = isl_set_project_out(block, isl_dim_param, 0, nparam);\n\n\tif (!block)\n\t\treturn isl_stat_error;\n\n\textract_fixed_size(block, kernel->block_dim);\n\n\treturn isl_stat_ok;\n}\n\nstruct ppcg_kernel *ppcg_kernel_free(struct ppcg_kernel *kernel)\n{\n\tint i, j;\n\n\tif (!kernel)\n\t\treturn NULL;\n\n\tisl_id_list_free(kernel->block_ids);\n\tisl_id_list_free(kernel->thread_ids);\n\tisl_multi_pw_aff_free(kernel->grid_size);\n\tisl_ast_expr_free(kernel->grid_size_expr);\n\tisl_set_free(kernel->context);\n\tisl_union_set_free(kernel->core);\n\tisl_union_set_free(kernel->arrays);\n\tisl_union_pw_multi_aff_free(kernel->contraction);\n\tisl_union_set_free(kernel->expanded_domain);\n\tisl_space_free(kernel->space);\n\tisl_ast_node_free(kernel->tree);\n\tisl_union_set_free(kernel->block_filter);\n\tisl_union_set_free(kernel->thread_filter);\n\tisl_union_pw_multi_aff_free(kernel->copy_schedule);\n\tisl_union_set_free(kernel->sync_writes);\n\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct gpu_local_array_info *array = &kernel->array[i];\n\n\t\tfor (j = 0; j < array->n_group; ++j)\n\t\t\tgpu_array_ref_group_free(array->groups[j]);\n\t\tfree(array->groups);\n\n\t\tisl_multi_pw_aff_free(array->bound);\n\t\tisl_ast_expr_free(array->bound_expr);\n\t}\n\tfree(kernel->array);\n\n\tfor (i = 0; i < kernel->n_var; ++i) 
{\n\t\tfree(kernel->var[i].name);\n\t\tisl_vec_free(kernel->var[i].size);\n\t}\n\tfree(kernel->var);\n\n\tfree(kernel);\n\n\treturn NULL;\n}\n\n/* Wrapper around ppcg_kernel_free for use as a isl_id_set_free_user callback.\n */\nstatic void ppcg_kernel_free_wrap(void *user)\n{\n\tstruct ppcg_kernel *kernel = user;\n\n\tppcg_kernel_free(kernel);\n}\n\nstatic void create_kernel_var(isl_ctx *ctx, struct gpu_array_ref_group *group,\n\tstruct ppcg_kernel_var *var)\n{\n\tint j;\n\tstruct gpu_array_tile *tile;\n\tisl_printer *p;\n\n\tvar->array = group->array;\n\n\tvar->type = gpu_array_ref_group_type(group);\n\ttile = gpu_array_ref_group_tile(group);\n\n\tp = isl_printer_to_str(ctx);\n\tp = gpu_array_ref_group_print_name(group, p);\n\tvar->name = isl_printer_get_str(p);\n\tisl_printer_free(p);\n\n\tvar->size = isl_vec_alloc(ctx, group->array->n_index);\n\n\tfor (j = 0; j < group->array->n_index; ++j)\n\t\tvar->size = isl_vec_set_element_val(var->size, j,\n\t\t\t\t\t    isl_val_copy(tile->bound[j].size));\n}\n\nstatic isl_stat create_kernel_vars(struct ppcg_kernel *kernel)\n{\n\tint i, j, n;\n\n\tn = 0;\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct gpu_local_array_info *array = &kernel->array[i];\n\n\t\tfor (j = 0; j < array->n_group; ++j) {\n\t\t\tstruct gpu_array_ref_group *group = array->groups[j];\n\t\t\tenum ppcg_group_access_type type;\n\n\t\t\ttype = gpu_array_ref_group_type(group);\n\t\t\tif (type != ppcg_access_global)\n\t\t\t\t++n;\n\t\t}\n\t}\n\n\tkernel->var = isl_calloc_array(kernel->ctx, struct ppcg_kernel_var, n);\n\tif (!kernel->var)\n\t\treturn isl_stat_error;\n\tkernel->n_var = n;\n\n\tn = 0;\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct gpu_local_array_info *array = &kernel->array[i];\n\n\t\tfor (j = 0; j < array->n_group; ++j) {\n\t\t\tstruct gpu_array_ref_group *group = array->groups[j];\n\t\t\tenum ppcg_group_access_type type;\n\n\t\t\ttype = gpu_array_ref_group_type(group);\n\t\t\tif (type == 
ppcg_access_global)\n\t\t\t\tcontinue;\n\t\t\tcreate_kernel_var(kernel->ctx, group, &kernel->var[n]);\n\t\t\t++n;\n\t\t}\n\t}\n\n\treturn isl_stat_ok;\n}\n\n/* Replace \"pa\" by the zero function defined over the universe domain\n * in the space of \"pa\".\n */\nstatic __isl_give isl_pw_aff *set_universally_zero(__isl_take isl_pw_aff *pa)\n{\n\tisl_space *space;\n\tisl_aff *zero;\n\n\tspace = isl_space_domain(isl_pw_aff_get_space(pa));\n\tisl_pw_aff_free(pa);\n\tzero = isl_aff_zero_on_domain(isl_local_space_from_space(space));\n\n\treturn isl_pw_aff_from_aff(zero);\n}\n\n/* The sizes of the arrays on the host that have been computed by\n * extract_array_info may depend on the parameters.  Use the extra\n * constraints on the parameters that are valid at \"host_domain\"\n * to simplify these expressions and store the results in kernel->array.\n *\n * We only need these localized bounds for arrays that are accessed\n * by the current kernel.  If we have found at least one reference group\n * then the array is accessed by the kernel.\n *\n * The resulting sizes may be functions that are nowhere defined\n * in case the access function cannot possibly access anything inside\n * the kernel for some reason.  If so, they are replaced by the zero\n * function.  
Since the access function cannot actually access anything,\n * there is no harm in printing the array sizes as zero.\n */\nstatic void localize_bounds(struct ppcg_kernel *kernel,\n\t__isl_keep isl_set *host_domain)\n{\n\tint i, j;\n\tisl_set *context;\n\n\tcontext = isl_set_copy(host_domain);\n\tcontext = isl_set_params(context);\n\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct gpu_local_array_info *local = &kernel->array[i];\n\t\tisl_multi_pw_aff *bound;\n\t\tint n_index;\n\n\t\tif (local->n_group == 0)\n\t\t\tcontinue;\n\n\t\tn_index = local->array->n_index;\n\t\tbound = isl_multi_pw_aff_copy(local->array->bound);\n\n\t\tfor (j = 0; j < n_index; ++j) {\n\t\t\tisl_pw_aff *pwaff;\n\t\t\tint empty;\n\n\t\t\tpwaff = isl_multi_pw_aff_get_pw_aff(bound, j);\n\t\t\tpwaff = isl_pw_aff_gist(pwaff, isl_set_copy(context));\n\t\t\tempty = isl_pw_aff_is_empty(pwaff);\n\t\t\tif (empty < 0)\n\t\t\t\tpwaff = isl_pw_aff_free(pwaff);\n\t\t\telse if (empty)\n\t\t\t\tpwaff = set_universally_zero(pwaff);\n\t\t\tbound = isl_multi_pw_aff_set_pw_aff(bound, j, pwaff);\n\t\t}\n\n\t\tlocal->n_index = n_index;\n\t\tlocal->bound = bound;\n\t}\n\tisl_set_free(context);\n}\n\n/* Create the array of gpu_local_array_info structures \"array\"\n * inside \"kernel\".  
The number of elements in this array is\n * the same as the number of arrays in \"prog\".\n * Initialize the \"array\" field of each local array to point\n * to the corresponding array in \"prog\".\n */\nstatic struct ppcg_kernel *ppcg_kernel_create_local_arrays(\n\tstruct ppcg_kernel *kernel, struct gpu_prog *prog)\n{\n\tint i;\n\tisl_ctx *ctx;\n\n\tif (!kernel)\n\t\treturn NULL;\n\n\tctx = isl_set_get_ctx(prog->context);\n\tkernel->array = isl_calloc_array(ctx,\n\t\t\t    struct gpu_local_array_info, prog->n_array);\n\tif (!kernel->array)\n\t\treturn ppcg_kernel_free(kernel);\n\tkernel->n_array = prog->n_array;\n\n\tfor (i = 0; i < prog->n_array; ++i)\n\t\tkernel->array[i].array = &prog->array[i];\n\n\treturn kernel;\n}\n\n/* Does \"kernel\" need to be passed an argument corresponding to array \"i\"?\n *\n * The argument is only needed if the kernel accesses this device memory.\n */\nint ppcg_kernel_requires_array_argument(struct ppcg_kernel *kernel, int i)\n{\n\treturn kernel->array[i].global;\n}\n\n/* Find the element in prog->stmts that has the given \"id\".\n * Return NULL if no such gpu_stmt can be found.\n */\nstatic struct gpu_stmt *find_stmt(struct gpu_prog *prog, __isl_keep isl_id *id)\n{\n\tint i;\n\n\tfor (i = 0; i < prog->n_stmts; ++i) {\n\t\tif (id == prog->stmts[i].id)\n\t\t\tbreak;\n\t}\n\n\treturn i < prog->n_stmts ? 
&prog->stmts[i] : NULL;\n}\n\nvoid ppcg_kernel_stmt_free(void *user)\n{\n\tstruct ppcg_kernel_stmt *stmt = user;\n\n\tif (!stmt)\n\t\treturn;\n\n\tswitch (stmt->type) {\n\tcase ppcg_kernel_copy:\n\t\tisl_ast_expr_free(stmt->u.c.index);\n\t\tisl_ast_expr_free(stmt->u.c.local_index);\n\t\tbreak;\n\tcase ppcg_kernel_domain:\n\t\tisl_id_to_ast_expr_free(stmt->u.d.ref2expr);\n\t\tbreak;\n\tcase ppcg_kernel_sync:\n\t\tbreak;\n\t}\n\n\tfree(stmt);\n}\n\n/* Return the gpu_stmt_access in the list \"accesses\" that corresponds\n * to \"ref_id\".\n */\nstatic struct gpu_stmt_access *find_access(struct gpu_stmt_access *accesses,\n\t__isl_keep isl_id *ref_id)\n{\n\tstruct gpu_stmt_access *access;\n\n\tfor (access = accesses; access; access = access->next)\n\t\tif (access->ref_id == ref_id)\n\t\t\treturn access;\n\n\treturn NULL;\n}\n\n/* Return the index of the array called \"name\" in the list of arrays.\n */\nstatic int find_array_index(struct ppcg_kernel *kernel, const char *name)\n{\n\tint i;\n\n\tfor (i = 0; i < kernel->n_array; ++i)\n\t\tif (!strcmp(name, kernel->array[i].array->name))\n\t\t\treturn i;\n\n\treturn -1;\n}\n\n/* Internal data structure for the index and AST expression transformation\n * callbacks for pet_stmt_build_ast_exprs.\n *\n * \"kernel\" is the kernel for which we are computing AST expressions and\n * may be NULL if we are not inside a kernel.\n * \"accesses\" is the list of gpu_stmt_access in the statement.\n * \"iterator_map\" expresses the statement iterators in terms of\n * the AST loop iterators.\n * \"sched2copy\" expresses the outer copy_schedule_dim dimensions of\n * the kernel schedule in terms of the AST loop iterators and\n * may be NULL if we are not inside a kernel.\n *\n * The following fields are set in transform_index and used in transform_expr.\n * \"array\" is the array that is being accessed.\n * \"global\" is set if the global array is accessed (rather than\n * shared/private memory).\n * \"local_array\" refers to information on the 
array specialized\n * to the current kernel.\n */\nstruct ppcg_transform_data {\n\tstruct ppcg_kernel *kernel;\n\tstruct gpu_stmt_access *accesses;\n\tisl_pw_multi_aff *iterator_map;\n\tisl_pw_multi_aff *sched2copy;\n\n\tstruct gpu_array_info *array;\n\tint global;\n\tstruct gpu_local_array_info *local_array;\n};\n\n/* Return a pointer to the gpu_array_ref_group in \"local\"\n * that contains the reference \"access\".\n * Return NULL if no such group can be found.\n */\nstatic struct gpu_array_ref_group *find_ref_group(\n\tstruct gpu_local_array_info *local, struct gpu_stmt_access *access)\n{\n\tint i, j;\n\n\tfor (i = 0; i < local->n_group; ++i) {\n\t\tstruct gpu_array_ref_group *group = local->groups[i];\n\n\t\tfor (j = 0; j < group->n_ref; ++j)\n\t\t\tif (group->refs[j] == access)\n\t\t\t\treturn group;\n\t}\n\n\treturn NULL;\n}\n\n/* Given an index expression \"index\" of the form\n *\n *\tL -> F(A),\n *\n * with F(A) either A or some subfield of A and L the AST loop iterators,\n * and a tiling \"tiling\" of the form\n *\n *\t[L -> A] -> T\n *\n * apply the tiling to the outer array in the index expression to obtain\n *\n *\tL -> T(A)\n *\n * If F(A) is some subfield of A, then separate the member access\n * into the base index expression and the field index expression,\n * apply the tiling to the base index expression and combine the result\n * with the field index expression.\n *\n * If F(A) is A, then modify index to keep track of the iterators\n *\n *\tL -> [L -> A]\n *\n * and combine the result with the tiling to obtain a tiled index expression\n * in terms of the AST loop iterators\n *\n *\tL -> T\n */\nstatic __isl_give isl_multi_pw_aff *tile_outer(\n\t__isl_take isl_multi_pw_aff *index, __isl_take isl_multi_pw_aff *tiling)\n{\n\tisl_bool is_wrapping;\n\tisl_space *space;\n\tisl_multi_pw_aff *mpa;\n\n\tis_wrapping = isl_multi_pw_aff_range_is_wrapping(index);\n\tif (is_wrapping < 0)\n\t\tgoto error;\n\tif (is_wrapping) {\n\t\tisl_multi_pw_aff 
*field;\n\n\t\tfield = isl_multi_pw_aff_copy(index);\n\t\tfield = isl_multi_pw_aff_range_factor_range(field);\n\t\tindex = isl_multi_pw_aff_range_factor_domain(index);\n\t\tindex = tile_outer(index, tiling);\n\t\treturn isl_multi_pw_aff_range_product(index, field);\n\t}\n\n\tspace = isl_space_domain(isl_multi_pw_aff_get_space(index));\n\tspace = isl_space_map_from_set(space);\n\tmpa = isl_multi_pw_aff_identity(space);\n\tindex = isl_multi_pw_aff_range_product(mpa, index);\n\tindex = isl_multi_pw_aff_pullback_multi_pw_aff(tiling, index);\n\n\treturn index;\nerror:\n\tisl_multi_pw_aff_free(index);\n\tisl_multi_pw_aff_free(tiling);\n\treturn NULL;\n}\n\n/* Index transformation callback for pet_stmt_build_ast_exprs.\n *\n * \"index\" expresses the array indices in terms of statement iterators\n *\n * We first reformulate \"index\" in terms of the AST loop iterators.\n * Then we check if we are accessing the global array or\n * a shared/private copy.  In particular, if we are not inside a kernel\n * then we must be accessing a global array.\n * In the former case, we simply return\n * the updated index.  
If \"index\" is an affine expression rather\n * than an array access, then we also return the updated index here.\n *\n * If no reference groups have been computed for the array,\n * then we can only be accessing the global array.\n *\n * Otherwise, we apply the tiling to the index.\n * This tiling is of the form\n *\n *\t[D -> A] -> T\n *\n * where D corresponds to the outer tile->depth dimensions of\n * the kernel schedule.\n * The index is of the form\n *\n *\tL -> A\n *\n * We update the tiling to refer to the AST loop iterators\n *\n *\t[L -> A] -> T\n *\n * and combine it with the index to obtain a tiled index expression in terms\n * of the AST loop iterators\n *\n *\tL -> T\n *\n * Note that while the tiling applies directly to an outer array,\n * the index may refer to some subfield of this outer array.\n * In such cases, the result will refer to the same subfield of the tile.\n * That is, an index expression of the form L -> F(A) will be transformed\n * into an index expression of the form L -> F(T).\n */\nstatic __isl_give isl_multi_pw_aff *transform_index(\n\t__isl_take isl_multi_pw_aff *index, __isl_keep isl_id *ref_id,\n\tvoid *user)\n{\n\tstruct ppcg_transform_data *data = user;\n\tstruct gpu_stmt_access *access;\n\tstruct gpu_array_ref_group *group;\n\tstruct gpu_array_tile *tile;\n\tisl_pw_multi_aff *iterator_map;\n\tint i;\n\tint dim;\n\tconst char *name;\n\tisl_space *space;\n\tisl_multi_pw_aff *tiling;\n\tisl_pw_multi_aff *pma;\n\tisl_pw_multi_aff *sched2depth;\n\n\tdata->array = NULL;\n\n\titerator_map = isl_pw_multi_aff_copy(data->iterator_map);\n\tindex = isl_multi_pw_aff_pullback_pw_multi_aff(index, iterator_map);\n\n\tif (!data->kernel)\n\t\treturn index;\n\n\taccess = find_access(data->accesses, ref_id);\n\tif (!access)\n\t\treturn index;\n\tif (!isl_map_has_tuple_name(access->access, isl_dim_out))\n\t\treturn index;\n\n\tname = get_outer_array_name(access->access);\n\tif (!name)\n\t\treturn isl_multi_pw_aff_free(index);\n\ti = 
find_array_index(data->kernel, name);\n\tif (i < 0)\n\t\tisl_die(isl_multi_pw_aff_get_ctx(index), isl_error_internal,\n\t\t\t\"cannot find array\",\n\t\t\treturn isl_multi_pw_aff_free(index));\n\tdata->local_array = &data->kernel->array[i];\n\tdata->array = data->local_array->array;\n\n\tgroup = find_ref_group(data->local_array, access);\n\tif (!group) {\n\t\tdata->global = 1;\n\t\treturn index;\n\t}\n\n\ttile = gpu_array_ref_group_tile(group);\n\tdata->global = !tile;\n\tif (!tile)\n\t\treturn index;\n\n\tspace = isl_space_domain(isl_multi_aff_get_space(tile->tiling));\n\tspace = isl_space_range(isl_space_unwrap(space));\n\tspace = isl_space_map_from_set(space);\n\tpma = isl_pw_multi_aff_identity(space);\n\tsched2depth = isl_pw_multi_aff_copy(data->sched2copy);\n\tdim = isl_pw_multi_aff_dim(sched2depth, isl_dim_out);\n\tsched2depth = isl_pw_multi_aff_drop_dims(sched2depth, isl_dim_out,\n\t\t\t\t\t    tile->depth, dim - tile->depth);\n\tpma = isl_pw_multi_aff_product(sched2depth, pma);\n\ttiling = isl_multi_pw_aff_from_multi_aff(\n\t\t\t\t    isl_multi_aff_copy(tile->tiling));\n\ttiling = isl_multi_pw_aff_pullback_pw_multi_aff(tiling, pma);\n\n\tindex = tile_outer(index, tiling);\n\n\treturn index;\n}\n\n/* Dereference \"expr\" by adding an index [0].\n * The original \"expr\" is assumed not to have any indices.\n *\n * If \"expr\" is a member access, then the dereferencing needs\n * to be applied to the structure argument of this member access.\n */\nstatic __isl_give isl_ast_expr *dereference(__isl_take isl_ast_expr *expr)\n{\n\tisl_ctx *ctx;\n\tisl_ast_expr *arg0, *res;\n\tisl_ast_expr_list *list;\n\n\targ0 = isl_ast_expr_get_op_arg(expr, 0);\n\tif (!arg0)\n\t\treturn isl_ast_expr_free(expr);\n\tif (isl_ast_expr_get_type(arg0) == isl_ast_expr_op &&\n\t    isl_ast_expr_get_op_type(arg0) == isl_ast_op_member) {\n\t\tisl_ast_expr *arg;\n\n\t\targ = isl_ast_expr_get_op_arg(arg0, 0);\n\t\targ = dereference(arg);\n\t\targ0 = isl_ast_expr_set_op_arg(arg0, 0, 
arg);\n\t\texpr = isl_ast_expr_set_op_arg(expr, 0, arg0);\n\n\t\treturn expr;\n\t}\n\tisl_ast_expr_free(arg0);\n\n\tctx = isl_ast_expr_get_ctx(expr);\n\tres = isl_ast_expr_from_val(isl_val_zero(ctx));\n\tlist = isl_ast_expr_list_from_ast_expr(res);\n\tres = isl_ast_expr_get_op_arg(expr, 0);\n\tres = isl_ast_expr_access(res, list);\n\tisl_ast_expr_free(expr);\n\n\treturn res;\n}\n\n/* Linearize the index expression \"expr\" based on the array bounds\n * of \"array\".\n *\n * That is, transform expression\n *\n *\tA[i_0][i_1]...[i_n]\n *\n * to\n *\n *\tA[(..((i_0 * b_1 + i_1) ... ) * b_n + i_n]\n *\n * where b_0, b_1, ..., b_n are the bounds on the array.\n *\n * If the base of \"expr\" is a member access, then the linearization needs\n * to be applied to the structure argument of this member access.\n *\n * In the base case, if \"expr\" has no arguments (other than the name of\n * the array), then we are passing an entire array to a function.\n * In this case, there is nothing to linearize.\n * Note that at this point an expression with no arguments can\n * only be an entire array because the scalar case and\n * the case of single struct are handled by the caller.\n *\n * If the number of specified index expressions in \"expr\"\n * is smaller than the dimension of the accessed array,\n * then the missing i_j also do not appear in the linearized expression.\n * Furthermore, since such an expression does not refer to a single\n * element while the default linearized expression would refer to\n * a single element, we return the expression\n *\n *\tA + (..((i_0 * b_1 + i_1) ... ) * b_l + i_l)\n *\n * instead.  
Note that because of the special case handling above,\n * we can assume here that there is at least one index expression.\n */\n__isl_give isl_ast_expr *gpu_local_array_info_linearize_index(\n\tstruct gpu_local_array_info *array, __isl_take isl_ast_expr *expr)\n{\n\tint i, n;\n\tisl_ast_expr *arg0;\n\tisl_ast_expr *res;\n\tisl_ast_expr_list *list;\n\n\targ0 = isl_ast_expr_get_op_arg(expr, 0);\n\tif (isl_ast_expr_get_type(arg0) == isl_ast_expr_op &&\n\t    isl_ast_expr_get_op_type(arg0) == isl_ast_op_member) {\n\t\tisl_ast_expr *arg;\n\n\t\targ = isl_ast_expr_get_op_arg(arg0, 0);\n\t\targ = gpu_local_array_info_linearize_index(array, arg);\n\t\targ0 = isl_ast_expr_set_op_arg(arg0, 0, arg);\n\t\texpr = isl_ast_expr_set_op_arg(expr, 0, arg0);\n\n\t\treturn expr;\n\t}\n\tisl_ast_expr_free(arg0);\n\n\tif (isl_ast_expr_get_op_n_arg(expr) == 1)\n\t\treturn expr;\n\n\tn = isl_ast_expr_get_op_n_arg(expr);\n\tres = isl_ast_expr_get_op_arg(expr, 1);\n\tfor (i = 1; i < array->n_index; ++i) {\n\t\tisl_ast_expr *expr_i;\n\n\t\texpr_i = isl_ast_expr_get_op_arg(array->bound_expr, 1 + i);\n\t\tres = isl_ast_expr_mul(res, expr_i);\n\n\t\tif (i + 1 >= n)\n\t\t\tcontinue;\n\t\texpr_i = isl_ast_expr_get_op_arg(expr, i + 1);\n\t\tres = isl_ast_expr_add(res, expr_i);\n\t}\n\n\tif (1 + array->n_index > n) {\n\t\tres = isl_ast_expr_add(isl_ast_expr_get_op_arg(expr, 0), res);\n\t} else {\n\t\tlist = isl_ast_expr_list_from_ast_expr(res);\n\t\tres = isl_ast_expr_get_op_arg(expr, 0);\n\t\tres = isl_ast_expr_access(res, list);\n\t}\n\n\tisl_ast_expr_free(expr);\n\n\treturn res;\n}\n\n/* AST expression transformation callback for pet_stmt_build_ast_exprs.\n *\n * If the AST expression refers to an array that is not accessed\n * at all, then this means the value of the expression is not used,\n * so we might as well print zero (NULL pointer) instead.\n *\n * If the AST expression refers to a global scalar that is not\n * a read-only scalar, then its address was passed to the kernel and\n * we 
need to dereference it.\n *\n * If the AST expression refers to an access to a global array,\n * then we linearize the access exploiting the bounds in data->local_array.\n */\nstatic __isl_give isl_ast_expr *transform_expr(__isl_take isl_ast_expr *expr,\n\t__isl_keep isl_id *id, void *user)\n{\n\tstruct ppcg_transform_data *data = user;\n\n\tif (!data->array)\n\t\treturn expr;\n\tif (!data->array->accessed) {\n\t\tisl_ctx *ctx;\n\n\t\tctx = isl_ast_expr_get_ctx(expr);\n\t\tisl_ast_expr_free(expr);\n\t\treturn isl_ast_expr_from_val(isl_val_zero(ctx));\n\t}\n\tif (gpu_array_is_read_only_scalar(data->array))\n\t\treturn expr;\n\tif (!data->global)\n\t\treturn expr;\n\tif (data->array->n_index == 0)\n\t\treturn dereference(expr);\n\tif (!data->array->linearize)\n\t\treturn expr;\n\n\treturn gpu_local_array_info_linearize_index(data->local_array, expr);\n}\n\n/* This function is called for each instance of a user statement\n * in the kernel \"kernel\", identified by \"gpu_stmt\".\n * \"kernel\" may be NULL if we are not inside a kernel.\n *\n * We attach a struct ppcg_kernel_stmt to the \"node\", containing\n * a computed AST expression for each access, through an annotation\n * with name \"user\".\n * These AST expressions are computed from iterator_map,\n * which expresses the domain\n * elements in terms of the generated loops, and sched2copy,\n * which expresses the outer copy_schedule_dim dimensions of\n * the kernel schedule computed by PPCG in terms of the generated loops.\n */\nstatic __isl_give isl_ast_node *create_domain_leaf(\n\tstruct ppcg_kernel *kernel, __isl_take isl_ast_node *node,\n\t__isl_keep isl_ast_build *build, struct gpu_stmt *gpu_stmt)\n{\n\tstruct ppcg_transform_data data;\n\tstruct ppcg_kernel_stmt *stmt;\n\tisl_ctx *ctx;\n\tisl_id *id;\n\tisl_pw_multi_aff *sched2copy;\n\tisl_map *map;\n\tisl_pw_multi_aff *iterator_map;\n\tisl_union_map *schedule;\n\n\tif (!node)\n\t\treturn NULL;\n\tctx = isl_ast_node_get_ctx(node);\n\n\tstmt = 
isl_calloc_type(ctx, struct ppcg_kernel_stmt);\n\tif (!stmt)\n\t\treturn isl_ast_node_free(node);\n\n\tschedule = isl_ast_build_get_schedule(build);\n\tmap = isl_map_reverse(isl_map_from_union_map(schedule));\n\titerator_map = isl_pw_multi_aff_from_map(map);\n\tif (kernel)\n\t\tsched2copy = compute_sched_to_copy(kernel,\n\t\t\t\t\tisl_pw_multi_aff_copy(iterator_map));\n\telse\n\t\tsched2copy = NULL;\n\n\tstmt->type = ppcg_kernel_domain;\n\tstmt->u.d.stmt = gpu_stmt;\n\n\tdata.kernel = kernel;\n\tdata.accesses = stmt->u.d.stmt->accesses;\n\tdata.iterator_map = iterator_map;\n\tdata.sched2copy = sched2copy;\n\tstmt->u.d.ref2expr = pet_stmt_build_ast_exprs(stmt->u.d.stmt->stmt,\n\t\t\t\t\t    build, &transform_index, &data,\n\t\t\t\t\t    &transform_expr, &data);\n\n\tisl_pw_multi_aff_free(iterator_map);\n\tisl_pw_multi_aff_free(sched2copy);\n\n\tid = isl_id_alloc(ctx, \"user\", stmt);\n\tid = isl_id_set_free_user(id, &ppcg_kernel_stmt_free);\n\tif (!id)\n\t\tppcg_kernel_stmt_free(stmt);\n\treturn isl_ast_node_set_annotation(node, id);\n}\n\n/* This function is called for each statement node in the AST\n * for copying to or from shared/private memory.\n * Attach a pointer to a ppcg_kernel_stmt representing the copy\n * statement to the node.\n * The statement name is \"read\" or \"write\", depending on whether we are\n * reading from global memory or writing to global memory.\n *\n * The schedule is of the form\n *\n *\ttype[D -> A] -> L\n *\n * where D corresponds to the outer tile->depth dimensions of\n * the kernel schedule, A to the global array and L to the outer\n * generated AST schedule.\n * We compute the inverse and strip off the type, resulting in\n *\n *\tL -> [D -> A]\n *\n * We combine this mapping with on the one hand the projection\n *\n *\t[D -> A] -> A\n *\n * and on the other hand the group tiling\n *\n *\t[D -> A] -> T\n *\n * resulting in\n *\n *\tL -> A\t\tand \tL -> T\n *\n * and store the corresponding expressions in stmt->index and 
stmt->local_index,\n * where stmt points to the ppcg_kernel_stmt that is attached to the node.\n * stmt->index is linearized if the global memory array is linearized.\n */\nstatic __isl_give isl_ast_node *create_access_leaf(struct ppcg_kernel *kernel,\n\tstruct gpu_array_ref_group *group, __isl_take isl_ast_node *node,\n\t__isl_keep isl_ast_build *build)\n{\n\tstruct ppcg_kernel_stmt *stmt;\n\tstruct gpu_array_tile *tile;\n\tisl_id *id;\n\tisl_ast_expr *expr;\n\tisl_space *space;\n\tisl_map *access;\n\tisl_pw_multi_aff *pma, *pma2;\n\tconst char *type;\n\n\tstmt = isl_calloc_type(kernel->ctx, struct ppcg_kernel_stmt);\n\tif (!stmt)\n\t\treturn isl_ast_node_free(node);\n\n\taccess = isl_map_from_union_map(isl_ast_build_get_schedule(build));\n\ttype = isl_map_get_tuple_name(access, isl_dim_in);\n\tstmt->u.c.read = type && !strcmp(type, \"read\");\n\taccess = isl_map_reverse(access);\n\tpma = isl_pw_multi_aff_from_map(access);\n\tpma = isl_pw_multi_aff_reset_tuple_id(pma, isl_dim_out);\n\n\tspace = isl_space_range(isl_pw_multi_aff_get_space(pma));\n\tspace = isl_space_unwrap(space);\n\tpma2 = isl_pw_multi_aff_range_map(space);\n\tpma2 = isl_pw_multi_aff_pullback_pw_multi_aff(pma2,\n\t\t\t\t\t\t    isl_pw_multi_aff_copy(pma));\n\texpr = isl_ast_build_access_from_pw_multi_aff(build, pma2);\n\tif (group->array->linearize)\n\t\texpr = gpu_local_array_info_linearize_index(group->local_array,\n\t\t\t\t\t\t\t    expr);\n\tstmt->u.c.index = expr;\n\n\ttile = gpu_array_ref_group_tile(group);\n\tpma2 = isl_pw_multi_aff_from_multi_aff(\n\t\t\t\t\t    isl_multi_aff_copy(tile->tiling));\n\tpma2 = isl_pw_multi_aff_pullback_pw_multi_aff(pma2, pma);\n\texpr = isl_ast_build_access_from_pw_multi_aff(build, pma2);\n\tstmt->u.c.local_index = expr;\n\n\tstmt->u.c.array = group->array;\n\tstmt->u.c.local_array = group->local_array;\n\tstmt->type = ppcg_kernel_copy;\n\n\tid = isl_id_alloc(kernel->ctx, \"copy\", stmt);\n\tid = isl_id_set_free_user(id, &ppcg_kernel_stmt_free);\n\tif 
(!id)\n\t\tppcg_kernel_stmt_free(stmt);\n\treturn isl_ast_node_set_annotation(node, id);\n}\n\n/* Create a synchronization ppcg_kernel_stmt and\n * attach it to the node \"node\" representing the synchronization.\n */\nstatic __isl_give isl_ast_node *create_sync_leaf(\n\tstruct ppcg_kernel *kernel, __isl_take isl_ast_node *node,\n\t__isl_keep isl_ast_build *build)\n{\n\tstruct ppcg_kernel_stmt *stmt;\n\tisl_id *id;\n\n\tstmt = isl_calloc_type(kernel->ctx, struct ppcg_kernel_stmt);\n\tif (!stmt)\n\t\treturn isl_ast_node_free(node);\n\n\tstmt->type = ppcg_kernel_sync;\n\tid = isl_id_alloc(kernel->ctx, \"sync\", stmt);\n\tid = isl_id_set_free_user(id, &ppcg_kernel_stmt_free);\n\tif (!id)\n\t\tppcg_kernel_stmt_free(stmt);\n\treturn isl_ast_node_set_annotation(node, id);\n}\n\n/* Build AST expressions for the device array sizes of all arrays in \"prog\"\n * that require allocation on the device using \"build\", as well as\n * for the original array sizes of all arrays that need to be declared\n * on the host.\n * \"node\" is freed in case of error.\n */\nstatic __isl_give isl_ast_node *build_array_bounds(\n\t__isl_take isl_ast_node *node, struct gpu_prog *prog,\n\t__isl_keep isl_ast_build *build)\n{\n\tint i;\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tstruct gpu_array_info *array = &prog->array[i];\n\t\tisl_multi_pw_aff *size;\n\t\tisl_ast_expr *expr;\n\n\t\tif (!gpu_array_requires_device_allocation(array))\n\t\t\tcontinue;\n\n\t\tsize = isl_multi_pw_aff_copy(array->bound);\n\t\texpr = ppcg_build_size_expr(size, build);\n\t\tarray->bound_expr = expr;\n\t\tif (!expr)\n\t\t\treturn isl_ast_node_free(node);\n\t}\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tstruct gpu_array_info *array = &prog->array[i];\n\t\tisl_set *extent;\n\t\tisl_multi_pw_aff *size;\n\t\tisl_ast_expr *expr;\n\n\t\tif (!array->declare_local)\n\t\t\tcontinue;\n\t\textent = isl_set_copy(array->declared_extent);\n\t\tsize = ppcg_size_from_extent(extent);\n\t\texpr = ppcg_build_size_expr(size, 
build);\n\t\tarray->declared_size = expr;\n\t\tif (!expr)\n\t\t\treturn isl_ast_node_free(node);\n\t}\n\n\treturn node;\n}\n\n/* Internal data structure for at_domain.\n *\n * \"prog\" represents the entire scop.\n * \"kernel\" points to the kernel to which the current schedule node\n * belongs.  It is set by before_mark and reset by after_mark.\n * It may be NULL if we are outside any kernel.\n */\nstruct ppcg_at_domain_data {\n\tstruct gpu_prog *prog;\n\tstruct ppcg_kernel *kernel;\n};\n\n/* This function is called for each instance of a user statement\n * in the kernel.  This may be one of the original user statements\n * or a statement introduced by PPCG.\n *\n * We first check if the statement id corresponds to a gpu statement,\n * which indicates the statement is an original user statement. Any statement\n * that is not an original user statement has been introduced by PPCG and\n * requires special handling.\n *\n * If the user statement is one of the original user statements, then we call\n * create_domain_leaf.  If it is \"init_device\", then we call\n * build_array_bounds.  Otherwise, we check if it is a copy or synchronization\n * statement and call the appropriate functions.  
Statements that copy an array\n * to/from the device do not need any further treatment.\n * Neither does \"clear_device\".\n */\nstatic __isl_give isl_ast_node *at_domain(__isl_take isl_ast_node *node,\n\t__isl_keep isl_ast_build *build, void *user)\n{\n\tstruct ppcg_at_domain_data *data = user;\n\tstruct gpu_stmt *gpu_stmt;\n\tisl_ast_expr *expr, *arg;\n\tisl_id *id;\n\tint is_sync;\n\tconst char *name;\n\tvoid *p;\n\n\texpr = isl_ast_node_user_get_expr(node);\n\targ = isl_ast_expr_get_op_arg(expr, 0);\n\tid = isl_ast_expr_get_id(arg);\n\tname = isl_id_get_name(id);\n\tp = isl_id_get_user(id);\n\tisl_ast_expr_free(expr);\n\tisl_ast_expr_free(arg);\n\n\tgpu_stmt = find_stmt(data->prog, id);\n\tis_sync = gpu_tree_id_is_sync(id, data->kernel);\n\tisl_id_free(id);\n\n\tif (gpu_stmt)\n\t\treturn create_domain_leaf(data->kernel, node, build, gpu_stmt);\n\n\tif (!prefixcmp(name, \"to_device_\") || !prefixcmp(name, \"from_device_\"))\n\t\treturn node;\n\tif (!strcmp(name, \"init_device\"))\n\t\treturn build_array_bounds(node, data->prog, build);\n\tif (!strcmp(name, \"clear_device\"))\n\t\treturn node;\n\tif (is_sync < 0)\n\t\treturn isl_ast_node_free(node);\n\tif (!strcmp(name, \"read\") || !strcmp(name, \"write\")) {\n\t\tstruct gpu_array_ref_group *group = p;\n\t\treturn create_access_leaf(data->kernel, group, node, build);\n\t}\n\tif (!is_sync)\n\t\tisl_die(data->prog->ctx, isl_error_internal,\n\t\t\t\"unknown statement type\",\n\t\t\treturn isl_ast_node_free(node));\n\treturn create_sync_leaf(data->kernel, node, build);\n}\n\n/* Given a set of wrapped references \"ref\", return the corresponding\n * access relations based on the tagged access relations \"tagged\".\n *\n * The elements of \"ref\" are of the form\n *\n *\t[D -> R]\n *\n * with D an iteration domain and R a reference.\n * The elements of \"tagged\" are of the form\n *\n *\t[D -> R] -> A\n *\n * with A an array.\n *\n * Extend \"tagged\" to include the iteration domain in the range, i.e.,\n *\n *\t
[D -> R] -> [D -> A]\n *\n * apply the result to \"ref\" and then unwrap the resulting set\n * to obtain relations of the form\n *\n *\tD -> A\n */\nstatic __isl_give isl_union_map *wrapped_reference_to_access(\n\t__isl_take isl_union_set *ref, __isl_take isl_union_map *tagged)\n{\n\tisl_union_map *tag2access;\n\n\ttag2access = isl_union_map_copy(tagged);\n\ttag2access = isl_union_map_universe(tag2access);\n\ttag2access = isl_union_set_unwrap(isl_union_map_domain(tag2access));\n\ttag2access = isl_union_map_domain_map(tag2access);\n\ttag2access = isl_union_map_range_product(tag2access, tagged);\n\n\tref = isl_union_set_coalesce(ref);\n\tref = isl_union_set_apply(ref, tag2access);\n\n\treturn isl_union_set_unwrap(ref);\n}\n\n/* Given an access relation \"access\" from one or more array reference groups,\n * remove those reads (if \"read\" is 1) or writes (if \"read\" is 0)\n * that are only needed to communicate data within\n * the same iteration of \"sched\".\n * The domain of \"sched\" corresponds to the original statement instances,\n * i.e., those that appear in the domains of the access relations.\n * \"tagged\" contains all tagged access relations to all\n * the array reference groups accessed by \"access\" from statement\n * instances scheduled by \"sched\".\n *\n * If the access is a read then it is either an element of\n *\n *\tlive_in union (range flow)\n *\n * where live_in and flow may be overapproximations, or\n * it reads an uninitialized value (that is not live-in because\n * there is an intermediate kill) or it reads a value that was\n * written within the same (compound) statement instance.\n * If the access is a write then it is either an element of\n *\n *\tlive_out union (domain flow)\n *\n * or it writes a value that is never read (and is not live-out\n * because of an intermediate kill) or only read\n * within the same (compound) statement instance.\n * In both cases, the access relation is also a subset of\n * the group access relation.\n *\n * The cases 
where an uninitialized value is read or a value is written\n * that is never read or where the dataflow occurs within a statement\n * instance are also considered local and may also be removed.\n *\n * Essentially, we compute the intersection of \"access\" with either\n *\n *\tlive_in union (range non-local-flow)\n *\n * or\n *\n *\tlive_out union (domain non-local-flow)\n *\n * We first construct a relation \"local\"\n *\n *\t[[D -> R] -> [D' -> R']]\n *\n * of pairs of domain iterations accessing the reference group\n * and references in the group that are coscheduled by \"sched\".\n *\n * If this relation does not intersect the dataflow dependences,\n * then there is nothing we can possibly remove, unless the dataflow\n * dependences themselves only relate a subset of the accesses.\n * In particular, the accesses may not be involved in any dataflow\n * dependences, either because they are uninitialized reads/dead writes\n * or because the dataflow occurs inside a statement instance.\n *\n * Since the computation below may break up the access relation\n * into smaller pieces, we only perform the intersection with\n * the non-local dependent accesses if the local pairs\n * intersect the dataflow dependences.  
Otherwise, we intersect\n * with the universe of the non-local dependent accesses.\n * This should at least remove accesses from statements that\n * do not participate in any dependences.\n *\n * In particular, we remove the \"local\" dataflow dependences from\n * the set of all dataflow dependences, or at least those\n * that may contribute to a domain/range that intersects\n * the domain of \"access\".\n * Note that if the potential dataflow dependences are an overapproximation\n * of the actual dataflow dependences, then the result remains an\n * overapproximation of the non-local dataflow dependences.\n * Copying to/from global memory is only needed for the references\n * in the domain/range of the result or for accesses that are live out/in\n * for the entire scop.\n *\n * We therefore map the domain/range of the \"external\" relation\n * to the corresponding access relation and take the union with\n * the live out/in relation.\n */\nstatic __isl_give isl_union_map *remove_local_accesses(\n\tstruct gpu_prog *prog, __isl_take isl_union_map *tagged,\n\t__isl_take isl_union_map *access, __isl_take isl_union_map *sched,\n\tint read)\n{\n\tint empty;\n\tisl_union_pw_multi_aff *tagger;\n\tisl_union_set *domain, *access_domain;\n\tisl_union_map *local, *external, *universe;\n\tisl_union_set *tag_set;\n\n\tif (isl_union_map_is_empty(access)) {\n\t\tisl_union_map_free(sched);\n\t\tisl_union_map_free(tagged);\n\t\treturn access;\n\t}\n\n\ttagger = isl_union_pw_multi_aff_copy(prog->scop->tagger);\n\tdomain = isl_union_map_domain(isl_union_map_copy(tagged));\n\ttagger = isl_union_pw_multi_aff_intersect_domain(tagger,\n\t\t\t\t\tisl_union_set_copy(domain));\n\tsched = isl_union_map_preimage_domain_union_pw_multi_aff(sched, tagger);\n\n\tlocal = isl_union_map_apply_range(sched,\n\t\t\t    isl_union_map_reverse(isl_union_map_copy(sched)));\n\tlocal = isl_union_map_intersect(local,\n\t\t\tisl_union_map_copy(prog->scop->tagged_dep_flow));\n\n\tempty = 
isl_union_map_is_empty(local);\n\n\texternal = isl_union_map_copy(prog->scop->tagged_dep_flow);\n\tuniverse = isl_union_map_universe(isl_union_map_copy(access));\n\taccess_domain = isl_union_map_domain(universe);\n\tdomain = isl_union_set_universe(domain);\n\tuniverse = isl_union_set_unwrap(domain);\n\tuniverse = isl_union_map_intersect_domain(universe, access_domain);\n\tdomain = isl_union_map_wrap(universe);\n\tif (read)\n\t\texternal = isl_union_map_intersect_range(external, domain);\n\telse\n\t\texternal = isl_union_map_intersect_domain(external, domain);\n\texternal = isl_union_map_intersect_params(external,\n\t\t\t\tisl_set_copy(prog->scop->context));\n\texternal = isl_union_map_subtract(external, local);\n\n\tif (read) {\n\t\ttag_set = isl_union_map_range(external);\n\t\texternal = wrapped_reference_to_access(tag_set, tagged);\n\t\texternal = isl_union_map_union(external,\n\t\t\t\tisl_union_map_copy(prog->scop->live_in));\n\t} else {\n\t\ttag_set = isl_union_map_domain(external);\n\t\texternal = wrapped_reference_to_access(tag_set, tagged);\n\t\texternal = isl_union_map_union(external,\n\t\t\t\tisl_union_map_copy(prog->scop->live_out));\n\t}\n\n\tif (empty < 0)\n\t\texternal = isl_union_map_free(external);\n\telse if (empty)\n\t\texternal = isl_union_map_universe(external);\n\n\taccess = isl_union_map_intersect(access, external);\n\n\treturn access;\n}\n\n/* Given an access relation \"access\" from \"group\", remove those reads\n * (if \"read\" is 1) or writes (if \"read\" is 0) that are only needed to\n * communicate data within the same iteration of the schedule \"prefix\"\n * at the position where the copying of the group is inserted.\n * That is, the number of output dimensions of \"prefix\"\n * is equal to tile->depth.\n * The domain of \"prefix\" corresponds to the original statement instances,\n * i.e., those that appear in the domains of the access relations.\n *\n * Extract the tagged access relation of \"group\" and\n * then call remove_local_accesses.\n 
*/\nstatic __isl_give isl_union_map *remove_local_accesses_group(\n\tstruct ppcg_kernel *kernel, struct gpu_array_ref_group *group,\n\t__isl_take isl_union_map *access, __isl_keep isl_union_map *prefix,\n\tint read)\n{\n\tisl_union_map *sched, *tagged;\n\n\tif (isl_union_map_is_empty(access))\n\t\treturn access;\n\n\ttagged = group_tagged_access_relation(group);\n\tsched = isl_union_map_copy(prefix);\n\n\treturn remove_local_accesses(kernel->prog, tagged, access, sched, read);\n}\n\n/* Build an access AST expression for the effective grid size using \"build\".\n * Store the result in kernel->grid_size_expr.\n */\nstatic isl_stat build_grid_size(struct ppcg_kernel *kernel,\n\t__isl_keep isl_ast_build *build)\n{\n\tisl_multi_pw_aff *size;\n\n\tsize = isl_multi_pw_aff_copy(kernel->grid_size);\n\tsize = isl_multi_pw_aff_set_tuple_name(size, isl_dim_out, \"grid\");\n\tkernel->grid_size_expr = ppcg_build_size_expr(size, build);\n\n\tif (!kernel->grid_size_expr)\n\t\treturn isl_stat_error;\n\treturn isl_stat_ok;\n}\n\n/* Build access AST expressions for the localized array sizes using \"build\".\n * Store the result in local->bound_expr.\n * Only do this for arrays for which localized bounds have been computed.\n */\nstatic isl_stat build_local_array_sizes(struct ppcg_kernel *kernel,\n\t__isl_keep isl_ast_build *build)\n{\n\tint i;\n\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct gpu_local_array_info *local = &kernel->array[i];\n\t\tisl_multi_pw_aff *size;\n\n\t\tif (local->n_group == 0)\n\t\t\tcontinue;\n\t\tsize = isl_multi_pw_aff_copy(local->bound);\n\t\tlocal->bound_expr = ppcg_build_size_expr(size, build);\n\t\tif (!local->bound_expr)\n\t\t\treturn isl_stat_error;\n\t}\n\n\treturn isl_stat_ok;\n}\n\n/* Build access AST expressions for the effective grid size and\n * the localized array sizes using \"build\".\n */\nstatic isl_stat build_grid_and_local_array_sizes(struct ppcg_kernel *kernel,\n\t__isl_keep isl_ast_build *build)\n{\n\tif (build_grid_size(kernel, 
build) < 0)\n\t\treturn isl_stat_error;\n\tif (build_local_array_sizes(kernel, build) < 0)\n\t\treturn isl_stat_error;\n\treturn isl_stat_ok;\n}\n\n/* This function is called before the AST generator starts traversing\n * the schedule subtree of a node with mark \"mark\".\n *\n * If the mark is called \"kernel\", store the kernel pointer in data->kernel\n * for use in at_domain and build AST expressions for the grid size and\n * the localized array sizes.\n */\nstatic isl_stat before_mark(__isl_keep isl_id *mark,\n\t__isl_keep isl_ast_build *build, void *user)\n{\n\tstruct ppcg_at_domain_data *data = user;\n\n\tif (!mark)\n\t\treturn isl_stat_error;\n\tif (!strcmp(isl_id_get_name(mark), \"kernel\")) {\n\t\tdata->kernel = isl_id_get_user(mark);\n\t\tif (build_grid_and_local_array_sizes(data->kernel, build) < 0)\n\t\t\treturn isl_stat_error;\n\t}\n\treturn isl_stat_ok;\n}\n\n/* This function is called after the AST generator has finished traversing\n * the schedule subtree of a mark node.  
\"node\" points to the corresponding\n * mark AST node.\n *\n * If the mark is called \"kernel\", then replace \"node\" by a user node\n * that \"calls\" the kernel, representing the launch of the kernel.\n * The original \"node\" is stored inside the kernel object so that\n * it can be used to print the device code.\n * Note that this assumes that a kernel is only launched once.\n * Also clear data->kernel.\n */\nstatic __isl_give isl_ast_node *after_mark(__isl_take isl_ast_node *node,\n        __isl_keep isl_ast_build *build, void *user)\n{\n\tisl_ctx *ctx;\n\tisl_id *id;\n\tisl_ast_expr *expr;\n\tisl_ast_expr_list *list;\n\tstruct ppcg_kernel *kernel;\n\tstruct ppcg_at_domain_data *data = user;\n\n\tctx = isl_ast_node_get_ctx(node);\n\tid = isl_ast_node_mark_get_id(node);\n\tif (!id)\n\t\treturn isl_ast_node_free(node);\n\tif (strcmp(isl_id_get_name(id), \"kernel\") || !data->kernel) {\n\t\tisl_id_free(id);\n\t\treturn node;\n\t}\n\tkernel = data->kernel;\n\tdata->kernel = NULL;\n\tkernel->space = isl_ast_build_get_schedule_space(build);\n\tkernel->tree = isl_ast_node_mark_get_node(node);\n\tisl_ast_node_free(node);\n\n\texpr = isl_ast_expr_from_id(isl_id_copy(id));\n\tlist = isl_ast_expr_list_alloc(ctx, 0);\n\texpr = isl_ast_expr_call(expr, list);\n\tnode = isl_ast_node_alloc_user(expr);\n\tnode = isl_ast_node_set_annotation(node, id);\n\n\treturn node;\n}\n\nstatic isl_bool update_depth(__isl_keep isl_schedule_node *node, void *user)\n{\n\tint *depth = user;\n\tint node_depth;\n\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_leaf)\n\t\treturn isl_bool_true;\n\tnode_depth = isl_schedule_node_get_schedule_depth(node);\n\tif (node_depth > *depth)\n\t\t*depth = node_depth;\n\n\treturn isl_bool_false;\n}\n\n/* Use isl to generate code for both the host and the device\n * from \"schedule\".\n * The device code is marked by \"kernel\" mark nodes in the schedule tree,\n * containing a pointer to a ppcg_kernel object.\n * The returned AST only contains 
the AST for the host code.\n * The ASTs for the device code are embedded in ppcg_kernel objects\n * attached to the leaf nodes that call \"kernel\".\n */\nstatic __isl_give isl_ast_node *generate_code(struct gpu_gen *gen,\n\t__isl_take isl_schedule *schedule)\n{\n\tstruct ppcg_at_domain_data data;\n\tisl_ast_build *build;\n\tisl_ast_node *tree;\n\tisl_id_list *iterators;\n\tint depth;\n\n\tdata.prog = gen->prog;\n\tdata.kernel = NULL;\n\n\tdepth = 0;\n\tif (isl_schedule_foreach_schedule_node_top_down(schedule, &update_depth,\n\t\t\t\t\t\t&depth) < 0)\n\t\tschedule = isl_schedule_free(schedule);\n\tbuild = isl_ast_build_alloc(gen->prog->ctx);\n\titerators = ppcg_scop_generate_names(gen->prog->scop, depth, \"c\");\n\tbuild = isl_ast_build_set_iterators(build, iterators);\n\tbuild = isl_ast_build_set_at_each_domain(build, &at_domain, &data);\n\tbuild = isl_ast_build_set_before_each_mark(build, &before_mark, &data);\n\tbuild = isl_ast_build_set_after_each_mark(build, &after_mark, &data);\n\tif (gen->prog->scop->options->debug->dump_final_schedule)\n\t\tisl_schedule_dump(schedule);\n\ttree = isl_ast_build_node_from_schedule(build, schedule);\n\tisl_ast_build_free(build);\n\n\treturn tree;\n}\n\n__isl_give isl_union_map *extract_sizes_from_str(isl_ctx *ctx, const char *str)\n{\n\tif (!str)\n\t\treturn NULL;\n\treturn isl_union_map_read_from_str(ctx, str);\n}\n\n/* Can \"node\" be tiled and then mapped to block and thread identifiers?\n * That is, is it permutable with at least one coincident dimension?\n */\nstatic isl_bool is_permutable(__isl_keep isl_schedule_node *node)\n{\n\tif (!node)\n\t\treturn isl_bool_error;\n\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_band)\n\t\treturn isl_bool_false;\n\tif (!isl_schedule_node_band_get_permutable(node))\n\t\treturn isl_bool_false;\n\tif (isl_schedule_node_band_n_member(node) < 1)\n\t\treturn isl_bool_false;\n\tif (!isl_schedule_node_band_member_get_coincident(node, 0))\n\t\treturn isl_bool_false;\n\n\treturn 
isl_bool_true;\n}\n\n/* Is \"node\" not a suitably permutable band?\n */\nstatic isl_bool not_permutable(__isl_keep isl_schedule_node *node, void *user)\n{\n\treturn isl_bool_not(is_permutable(node));\n}\n\n/* Does the subtree rooted at \"node\" have any suitably permutable band nodes?\n * That is, does it have any nodes that are permutable and that\n * have at least one coincident dimension?\n */\nstatic isl_bool subtree_has_permutable_bands(__isl_keep isl_schedule_node *node)\n{\n\tisl_bool all_non_permutable;\n\n\tall_non_permutable = isl_schedule_node_every_descendant(node,\n\t\t\t\t\t\t&not_permutable, NULL);\n\treturn isl_bool_not(all_non_permutable);\n}\n\n/* Does \"schedule\" contain any permutable band with at least one coincident\n * member?\n */\nstatic isl_bool has_any_permutable_node(__isl_keep isl_schedule *schedule)\n{\n\tisl_schedule_node *root;\n\tisl_bool any_permutable;\n\n\troot = isl_schedule_get_root(schedule);\n\tany_permutable = subtree_has_permutable_bands(root);\n\tisl_schedule_node_free(root);\n\n\treturn any_permutable;\n}\n\n/* Is \"node\" a candidate for mapping to block and thread identifiers?\n * In particular, is it permutable with at least one coincident dimension?\n * Alternatively, does the subtree rooted at \"node\" not contain\n * any such permutable node?  
Filter nodes are skipped in this case,\n * because a band node will be inserted in front of the returned\n * node and this is not possible for filter nodes that are children\n * of set or sequence nodes.\n */\nstatic int is_candidate(__isl_keep isl_schedule_node *node)\n{\n\tisl_bool permutable;\n\n\tif (isl_schedule_node_get_type(node) == isl_schedule_node_leaf)\n\t\treturn 1;\n\tpermutable = is_permutable(node);\n\tif (permutable < 0 || permutable)\n\t\treturn permutable;\n\tif (isl_schedule_node_get_type(node) == isl_schedule_node_filter)\n\t\treturn 0;\n\tpermutable = subtree_has_permutable_bands(node);\n\tif (permutable < 0)\n\t\treturn -1;\n\treturn !permutable;\n}\n\n/* Is \"node\" the outermost node in its branch that can be tiled\n * and then mapped to block and thread identifiers?\n * If there are no such nodes in the subtree at \"node\" and\n * if \"node\" is not a filter node, then it is accepted too.\n */\nstatic int is_outer_tilable(__isl_keep isl_schedule_node *node)\n{\n\tint tilable;\n\tisl_schedule_node *ancestor;\n\n\ttilable = is_candidate(node);\n\tif (tilable < 0)\n\t\treturn -1;\n\tif (!tilable)\n\t\treturn 0;\n\n\ttilable = 0;\n\tancestor = isl_schedule_node_copy(node);\n\twhile (isl_schedule_node_has_parent(ancestor)) {\n\t\tancestor = isl_schedule_node_parent(ancestor);\n\n\t\ttilable = is_candidate(ancestor);\n\t\tif (tilable < 0 || tilable)\n\t\t\tbreak;\n\t}\n\n\tisl_schedule_node_free(ancestor);\n\treturn tilable < 0 ? 
-1 : !tilable;\n}\n\n/* Collect the references to all writes in \"group\".\n * Each reference is represented by a universe set in a space\n *\n *\t[S[i,j] -> R[]]\n *\n * with S[i,j] the statement instance space and R[] the array reference.\n */\nstatic __isl_give isl_union_set *group_tagged_writes(\n\tstruct gpu_array_ref_group *group)\n{\n\tint i;\n\tisl_space *space;\n\tisl_union_set *writes;\n\n\tspace = isl_map_get_space(group->access);\n\twrites = isl_union_set_empty(space);\n\tfor (i = 0; i < group->n_ref; ++i) {\n\t\tisl_space *space;\n\t\tisl_set *writes_i;\n\n\t\tif (!group->refs[i]->write)\n\t\t\tcontinue;\n\n\t\tspace = isl_map_get_space(group->refs[i]->tagged_access);\n\t\tspace = isl_space_domain(space);\n\t\twrites_i = isl_set_universe(space);\n\t\twrites = isl_union_set_add_set(writes, writes_i);\n\t}\n\n\treturn writes;\n}\n\n/* Is there any write access in \"group\" that requires synchronization\n * on a write to global memory?\n * We currently take into account all writes that would require\n * synchronization at the thread level depth, but if the copying\n * for this group is performed at an outer level, then we do not\n * actually need to take into account dependences at intermediate levels.\n */\nstatic int any_sync_writes_in_group(struct ppcg_kernel *kernel,\n\tstruct gpu_array_ref_group *group)\n{\n\tisl_union_set *writes;\n\tint empty, disjoint;\n\n\tempty = isl_union_set_is_empty(kernel->sync_writes);\n\tif (empty < 0)\n\t\treturn -1;\n\tif (empty)\n\t\treturn 0;\n\n\twrites = group_tagged_writes(group);\n\tdisjoint = isl_union_set_is_disjoint(kernel->sync_writes, writes);\n\tisl_union_set_free(writes);\n\n\treturn disjoint < 0 ? 
-1 : !disjoint;\n}\n\n/* Collect the references to all writes in \"kernel\" that write directly\n * to global or shared memory, i.e., that are not mapped to private memory.\n * Each reference is represented by a universe set in a space\n *\n *\t[S[i,j] -> R[]]\n *\n * with S[i,j] the statement instance space and R[] the array reference.\n */\nstatic __isl_give isl_union_set *collect_non_private_tagged_writes(\n\tstruct ppcg_kernel *kernel)\n{\n\tisl_union_set *writes;\n\tint i, j;\n\n\twrites = isl_union_set_empty(isl_union_set_get_space(kernel->arrays));\n\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct gpu_local_array_info *array = &kernel->array[i];\n\n\t\tfor (j = 0; j < array->n_group; ++j) {\n\t\t\tstruct gpu_array_ref_group *group = array->groups[j];\n\t\t\tenum ppcg_group_access_type type;\n\t\t\tisl_union_set *writes_ij;\n\n\t\t\tif (!group->write)\n\t\t\t\tcontinue;\n\t\t\ttype = gpu_array_ref_group_type(group);\n\t\t\tif (type == ppcg_access_private)\n\t\t\t\tcontinue;\n\t\t\twrites_ij = group_tagged_writes(group);\n\t\t\twrites = isl_union_set_union(writes, writes_ij);\n\t\t}\n\t}\n\n\treturn writes;\n}\n\n/* Are there any direct writes to global or shared memory that require\n * synchronization?\n */\nstatic int any_global_or_shared_sync_writes(struct ppcg_kernel *kernel)\n{\n\tisl_union_set *writes;\n\tint empty, disjoint;\n\n\tempty = isl_union_set_is_empty(kernel->sync_writes);\n\tif (empty < 0)\n\t\treturn -1;\n\tif (empty)\n\t\treturn 0;\n\n\twrites = collect_non_private_tagged_writes(kernel);\n\tdisjoint = isl_union_set_is_disjoint(kernel->sync_writes, writes);\n\tisl_union_set_free(writes);\n\n\treturn disjoint < 0 ? 
-1 : !disjoint;\n}\n\n/* Construct an isl_multi_val for use as tile sizes for tiling \"node\"\n * from the elements in \"tile_size\".\n */\nstatic __isl_give isl_multi_val *construct_band_tiles_sizes(\n\t__isl_keep isl_schedule_node *node, int *tile_size)\n{\n\tisl_space *space;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tspace = isl_schedule_node_band_get_space(node);\n\treturn ppcg_multi_val_from_int_list(space, tile_size);\n}\n\n/* Replace the partial schedule S of the band node \"node\" by\n *\n *\tfloor(S/f)\n *\n * or\n *\n *\tf * floor(S/f)\n *\n * if scale_tile_loops is set, with f the integers in \"factor\".\n * The list that \"factor\" points to is assumed to contain at least\n * as many elements as the number of members in the band.\n */\nstatic __isl_give isl_schedule_node *snap_band_to_sizes(\n\t__isl_take isl_schedule_node *node, int *factor,\n\tstruct ppcg_options *options)\n{\n\tisl_multi_val *mv;\n\n\tmv = construct_band_tiles_sizes(node, factor);\n\tnode = isl_schedule_node_band_scale_down(node, isl_multi_val_copy(mv));\n\tif (options->scale_tile_loops)\n\t\tnode = isl_schedule_node_band_scale(node,\n\t\t\t\t\t\t\tisl_multi_val_copy(mv));\n\tisl_multi_val_free(mv);\n\n\treturn node;\n}\n\n/* Tile \"band\" with tile size specified by \"sizes\".\n *\n * Since the tile loops will be mapped to block ids, we forcibly\n * turn off tile loop scaling.  
We may want to enable tile loop scaling\n * at some later point, but then we would have to support the detection\n * of strides during the mapping to block ids.\n * Similarly, since the point loops will be mapped to thread ids,\n * we forcibly shift the point loops so that they start at zero.\n */\nstatic __isl_give isl_schedule_node *tile_band(\n\t__isl_take isl_schedule_node *node, __isl_take isl_multi_val *sizes)\n{\n\tisl_ctx *ctx = isl_schedule_node_get_ctx(node);\n\tint scale_tile;\n\tint shift_point;\n\n\tscale_tile = isl_options_get_tile_scale_tile_loops(ctx);\n\tisl_options_set_tile_scale_tile_loops(ctx, 0);\n\tshift_point = isl_options_get_tile_shift_point_loops(ctx);\n\tisl_options_set_tile_shift_point_loops(ctx, 1);\n\n\tnode = isl_schedule_node_band_tile(node, sizes);\n\n\tisl_options_set_tile_scale_tile_loops(ctx, scale_tile);\n\tisl_options_set_tile_shift_point_loops(ctx, shift_point);\n\n\treturn node;\n}\n\n/* Extract the set of parameter values and outer schedule dimensions\n * for which any statement instance\n * in the kernel inserted at \"node\" needs to be executed.\n * Intersect the set of parameter values derived from the host schedule\n * relation with the context of \"prog\".\n */\nstatic __isl_give isl_set *extract_context(__isl_keep isl_schedule_node *node,\n\tstruct gpu_prog *prog)\n{\n\tisl_union_map *schedule;\n\tisl_union_set *schedule_domain;\n\tisl_set *context;\n\tint empty;\n\n\tschedule = isl_schedule_node_get_prefix_schedule_relation(node);\n\tschedule_domain = isl_union_map_range(schedule);\n\tempty = isl_union_set_is_empty(schedule_domain);\n\tif (empty < 0) {\n\t\tisl_union_set_free(schedule_domain);\n\t\treturn NULL;\n\t}\n\tif (empty) {\n\t\tint depth;\n\t\tisl_space *space;\n\n\t\tspace = isl_union_set_get_space(schedule_domain);\n\t\tisl_union_set_free(schedule_domain);\n\t\tspace = isl_space_set_from_params(space);\n\t\tdepth = isl_schedule_node_get_schedule_depth(node);\n\t\tspace = isl_space_add_dims(space, 
isl_dim_set, depth);\n\t\tcontext = isl_set_empty(space);\n\t} else {\n\t\tcontext = isl_set_from_union_set(schedule_domain);\n\t}\n\tcontext = isl_set_intersect_params(context,\n\t\t\t\t\t    isl_set_copy(prog->context));\n\n\treturn context;\n}\n\n/* Return the set of outer array elements accessed\n * by the statement instances in \"domain\" in \"prog\".\n * The instances in \"domain\" are those that appear\n * in the domains of the access relations in \"prog\".\n */\nstatic __isl_give isl_union_set *accessed_by_domain(\n\t__isl_take isl_union_set *domain, struct gpu_prog *prog)\n{\n\tisl_union_map *access;\n\tisl_union_set *arrays;\n\n\taccess = isl_union_map_union(isl_union_map_copy(prog->read),\n\t\t\t\t     isl_union_map_copy(prog->may_write));\n\taccess = isl_union_map_intersect_domain(access, domain);\n\tarrays = isl_union_map_range(access);\n\tarrays = isl_union_set_apply(arrays,\n\t\t\t\tisl_union_map_copy(prog->to_outer));\n\n\treturn arrays;\n}\n\n/* Return the number of outer band members of the band node \"node\"\n * that are marked coincident.\n */\nstatic int n_outer_coincidence(__isl_keep isl_schedule_node *node)\n{\n\tint i, n;\n\n\tn = isl_schedule_node_band_n_member(node);\n\n\tfor (i = 0; i < n; ++i)\n\t\tif (!isl_schedule_node_band_member_get_coincident(node, i))\n\t\t\tbreak;\n\n\treturn i;\n}\n\n/* If the band node \"node\" has more than \"n\" members, then split off\n * the first \"n\" of them.\n */\nstatic __isl_give isl_schedule_node *split_band(\n\t__isl_take isl_schedule_node *node, int n)\n{\n\tint dim;\n\n\tdim = isl_schedule_node_band_n_member(node);\n\tif (n < dim)\n\t\tnode = isl_schedule_node_band_split(node, n);\n\n\treturn node;\n}\n\n/* Scale a band node that may have been split by split_band.\n * \"sizes\" are the scaling factors for the original node.\n * \"node\" either points to the original band node, or the outer\n * of the two pieces after splitting.\n *\n * If the number of elements in \"node\" is smaller than the 
number of\n * elements in \"sizes\", then some splitting has occurred and we split\n * \"sizes\" in the same way.\n */\nstatic __isl_give isl_schedule_node *scale_band(\n\t__isl_take isl_schedule_node *node, __isl_take isl_multi_val *sizes)\n{\n\tint n, dim;\n\n\tn = isl_multi_val_dim(sizes, isl_dim_set);\n\tdim = isl_schedule_node_band_n_member(node);\n\tif (n > dim) {\n\t\tisl_multi_val *sizes2;\n\n\t\tsizes2 = isl_multi_val_copy(sizes);\n\t\tsizes = isl_multi_val_drop_dims(sizes,\n\t\t\t\t\t\tisl_dim_set, dim, n - dim);\n\t\tsizes2 = isl_multi_val_drop_dims(sizes2, isl_dim_set, 0, dim);\n\t\tnode = isl_schedule_node_child(node, 0);\n\t\tnode = isl_schedule_node_band_scale(node, sizes2);\n\t\tnode = isl_schedule_node_parent(node);\n\t}\n\n\treturn isl_schedule_node_band_scale(node, sizes);\n}\n\n/* Return an isl_multi_aff, with as elements the parameters in \"space\"\n * that have the names specified by the elements in \"names\".\n * If (some of) these parameters do not already appear in \"space\",\n * then they are added first.\n */\nstatic __isl_give isl_multi_aff *parameter_vector(__isl_take isl_space *space,\n\t__isl_keep isl_id_list *names)\n{\n\tint i, n;\n\tisl_local_space *ls;\n\tisl_multi_aff *ma;\n\n\tif (!names)\n\t\tspace = isl_space_free(space);\n\n\tn = isl_id_list_n_id(names);\n\tfor (i = 0; i < n; ++i) {\n\t\tint pos;\n\t\tisl_id *id;\n\n\t\tid = isl_id_list_get_id(names, i);\n\t\tpos = isl_space_find_dim_by_id(space, isl_dim_param, id);\n\t\tif (pos >= 0) {\n\t\t\tisl_id_free(id);\n\t\t\tcontinue;\n\t\t}\n\t\tpos = isl_space_dim(space, isl_dim_param);\n\t\tspace = isl_space_add_dims(space, isl_dim_param, 1);\n\t\tspace = isl_space_set_dim_id(space, isl_dim_param, pos, id);\n\t}\n\tma = isl_multi_aff_zero(isl_space_copy(space));\n\tls = isl_local_space_from_space(isl_space_domain(space));\n\tfor (i = 0; i < n; ++i) {\n\t\tint pos;\n\t\tisl_id *id;\n\t\tisl_aff *aff;\n\n\t\tid = isl_id_list_get_id(names, i);\n\t\tpos = 
isl_space_find_dim_by_id(space, isl_dim_param, id);\n\t\tisl_id_free(id);\n\t\taff = isl_aff_var_on_domain(isl_local_space_copy(ls),\n\t\t\t\t\t    isl_dim_param, pos);\n\t\tma = isl_multi_aff_set_aff(ma, i, aff);\n\t}\n\tisl_local_space_free(ls);\n\n\treturn ma;\n}\n\n/* Return constraints on the domain elements that equate a sequence of\n * parameters called \"names\" to the partial schedule\n * of \"node\" modulo the integers in \"size\".\n * The number of elements in the array \"size\" should be equal\n * to the number of elements in \"names\".\n * The number of members of the band node \"node\" should be smaller\n * than or equal to this number.  If it is smaller, then the first\n * elements of \"names\" are equated to zero.\n */\nstatic __isl_give isl_union_set *set_schedule_modulo(\n\t__isl_keep isl_schedule_node *node, __isl_keep isl_id_list *names,\n\tint *size)\n{\n\tint n, n_zero;\n\tisl_space *space;\n\tisl_multi_aff *ma;\n\tisl_multi_union_pw_aff *mupa, *mupa2;\n\tisl_multi_val *mv;\n\tisl_union_set *domain;\n\n\tif (!node)\n\t\treturn NULL;\n\tn = isl_id_list_n_id(names);\n\tif (n == 0)\n\t\treturn isl_schedule_node_get_universe_domain(node);\n\tn_zero = n - isl_schedule_node_band_n_member(node);\n\n\tmupa = isl_schedule_node_band_get_partial_schedule(node);\n\tmv = construct_band_tiles_sizes(node, size + n_zero);\n\tmupa = isl_multi_union_pw_aff_mod_multi_val(mupa, mv);\n\n\tspace = isl_multi_union_pw_aff_get_space(mupa);\n\tspace = isl_space_params(space);\n\tspace = isl_space_set_from_params(space);\n\tspace = isl_space_add_dims(space, isl_dim_set, n_zero);\n\tma = isl_multi_aff_zero(space);\n\n\tdomain = isl_schedule_node_get_universe_domain(node);\n\tmupa2 = isl_multi_union_pw_aff_multi_aff_on_domain(\n\t\t\t\t\t\tisl_union_set_copy(domain), ma);\n\tmupa = isl_multi_union_pw_aff_range_product(mupa2, mupa);\n\n\tspace = isl_multi_union_pw_aff_get_space(mupa);\n\tma = parameter_vector(space, names);\n\n\tmupa2 = 
isl_multi_union_pw_aff_multi_aff_on_domain(domain, ma);\n\tmupa = isl_multi_union_pw_aff_sub(mupa, mupa2);\n\n\treturn isl_multi_union_pw_aff_zero_union_set(mupa);\n}\n\n/* Insert a context node at \"node\" introducing the block and thread\n * identifiers along with their bounds, which are stored in kernel->grid_size\n * and kernel->block_dim.\n * Note that the bounds on the block identifiers may implicitly impose\n * constraints on the parameters.  A guard needs to be inserted\n * in the schedule tree to ensure that those bounds hold at \"node\".\n * This guard is inserted in insert_guard.\n */\nstatic __isl_give isl_schedule_node *insert_context(struct ppcg_kernel *kernel,\n\t__isl_take isl_schedule_node *node)\n{\n\tisl_set *context;\n\n\tcontext = isl_set_universe(isl_set_get_space(kernel->context));\n\n\tcontext = add_bounded_parameters_dynamic(context,\n\t\t\t\t\tkernel->grid_size, kernel->block_ids);\n\tcontext = add_bounded_parameters(context,\n\t\t\t\t\tkernel->block_dim, kernel->thread_ids);\n\n\tnode = isl_schedule_node_insert_context(node, context);\n\n\treturn node;\n}\n\n/* Insert a guard that eliminates kernel launches where the kernel\n * obviously does not have any work to do.\n *\n * In particular, eliminate kernel launches where there are obviously\n * zero blocks.\n * Use the same block size constraints that are used to create the context\n * to ensure that all constraints implicit in the constructed context\n * are imposed by the guard.\n *\n * Additionally, add other constraints that are valid\n * for each executed instance (\"context\"), as long as this does not result\n * in a disjunction.\n */\nstatic __isl_give isl_schedule_node *insert_guard(\n\t__isl_take isl_schedule_node *node, __isl_keep isl_set *context,\n\t__isl_keep isl_multi_pw_aff *size, struct ppcg_scop *scop)\n{\n\tunsigned nparam, n;\n\tisl_set *guard;\n\tisl_id_list *ids;\n\n\tguard = isl_set_copy(context);\n\tguard = isl_set_compute_divs(guard);\n\tguard = 
isl_set_from_basic_set(isl_set_simple_hull(guard));\n\n\tnparam = isl_set_dim(guard, isl_dim_param);\n\tn = isl_multi_pw_aff_dim(size, isl_dim_out);\n\tids = ppcg_scop_generate_names(scop, n, \"__ppcg_tmp\");\n\tguard = add_bounded_parameters_dynamic(guard, size, ids);\n\tisl_id_list_free(ids);\n\tguard = isl_set_project_out(guard, isl_dim_param, nparam, n);\n\n\tnode = isl_schedule_node_insert_guard(node, guard);\n\n\treturn node;\n}\n\n/* Does any array reference group mapping require the band that is mapped\n * to threads to be unrolled?\n */\nstatic int kernel_requires_unroll(struct ppcg_kernel *kernel)\n{\n\tint i, j;\n\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct gpu_local_array_info *array = &kernel->array[i];\n\n\t\tfor (j = 0; j < array->n_group; ++j) {\n\t\t\tstruct gpu_array_ref_group *group = array->groups[j];\n\t\t\tif (gpu_array_ref_group_requires_unroll(group))\n\t\t\t\treturn 1;\n\t\t}\n\t}\n\n\treturn 0;\n}\n\n/* Mark the given band node \"node\" for unrolling by the AST generator and\n * then sink it to the leaves of the schedule tree.\n * All dimensions of \"node\" are assumed to be coincident, such that this\n * sinking is a valid operation.\n */\nstatic __isl_give isl_schedule_node *unroll(__isl_take isl_schedule_node *node)\n{\n\tnode = ppcg_set_schedule_node_type(node, isl_ast_loop_unroll);\n\n\tnode = isl_schedule_node_band_sink(node);\n\n\treturn node;\n}\n\n/* Insert a synchronization node in the schedule tree of \"node\"\n * after the core computation of \"kernel\" at the level of the band\n * that is mapped to threads, except if that level is equal to\n * that of the band that is mapped to blocks or if there are no writes\n * to global or shared memory in the core computation that require\n * synchronization.\n * If there are any writes to shared memory and the shared memory\n * copying is performed at the same level, then synchronization\n * is needed between the core and the copying anyway, so we might\n * as well add it 
here.  If the copying is performed at a higher\n * level, then different iterations of intermediate schedule dimensions\n * may have a different mapping between shared memory elements and\n * threads, such that synchronization is required after the core.\n * \"node\" is assumed to point to the kernel node.\n *\n * If the shared and the thread mark point to the same node, then make\n * sure the synchronization is inserted outside of the shared mark.\n */\nstatic __isl_give isl_schedule_node *add_sync(struct ppcg_kernel *kernel,\n\t__isl_take isl_schedule_node *node)\n{\n\tint depth;\n\tint need_sync;\n\n\tneed_sync = any_global_or_shared_sync_writes(kernel);\n\tif (need_sync < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (!need_sync)\n\t\treturn node;\n\n\tnode = gpu_tree_move_down_to_thread(node, kernel->core);\n\tdepth = isl_schedule_node_get_schedule_depth(node);\n\tnode = gpu_tree_move_up_to_kernel(node);\n\tif (depth == isl_schedule_node_get_schedule_depth(node))\n\t\treturn node;\n\n\tnode = gpu_tree_move_down_to_depth(node, depth, kernel->core);\n\tnode = gpu_tree_ensure_following_sync(node, kernel);\n\n\tnode = gpu_tree_move_up_to_kernel(node);\n\n\treturn node;\n}\n\n/* Return a read (\"read\" is 1) or write access relation for \"group\"\n * with those accesses removed that are only needed to communicate data\n * within the subtree of the schedule rooted at \"node\".\n * Furthermore, include the prefix schedule at \"node\".\n * That is, return a relation of the form\n *\n *\tS -> [D -> A]\n *\n * with D the outer schedule dimensions at \"node\".\n */\nstatic __isl_give isl_union_map *anchored_non_local_accesses(\n\tstruct ppcg_kernel *kernel, struct gpu_array_ref_group *group,\n\t__isl_take isl_schedule_node *node, int read)\n{\n\tisl_union_map *access;\n\tisl_union_map *prefix;\n\n\tprefix = isl_schedule_node_get_prefix_schedule_relation(node);\n\tprefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n\t\t\t    
isl_union_pw_multi_aff_copy(kernel->contraction));\n\taccess = gpu_array_ref_group_access_relation(group, read, !read);\n\taccess = remove_local_accesses_group(kernel, group, access, prefix,\n\t\t\t\t\t\tread);\n\taccess = isl_union_map_range_product(prefix, access);\n\n\treturn access;\n}\n\n/* Given an array reference group \"group\", create a mapping\n *\n *\tread[D -> A] -> [D -> A]\n *\n * if \"read\" is set or\n *\n *\twrite[D -> A] -> [D -> A]\n *\n * if \"read\" is not set.\n * D corresponds to the outer tile->depth dimensions of\n * the kernel schedule.\n */\nstatic __isl_give isl_multi_aff *create_from_access(isl_ctx *ctx,\n\tstruct gpu_array_ref_group *group, int read)\n{\n\tstruct gpu_array_tile *tile;\n\tisl_space *space;\n\tisl_id *id;\n\n\ttile = gpu_array_ref_group_tile(group);\n\tspace = isl_space_copy(group->array->space);\n\tspace = isl_space_from_range(space);\n\tspace = isl_space_add_dims(space, isl_dim_in, tile->depth);\n\tspace = isl_space_wrap(space);\n\tspace = isl_space_map_from_set(space);\n\n\tid = isl_id_alloc(ctx, read ? 
\"read\" : \"write\", group);\n\tspace = isl_space_set_tuple_id(space, isl_dim_in, id);\n\n\treturn isl_multi_aff_identity(space);\n}\n\n/* If any writes in \"group\" require synchronization, then make sure\n * that there is a synchronization node for \"kernel\" after the node\n * following \"node\" in a sequence.\n *\n * If \"shared\" is set and no synchronization is needed for\n * the writes to global memory, then add synchronization before\n * the kernel to protect shared memory from being overwritten\n * by the next iteration of the core computation.\n * No additional synchronization is needed to protect against\n * the next copy into shared memory because each element of\n * the shared memory tile is always copied by the same thread.\n */\nstatic __isl_give isl_schedule_node *add_group_write_sync(\n\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel,\n\tstruct gpu_array_ref_group *group, int shared)\n{\n\tint need_sync;\n\n\tneed_sync = any_sync_writes_in_group(kernel, group);\n\tif (need_sync < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (need_sync) {\n\t\tnode = isl_schedule_node_parent(node);\n\t\tnode = isl_schedule_node_next_sibling(node);\n\t\tnode = isl_schedule_node_child(node, 0);\n\t\tnode = gpu_tree_ensure_following_sync(node, kernel);\n\t} else if (shared) {\n\t\tstruct gpu_array_tile *tile;\n\n\t\ttile = gpu_array_ref_group_tile(group);\n\t\tnode = isl_schedule_node_parent(node);\n\t\tnode = isl_schedule_node_parent(node);\n\t\tnode = gpu_tree_move_down_to_depth(node, tile->depth,\n\t\t\t\t\t\t\tkernel->core);\n\t\tnode = gpu_tree_move_left_to_sync(node, kernel);\n\t}\n\n\treturn node;\n}\n\n/* Add copy statements to the schedule tree of \"node\"\n * for reading from global memory to private memory (if \"read\" is set) or\n * for writing back from private memory to global memory\n * (if \"read\" is not set) for the array reference group \"group\" that\n * is mapped to private memory.\n * On input, \"node\" points to the kernel 
node, and it is moved\n * back there on output.\n *\n * The copies are performed in the order of the array elements.\n * The copy statement instances include a reference to the outer\n * tile->depth dimensions of the kernel schedule for ease of\n * combining them with the group tiling.\n *\n * That is, the extra schedule is of the form\n *\n *\ttype[D -> A] -> A\n *\n * where D corresponds to the outer tile->depth dimensions of\n * the kernel schedule and A to the global array.\n * This schedule is unrolled because registers are not addressable.\n *\n * The copying is inserted in the schedule tree through an extension\n * of the form\n *\n *\tD -> type[D -> A]\n *\n * where the extra domain elements type[D -> A] are those accessed\n * by the group.\n * A filter is inserted on type[D -> A] to ensure that the element\n * is read/written by the same thread that needs the element.\n * This filter is obtained by applying\n *\n *\tS -> type[D -> A]\n *\n * to the thread filter for the core statements.\n *\n * The extension is inserted before the core computation in case of a read\n * and after the core computation in case of a write.\n * In the latter case, we also make sure that there is a synchronization\n * node after the write to global memory, unless this write is performed\n * at the outer level of the kernel.\n * In principle, this synchronization could be inserted higher\n * in the schedule tree depending on where the corresponding reads\n * from global memory are performed.\n */\nstatic __isl_give isl_schedule_node *add_copies_group_private(\n\tstruct ppcg_kernel *kernel, struct gpu_array_ref_group *group,\n\t__isl_take isl_schedule_node *node, int read)\n{\n\tstruct gpu_array_tile *tile;\n\tisl_union_map *access;\n\tisl_union_set *domain;\n\tisl_space *space;\n\tisl_multi_aff *from_access;\n\tisl_multi_pw_aff *mpa;\n\tisl_multi_union_pw_aff *mupa;\n\tisl_union_pw_multi_aff *contraction;\n\tisl_schedule_node *graft;\n\tisl_union_set *filter;\n\tint 
kernel_depth;\n\tint empty;\n\n\tkernel_depth = isl_schedule_node_get_schedule_depth(node);\n\ttile = gpu_array_ref_group_tile(group);\n\tnode = gpu_tree_move_down_to_depth(node, tile->depth, kernel->core);\n\n\taccess = anchored_non_local_accesses(kernel, group, node, read);\n\tempty = isl_union_map_is_empty(access);\n\tif (empty < 0 || empty) {\n\t\tisl_union_map_free(access);\n\t\tif (empty < 0)\n\t\t\treturn isl_schedule_node_free(node);\n\t\treturn gpu_tree_move_up_to_kernel(node);\n\t}\n\n\tgroup->array->global = 1;\n\tgroup->local_array->global = 1;\n\n\tfrom_access = create_from_access(kernel->ctx, group, read);\n\tspace = isl_space_domain(isl_multi_aff_get_space(from_access));\n\taccess = isl_union_map_preimage_range_multi_aff(access, from_access);\n\n\tfilter = isl_union_set_copy(kernel->thread_filter);\n\tcontraction = isl_union_pw_multi_aff_copy(kernel->contraction);\n\tfilter = isl_union_set_preimage_union_pw_multi_aff(filter, contraction);\n\tfilter = isl_union_set_apply(filter, isl_union_map_copy(access));\n\tfilter = isl_union_set_detect_equalities(filter);\n\tfilter = isl_union_set_coalesce(filter);\n\n\tdomain = isl_union_map_range(access);\n\taccess = isl_union_set_wrapped_domain_map(domain);\n\taccess = isl_union_map_reverse(access);\n\taccess = isl_union_map_coalesce(access);\n\tgraft = isl_schedule_node_from_extension(access);\n\n\tspace = isl_space_map_from_set(space);\n\tmpa = isl_multi_pw_aff_identity(space);\n\tmpa = isl_multi_pw_aff_range_factor_range(mpa);\n\tmupa = isl_multi_union_pw_aff_from_multi_pw_aff(mpa);\n\n\tgraft = isl_schedule_node_child(graft, 0);\n\tgraft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\tgraft = unroll(graft);\n\n\tgraft = isl_schedule_node_insert_filter(graft, filter);\n\n\tgraft = isl_schedule_node_parent(graft);\n\n\tif (read)\n\t\tnode = isl_schedule_node_graft_before(node, graft);\n\telse {\n\t\tnode = isl_schedule_node_graft_after(node, graft);\n\t\tif (kernel_depth < 
tile->depth)\n\t\t\tnode = add_group_write_sync(node, kernel, group, 0);\n\t}\n\n\tnode = gpu_tree_move_up_to_kernel(node);\n\n\treturn node;\n}\n\n/* Add copy statements to the schedule tree of \"node\"\n * for reading from global memory to shared memory (if \"read\" is set) or\n * for writing back from shared memory to global memory\n * (if \"read\" is not set) for the array reference group \"group\" that\n * is mapped to shared memory.\n * On input, \"node\" points to the kernel node, and it is moved\n * back there on output.\n *\n * The copies are performed in the order of the corresponding shared\n * memory tile.\n * The copy statement instances include a reference to the outer\n * tile->depth dimensions of the kernel schedule for ease of\n * combining them with the group tiling.\n *\n * If we are performing a read from global memory to shared memory and\n * if the array involved is not a scalar, then we copy\n * the entire tile to shared memory.  This may result in some extra\n * elements getting copied, but it should lead to simpler code\n * (which means that fewer registers may be needed) and less divergence.\n *\n * Otherwise, we only copy the elements that will be read or have been written\n * in the kernel.\n *\n * That is, the extra schedule is of the form\n *\n *\ttype[D -> A] -> T\n *\n * where D corresponds to the outer tile->depth dimensions of\n * the kernel schedule, A to the global array and T is the corresponding\n * shared memory tile.\n *\n * The copying is inserted in the schedule tree through an extension\n * of the form\n *\n *\tD -> type[D -> A]\n *\n * where the extra domain elements type[D -> A] are those accessed\n * by the group.  In the case of a read from a non-scalar, this set\n * is replaced by the entire shared memory tile.\n *\n * If the \"unroll_copy_shared\" option is set, then the AST generator\n * is instructed to unroll the copying code.\n *\n * A filter is inserted on type[D -> A] to map the copy instances\n * to the threads.
 In particular, the thread identifiers are\n * equated to the position inside the shared memory tile (T)\n * modulo the block size.\n * We try to align the innermost tile dimension with the innermost\n * thread identifier (x) as a heuristic to improve coalescing.\n * In particular, if the dimension of the tile is greater than\n * the dimension of the block, then the schedule mapping to the tile\n * is broken up into two pieces and the filter is applied to the inner part.\n * If, on the other hand, the dimension of the tile is smaller than\n * the dimension of the block, then the initial thread identifiers\n * are equated to zero and the remaining thread identifiers are\n * matched to the memory tile.\n *\n * The extension is inserted before the core computation in case of a read\n * and after the core computation in case of a write.\n * In the case of a read, we first need to make sure there is some\n * synchronization before the core computation such that we can put the read\n * from global memory to shared memory before that synchronization.\n * This ensures that all threads have finished copying into shared memory\n * before the shared memory is used.\n * We also need to make sure that there is a synchronization node after\n * the core computation to ensure that the next load into shared memory\n * only happens after all data has been used.  
There is no need for\n * this synchronization if we are at the outer level since then there\n * won't be a next load.\n * In the case of a write, we need to make sure there is some synchronization\n * after the core computation such that we can put the write from shared\n * memory to global memory after that synchronization.\n * Unless we are at the outer level, we also need a synchronization node\n * after the write to ensure the data is saved to global memory\n * before the next iteration writes to the same shared memory.\n * It also makes sure the data has arrived in global memory before\n * it is read in a subsequent iteration.\n */\nstatic __isl_give isl_schedule_node *add_copies_group_shared(\n\tstruct ppcg_kernel *kernel, struct gpu_array_ref_group *group,\n\t__isl_take isl_schedule_node *node, int read)\n{\n\tstruct gpu_array_tile *tile;\n\tisl_union_map *access;\n\tisl_union_set *domain;\n\tisl_multi_aff *ma;\n\tisl_multi_aff *from_access;\n\tisl_multi_pw_aff *mpa;\n\tisl_multi_union_pw_aff *mupa;\n\tisl_schedule_node *graft;\n\tisl_union_set *filter;\n\tint skip;\n\tint kernel_depth;\n\tint empty;\n\n\ttile = gpu_array_ref_group_tile(group);\n\tkernel_depth = isl_schedule_node_get_schedule_depth(node);\n\tnode = gpu_tree_move_down_to_depth(node, tile->depth, kernel->core);\n\n\taccess = anchored_non_local_accesses(kernel, group, node, read);\n\tempty = isl_union_map_is_empty(access);\n\tif (empty < 0 || empty) {\n\t\tisl_union_map_free(access);\n\t\tif (empty < 0)\n\t\t\treturn isl_schedule_node_free(node);\n\t\treturn gpu_tree_move_up_to_kernel(node);\n\t}\n\n\tgroup->array->global = 1;\n\tgroup->local_array->global = 1;\n\n\tfrom_access = create_from_access(kernel->ctx, group, read);\n\n\tma = isl_multi_aff_copy(tile->tiling);\n\tma = isl_multi_aff_pullback_multi_aff(ma,\n\t\t\t\t\t    isl_multi_aff_copy(from_access));\n\tmpa = isl_multi_pw_aff_from_multi_aff(ma);\n\tmupa = isl_multi_union_pw_aff_from_multi_pw_aff(mpa);\n\n\tdomain = 
isl_union_map_range(access);\n\n\tif (read && !gpu_array_is_scalar(group->array)) {\n\t\tisl_map *map;\n\t\tisl_union_set_free(domain);\n\t\tmap = group_tile(group);\n\t\tdomain = isl_union_set_from_set(isl_map_wrap(map));\n\t}\n\n\tdomain = isl_union_set_preimage_multi_aff(domain, from_access);\n\taccess = isl_union_set_wrapped_domain_map(domain);\n\taccess = isl_union_map_reverse(access);\n\taccess = isl_union_map_coalesce(access);\n\tgraft = isl_schedule_node_from_extension(access);\n\n\tgraft = isl_schedule_node_child(graft, 0);\n\n\tgraft = isl_schedule_node_insert_partial_schedule(graft, mupa);\n\tif (kernel->options->unroll_copy_shared)\n\t\tgraft = ppcg_set_schedule_node_type(graft, isl_ast_loop_unroll);\n\n\tif (tile->n > kernel->n_block && kernel->n_block > 0) {\n\t\tgraft = isl_schedule_node_band_split(graft,\n\t\t\t\t\t\ttile->n - kernel->n_block);\n\t\tgraft = isl_schedule_node_child(graft, 0);\n\t}\n\tif (tile->n < kernel->n_block)\n\t\tskip = kernel->n_block - tile->n;\n\telse\n\t\tskip = 0;\n\tfilter = set_schedule_modulo(graft, kernel->thread_ids,\n\t\t\t\t\tkernel->block_dim);\n\tif (!kernel->options->wrap)\n\t\tgraft = snap_band_to_sizes(graft, kernel->block_dim + skip,\n\t\t\t    kernel->options);\n\tif (tile->n > kernel->n_block && kernel->n_block > 0)\n\t\tgraft = isl_schedule_node_parent(graft);\n\tgraft = isl_schedule_node_insert_filter(graft, filter);\n\n\twhile (graft && isl_schedule_node_has_parent(graft))\n\t\tgraft = isl_schedule_node_parent(graft);\n\n\tif (read) {\n\t\tif (kernel_depth < tile->depth)\n\t\t\tnode = gpu_tree_ensure_sync_after_core(node, kernel);\n\t\tnode = gpu_tree_move_left_to_sync(node, kernel);\n\t\tnode = isl_schedule_node_graft_before(node, graft);\n\t} else {\n\t\tnode = gpu_tree_move_right_to_sync(node, kernel);\n\t\tnode = isl_schedule_node_graft_after(node, graft);\n\t\tif (kernel_depth < tile->depth)\n\t\t\tnode = add_group_write_sync(node, kernel, group, 1);\n\t}\n\n\tnode = 
gpu_tree_move_up_to_kernel(node);\n\n\treturn node;\n}\n\n/* Check whether the array reference group \"group\" is mapped to\n * private or shared memory and, if so,\n * add copy statements to the schedule tree of \"node\"\n * for reading from global memory to private or shared memory\n * (if \"read\" is set) or for writing back from private or shared memory\n * to global memory (if \"read\" is not set) for this group.\n * On input, \"node\" points to the kernel node, and it is moved\n * back there on output.\n */\nstatic __isl_give isl_schedule_node *add_copies_group(\n\tstruct ppcg_kernel *kernel, struct gpu_array_ref_group *group,\n\t__isl_take isl_schedule_node *node, int read)\n{\n\tenum ppcg_group_access_type type;\n\n\ttype = gpu_array_ref_group_type(group);\n\tif (type == ppcg_access_private)\n\t\treturn add_copies_group_private(kernel, group, node, read);\n\tif (type == ppcg_access_shared)\n\t\treturn add_copies_group_shared(kernel, group, node, read);\n\treturn node;\n}\n\n/* For each array reference group that is mapped to private or shared memory,\n * add copy statements to the schedule tree of \"node\"\n * for reading from global memory to private or shared memory\n * and for writing back.\n * On input, \"node\" points to the kernel node, and it is moved\n * back there on output.\n */\nstatic __isl_give isl_schedule_node *add_copies(struct ppcg_kernel *kernel,\n\t__isl_take isl_schedule_node *node)\n{\n\tint i, j;\n\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct gpu_local_array_info *array = &kernel->array[i];\n\n\t\tfor (j = 0; j < array->n_group; ++j) {\n\t\t\tstruct gpu_array_ref_group *group = array->groups[j];\n\n\t\t\tnode = add_copies_group(kernel, group, node, 1);\n\t\t\tif (!node)\n\t\t\t\treturn NULL;\n\t\t\tnode = add_copies_group(kernel, group, node, 0);\n\t\t\tif (!node)\n\t\t\t\treturn NULL;\n\t\t}\n\t}\n\n\treturn node;\n}\n\n/* Mark all dimensions in the current band node atomic.\n */\nstatic __isl_give isl_schedule_node 
*atomic(__isl_take isl_schedule_node *node)\n{\n\treturn ppcg_set_schedule_node_type(node, isl_ast_loop_atomic);\n}\n\n/* Mark \"node\" atomic, if it is a band node.\n * Do the same for all ancestors.\n * Return a pointer to \"node\" (in the updated schedule tree).\n */\nstatic __isl_give isl_schedule_node *atomic_ancestors(\n\t__isl_take isl_schedule_node *node)\n{\n\tint pos;\n\n\tif (!node)\n\t\treturn NULL;\n\tif (!isl_schedule_node_has_parent(node))\n\t\treturn node;\n\n\tpos = isl_schedule_node_get_child_position(node);\n\tnode = isl_schedule_node_parent(node);\n\tif (isl_schedule_node_get_type(node) == isl_schedule_node_band)\n\t\tnode = atomic(node);\n\tnode = atomic_ancestors(node);\n\tnode = isl_schedule_node_child(node, pos);\n\n\treturn node;\n}\n\n/* Collect all write references that require synchronization.\n * \"node\" is assumed to point to the kernel node.\n * Each reference is represented by a universe set in a space\n *\n *\t[S[i,j] -> R[]]\n *\n * with S[i,j] the statement instance space and R[] the array reference.\n *\n * This function should be called before block and thread filters are added.\n *\n * Synchronization is needed after a write if there is a subsequent read\n * within the same block that may not be performed by the same thread.\n * There should not be any dependences between different blocks,\n * so we start with the flow dependences within the same kernel invocation\n * and we subtract from these those dependences that are mapped\n * to the same iteration of the bands where synchronization is inserted.\n * We do not remove pairs of instances that are known to map to\n * the same thread across different iterations of the intermediate\n * bands because the read may be performed by a different thread\n * than the one that needs the value if shared memory is involved.\n *\n * We also consider all pairs of possible writes that access the same\n * memory location and that may be mapped to the same block but not\n * to the same 
iteration of the intermediate bands.\n * In theory, it would be possible for one thread to still be in\n * a previous iteration of a loop in these bands.\n * A write to global memory in this delayed thread could then overwrite\n * a write from another thread that has already moved on to\n * the next iteration.\n *\n * After computing the above writes paired off with reads or writes\n * that depend on them, we project onto the domain writes.\n * Synchronization is needed after writes to global memory\n * through these references.\n */\nstatic __isl_give isl_union_set *compute_sync_writes(\n\tstruct ppcg_kernel *kernel, __isl_keep isl_schedule_node *node)\n{\n\tisl_union_map *local;\n\tisl_union_map *may_writes, *shared_access;\n\tisl_union_map *kernel_prefix, *thread_prefix;\n\tisl_union_map *equal;\n\tisl_union_set *wrap;\n\tisl_union_set *domain;\n\tisl_union_pw_multi_aff *contraction;\n\n\tkernel_prefix = isl_schedule_node_get_prefix_schedule_union_map(node);\n\tnode = isl_schedule_node_copy(node);\n\tnode = gpu_tree_move_down_to_thread(node, kernel->core);\n\tthread_prefix = isl_schedule_node_get_prefix_schedule_union_map(node);\n\tisl_schedule_node_free(node);\n\n\tcontraction = kernel->contraction;\n\tkernel_prefix = isl_union_map_preimage_domain_union_pw_multi_aff(\n\t\t    kernel_prefix, isl_union_pw_multi_aff_copy(contraction));\n\tthread_prefix = isl_union_map_preimage_domain_union_pw_multi_aff(\n\t\t    thread_prefix, isl_union_pw_multi_aff_copy(contraction));\n\tdomain = isl_union_set_copy(kernel->expanded_domain);\n\tdomain = isl_union_set_universe(domain);\n\n\tmay_writes = isl_union_map_copy(kernel->prog->scop->tagged_may_writes);\n\tmay_writes = isl_union_map_curry(may_writes);\n\tmay_writes = isl_union_map_intersect_domain(may_writes, domain);\n\tmay_writes = isl_union_map_uncurry(may_writes);\n\tshared_access = isl_union_map_copy(may_writes);\n\tshared_access = 
isl_union_map_apply_range(shared_access,\n\t\t\t\t\tisl_union_map_reverse(may_writes));\n\n\tlocal = isl_union_map_copy(kernel->prog->scop->tagged_dep_flow);\n\tlocal = isl_union_map_union(local, shared_access);\n\tlocal = isl_union_map_zip(local);\n\n\tequal = isl_union_map_apply_range(kernel_prefix,\n\t\t    isl_union_map_reverse(isl_union_map_copy(kernel_prefix)));\n\twrap = isl_union_map_wrap(equal);\n\tlocal = isl_union_map_intersect_domain(local, wrap);\n\tequal = isl_union_map_apply_range(thread_prefix,\n\t\t    isl_union_map_reverse(isl_union_map_copy(thread_prefix)));\n\twrap = isl_union_map_wrap(equal);\n\tlocal = isl_union_map_subtract_domain(local, wrap);\n\n\tlocal = isl_union_map_zip(local);\n\tlocal = isl_union_map_universe(local);\n\n\treturn isl_union_map_domain(local);\n}\n\n/* Group the domain elements into a single space, named kernelX,\n * with X the kernel sequence number \"kernel_id\".\n */\nstatic __isl_give isl_schedule_node *group_statements(\n\t__isl_take isl_schedule_node *node, int kernel_id)\n{\n\tchar buffer[20];\n\tisl_id *id;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tsnprintf(buffer, sizeof(buffer), \"kernel%d\", kernel_id);\n\tid = isl_id_alloc(isl_schedule_node_get_ctx(node), buffer, NULL);\n\treturn isl_schedule_node_group(node, id);\n}\n\n/* Create a ppcg_kernel representing the domain instances that reach \"node\"\n * and insert a mark node pointing to the ppcg_kernel before \"node\".\n * The band that \"node\" points to is the band that needs to be mapped\n * to block identifiers.  The band that needs to be mapped to thread\n * identifiers should be marked by a \"thread\" mark by the caller.\n * The linear branch between the current node and the \"thread\" mark\n * may also have a \"shared\" mark.  
If present, the mapping to shared\n * memory is computed at that point.\n * Both marks are removed by this function.\n * If \"scale\" is set, then the band that \"node\" points to is scaled\n * by \"sizes\".\n *\n * Mark all outer band nodes as atomic to ensure each kernel is only\n * scheduled once.\n * If the domain elements that reach \"node\" live in more than one space,\n * then group the domain elements into a single space, named kernelX,\n * with X the kernel sequence number.\n *\n * Insert a guard node governing the kernel node to ensure that\n * no kernels with zero blocks are launched.\n *\n * Insert a context node describing the block and thread\n * identifiers inside the kernel mark.\n * The context node needs to be inserted after the effective block size\n * has been determined such that the bounds on the thread identifiers\n * would reflect the effective block size.\n * Insert a filter node inside the context node mapping the statement\n * instances to block identifiers.  In particular, the block identifiers\n * are equated to the partial schedule of the band that was marked for mapping\n * to blocks modulo the grid size.\n * Insert a filter node inside the \"thread\" mark mapping the statement\n * instances to thread identifiers.  In particular, the thread identifiers\n * are equated to the partial schedule of the band that was marked for mapping\n * to threads modulo the block size.\n *\n * Compute array reference groups for all arrays, set the local\n * array bounds based on the set of domain instances that reach\n * the kernel node, check the total amount of shared memory used\n * and compute all group tilings.\n * The array reference groups are computed after the block filter\n * has been inserted because it affects the mapping to shared or\n * private memory.  
This computation also requires the thread filter\n * (in the ppcg_kernel object), but this thread filter should not\n * have been added to the schedule tree yet since the computation\n * requires the schedule of the band that needs to be mapped to\n * threads before the privatization is applied.\n *\n * If any array reference group requires the band mapped to threads\n * to be unrolled, then we perform the required unrolling.\n *\n * We save a copy of the schedule that may influence the mappings\n * to shared or private memory in kernel->copy_schedule.\n *\n * Finally, we add synchronization and copy statements to the schedule tree,\n * remove the \"thread\" mark and create representations for the local\n * variables in the kernel.\n *\n * We keep a copy of the isl_id that points to the kernel to ensure\n * that the kernel does not get destroyed if the schedule node\n * is freed due to some error condition.\n */\n__isl_give isl_schedule_node *gpu_create_kernel(struct gpu_gen *gen,\n\t__isl_take isl_schedule_node *node, int scale,\n\t__isl_keep isl_multi_val *sizes)\n{\n\tstruct ppcg_kernel *kernel;\n\tisl_id *id;\n\tisl_schedule_node *node_thread;\n\tisl_union_map *host_schedule;\n\tisl_union_pw_multi_aff *contraction;\n\tisl_set *host_domain;\n\tisl_union_set *domain, *expanded;\n\tint single_statement;\n\n\tnode = gpu_tree_insert_shared_before_thread(node);\n\tif (!node)\n\t\treturn NULL;\n\n\tkernel = isl_calloc_type(gen->ctx, struct ppcg_kernel);\n\tkernel = ppcg_kernel_create_local_arrays(kernel, gen->prog);\n\tif (!kernel)\n\t\treturn isl_schedule_node_free(node);\n\n\tdomain = isl_schedule_node_get_domain(node);\n\tsingle_statement = isl_union_set_n_set(domain) == 1;\n\n\tkernel->ctx = gen->ctx;\n\tkernel->prog = gen->prog;\n\tkernel->options = gen->options;\n\tkernel->context = extract_context(node, gen->prog);\n\tkernel->core = isl_union_set_universe(isl_union_set_copy(domain));\n\tcontraction = 
isl_schedule_node_get_subtree_contraction(node);\n\tkernel->contraction = isl_union_pw_multi_aff_copy(contraction);\n\texpanded = isl_union_set_copy(domain);\n\texpanded = isl_union_set_preimage_union_pw_multi_aff(expanded,\n\t\t\t\t\t\tcontraction);\n\tkernel->expanded_domain = isl_union_set_copy(expanded);\n\tkernel->arrays = accessed_by_domain(expanded, gen->prog);\n\tkernel->n_grid = n_outer_coincidence(node);\n\tnode_thread = isl_schedule_node_copy(node);\n\tnode_thread = gpu_tree_move_down_to_thread(node_thread, kernel->core);\n\tnode_thread = isl_schedule_node_child(node_thread, 0);\n\tkernel->n_block = n_outer_coincidence(node_thread);\n\tisl_schedule_node_free(node_thread);\n\tkernel->id = gen->kernel_id++;\n\tif (read_grid_and_block_sizes(kernel, gen) < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\tkernel->sync_writes = compute_sync_writes(kernel, node);\n\n\thost_schedule = isl_schedule_node_get_prefix_schedule_union_map(node);\n\thost_domain = isl_set_from_union_set(isl_union_map_range(\n\t\t\t\t\t\t\t\thost_schedule));\n\n\tnode = atomic_ancestors(node);\n\n\tid = isl_id_alloc(gen->ctx, \"kernel\", kernel);\n\tid = isl_id_set_free_user(id, &ppcg_kernel_free_wrap);\n\tnode = isl_schedule_node_insert_mark(node, isl_id_copy(id));\n\n\tif (!single_statement)\n\t\tnode = group_statements(node, kernel->id);\n\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = split_band(node, kernel->n_grid);\n\tkernel->block_ids = ppcg_scop_generate_names(gen->prog->scop,\n\t\t\t\t\t\tkernel->n_grid, \"b\");\n\tkernel->block_filter = set_schedule_modulo(node, kernel->block_ids,\n\t\t\t\t\t\tkernel->grid_dim);\n\tkernel->grid_size = extract_grid_size(kernel,\n\t\t\t\t\t\tisl_union_set_copy(domain));\n\tif (!kernel->options->wrap)\n\t\tnode = snap_band_to_sizes(node, kernel->grid_dim,\n\t\t\t\t\t\tkernel->options);\n\tif (scale)\n\t\tnode = scale_band(node, isl_multi_val_copy(sizes));\n\tnode = isl_schedule_node_parent(node);\n\tif (!single_statement)\n\t\tnode = 
isl_schedule_node_parent(node);\n\tnode = insert_guard(node, kernel->context, kernel->grid_size,\n\t\t\t\tgen->prog->scop);\n\tnode = gpu_tree_move_down_to_thread(node, kernel->core);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = split_band(node, kernel->n_block);\n\tkernel->thread_ids = ppcg_scop_generate_names(gen->prog->scop,\n\t\t\t\t\t\tkernel->n_block, \"t\");\n\tkernel->thread_filter = set_schedule_modulo(node, kernel->thread_ids,\n\t\t\t\t\t\tkernel->block_dim);\n\tif (extract_block_size(kernel, domain) < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\tnode = gpu_tree_move_up_to_kernel(node);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = insert_context(kernel, node);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = isl_schedule_node_insert_filter(node,\n\t\t\t\t    isl_union_set_copy(kernel->block_filter));\n\n\tnode = gpu_tree_move_up_to_kernel(node);\n\n\tif (gpu_group_references(kernel, node) < 0)\n\t\tnode = isl_schedule_node_free(node);\n\tlocalize_bounds(kernel, host_domain);\n\tisl_set_free(host_domain);\n\n\tcheck_shared_memory_bound(kernel);\n\tmark_global_arrays(kernel);\n\tcompute_group_tilings(kernel);\n\n\tnode = gpu_tree_move_down_to_thread(node, kernel->core);\n\tnode = isl_schedule_node_child(node, 0);\n\tif (!kernel->options->wrap)\n\t\tnode = snap_band_to_sizes(node, kernel->block_dim,\n\t\t\t\t\t\tkernel->options);\n\tnode = isl_schedule_node_insert_filter(node,\n\t\t\t\t    isl_union_set_copy(kernel->thread_filter));\n\tif (kernel_requires_unroll(kernel)) {\n\t\tnode = isl_schedule_node_child(node, 0);\n\t\tnode = unroll(node);\n\t}\n\n\tnode = gpu_tree_move_up_to_thread(node);\n\tkernel->copy_schedule_dim = isl_schedule_node_get_schedule_depth(node);\n\tkernel->copy_schedule =\n\t\tisl_schedule_node_get_prefix_schedule_union_pw_multi_aff(node);\n\tcontraction = isl_union_pw_multi_aff_copy(kernel->contraction);\n\tkernel->copy_schedule =\n\t\tisl_union_pw_multi_aff_pullback_union_pw_multi_aff(\n\t\t\t\t\t    
kernel->copy_schedule, contraction);\n\n\tnode = gpu_tree_move_up_to_kernel(node);\n\n\tnode = add_sync(kernel, node);\n\tnode = add_copies(kernel, node);\n\n\tnode = gpu_tree_move_down_to_shared(node, kernel->core);\n\tnode = isl_schedule_node_delete(node);\n\n\tnode = gpu_tree_move_down_to_thread(node, kernel->core);\n\tnode = isl_schedule_node_delete(node);\n\n\tnode = gpu_tree_move_up_to_kernel(node);\n\n\tif (create_kernel_vars(kernel) < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\tif (!single_statement)\n\t\tnode = isl_schedule_node_parent(node);\n\tnode = isl_schedule_node_parent(node);\n\n\tisl_id_free(id);\n\tif (!id)\n\t\tppcg_kernel_free(kernel);\n\treturn node;\n}\n\n/* Insert a zero-dimensional permutable band at \"node\".\n */\nstatic __isl_give isl_schedule_node *insert_empty_permutable_band(\n\t__isl_take isl_schedule_node *node)\n{\n\tisl_space *space;\n\tisl_schedule *schedule;\n\tisl_union_set *domain;\n\tisl_multi_union_pw_aff *mupa;\n\n\tschedule = isl_schedule_node_get_schedule(node);\n\tdomain = isl_schedule_get_domain(schedule);\n\tspace = isl_union_set_get_space(domain);\n\tisl_union_set_free(domain);\n\tisl_schedule_free(schedule);\n\n\tspace = isl_space_set_from_params(space);\n\tmupa = isl_multi_union_pw_aff_zero(space);\n\tnode = isl_schedule_node_insert_partial_schedule(node, mupa);\n\tnode = isl_schedule_node_band_set_permutable(node, 1);\n\n\treturn node;\n}\n\n/* See if hybrid tiling can be performed on \"node\" and its parent.\n * If so, apply hybrid tiling and return the updated schedule tree.\n * If not, return the original schedule tree.\n * Return NULL on error.\n *\n * First check if \"node\", together with its parent, meets\n * the basic requirements for hybrid tiling.\n * If so, compute the relative dependence distances of \"node\"\n * with respect to its parent and check if they are sufficiently bounded.\n * If so, apply hybrid tiling using user specified tile sizes.\n *\n * The tile sizes are read before the 
dependence distance bounds are\n * computed, because the user may have specified fewer dimensions\n * than are available.  In this case, the remaining schedule dimensions\n * are split off and the dependence distances should be computed\n * after these dimensions have been split off.\n */\nstatic __isl_give isl_schedule_node *try_hybrid_tile(struct gpu_gen *gen,\n\t__isl_take isl_schedule_node *node)\n{\n\tint tile_len;\n\tint *tile_size;\n\tisl_bool ok;\n\tisl_schedule_node *orig = node;\n\tppcg_ht_bounds *bounds;\n\n\tok = ppcg_ht_parent_has_input_pattern(node);\n\tif (ok < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (!ok)\n\t\treturn orig;\n\n\ttile_len = 1 + isl_schedule_node_band_n_member(node);\n\ttile_size = read_tile_sizes(gen, &tile_len);\n\tif (!tile_size)\n\t\treturn isl_schedule_node_free(node);\n\n\tnode = isl_schedule_node_copy(node);\n\tnode = split_band(node, tile_len - 1);\n\tnode = isl_schedule_node_parent(node);\n\tbounds = ppcg_ht_compute_bounds(gen->prog->scop, node);\n\tnode = isl_schedule_node_child(node, 0);\n\n\tok = ppcg_ht_bounds_is_valid(bounds);\n\tif (ok >= 0 && ok)\n\t\tnode = gpu_hybrid_tile(gen, node, bounds, tile_size);\n\telse\n\t\tppcg_ht_bounds_free(bounds);\n\tfree(tile_size);\n\n\tif (ok >= 0 && !ok) {\n\t\tisl_schedule_node_free(node);\n\t\treturn orig;\n\t}\n\tisl_schedule_node_free(orig);\n\tif (ok < 0)\n\t\treturn isl_schedule_node_free(node);\n\treturn node;\n}\n\n/* If \"node\" is the outermost permutable band that can be mapped to block and\n * thread identifiers in its branch (or the root of a subtree with\n * no such outer bands),\n * then mark the band as such, attaching a ppcg_kernel to the mark.\n *\n * If hybrid tiling is allowed, then first try and apply it\n * to \"node\" and its parent.\n *\n * If \"node\" is the root of a subtree without permutable bands,\n * then insert a zero-dimensional permutable band such that\n * we can assume that \"node\" always points to a band node.\n * This includes the case 
where \"node\" already points to a band node,\n * but one without any coincident dimension.  In this case,\n * the extra node ensures that this original node does not get tiled.\n *\n * Tile \"node\" using user specified tile sizes, after splitting the band\n * if the number of specified tile sizes is smaller than the dimension\n * of the band.  Mark the point band of this tiling as the band that\n * needs to be mapped to threads and instruct the AST generator to unroll\n * the band if the \"unroll_gpu_tile\" option is set.\n * Create a kernel representing the domain instances that reach \"node\" and\n * insert a mark node pointing to the ppcg_kernel before the band node.\n */\nstatic __isl_give isl_schedule_node *mark_outer_permutable(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tstruct gpu_gen *gen = user;\n\tint outer;\n\tint scale;\n\tint tile_len;\n\tint *tile_size;\n\tisl_id *id;\n\tisl_multi_val *sizes;\n\n\touter = is_outer_tilable(node);\n\tif (outer < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (!outer)\n\t\treturn node;\n\n\tif (gen->options->hybrid) {\n\t\tisl_schedule_node *saved = isl_schedule_node_copy(node);\n\t\tnode = try_hybrid_tile(gen, node);\n\t\tisl_schedule_node_free(saved);\n\t\tif (node != saved)\n\t\t\treturn node;\n\t}\n\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_band ||\n\t    !isl_schedule_node_band_member_get_coincident(node, 0))\n\t\tnode = insert_empty_permutable_band(node);\n\n\ttile_len = isl_schedule_node_band_n_member(node);\n\ttile_size = read_tile_sizes(gen, &tile_len);\n\tif (!tile_size)\n\t\treturn isl_schedule_node_free(node);\n\tif (tile_len < isl_schedule_node_band_n_member(node))\n\t\tnode = isl_schedule_node_band_split(node, tile_len);\n\tsizes = construct_band_tiles_sizes(node, tile_size);\n\tnode = tile_band(node, isl_multi_val_copy(sizes));\n\tnode = isl_schedule_node_child(node, 0);\n\tif (gen->options->unroll_gpu_tile)\n\t\tnode = ppcg_set_schedule_node_type(node, 
isl_ast_loop_unroll);\n\tid = isl_id_alloc(gen->ctx, \"thread\", NULL);\n\tnode = isl_schedule_node_insert_mark(node, id);\n\tnode = isl_schedule_node_parent(node);\n\n\tscale = gen->options->scale_tile_loops;\n\tnode = gpu_create_kernel(gen, node, scale, sizes);\n\tisl_multi_val_free(sizes);\n\tfree(tile_size);\n\n\treturn node;\n}\n\n/* Given a set or sequence node, return the union of the filters of either all\n * (if \"only_initial\" is not set) or the initial (if \"only_initial\" is set)\n * direct subtrees that do not contain any suitably permutable bands\n * (according to subtree_has_permutable_bands).\n */\nstatic __isl_give isl_union_set *get_non_parallel_subtree_filters(\n\t__isl_keep isl_schedule_node *node, int only_initial)\n{\n\tisl_space *space;\n\tisl_union_set *filter;\n\tint i, n;\n\n\tn = isl_schedule_node_n_children(node);\n\tif (n < 0)\n\t\treturn NULL;\n\n\tnode = isl_schedule_node_copy(node);\n\tnode = isl_schedule_node_child(node, 0);\n\tfilter = isl_schedule_node_filter_get_filter(node);\n\tnode = isl_schedule_node_parent(node);\n\tspace = isl_union_set_get_space(filter);\n\tisl_union_set_free(filter);\n\tfilter = isl_union_set_empty(space);\n\n\tfor (i = 0; i < n; ++i) {\n\t\tint parallelism;\n\n\t\tnode = isl_schedule_node_child(node, i);\n\t\tparallelism = subtree_has_permutable_bands(node);\n\t\tif (parallelism < 0) {\n\t\t\tfilter = isl_union_set_free(filter);\n\t\t} else if (!parallelism) {\n\t\t\tisl_union_set *filter_i;\n\t\t\tfilter_i = isl_schedule_node_filter_get_filter(node);\n\t\t\tfilter = isl_union_set_union(filter, filter_i);\n\t\t} else if (only_initial)\n\t\t\tbreak;\n\t\tnode = isl_schedule_node_parent(node);\n\t}\n\n\tisl_schedule_node_free(node);\n\n\treturn filter;\n}\n\n/* Given a set or sequence node, return the union of the filters of\n * the direct subtrees that do not contain any suitably permutable bands\n * (according to subtree_has_permutable_bands).\n */\nstatic __isl_give isl_union_set 
*get_all_non_parallel_subtree_filters(\n\t__isl_keep isl_schedule_node *node)\n{\n\treturn get_non_parallel_subtree_filters(node, 0);\n}\n\n/* Given a set or sequence node, return the union of the filters of\n * the initial direct subtrees that do not contain any suitably permutable\n * bands (according to subtree_has_permutable_bands).\n */\nstatic __isl_give isl_union_set *get_initial_non_parallel_subtree_filters(\n\t__isl_keep isl_schedule_node *node)\n{\n\treturn get_non_parallel_subtree_filters(node, 1);\n}\n\n/* Mark all variables that are accessed by the statement instances in \"domain\"\n * and that are local to \"prog\" as requiring a declaration in the host code.\n * The statement instances in \"domain\" correspond to (a subset of)\n * the active instances at \"node\".\n * \"node\" is not modified by this function, except that NULL is returned\n * in case of error.\n */\nstatic __isl_give isl_schedule_node *declare_accessed_local_variables(\n\t__isl_take isl_schedule_node *node, struct gpu_prog *prog,\n\t__isl_keep isl_union_set *domain)\n{\n\tisl_union_pw_multi_aff *contraction;\n\tisl_union_set *arrays;\n\tint i;\n\n\tif (!ppcg_scop_any_hidden_declarations(prog->scop))\n\t\treturn node;\n\tcontraction = isl_schedule_node_get_subtree_contraction(node);\n\tdomain = isl_union_set_copy(domain);\n\tdomain = isl_union_set_preimage_union_pw_multi_aff(domain, contraction);\n\tarrays = accessed_by_domain(domain, prog);\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tisl_space *space;\n\t\tisl_set *set;\n\t\tint empty;\n\n\t\tif (!prog->array[i].local)\n\t\t\tcontinue;\n\t\tspace = isl_set_get_space(prog->array[i].extent);\n\t\tset = isl_union_set_extract_set(arrays, space);\n\t\tempty = isl_set_plain_is_empty(set);\n\t\tisl_set_free(set);\n\t\tif (empty < 0)\n\t\t\tgoto error;\n\t\tif (!empty)\n\t\t\tprog->array[i].declare_local = 1;\n\t}\n\n\tisl_union_set_free(arrays);\n\treturn node;\nerror:\n\tisl_union_set_free(arrays);\n\treturn 
isl_schedule_node_free(node);\n}\n\n/* If \"node\" points to a set node, then separate its children\n * into subtrees that have suitably permutable bands and\n * those that do not.\n * Adjust the schedule tree in order to execute the second group\n * after the first group and return a pointer to the first group,\n * assuming there are any such subtrees.\n * If \"node\" points to a sequence node, then separate the initial\n * children that do not have suitably permutable bands and\n * return a pointer to the subsequence of children that do have such bands,\n * assuming there are any such subtrees.\n *\n * In both cases, mark all local variables in \"prog\" that are accessed by\n * the group without permutable bands as requiring a declaration on the host.\n */\nstatic __isl_give isl_schedule_node *isolate_permutable_subtrees(\n\t__isl_take isl_schedule_node *node, struct gpu_prog *prog)\n{\n\tisl_union_set *filter;\n\tenum isl_schedule_node_type type;\n\n\tif (!node)\n\t\treturn NULL;\n\ttype = isl_schedule_node_get_type(node);\n\tif (type == isl_schedule_node_set) {\n\t\tfilter = get_all_non_parallel_subtree_filters(node);\n\t\tnode = declare_accessed_local_variables(node, prog, filter);\n\t\tnode = isl_schedule_node_order_after(node, filter);\n\t} else if (type == isl_schedule_node_sequence) {\n\t\tfilter = get_initial_non_parallel_subtree_filters(node);\n\t\tnode = declare_accessed_local_variables(node, prog, filter);\n\t\tnode = isl_schedule_node_order_before(node, filter);\n\t}\n\n\treturn node;\n}\n\n/* Replace any reference to an array element in the range of \"copy\"\n * by a reference to all array elements (defined by the extent of the array).\n */\nstatic __isl_give isl_union_map *approximate_copy_out(\n\t__isl_take isl_union_map *copy, struct gpu_prog *prog)\n{\n\tint i;\n\tisl_union_map *res;\n\n\tres = isl_union_map_empty(isl_union_map_get_space(copy));\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tisl_space *space;\n\t\tisl_set 
*set;\n\t\tisl_union_map *copy_i;\n\t\tisl_union_set *extent, *domain;\n\n\t\tspace = isl_space_copy(prog->array[i].space);\n\t\textent = isl_union_set_from_set(isl_set_universe(space));\n\t\tcopy_i = isl_union_map_copy(copy);\n\t\tcopy_i = isl_union_map_intersect_range(copy_i, extent);\n\t\tset = isl_set_copy(prog->array[i].extent);\n\t\textent = isl_union_set_from_set(set);\n\t\tdomain = isl_union_map_domain(copy_i);\n\t\tcopy_i = isl_union_map_from_domain_and_range(domain, extent);\n\t\tres = isl_union_map_union(res, copy_i);\n\t}\n\n\tisl_union_map_free(copy);\n\n\treturn res;\n}\n\n/* Insert \"kernel\" marks that point to a ppcg_kernel structure\n * in front of all outermost tilable bands that (by construction)\n * have at least one parallel loop.\n */\nstatic __isl_give isl_schedule_node *mark_kernels(struct gpu_gen *gen,\n\t__isl_take isl_schedule_node *node)\n{\n\treturn isl_schedule_node_map_descendant_bottom_up(node,\n\t\t\t\t\t\t&mark_outer_permutable, gen);\n}\n\n/* Construct schedule constraints from the dependences in prog->scop and\n * the array order dependences in prog->array_order.\n *\n * If live range reordering is allowed, then we need to make sure\n * that live ranges on arrays are not run in parallel since doing\n * so would require array expansion.  We therefore add the array\n * order dependences to the coincidence dependences.  Non-zero array\n * order dependences will then prevent a schedule dimension from being\n * considered parallel.\n * Live ranges derived from scalars are allowed to be run in parallel\n * since we force the scalars to be mapped to private memory in\n * check_scalar_live_ranges.\n * If live range reordering is allowed, then the false dependences\n * are not added to the validity constraints as that would prevent\n * reordering.  
Instead, the external false dependences that enforce that reads\n * from potentially live-in data precede any later write and\n * that writes of potentially live-out data follow any other earlier write\n * are added to the validity and the coincidence constraints.\n * The false dependences are still added to the proximity constraints\n * for consistency with the case where live range reordering is not allowed.\n * The coincidence constraints then consist of flow dependences,\n * external false dependences and array order dependences.\n * The independences can be filtered out from the first two sets.\n * They have already been filtered out from the array order dependences\n * on a per array basis in collect_order_dependences.\n * There is no need for a per array handling of the other two sets\n * as there should be no flow or external false dependence on local\n * variables that can be filtered out.\n */\nstatic __isl_give isl_schedule_constraints *construct_schedule_constraints(\n\tstruct gpu_prog *prog)\n{\n\tisl_union_set *domain;\n\tisl_union_map *dep_raw, *dep;\n\tisl_union_map *validity, *proximity, *coincidence;\n\tisl_schedule_constraints *sc;\n\n\tdomain = isl_union_set_copy(prog->scop->domain);\n\tsc = isl_schedule_constraints_on_domain(domain);\n\tsc = isl_schedule_constraints_set_context(sc,\n\t\t\t\tisl_set_copy(prog->scop->context));\n\tif (prog->scop->options->live_range_reordering) {\n\t\tsc = isl_schedule_constraints_set_conditional_validity(sc,\n\t\t\tisl_union_map_copy(prog->scop->tagged_dep_flow),\n\t\t\tisl_union_map_copy(prog->scop->tagged_dep_order));\n\t\tproximity = isl_union_map_copy(prog->scop->dep_flow);\n\t\tvalidity = isl_union_map_copy(proximity);\n\t\tvalidity = isl_union_map_union(validity,\n\t\t\t    isl_union_map_copy(prog->scop->dep_forced));\n\t\tproximity = isl_union_map_union(proximity,\n\t\t\t    isl_union_map_copy(prog->scop->dep_false));\n\t\tcoincidence = isl_union_map_copy(validity);\n\t\tcoincidence = 
isl_union_map_subtract(coincidence,\n\t\t\tisl_union_map_copy(prog->scop->independence));\n\t\tcoincidence = isl_union_map_union(coincidence,\n\t\t\t\tisl_union_map_copy(prog->array_order));\n\t} else {\n\t\tdep_raw = isl_union_map_copy(prog->scop->dep_flow);\n\t\tdep = isl_union_map_copy(prog->scop->dep_false);\n\t\tdep = isl_union_map_union(dep, dep_raw);\n\t\tdep = isl_union_map_coalesce(dep);\n\t\tproximity = isl_union_map_copy(dep);\n\t\tcoincidence = isl_union_map_copy(dep);\n\t\tvalidity = dep;\n\t}\n\tsc = isl_schedule_constraints_set_validity(sc, validity);\n\tsc = isl_schedule_constraints_set_coincidence(sc, coincidence);\n\tsc = isl_schedule_constraints_set_proximity(sc, proximity);\n\n\treturn sc;\n}\n\n/* Compute an appropriate schedule based on the accesses in\n * gen->read and gen->write.\n *\n * We derive schedule constraints from the dependences in gen->prog->scop\n * and then use isl to compute a schedule that has a parallel loop\n * in each tilable band.\n * During the schedule construction, some statement instances\n * may be grouped first based on the input schedule.\n */\nstatic __isl_give isl_schedule *compute_schedule(struct gpu_gen *gen)\n{\n\tisl_schedule_constraints *sc;\n\tisl_schedule *schedule;\n\n\tsc = construct_schedule_constraints(gen->prog);\n\tschedule = gen->prog->scop->schedule;\n\tschedule = ppcg_compute_schedule(sc, schedule, gen->options);\n\n\treturn schedule;\n}\n\n/* If the band node \"node\" has exactly one member then mark it permutable.\n */\nstatic __isl_give isl_schedule_node *band_set_permutable(\n\t__isl_take isl_schedule_node *node,\n\t__isl_keep isl_schedule_constraints *sc)\n{\n\tif (isl_schedule_node_band_n_member(node) == 1)\n\t\tnode = isl_schedule_node_band_set_permutable(node, 1);\n\n\treturn node;\n}\n\n/* Return the coincidence constraints between pairs of instances\n * that are scheduled together by the ancestors of \"node\".\n * That is, select those coincidence constraints that relate\n * pairs of 
instances that have the same value for the prefix schedule.\n * If the schedule depth is zero, then the prefix schedule does not\n * contain any information, so we intersect domain and range\n * of the schedule constraints with the reaching domain elements instead.\n */\nstatic __isl_give isl_union_map *get_local_coincidence(\n\t__isl_keep isl_schedule_node *node,\n\t__isl_keep isl_schedule_constraints *sc)\n{\n\tisl_union_map *coincidence;\n\tisl_multi_union_pw_aff *prefix;\n\tisl_union_pw_multi_aff *contraction;\n\n\tcoincidence = isl_schedule_constraints_get_coincidence(sc);\n\tcontraction = isl_schedule_node_get_subtree_contraction(node);\n\tif (isl_schedule_node_get_schedule_depth(node) == 0) {\n\t\tisl_union_set *domain;\n\n\t\tdomain = isl_schedule_node_get_domain(node);\n\t\tdomain = isl_union_set_preimage_union_pw_multi_aff(domain,\n\t\t\t\t\t\t    contraction);\n\t\tcoincidence = isl_union_map_intersect_domain(coincidence,\n\t\t\t\t\t\t    isl_union_set_copy(domain));\n\t\tcoincidence = isl_union_map_intersect_range(coincidence,\n\t\t\t\t\t\t    domain);\n\t\treturn coincidence;\n\t}\n\n\tprefix = isl_schedule_node_get_prefix_schedule_multi_union_pw_aff(node);\n\tprefix = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(prefix,\n\t\t\t\t\t\t\t\tcontraction);\n\treturn isl_union_map_eq_at_multi_union_pw_aff(coincidence, prefix);\n}\n\n/* For each member in the band node \"node\", determine whether\n * it is coincident with respect to the outer nodes and mark\n * it accordingly.\n *\n * That is, for each coincidence constraint between pairs\n * of instances that are scheduled together by the outer nodes,\n * check that domain and range are assigned the same value\n * by the band member.  
This test is performed by checking\n * that imposing the same value for the band member does not\n * remove any elements from the set of coincidence constraints.\n */\nstatic __isl_give isl_schedule_node *band_set_coincident(\n\t__isl_take isl_schedule_node *node,\n\t__isl_keep isl_schedule_constraints *sc)\n{\n\tisl_union_map *coincidence;\n\tisl_union_pw_multi_aff *contraction;\n\tisl_multi_union_pw_aff *partial;\n\tint i, n;\n\n\tcoincidence = get_local_coincidence(node, sc);\n\n\tpartial = isl_schedule_node_band_get_partial_schedule(node);\n\tcontraction = isl_schedule_node_get_subtree_contraction(node);\n\tpartial = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(partial,\n\t\t\t\t\t\t\t\tcontraction);\n\tn = isl_schedule_node_band_n_member(node);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_union_map *coincidence_i;\n\t\tisl_union_pw_aff *upa;\n\t\tisl_multi_union_pw_aff *partial_i;\n\t\tint subset;\n\n\t\tupa = isl_multi_union_pw_aff_get_union_pw_aff(partial, i);\n\t\tpartial_i = isl_multi_union_pw_aff_from_union_pw_aff(upa);\n\t\tcoincidence_i = isl_union_map_copy(coincidence);\n\t\tcoincidence_i = isl_union_map_eq_at_multi_union_pw_aff(\n\t\t\t\t\t\t    coincidence_i, partial_i);\n\t\tsubset = isl_union_map_is_subset(coincidence, coincidence_i);\n\t\tisl_union_map_free(coincidence_i);\n\n\t\tif (subset < 0)\n\t\t\tbreak;\n\t\tnode = isl_schedule_node_band_member_set_coincident(node, i,\n\t\t\t\t\t\t\t\t    subset);\n\t}\n\tif (i < n)\n\t\tnode = isl_schedule_node_free(node);\n\tisl_multi_union_pw_aff_free(partial);\n\tisl_union_map_free(coincidence);\n\n\treturn node;\n}\n\n/* If \"node\" is a band, then set its properties.\n *\n * In particular, if the band has exactly one member, then mark it permutable.\n * Mark the band members coincident based on the coincidence constraints\n * of \"sc\".\n */\nstatic __isl_give isl_schedule_node *set_band_properties(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tisl_schedule_constraints *sc = user;\n\n\tif 
(isl_schedule_node_get_type(node) != isl_schedule_node_band)\n\t\treturn node;\n\tif (isl_schedule_node_band_n_member(node) == 0)\n\t\treturn node;\n\n\tnode = band_set_permutable(node, sc);\n\tnode = band_set_coincident(node, sc);\n\n\treturn node;\n}\n\n/* Return the original schedule with all bands marked permutable and\n * all band members marked coincident based on the coincidence constraints.\n * The bands are explicitly marked permutable so that they will be considered\n * by mark_outer_permutable.\n */\nstatic __isl_give isl_schedule *determine_properties_original_schedule(\n\tstruct gpu_gen *gen)\n{\n\tisl_schedule *schedule;\n\tisl_schedule_constraints *sc;\n\n\tschedule = isl_schedule_copy(gen->prog->scop->schedule);\n\tsc = construct_schedule_constraints(gen->prog);\n\tschedule = isl_schedule_map_schedule_node_bottom_up(schedule,\n\t\t\t\t\t\t    &set_band_properties, sc);\n\tisl_schedule_constraints_free(sc);\n\n\treturn schedule;\n}\n\n/* Compute a schedule or determine the properties of the original schedule\n * depending on the value of the \"reschedule\" option.\n */\nstatic __isl_give isl_schedule *compute_or_set_properties(void *user)\n{\n\tstruct gpu_gen *gen = user;\n\n\tif (gen->options->reschedule)\n\t\treturn compute_schedule(gen);\n\telse\n\t\treturn determine_properties_original_schedule(gen);\n}\n\n/* Obtain a schedule for the scop, by reading it from\n * a file, by computing one or by determining the properties\n * of the original schedule.\n */\nstatic __isl_give isl_schedule *get_schedule(struct gpu_gen *gen)\n{\n\treturn ppcg_get_schedule(gen->ctx, gen->options,\n\t\t\t\t&compute_or_set_properties, gen);\n}\n\n/* Construct the string \"<a>_<b>\".\n */\nstatic char *concat(isl_ctx *ctx, const char *a, const char *b)\n{\n\tisl_printer *p;\n\tchar *s;\n\n\tp = isl_printer_to_str(ctx);\n\tp = isl_printer_print_str(p, a);\n\tp = isl_printer_print_str(p, \"_\");\n\tp = isl_printer_print_str(p, b);\n\ts = 
isl_printer_get_str(p);\n\tisl_printer_free(p);\n\n\treturn s;\n}\n\n/* For each array in \"prog\" of which an element appears in \"accessed\" and\n * that is not a read only scalar, create a zero-dimensional universe set\n * of which the tuple id has name \"<prefix>_<name of array>\" and a user\n * pointer pointing to the array (gpu_array_info).\n *\n * If the array is local to \"prog\", then make sure it will be declared\n * in the host code.\n *\n * Return the list of these universe sets.\n */\nstatic __isl_give isl_union_set_list *create_copy_filters(struct gpu_prog *prog,\n\tconst char *prefix, __isl_take isl_union_set *accessed)\n{\n\tint i;\n\tisl_ctx *ctx;\n\tisl_union_set_list *filters;\n\n\tctx = prog->ctx;\n\tfilters = isl_union_set_list_alloc(ctx, 0);\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tstruct gpu_array_info *array = &prog->array[i];\n\t\tisl_space *space;\n\t\tisl_set *accessed_i;\n\t\tint empty;\n\t\tchar *name;\n\t\tisl_id *id;\n\t\tisl_union_set *uset;\n\n\t\tif (gpu_array_is_read_only_scalar(array))\n\t\t\tcontinue;\n\n\t\tspace = isl_space_copy(array->space);\n\t\taccessed_i = isl_union_set_extract_set(accessed, space);\n\t\tempty = isl_set_plain_is_empty(accessed_i);\n\t\tisl_set_free(accessed_i);\n\t\tif (empty < 0) {\n\t\t\tfilters = isl_union_set_list_free(filters);\n\t\t\tbreak;\n\t\t}\n\t\tif (empty)\n\t\t\tcontinue;\n\n\t\tarray->global = 1;\n\t\tif (array->local)\n\t\t\tarray->declare_local = 1;\n\n\t\tname = concat(ctx, prefix, array->name);\n\t\tid = name ? 
isl_id_alloc(ctx, name, array) : NULL;\n\t\tfree(name);\n\t\tspace = isl_space_set_alloc(ctx, 0, 0);\n\t\tspace = isl_space_set_tuple_id(space, isl_dim_set, id);\n\t\tuset = isl_union_set_from_set(isl_set_universe(space));\n\n\t\tfilters = isl_union_set_list_add(filters, uset);\n\t}\n\tisl_union_set_free(accessed);\n\n\treturn filters;\n}\n\n/* Make sure that code for the statements in \"filters\" that\n * copy arrays to or from the device is only generated when\n * the size of the corresponding array is positive.\n * That is, add a set node underneath \"graft\" with \"filters\" as children\n * and for each child add a guard that selects the parameter\n * values for which the corresponding array has a positive size.\n * The array is available in the user pointer of the statement identifier.\n * \"depth\" is the schedule depth of the position where \"graft\"\n * will be added.\n */\nstatic __isl_give isl_schedule_node *insert_positive_size_guards(\n\t__isl_take isl_schedule_node *graft,\n\t__isl_take isl_union_set_list *filters, int depth)\n{\n\tint i, n;\n\n\tgraft = isl_schedule_node_child(graft, 0);\n\tgraft = isl_schedule_node_insert_set(graft, filters);\n\tn = isl_schedule_node_n_children(graft);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_union_set *filter;\n\t\tisl_set *domain, *guard;\n\t\tisl_id *id;\n\t\tstruct gpu_array_info *array;\n\n\t\tgraft = isl_schedule_node_child(graft, i);\n\t\tfilter = isl_schedule_node_filter_get_filter(graft);\n\t\tdomain = isl_set_from_union_set(filter);\n\t\tid = isl_set_get_tuple_id(domain);\n\t\tarray = isl_id_get_user(id);\n\t\tisl_id_free(id);\n\t\tisl_set_free(domain);\n\t\tguard = gpu_array_positive_size_guard(array);\n\t\tguard = isl_set_from_params(guard);\n\t\tguard = isl_set_add_dims(guard, isl_dim_set, depth);\n\t\tgraft = isl_schedule_node_child(graft, 0);\n\t\tgraft = isl_schedule_node_insert_guard(graft, guard);\n\t\tgraft = isl_schedule_node_parent(graft);\n\t\tgraft = 
isl_schedule_node_parent(graft);\n\t}\n\tgraft = isl_schedule_node_parent(graft);\n\n\treturn graft;\n}\n\n/* Create a graft for copying arrays to or from the device,\n * whenever the size of the array is strictly positive.\n * Each statement is called \"<prefix>_<name of array>\" and\n * the identifier has a user pointer pointing to the array.\n * The graft will be added at the position specified by \"node\".\n * \"copy\" contains the array elements that need to be copied.\n * Only arrays of which some elements need to be copied\n * will have a corresponding statement in the graft.\n * Note though that each such statement will copy the entire array.\n */\nstatic __isl_give isl_schedule_node *create_copy_device(struct gpu_prog *prog,\n\t__isl_keep isl_schedule_node *node, const char *prefix,\n\t__isl_take isl_union_set *copy)\n{\n\tint depth;\n\tisl_ctx *ctx;\n\tisl_space *space;\n\tisl_union_set *all, *domain;\n\tisl_union_set_list *filters;\n\tisl_union_map *extension;\n\tisl_schedule_node *graft;\n\n\tctx = prog->ctx;\n\tdepth = isl_schedule_node_get_schedule_depth(node);\n\tfilters = create_copy_filters(prog, prefix, copy);\n\tall = isl_union_set_list_union(isl_union_set_list_copy(filters));\n\n\tspace = depth < 0 ? 
NULL : isl_space_set_alloc(ctx, 0, depth);\n\tdomain = isl_union_set_from_set(isl_set_universe(space));\n\textension = isl_union_map_from_domain_and_range(domain, all);\n\tgraft = isl_schedule_node_from_extension(extension);\n\n\tif (!filters)\n\t\treturn isl_schedule_node_free(graft);\n\tif (isl_union_set_list_n_union_set(filters) == 0) {\n\t\tisl_union_set_list_free(filters);\n\t\treturn graft;\n\t}\n\n\treturn insert_positive_size_guards(graft, filters, depth);\n}\n\n/* Return (the universe spaces of) the arrays that are declared\n * inside the scop corresponding to \"prog\" and for which all\n * potential writes inside the scop form a subset of \"domain\".\n */\nstatic __isl_give isl_union_set *extract_local_accesses(struct gpu_prog *prog,\n\t__isl_keep isl_union_set *domain)\n{\n\tint i;\n\tisl_union_set *local;\n\n\tlocal = isl_union_set_empty(isl_union_set_get_space(domain));\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tisl_set *set;\n\t\tisl_union_map *to_outer;\n\t\tisl_union_map *may_write;\n\t\tisl_union_set *write_domain;\n\t\tisl_union_set *fields;\n\t\tint subset;\n\n\t\tif (!prog->array[i].local)\n\t\t\tcontinue;\n\n\t\tset = isl_set_universe(isl_space_copy(prog->array[i].space));\n\t\tto_outer = isl_union_map_copy(prog->to_outer);\n\t\tto_outer = isl_union_map_intersect_range(to_outer,\n\t\t\t\t    isl_union_set_from_set(isl_set_copy(set)));\n\t\tfields = isl_union_map_domain(to_outer);\n\t\tmay_write = isl_union_map_copy(prog->may_write);\n\t\tmay_write = isl_union_map_intersect_range(may_write, fields);\n\t\twrite_domain = isl_union_map_domain(may_write);\n\t\tsubset = isl_union_set_is_subset(write_domain, domain);\n\t\tisl_union_set_free(write_domain);\n\n\t\tif (subset < 0) {\n\t\t\tisl_set_free(set);\n\t\t\treturn isl_union_set_free(local);\n\t\t} else if (subset) {\n\t\t\tlocal = isl_union_set_add_set(local, set);\n\t\t} else {\n\t\t\tisl_set_free(set);\n\t\t}\n\t}\n\n\treturn local;\n}\n\n/* Internal data structure for node_may_persist.\n 
*\n * \"tagger\" maps tagged iteration domains to the corresponding untagged\n *\titeration domain.\n *\n * \"may_persist_flow\" is the set of all tagged dataflow dependences\n * with those dependences removed that either precede or follow\n * the kernel launch in a sequence.\n * \"inner_band_flow\" is the set of all tagged dataflow dependences\n * that are local to a given iteration of the outer band nodes\n * with respect to the current node.\n * \"local_flow\" is equal to \"inner_band_flow\", except that the domain\n * and the range have been intersected with intermediate filters\n * on children of sets or sequences.\n */\nstruct ppcg_may_persist_data {\n\tisl_union_pw_multi_aff *tagger;\n\n\tisl_union_map *local_flow;\n\tisl_union_map *inner_band_flow;\n\tisl_union_map *may_persist_flow;\n};\n\n/* Update the information in \"data\" based on the band ancestor \"node\".\n *\n * In particular, we restrict the dependences in data->local_flow\n * to those dependences where the source and the sink occur in\n * the same iteration of the given band node.\n * We also update data->inner_band_flow to the new value of\n * data->local_flow.\n */\nstatic int update_may_persist_at_band(__isl_keep isl_schedule_node *node,\n\tstruct ppcg_may_persist_data *data)\n{\n\tisl_multi_union_pw_aff *partial;\n\tisl_union_pw_multi_aff *contraction;\n\tisl_union_map *flow;\n\n\tif (isl_schedule_node_band_n_member(node) == 0)\n\t\treturn 0;\n\n\tpartial = isl_schedule_node_band_get_partial_schedule(node);\n\tcontraction = isl_schedule_node_get_subtree_contraction(node);\n\tpartial = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(partial,\n\t\t\t\t\t\t\t\tcontraction);\n\tpartial = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(partial,\n\t\t\t\tisl_union_pw_multi_aff_copy(data->tagger));\n\n\tflow = data->local_flow;\n\tflow = isl_union_map_eq_at_multi_union_pw_aff(flow, partial);\n\tdata->local_flow = flow;\n\n\tisl_union_map_free(data->inner_band_flow);\n\tdata->inner_band_flow = 
isl_union_map_copy(data->local_flow);\n\n\treturn 0;\n}\n\n/* Given a set of local reaching domain elements \"domain\",\n * expand them to the corresponding leaf domain elements using \"contraction\"\n * and insert the array reference tags using data->tagger.\n */\nstatic __isl_give isl_union_set *expand_and_tag(\n\t__isl_take isl_union_set *domain,\n\t__isl_take isl_union_pw_multi_aff *contraction,\n\tstruct ppcg_may_persist_data *data)\n{\n\tdomain = isl_union_set_preimage_union_pw_multi_aff(domain,\n\t\t\t    contraction);\n\tdomain = isl_union_set_preimage_union_pw_multi_aff(domain,\n\t\t\t    isl_union_pw_multi_aff_copy(data->tagger));\n\treturn domain;\n}\n\n/* Given a filter node that is the child of a set or sequence node,\n * restrict data->local_flow to refer only to those elements\n * in the filter of the node.\n * \"contraction\" maps the leaf domain elements of the schedule tree\n * to the corresponding domain elements at (the parent of) \"node\".\n */\nstatic int filter_flow(__isl_keep isl_schedule_node *node,\n\tstruct ppcg_may_persist_data *data,\n\t__isl_take isl_union_pw_multi_aff *contraction)\n{\n\tisl_union_set *filter;\n\tisl_union_map *flow;\n\n\tflow = data->local_flow;\n\tfilter = isl_schedule_node_filter_get_filter(node);\n\tfilter = expand_and_tag(filter, contraction, data);\n\tflow = isl_union_map_intersect_domain(flow, isl_union_set_copy(filter));\n\tflow = isl_union_map_intersect_range(flow, filter);\n\tdata->local_flow = flow;\n\n\treturn 0;\n}\n\n/* Given a filter node \"node\", collect the filters on all preceding siblings\n * (which are also filter nodes), add them to \"filters\" and return the result.\n */\nstatic __isl_give isl_union_set *add_previous_filters(\n\t__isl_take isl_union_set *filters, __isl_keep isl_schedule_node *node)\n{\n\tisl_schedule_node *sibling;\n\n\tsibling = isl_schedule_node_copy(node);\n\twhile (sibling && isl_schedule_node_has_previous_sibling(sibling)) {\n\t\tisl_union_set *filter;\n\n\t\tsibling = 
isl_schedule_node_previous_sibling(sibling);\n\t\tfilter = isl_schedule_node_filter_get_filter(sibling);\n\t\tfilters = isl_union_set_union(filters, filter);\n\t}\n\tisl_schedule_node_free(sibling);\n\tif (!sibling)\n\t\treturn isl_union_set_free(filters);\n\n\treturn filters;\n}\n\n/* Given a filter node \"node\", collect the filters on all following siblings\n * (which are also filter nodes), add them to \"filters\" and return the result.\n */\nstatic __isl_give isl_union_set *add_next_filters(\n\t__isl_take isl_union_set *filters, __isl_keep isl_schedule_node *node)\n{\n\tisl_schedule_node *sibling;\n\n\tsibling = isl_schedule_node_copy(node);\n\twhile (sibling && isl_schedule_node_has_next_sibling(sibling)) {\n\t\tisl_union_set *filter;\n\n\t\tsibling = isl_schedule_node_next_sibling(sibling);\n\t\tfilter = isl_schedule_node_filter_get_filter(sibling);\n\t\tfilters = isl_union_set_union(filters, filter);\n\t}\n\tisl_schedule_node_free(sibling);\n\tif (!sibling)\n\t\treturn isl_union_set_free(filters);\n\n\treturn filters;\n}\n\n/* Remove those flow dependences from data->may_persist_flow\n * that flow between elements of \"domain\" within the same iteration\n * of all outer band nodes.\n * \"contraction\" maps the leaf domain elements of the schedule tree\n * to the corresponding elements of \"domain\".\n */\nstatic void remove_external_flow(struct ppcg_may_persist_data *data,\n\t__isl_take isl_union_set *domain,\n\t__isl_keep isl_union_pw_multi_aff *contraction)\n{\n\tisl_union_map *flow;\n\n\tcontraction = isl_union_pw_multi_aff_copy(contraction);\n\tdomain = expand_and_tag(domain, contraction, data);\n\tflow = isl_union_map_copy(data->local_flow);\n\tflow = isl_union_map_intersect_domain(flow, isl_union_set_copy(domain));\n\tflow = isl_union_map_intersect_range(flow, domain);\n\n\tdata->may_persist_flow = isl_union_map_subtract(data->may_persist_flow,\n\t\t\t\t\t\t\tflow);\n}\n\n/* Update the information in \"data\" based on the filter ancestor 
\"node\".\n * We 
only need to modify anything if the filter is the child\n * of a set or sequence node.\n *\n * In the case of a sequence, we remove the dependences between\n * statement instances that are both executed either before or\n * after the subtree that will be mapped to a kernel, within\n * the same iteration of outer bands.\n *\n * In both cases, we restrict data->local_flow to the current child.\n */\nstatic int update_may_persist_at_filter(__isl_keep isl_schedule_node *node,\n\tstruct ppcg_may_persist_data *data)\n{\n\tenum isl_schedule_node_type type;\n\tisl_schedule_node *parent;\n\tisl_space *space;\n\tisl_union_pw_multi_aff *contraction;\n\tisl_union_set *before, *after, *filter;\n\n\ttype = isl_schedule_node_get_parent_type(node);\n\tif (type != isl_schedule_node_sequence && type != isl_schedule_node_set)\n\t\treturn 0;\n\n\tparent = isl_schedule_node_copy(node);\n\tparent = isl_schedule_node_parent(parent);\n\tcontraction = isl_schedule_node_get_subtree_contraction(parent);\n\tisl_schedule_node_free(parent);\n\n\tif (type == isl_schedule_node_set)\n\t\treturn filter_flow(node, data, contraction);\n\n\tfilter = isl_schedule_node_filter_get_filter(node);\n\tspace = isl_union_set_get_space(filter);\n\tisl_union_set_free(filter);\n\tbefore = isl_union_set_empty(space);\n\tafter = isl_union_set_copy(before);\n\tbefore = add_previous_filters(before, node);\n\tafter = add_next_filters(after, node);\n\n\tremove_external_flow(data, before, contraction);\n\tremove_external_flow(data, after, contraction);\n\n\treturn filter_flow(node, data, contraction);\n}\n\n/* Update the information in \"data\" based on the ancestor \"node\".\n */\nstatic isl_stat update_may_persist_at(__isl_keep isl_schedule_node *node,\n\tvoid *user)\n{\n\tstruct ppcg_may_persist_data *data = user;\n\n\tswitch (isl_schedule_node_get_type(node)) {\n\tcase isl_schedule_node_error:\n\t\treturn isl_stat_error;\n\tcase isl_schedule_node_context:\n\tcase isl_schedule_node_domain:\n\tcase 
isl_schedule_node_expansion:\n\tcase isl_schedule_node_extension:\n\tcase isl_schedule_node_guard:\n\tcase isl_schedule_node_leaf:\n\tcase isl_schedule_node_mark:\n\tcase isl_schedule_node_sequence:\n\tcase isl_schedule_node_set:\n\t\tbreak;\n\tcase isl_schedule_node_band:\n\t\tif (update_may_persist_at_band(node, data) < 0)\n\t\t\treturn isl_stat_error;\n\t\tbreak;\n\tcase isl_schedule_node_filter:\n\t\tif (update_may_persist_at_filter(node, data) < 0)\n\t\t\treturn isl_stat_error;\n\t\tbreak;\n\t}\n\n\treturn isl_stat_ok;\n}\n\n/* Determine the set of array elements that may need to be preserved\n * by a kernel constructed from the subtree at \"node\".\n * This includes the set of array elements that may need to be preserved\n * by the entire scop (prog->may_persist) and the elements for which\n * there is a potential flow dependence that may cross a kernel launch.\n *\n * To determine the second set, we start from all flow dependences.\n * From this set of dependences, we remove those that cannot possibly\n * require data to be preserved by a kernel launch.\n * In particular, we consider the following sets of dependences.\n * - dependences of which the write occurs inside the kernel.\n *   If the data is needed outside the kernel, then it will\n *   be copied out immediately after the kernel launch, so there\n *   is no need for any special care.\n * - dependences of which the read occurs inside the kernel and the\n *   corresponding write occurs inside the same iteration of the\n *   outer band nodes.  This means that the data is needed in\n *   the first kernel launch after the write, which is already\n *   taken care of by the standard copy-in.  
That is, the data\n *   do not need to be preserved by any intermediate call to\n *   the same kernel.\n * - dependences of which the write and the read either both occur\n *   before the kernel launch or both occur after the kernel launch,\n *   within the same iteration of the outer band nodes with respect\n *   to the sequence that determines the ordering of the dependence\n *   and the kernel launch.  Such flow dependences cannot cross\n *   any kernel launch.\n *\n * For the remaining (tagged) dependences, we take the domain\n * (i.e., the tagged writes) and apply the tagged access relation\n * to obtain the accessed data elements.\n * These are then combined with the elements that may need to be\n * preserved by the entire scop.\n */\nstatic __isl_give isl_union_set *node_may_persist(\n\t__isl_keep isl_schedule_node *node, struct gpu_prog *prog)\n{\n\tstruct ppcg_may_persist_data data;\n\tisl_union_pw_multi_aff *contraction;\n\tisl_union_set *domain;\n\tisl_union_set *persist;\n\tisl_union_map *flow, *local_flow;\n\n\tdata.tagger = prog->scop->tagger;\n\n\tflow = isl_union_map_copy(prog->scop->tagged_dep_flow);\n\tdata.local_flow = isl_union_map_copy(flow);\n\tdata.inner_band_flow = isl_union_map_copy(flow);\n\tdata.may_persist_flow = flow;\n\tif (isl_schedule_node_foreach_ancestor_top_down(node,\n\t\t\t\t\t&update_may_persist_at, &data) < 0)\n\t\tdata.may_persist_flow =\n\t\t\t\t    isl_union_map_free(data.may_persist_flow);\n\tflow = data.may_persist_flow;\n\tisl_union_map_free(data.local_flow);\n\n\tdomain = isl_schedule_node_get_domain(node);\n\tcontraction = isl_schedule_node_get_subtree_contraction(node);\n\tdomain = isl_union_set_preimage_union_pw_multi_aff(domain,\n\t\t\t\t    contraction);\n\tdomain = isl_union_set_preimage_union_pw_multi_aff(domain,\n\t\t\t\t    isl_union_pw_multi_aff_copy(data.tagger));\n\tflow = isl_union_map_subtract_domain(flow, isl_union_set_copy(domain));\n\tlocal_flow = data.inner_band_flow;\n\tlocal_flow = 
isl_union_map_intersect_range(local_flow, domain);\n\tflow = isl_union_map_subtract(flow, local_flow);\n\n\tpersist = isl_union_map_domain(flow);\n\tpersist = isl_union_set_apply(persist,\n\t\t\tisl_union_map_copy(prog->scop->tagged_may_writes));\n\tpersist = isl_union_set_union(persist,\n\t\t\tisl_union_set_copy(prog->may_persist));\n\n\treturn persist;\n}\n\n/* Add nodes for copying outer arrays in and out of the device\n * before and after the subtree \"node\", which contains one or more kernels.\n * \"domain\" contains the original statement instances, i.e.,\n * those that correspond to the domains of the access relations in \"prog\".\n * In particular, the domain has not been contracted in any way.\n * \"prefix\" contains the prefix schedule at that point, in terms\n * of the same original statement instances.\n *\n * We first compute the sets of outer array elements that need\n * to be copied in and out and then graft in the nodes for\n * performing this copying.\n *\n * In particular, for each array that is possibly written anywhere in\n * the subtree \"node\" and that may be used after \"node\"\n * or that may be visible outside the corresponding scop,\n * we copy out its entire extent.\n *\n * Any array element that is read without first being written inside\n * the subtree \"node\" needs to be copied in.\n * Furthermore, if there are any array elements that\n * are copied out, but that may not be written inside \"node\", then\n * they also need to be copied in to ensure that the value after execution\n * is the same as the value before execution, at least for those array\n * elements that may have their values preserved by the scop or that\n * may be written before \"node\" and read after \"node\".\n * In case the array elements are structures, we need to take into\n * account that all members of the structures need to be written\n * by \"node\" before we can avoid copying the data structure in.\n *\n * Note that the may_write relation is intersected with 
the domain,\n * which has been intersected with the context.\n * This helps in those cases where the arrays are declared with a fixed size,\n * while the accesses are parametric and the context assigns a fixed value\n * to the parameters.\n *\n * If an element from a local array is read without first being written,\n * then there is no point in copying it in since it cannot have been\n * written prior to the scop.  Warn about the uninitialized read instead.\n */\nstatic __isl_give isl_schedule_node *add_to_from_device(\n\t__isl_take isl_schedule_node *node, __isl_take isl_union_set *domain,\n\t__isl_take isl_union_map *prefix, struct gpu_prog *prog)\n{\n\tisl_union_set *local;\n\tisl_union_set *may_persist;\n\tisl_union_map *may_write, *must_write, *copy_out, *not_written;\n\tisl_union_map *read, *copy_in;\n\tisl_union_map *tagged;\n\tisl_union_map *local_uninitialized;\n\tisl_schedule_node *graft;\n\n\ttagged = isl_union_map_copy(prog->scop->tagged_reads);\n\ttagged = isl_union_map_union(tagged,\n\t\t\t    isl_union_map_copy(prog->scop->tagged_may_writes));\n\n\tmay_write = isl_union_map_copy(prog->may_write);\n\tmay_write = isl_union_map_intersect_domain(may_write,\n\t\t\t\t\tisl_union_set_copy(domain));\n\tmay_write = remove_local_accesses(prog,\n\t\t\t\t\tisl_union_map_copy(tagged), may_write,\n\t\t\t\t\tisl_union_map_copy(prefix), 0);\n\tmay_write = isl_union_map_apply_range(may_write,\n\t\t\t\t\tisl_union_map_copy(prog->to_outer));\n\tmay_write = isl_union_map_apply_domain(may_write,\n\t\t\t\t\tisl_union_map_copy(prefix));\n\tmay_write = approximate_copy_out(may_write, prog);\n\tcopy_out = isl_union_map_copy(may_write);\n\tmay_write = isl_union_map_apply_range(may_write,\n\t\t\t\t\tisl_union_map_copy(prog->to_inner));\n\tmust_write = isl_union_map_copy(prog->must_write);\n\tmust_write = isl_union_map_apply_domain(must_write,\n\t\t\t\t\tisl_union_map_copy(prefix));\n\tmay_persist = node_may_persist(node, prog);\n\tmay_write = 
isl_union_map_intersect_range(may_write, may_persist);\n\tnot_written = isl_union_map_subtract(may_write, must_write);\n\n\tlocal = extract_local_accesses(prog, domain);\n\tread = isl_union_map_copy(prog->read);\n\tread = isl_union_map_intersect_domain(read, domain);\n\tread = remove_local_accesses(prog, tagged, read,\n\t\t\t\t\tisl_union_map_copy(prefix), 1);\n\tlocal = isl_union_set_apply(local, isl_union_map_copy(prog->to_inner));\n\tlocal_uninitialized = isl_union_map_copy(prog->scop->live_in);\n\tlocal_uninitialized = isl_union_map_intersect_range(local_uninitialized,\n\t\t\t\t\t\t\t    local);\n\tlocal_uninitialized = isl_union_map_intersect(local_uninitialized,\n\t\t\t\t\t\t    isl_union_map_copy(read));\n\tif (!isl_union_map_is_empty(local_uninitialized)) {\n\t\tfprintf(stderr,\n\t\t\t\"possibly uninitialized reads (not copied in):\\n\");\n\t\tisl_union_map_dump(local_uninitialized);\n\t}\n\tread = isl_union_map_subtract(read, local_uninitialized);\n\tread = isl_union_map_apply_domain(read, prefix);\n\tcopy_in = isl_union_map_union(read, not_written);\n\tcopy_in = isl_union_map_apply_range(copy_in,\n\t\t\t\t    isl_union_map_copy(prog->to_outer));\n\n\tgraft = create_copy_device(prog, node, \"to_device\",\n\t\t\t\t\t\tisl_union_map_range(copy_in));\n\tnode = isl_schedule_node_graft_before(node, graft);\n\tgraft = create_copy_device(prog, node, \"from_device\",\n\t\t\t\t\t\tisl_union_map_range(copy_out));\n\tnode = isl_schedule_node_graft_after(node, graft);\n\n\treturn node;\n}\n\n/* Add nodes for initializing (\"init_device\") and clearing (\"clear_device\")\n * the device before and after \"node\".\n */\nstatic __isl_give isl_schedule_node *add_init_clear_device(\n\t__isl_take isl_schedule_node *node)\n{\n\tisl_ctx *ctx;\n\tisl_space *space;\n\tisl_union_set *domain;\n\tisl_schedule_node *graft;\n\n\tctx = isl_schedule_node_get_ctx(node);\n\n\tspace = isl_space_set_alloc(ctx, 0, 0);\n\tspace = isl_space_set_tuple_name(space, isl_dim_set, 
\"init_device\");\n\tdomain = isl_union_set_from_set(isl_set_universe(space));\n\tgraft = isl_schedule_node_from_domain(domain);\n\n\tnode = isl_schedule_node_graft_before(node, graft);\n\n\tspace = isl_space_set_alloc(ctx, 0, 0);\n\tspace = isl_space_set_tuple_name(space, isl_dim_set, \"clear_device\");\n\tdomain = isl_union_set_from_set(isl_set_universe(space));\n\tgraft = isl_schedule_node_from_domain(domain);\n\n\tnode = isl_schedule_node_graft_after(node, graft);\n\n\treturn node;\n}\n\n/* Update \"schedule\" for mapping to a GPU device.\n *\n * In particular, insert a context node, create kernels for\n * each outermost tilable band and introduce nodes for copying arrays\n * in and out of the device and for initializing and clearing the device.\n * If the child of the initial root points to a set node,\n * then children of this node that do not contain any tilable bands\n * are separated from the other children and are not mapped to\n * the device.\n *\n * The GPU code is generated in a context where at least one\n * statement instance is executed.  
The corresponding guard is inserted\n * around the entire schedule.\n */\nstatic __isl_give isl_schedule *map_to_device(struct gpu_gen *gen,\n\t__isl_take isl_schedule *schedule)\n{\n\tisl_schedule_node *node;\n\tisl_set *context;\n\tisl_set *guard;\n\tisl_union_set *domain;\n\tisl_union_map *prefix;\n\tisl_union_pw_multi_aff *contraction;\n\tstruct gpu_prog *prog;\n\n\tcontext = isl_set_copy(gen->prog->context);\n\tcontext = isl_set_from_params(context);\n\tschedule = isl_schedule_insert_context(schedule, context);\n\n\tprog = gen->prog;\n\tguard = isl_union_set_params(isl_union_set_copy(prog->scop->domain));\n\tprog->context = isl_set_intersect(prog->context, isl_set_copy(guard));\n\tguard = isl_set_from_params(guard);\n\n\tnode = isl_schedule_get_root(schedule);\n\tisl_schedule_free(schedule);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = isolate_permutable_subtrees(node, gen->prog);\n\tdomain = isl_schedule_node_get_domain(node);\n\tcontraction = isl_schedule_node_get_subtree_contraction(node);\n\tdomain = isl_union_set_preimage_union_pw_multi_aff(domain,\n\t\t\t\t    isl_union_pw_multi_aff_copy(contraction));\n\tprefix = isl_schedule_node_get_prefix_schedule_union_map(node);\n\tprefix = isl_union_map_preimage_domain_union_pw_multi_aff(prefix,\n\t\t\t\t    contraction);\n\tnode = mark_kernels(gen, node);\n\tnode = add_to_from_device(node, domain, prefix, gen->prog);\n\tnode = isl_schedule_node_root(node);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = isl_schedule_node_insert_guard(node, guard);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = add_init_clear_device(node);\n\tschedule = isl_schedule_node_get_schedule(node);\n\tisl_schedule_node_free(node);\n\n\treturn schedule;\n}\n\n/* Internal data structure for extract_access.\n * \"next_access\" points to the end of a linked list that is extended\n * by extract_access.\n * \"single_expression\" is set 
if the access expressions belong to\n * an expression statement (i.e., a statement without internal control).\n * \"any_to_outer\" maps all intermediate arrays to their outer arrays.\n */\nstruct ppcg_extract_access_data {\n\tstruct gpu_stmt_access **next_access;\n\tint single_expression;\n\tisl_union_map *any_to_outer;\n};\n\n/* Given a tagged access relation to a single array \"tagged\", extract it\n * as a map, taking into account that the input may be empty.\n * If the access relation is empty, then it does not contain\n * any space information, so we try to recover it from the index\n * expression.\n * The space of the index expression is of the form I -> A,\n * with I the statement instances and A the array, or [I -> F] -> A,\n * with F the filters corresponding to arguments.\n * We first drop F, if present, obtaining I -> A.\n * Then we construct I -> R, with R the reference tag,\n * combine the two into I -> [R -> A] and uncurry to obtain\n * the final result [I -> R] -> A.\n * Note that the index expression may have a lower dimension\n * than that of the array, but this dimension is not used\n * if the access relation is empty.\n */\nstatic __isl_give isl_map *extract_single_tagged_access(\n\t__isl_take isl_union_map *tagged, __isl_keep pet_expr *expr)\n{\n\tint empty;\n\tisl_id *id;\n\tisl_space *space, *space2;\n\tisl_multi_pw_aff *index;\n\n\tempty = isl_union_map_is_empty(tagged);\n\tif (empty < 0)\n\t\tgoto error;\n\tif (!empty)\n\t\treturn isl_map_from_union_map(tagged);\n\tisl_union_map_free(tagged);\n\n\tindex = pet_expr_access_get_index(expr);\n\tspace = isl_multi_pw_aff_get_space(index);\n\tisl_multi_pw_aff_free(index);\n\tif (isl_space_domain_is_wrapping(space))\n\t\tspace = isl_space_domain_factor_domain(space);\n\tspace2 = isl_space_copy(space);\n\tspace2 = isl_space_from_domain(isl_space_domain(space2));\n\tid = pet_expr_access_get_ref_id(expr);\n\tspace2 = isl_space_set_tuple_id(space2, isl_dim_out, id);\n\tspace = 
isl_space_range_product(space2, space);\n\tspace = isl_space_uncurry(space);\n\n\treturn isl_map_empty(space);\nerror:\n\tisl_union_map_free(tagged);\n\treturn NULL;\n}\n\n/* Does the index expression \"index\" of \"expr\" represent an access\n * to a single element?\n * That is, is \"index\" completely specified?\n *\n * If \"expr\" accesses elements from different spaces (i.e., fields\n * of a structure), then it does not access a single element.\n * Otherwise, if the single space of the access matches the space\n * of \"index\", then the index expression is completely specified\n * (no pointer to a lower-dimensional slice of the accessed array)\n * and a single element is being accessed.\n */\nstatic isl_bool complete_index(__isl_keep pet_expr *expr,\n\t__isl_keep isl_multi_pw_aff *index)\n{\n\tisl_union_map *read, *write, *all;\n\tisl_map *map;\n\tisl_space *space1, *space2;\n\tisl_bool complete;\n\n\tread = pet_expr_access_get_may_read(expr);\n\twrite = pet_expr_access_get_may_write(expr);\n\tall = isl_union_map_union(read, write);\n\tif (!all)\n\t\treturn isl_bool_error;\n\tif (isl_union_map_n_map(all) != 1) {\n\t\tisl_union_map_free(all);\n\t\treturn isl_bool_false;\n\t}\n\tmap = isl_map_from_union_map(all);\n\tspace1 = isl_map_get_space(map);\n\tisl_map_free(map);\n\tspace2 = isl_multi_pw_aff_get_space(index);\n\tcomplete = isl_space_tuple_is_equal(space1, isl_dim_out,\n\t\t\t\t\t    space2, isl_dim_out);\n\tisl_space_free(space1);\n\tisl_space_free(space2);\n\n\treturn complete;\n}\n\n/* Does \"expr\" access a single, fixed element (independently of the statement\n * instance)?\n * That is, does it have a completely specified constant index expression?\n *\n * Note that it is not sufficient for the index expression to be\n * piecewise constant.  
isl_multi_pw_aff_is_cst can therefore not be used.\n */\nstatic isl_bool accesses_fixed_element(__isl_keep pet_expr *expr)\n{\n\tint i, n;\n\tisl_multi_pw_aff *index;\n\tisl_bool fixed = isl_bool_true;\n\n\tindex = pet_expr_access_get_index(expr);\n\tif (!index)\n\t\treturn isl_bool_error;\n\tn = isl_multi_pw_aff_dim(index, isl_dim_out);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_pw_aff *pa;\n\n\t\tpa = isl_multi_pw_aff_get_pw_aff(index, i);\n\t\tfixed = isl_pw_aff_n_piece(pa) == 1;\n\t\tif (fixed)\n\t\t\tfixed = isl_pw_aff_is_cst(pa);\n\t\tisl_pw_aff_free(pa);\n\t\tif (fixed < 0 || !fixed)\n\t\t\tbreak;\n\t}\n\tif (fixed >= 0 && fixed)\n\t\tfixed = complete_index(expr, index);\n\tisl_multi_pw_aff_free(index);\n\n\treturn fixed;\n}\n\n/* Extract a gpu_stmt_access from \"expr\", append it to the list\n * that ends in *data->next_access and update the end of the list.\n * If the access expression performs a write, then it is considered\n * exact only if it appears in a single expression statement and\n * if its may access relation is equal to its must access relation.\n *\n * The combined set of may accesses may be a union if member accesses\n * are involved, but the entire set is derived from a single reference and\n * therefore from a single index expression.  
These accesses therefore\n * all map to the same outer array.\n */\nstatic int extract_access(__isl_keep pet_expr *expr, void *user)\n{\n\tstruct ppcg_extract_access_data *data = user;\n\tisl_union_map *tagged;\n\tstruct gpu_stmt_access *access;\n\tisl_ctx *ctx = pet_expr_get_ctx(expr);\n\tisl_multi_pw_aff *index;\n\n\taccess = isl_alloc_type(ctx, struct gpu_stmt_access);\n\tif (!access)\n\t\treturn -1;\n\taccess->next = NULL;\n\taccess->read = pet_expr_access_is_read(expr);\n\taccess->write = pet_expr_access_is_write(expr);\n\ttagged = pet_expr_access_get_tagged_may_read(expr);\n\ttagged = isl_union_map_union(tagged,\n\t\t\t\tpet_expr_access_get_tagged_may_write(expr));\n\ttagged = isl_union_map_apply_range(tagged,\n\t\t\t\t\tisl_union_map_copy(data->any_to_outer));\n\tif (!access->write) {\n\t\taccess->exact_write = 1;\n\t} else if (!data->single_expression) {\n\t\taccess->exact_write = 0;\n\t} else {\n\t\tisl_union_map *must, *may;\n\t\tmay = isl_union_map_copy(tagged);\n\t\tmay = isl_union_map_domain_factor_domain(may);\n\t\tmust = pet_expr_access_get_must_write(expr);\n\t\taccess->exact_write = isl_union_map_is_equal(must, may);\n\t\tisl_union_map_free(must);\n\t\tisl_union_map_free(may);\n\t}\n\tindex = pet_expr_access_get_index(expr);\n\taccess->n_index = isl_multi_pw_aff_dim(index, isl_dim_out);\n\tisl_multi_pw_aff_free(index);\n\taccess->ref_id = pet_expr_access_get_ref_id(expr);\n\taccess->tagged_access = extract_single_tagged_access(tagged, expr);\n\taccess->access = isl_map_copy(access->tagged_access);\n\taccess->access = isl_map_domain_factor_domain(access->access);\n\taccess->fixed_element = accesses_fixed_element(expr);\n\n\t*data->next_access = access;\n\tdata->next_access = &(*data->next_access)->next;\n\n\tif (!access->access || access->fixed_element < 0)\n\t\treturn -1;\n\n\treturn 0;\n}\n\n/* Construct a linked list of gpu_stmt_access objects,\n * one for each access expression in the statement body.\n * \"any_to_outer\" maps all intermediate 
arrays to their outer arrays.\n */\nstatic int pet_stmt_extract_accesses(struct gpu_stmt *stmt,\n\t__isl_keep isl_union_map *any_to_outer)\n{\n\tstruct ppcg_extract_access_data data;\n\n\tstmt->accesses = NULL;\n\tdata.next_access = &stmt->accesses;\n\tdata.single_expression =\n\t\tpet_tree_get_type(stmt->stmt->body) == pet_tree_expr;\n\tdata.any_to_outer = any_to_outer;\n\treturn pet_tree_foreach_access_expr(stmt->stmt->body,\n\t\t\t\t\t\t&extract_access, &data);\n}\n\n/* Has statement \"stmt\" been killed from \"scop\"?\n * That is, is the instance set of \"scop\" free from any\n * instances of \"stmt\"?\n */\nstatic isl_bool is_stmt_killed(struct ppcg_scop *scop, struct pet_stmt *stmt)\n{\n\tisl_space *space;\n\tisl_set *left;\n\tisl_bool empty;\n\n\tif (!scop || !stmt)\n\t\treturn isl_bool_error;\n\tspace = isl_set_get_space(stmt->domain);\n\tleft = isl_union_set_extract_set(scop->domain, space);\n\tempty = isl_set_plain_is_empty(left);\n\tisl_set_free(left);\n\n\treturn empty;\n}\n\n/* Return an array of gpu_stmt representing the statements in \"scop\".\n * Do not collect array accesses for statements that have been killed.\n */\nstatic struct gpu_stmt *extract_stmts(isl_ctx *ctx, struct ppcg_scop *scop,\n\t__isl_keep isl_union_map *any_to_outer)\n{\n\tint i;\n\tstruct gpu_stmt *stmts;\n\n\tstmts = isl_calloc_array(ctx, struct gpu_stmt, scop->pet->n_stmt);\n\tif (!stmts)\n\t\treturn NULL;\n\n\tfor (i = 0; i < scop->pet->n_stmt; ++i) {\n\t\tstruct gpu_stmt *s = &stmts[i];\n\t\tisl_bool killed;\n\n\t\ts->id = isl_set_get_tuple_id(scop->pet->stmts[i]->domain);\n\t\ts->stmt = scop->pet->stmts[i];\n\t\tkilled = is_stmt_killed(scop, scop->pet->stmts[i]);\n\t\tif (killed < 0)\n\t\t\treturn free_stmts(stmts, i + 1);\n\t\tif (killed)\n\t\t\tcontinue;\n\t\tif (pet_stmt_extract_accesses(s, any_to_outer) < 0)\n\t\t\treturn free_stmts(stmts, i + 1);\n\t}\n\n\treturn stmts;\n}\n\n/* Generate CUDA code for \"scop\" and print it to \"p\".\n * After generating an AST for the 
transformed scop as explained below,\n * we call \"gen->print\" to print the AST in the desired output format\n * to \"p\".\n *\n * If it turns out that it does not make sense to generate GPU code,\n * then we generate CPU code instead.\n *\n * The declarations of the arrays that are visible outside of the scop\n * are printed outside of the code generated from the schedule,\n * because the generated code may involve a guard around the entire code.\n *\n * We first compute a schedule that respects the dependences\n * of the original program and select the outermost bands\n * of tilable dimensions that have at least one parallel loop.\n * If the --load-schedule option is specified, then the loaded schedule\n * is used instead of a computed schedule.\n *\n * Each of these bands B is then tiled according to \"tile\" sizes, resulting\n * in two nested bands, with a kernel marker on top\n *\n *\t\tK\n *\t\t|\n *\t\tT\n *\t\t|\n *\t\tP\n *\n * We then split off at most 2 parallel dimensions from the T band and\n * at most 3 parallel dimensions from the P band\n *\n *\t\tK\n *\t\t|\n *\t\tT1\n *\t\t|\n *\t\tT2\n *\t\t|\n *\t\tP1\n *\t\t|\n *\t\tP2\n *\n * A filter is introduced in front of T1 that maps the domain instances\n * to block identifiers.  Similarly, a filter is introduced in front of P1\n * that maps the domain instances to thread identifiers.\n *\n * For each iteration of the T2 band and for each array, we compute\n * the array elements accessed by that iteration, construct a rectangular\n * box around it and shift it to the origin.  
The result is used\n * as shared memory for the array.\n *\n * Copying and synchronization statements are added to this schedule tree.\n * In principle, these are added in front of the P1 band, but some of\n * them may get hoisted up to higher levels.\n *\n * The entire AST is then generated from the single resulting schedule tree.\n * During the generation the subtrees at kernel nodes (K) are saved\n * aside and replaced by kernel calls.  The result is printed as host code\n * while the saved subtrees are printed as device code.\n */\nstatic __isl_give isl_printer *generate(__isl_take isl_printer *p,\n\tstruct gpu_gen *gen, struct ppcg_scop *scop,\n\tstruct ppcg_options *options)\n{\n\tstruct gpu_prog *prog;\n\tisl_ctx *ctx;\n\tisl_schedule *schedule;\n\tisl_bool any_permutable;\n\n\tif (!scop)\n\t\treturn isl_printer_free(p);\n\n\tctx = isl_printer_get_ctx(p);\n\tprog = gpu_prog_alloc(ctx, scop);\n\tif (!prog)\n\t\treturn isl_printer_free(p);\n\n\tgen->prog = prog;\n\tschedule = get_schedule(gen);\n\n\tany_permutable = has_any_permutable_node(schedule);\n\tif (any_permutable < 0 || !any_permutable) {\n\t\tif (any_permutable < 0)\n\t\t\tp = isl_printer_free(p);\n\t\telse\n\t\t\tp = print_cpu(p, scop, options);\n\t\tisl_schedule_free(schedule);\n\t} else {\n\t\tschedule = map_to_device(gen, schedule);\n\t\tgen->tree = generate_code(gen, schedule);\n\t\tp = ppcg_set_macro_names(p);\n\t\tp = ppcg_print_exposed_declarations(p, prog->scop);\n\t\tp = gen->print(p, gen->prog, gen->tree, &gen->types,\n\t\t\t\t    gen->print_user);\n\t\tisl_ast_node_free(gen->tree);\n\t}\n\n\tgpu_prog_free(prog);\n\n\treturn p;\n}\n\n/* Wrapper around generate for use as a ppcg_transform callback.\n */\nstatic __isl_give isl_printer *generate_wrap(__isl_take isl_printer *p,\n\tstruct ppcg_scop *scop, void *user)\n{\n\tstruct gpu_gen *gen = user;\n\n\treturn generate(p, gen, scop, gen->options);\n}\n\n/* Transform the code in the file called \"input\" by replacing\n * all scops by 
corresponding GPU code and write the results to \"out\".\n */\nint generate_gpu(isl_ctx *ctx, const char *input, FILE *out,\n\tstruct ppcg_options *options,\n\t__isl_give isl_printer *(*print)(__isl_take isl_printer *p,\n\t\tstruct gpu_prog *prog, __isl_keep isl_ast_node *tree,\n\t\tstruct gpu_types *types, void *user), void *user)\n{\n\tstruct gpu_gen gen;\n\tint r;\n\tint i;\n\n\tgen.ctx = ctx;\n\tgen.sizes = extract_sizes_from_str(ctx, options->sizes);\n\tgen.options = options;\n\tgen.kernel_id = 0;\n\tgen.print = print;\n\tgen.print_user = user;\n\tgen.types.n = 0;\n\tgen.types.name = NULL;\n\n\tif (options->debug->dump_sizes) {\n\t\tisl_space *space = isl_space_params_alloc(ctx, 0);\n\t\tgen.used_sizes = isl_union_map_empty(space);\n\t}\n\n\tr = ppcg_transform(ctx, input, out, options, &generate_wrap, &gen);\n\n\tif (options->debug->dump_sizes) {\n\t\tisl_union_map_dump(gen.used_sizes);\n\t\tisl_union_map_free(gen.used_sizes);\n\t}\n\n\tisl_union_map_free(gen.sizes);\n\tfor (i = 0; i < gen.types.n; ++i)\n\t\tfree(gen.types.name[i]);\n\tfree(gen.types.name);\n\n\treturn r;\n}\n\n/* Compute the set of inner array elements that may have their values\n * preserved by \"prog\".  
In particular, collect the array elements of\n * arrays that are not local to \"prog\" and remove those elements that\n * are definitely killed or definitely written by \"prog\".\n */\nstatic __isl_give isl_union_set *compute_may_persist(struct gpu_prog *prog)\n{\n\tint i;\n\tisl_union_set *may_persist, *killed;\n\tisl_union_map *must_kill;\n\n\tmay_persist = isl_union_set_empty(isl_set_get_space(prog->context));\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tisl_set *extent;\n\n\t\tif (prog->array[i].local)\n\t\t\tcontinue;\n\n\t\textent = isl_set_copy(prog->array[i].extent);\n\t\tmay_persist = isl_union_set_add_set(may_persist, extent);\n\t}\n\n\tmay_persist = isl_union_set_intersect_params(may_persist,\n\t\t\t\t\t\tisl_set_copy(prog->context));\n\tmay_persist = isl_union_set_apply(may_persist,\n\t\t\t\t\tisl_union_map_copy(prog->to_inner));\n\tmust_kill = isl_union_map_copy(prog->tagged_must_kill);\n\tkilled = isl_union_map_range(must_kill);\n\tmust_kill = isl_union_map_copy(prog->must_write);\n\tkilled = isl_union_set_union(killed, isl_union_map_range(must_kill));\n\n\tmay_persist = isl_union_set_subtract(may_persist, killed);\n\treturn may_persist;\n}\n\nstruct gpu_prog *gpu_prog_alloc(isl_ctx *ctx, struct ppcg_scop *scop)\n{\n\tstruct gpu_prog *prog;\n\tisl_space *space;\n\tisl_map *id;\n\n\tif (!scop)\n\t\treturn NULL;\n\n\tprog = isl_calloc_type(ctx, struct gpu_prog);\n\tif (!prog)\n\t\treturn NULL;\n\n\tprog->ctx = ctx;\n\tprog->scop = scop;\n\tprog->context = isl_set_copy(scop->context);\n\tprog->n_stmts = scop->pet->n_stmt;\n\tprog->any_to_outer = pet_scop_compute_outer_to_any(scop->pet);\n\tprog->any_to_outer = isl_union_map_reverse(prog->any_to_outer);\n\tspace = isl_union_map_get_space(prog->any_to_outer);\n\tspace = isl_space_set_from_params(space);\n\tspace = isl_space_add_dims(space, isl_dim_set, 1);\n\tspace = isl_space_map_from_set(space);\n\tid = isl_map_identity(space);\n\tprog->any_to_outer = isl_union_map_add_map(prog->any_to_outer, 
id);\n\tprog->stmts = extract_stmts(ctx, scop, prog->any_to_outer);\n\tprog->read = isl_union_map_copy(scop->reads);\n\tprog->may_write = isl_union_map_copy(scop->may_writes);\n\tprog->must_write = isl_union_map_copy(scop->must_writes);\n\tprog->tagged_must_kill = isl_union_map_copy(scop->tagged_must_kills);\n\tprog->to_inner = pet_scop_compute_outer_to_inner(scop->pet);\n\tprog->to_outer = isl_union_map_copy(prog->to_inner);\n\tprog->to_outer = isl_union_map_reverse(prog->to_outer);\n\n\tif (!prog->stmts)\n\t\treturn gpu_prog_free(prog);\n\n\tif (collect_array_info(prog) < 0)\n\t\treturn gpu_prog_free(prog);\n\tprog->may_persist = compute_may_persist(prog);\n\n\treturn prog;\n}\n\nvoid *gpu_prog_free(struct gpu_prog *prog)\n{\n\tif (!prog)\n\t\treturn NULL;\n\tfree_array_info(prog);\n\tfree_stmts(prog->stmts, prog->n_stmts);\n\tisl_union_map_free(prog->any_to_outer);\n\tisl_union_map_free(prog->to_outer);\n\tisl_union_map_free(prog->to_inner);\n\tisl_union_map_free(prog->read);\n\tisl_union_map_free(prog->may_write);\n\tisl_union_map_free(prog->must_write);\n\tisl_union_map_free(prog->tagged_must_kill);\n\tisl_union_map_free(prog->array_order);\n\tisl_union_set_free(prog->may_persist);\n\tisl_set_free(prog->context);\n\tfree(prog);\n\treturn NULL;\n}\n"
  },
  {
    "path": "src/ppcg_files/gpu.h",
    "content": "#ifndef _GPU_H\n#define _GPU_H\n\n#include <isl/ast.h>\n#include <isl/id.h>\n#include <isl/id_to_ast_expr.h>\n\n#include <pet.h>\n\n#include \"ppcg.h\"\n#include \"ppcg_options.h\"\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\n\t/* An access to an outer array element or an iterator.\n * Accesses to iterators have an access relation that maps to an unnamed space.\n * An access may be both read and write.\n * If the access relation is empty, then the output dimension may\n * not be equal to the dimension of the corresponding array.\n */\n\tstruct gpu_stmt_access\n\t{\n\t\t/* Access reads elements */\n\t\tint read;\n\t\t/* Access writes elements */\n\t\tint write;\n\t\t/* All writes are definite writes. */\n\t\tint exact_write;\n\t\t/* Is a single, fixed element being accessed? */\n\t\tisl_bool fixed_element;\n\t\t/* The number of index expressions specified in the access. */\n\t\tint n_index;\n\n\t\t/* May access relation */\n\t\tisl_map *access;\n\t\t/* May access relation with as domain a mapping from iteration domain\n\t * to a reference identifier.\n\t */\n\t\tisl_map *tagged_access;\n\t\t/* The reference id of the corresponding pet_expr. */\n\t\tisl_id *ref_id;\n\n\t\tstruct gpu_stmt_access *next;\n\t};\n\n\t/* A representation of a user statement.\n * \"stmt\" points to the corresponding pet statement.\n * \"id\" is the identifier of the instance set of the statement.\n * \"accesses\" is a linked list of accesses performed by the statement.\n * If the statement has been killed, i.e., if it will not be scheduled,\n * then this linked list may be empty even if the actual statement does\n * perform accesses.\n */\n\tstruct gpu_stmt\n\t{\n\t\tisl_id *id;\n\t\tstruct pet_stmt *stmt;\n\n\t\tstruct gpu_stmt_access *accesses;\n\t};\n\n\t/* Represents an outer array possibly accessed by a gpu_prog.\n */\n\tstruct gpu_array_info\n\t{\n\t\t/* The array data space. */\n\t\tisl_space *space;\n\t\t/* Element type. */\n\t\tchar *type;\n\t\t/* Element size. 
*/\n\t\tint size;\n\t\t/* Name of the array. */\n\t\tchar *name;\n\t\t/* Declared extent of original array. */\n\t\tisl_set *declared_extent;\n\t\t/* AST expression for declared size of original array. */\n\t\tisl_ast_expr *declared_size;\n\t\t/* Extent of the array that needs to be copied. */\n\t\tisl_set *extent;\n\t\t/* Number of indices. */\n\t\tunsigned n_index;\n\t\t/* For each index, a bound on \"extent\" in that direction. */\n\t\tisl_multi_pw_aff *bound;\n\t\t/* The corresponding access AST expression, if the array needs\n\t * to be allocated on the device.\n\t */\n\t\tisl_ast_expr *bound_expr;\n\n\t\t/* All references to this array; point to elements of a linked list. */\n\t\tint n_ref;\n\t\tstruct gpu_stmt_access **refs;\n\n\t\t/* Is this array accessed at all by the program? */\n\t\tint accessed;\n\n\t\t/* Is this a scalar that is read-only within the entire program? */\n\t\tint read_only_scalar;\n\n\t\t/* Are the elements of the array structures? */\n\t\tint has_compound_element;\n\n\t\t/* Are the elements only accessed through constant index expressions? */\n\t\tint only_fixed_element;\n\n\t\t/* Is the array local to the scop? */\n\t\tint local;\n\t\t/* Is the array local and should it be declared on the host? */\n\t\tint declare_local;\n\n\t\t/* Is the corresponding global device memory accessed in any way? */\n\t\tint global;\n\n\t\t/* Should the array be linearized? 
*/\n\t\tint linearize;\n\n\t\t/* Order dependences on this array.\n\t * Only used if live_range_reordering option is set.\n\t * It is set to NULL otherwise.\n\t */\n\t\tisl_union_map *dep_order;\n\t};\n\n\t/* Represents an outer array accessed by a ppcg_kernel, localized\n * to the context of this kernel.\n *\n * \"array\" points to the corresponding array in the gpu_prog.\n * The \"n_group\" \"groups\" are the reference groups associated to the array.\n * If \"force_private\" is set, then the array (in practice a scalar)\n * must be mapped to a register.\n * \"global\" is set if the global device memory corresponding\n * to this array is accessed by the kernel.\n * \"bound\" is equal to array->bound specialized to the current kernel.\n * \"bound_expr\" is the corresponding access AST expression.\n */\n\tstruct gpu_local_array_info\n\t{\n\t\tstruct gpu_array_info *array;\n\n\t\tint n_group;\n\t\tstruct gpu_array_ref_group **groups;\n\n\t\tint force_private;\n\t\tint global;\n\n\t\tunsigned n_index;\n\t\tisl_multi_pw_aff *bound;\n\t\tisl_ast_expr *bound_expr;\n\t};\n\n\t__isl_give isl_ast_expr *gpu_local_array_info_linearize_index(\n\t\t\tstruct gpu_local_array_info *array, __isl_take isl_ast_expr *expr);\n\n\t/* A sequence of \"n\" names of types.\n */\n\tstruct gpu_types\n\t{\n\t\tint n;\n\t\tchar **name;\n\t};\n\n\t/* \"read\" and \"write\" contain the original access relations, possibly\n * involving member accesses.\n *\n * The elements of \"array\", as well as the ranges of \"copy_in\" and \"copy_out\"\n * only refer to the outer arrays of any possible member accesses.\n */\n\tstruct gpu_prog\n\t{\n\t\tisl_ctx *ctx;\n\n\t\tstruct ppcg_scop *scop;\n\n\t\t/* Set of parameter values */\n\t\tisl_set *context;\n\n\t\t/* All potential read accesses in the entire program */\n\t\tisl_union_map *read;\n\n\t\t/* All potential write accesses in the entire program */\n\t\tisl_union_map *may_write;\n\t\t/* All definite write accesses in the entire program 
*/\n\t\tisl_union_map *must_write;\n\t\t/* All tagged definite kills in the entire program */\n\t\tisl_union_map *tagged_must_kill;\n\n\t\t/* The set of inner array elements that may be preserved. */\n\t\tisl_union_set *may_persist;\n\n\t\t/* A mapping from all innermost arrays to their outer arrays. */\n\t\tisl_union_map *to_outer;\n\t\t/* A mapping from the outer arrays to all corresponding inner arrays. */\n\t\tisl_union_map *to_inner;\n\t\t/* A mapping from all intermediate arrays to their outer arrays,\n\t * including an identity mapping from the anonymous 1D space to itself.\n\t */\n\t\tisl_union_map *any_to_outer;\n\n\t\t/* Order dependences on non-scalars. */\n\t\tisl_union_map *array_order;\n\n\t\t/* Array of statements */\n\t\tint n_stmts;\n\t\tstruct gpu_stmt *stmts;\n\n\t\tint n_array;\n\t\tstruct gpu_array_info *array;\n\t};\n\n\tstruct gpu_gen\n\t{\n\t\tisl_ctx *ctx;\n\t\tstruct ppcg_options *options;\n\n\t\t/* Callback for printing of AST in appropriate format. */\n\t\t__isl_give isl_printer *(*print)(__isl_take isl_printer *p,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t struct gpu_prog *prog, __isl_keep isl_ast_node *tree,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t struct gpu_types *types, void *user);\n\t\tvoid *print_user;\n\n\t\tstruct gpu_prog *prog;\n\t\t/* The generated AST. */\n\t\tisl_ast_node *tree;\n\n\t\t/* The sequence of types for which a definition has been printed. */\n\t\tstruct gpu_types types;\n\n\t\t/* User specified tile, grid and block sizes for each kernel */\n\t\tisl_union_map *sizes;\n\n\t\t/* Effectively used tile, grid and block sizes for each kernel */\n\t\tisl_union_map *used_sizes;\n\n\t\t/* Identifier of the next kernel. 
*/\n\t\tint kernel_id;\n\t};\n\n\tenum ppcg_group_access_type\n\t{\n\t\tppcg_access_global,\n\t\tppcg_access_shared,\n\t\tppcg_access_private\n\t};\n\n\tenum ppcg_kernel_stmt_type\n\t{\n\t\tppcg_kernel_copy,\n\t\tppcg_kernel_domain,\n\t\tppcg_kernel_sync\n\t};\n\n\t/* Representation of special statements, in particular copy statements\n * and __syncthreads statements, inside a kernel.\n *\n * type represents the kind of statement.\n *\n *\n * for ppcg_kernel_copy statements we have\n *\n * read is set if the statement should copy data from global memory\n * to shared memory or registers.\n *\n * index expresses an access to the array element that needs to be copied\n * local_index expresses the corresponding element in the tile\n *\n * array refers to the original array being copied\n * local_array is a pointer to the appropriate element in the \"array\"\n *\tarray of the ppcg_kernel to which this copy access belongs\n *\n *\n * for ppcg_kernel_domain statements we have\n *\n * stmt is the corresponding input statement\n *\n * ref2expr maps the reference identifiers of the accesses in stmt\n * to the corresponding AST expressions\n */\n\tstruct ppcg_kernel_stmt\n\t{\n\t\tenum ppcg_kernel_stmt_type type;\n\n\t\tunion {\n\t\t\tstruct\n\t\t\t{\n\t\t\t\tint read;\n\t\t\t\tisl_ast_expr *index;\n\t\t\t\tisl_ast_expr *local_index;\n\t\t\t\tstruct gpu_array_info *array;\n\t\t\t\tstruct gpu_local_array_info *local_array;\n\t\t\t} c;\n\t\t\tstruct\n\t\t\t{\n\t\t\t\tstruct gpu_stmt *stmt;\n\t\t\t\tisl_id_to_ast_expr *ref2expr;\n\t\t\t} d;\n\t\t} u;\n\t};\n\n\t/* Representation of a local variable in a kernel.\n */\n\tstruct ppcg_kernel_var\n\t{\n\t\tstruct gpu_array_info *array;\n\t\tenum ppcg_group_access_type type;\n\t\tchar *name;\n\t\tisl_vec *size;\n\t};\n\n\t/* Representation of a kernel.\n *\n * prog describes the original code from which the kernel is extracted.\n *\n * id is the sequence number of the kernel.\n *\n * block_ids contains the list of block identifiers 
for this kernel.\n * thread_ids contains the list of thread identifiers for this kernel.\n *\n * the first n_grid elements of grid_dim represent the specified size\n * of the grid.\n * the first n_block elements of block_dim represent the specified or\n * effective size of the block.\n * Note that in the input file, the sizes of the grid and the blocks\n * are specified in the order x, y, z, but internally, the sizes\n * are stored in reverse order, so that the last element always\n * refers to the x dimension.\n *\n * grid_size reflects the effective grid size.\n * grid_size_expr contains a corresponding access AST expression, built within\n * the context where the launch appears.\n *\n * context contains the values of the parameters and outer schedule dimensions\n * for which any statement instance in this kernel needs to be executed.\n *\n * n_sync is the number of synchronization operations that have\n * been introduced in the schedule tree corresponding to this kernel (so far).\n *\n * core contains the spaces of the statement domains that form\n * the core computation of the kernel.  It is used to navigate\n * the tree during the construction of the device part of the schedule\n * tree in gpu_create_kernel.\n *\n * expanded_domain contains the original statement instances,\n * i.e., those that appear in the domains of access relations,\n * that are involved in the kernel.\n * contraction maps those original statement instances to\n * the statement instances that are active at the point\n * in the schedule tree where the kernel is created.\n *\n * arrays is the set of possibly accessed outer array elements.\n *\n * space is the schedule space of the AST context.  
That is, it represents\n * the loops of the generated host code containing the kernel launch.\n *\n * n_array is the total number of arrays in the input program and also\n * the number of elements in the \"array\" array.\n * array contains information about each array that is local\n * to the current kernel.  If an array is not used in a kernel,\n * then the corresponding entry does not contain any information.\n *\n * any_force_private is set if any array in the kernel is marked force_private.\n *\n * block_filter contains constraints on the domain elements in the kernel\n * that encode the mapping to block identifiers, where the block identifiers\n * are represented by \"n_grid\" parameters with as names the elements\n * of \"block_ids\".\n *\n * thread_filter contains constraints on the domain elements in the kernel\n * that encode the mapping to thread identifiers, where the thread identifiers\n * are represented by \"n_block\" parameters with as names the elements\n * of \"thread_ids\".\n *\n * copy_schedule corresponds to the schedule dimensions of\n * the (tiled) schedule for this kernel that have been taken into account\n * for computing private/shared memory tiles.\n * The domain corresponds to the original statement instances, i.e.,\n * those that appear in the leaves of the schedule tree.\n * copy_schedule_dim is the dimension of this schedule.\n *\n * sync_writes contains write references that require synchronization.\n * Each reference is represented by a universe set in a space [S[i,j] -> R[]]\n * with S[i,j] the statement instance space and R[] the array reference.\n */\n\tstruct ppcg_kernel\n\t{\n\t\tisl_ctx *ctx;\n\t\tstruct ppcg_options *options;\n\n\t\tstruct gpu_prog *prog;\n\n\t\tint id;\n\n\t\tisl_id_list *block_ids;\n\t\tisl_id_list *thread_ids;\n\n\t\tint n_grid;\n\t\tint n_block;\n\t\tint grid_dim[2];\n\t\tint block_dim[3];\n\n\t\tisl_multi_pw_aff *grid_size;\n\t\tisl_ast_expr *grid_size_expr;\n\t\tisl_set *context;\n\n\t\tint 
n_sync;\n\t\tisl_union_set *core;\n\t\tisl_union_set *arrays;\n\n\t\tisl_union_pw_multi_aff *contraction;\n\t\tisl_union_set *expanded_domain;\n\n\t\tisl_space *space;\n\n\t\tint n_array;\n\t\tstruct gpu_local_array_info *array;\n\n\t\tint n_var;\n\t\tstruct ppcg_kernel_var *var;\n\n\t\tint any_force_private;\n\n\t\tisl_union_set *block_filter;\n\t\tisl_union_set *thread_filter;\n\t\tisl_union_pw_multi_aff *copy_schedule;\n\t\tint copy_schedule_dim;\n\n\t\tisl_union_set *sync_writes;\n\n\t\tisl_ast_node *tree;\n\t};\n\n\tint gpu_array_is_scalar(struct gpu_array_info *array);\n\tint gpu_array_is_read_only_scalar(struct gpu_array_info *array);\n\tint gpu_array_requires_device_allocation(struct gpu_array_info *array);\n\t__isl_give isl_set *gpu_array_positive_size_guard(struct gpu_array_info *array);\n\tisl_bool gpu_array_can_be_private(struct gpu_array_info *array);\n\n\tstruct gpu_prog *gpu_prog_alloc(isl_ctx *ctx, struct ppcg_scop *scop);\n\tvoid *gpu_prog_free(struct gpu_prog *prog);\n\n\tint ppcg_kernel_requires_array_argument(struct ppcg_kernel *kernel, int i);\n\n\tint generate_gpu(isl_ctx *ctx, const char *input, FILE *out,\n\t\t\t\t\t\t\t\t\t struct ppcg_options *options,\n\t\t\t\t\t\t\t\t\t __isl_give isl_printer *(*print)(__isl_take isl_printer *p,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tstruct gpu_prog *prog, __isl_keep isl_ast_node *tree,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tstruct gpu_types *types, void *user),\n\t\t\t\t\t\t\t\t\t void *user);\n\n\t__isl_give isl_schedule_node *gpu_create_kernel(struct gpu_gen *gen,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t__isl_take isl_schedule_node *node, int scale,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t__isl_keep isl_multi_val *sizes);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "src/ppcg_files/gpu_array_tile.c",
    "content": "#include <isl/aff.h>\n#include <isl/map.h>\n\n#include \"gpu_array_tile.h\"\n\nstruct gpu_array_tile *gpu_array_tile_free(struct gpu_array_tile *tile)\n{\n\tint j;\n\n\tif (!tile)\n\t\treturn NULL;\n\n\tfor (j = 0; j < tile->n; ++j) {\n\t\tisl_val_free(tile->bound[j].size);\n\t\tisl_val_free(tile->bound[j].stride);\n\t\tisl_aff_free(tile->bound[j].lb);\n\t\tisl_aff_free(tile->bound[j].shift);\n\t}\n\tfree(tile->bound);\n\tisl_multi_aff_free(tile->tiling);\n\tfree(tile);\n\n\treturn NULL;\n}\n\n/* Create a gpu_array_tile for an array of dimension \"n_index\".\n */\nstruct gpu_array_tile *gpu_array_tile_create(isl_ctx *ctx, int n_index)\n{\n\tint i;\n\tstruct gpu_array_tile *tile;\n\n\ttile = isl_calloc_type(ctx, struct gpu_array_tile);\n\tif (!tile)\n\t\treturn NULL;\n\n\ttile->ctx = ctx;\n\ttile->bound = isl_alloc_array(ctx, struct gpu_array_bound, n_index);\n\tif (!tile->bound)\n\t\treturn gpu_array_tile_free(tile);\n\n\ttile->n = n_index;\n\n\tfor (i = 0; i < n_index; ++i) {\n\t\ttile->bound[i].size = NULL;\n\t\ttile->bound[i].lb = NULL;\n\t\ttile->bound[i].stride = NULL;\n\t\ttile->bound[i].shift = NULL;\n\t}\n\n\treturn tile;\n}\n\n/* Compute the size of the tile specified by \"tile\"\n * in number of elements and return the result.\n */\n__isl_give isl_val *gpu_array_tile_size(struct gpu_array_tile *tile)\n{\n\tint i;\n\tisl_val *size;\n\n\tif (!tile)\n\t\treturn NULL;\n\n\tsize = isl_val_one(tile->ctx);\n\n\tfor (i = 0; i < tile->n; ++i)\n\t\tsize = isl_val_mul(size, isl_val_copy(tile->bound[i].size));\n\n\treturn size;\n}\n"
  },
  {
    "path": "src/ppcg_files/gpu_array_tile.h",
    "content": "#ifndef GPU_ARRAY_TILE_H\n#define GPU_ARRAY_TILE_H\n\n#include <isl/aff_type.h>\n#include <isl/map_type.h>\n#include <isl/val.h>\n\n/* The current index is such that if you add \"shift\",\n * then the result is always a multiple of \"stride\",\n * where \"stride\" may be equal to 1.\n * Let D represent the initial tile->depth dimensions of the computed schedule.\n * The spaces of \"lb\" and \"shift\" are of the form\n *\n *\tD -> [b]\n */\nstruct gpu_array_bound\n{\n\tisl_val *size;\n\tisl_aff *lb;\n\n\tisl_val *stride;\n\tisl_aff *shift;\n};\n\n/* A tile of an outer array.\n *\n * requires_unroll is set if the schedule dimensions that are mapped\n * to threads need to be unrolled for this (private) tile to be used.\n *\n * \"depth\" reflects the number of schedule dimensions that affect the tile.\n * The copying into and/or out of the tile is performed at that depth.\n *\n * n is the dimension of the array.\n * bound is an array of size \"n\" representing the lower bound\n *\tand size for each index.\n *\n * tiling maps a tile in the global array to the corresponding\n * shared/private memory tile and is of the form\n *\n *\t{ [D[i] -> A[a]] -> T[(a + shift(i))/stride - lb(i)] }\n *\n * where D represents the initial \"depth\" dimensions\n * of the computed schedule.\n */\nstruct gpu_array_tile\n{\n\tisl_ctx *ctx;\n\tint requires_unroll;\n\tint depth;\n\tint n;\n\tstruct gpu_array_bound *bound;\n\tisl_multi_aff *tiling;\n};\n\nstruct gpu_array_tile *gpu_array_tile_create(isl_ctx *ctx, int n_index);\nstruct gpu_array_tile *gpu_array_tile_free(struct gpu_array_tile *tile);\n\n__isl_give isl_val *gpu_array_tile_size(struct gpu_array_tile *tile);\n\n#endif\n"
  },
  {
    "path": "src/ppcg_files/gpu_group.c",
    "content": "/*\n * Copyright 2010-2011 INRIA Saclay\n * Copyright 2012-2014 Ecole Normale Superieure\n * Copyright 2015      Sven Verdoolaege\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege, INRIA Saclay - Ile-de-France,\n * Parc Club Orsay Universite, ZAC des vignes, 4 rue Jacques Monod,\n * 91893 Orsay, France\n * and Ecole Normale Superieure, 45 rue d'Ulm, 75230 Paris, France\n */\n\n#include <isl/aff.h>\n#include <isl/map.h>\n#include <isl/constraint.h>\n\n#include \"gpu_array_tile.h\"\n#include \"gpu_group.h\"\n#include \"gpu_tree.h\"\n#include \"schedule.h\"\n\n/* Print the name of the local copy of a given group of array references.\n */\n__isl_give isl_printer *gpu_array_ref_group_print_name(\n\tstruct gpu_array_ref_group *group, __isl_take isl_printer *p)\n{\n\tint global = 0;\n\tenum ppcg_group_access_type type;\n\n\ttype = gpu_array_ref_group_type(group);\n\tif (type == ppcg_access_private)\n\t\tp = isl_printer_print_str(p, \"private_\");\n\telse if (type == ppcg_access_shared)\n\t\tp = isl_printer_print_str(p, \"shared_\");\n\telse\n\t\tglobal = 1;\n\tp = isl_printer_print_str(p, group->array->name);\n\tif (!global && group->local_array->n_group > 1) {\n\t\tp = isl_printer_print_str(p, \"_\");\n\t\tp = isl_printer_print_int(p, group->nr);\n\t}\n\n\treturn p;\n}\n\n/* Return the union of all read (read = 1) and/or write (write = 1)\n * access relations in the group.\n */\n__isl_give isl_union_map *gpu_array_ref_group_access_relation(\n\tstruct gpu_array_ref_group *group, int read, int write)\n{\n\tint i;\n\tisl_union_map *access;\n\n\taccess = isl_union_map_empty(isl_map_get_space(group->access));\n\tfor (i = 0; i < group->n_ref; ++i) {\n\t\tisl_map *map_i;\n\n\t\tif (!((read && group->refs[i]->read) ||\n\t\t     (write && group->refs[i]->write)))\n\t\t\tcontinue;\n\t\tmap_i = isl_map_copy(group->refs[i]->access);\n\t\taccess = isl_union_map_union(access,\n\t\t\t\t\t    
isl_union_map_from_map(map_i));\n\t}\n\n\treturn access;\n}\n\n/* Should this array reference group be mapped to private, shared or global\n * memory?\n * If we have computed both a private and a shared tile, then\n * the tile with the smallest depth is used.  If both have the same depth,\n * then the private tile is used.\n */\nenum ppcg_group_access_type gpu_array_ref_group_type(\n\tstruct gpu_array_ref_group *group)\n{\n\tif (group->private_tile && group->shared_tile &&\n\t    group->shared_tile->depth < group->private_tile->depth)\n\t\treturn ppcg_access_shared;\n\tif (group->private_tile)\n\t\treturn ppcg_access_private;\n\tif (group->shared_tile)\n\t\treturn ppcg_access_shared;\n\treturn ppcg_access_global;\n}\n\n\n/* Return the effective gpu_array_tile associated to \"group\" or\n * NULL if there is no such gpu_array_tile.\n */\nstruct gpu_array_tile *gpu_array_ref_group_tile(\n\tstruct gpu_array_ref_group *group)\n{\n\tswitch (gpu_array_ref_group_type(group)) {\n\tcase ppcg_access_global:\n\t\treturn NULL;\n\tcase ppcg_access_shared:\n\t\treturn group->shared_tile;\n\tcase ppcg_access_private:\n\t\treturn group->private_tile;\n\t}\n}\n\n/* Does the tile associated to \"group\" require unrolling of the schedule\n * dimensions mapped to threads?\n * Note that this can only happen for private tiles.\n */\nint gpu_array_ref_group_requires_unroll(struct gpu_array_ref_group *group)\n{\n\tstruct gpu_array_tile *tile;\n\n\ttile = gpu_array_ref_group_tile(group);\n\tif (!tile)\n\t\treturn 0;\n\treturn tile->requires_unroll;\n}\n\n/* Given an array access \"access\", check if for any index i there is\n * a shift a(p) and a stride g such that\n *\n *\ta(p) + i = 0 mod g\n *\n * If so, record the information in tile->bound[i]->stride and\n * tile->bound[i]->shift.\n * Otherwise, set tile->bound[i]->stride to 1 (and tile->bound[i]->shift to 0).\n * Return isl_bool_true if any non-trivial stride was found.\n *\n * Note that the stride info returned by 
isl_map_get_range_stride_info\n * is of the form\n *\n *\ti = o(p) + g n\n *\n * a(p) can therefore be taken to be equal to -o(p).\n */\nstatic isl_bool detect_strides(struct gpu_array_tile *tile,\n\t__isl_keep isl_map *access)\n{\n\tint i;\n\tisl_bool has_strides = isl_bool_false;\n\n\tfor (i = 0; i < tile->n; ++i) {\n\t\tstruct gpu_array_bound *bound = &tile->bound[i];\n\t\tisl_stride_info *si;\n\n\t\tsi = isl_map_get_range_stride_info(access, i);\n\t\tbound->stride = isl_stride_info_get_stride(si);\n\t\tbound->shift = isl_aff_neg(isl_stride_info_get_offset(si));\n\t\tisl_stride_info_free(si);\n\n\t\tif (!has_strides)\n\t\t\thas_strides = isl_val_gt_si(bound->stride, 1);\n\t\tif (has_strides < 0)\n\t\t\treturn isl_bool_error;\n\t}\n\n\treturn has_strides;\n}\n\n/* Given an array access \"access\", remove the strides based\n * on the information in tile->bound[i]->stride and tile->bound[i]->shift.\n *\n * In particular let the access be A[a] and\n * let the shifts s_i(p) and the strides g_i be such that\n *\n *  S(p) + a = 0 mod G\n *\n * Replace the access by\n *\n *  A[(a + S(p))/G]\n *\n * First collect the shifts s_i into an isl_multi_aff and\n * the strides into the scaling function A[i] -> A[G i].\n * Then add the shifts to the original access and\n * take the preimage over the scaling.\n */\nstatic __isl_give isl_map *remove_strides(__isl_take isl_map *access,\n\tstruct gpu_array_tile *tile)\n{\n\tint i;\n\tisl_space *space;\n\tisl_multi_aff *shift, *scale;\n\tisl_multi_val *stride;\n\n\tspace = isl_map_get_space(access);\n\tshift = isl_multi_aff_zero(isl_space_copy(space));\n\tspace = isl_space_range(space);\n\tstride = isl_multi_val_zero(isl_space_copy(space));\n\tscale = isl_multi_aff_identity(isl_space_map_from_set(space));\n\tfor (i = 0; i < tile->n; ++i) {\n\t\tstruct gpu_array_bound *bound = &tile->bound[i];\n\t\tisl_aff *shift_i;\n\t\tisl_val *stride_i;\n\n\t\tshift_i = isl_aff_copy(bound->shift);\n\t\tstride_i = 
isl_val_copy(bound->stride);\n\t\tshift = isl_multi_aff_set_aff(shift, i, shift_i);\n\t\tstride = isl_multi_val_set_val(stride, i, stride_i);\n\t}\n\tscale = isl_multi_aff_scale_multi_val(scale, stride);\n\n\taccess = isl_map_sum(access, isl_map_from_multi_aff(shift));\n\taccess = isl_map_preimage_range_multi_aff(access, scale);\n\n\treturn access;\n}\n\n/* Check if we can find a memory tile for the given array\n * based on the given accesses, and if so, put the results in \"tile\".\n *\n * We project the accesses on each index in turn and look for a parametric\n * offset such that the size is constant, after removing\n * any stride that may appear in the accesses.\n *\n * tile->depth is initialized to the input dimension of the computed bounds.\n */\nstatic isl_bool can_tile(__isl_keep isl_map *access,\n\tstruct gpu_array_tile *tile)\n{\n\tint i;\n\tisl_bool has_strides, valid;\n\tisl_fixed_box *box;\n\tisl_multi_aff *offset;\n\tisl_multi_val *size;\n\n\tif (!tile)\n\t\treturn isl_bool_error;\n\n\tisl_map_free(isl_map_detect_equalities(isl_map_copy(access)));\n\n\thas_strides = detect_strides(tile, access);\n\tif (has_strides < 0)\n\t\treturn isl_bool_error;\n\n\ttile->depth = isl_map_dim(access, isl_dim_in);\n\n\taccess = isl_map_copy(access);\n\tif (has_strides)\n\t\taccess = remove_strides(access, tile);\n\n\tbox = isl_map_get_range_simple_fixed_box_hull(access);\n\tisl_map_free(access);\n\n\tvalid = isl_fixed_box_is_valid(box);\n\tif (valid >= 0 && valid) {\n\t\toffset = isl_fixed_box_get_offset(box);\n\t\tsize = isl_fixed_box_get_size(box);\n\t\tfor (i = 0; i < tile->n; ++i) {\n\t\t\ttile->bound[i].size = isl_multi_val_get_val(size, i);\n\t\t\ttile->bound[i].lb = isl_multi_aff_get_aff(offset, i);\n\t\t}\n\t\tisl_multi_aff_free(offset);\n\t\tisl_multi_val_free(size);\n\t}\n\tisl_fixed_box_free(box);\n\n\treturn valid;\n}\n\n/* Internal data structure for gpu_group_references.\n *\n * scop represents the input scop.\n * kernel_depth is the schedule depth where 
the kernel launch will\n * be introduced, i.e., it is the depth of the band that is mapped\n * to blocks.\n * shared_depth is the schedule depth at which the copying to/from\n * shared memory is computed.  The copy operation may then\n * later be hoisted to a higher level.\n * thread_depth is the schedule depth where the thread mark is located,\n * i.e., it is the depth of the band that is mapped to threads and also\n * the schedule depth at which the copying to/from private memory\n * is computed.  The copy operation may then later be hoisted to\n * a higher level.\n * n_thread is the number of schedule dimensions in the band that\n * is mapped to threads.\n * privatization lives in the range of thread_sched (i.e., it is\n * of dimension thread_depth + n_thread) and encodes the mapping\n * to thread identifiers (as parameters).\n * host_sched contains the kernel_depth dimensions of the host schedule.\n * shared_sched contains the first shared_depth dimensions of the\n * kernel schedule.\n * copy_sched contains the first thread_depth dimensions of the\n * kernel schedule.\n * thread_sched contains the first (thread_depth + n_thread) dimensions\n * of the kernel schedule.\n * full_sched is a union_map representation of the entire kernel schedule.\n * The schedules are all formulated in terms of the original statement\n * instances, i.e., those that appear in the domains of the access\n * relations.\n */\nstruct gpu_group_data {\n\tstruct ppcg_scop *scop;\n\tint kernel_depth;\n\tint shared_depth;\n\tint thread_depth;\n\tint n_thread;\n\tisl_set *privatization;\n\tisl_union_map *host_sched;\n\tisl_union_map *shared_sched;\n\tisl_union_map *copy_sched;\n\tisl_union_map *thread_sched;\n\tisl_union_map *full_sched;\n};\n\n/* Construct a map from domain_space to domain_space that increments\n * the dimension at position \"pos\" and leaves all other dimensions\n * constant.\n */\nstatic __isl_give isl_map *next(__isl_take isl_space *domain_space, int pos)\n{\n\tisl_space 
*space;\n\tisl_aff *aff;\n\tisl_multi_aff *next;\n\n\tspace = isl_space_map_from_set(domain_space);\n\tnext = isl_multi_aff_identity(space);\n\taff = isl_multi_aff_get_aff(next, pos);\n\taff = isl_aff_add_constant_si(aff, 1);\n\tnext = isl_multi_aff_set_aff(next, pos, aff);\n\n\treturn isl_map_from_multi_aff(next);\n}\n\n/* Check if the given access is coalesced (or if there is no point\n * in trying to coalesce the access by mapping the array to shared memory).\n * That is, check whether incrementing the dimension that will get\n * wrapped over the last thread index results in incrementing\n * the last array index.\n *\n * If no two consecutive array elements are ever accessed by \"access\",\n * then mapping the corresponding array to shared memory will not\n * improve coalescing.  In fact, the copying will likely be performed\n * by a single thread.  Consider the access as coalesced such that\n * the caller will not try and map the array to shared memory just\n * to improve coalescing.\n *\n * This function is only called for access relations without reuse and\n * kernels with at least one thread identifier.\n */\nstatic int access_is_coalesced(struct gpu_group_data *data,\n\t__isl_keep isl_union_map *access)\n{\n\tint dim;\n\tisl_space *space;\n\tisl_set *accessed;\n\tisl_map *access_map;\n\tisl_map *next_thread_x;\n\tisl_map *next_element;\n\tisl_map *map;\n\tint coalesced, empty;\n\n\taccess = isl_union_map_copy(access);\n\taccess = isl_union_map_apply_domain(access,\n\t\t\t\tisl_union_map_copy(data->full_sched));\n\taccess_map = isl_map_from_union_map(access);\n\n\tspace = isl_map_get_space(access_map);\n\tspace = isl_space_range(space);\n\tdim = isl_space_dim(space, isl_dim_set);\n\tif (dim == 0)\n\t\tnext_element = isl_map_empty(isl_space_map_from_set(space));\n\telse\n\t\tnext_element = next(space, dim - 1);\n\n\taccessed = isl_map_range(isl_map_copy(access_map));\n\tmap = isl_map_copy(next_element);\n\tmap = isl_map_intersect_domain(map, 
isl_set_copy(accessed));\n\tmap = isl_map_intersect_range(map, accessed);\n\tempty = isl_map_is_empty(map);\n\tisl_map_free(map);\n\n\tif (empty < 0 || empty) {\n\t\tisl_map_free(next_element);\n\t\tisl_map_free(access_map);\n\t\treturn empty;\n\t}\n\n\tspace = isl_map_get_space(access_map);\n\tspace = isl_space_domain(space);\n\tnext_thread_x = next(space, data->thread_depth + data->n_thread - 1);\n\n\tmap = isl_map_apply_domain(next_thread_x, isl_map_copy(access_map));\n\tmap = isl_map_apply_range(map, access_map);\n\n\tcoalesced = isl_map_is_subset(map, next_element);\n\n\tisl_map_free(next_element);\n\tisl_map_free(map);\n\n\treturn coalesced;\n}\n\n/* Replace the host schedule dimensions in the access relation \"access\"\n * by parameters, so that they are treated as fixed when checking for reuse\n * (within a kernel) or whether two consecutive elements are accessed\n * (within a kernel).\n */\nstatic __isl_give isl_union_map *localize_access(struct gpu_group_data *data,\n\t__isl_take isl_union_map *access)\n{\n\tint n;\n\tisl_space *space;\n\tisl_set *param;\n\tisl_union_map *umap;\n\tisl_id_list *ids;\n\n\tumap = isl_union_map_copy(data->host_sched);\n\tspace = isl_union_map_get_space(umap);\n\tn = data->kernel_depth;\n\tids = ppcg_scop_generate_names(data->scop, n, \"__ppcg_host_\");\n\tparam = parametrization(space, n, 0, ids);\n\tisl_id_list_free(ids);\n\tumap = isl_union_map_intersect_range(umap,\n\t\t\t\t\t\tisl_union_set_from_set(param));\n\taccess = isl_union_map_intersect_domain(access,\n\t\t\t\t\t\tisl_union_map_domain(umap));\n\n\treturn access;\n}\n\n/* Given an access relation in terms of at least data->thread_depth initial\n * dimensions of the computed schedule, check if it is bijective for\n * fixed values of the first data->thread_depth dimensions.\n * We perform this check by equating these dimensions to parameters.\n */\nstatic int access_is_bijective(struct gpu_group_data *data,\n\t__isl_keep isl_map *access)\n{\n\tint res;\n\tint 
dim;\n\tisl_set *par;\n\tisl_space *space;\n\tisl_id_list *ids;\n\n\taccess = isl_map_copy(access);\n\tspace = isl_space_params(isl_map_get_space(access));\n\tids = ppcg_scop_generate_names(data->scop, data->thread_depth, \"s\");\n\tdim = isl_map_dim(access, isl_dim_in);\n\tpar = parametrization(space, dim, 0, ids);\n\tisl_id_list_free(ids);\n\taccess = isl_map_intersect_domain(access, par);\n\tres = isl_map_is_bijective(access);\n\tisl_map_free(access);\n\n\treturn res;\n}\n\n/* Compute the number of outer schedule tile dimensions that affect\n * the offset of \"tile\".\n * If there is no such dimension, then return the index\n * of the first kernel dimension, i.e., data->kernel_depth.\n */\nstatic int compute_tile_depth(struct gpu_group_data *data,\n\tstruct gpu_array_tile *tile)\n{\n\tint i, j;\n\n\tfor (j = tile->depth - 1; j >= data->kernel_depth; --j) {\n\t\tfor (i = 0; i < tile->n; ++i) {\n\t\t\tisl_aff *lb;\n\t\t\tisl_aff *shift;\n\n\t\t\tlb = tile->bound[i].lb;\n\t\t\tif (isl_aff_involves_dims(lb, isl_dim_in, j, 1))\n\t\t\t\tbreak;\n\n\t\t\tshift = tile->bound[i].shift;\n\t\t\tif (!shift)\n\t\t\t\tcontinue;\n\t\t\tif (isl_aff_involves_dims(shift, isl_dim_in, j, 1))\n\t\t\t\tbreak;\n\t\t}\n\t\tif (i < tile->n)\n\t\t\tbreak;\n\t}\n\n\treturn ++j;\n}\n\n/* Return the lowest depth between data->kernel_depth and data->thread_depth\n * at which every array element accessed through \"acc\" is accessed\n * by a single thread.  
The input dimension of \"acc\" is\n * data->thread_depth + data->n_thread, where the final data->n_thread\n * dimensions are those that will be mapped to threads.\n * If the values for these dimensions are uniquely determined\n * by the array index and a given number of outer dimensions, then\n * there is only one thread accessing that array element within those\n * outer dimensions.\n *\n * The input space of \"acc\" is first split up, such that it has the form\n *\n *\t[O -> T] -> A\n *\n * with O the outer dimensions, T the dimensions that will be mapped to threads\n * and A the array index.\n *\n * Then the positions of T and A are interchanged to simplify the test\n * whether T uniquely depends on O and A.\n * In particular, the above access relation is first combined with\n *\n *\t[O -> T] -> T\n *\n * to form\n *\n *\t[O -> T] -> [A -> T]\n *\n * from which\n *\n *\tO -> [A -> T]\n *\n * is extracted, which is then uncurried to\n *\n *\t[O -> A] -> T\n *\n * Finally, the final dimensions of O are projected out one by one\n * until T is no longer uniquely determined by A and the remaining\n * dimensions in O.  
The value returned is that of the last dimension\n * that was successfully projected out.\n * Note that there is no need to test whether [O -> A] -> T itself\n * is single-valued as that was already tested in access_is_bijective.\n */\nstatic int compute_accessed_by_single_thread_depth(struct gpu_group_data *data,\n\t__isl_keep isl_map *acc)\n{\n\tint i;\n\tisl_space *space;\n\tisl_map *map;\n\tisl_bool sv;\n\n\tif (data->thread_depth == data->kernel_depth)\n\t\treturn data->thread_depth;\n\n\tacc = isl_map_copy(acc);\n\n\tspace = isl_map_get_space(acc);\n\tspace = isl_space_params(space);\n\tspace = isl_space_set_from_params(space);\n\tspace = isl_space_add_dims(space, isl_dim_set, data->thread_depth);\n\tspace = isl_space_from_domain(space);\n\tspace = isl_space_add_dims(space, isl_dim_out, data->n_thread);\n\tspace = isl_space_wrap(space);\n\tmap = isl_set_flatten_map(isl_set_universe(space));\n\tacc = isl_map_apply_range(map, acc);\n\n\tspace = isl_space_domain(isl_map_get_space(acc));\n\tmap = isl_map_range_map(isl_map_universe(isl_space_unwrap(space)));\n\tacc = isl_map_range_product(acc, map);\n\tacc = isl_map_domain_factor_domain(acc);\n\tacc = isl_map_uncurry(acc);\n\n\tfor (i = data->thread_depth - 1; i >= data->kernel_depth; --i) {\n\t\tacc = isl_map_project_out(acc, isl_dim_in, i, 1);\n\t\tsv = isl_map_is_single_valued(acc);\n\t\tif (sv < 0)\n\t\t\tgoto error;\n\t\tif (!sv)\n\t\t\tbreak;\n\t}\n\n\tisl_map_free(acc);\n\n\treturn ++i;\nerror:\n\tisl_map_free(acc);\n\treturn -1;\n}\n\n/* Adjust the fields of \"tile\" to reflect the new input dimension \"depth\".\n * The dimensions beyond \"depth\" are assumed not to affect the tile,\n * so they can simply be dropped.\n */\nstatic int tile_adjust_depth(struct gpu_array_tile *tile, int depth)\n{\n\tint i;\n\n\tif (tile->depth == depth)\n\t\treturn 0;\n\n\tfor (i = 0; i < tile->n; ++i) {\n\t\ttile->bound[i].lb = isl_aff_drop_dims(tile->bound[i].lb,\n\t\t\t\t\tisl_dim_in, depth, tile->depth - depth);\n\t\tif
(!tile->bound[i].lb)\n\t\t\treturn -1;\n\t\tif (!tile->bound[i].shift)\n\t\t\tcontinue;\n\t\ttile->bound[i].shift = isl_aff_drop_dims(tile->bound[i].shift,\n\t\t\t\t\tisl_dim_in, depth, tile->depth - depth);\n\t\tif (!tile->bound[i].shift)\n\t\t\treturn -1;\n\t}\n\n\ttile->depth = depth;\n\n\treturn 0;\n}\n\n/* Determine the number of schedule dimensions that affect the offset of the\n * shared or private tile \"tile\" and store the result in tile->depth, with\n * a lower bound of data->kernel_depth.\n * Also adjust the fields of the tile to only refer to the tile->depth\n * outer schedule dimensions.\n */\nstatic isl_stat tile_set_depth(struct gpu_group_data *data,\n\tstruct gpu_array_tile *tile)\n{\n\tif (tile_adjust_depth(tile, compute_tile_depth(data, tile)) < 0)\n\t\treturn isl_stat_error;\n\n\treturn isl_stat_ok;\n}\n\n/* Determine the number of schedule dimensions that affect the offset of the\n * shared tile and store the minimum of the private and shared tile depth\n * in group->min_depth, with a lower bound of data->kernel_depth.\n * If there is no tile defined on the array reference group,\n * then set group->min_depth to data->thread_depth.\n */\nstatic int set_depth(struct gpu_group_data *data,\n\tstruct gpu_array_ref_group *group)\n{\n\tgroup->min_depth = data->thread_depth;\n\n\tif (group->private_tile) {\n\t\tif (group->private_tile->depth < group->min_depth)\n\t\t\tgroup->min_depth = group->private_tile->depth;\n\t}\n\tif (group->shared_tile) {\n\t\tif (tile_set_depth(data, group->shared_tile) < 0)\n\t\t\treturn -1;\n\t\tif (group->shared_tile->depth < group->min_depth)\n\t\t\tgroup->min_depth = group->shared_tile->depth;\n\t}\n\n\treturn 0;\n}\n\n/* Fill up the groups array with singleton groups, i.e., one group\n * per reference, initializing the array, access, write, n_ref and refs fields.\n * In particular the access field is initialized to the scheduled\n * access relation of the array reference.\n *\n * Return the number of elements 
initialized, i.e., the number of\n * active references in the current kernel.\n */\nstatic int populate_array_references(struct gpu_local_array_info *local,\n\tstruct gpu_array_ref_group **groups, struct gpu_group_data *data)\n{\n\tint i;\n\tint n;\n\tisl_ctx *ctx = isl_union_map_get_ctx(data->copy_sched);\n\n\tn = 0;\n\tfor (i = 0; i < local->array->n_ref; ++i) {\n\t\tisl_union_map *umap;\n\t\tisl_map *map;\n\t\tstruct gpu_array_ref_group *group;\n\t\tstruct gpu_stmt_access *access = local->array->refs[i];\n\n\t\tmap = isl_map_copy(access->access);\n\t\tumap = isl_union_map_from_map(map);\n\t\tumap = isl_union_map_apply_domain(umap,\n\t\t\t\tisl_union_map_copy(data->copy_sched));\n\n\t\tif (isl_union_map_is_empty(umap)) {\n\t\t\tisl_union_map_free(umap);\n\t\t\tcontinue;\n\t\t}\n\n\t\tmap = isl_map_from_union_map(umap);\n\t\tmap = isl_map_detect_equalities(map);\n\n\t\tgroup = isl_calloc_type(ctx, struct gpu_array_ref_group);\n\t\tif (!group) {\n\t\t\tisl_map_free(map);\n\t\t\treturn -1;\n\t\t}\n\t\tgroup->local_array = local;\n\t\tgroup->array = local->array;\n\t\tgroup->access = map;\n\t\tgroup->write = access->write;\n\t\tgroup->exact_write = access->exact_write;\n\t\tgroup->slice = access->n_index < local->array->n_index;\n\t\tgroup->refs = &local->array->refs[i];\n\t\tgroup->n_ref = 1;\n\n\t\tgroups[n++] = group;\n\t}\n\n\treturn n;\n}\n\n/* If group->n_ref == 1, then group->refs was set by\n * populate_array_references to point directly into\n * group->array->refs and should not be freed.\n * If group->n_ref > 1, then group->refs was set by join_groups\n * to point to a newly allocated array.\n */\nstruct gpu_array_ref_group *gpu_array_ref_group_free(\n\tstruct gpu_array_ref_group *group)\n{\n\tif (!group)\n\t\treturn NULL;\n\tgpu_array_tile_free(group->shared_tile);\n\tgpu_array_tile_free(group->private_tile);\n\tisl_map_free(group->access);\n\tif (group->n_ref > 1)\n\t\tfree(group->refs);\n\tfree(group);\n\treturn NULL;\n}\n\n/* Check if the access 
relations of group1 and group2 overlap within\n * copy_sched.\n */\nstatic int accesses_overlap(struct gpu_array_ref_group *group1,\n\tstruct gpu_array_ref_group *group2)\n{\n\tint disjoint;\n\n\tdisjoint = isl_map_is_disjoint(group1->access, group2->access);\n\tif (disjoint < 0)\n\t\treturn -1;\n\n\treturn !disjoint;\n}\n\n/* Combine the given two groups into a single group, containing\n * the references of both groups.\n */\nstatic struct gpu_array_ref_group *join_groups(\n\tstruct gpu_array_ref_group *group1,\n\tstruct gpu_array_ref_group *group2)\n{\n\tint i;\n\tisl_ctx *ctx;\n\tstruct gpu_array_ref_group *group;\n\n\tif (!group1 || !group2)\n\t\treturn NULL;\n\n\tctx = isl_map_get_ctx(group1->access);\n\tgroup = isl_calloc_type(ctx, struct gpu_array_ref_group);\n\tif (!group)\n\t\treturn NULL;\n\tgroup->local_array = group1->local_array;\n\tgroup->array = group1->array;\n\tgroup->access = isl_map_union(isl_map_copy(group1->access),\n\t\t\t\t\tisl_map_copy(group2->access));\n\tgroup->write = group1->write || group2->write;\n\tgroup->exact_write = group1->exact_write && group2->exact_write;\n\tgroup->slice = group1->slice || group2->slice;\n\tgroup->n_ref = group1->n_ref + group2->n_ref;\n\tgroup->refs = isl_alloc_array(ctx, struct gpu_stmt_access *,\n\t\t\t\t\tgroup->n_ref);\n\tif (!group->refs)\n\t\treturn gpu_array_ref_group_free(group);\n\tfor (i = 0; i < group1->n_ref; ++i)\n\t\tgroup->refs[i] = group1->refs[i];\n\tfor (i = 0; i < group2->n_ref; ++i)\n\t\tgroup->refs[group1->n_ref + i] = group2->refs[i];\n\n\treturn group;\n}\n\n/* Combine the given two groups into a single group and free\n * the original two groups.\n */\nstatic struct gpu_array_ref_group *join_groups_and_free(\n\tstruct gpu_array_ref_group *group1,\n\tstruct gpu_array_ref_group *group2)\n{\n\tstruct gpu_array_ref_group *group;\n\n\tgroup = join_groups(group1, group2);\n\tgpu_array_ref_group_free(group1);\n\tgpu_array_ref_group_free(group2);\n\treturn group;\n}\n\n/* Report that the array 
reference group with the given access relation\n * is not mapped to shared memory in the given kernel because\n * it does not exhibit any reuse and is considered to be coalesced.\n */\nstatic void report_no_reuse_and_coalesced(struct ppcg_kernel *kernel,\n\t__isl_keep isl_union_map *access)\n{\n\tisl_ctx *ctx;\n\tisl_printer *p;\n\n\tctx = isl_union_map_get_ctx(access);\n\tp = isl_printer_to_file(ctx, stdout);\n\tp = isl_printer_print_str(p, \"Array reference group \");\n\tp = isl_printer_print_union_map(p, access);\n\tp = isl_printer_print_str(p,\n\t    \" not considered for mapping to shared memory in kernel \");\n\tp = isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p,\n\t    \" because it exhibits no reuse and is considered to be coalesced\");\n\tp = isl_printer_end_line(p);\n\tisl_printer_free(p);\n}\n\n/* Given an access relation in terms of the data->thread_depth initial\n * dimensions of the computed schedule and the thread identifiers\n * (as parameters), check if the use of the corresponding private tile\n * requires unrolling.\n *\n * If we are creating a private tile because we are forced to,\n * then no unrolling is required.\n * Otherwise we check if \"access\" is bijective and unrolling\n * is required if it is not.  Note that the access relation\n * has already been determined to be bijective before the introduction\n * of the thread identifiers and the removal of the schedule dimensions\n * that are mapped to these threads. 
If the access relation is no longer\n * bijective, then this means that more than one value of one of those\n * schedule dimensions is mapped to the same thread and therefore\n * unrolling is required.\n */\nstatic int check_requires_unroll(struct gpu_group_data *data,\n\t__isl_keep isl_map *access, int force_private)\n{\n\tint bijective;\n\n\tif (force_private)\n\t\treturn 0;\n\tbijective = access_is_bijective(data, access);\n\tif (bijective < 0)\n\t\treturn -1;\n\treturn !bijective;\n}\n\n/* Map the domain of \"access\" to the outer data->shared_depth\n * schedule dimensions.  When data->shared_depth is equal to\n * data->thread_depth, this result is already available in group->access.\n */\nstatic __isl_give isl_map *shared_access(struct gpu_array_ref_group *group,\n\t__isl_keep isl_union_map *access, struct gpu_group_data *data)\n{\n\tisl_union_map *shared;\n\n\tif (data->shared_depth == data->thread_depth)\n\t\treturn isl_map_copy(group->access);\n\n\tshared = isl_union_map_copy(access);\n\tshared = isl_union_map_apply_domain(shared,\n\t\t\tisl_union_map_copy(data->shared_sched));\n\treturn isl_map_from_union_map(shared);\n}\n\n/* Compute the private and/or shared memory tiles for the array\n * reference group \"group\" of array \"array\".\n * Return isl_stat_ok on success and isl_stat_error on error.\n *\n * If the array is a read-only scalar or if the user requested\n * not to use shared or private memory, then we do not need to do anything.\n *\n * If any reference in the reference group accesses more than one element,\n * then we would have to make sure that the layout in shared memory\n * is the same as that in global memory.  
Since we do not handle this yet\n * (and it may not even be possible), we refuse to map to private or\n * shared memory in such cases.\n *\n * If the array group involves any may writes (that are not must writes),\n * then we would have to make sure that we load the data into shared/private\n * memory first in case the data is not written by the kernel\n * (but still written back out to global memory).\n * Since we don't have any such mechanism at the moment, we don't\n * compute shared/private tiles for groups involving may writes.\n *\n * We only try to compute a shared memory tile if there is any reuse\n * or if the access is not coalesced.\n * Reuse and coalescing are checked within the given kernel.\n *\n * For computing a private memory tile, we also require that there is\n * some reuse.  Moreover, we require that the access is private\n * to the thread.  That is, we check that any given array element\n * is only accessed by a single thread.\n * We compute an access relation that maps the outer\n * data->thread_depth + data->n_thread schedule dimensions\n * to the accessed array elements.\n * The last data->n_thread of these dimensions will be mapped\n * to thread identifiers.\n * We actually check that those iterators that will be wrapped\n * partition the array space.  This check is stricter than necessary\n * since several iterations may be mapped onto the same thread\n * and then they could be allowed to access the same memory elements,\n * but our check does not allow this situation.\n *\n * For private memory tiles, the number of schedule dimensions that\n * affect the offset is computed and stored in tile->depth, with\n * a lower bound of data->kernel_depth. 
If this depth is smaller\n * than the minimal depth that still ensures that every element\n * is accessed by a single thread, then the depth is raised\n * to this minimal depth.\n * The fields of the tile are then adjusted to only refer to the tile->depth\n * outer schedule dimensions.\n *\n * We also check that the index expression only depends on parallel\n * loops.  That way, we can move those loops innermost and unroll them.\n * Again, we use a test that is stricter than necessary.\n * We actually check whether the index expression only depends\n * on the iterators that are wrapped over the threads.\n * These are necessarily parallel, but there may be more parallel loops.\n *\n * Combining the injectivity of the first test with the single-valuedness\n * of the second test, we simply test for bijectivity.\n *\n * If the use of the private tile requires unrolling, but some\n * of the other arrays are forcibly mapped to private memory,\n * then we do not allow the use of this private tile since\n * we cannot move the schedule dimensions that need to be unrolled down\n * without performing some kind of expansion on those arrays\n * that are forcibly mapped to private memory.\n *\n * If the array is marked force_private, then we bypass all checks\n * and assume we can (and should) use registers only.\n *\n * If it turns out we can (or have to) use registers, we compute\n * the private memory tile size using can_tile, after introducing a dependence\n * on the thread indices.\n */\nstatic isl_stat compute_group_bounds_core(struct ppcg_kernel *kernel,\n\tstruct gpu_array_ref_group *group, struct gpu_group_data *data)\n{\n\tisl_ctx *ctx = isl_space_get_ctx(group->array->space);\n\tisl_union_map *access, *local;\n\tint n_index = group->array->n_index;\n\tint no_reuse, coalesced;\n\tisl_map *acc;\n\tint force_private = group->local_array->force_private;\n\tint use_shared = !force_private && kernel->options->use_shared_memory &&\n\t\t\t\tdata->n_thread > 0;\n\tint 
use_private = force_private || kernel->options->use_private_memory;\n\tisl_stat r = isl_stat_ok;\n\tisl_bool ok;\n\tint requires_unroll;\n\tint unique_depth;\n\n\tif (!use_shared && !use_private)\n\t\treturn isl_stat_ok;\n\tif (gpu_array_is_read_only_scalar(group->array))\n\t\treturn isl_stat_ok;\n\tif (!force_private && !group->exact_write)\n\t\treturn isl_stat_ok;\n\tif (group->slice)\n\t\treturn isl_stat_ok;\n\n\taccess = gpu_array_ref_group_access_relation(group, 1, 1);\n\tlocal = localize_access(data, isl_union_map_copy(access));\n\tno_reuse = isl_union_map_is_injective(local);\n\tif (no_reuse < 0)\n\t\tr = isl_stat_error;\n\tif (use_shared && no_reuse)\n\t\tcoalesced = access_is_coalesced(data, local);\n\tisl_union_map_free(local);\n\n\tif (r >= 0 && kernel->options->debug->verbose &&\n\t    use_shared && no_reuse && coalesced)\n\t\treport_no_reuse_and_coalesced(kernel, access);\n\n\tif (use_shared && (!no_reuse || !coalesced)) {\n\t\tgroup->shared_tile = gpu_array_tile_create(ctx,\n\t\t\t\t\t\t\tgroup->array->n_index);\n\t\tacc = shared_access(group, access, data);\n\t\tok = can_tile(acc, group->shared_tile);\n\t\tif (ok < 0)\n\t\t\tr = isl_stat_error;\n\t\telse if (!ok)\n\t\t\tgroup->shared_tile =\n\t\t\t\t\tgpu_array_tile_free(group->shared_tile);\n\t\tisl_map_free(acc);\n\t}\n\n\tif (r < 0 || (!force_private && (!use_private || no_reuse))) {\n\t\tisl_union_map_free(access);\n\t\treturn r;\n\t}\n\n\taccess = isl_union_map_apply_domain(access,\n\t\t\t\t\tisl_union_map_copy(data->thread_sched));\n\n\tacc = isl_map_from_union_map(access);\n\n\tif (!force_private && !access_is_bijective(data, acc)) {\n\t\tisl_map_free(acc);\n\t\treturn isl_stat_ok;\n\t}\n\n\tunique_depth = compute_accessed_by_single_thread_depth(data, acc);\n\n\tacc = isl_map_intersect_domain(acc, isl_set_copy(data->privatization));\n\tacc = isl_map_project_out(acc, isl_dim_in, data->thread_depth,\n\t\t\t\t\t\t\t\tdata->n_thread);\n\trequires_unroll = check_requires_unroll(data, acc, 
force_private);\n\tif (unique_depth < 0 || requires_unroll < 0 ||\n\t    (requires_unroll && kernel->any_force_private)) {\n\t\tisl_map_free(acc);\n\t\treturn requires_unroll < 0 ? isl_stat_error : isl_stat_ok;\n\t}\n\n\tgroup->private_tile = gpu_array_tile_create(ctx, n_index);\n\tgroup->private_tile->requires_unroll = requires_unroll;\n\tok = can_tile(acc, group->private_tile);\n\tif (ok >= 0 && !ok)\n\t\tgroup->private_tile = gpu_array_tile_free(group->private_tile);\n\tisl_map_free(acc);\n\tif (ok < 0)\n\t\treturn isl_stat_error;\n\n\tif (group->private_tile) {\n\t\tstruct gpu_array_tile *tile = group->private_tile;\n\t\tint tile_depth = compute_tile_depth(data, tile);\n\t\tif (tile_depth < unique_depth)\n\t\t\ttile_depth = unique_depth;\n\t\tif (tile_adjust_depth(tile, tile_depth) < 0)\n\t\t\treturn isl_stat_error;\n\t}\n\n\tif (force_private && !group->private_tile)\n\t\tisl_die(ctx, isl_error_internal,\n\t\t\t\"unable to map array reference group to registers\",\n\t\t\treturn isl_stat_error);\n\n\treturn isl_stat_ok;\n}\n\n/* Compute the private and/or shared memory tiles for the array\n * reference group \"group\" of array \"array\" and set the tile depth.\n * Return 0 on success and -1 on error.\n */\nstatic int compute_group_bounds(struct ppcg_kernel *kernel,\n\tstruct gpu_array_ref_group *group, struct gpu_group_data *data)\n{\n\tif (!group)\n\t\treturn -1;\n\tif (compute_group_bounds_core(kernel, group, data) < 0)\n\t\treturn -1;\n\tif (set_depth(data, group) < 0)\n\t\treturn -1;\n\n\treturn 0;\n}\n\n/* If two groups have overlapping access relations (as determined by\n * the \"overlap\" function) and if one of them involves a write,\n * then merge the two groups into one.\n * If \"compute_bounds\" is set, then call compute_group_bounds\n * on the merged groups.\n * If any group is merged into the current group, then its access\n * relation may have changed or it may have been turned into a write.\n * The combined group might therefore overlap with 
groups that\n * the original group did not overlap with.  The groups therefore\n * need to be checked again.\n *\n * Return the updated number of groups.\n * Return -1 on error.\n */\nstatic int group_writes(struct ppcg_kernel *kernel,\n\tint n, struct gpu_array_ref_group **groups,\n\tint (*overlap)(struct gpu_array_ref_group *group1,\n\t\tstruct gpu_array_ref_group *group2), int compute_bounds,\n\tstruct gpu_group_data *data)\n{\n\tint i, j;\n\tint any_merge;\n\n\tfor (i = 0; i < n; i += !any_merge) {\n\t\tany_merge = 0;\n\t\tfor (j = n - 1; j > i; --j) {\n\t\t\tif (!groups[i]->write && !groups[j]->write)\n\t\t\t\tcontinue;\n\n\t\t\tif (!overlap(groups[i], groups[j]))\n\t\t\t\tcontinue;\n\n\t\t\tany_merge = 1;\n\t\t\tgroups[i] = join_groups_and_free(groups[i], groups[j]);\n\t\t\tif (j != n - 1)\n\t\t\t\tgroups[j] = groups[n - 1];\n\t\t\tgroups[n - 1] = NULL;\n\t\t\tn--;\n\n\t\t\tif (!groups[i])\n\t\t\t\treturn -1;\n\t\t\tif (compute_bounds &&\n\t\t\t    compute_group_bounds(kernel, groups[i], data) < 0)\n\t\t\t\treturn -1;\n\t\t}\n\t}\n\n\treturn n;\n}\n\n/* If two groups have overlapping access relations (within the innermost\n * loop) and if one of them involves a write, then merge the two groups\n * into one.\n *\n * Return the updated number of groups.\n */\nstatic int group_overlapping_writes(struct ppcg_kernel *kernel,\n\tint n, struct gpu_array_ref_group **groups,\n\tstruct gpu_group_data *data)\n{\n\treturn group_writes(kernel, n, groups, &accesses_overlap, 0, data);\n}\n\n/* Check if the access relations of group1 and group2 overlap within\n * the outermost min(group1->min_depth, group2->min_depth) loops.\n */\nstatic int depth_accesses_overlap(struct gpu_array_ref_group *group1,\n\tstruct gpu_array_ref_group *group2)\n{\n\tint depth;\n\tint dim;\n\tint empty;\n\tisl_map *map_i, *map_j, *map;\n\n\tdepth = group1->min_depth;\n\tif (group2->min_depth < depth)\n\t\tdepth = group2->min_depth;\n\tmap_i = isl_map_copy(group1->access);\n\tdim = 
isl_map_dim(map_i, isl_dim_in);\n\tmap_i = isl_map_eliminate(map_i, isl_dim_in, depth, dim - depth);\n\tmap_j = isl_map_copy(group2->access);\n\tmap_j = isl_map_eliminate(map_j, isl_dim_in, depth, dim - depth);\n\tmap = isl_map_intersect(map_i, map_j);\n\tempty = isl_map_is_empty(map);\n\tisl_map_free(map);\n\n\treturn !empty;\n}\n\n/* If two groups have overlapping access relations (within the outer\n * depth loops) and if one of them involves a write,\n * then merge the two groups into one.\n *\n * Return the updated number of groups.\n */\nstatic int group_depth_overlapping_writes(struct ppcg_kernel *kernel,\n\tint n, struct gpu_array_ref_group **groups, struct gpu_group_data *data)\n{\n\treturn group_writes(kernel, n, groups, &depth_accesses_overlap, 1,\n\t\t\t\tdata);\n}\n\n/* Is the size of the tile specified by \"tile\" smaller than the sum of\n * the sizes of the tiles specified by \"tile1\" and \"tile2\"?\n */\nstatic int smaller_tile(struct gpu_array_tile *tile,\n\tstruct gpu_array_tile *tile1, struct gpu_array_tile *tile2)\n{\n\tint smaller;\n\tisl_val *size, *size1, *size2;\n\n\tsize = gpu_array_tile_size(tile);\n\tsize1 = gpu_array_tile_size(tile1);\n\tsize2 = gpu_array_tile_size(tile2);\n\n\tsize = isl_val_sub(size, size1);\n\tsize = isl_val_sub(size, size2);\n\tsmaller = isl_val_is_neg(size);\n\n\tisl_val_free(size);\n\n\treturn smaller;\n}\n\n/* Given an initial grouping of array references and shared memory tiles\n * for each group that allows for a shared memory tile, merge two groups\n * if both have a shared memory tile, the merged group also has\n * a shared memory tile and the size of the tile for the merged group\n * is smaller than the sum of the tile sizes of the individual groups.\n * If any group is merged into the current group, then it may become\n * profitable to combine it with groups that were considered before\n * the merge. 
The groups are therefore checked again after a merge.\n *\n * If merging two groups decreases the depth of the tile of\n * one or both of the two groups, then we need to check for overlapping\n * writes again.\n *\n * Return the number of groups after merging.\n * Return -1 on error.\n */\nstatic int group_common_shared_memory_tile(struct ppcg_kernel *kernel,\n\tstruct gpu_array_info *array, int n,\n\tstruct gpu_array_ref_group **groups, struct gpu_group_data *data)\n{\n\tint i, j;\n\tint recompute_overlap = 0;\n\tint any_merge;\n\n\tfor (i = 0; i < n; i += !any_merge) {\n\t\tany_merge = 0;\n\t\tif (!groups[i]->shared_tile)\n\t\t\tcontinue;\n\t\tfor (j = n - 1; j > i; --j) {\n\t\t\tstruct gpu_array_ref_group *group;\n\n\t\t\tif (!groups[j]->shared_tile)\n\t\t\t\tcontinue;\n\n\t\t\tif (!depth_accesses_overlap(groups[i], groups[j]))\n\t\t\t\tcontinue;\n\n\t\t\tgroup = join_groups(groups[i], groups[j]);\n\t\t\tif (compute_group_bounds(kernel, group, data) < 0) {\n\t\t\t\tgpu_array_ref_group_free(group);\n\t\t\t\treturn -1;\n\t\t\t}\n\t\t\tif (!group->shared_tile ||\n\t\t\t    !smaller_tile(group->shared_tile,\n\t\t\t\t\tgroups[i]->shared_tile,\n\t\t\t\t\tgroups[j]->shared_tile)) {\n\t\t\t\tgpu_array_ref_group_free(group);\n\t\t\t\tcontinue;\n\t\t\t}\n\n\t\t\tany_merge = 1;\n\t\t\tif (group->min_depth < groups[i]->min_depth ||\n\t\t\t    group->min_depth < groups[j]->min_depth)\n\t\t\t\trecompute_overlap = 1;\n\t\t\tgpu_array_ref_group_free(groups[i]);\n\t\t\tgpu_array_ref_group_free(groups[j]);\n\t\t\tgroups[i] = group;\n\t\t\tif (j != n - 1)\n\t\t\t\tgroups[j] = groups[n - 1];\n\t\t\tn--;\n\t\t}\n\t}\n\n\tif (recompute_overlap)\n\t\tn = group_depth_overlapping_writes(kernel, n, groups, data);\n\treturn n;\n}\n\n/* Set array->n_group and array->groups to n and groups.\n *\n * Additionally, set the \"nr\" field of each group.\n */\nstatic void set_array_groups(struct gpu_local_array_info *array,\n\tint n, struct gpu_array_ref_group **groups)\n{\n\tint 
i;\n\n\tarray->n_group = n;\n\tarray->groups = groups;\n\n\tfor (i = 0; i < n; ++i)\n\t\tgroups[i]->nr = i;\n}\n\n/* Combine all groups in \"groups\" into a single group and return\n * the new number of groups (1 or 0 if there were no groups to start with).\n */\nstatic int join_all_groups(int n, struct gpu_array_ref_group **groups)\n{\n\tint i;\n\n\tfor (i = n - 1; i > 0; --i) {\n\t\tgroups[0] = join_groups_and_free(groups[0], groups[i]);\n\t\tgroups[i] = NULL;\n\t\tn--;\n\t}\n\n\treturn n;\n}\n\n/* Group array references that should be considered together when\n * deciding whether to access them from private, shared or global memory.\n * Return -1 on error.\n *\n * In particular, if two array references overlap and if one of them\n * is a write, then the two references are grouped together.\n * We first perform an initial grouping based only on the access relation.\n * After computing shared and private memory tiles, we check for\n * overlapping writes again, but this time taking into account\n * the depth of the effective tile.\n *\n * Furthermore, if two groups admit a shared memory tile and if the\n * combination of the two also admits a shared memory tile, we merge\n * the two groups.\n *\n * If the array contains structures, then we compute a single\n * reference group without trying to find any tiles\n * since we do not map such arrays to private or shared\n * memory.  
The only exception is when those arrays of structures\n * are required to be mapped to private memory.\n */\nstatic int group_array_references(struct ppcg_kernel *kernel,\n\tstruct gpu_local_array_info *local, struct gpu_group_data *data)\n{\n\tint i;\n\tint n;\n\tisl_ctx *ctx = isl_union_map_get_ctx(data->shared_sched);\n\tstruct gpu_array_ref_group **groups;\n\n\tgroups = isl_calloc_array(ctx, struct gpu_array_ref_group *,\n\t\t\t\t\tlocal->array->n_ref);\n\tif (!groups)\n\t\treturn -1;\n\n\tn = populate_array_references(local, groups, data);\n\n\tif (local->array->has_compound_element && !local->force_private) {\n\t\tn = join_all_groups(n, groups);\n\t\tset_array_groups(local, n, groups);\n\t\treturn 0;\n\t}\n\n\tn = group_overlapping_writes(kernel, n, groups, data);\n\n\tfor (i = 0; i < n; ++i)\n\t\tif (compute_group_bounds(kernel, groups[i], data) < 0)\n\t\t\tn = -1;\n\n\tn = group_depth_overlapping_writes(kernel, n, groups, data);\n\n\tn = group_common_shared_memory_tile(kernel, local->array,\n\t\t\t\t\t    n, groups, data);\n\n\tset_array_groups(local, n, groups);\n\n\tif (n >= 0)\n\t\treturn 0;\n\n\tfor (i = 0; i < local->array->n_ref; ++i)\n\t\tgpu_array_ref_group_free(groups[i]);\n\treturn -1;\n}\n\n/* For each array in the input program that can be mapped to private memory,\n * check if there are any order dependences active inside the current kernel,\n * within the same iteration of the host schedule, i.e., the prefix\n * schedule at \"node\".\n * If so, mark the array as force_private so that its reference groups will be\n * mapped to registers.\n *\n * Note that the arrays that cannot be mapped to private memory have\n * had their order dependences added to prog->array_order and\n * subsequently to the coincidence constraints.\n */\nstatic void check_can_be_private_live_ranges(struct ppcg_kernel *kernel,\n\t__isl_keep isl_schedule_node *node)\n{\n\tint i;\n\tisl_union_set *domain;\n\tisl_multi_union_pw_aff *prefix;\n\tisl_union_pw_multi_aff 
*contraction;\n\n\tif (!kernel->options->live_range_reordering)\n\t\treturn;\n\n\tkernel->any_force_private = 0;\n\n\tprefix = isl_schedule_node_get_prefix_schedule_multi_union_pw_aff(node);\n\tcontraction = isl_union_pw_multi_aff_copy(kernel->contraction);\n\tprefix = isl_multi_union_pw_aff_pullback_union_pw_multi_aff(prefix,\n\t\t\t\t\t\t\t\tcontraction);\n\tdomain = isl_union_set_copy(kernel->expanded_domain);\n\tdomain = isl_union_set_universe(domain);\n\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tstruct gpu_local_array_info *local = &kernel->array[i];\n\t\tisl_union_map *order;\n\n\t\tlocal->force_private = 0;\n\t\tif (!gpu_array_can_be_private(local->array))\n\t\t\tcontinue;\n\t\torder = isl_union_map_copy(local->array->dep_order);\n\t\torder = isl_union_map_intersect_domain(order,\n\t\t\t\t\t\t    isl_union_set_copy(domain));\n\t\torder = isl_union_map_intersect_range(order,\n\t\t\t\t\t\t    isl_union_set_copy(domain));\n\t\torder = isl_union_map_eq_at_multi_union_pw_aff(order,\n\t\t\t\t\tisl_multi_union_pw_aff_copy(prefix));\n\t\tif (!isl_union_map_is_empty(order)) {\n\t\t\tlocal->force_private = 1;\n\t\t\tkernel->any_force_private = 1;\n\t\t}\n\t\tisl_union_map_free(order);\n\t}\n\n\tisl_multi_union_pw_aff_free(prefix);\n\tisl_union_set_free(domain);\n}\n\n/* Expand the domain of the schedule \"s\" by plugging in\n * the contraction \"contraction\" and return the result.\n */\nstatic __isl_give isl_union_map *expand(__isl_take isl_union_map *s,\n\t__isl_keep isl_union_pw_multi_aff *contraction)\n{\n\tcontraction = isl_union_pw_multi_aff_copy(contraction);\n\ts = isl_union_map_preimage_domain_union_pw_multi_aff(s, contraction);\n\treturn s;\n}\n\n/* Create a set of dimension data->thread_depth + data->n_thread\n * that equates the residue of the final data->n_thread dimensions\n * modulo the kernel->block_dim sizes to the thread identifiers.\n * Store the computed set in data->privatization.\n *\n * The construction starts with the space of 
kernel->thread_filter,\n * which is known to reference all thread identifiers.\n */\nstatic void compute_privatization(struct gpu_group_data *data,\n\tstruct ppcg_kernel *kernel)\n{\n\tint i;\n\tisl_ctx *ctx;\n\tisl_space *space;\n\tisl_local_space *ls;\n\tisl_set *set;\n\n\tctx = isl_union_map_get_ctx(data->shared_sched);\n\tspace = isl_union_set_get_space(kernel->thread_filter);\n\tspace = isl_space_set_from_params(space);\n\tspace = isl_space_add_dims(space, isl_dim_set,\n\t\t\t\t    data->thread_depth + data->n_thread);\n\tset = isl_set_universe(space);\n\tspace = isl_set_get_space(set);\n\tls = isl_local_space_from_space(space);\n\n\tfor (i = 0; i < data->n_thread; ++i) {\n\t\tisl_aff *aff, *aff2;\n\t\tisl_constraint *c;\n\t\tisl_val *v;\n\t\tisl_id *id;\n\t\tint pos;\n\n\t\tif (!set)\n\t\t\tbreak;\n\n\t\taff = isl_aff_var_on_domain(isl_local_space_copy(ls),\n\t\t\t\t\tisl_dim_set, data->thread_depth + i);\n\t\tv = isl_val_int_from_si(ctx, kernel->block_dim[i]);\n\t\taff = isl_aff_mod_val(aff, v);\n\t\tid = isl_id_list_get_id(kernel->thread_ids, i);\n\t\tpos = isl_set_find_dim_by_id(set, isl_dim_param, id);\n\t\tisl_id_free(id);\n\t\taff2 = isl_aff_var_on_domain(isl_local_space_copy(ls),\n\t\t\t\t\tisl_dim_param, pos);\n\t\taff = isl_aff_sub(aff, aff2);\n\t\tc = isl_equality_from_aff(aff);\n\t\tset = isl_set_add_constraint(set, c);\n\t}\n\n\tisl_local_space_free(ls);\n\tdata->privatization = set;\n}\n\n/* Return the prefix schedule at \"node\" as a relation\n * between domain elements and schedule dimensions after detecting\n * equalities in this relation.\n */\nstatic __isl_give isl_union_map *prefix_with_equalities(\n\t__isl_keep isl_schedule_node *node)\n{\n\tisl_union_map *schedule;\n\n\tschedule = isl_schedule_node_get_prefix_schedule_relation(node);\n\tschedule = isl_union_map_detect_equalities(schedule);\n\n\treturn schedule;\n}\n\n/* Group references of all arrays in \"kernel\".\n * \"node\" points to the kernel mark.\n * The mapping to shared memory 
is computed at the \"shared\" mark.\n *\n * We first extract all required schedule information into\n * a gpu_group_data structure and then consider each array\n * in turn.\n */\nint gpu_group_references(struct ppcg_kernel *kernel,\n\t__isl_keep isl_schedule_node *node)\n{\n\tint i;\n\tint r = 0;\n\tisl_union_pw_multi_aff *contraction;\n\tstruct gpu_group_data data;\n\n\tcheck_can_be_private_live_ranges(kernel, node);\n\n\tdata.scop = kernel->prog->scop;\n\n\tdata.kernel_depth = isl_schedule_node_get_schedule_depth(node);\n\tdata.host_sched = isl_schedule_node_get_prefix_schedule_relation(node);\n\n\tnode = isl_schedule_node_copy(node);\n\tnode = gpu_tree_move_down_to_shared(node, kernel->core);\n\tdata.shared_depth = isl_schedule_node_get_schedule_depth(node);\n\tdata.shared_sched = prefix_with_equalities(node);\n\n\tnode = gpu_tree_move_down_to_thread(node, kernel->core);\n\tnode = isl_schedule_node_child(node, 0);\n\tdata.thread_depth = isl_schedule_node_get_schedule_depth(node);\n\tdata.n_thread = isl_schedule_node_band_n_member(node);\n\tif (data.thread_depth == data.shared_depth)\n\t\tdata.copy_sched = isl_union_map_copy(data.shared_sched);\n\telse\n\t\tdata.copy_sched = prefix_with_equalities(node);\n\tdata.thread_sched = isl_union_map_copy(data.copy_sched);\n\tdata.thread_sched = isl_union_map_flat_range_product(data.thread_sched,\n\t\tisl_schedule_node_band_get_partial_schedule_union_map(node));\n\tdata.thread_sched = isl_union_map_detect_equalities(data.thread_sched);\n\n\tcontraction = isl_union_pw_multi_aff_copy(kernel->contraction);\n\tdata.host_sched = expand(data.host_sched, contraction);\n\tdata.shared_sched = expand(data.shared_sched, contraction);\n\tif (data.thread_depth == data.shared_depth) {\n\t\tisl_union_map_free(data.copy_sched);\n\t\tdata.copy_sched = isl_union_map_copy(data.shared_sched);\n\t} else {\n\t\tdata.copy_sched = expand(data.copy_sched, contraction);\n\t}\n\tdata.thread_sched = expand(data.thread_sched, 
contraction);\n\tisl_union_pw_multi_aff_free(contraction);\n\n\tnode = isl_schedule_node_child(node, 0);\n\tdata.full_sched = isl_union_map_copy(data.thread_sched);\n\tdata.full_sched = isl_union_map_flat_range_product(data.full_sched,\n\t\tisl_schedule_node_get_subtree_schedule_union_map(node));\n\tisl_schedule_node_free(node);\n\n\tcompute_privatization(&data, kernel);\n\n\tfor (i = 0; i < kernel->n_array; ++i) {\n\t\tr = group_array_references(kernel, &kernel->array[i], &data);\n\t\tif (r < 0)\n\t\t\tbreak;\n\t}\n\n\tisl_union_map_free(data.host_sched);\n\tisl_union_map_free(data.shared_sched);\n\tisl_union_map_free(data.copy_sched);\n\tisl_union_map_free(data.thread_sched);\n\tisl_union_map_free(data.full_sched);\n\tisl_set_free(data.privatization);\n\n\treturn r;\n}\n\n/* Given a description of an array tile \"tile\" and the \"space\"\n *\n *\t{ D -> A }\n *\n * where D represents the first tile->depth schedule dimensions\n * and A represents the array, construct an isl_multi_aff\n *\n *\t{ [D[i] -> A[a]] -> A'[a'] }\n *\n * with A' a scaled down copy of A according to the shifts and strides\n * in \"tile\".  
In particular,\n *\n *\ta' = (a + shift(i))/stride\n *\n * \"insert_array\" represents\n *\n *\t{ [D -> A] -> D }\n *\n * and is used to insert A into the domain of functions that only\n * reference D.\n */\nstatic __isl_give isl_multi_aff *strided_tile(\n\tstruct gpu_array_tile *tile, __isl_keep isl_space *space,\n\t__isl_keep isl_multi_aff *insert_array)\n{\n\tint i;\n\tisl_ctx *ctx;\n\tisl_multi_aff *shift;\n\tisl_multi_val *stride;\n\tisl_space *space2;\n\tisl_local_space *ls;\n\tisl_multi_aff *tiling;\n\n\tctx = isl_space_get_ctx(space);\n\tspace2 = isl_space_domain(isl_space_copy(space));\n\tls = isl_local_space_from_space(space2);\n\tspace2 = isl_space_range(isl_space_copy(space));\n\tstride = isl_multi_val_zero(space2);\n\tshift = isl_multi_aff_zero(isl_space_copy(space));\n\n\tfor (i = 0; i < tile->n; ++i) {\n\t\tstruct gpu_array_bound *bound = &tile->bound[i];\n\t\tisl_val *stride_i;\n\t\tisl_aff *shift_i;\n\n\t\tstride_i = isl_val_copy(bound->stride);\n\t\tshift_i = isl_aff_copy(bound->shift);\n\n\t\tstride = isl_multi_val_set_val(stride, i, stride_i);\n\t\tshift = isl_multi_aff_set_aff(shift, i, shift_i);\n\t}\n\tisl_local_space_free(ls);\n\n\tshift = isl_multi_aff_pullback_multi_aff(shift,\n\t\t\t\t    isl_multi_aff_copy(insert_array));\n\n\ttiling = isl_multi_aff_range_map(isl_space_copy(space));\n\ttiling = isl_multi_aff_add(tiling, shift);\n\ttiling = isl_multi_aff_scale_down_multi_val(tiling, stride);\n\n\treturn tiling;\n}\n\n/* Compute a tiling for the array reference group \"group\".\n *\n * The tiling is of the form\n *\n *\t{ [D[i] -> A[a]] -> T[t] }\n *\n * where D represents the first tile->depth schedule dimensions,\n * A represents the global array and T represents the shared or\n * private memory tile.  
The name of T is the name of the local\n * array.\n *\n * If there is any stride in the accesses, then the mapping is\n *\n *\tt = (a + shift(i))/stride - lb(i)\n *\n * otherwise, it is simply\n *\n *\tt = a - lb(i)\n */\nvoid gpu_array_ref_group_compute_tiling(struct gpu_array_ref_group *group)\n{\n\tint i;\n\tstruct gpu_array_tile *tile;\n\tisl_space *space;\n\tisl_multi_aff *tiling, *lb, *insert_array;\n\tisl_printer *p;\n\tchar *local_name;\n\n\ttile = gpu_array_ref_group_tile(group);\n\tif (!tile)\n\t\treturn;\n\n\tspace = isl_map_get_space(group->access);\n\tspace = isl_space_from_range(isl_space_range(space));\n\tspace = isl_space_add_dims(space, isl_dim_in, tile->depth);\n\tinsert_array = isl_multi_aff_domain_map(isl_space_copy(space));\n\n\tfor (i = 0; i < tile->n; ++i)\n\t\tif (tile->bound[i].shift)\n\t\t\tbreak;\n\n\tif (i < tile->n)\n\t\ttiling = strided_tile(tile, space, insert_array);\n\telse\n\t\ttiling = isl_multi_aff_range_map(isl_space_copy(space));\n\n\tlb = isl_multi_aff_zero(space);\n\tfor (i = 0; i < tile->n; ++i) {\n\t\tisl_aff *lb_i = isl_aff_copy(tile->bound[i].lb);\n\t\tlb = isl_multi_aff_set_aff(lb, i, lb_i);\n\t}\n\tlb = isl_multi_aff_pullback_multi_aff(lb, insert_array);\n\n\ttiling = isl_multi_aff_sub(tiling, lb);\n\n\tp = isl_printer_to_str(isl_multi_aff_get_ctx(tiling));\n\tp = gpu_array_ref_group_print_name(group, p);\n\tlocal_name = isl_printer_get_str(p);\n\tisl_printer_free(p);\n\ttiling = isl_multi_aff_set_tuple_name(tiling, isl_dim_out, local_name);\n\tfree(local_name);\n\n\ttile->tiling = tiling;\n}\n"
  },
  {
    "path": "src/ppcg_files/gpu_group.h",
    "content": "#ifndef GPU_GROUP_H\n#define GPU_GROUP_H\n\n#include <isl/schedule_node.h>\n#include \"gpu.h\"\n\n/* A group of array references in a kernel that should be handled together.\n * If private_tile is not NULL, then it is mapped to registers.\n * Otherwise, if shared_tile is not NULL, it is mapped to shared memory.\n * Otherwise, it is accessed from global memory.\n * Note that if both private_tile and shared_tile are set, then shared_tile\n * is only used inside group_common_shared_memory_tile.\n */\nstruct gpu_array_ref_group\n{\n\t/* The references in this group access this local array. */\n\tstruct gpu_local_array_info *local_array;\n\t/* This is the corresponding array. */\n\tstruct gpu_array_info *array;\n\t/* Position of this group in the list of reference groups of array. */\n\tint nr;\n\n\t/* The following fields are used during the construction of the groups.\n\t * access is the combined access relation relative to the private\n\t * memory tiling.  In particular, the domain of the map corresponds\n\t * to the first thread_depth dimensions of the kernel schedule.\n\t * write is set if any access in the group is a write.\n\t * exact_write is set if all writes are definite writes.\n\t * slice is set if there is at least one access in the group\n\t * that refers to more than one element.\n\t * \"min_depth\" is the minimum of the tile depths and thread_depth.\n\t */\n\tisl_map *access;\n\tint write;\n\tint exact_write;\n\tint slice;\n\tint min_depth;\n\n\t/* The shared memory tile, NULL if none. */\n\tstruct gpu_array_tile *shared_tile;\n\n\t/* The private memory tile, NULL if none. */\n\tstruct gpu_array_tile *private_tile;\n\n\t/* References in this group; point to elements of a linked list. 
*/\n\tint n_ref;\n\tstruct gpu_stmt_access **refs;\n};\n\nint gpu_group_references(struct ppcg_kernel *kernel,\n\t\t\t\t\t\t\t\t\t\t\t\t __isl_keep isl_schedule_node *node);\n\n__isl_give isl_printer *gpu_array_ref_group_print_name(\n\t\tstruct gpu_array_ref_group *group, __isl_take isl_printer *p);\nvoid gpu_array_ref_group_compute_tiling(struct gpu_array_ref_group *group);\n__isl_give isl_union_map *gpu_array_ref_group_access_relation(\n\t\tstruct gpu_array_ref_group *group, int read, int write);\nint gpu_array_ref_group_requires_unroll(struct gpu_array_ref_group *group);\nenum ppcg_group_access_type gpu_array_ref_group_type(\n\t\tstruct gpu_array_ref_group *group);\nstruct gpu_array_tile *gpu_array_ref_group_tile(\n\t\tstruct gpu_array_ref_group *group);\nstruct gpu_array_ref_group *gpu_array_ref_group_free(\n\t\tstruct gpu_array_ref_group *group);\n\n#endif\n"
  },
  {
    "path": "src/ppcg_files/gpu_hybrid.c",
    "content": "/*\n * Copyright 2013      Ecole Normale Superieure\n * Copyright 2015      Sven Verdoolaege\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege,\n * Ecole Normale Superieure, 45 rue d'Ulm, 75230 Paris, France\n */\n\n#include <string.h>\n\n#include <isl/val.h>\n#include <isl/space.h>\n#include <isl/union_set.h>\n#include <isl/schedule_node.h>\n\n#include \"hybrid.h\"\n#include \"gpu_hybrid.h\"\n#include \"gpu_tree.h\"\n#include \"schedule.h\"\n#include \"util.h\"\n\n/* Have all domain elements been filtered out before reaching\n * the \"node\" position in the schedule tree?\n */\nstatic isl_bool has_empty_domain(__isl_keep isl_schedule_node *node)\n{\n\tisl_union_set *domain;\n\tisl_bool empty;\n\n\tdomain = isl_schedule_node_get_domain(node);\n\tempty = isl_union_set_is_empty(domain);\n\tisl_union_set_free(domain);\n\n\treturn empty;\n}\n\n/* Given a pointer to a phase in the result of hybrid tiling,\n * map the phase to the device, provided the phase is non-empty.\n * Empty phases can occur if the input schedule domain can be\n * covered by a small number of hexagons that all belong to the same phase.\n *\n * The input has the following form:\n *\n *\tM - CT - P - C - ...\n *\n * with M the phase marker, CT the space tiling, P the original\n * parent band and C the original child band.\n * The (outer dimensions of the) C band need to be mapped to threads.\n * The (outer dimension of the) CT band needs to be mapped to blocks.\n * The mapping to shared memory needs to be computed between the CT and\n * the P band.\n *\n * The C band is first shifted to start at zero.\n * Then the appropriate markers are introduced and a kernel is\n * created for the tree rooted at CT.\n * If the \"unroll_gpu_tile\" option is set, then the AST generator\n * is instructed to unroll the P and C bands.\n */\nstatic __isl_give isl_schedule_node *update_phase(\n\t__isl_take isl_schedule_node *node, void *user)\n{\n\tstruct 
gpu_gen *gen = user;\n\tint depth0, depth;\n\tisl_ctx *ctx;\n\tisl_id *id;\n\tisl_bool empty_domain;\n\tppcg_ht_phase *phase;\n\n\tempty_domain = has_empty_domain(node);\n\tif (empty_domain < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (empty_domain)\n\t\treturn node;\n\n\tif (!node)\n\t\treturn NULL;\n\tctx = isl_schedule_node_get_ctx(node);\n\n\tphase = ppcg_ht_phase_extract_from_mark(node);\n\n\tdepth0 = isl_schedule_node_get_tree_depth(node);\n\n\tnode = isl_schedule_node_child(node, 0);\n\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = isl_schedule_node_child(node, 0);\n\tnode = ppcg_ht_phase_shift_space_point(phase, node);\n\tif (gen->options->unroll_gpu_tile)\n\t\tnode = ppcg_set_schedule_node_type(node, isl_ast_loop_unroll);\n\tid = isl_id_alloc(ctx, \"thread\", NULL);\n\tnode = isl_schedule_node_insert_mark(node, id);\n\tnode = isl_schedule_node_parent(node);\n\tif (gen->options->unroll_gpu_tile)\n\t\tnode = ppcg_set_schedule_node_type(node, isl_ast_loop_unroll);\n\tid = isl_id_alloc(ctx, \"shared\", NULL);\n\tnode = isl_schedule_node_insert_mark(node, id);\n\tnode = isl_schedule_node_parent(node);\n\n\tnode = gpu_create_kernel(gen, node, 0, NULL);\n\n\tdepth = isl_schedule_node_get_tree_depth(node);\n\tnode = isl_schedule_node_ancestor(node, depth - depth0);\n\n\treturn node;\n}\n\n/* Apply hybrid tiling on \"node\" and its parent based on the (valid)\n * bounds on the relative dependence distances \"bounds\" and\n * the tile sizes in \"tile_sizes\".\n * The number of elements in \"tile_sizes\" is at least as large\n * as the sum of the dimensions of the parent and the child node.\n *\n * Convert the tile_sizes to an isl_multi_val in the right space,\n * insert the hybrid tiling and then create a kernel inside each phase.\n * Finally, remove the phase marks.\n */\n__isl_give isl_schedule_node *gpu_hybrid_tile(struct gpu_gen *gen,\n\t__isl_take isl_schedule_node *node, __isl_take ppcg_ht_bounds *bounds,\n\tint *tile_sizes)\n{\n\tisl_multi_val 
*mv;\n\tisl_space *space, *space2;\n\n\tif (!node || !bounds)\n\t\tgoto error;\n\n\tspace2 = isl_schedule_node_band_get_space(node);\n\tnode = isl_schedule_node_parent(node);\n\tspace = isl_schedule_node_band_get_space(node);\n\tspace = isl_space_product(space, space2);\n\tmv = ppcg_multi_val_from_int_list(space, tile_sizes);\n\n\tnode = ppcg_ht_bounds_insert_tiling(bounds, mv, node, gen->options);\n\n\tnode = hybrid_tile_foreach_phase(node, &update_phase, gen);\n\n\tnode = hybrid_tile_drop_phase_marks(node);\n\n\treturn node;\nerror:\n\tisl_schedule_node_free(node);\n\tppcg_ht_bounds_free(bounds);\n\treturn NULL;\n}\n"
  },
  {
    "path": "src/ppcg_files/gpu_hybrid.h",
    "content": "#ifndef GPU_HYBRID_H\n#define GPU_HYBRID_H\n\n#include <isl/schedule_node.h>\n\n#include \"gpu.h\"\n#include \"hybrid.h\"\n\n__isl_give isl_schedule_node *gpu_hybrid_tile(struct gpu_gen *gen,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t__isl_take isl_schedule_node *node, __isl_take ppcg_ht_bounds *bounds,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tint *tile_sizes);\n\n#endif\n"
  },
  {
    "path": "src/ppcg_files/gpu_print.c",
    "content": "/*\n * Copyright 2012      Ecole Normale Superieure\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege,\n * Ecole Normale Superieure, 45 rue d’Ulm, 75230 Paris, France\n */\n\n#include <string.h>\n\n#include <isl/aff.h>\n\n#include \"gpu_print.h\"\n#include \"print.h\"\n#include \"schedule.h\"\n\n/* Print declarations to \"p\" for arrays that are local to \"prog\"\n * but that are used on the host and therefore require a declaration.\n */\n__isl_give isl_printer *gpu_print_local_declarations(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog)\n{\n\tint i;\n\n\tif (!prog)\n\t\treturn isl_printer_free(p);\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tstruct gpu_array_info *array = &prog->array[i];\n\t\tisl_ast_expr *size;\n\n\t\tif (!array->declare_local)\n\t\t\tcontinue;\n\t\tsize = array->declared_size;\n\t\tp = ppcg_print_declaration_with_size(p, array->type, size);\n\t}\n\n\treturn p;\n}\n\n/* Print an expression for the size of \"array\" in bytes.\n */\n__isl_give isl_printer *gpu_array_info_print_size(__isl_take isl_printer *prn,\n\tstruct gpu_array_info *array)\n{\n\tint i;\n\n\tfor (i = 0; i < array->n_index; ++i) {\n\t\tisl_ast_expr *bound;\n\n\t\tprn = isl_printer_print_str(prn, \"(\");\n\t\tbound = isl_ast_expr_get_op_arg(array->bound_expr, 1 + i);\n\t\tprn = isl_printer_print_ast_expr(prn, bound);\n\t\tisl_ast_expr_free(bound);\n\t\tprn = isl_printer_print_str(prn, \") * \");\n\t}\n\tprn = isl_printer_print_str(prn, \"sizeof(\");\n\tprn = isl_printer_print_str(prn, array->type);\n\tprn = isl_printer_print_str(prn, \")\");\n\n\treturn prn;\n}\n\n/* Print the declaration of a non-linearized array argument.\n */\nstatic __isl_give isl_printer *print_non_linearized_declaration_argument(\n\t__isl_take isl_printer *p, struct gpu_array_info *array)\n{\n\tp = isl_printer_print_str(p, array->type);\n\tp = isl_printer_print_str(p, \" \");\n\n\tp = isl_printer_print_ast_expr(p, 
array->bound_expr);\n\n\treturn p;\n}\n\n/* Print the declaration of an array argument.\n * \"memory_space\" can be used to specify a memory space prefix.\n */\n__isl_give isl_printer *gpu_array_info_print_declaration_argument(\n\t__isl_take isl_printer *p, struct gpu_array_info *array,\n\tconst char *memory_space)\n{\n\tif (gpu_array_is_read_only_scalar(array)) {\n\t\tp = isl_printer_print_str(p, array->type);\n\t\tp = isl_printer_print_str(p, \" \");\n\t\tp = isl_printer_print_str(p, array->name);\n\t\treturn p;\n\t}\n\n\tif (memory_space) {\n\t\tp = isl_printer_print_str(p, memory_space);\n\t\tp = isl_printer_print_str(p, \" \");\n\t}\n\n\tif (array->n_index != 0 && !array->linearize)\n\t\treturn print_non_linearized_declaration_argument(p, array);\n\n\tp = isl_printer_print_str(p, array->type);\n\tp = isl_printer_print_str(p, \" \");\n\tp = isl_printer_print_str(p, \"*\");\n\tp = isl_printer_print_str(p, array->name);\n\n\treturn p;\n}\n\n/* Print the call of an array argument.\n */\n__isl_give isl_printer *gpu_array_info_print_call_argument(\n\t__isl_take isl_printer *p, struct gpu_array_info *array)\n{\n\tif (gpu_array_is_read_only_scalar(array))\n\t\treturn isl_printer_print_str(p, array->name);\n\n\tp = isl_printer_print_str(p, \"dev_\");\n\tp = isl_printer_print_str(p, array->name);\n\n\treturn p;\n}\n\n/* Print an access to the element in the private/shared memory copy\n * described by \"stmt\".  The index of the copy is recorded in\n * stmt->local_index as an access to the array.\n */\nstatic __isl_give isl_printer *stmt_print_local_index(__isl_take isl_printer *p,\n\tstruct ppcg_kernel_stmt *stmt)\n{\n\treturn isl_printer_print_ast_expr(p, stmt->u.c.local_index);\n}\n\n/* Print an access to the element in the global memory copy\n * described by \"stmt\".  
The index of the copy is recorded in\n * stmt->index as an access to the array.\n */\nstatic __isl_give isl_printer *stmt_print_global_index(\n\t__isl_take isl_printer *p, struct ppcg_kernel_stmt *stmt)\n{\n\tstruct gpu_array_info *array = stmt->u.c.array;\n\tisl_ast_expr *index;\n\n\tif (gpu_array_is_scalar(array)) {\n\t\tif (!gpu_array_is_read_only_scalar(array))\n\t\t\tp = isl_printer_print_str(p, \"*\");\n\t\tp = isl_printer_print_str(p, array->name);\n\t\treturn p;\n\t}\n\n\tindex = isl_ast_expr_copy(stmt->u.c.index);\n\n\tp = isl_printer_print_ast_expr(p, index);\n\tisl_ast_expr_free(index);\n\n\treturn p;\n}\n\n/* Print a copy statement.\n *\n * A read copy statement is printed as\n *\n *\tlocal = global;\n *\n * while a write copy statement is printed as\n *\n *\tglobal = local;\n */\n__isl_give isl_printer *ppcg_kernel_print_copy(__isl_take isl_printer *p,\n\tstruct ppcg_kernel_stmt *stmt)\n{\n\tp = isl_printer_start_line(p);\n\tif (stmt->u.c.read) {\n\t\tp = stmt_print_local_index(p, stmt);\n\t\tp = isl_printer_print_str(p, \" = \");\n\t\tp = stmt_print_global_index(p, stmt);\n\t} else {\n\t\tp = stmt_print_global_index(p, stmt);\n\t\tp = isl_printer_print_str(p, \" = \");\n\t\tp = stmt_print_local_index(p, stmt);\n\t}\n\tp = isl_printer_print_str(p, \";\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n__isl_give isl_printer *ppcg_kernel_print_domain(__isl_take isl_printer *p,\n\tstruct ppcg_kernel_stmt *stmt)\n{\n\treturn pet_stmt_print_body(stmt->u.d.stmt->stmt, p, stmt->u.d.ref2expr);\n}\n\n/* This function is called for each node in a GPU AST.\n * In case of a user node, print the macro definitions required\n * for printing the AST expressions in the annotation, if any.\n * For other nodes, return true such that descendants are also\n * visited.\n *\n * In particular, for a kernel launch, print the macro definitions\n * needed for the grid size.\n * For a copy statement, print the macro definitions needed\n * for the two index expressions.\n * 
For an original user statement, print the macro definitions\n * needed for the substitutions.\n */\nstatic isl_bool at_node(__isl_keep isl_ast_node *node, void *user)\n{\n\tconst char *name;\n\tisl_id *id;\n\tint is_kernel;\n\tstruct ppcg_kernel *kernel;\n\tstruct ppcg_kernel_stmt *stmt;\n\tisl_printer **p = user;\n\n\tif (isl_ast_node_get_type(node) != isl_ast_node_user)\n\t\treturn isl_bool_true;\n\n\tid = isl_ast_node_get_annotation(node);\n\tif (!id)\n\t\treturn isl_bool_false;\n\n\tname = isl_id_get_name(id);\n\tif (!name)\n\t\treturn isl_bool_error;\n\tis_kernel = !strcmp(name, \"kernel\");\n\tkernel = is_kernel ? isl_id_get_user(id) : NULL;\n\tstmt = is_kernel ? NULL : isl_id_get_user(id);\n\tisl_id_free(id);\n\n\tif ((is_kernel && !kernel) || (!is_kernel && !stmt))\n\t\treturn isl_bool_error;\n\n\tif (is_kernel) {\n\t\t*p = ppcg_ast_expr_print_macros(kernel->grid_size_expr, *p);\n\t} else if (stmt->type == ppcg_kernel_copy) {\n\t\t*p = ppcg_ast_expr_print_macros(stmt->u.c.index, *p);\n\t\t*p = ppcg_ast_expr_print_macros(stmt->u.c.local_index, *p);\n\t} else if (stmt->type == ppcg_kernel_domain) {\n\t\t*p = ppcg_print_body_macros(*p, stmt->u.d.ref2expr);\n\t}\n\tif (!*p)\n\t\treturn isl_bool_error;\n\n\treturn isl_bool_false;\n}\n\n/* Print the required macros for the GPU AST \"node\" to \"p\",\n * including those needed for the user statements inside the AST.\n */\n__isl_give isl_printer *gpu_print_macros(__isl_take isl_printer *p,\n\t__isl_keep isl_ast_node *node)\n{\n\tif (isl_ast_node_foreach_descendant_top_down(node, &at_node, &p) < 0)\n\t\treturn isl_printer_free(p);\n\tp = ppcg_print_macros(p, node);\n\treturn p;\n}\n\n/* Was the definition of \"type\" printed before?\n * That is, does its name appear in the list of printed types \"types\"?\n */\nstatic int already_printed(struct gpu_types *types,\n\tstruct pet_type *type)\n{\n\tint i;\n\n\tfor (i = 0; i < types->n; ++i)\n\t\tif (!strcmp(types->name[i], type->name))\n\t\t\treturn 1;\n\n\treturn 
0;\n}\n\n/* Print the definitions of all types in prog->scop that have not been\n * printed before (according to \"types\") on \"p\".\n * Extend the list of printed types \"types\" with the newly printed types.\n */\n__isl_give isl_printer *gpu_print_types(__isl_take isl_printer *p,\n\tstruct gpu_types *types, struct gpu_prog *prog)\n{\n\tint i, n;\n\tisl_ctx *ctx;\n\tchar **name;\n\n\tn = prog->scop->pet->n_type;\n\n\tif (n == 0)\n\t\treturn p;\n\n\tctx = isl_printer_get_ctx(p);\n\tname = isl_realloc_array(ctx, types->name, char *, types->n + n);\n\tif (!name)\n\t\treturn isl_printer_free(p);\n\ttypes->name = name;\n\n\tfor (i = 0; i < n; ++i) {\n\t\tstruct pet_type *type = prog->scop->pet->types[i];\n\n\t\tif (already_printed(types, type))\n\t\t\tcontinue;\n\n\t\tp = isl_printer_start_line(p);\n\t\tp = isl_printer_print_str(p, type->definition);\n\t\tp = isl_printer_print_str(p, \";\");\n\t\tp = isl_printer_end_line(p);\n\n\t\ttypes->name[types->n++] = strdup(type->name);\n\t}\n\n\treturn p;\n}\n"
  },
  {
    "path": "src/ppcg_files/gpu_print.h",
    "content": "#ifndef GPU_PRINT_H\n#define GPU_PRINT_H\n\n#include \"gpu.h\"\n\n__isl_give isl_printer *gpu_print_local_declarations(__isl_take isl_printer *p,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t struct gpu_prog *prog);\n\n__isl_give isl_printer *gpu_print_types(__isl_take isl_printer *p,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tstruct gpu_types *types, struct gpu_prog *prog);\n\n__isl_give isl_printer *gpu_print_macros(__isl_take isl_printer *p,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t __isl_keep isl_ast_node *node);\n\n__isl_give isl_printer *gpu_array_info_print_size(__isl_take isl_printer *prn,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tstruct gpu_array_info *array);\n__isl_give isl_printer *gpu_array_info_print_declaration_argument(\n\t\t__isl_take isl_printer *p, struct gpu_array_info *array,\n\t\tconst char *memory_space);\n__isl_give isl_printer *gpu_array_info_print_call_argument(\n\t\t__isl_take isl_printer *p, struct gpu_array_info *array);\n\n__isl_give isl_printer *ppcg_kernel_print_copy(__isl_take isl_printer *p,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t struct ppcg_kernel_stmt *stmt);\n__isl_give isl_printer *ppcg_kernel_print_domain(__isl_take isl_printer *p,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t struct ppcg_kernel_stmt *stmt);\n\n#endif\n"
  },
  {
    "path": "src/ppcg_files/gpu_tree.c",
    "content": "/*\n * Copyright 2013      Ecole Normale Superieure\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege,\n * Ecole Normale Superieure, 45 rue d'Ulm, 75230 Paris, France\n */\n\n#include <string.h>\n\n#include <isl/space.h>\n#include <isl/set.h>\n#include <isl/union_set.h>\n\n#include \"gpu_tree.h\"\n\n/* The functions in this file are used to navigate part of a schedule tree\n * that is mapped to blocks.  Initially, this part consists of a linear\n * branch segment with a mark node with name \"kernel\" on the outer end\n * and a mark node with name \"thread\" on the inner end.\n * During the mapping to blocks, branching may be introduced, but only\n * one of the elements in each sequence contains the \"thread\" mark.\n * The filter of this element (and only this filter) contains\n * domain elements identified by the \"core\" argument of the functions\n * that move down this tree.\n *\n * Synchronization statements have a name that starts with \"sync\" and\n * a user pointer pointing to the kernel that contains the synchronization.\n * The functions inserting or detecting synchronizations take a ppcg_kernel\n * argument to be able to create or identify such statements.\n * They may also use two fields in this structure, the \"core\" field\n * to move around in the tree and the \"n_sync\" field to make sure that\n * each synchronization has a different name (within the kernel).\n */\n\n/* Is \"node\" a mark node with an identifier called \"name\"?\n */\nstatic int is_marked(__isl_keep isl_schedule_node *node, const char *name)\n{\n\tisl_id *mark;\n\tint has_name;\n\n\tif (!node)\n\t\treturn -1;\n\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_mark)\n\t\treturn 0;\n\n\tmark = isl_schedule_node_mark_get_id(node);\n\tif (!mark)\n\t\treturn -1;\n\n\thas_name = !strcmp(isl_id_get_name(mark), name);\n\tisl_id_free(mark);\n\n\treturn has_name;\n}\n\n/* Is \"node\" a mark node with an identifier called 
\"kernel\"?\n */\nint gpu_tree_node_is_kernel(__isl_keep isl_schedule_node *node)\n{\n\treturn is_marked(node, \"kernel\");\n}\n\n/* Is \"node\" a mark node with an identifier called \"shared\"?\n */\nstatic int node_is_shared(__isl_keep isl_schedule_node *node)\n{\n\treturn is_marked(node, \"shared\");\n}\n\n/* Is \"node\" a mark node with an identifier called \"thread\"?\n */\nstatic int node_is_thread(__isl_keep isl_schedule_node *node)\n{\n\treturn is_marked(node, \"thread\");\n}\n\n/* Insert a mark node with identifier \"shared\" in front of \"node\".\n */\nstatic __isl_give isl_schedule_node *insert_shared(\n\t__isl_take isl_schedule_node *node)\n{\n\tisl_ctx *ctx;\n\tisl_id *id;\n\n\tctx = isl_schedule_node_get_ctx(node);\n\tid = isl_id_alloc(ctx, \"shared\", NULL);\n\tnode = isl_schedule_node_insert_mark(node, id);\n\n\treturn node;\n}\n\n/* Insert a \"shared\" mark in front of the \"thread\" mark\n * provided the linear branch between \"node\" and the \"thread\" mark\n * does not contain such a \"shared\" mark already.\n *\n * As a side effect, this function checks that the subtree at \"node\"\n * actually contains a \"thread\" mark and that there is no branching\n * in between \"node\" and this \"thread\" mark.\n */\n__isl_give isl_schedule_node *gpu_tree_insert_shared_before_thread(\n\t__isl_take isl_schedule_node *node)\n{\n\tint depth0, depth;\n\tint any_shared = 0;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tdepth0 = isl_schedule_node_get_tree_depth(node);\n\n\tfor (;;) {\n\t\tint is_thread;\n\t\tint n;\n\n\t\tif (!any_shared) {\n\t\t\tany_shared = node_is_shared(node);\n\t\t\tif (any_shared < 0)\n\t\t\t\treturn isl_schedule_node_free(node);\n\t\t}\n\t\tis_thread = node_is_thread(node);\n\t\tif (is_thread < 0)\n\t\t\treturn isl_schedule_node_free(node);\n\t\tif (is_thread)\n\t\t\tbreak;\n\t\tn = isl_schedule_node_n_children(node);\n\t\tif (n == 0)\n\t\t\tisl_die(isl_schedule_node_get_ctx(node),\n\t\t\t\tisl_error_invalid,\n\t\t\t\t\"no thread marker 
found\",\n\t\t\t\treturn isl_schedule_node_free(node));\n\t\tif (n > 1)\n\t\t\tisl_die(isl_schedule_node_get_ctx(node),\n\t\t\t\tisl_error_invalid,\n\t\t\t\t\"expecting single thread marker\",\n\t\t\t\treturn isl_schedule_node_free(node));\n\n\t\tnode = isl_schedule_node_child(node, 0);\n\t}\n\n\tif (!any_shared)\n\t\tnode = insert_shared(node);\n\tdepth = isl_schedule_node_get_tree_depth(node);\n\tnode = isl_schedule_node_ancestor(node, depth - depth0);\n\n\treturn node;\n}\n\n/* Assuming \"node\" is a filter node, does it correspond to the branch\n * that contains the \"thread\" mark, i.e., does it contain any elements\n * in \"core\"?\n */\nstatic int node_is_core(__isl_keep isl_schedule_node *node,\n\t__isl_keep isl_union_set *core)\n{\n\tint disjoint;\n\tisl_union_set *filter;\n\n\tfilter = isl_schedule_node_filter_get_filter(node);\n\tdisjoint = isl_union_set_is_disjoint(filter, core);\n\tisl_union_set_free(filter);\n\tif (disjoint < 0)\n\t\treturn -1;\n\n\treturn !disjoint;\n}\n\n/* Move to the only child of \"node\" that has the \"thread\" mark as descendant,\n * where the branch containing this mark is identified by the domain elements\n * in \"core\".\n *\n * If \"node\" is not a sequence, then it only has one child and we move\n * to that single child.\n * Otherwise, we check each of the filters in the children, pick\n * the one that corresponds to \"core\" and return a pointer to the child\n * of the filter node.\n */\nstatic __isl_give isl_schedule_node *core_child(\n\t__isl_take isl_schedule_node *node, __isl_keep isl_union_set *core)\n{\n\tint i, n;\n\n\tif (isl_schedule_node_get_type(node) != isl_schedule_node_sequence)\n\t\treturn isl_schedule_node_child(node, 0);\n\n\tn = isl_schedule_node_n_children(node);\n\tfor (i = 0; i < n; ++i) {\n\t\tint is_core;\n\n\t\tnode = isl_schedule_node_child(node, i);\n\t\tis_core = node_is_core(node, core);\n\n\t\tif (is_core < 0)\n\t\t\treturn isl_schedule_node_free(node);\n\t\tif (is_core)\n\t\t\treturn 
isl_schedule_node_child(node, 0);\n\n\t\tnode = isl_schedule_node_parent(node);\n\t}\n\n\tisl_die(isl_schedule_node_get_ctx(node), isl_error_internal,\n\t\t\"core child not found\", return isl_schedule_node_free(node));\n}\n\n/* Move down the branch between \"kernel\" and \"thread\" until\n * the \"shared\" mark is reached, where the branch containing the \"shared\"\n * mark is identified by the domain elements in \"core\".\n */\n__isl_give isl_schedule_node *gpu_tree_move_down_to_shared(\n\t__isl_take isl_schedule_node *node, __isl_keep isl_union_set *core)\n{\n\tint is_shared;\n\n\twhile ((is_shared = node_is_shared(node)) == 0)\n\t\tnode = core_child(node, core);\n\tif (is_shared < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\treturn node;\n}\n\n/* Move down the branch between \"kernel\" and \"thread\" until\n * the \"thread\" mark is reached, where the branch containing the \"thread\"\n * mark is identified by the domain elements in \"core\".\n */\n__isl_give isl_schedule_node *gpu_tree_move_down_to_thread(\n\t__isl_take isl_schedule_node *node, __isl_keep isl_union_set *core)\n{\n\tint is_thread;\n\n\twhile ((is_thread = node_is_thread(node)) == 0)\n\t\tnode = core_child(node, core);\n\tif (is_thread < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\treturn node;\n}\n\n/* Move up the tree underneath the \"thread\" mark until\n * the \"thread\" mark is reached.\n */\n__isl_give isl_schedule_node *gpu_tree_move_up_to_thread(\n\t__isl_take isl_schedule_node *node)\n{\n\tint is_thread;\n\n\twhile ((is_thread = node_is_thread(node)) == 0)\n\t\tnode = isl_schedule_node_parent(node);\n\tif (is_thread < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\treturn node;\n}\n\n/* Move up the tree underneath the \"kernel\" mark until\n * the \"kernel\" mark is reached.\n */\n__isl_give isl_schedule_node *gpu_tree_move_up_to_kernel(\n\t__isl_take isl_schedule_node *node)\n{\n\tint is_kernel;\n\n\twhile ((is_kernel = gpu_tree_node_is_kernel(node)) == 0)\n\t\tnode = 
isl_schedule_node_parent(node);\n\tif (is_kernel < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\treturn node;\n}\n\n/* Move down from the \"kernel\" mark (or at least a node with schedule\n * depth smaller than or equal to \"depth\") to a band node at schedule\n * depth \"depth\".  The \"thread\" mark is assumed to have a schedule\n * depth greater than or equal to \"depth\".  The branch containing the\n * \"thread\" mark is identified by the domain elements in \"core\".\n *\n * If the desired schedule depth is in the middle of a band node,\n * then the band node is split into two pieces, the second piece\n * at the desired schedule depth.\n */\n__isl_give isl_schedule_node *gpu_tree_move_down_to_depth(\n\t__isl_take isl_schedule_node *node, int depth,\n\t__isl_keep isl_union_set *core)\n{\n\tint is_shared;\n\tint is_thread = 0;\n\n\twhile (node && isl_schedule_node_get_schedule_depth(node) < depth) {\n\t\tif (isl_schedule_node_get_type(node) ==\n\t\t\t\t\t\t    isl_schedule_node_band) {\n\t\t\tint node_depth, node_dim;\n\t\t\tnode_depth = isl_schedule_node_get_schedule_depth(node);\n\t\t\tnode_dim = isl_schedule_node_band_n_member(node);\n\t\t\tif (node_depth + node_dim > depth)\n\t\t\t\tnode = isl_schedule_node_band_split(node,\n\t\t\t\t\t\t\tdepth - node_depth);\n\t\t}\n\t\tnode = core_child(node, core);\n\t}\n\twhile ((is_shared = node_is_shared(node)) == 0 &&\n\t    (is_thread = node_is_thread(node)) == 0 &&\n\t    isl_schedule_node_get_type(node) != isl_schedule_node_band)\n\t\tnode = core_child(node, core);\n\tif (is_shared < 0 || is_thread < 0)\n\t\tnode = isl_schedule_node_free(node);\n\n\treturn node;\n}\n\n/* Create a union set containing a single set with a tuple identifier\n * called \"syncX\" and user pointer equal to \"kernel\".\n */\nstatic __isl_give isl_union_set *create_sync_domain(struct ppcg_kernel *kernel)\n{\n\tisl_space *space;\n\tisl_id *id;\n\tchar name[40];\n\n\tspace = isl_space_set_alloc(kernel->ctx, 0, 0);\n\tsnprintf(name, 
sizeof(name), \"sync%d\", kernel->n_sync++);\n\tid = isl_id_alloc(kernel->ctx, name, kernel);\n\tspace = isl_space_set_tuple_id(space, isl_dim_set, id);\n\treturn isl_union_set_from_set(isl_set_universe(space));\n}\n\n/* Is \"id\" the identifier of a synchronization statement inside \"kernel\"?\n * That is, does its name start with \"sync\" and does it point to \"kernel\"?\n */\nint gpu_tree_id_is_sync(__isl_keep isl_id *id, struct ppcg_kernel *kernel)\n{\n\tconst char *name;\n\n\tname = isl_id_get_name(id);\n\tif (!name)\n\t\treturn 0;\n\telse if (strncmp(name, \"sync\", 4))\n\t\treturn 0;\n\treturn isl_id_get_user(id) == kernel;\n}\n\n/* Does \"domain\" consist of a single set with a tuple identifier\n * corresponding to a synchronization for \"kernel\"?\n */\nstatic int domain_is_sync(__isl_keep isl_union_set *domain,\n\tstruct ppcg_kernel *kernel)\n{\n\tint is_sync;\n\tisl_id *id;\n\tisl_set *set;\n\n\tif (isl_union_set_n_set(domain) != 1)\n\t\treturn 0;\n\tset = isl_set_from_union_set(isl_union_set_copy(domain));\n\tid = isl_set_get_tuple_id(set);\n\tis_sync = gpu_tree_id_is_sync(id, kernel);\n\tisl_id_free(id);\n\tisl_set_free(set);\n\n\treturn is_sync;\n}\n\n/* Does \"node\" point to a filter selecting a synchronization statement\n * for \"kernel\"?\n */\nstatic int node_is_sync_filter(__isl_keep isl_schedule_node *node,\n\tstruct ppcg_kernel *kernel)\n{\n\tint is_sync;\n\tenum isl_schedule_node_type type;\n\tisl_union_set *domain;\n\n\tif (!node)\n\t\treturn -1;\n\ttype = isl_schedule_node_get_type(node);\n\tif (type != isl_schedule_node_filter)\n\t\treturn 0;\n\tdomain = isl_schedule_node_filter_get_filter(node);\n\tis_sync = domain_is_sync(domain, kernel);\n\tisl_union_set_free(domain);\n\n\treturn is_sync;\n}\n\n/* Is \"node\" part of a sequence with a previous synchronization statement\n * for \"kernel\"?\n * That is, is the parent of \"node\" a filter such that there is\n * a previous filter that picks out exactly such a synchronization statement?\n 
*/\nstatic int has_preceding_sync(__isl_keep isl_schedule_node *node,\n\tstruct ppcg_kernel *kernel)\n{\n\tint found = 0;\n\n\tnode = isl_schedule_node_copy(node);\n\tnode = isl_schedule_node_parent(node);\n\twhile (!found && isl_schedule_node_has_previous_sibling(node)) {\n\t\tnode = isl_schedule_node_previous_sibling(node);\n\t\tif (!node)\n\t\t\tbreak;\n\t\tfound = node_is_sync_filter(node, kernel);\n\t}\n\tif (!node)\n\t\tfound = -1;\n\tisl_schedule_node_free(node);\n\n\treturn found;\n}\n\n/* Is \"node\" part of a sequence with a subsequent synchronization statement\n * for \"kernel\"?\n * That is, is the parent of \"node\" a filter such that there is\n * a subsequent filter that picks out exactly such a synchronization statement?\n */\nstatic int has_following_sync(__isl_keep isl_schedule_node *node,\n\tstruct ppcg_kernel *kernel)\n{\n\tint found = 0;\n\n\tnode = isl_schedule_node_copy(node);\n\tnode = isl_schedule_node_parent(node);\n\twhile (!found && isl_schedule_node_has_next_sibling(node)) {\n\t\tnode = isl_schedule_node_next_sibling(node);\n\t\tif (!node)\n\t\t\tbreak;\n\t\tfound = node_is_sync_filter(node, kernel);\n\t}\n\tif (!node)\n\t\tfound = -1;\n\tisl_schedule_node_free(node);\n\n\treturn found;\n}\n\n/* Does the subtree rooted at \"node\" (which is a band node) contain\n * any synchronization statement for \"kernel\" that precedes\n * the core computation of \"kernel\" (identified by the elements\n * in kernel->core)?\n */\nstatic int has_sync_before_core(__isl_keep isl_schedule_node *node,\n\tstruct ppcg_kernel *kernel)\n{\n\tint has_sync = 0;\n\tint is_thread;\n\n\tnode = isl_schedule_node_copy(node);\n\twhile ((is_thread = node_is_thread(node)) == 0) {\n\t\tnode = core_child(node, kernel->core);\n\t\thas_sync = has_preceding_sync(node, kernel);\n\t\tif (has_sync < 0 || has_sync)\n\t\t\tbreak;\n\t}\n\tif (is_thread < 0 || !node)\n\t\thas_sync = -1;\n\tisl_schedule_node_free(node);\n\n\treturn has_sync;\n}\n\n/* Does the subtree rooted at 
\"node\" (which is a band node) contain\n * any synchronization statement for \"kernel\" that follows\n * the core computation of \"kernel\" (identified by the elements\n * in kernel->core)?\n */\nstatic int has_sync_after_core(__isl_keep isl_schedule_node *node,\n\tstruct ppcg_kernel *kernel)\n{\n\tint has_sync = 0;\n\tint is_thread;\n\n\tnode = isl_schedule_node_copy(node);\n\twhile ((is_thread = node_is_thread(node)) == 0) {\n\t\tnode = core_child(node, kernel->core);\n\t\thas_sync = has_following_sync(node, kernel);\n\t\tif (has_sync < 0 || has_sync)\n\t\t\tbreak;\n\t}\n\tif (is_thread < 0 || !node)\n\t\thas_sync = -1;\n\tisl_schedule_node_free(node);\n\n\treturn has_sync;\n}\n\n/* Insert (or extend) an extension on top of \"node\" that puts\n * a synchronization node for \"kernel\" before \"node\".\n * Return a pointer to the original node in the updated schedule tree.\n */\nstatic __isl_give isl_schedule_node *insert_sync_before(\n\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel)\n{\n\tisl_union_set *domain;\n\tisl_schedule_node *graft;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tdomain = create_sync_domain(kernel);\n\tgraft = isl_schedule_node_from_domain(domain);\n\tnode = isl_schedule_node_graft_before(node, graft);\n\n\treturn node;\n}\n\n/* Insert (or extend) an extension on top of \"node\" that puts\n * a synchronization node for \"kernel\" after \"node\".\n * Return a pointer to the original node in the updated schedule tree.\n */\nstatic __isl_give isl_schedule_node *insert_sync_after(\n\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel)\n{\n\tisl_union_set *domain;\n\tisl_schedule_node *graft;\n\n\tif (!node)\n\t\treturn NULL;\n\n\tdomain = create_sync_domain(kernel);\n\tgraft = isl_schedule_node_from_domain(domain);\n\tnode = isl_schedule_node_graft_after(node, graft);\n\n\treturn node;\n}\n\n/* Insert an extension on top of \"node\" that puts a synchronization node\n * for \"kernel\" before \"node\" unless there already 
is\n * such a synchronization node.\n */\n__isl_give isl_schedule_node *gpu_tree_ensure_preceding_sync(\n\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel)\n{\n\tint has_sync;\n\n\thas_sync = has_preceding_sync(node, kernel);\n\tif (has_sync < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (has_sync)\n\t\treturn node;\n\treturn insert_sync_before(node, kernel);\n}\n\n/* Insert an extension on top of \"node\" that puts a synchronization node\n * for \"kernel\" after \"node\" unless there already is\n * such a synchronization node.\n */\n__isl_give isl_schedule_node *gpu_tree_ensure_following_sync(\n\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel)\n{\n\tint has_sync;\n\n\thas_sync = has_following_sync(node, kernel);\n\tif (has_sync < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (has_sync)\n\t\treturn node;\n\treturn insert_sync_after(node, kernel);\n}\n\n/* Insert an extension on top of \"node\" that puts a synchronization node\n * for \"kernel\" after \"node\" unless there already is such a sync node or\n * \"node\" itself already contains a synchronization node following\n * the core computation of \"kernel\".\n */\n__isl_give isl_schedule_node *gpu_tree_ensure_sync_after_core(\n\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel)\n{\n\tint has_sync;\n\n\thas_sync = has_sync_after_core(node, kernel);\n\tif (has_sync < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (has_sync)\n\t\treturn node;\n\thas_sync = has_following_sync(node, kernel);\n\tif (has_sync < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (has_sync)\n\t\treturn node;\n\treturn insert_sync_after(node, kernel);\n}\n\n/* Move left in the sequence on top of \"node\" to a synchronization node\n * for \"kernel\".\n * If \"node\" itself contains a synchronization node preceding\n * the core computation of \"kernel\", then return \"node\" itself.\n * Otherwise, if \"node\" does not have a preceding synchronization node,\n * then create one 
first.\n */\n__isl_give isl_schedule_node *gpu_tree_move_left_to_sync(\n\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel)\n{\n\tint has_sync;\n\tint is_sync;\n\n\thas_sync = has_sync_before_core(node, kernel);\n\tif (has_sync < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (has_sync)\n\t\treturn node;\n\tnode = gpu_tree_ensure_preceding_sync(node, kernel);\n\tnode = isl_schedule_node_parent(node);\n\twhile ((is_sync = node_is_sync_filter(node, kernel)) == 0)\n\t\tnode = isl_schedule_node_previous_sibling(node);\n\tif (is_sync < 0)\n\t\tnode = isl_schedule_node_free(node);\n\tnode = isl_schedule_node_child(node, 0);\n\n\treturn node;\n}\n\n/* Move right in the sequence on top of \"node\" to a synchronization node\n * for \"kernel\".\n * If \"node\" itself contains a synchronization node following\n * the core computation of \"kernel\", then return \"node\" itself.\n * Otherwise, if \"node\" does not have a following synchronization node,\n * then create one first.\n */\n__isl_give isl_schedule_node *gpu_tree_move_right_to_sync(\n\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel)\n{\n\tint has_sync;\n\tint is_sync;\n\n\thas_sync = has_sync_after_core(node, kernel);\n\tif (has_sync < 0)\n\t\treturn isl_schedule_node_free(node);\n\tif (has_sync)\n\t\treturn node;\n\tnode = gpu_tree_ensure_following_sync(node, kernel);\n\tnode = isl_schedule_node_parent(node);\n\twhile ((is_sync = node_is_sync_filter(node, kernel)) == 0)\n\t\tnode = isl_schedule_node_next_sibling(node);\n\tif (is_sync < 0)\n\t\tnode = isl_schedule_node_free(node);\n\tnode = isl_schedule_node_child(node, 0);\n\n\treturn node;\n}\n"
  },
  {
    "path": "src/ppcg_files/gpu_tree.h",
    "content": "#ifndef GPU_TREE_H\n#define GPU_TREE_H\n\n#include <isl/schedule_node.h>\n\n#include \"gpu.h\"\n\n__isl_give isl_schedule_node *gpu_tree_insert_shared_before_thread(\n\t\t__isl_take isl_schedule_node *node);\nint gpu_tree_node_is_kernel(__isl_keep isl_schedule_node *node);\n__isl_give isl_schedule_node *gpu_tree_move_down_to_shared(\n\t\t__isl_take isl_schedule_node *node, __isl_keep isl_union_set *core);\n__isl_give isl_schedule_node *gpu_tree_move_up_to_thread(\n\t\t__isl_take isl_schedule_node *node);\n__isl_give isl_schedule_node *gpu_tree_move_down_to_thread(\n\t\t__isl_take isl_schedule_node *node, __isl_keep isl_union_set *core);\n__isl_give isl_schedule_node *gpu_tree_move_up_to_kernel(\n\t\t__isl_take isl_schedule_node *node);\n__isl_give isl_schedule_node *gpu_tree_move_down_to_depth(\n\t\t__isl_take isl_schedule_node *node, int depth,\n\t\t__isl_keep isl_union_set *core);\n\nint gpu_tree_id_is_sync(__isl_keep isl_id *id, struct ppcg_kernel *kernel);\n__isl_give isl_schedule_node *gpu_tree_ensure_sync_after_core(\n\t\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel);\n__isl_give isl_schedule_node *gpu_tree_ensure_following_sync(\n\t\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel);\n__isl_give isl_schedule_node *gpu_tree_move_left_to_sync(\n\t\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel);\n__isl_give isl_schedule_node *gpu_tree_move_right_to_sync(\n\t\t__isl_take isl_schedule_node *node, struct ppcg_kernel *kernel);\n\n#endif\n"
  },
  {
    "path": "src/ppcg_files/opencl.c",
    "content": "/*\n * Copyright 2013      Ecole Normale Superieure\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege and Riyadh Baghdadi,\n * Ecole Normale Superieure, 45 rue d’Ulm, 75230 Paris, France\n */\n\n#include <ctype.h>\n#include <limits.h>\n#include <string.h>\n\n#include <isl/aff.h>\n#include <isl/ast.h>\n\n#include \"opencl.h\"\n#include \"gpu_print.h\"\n#include \"gpu.h\"\n#include \"ppcg.h\"\n#include \"print.h\"\n#include \"schedule.h\"\n#include \"util.h\"\n\n#define min(a, b)  (((a) < (b)) ? (a) : (b))\n#define max(a, b)  (((a) > (b)) ? (a) : (b))\n\n/* options are the global options passed to generate_opencl.\n * input is the name of the input file.\n * output is the user-specified output file name and may be NULL\n *\tif not specified by the user.\n * kernel_c_name is the name of the kernel_c file.\n * kprinter is an isl_printer for the kernel file.\n * host_c is the generated source file for the host code.  kernel_c is\n * the generated source file for the kernel.\n */\nstruct opencl_info {\n\tstruct ppcg_options *options;\n\tconst char *input;\n\tconst char *output;\n\tchar kernel_c_name[PATH_MAX];\n\n\tisl_printer *kprinter;\n\n\tFILE *host_c;\n\tFILE *kernel_c;\n};\n\n/* Open the file called \"name\" for writing or print an error message.\n */\nstatic FILE *open_or_croak(const char *name)\n{\n\tFILE *file;\n\n\tfile = fopen(name, \"w\");\n\tif (!file)\n\t\tfprintf(stderr, \"Failed to open \\\"%s\\\" for writing\\n\", name);\n\treturn file;\n}\n\n/* Open the host .c file and the kernel .h and .cl files for writing.\n * Their names are derived from info->output (or info->input if\n * the user did not specify an output file name).\n * Add the necessary includes to these files, including those specified\n * by the user.\n *\n * Return 0 on success and -1 on failure.\n */\nstatic int opencl_open_files(struct opencl_info *info)\n{\n\tchar name[PATH_MAX];\n\tint i;\n\tint len;\n\n\tif (info->output) 
{\n\t\tconst char *ext;\n\n\t\text = strrchr(info->output, '.');\n\t\tlen = ext ? ext - info->output : strlen(info->output);\n\t\tmemcpy(name, info->output, len);\n\n\t\tinfo->host_c = open_or_croak(info->output);\n\t} else {\n\t\tlen = ppcg_extract_base_name(name, info->input);\n\n\t\tstrcpy(name + len, \"_host.c\");\n\t\tinfo->host_c = open_or_croak(name);\n\t}\n\n\tmemcpy(info->kernel_c_name, name, len);\n\tstrcpy(info->kernel_c_name + len, \"_kernel.cl\");\n\tinfo->kernel_c = open_or_croak(info->kernel_c_name);\n\n\tif (!info->host_c || !info->kernel_c)\n\t\treturn -1;\n\n\tfprintf(info->host_c, \"#include <assert.h>\\n\");\n\tfprintf(info->host_c, \"#include <stdio.h>\\n\");\n\tfprintf(info->host_c, \"#include \\\"ocl_utilities.h\\\"\\n\");\n\tif (info->options->opencl_embed_kernel_code) {\n\t\tfprintf(info->host_c, \"#include \\\"%s\\\"\\n\\n\",\n\t\t\tinfo->kernel_c_name);\n\t}\n\n\tfor (i = 0; i < info->options->opencl_n_include_file; ++i) {\n\t\tinfo->kprinter = isl_printer_print_str(info->kprinter,\n\t\t\t\t\t\"#include <\");\n\t\tinfo->kprinter = isl_printer_print_str(info->kprinter,\n\t\t\t\t\tinfo->options->opencl_include_files[i]);\n\t\tinfo->kprinter = isl_printer_print_str(info->kprinter, \">\\n\");\n\t}\n\n\treturn 0;\n}\n\n/* Write text to a file and escape some special characters that would break a\n * C string.\n */\nstatic void opencl_print_escaped(const char *str, const char *end, FILE *file)\n{\n\tconst char *prev = str;\n\n\twhile ((str = strpbrk(prev, \"\\\"\\\\\")) && str < end) {\n\t\tfwrite(prev, 1, str - prev, file);\n\t\tfprintf(file, \"\\\\%c\", *str);\n\t\tprev = str + 1;\n\t}\n\n\tif (*prev)\n\t\tfwrite(prev, 1, end - prev, file);\n}\n\n/* Write text to a file as a C string literal.\n *\n * This function also prints any characters after the last newline, although\n * normally the input string should end with a newline.\n */\nstatic void opencl_print_as_c_string(const char *str, FILE *file)\n{\n\tconst char *prev = str;\n\n\twhile 
((str = strchr(prev, '\\n'))) {\n\t\tfprintf(file, \"\\n\\\"\");\n\t\topencl_print_escaped(prev, str, file);\n\t\tfprintf(file, \"\\\\n\\\"\");\n\n\t\tprev = str + 1;\n\t}\n\n\tif (*prev) {\n\t\tfprintf(file, \"\\n\\\"\");\n\t\topencl_print_escaped(prev, prev + strlen(prev), file);\n\t\tfprintf(file, \"\\\"\");\n\t}\n}\n\n/* Write the code that we have accumulated in the kernel isl_printer to the\n * kernel.cl file.  If the opencl_embed_kernel_code option has been set, print\n * the code as a C string literal.  Start that string literal with an empty\n * line, such that line numbers reported by the OpenCL C compiler match those\n * of the kernel file.\n *\n * Return 0 on success and -1 on failure.\n */\nstatic int opencl_write_kernel_file(struct opencl_info *opencl)\n{\n\tchar *raw = isl_printer_get_str(opencl->kprinter);\n\n\tif (!raw)\n\t\treturn -1;\n\n\tif (opencl->options->opencl_embed_kernel_code) {\n\t\tfprintf(opencl->kernel_c,\n\t\t\t\"static const char kernel_code[] = \\\"\\\\n\\\"\");\n\t\topencl_print_as_c_string(raw, opencl->kernel_c);\n\t\tfprintf(opencl->kernel_c, \";\\n\");\n\t} else\n\t\tfprintf(opencl->kernel_c, \"%s\", raw);\n\n\tfree(raw);\n\n\treturn 0;\n}\n\n/* Close all output files.  
Write the kernel contents to the kernel file before\n * closing it.\n *\n * Return 0 on success and -1 on failure.\n */\nstatic int opencl_close_files(struct opencl_info *info)\n{\n\tint r = 0;\n\n\tif (info->kernel_c) {\n\t\tr = opencl_write_kernel_file(info);\n\t\tfclose(info->kernel_c);\n\t}\n\tif (info->host_c)\n\t\tfclose(info->host_c);\n\n\treturn r;\n}\n\nstatic __isl_give isl_printer *opencl_print_host_macros(\n\t__isl_take isl_printer *p)\n{\n\tconst char *macros =\n\t\t\"#define openclCheckReturn(ret) \\\\\\n\"\n\t\t\"  if (ret != CL_SUCCESS) {\\\\\\n\"\n\t\t\"    fprintf(stderr, \\\"OpenCL error: %s\\\\n\\\", \"\n\t\t\"opencl_error_string(ret)); \\\\\\n\"\n\t\t\"    fflush(stderr); \\\\\\n\"\n\t\t\"    assert(ret == CL_SUCCESS);\\\\\\n  }\\n\";\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, macros);\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\nstatic __isl_give isl_printer *opencl_declare_device_arrays(\n\t__isl_take isl_printer *p, struct gpu_prog *prog)\n{\n\tint i;\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tif (!gpu_array_requires_device_allocation(&prog->array[i]))\n\t\t\tcontinue;\n\t\tp = isl_printer_start_line(p);\n\t\tp = isl_printer_print_str(p, \"cl_mem dev_\");\n\t\tp = isl_printer_print_str(p, prog->array[i].name);\n\t\tp = isl_printer_print_str(p, \";\");\n\t\tp = isl_printer_end_line(p);\n\t}\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_end_line(p);\n\treturn p;\n}\n\n/* Given an array, check whether its positive size guard expression is\n * trivial.\n */\nstatic int is_array_positive_size_guard_trivial(struct gpu_array_info *array)\n{\n\tisl_set *guard;\n\tint is_trivial;\n\n\tguard = gpu_array_positive_size_guard(array);\n\tis_trivial = isl_set_plain_is_universe(guard);\n\tisl_set_free(guard);\n\treturn is_trivial;\n}\n\n/* Allocate a device array for \"array\".\n *\n * Emit a max-expression to ensure the device array can contain at least one\n * element if the array's positive size guard 
expression is not trivial.\n */\nstatic __isl_give isl_printer *allocate_device_array(__isl_take isl_printer *p,\n\tstruct gpu_array_info *array)\n{\n\tint need_lower_bound;\n\n\tneed_lower_bound = !is_array_positive_size_guard_trivial(array);\n\tif (need_lower_bound)\n\t\tp = ppcg_print_macro(isl_ast_op_max, p);\n\n\tp = ppcg_ast_expr_print_macros(array->bound_expr, p);\n\tp = ppcg_start_block(p);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"dev_\");\n\tp = isl_printer_print_str(p, array->name);\n\tp = isl_printer_print_str(p, \" = clCreateBuffer(context, \");\n\tp = isl_printer_print_str(p, \"CL_MEM_READ_WRITE, \");\n\n\tif (need_lower_bound) {\n\t\tp = isl_printer_print_str(p, ppcg_max);\n\t\tp = isl_printer_print_str(p, \"(sizeof(\");\n\t\tp = isl_printer_print_str(p, array->type);\n\t\tp = isl_printer_print_str(p, \"), \");\n\t}\n\tp = gpu_array_info_print_size(p, array);\n\tif (need_lower_bound)\n\t\tp = isl_printer_print_str(p, \")\");\n\n\tp = isl_printer_print_str(p, \", NULL, &err);\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"openclCheckReturn(err);\");\n\tp = isl_printer_end_line(p);\n\n\tp = ppcg_end_block(p);\n\n\treturn p;\n}\n\n/* Allocate accessed device arrays.\n */\nstatic __isl_give isl_printer *opencl_allocate_device_arrays(\n\t__isl_take isl_printer *p, struct gpu_prog *prog)\n{\n\tint i;\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tstruct gpu_array_info *array = &prog->array[i];\n\n\t\tif (!gpu_array_requires_device_allocation(array))\n\t\t\tcontinue;\n\n\t\tp = allocate_device_array(p, array);\n\t}\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_end_line(p);\n\treturn p;\n}\n\n/* Free the device array corresponding to \"array\"\n */\nstatic __isl_give isl_printer *release_device_array(__isl_take isl_printer *p,\n\tstruct gpu_array_info *array)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, 
\"openclCheckReturn(\"\n\t\t\t\t\t\"clReleaseMemObject(dev_\");\n\tp = isl_printer_print_str(p, array->name);\n\tp = isl_printer_print_str(p, \"));\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n/* Free the accessed device arrays.\n */\nstatic __isl_give isl_printer *opencl_release_device_arrays(\n\t__isl_take isl_printer *p, struct gpu_prog *prog)\n{\n\tint i;\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tstruct gpu_array_info *array = &prog->array[i];\n\t\tif (!gpu_array_requires_device_allocation(array))\n\t\t\tcontinue;\n\n\t\tp = release_device_array(p, array);\n\t}\n\treturn p;\n}\n\n/* Create an OpenCL device, context, command queue and build the kernel.\n * input is the name of the input file provided to ppcg.\n */\nstatic __isl_give isl_printer *opencl_setup(__isl_take isl_printer *p,\n\tconst char *input, struct opencl_info *info)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"cl_device_id device;\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"cl_context context;\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"cl_program program;\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"cl_command_queue queue;\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"cl_int err;\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"device = opencl_create_device(\");\n\tp = isl_printer_print_int(p, info->options->opencl_use_gpu);\n\tp = isl_printer_print_str(p, \");\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"context = clCreateContext(NULL, 1, \"\n\t\t\"&device, NULL, NULL, &err);\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, 
\"openclCheckReturn(err);\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"queue = clCreateCommandQueue\"\n\t\t\t\t\t\"(context, device, 0, &err);\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"openclCheckReturn(err);\");\n\tp = isl_printer_end_line(p);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"program = \");\n\n\tif (info->options->opencl_embed_kernel_code) {\n\t\tp = isl_printer_print_str(p, \"opencl_build_program_from_string(\"\n\t\t\t\t\t\t\"context, device, kernel_code, \"\n\t\t\t\t\t\t\"sizeof(kernel_code), \\\"\");\n\t} else {\n\t\tp = isl_printer_print_str(p, \"opencl_build_program_from_file(\"\n\t\t\t\t\t\t\"context, device, \\\"\");\n\t\tp = isl_printer_print_str(p, info->kernel_c_name);\n\t\tp = isl_printer_print_str(p, \"\\\", \\\"\");\n\t}\n\n\tif (info->options->opencl_compiler_options)\n\t\tp = isl_printer_print_str(p,\n\t\t\t\t\tinfo->options->opencl_compiler_options);\n\n\tp = isl_printer_print_str(p, \"\\\");\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\nstatic __isl_give isl_printer *opencl_release_cl_objects(\n\t__isl_take isl_printer *p, struct opencl_info *info)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"openclCheckReturn(clReleaseCommandQueue\"\n\t\t\t\t\t\"(queue));\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"openclCheckReturn(clReleaseProgram\"\n\t\t\t\t\t\"(program));\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"openclCheckReturn(clReleaseContext\"\n\t\t\t\t\t\"(context));\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n/* Print a call to the OpenCL clSetKernelArg() function which sets\n * the arguments of the kernel.  
arg_name and arg_index are the name and the\n * index of the kernel argument.  The index of the leftmost argument of\n * the kernel is 0 whereas the index of the rightmost argument of the kernel\n * is n - 1, where n is the total number of the kernel arguments.\n * read_only_scalar is a boolean that indicates whether the argument is a read\n * only scalar.\n */\nstatic __isl_give isl_printer *opencl_set_kernel_argument(\n\t__isl_take isl_printer *p, int kernel_id,\n\tconst char *arg_name, int arg_index, int read_only_scalar)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p,\n\t\t\"openclCheckReturn(clSetKernelArg(kernel\");\n\tp = isl_printer_print_int(p, kernel_id);\n\tp = isl_printer_print_str(p, \", \");\n\tp = isl_printer_print_int(p, arg_index);\n\tp = isl_printer_print_str(p, \", sizeof(\");\n\n\tif (read_only_scalar) {\n\t\tp = isl_printer_print_str(p, arg_name);\n\t\tp = isl_printer_print_str(p, \"), &\");\n\t} else\n\t\tp = isl_printer_print_str(p, \"cl_mem), (void *) &dev_\");\n\n\tp = isl_printer_print_str(p, arg_name);\n\tp = isl_printer_print_str(p, \"));\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n/* Print the block sizes as a list of the sizes in each\n * dimension.\n */\nstatic __isl_give isl_printer *opencl_print_block_sizes(\n\t__isl_take isl_printer *p, struct ppcg_kernel *kernel)\n{\n\tint i;\n\n\tif (kernel->n_block > 0)\n\t\tfor (i = 0; i < kernel->n_block; ++i) {\n\t\t\tif (i)\n\t\t\t\tp = isl_printer_print_str(p, \", \");\n\t\t\tp = isl_printer_print_int(p, kernel->block_dim[i]);\n\t\t}\n\telse\n\t\tp = isl_printer_print_str(p, \"1\");\n\n\treturn p;\n}\n\n/* Set the arguments of the OpenCL kernel by printing a call to the OpenCL\n * clSetKernelArg() function for each kernel argument.\n */\nstatic __isl_give isl_printer *opencl_set_kernel_arguments(\n\t__isl_take isl_printer *p, struct gpu_prog *prog,\n\tstruct ppcg_kernel *kernel)\n{\n\tint i, n, ro;\n\tunsigned nparam;\n\tisl_space *space;\n\tint arg_index = 
0;\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tint required;\n\n\t\trequired = ppcg_kernel_requires_array_argument(kernel, i);\n\t\tif (required < 0)\n\t\t\treturn isl_printer_free(p);\n\t\tif (!required)\n\t\t\tcontinue;\n\t\tro = gpu_array_is_read_only_scalar(&prog->array[i]);\n\t\tp = opencl_set_kernel_argument(p, kernel->id, prog->array[i].name,\n\t\t\targ_index, ro);\n\t\targ_index++;\n\t}\n\n\tspace = isl_union_set_get_space(kernel->arrays);\n\tnparam = isl_space_dim(space, isl_dim_param);\n\tfor (i = 0; i < nparam; ++i) {\n\t\tconst char *name;\n\n\t\tname = isl_space_get_dim_name(space, isl_dim_param, i);\n\t\tp = opencl_set_kernel_argument(p, kernel->id, name, arg_index, 1);\n\t\targ_index++;\n\t}\n\tisl_space_free(space);\n\n\tn = isl_space_dim(kernel->space, isl_dim_set);\n\tfor (i = 0; i < n; ++i) {\n\t\tconst char *name;\n\n\t\tname = isl_space_get_dim_name(kernel->space, isl_dim_set, i);\n\t\tp = opencl_set_kernel_argument(p, kernel->id, name, arg_index, 1);\n\t\targ_index++;\n\t}\n\n\treturn p;\n}\n\n/* Print the arguments to a kernel declaration or call.  
If \"types\" is set,\n * then print a declaration (including the types of the arguments).\n *\n * The arguments are printed in the following order\n * - the arrays accessed by the kernel\n * - the parameters\n * - the host loop iterators\n */\nstatic __isl_give isl_printer *opencl_print_kernel_arguments(\n\t__isl_take isl_printer *p, struct gpu_prog *prog,\n\tstruct ppcg_kernel *kernel, int types)\n{\n\tint i, n;\n\tint first = 1;\n\tunsigned nparam;\n\tisl_space *space;\n\tconst char *type;\n\n\tfor (i = 0; i < prog->n_array; ++i) {\n\t\tint required;\n\n\t\trequired = ppcg_kernel_requires_array_argument(kernel, i);\n\t\tif (required < 0)\n\t\t\treturn isl_printer_free(p);\n\t\tif (!required)\n\t\t\tcontinue;\n\n\t\tif (!first)\n\t\t\tp = isl_printer_print_str(p, \", \");\n\n\t\tif (types)\n\t\t\tp = gpu_array_info_print_declaration_argument(p,\n\t\t\t\t&prog->array[i], \"__global\");\n\t\telse\n\t\t\tp = gpu_array_info_print_call_argument(p,\n\t\t\t\t&prog->array[i]);\n\n\t\tfirst = 0;\n\t}\n\n\tspace = isl_union_set_get_space(kernel->arrays);\n\tnparam = isl_space_dim(space, isl_dim_param);\n\tfor (i = 0; i < nparam; ++i) {\n\t\tconst char *name;\n\n\t\tname = isl_space_get_dim_name(space, isl_dim_param, i);\n\n\t\tif (!first)\n\t\t\tp = isl_printer_print_str(p, \", \");\n\t\tif (types)\n\t\t\tp = isl_printer_print_str(p, \"int \");\n\t\tp = isl_printer_print_str(p, name);\n\n\t\tfirst = 0;\n\t}\n\tisl_space_free(space);\n\n\tn = isl_space_dim(kernel->space, isl_dim_set);\n\ttype = isl_options_get_ast_iterator_type(prog->ctx);\n\tfor (i = 0; i < n; ++i) {\n\t\tconst char *name;\n\n\t\tif (!first)\n\t\t\tp = isl_printer_print_str(p, \", \");\n\t\tname = isl_space_get_dim_name(kernel->space, isl_dim_set, i);\n\t\tif (types) {\n\t\t\tp = isl_printer_print_str(p, type);\n\t\t\tp = isl_printer_print_str(p, \" \");\n\t\t}\n\t\tp = isl_printer_print_str(p, name);\n\n\t\tfirst = 0;\n\t}\n\n\treturn p;\n}\n\n/* Print the header of the given kernel.\n */\nstatic 
__isl_give isl_printer *opencl_print_kernel_header(\n\t__isl_take isl_printer *p, struct gpu_prog *prog,\n\tstruct ppcg_kernel *kernel)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"__kernel void kernel\");\n\tp = isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p, \"(\");\n\tp = opencl_print_kernel_arguments(p, prog, kernel, 1);\n\tp = isl_printer_print_str(p, \")\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n/* Print a list of iterators of type \"type\" with names \"ids\" to \"p\".\n * Each iterator is assigned the corresponding opencl identifier returned\n * by the function \"opencl_id\".\n * Unlike the equivalent function in the CUDA backend which prints iterators\n * in reverse order to promote coalescing, this function does not print\n * iterators in reverse order.  The OpenCL backend currently does not take\n * into account any coalescing considerations.\n */\nstatic __isl_give isl_printer *print_iterators(__isl_take isl_printer *p,\n\tconst char *type, __isl_keep isl_id_list *ids, const char *opencl_id)\n{\n\tint i, n;\n\n\tn = isl_id_list_n_id(ids);\n\tif (n <= 0)\n\t\treturn p;\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, type);\n\tp = isl_printer_print_str(p, \" \");\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_id *id;\n\n\t\tif (i)\n\t\t\tp = isl_printer_print_str(p, \", \");\n\t\tid = isl_id_list_get_id(ids, i);\n\t\tp = isl_printer_print_id(p, id);\n\t\tisl_id_free(id);\n\t\tp = isl_printer_print_str(p, \" = \");\n\t\tp = isl_printer_print_str(p, opencl_id);\n\t\tp = isl_printer_print_str(p, \"(\");\n\t\tp = isl_printer_print_int(p, i);\n\t\tp = isl_printer_print_str(p, \")\");\n\t}\n\tp = isl_printer_print_str(p, \";\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\nstatic __isl_give isl_printer *opencl_print_kernel_iterators(\n\t__isl_take isl_printer *p, struct ppcg_kernel *kernel)\n{\n\tisl_ctx *ctx = isl_ast_node_get_ctx(kernel->tree);\n\tconst char *type;\n\n\ttype = 
isl_options_get_ast_iterator_type(ctx);\n\n\tp = print_iterators(p, type, kernel->block_ids, \"get_group_id\");\n\tp = print_iterators(p, type, kernel->thread_ids, \"get_local_id\");\n\n\treturn p;\n}\n\nstatic __isl_give isl_printer *opencl_print_kernel_var(\n\t__isl_take isl_printer *p, struct ppcg_kernel_var *var)\n{\n\tint j;\n\tisl_val *v;\n\n\tp = isl_printer_start_line(p);\n\tif (var->type == ppcg_access_shared)\n\t\tp = isl_printer_print_str(p, \"__local \");\n\tp = isl_printer_print_str(p, var->array->type);\n\tp = isl_printer_print_str(p, \" \");\n\tp = isl_printer_print_str(p, var->name);\n\tfor (j = 0; j < var->array->n_index; ++j) {\n\t\tp = isl_printer_print_str(p, \"[\");\n\t\tv = isl_vec_get_element_val(var->size, j);\n\t\tp = isl_printer_print_val(p, v);\n\t\tp = isl_printer_print_str(p, \"]\");\n\t\tisl_val_free(v);\n\t}\n\tp = isl_printer_print_str(p, \";\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\nstatic __isl_give isl_printer *opencl_print_kernel_vars(\n\t\t__isl_take isl_printer *p, struct ppcg_kernel *kernel)\n{\n\tint i;\n\n\tfor (i = 0; i < kernel->n_var; ++i)\n\t\tp = opencl_print_kernel_var(p, &kernel->var[i]);\n\n\treturn p;\n}\n\n/* Print a call to barrier() which is a sync statement.\n * All work-items in a work-group executing the kernel on a processor must\n * execute the barrier() function before any are allowed to continue execution\n * beyond the barrier.\n * The flag CLK_LOCAL_MEM_FENCE makes the barrier function either flush any\n * variables stored in local memory or queue a memory fence to ensure correct\n * ordering of memory operations to local memory.\n * The flag CLK_GLOBAL_MEM_FENCE makes the barrier function queue a memory\n * fence to ensure correct ordering of memory operations to global memory.\n */\nstatic __isl_give isl_printer *opencl_print_sync(__isl_take isl_printer *p,\n\tstruct ppcg_kernel_stmt *stmt)\n{\n\tp = isl_printer_start_line(p);\n\tp = 
isl_printer_print_str(p,\n\t\t\"barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n/* Data structure containing function names for which the calls\n * should be changed from\n *\n *\tname(arg)\n *\n * to\n *\n *\topencl_name((type) (arg))\n */\nstatic struct ppcg_opencl_fn {\n\tconst char *name;\n\tconst char *opencl_name;\n\tconst char *type;\n} opencl_fn[] = {\n\t{ \"expf\",\t\"exp\",\t\t\"float\" },\n\t{ \"powf\",\t\"pow\",\t\t\"float\" },\n\t{ \"sqrtf\",\t\"sqrt\",\t\t\"float\" },\n};\n\n#define ARRAY_SIZE(array) (sizeof(array)/sizeof(*array))\n\n/* If the name of the function called by \"expr\" matches one of those\n * in ppcg_opencl_fn, then replace the call by a cast to the corresponding\n * type in ppcg_opencl_fn and a call to the corresponding OpenCL function.\n */\nstatic __isl_give pet_expr *map_opencl_call(__isl_take pet_expr *expr,\n\tvoid *user)\n{\n\tconst char *name;\n\tint i;\n\n\tname = pet_expr_call_get_name(expr);\n\tfor (i = 0; i < ARRAY_SIZE(opencl_fn); ++i) {\n\t\tpet_expr *arg;\n\n\t\tif (strcmp(name, opencl_fn[i].name))\n\t\t\tcontinue;\n\t\texpr = pet_expr_call_set_name(expr, opencl_fn[i].opencl_name);\n\t\targ = pet_expr_get_arg(expr, 0);\n\t\targ = pet_expr_new_cast(opencl_fn[i].type, arg);\n\t\texpr = pet_expr_set_arg(expr, 0, arg);\n\t\t/* \"name\" may have been freed by the rename; stop at the first match. */\n\t\tbreak;\n\t}\n\treturn expr;\n}\n\n/* Print the body of a statement from the input program,\n * for use in OpenCL code.\n *\n * Before calling ppcg_kernel_print_domain to print the actual statement body,\n * we first modify this body to take into account that the output code\n * is OpenCL code.  
In particular, if the statement calls any of the functions in opencl_fn\n * (all of which have an \"f\" suffix), then each such call is replaced by a\n * call to the corresponding function without the suffix, after casting the\n * argument to the type listed in opencl_fn (float).\n */\nstatic __isl_give isl_printer *print_opencl_kernel_domain(\n\t__isl_take isl_printer *p, struct ppcg_kernel_stmt *stmt)\n{\n\tstruct pet_stmt *ps;\n\tpet_tree *tree;\n\n\tps = stmt->u.d.stmt->stmt;\n\ttree = pet_tree_copy(ps->body);\n\tps->body = pet_tree_map_call_expr(ps->body, &map_opencl_call, NULL);\n\tp = ppcg_kernel_print_domain(p, stmt);\n\tpet_tree_free(ps->body);\n\tps->body = tree;\n\n\treturn p;\n}\n\n/* This function is called for each user statement in the AST,\n * i.e., for each kernel body statement, copy statement or sync statement.\n */\nstatic __isl_give isl_printer *opencl_print_kernel_stmt(\n\t__isl_take isl_printer *p,\n\t__isl_take isl_ast_print_options *print_options,\n\t__isl_keep isl_ast_node *node, void *user)\n{\n\tisl_id *id;\n\tstruct ppcg_kernel_stmt *stmt;\n\n\tid = isl_ast_node_get_annotation(node);\n\tstmt = isl_id_get_user(id);\n\tisl_id_free(id);\n\n\tisl_ast_print_options_free(print_options);\n\n\tswitch (stmt->type) {\n\tcase ppcg_kernel_copy:\n\t\treturn ppcg_kernel_print_copy(p, stmt);\n\tcase ppcg_kernel_sync:\n\t\treturn opencl_print_sync(p, stmt);\n\tcase ppcg_kernel_domain:\n\t\treturn print_opencl_kernel_domain(p, stmt);\n\t}\n\n\treturn p;\n}\n\n/* Return true if there is a double array in prog->array or\n * if any of the types in prog->scop involve any doubles.\n * To check the latter condition, we simply search for the string \"double\"\n * in the type definitions, which may result in false positives.\n */\nstatic int any_double_elements(struct gpu_prog *prog)\n{\n\tint i;\n\n\tfor (i = 0; i < prog->n_array; ++i)\n\t\tif (strcmp(prog->array[i].type, \"double\") == 0)\n\t\t\treturn 1;\n\n\tfor (i = 0; i < prog->scop->pet->n_type; ++i) {\n\t\tstruct pet_type *type = 
prog->scop->pet->types[i];\n\n\t\tif (strstr(type->definition, \"double\"))\n\t\t\treturn 1;\n\t}\n\n\treturn 0;\n}\n\n/* Prints a #pragma to enable support for double floating-point\n * precision.  OpenCL 1.0 adds support for double precision floating-point as\n * an optional extension. An application that wants to use double will need to\n * include the #pragma OPENCL EXTENSION cl_khr_fp64 : enable directive before\n * any double precision data type is declared in the kernel code.\n */\nstatic __isl_give isl_printer *opencl_enable_double_support(\n\t__isl_take isl_printer *p)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"#pragma OPENCL EXTENSION cl_khr_fp64 :\"\n\t\t\" enable\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n/* Macro definitions for ppcg_min and ppcg_max for use\n * in OpenCL kernel code.\n * These macro definitions essentially call the corresponding\n * OpenCL macros/functions, but first ensure that the two arguments\n * have the same type, since the OpenCL versions are only defined\n * in case those arguments have the same type.\n */\nstatic const char *opencl_min =\n\t\"(x,y)    min((__typeof__(x + y)) x, (__typeof__(x + y)) y)\";\nstatic const char *opencl_max =\n\t\"(x,y)    max((__typeof__(x + y)) x, (__typeof__(x + y)) y)\";\n\n/* Set the macro definitions for ppcg_min and ppcg_max to\n * OpenCL specific versions.\n */\nstatic __isl_give isl_printer *set_opencl_macros(__isl_take isl_printer *p)\n{\n\treturn ppcg_set_macros(p, opencl_min, opencl_max);\n}\n\nstatic __isl_give isl_printer *opencl_print_kernel(struct gpu_prog *prog,\n\tstruct ppcg_kernel *kernel, __isl_take isl_printer *p)\n{\n\tisl_ctx *ctx = isl_ast_node_get_ctx(kernel->tree);\n\tisl_ast_print_options *print_options;\n\n\tprint_options = isl_ast_print_options_alloc(ctx);\n\tprint_options = isl_ast_print_options_set_print_user(print_options,\n\t\t\t\t&opencl_print_kernel_stmt, 
NULL);\n\n\tp = isl_printer_set_output_format(p, ISL_FORMAT_C);\n\tp = opencl_print_kernel_header(p, prog, kernel);\n\tp = isl_printer_print_str(p, \"{\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_indent(p, 2);\n\tp = opencl_print_kernel_iterators(p, kernel);\n\tp = opencl_print_kernel_vars(p, kernel);\n\tp = isl_printer_end_line(p);\n\tp = ppcg_set_macro_names(p);\n\tp = set_opencl_macros(p);\n\tp = gpu_print_macros(p, kernel->tree);\n\tp = isl_ast_node_print(kernel->tree, p, print_options);\n\tp = isl_printer_indent(p, -2);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"}\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\nstruct print_host_user_data_opencl {\n\tstruct opencl_info *opencl;\n\tstruct gpu_prog *prog;\n};\n\n/* This function prints the i'th block size multiplied by the i'th grid size,\n * where i (a parameter to this function) is one of the possible dimensions of\n * grid sizes and block sizes.\n * If the dimension of block sizes is not equal to the dimension of grid sizes,\n * the output is calculated as follows:\n *\n * Suppose that:\n * block_sizes[dim1] is the list of block sizes and it contains dim1 elements.\n * grid_sizes[dim2] is the list of grid sizes and it contains dim2 elements.\n *\n * The output is:\n * If (i >= dim2) then the output is block_sizes[i]\n * If (i >= dim1) then the output is grid_sizes[i]\n */\nstatic __isl_give isl_printer *opencl_print_total_number_of_work_items_for_dim(\n\t__isl_take isl_printer *p, struct ppcg_kernel *kernel, int i)\n{\n\tint grid_dim, block_dim;\n\tisl_ast_expr *grid_size_expr;\n\tisl_ast_expr *bound_grid;\n\n\tgrid_dim = isl_multi_pw_aff_dim(kernel->grid_size, isl_dim_set);\n\tblock_dim = kernel->n_block;\n\n\tif (i < min(grid_dim, block_dim)) {\n\t\tgrid_size_expr = kernel->grid_size_expr;\n\t\tbound_grid = isl_ast_expr_get_op_arg(grid_size_expr, 1 + i);\n\t\tp = isl_printer_print_str(p, \"(\");\n\t\tp = isl_printer_print_ast_expr(p, bound_grid);\n\t\tp = 
isl_printer_print_str(p, \") * \");\n\t\tp = isl_printer_print_int(p, kernel->block_dim[i]);\n\t\tisl_ast_expr_free(bound_grid);\n\t} else if (i >= grid_dim) {\n\t\tp = isl_printer_print_int(p, kernel->block_dim[i]);\n\t} else {\n\t\tgrid_size_expr = kernel->grid_size_expr;\n\t\tbound_grid = isl_ast_expr_get_op_arg(grid_size_expr, 1 + i);\n\t\tp = isl_printer_print_ast_expr(p, bound_grid);\n\t\tisl_ast_expr_free(bound_grid);\n\t}\n\n\treturn p;\n}\n\n/* Print a list that represents the total number of work items.  The list is\n * constructed by performing an element-wise multiplication of the block sizes\n * and the grid sizes.  To explain how the list is constructed, suppose that:\n * block_sizes[dim1] is the list of blocks sizes and it contains dim1 elements.\n * grid_sizes[dim2] is the list of grid sizes and it contains dim2 elements.\n *\n * The output of this function is constructed as follows:\n * If (dim1 > dim2) then the output is the following list:\n * grid_sizes[0]*block_sizes[0], ..., grid_sizes[dim2-1]*block_sizes[dim2-1],\n * block_sizes[dim2], ..., block_sizes[dim1-2], block_sizes[dim1-1].\n *\n * If (dim2 > dim1) then the output is the following list:\n * grid_sizes[0]*block_sizes[0], ..., grid_sizes[dim1-1] * block_sizes[dim1-1],\n * grid_sizes[dim1], grid_sizes[dim2-2], grid_sizes[dim2-1].\n *\n * To calculate the total number of work items out of the list constructed by\n * this function, the user should multiply the elements of the list.\n */\nstatic __isl_give isl_printer *opencl_print_total_number_of_work_items_as_list(\n\t__isl_take isl_printer *p, struct ppcg_kernel *kernel)\n{\n\tint i;\n\tint grid_dim, block_dim;\n\n\tgrid_dim = isl_multi_pw_aff_dim(kernel->grid_size, isl_dim_set);\n\tblock_dim = kernel->n_block;\n\n\tif ((grid_dim <= 0) || (block_dim <= 0)) {\n\t\tp = isl_printer_print_str(p, \"1\");\n\t\treturn p;\n\t}\n\n\tfor (i = 0; i <= max(grid_dim, block_dim) - 1; i++) {\n\t\tif (i > 0)\n\t\t\tp = isl_printer_print_str(p, \", 
\");\n\n\t\tp = opencl_print_total_number_of_work_items_for_dim(p,\n\t\t\tkernel, i);\n\t}\n\n\treturn p;\n}\n\n/* Copy \"array\" from the host to the device (to_host = 0) or\n * back from the device to the host (to_host = 1).\n */\nstatic __isl_give isl_printer *copy_array(__isl_take isl_printer *p,\n\tstruct gpu_array_info *array, int to_host)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"openclCheckReturn(\");\n\tif (to_host)\n\t\tp = isl_printer_print_str(p, \"clEnqueueReadBuffer\");\n\telse\n\t\tp = isl_printer_print_str(p, \"clEnqueueWriteBuffer\");\n\tp = isl_printer_print_str(p, \"(queue, dev_\");\n\tp = isl_printer_print_str(p, array->name);\n\tp = isl_printer_print_str(p, \", CL_TRUE, 0, \");\n\tp = gpu_array_info_print_size(p, array);\n\n\tif (gpu_array_is_scalar(array))\n\t\tp = isl_printer_print_str(p, \", &\");\n\telse\n\t\tp = isl_printer_print_str(p, \", \");\n\tp = isl_printer_print_str(p, array->name);\n\tp = isl_printer_print_str(p, \", 0, NULL, NULL));\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n/* Print code for initializing the device for execution of the transformed\n * code.  
This includes declaring locally defined variables as well as\n * declaring and allocating the required copies of arrays on the device.\n */\nstatic __isl_give isl_printer *init_device(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog, struct opencl_info *opencl)\n{\n\tp = opencl_print_host_macros(p);\n\n\tp = gpu_print_local_declarations(p, prog);\n\tp = opencl_declare_device_arrays(p, prog);\n\tp = opencl_setup(p, opencl->input, opencl);\n\tp = opencl_allocate_device_arrays(p, prog);\n\n\treturn p;\n}\n\n/* Print code for clearing the device after execution of the transformed code.\n * In particular, free the memory that was allocated on the device.\n */\nstatic __isl_give isl_printer *clear_device(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog, struct opencl_info *opencl)\n{\n\tp = opencl_release_device_arrays(p, prog);\n\tp = opencl_release_cl_objects(p, opencl);\n\n\treturn p;\n}\n\n/* Print a statement for copying an array to or from the device,\n * or for initializing or clearing the device.\n * The statement identifier of a copying node is called\n * \"to_device_<array name>\" or \"from_device_<array name>\" and\n * its user pointer points to the gpu_array_info of the array\n * that needs to be copied.\n * The node for initializing the device is called \"init_device\".\n * The node for clearing the device is called \"clear_device\".\n *\n * Extract the array (if any) from the identifier and call\n * init_device, clear_device or copy_array (with to_host = 0 for a\n * \"to_device_<array name>\" node and to_host = 1 otherwise).\n */\nstatic __isl_give isl_printer *print_device_node(__isl_take isl_printer *p,\n\t__isl_keep isl_ast_node *node, struct gpu_prog *prog,\n\tstruct opencl_info *opencl)\n{\n\tisl_ast_expr *expr, *arg;\n\tisl_id *id;\n\tconst char *name;\n\tstruct gpu_array_info *array;\n\n\texpr = isl_ast_node_user_get_expr(node);\n\targ = isl_ast_expr_get_op_arg(expr, 0);\n\tid = isl_ast_expr_get_id(arg);\n\tname = isl_id_get_name(id);\n\tarray = 
isl_id_get_user(id);\n\tisl_id_free(id);\n\tisl_ast_expr_free(arg);\n\tisl_ast_expr_free(expr);\n\n\tif (!name)\n\t\treturn isl_printer_free(p);\n\tif (!strcmp(name, \"init_device\"))\n\t\treturn init_device(p, prog, opencl);\n\tif (!strcmp(name, \"clear_device\"))\n\t\treturn clear_device(p, prog, opencl);\n\tif (!array)\n\t\treturn isl_printer_free(p);\n\n\tif (!prefixcmp(name, \"to_device\"))\n\t\treturn copy_array(p, array, 0);\n\telse\n\t\treturn copy_array(p, array, 1);\n}\n\n/* Print the user statement of the host code to \"p\".\n *\n * The host code may contain original user statements, kernel launches,\n * statements that copy data to/from the device and statements\n * that initialize or clear the device.\n * The original user statements and the kernel launches have\n * an associated annotation, while the other statements do not.\n * The latter are handled by print_device_node.\n * The annotation on the user statements is called \"user\".\n *\n * In case of a kernel launch, print a block of statements that\n * defines the grid and the work group and then launches the kernel.\n *\n * A grid is composed of many work groups (blocks); each work group holds\n * many work-items (threads).\n *\n * global_work_size[kernel->n_block] represents the total number of work\n * items.  It is an array of kernel->n_block size_t\n * values that describe the total number of work-items that will execute\n * the kernel.  The total number of work-items is computed as:\n * global_work_size[0] *...* global_work_size[kernel->n_block - 1].\n *\n * The size of each work group (i.e. the number of work-items in each work\n * group) is described using block_size[kernel->n_block].  The total\n * number of work-items in a block (work-group) is computed as:\n * block_size[0] *... 
* block_size[kernel->n_block - 1].\n *\n * For more information check:\n * http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clEnqueueNDRangeKernel.html\n */\nstatic __isl_give isl_printer *opencl_print_host_user(\n\t__isl_take isl_printer *p,\n\t__isl_take isl_ast_print_options *print_options,\n\t__isl_keep isl_ast_node *node, void *user)\n{\n\tisl_id *id;\n\tint is_user;\n\tstruct ppcg_kernel *kernel;\n\tstruct ppcg_kernel_stmt *stmt;\n\tstruct print_host_user_data_opencl *data;\n\n\tisl_ast_print_options_free(print_options);\n\n\tdata = (struct print_host_user_data_opencl *) user;\n\n\tid = isl_ast_node_get_annotation(node);\n\tif (!id)\n\t\treturn print_device_node(p, node, data->prog, data->opencl);\n\n\tis_user = !strcmp(isl_id_get_name(id), \"user\");\n\tkernel = is_user ? NULL : isl_id_get_user(id);\n\tstmt = is_user ? isl_id_get_user(id) : NULL;\n\tisl_id_free(id);\n\n\tif (is_user)\n\t\treturn ppcg_kernel_print_domain(p, stmt);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"{\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_indent(p, 2);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"size_t global_work_size[\");\n\n\tif (kernel->n_block > 0)\n\t\tp = isl_printer_print_int(p, kernel->n_block);\n\telse\n\t\tp = isl_printer_print_int(p, 1);\n\n\tp = isl_printer_print_str(p, \"] = {\");\n\tp = opencl_print_total_number_of_work_items_as_list(p, kernel);\n\tp = isl_printer_print_str(p, \"};\");\n\tp = isl_printer_end_line(p);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"size_t block_size[\");\n\n\tif (kernel->n_block > 0)\n\t\tp = isl_printer_print_int(p, kernel->n_block);\n\telse\n\t\tp = isl_printer_print_int(p, 1);\n\n\tp = isl_printer_print_str(p, \"] = {\");\n\tp = opencl_print_block_sizes(p, kernel);\n\tp = isl_printer_print_str(p, \"};\");\n\tp = isl_printer_end_line(p);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"cl_kernel kernel\");\n\tp = 
isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p, \" = clCreateKernel(program, \\\"kernel\");\n\tp = isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p, \"\\\", &err);\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"openclCheckReturn(err);\");\n\tp = isl_printer_end_line(p);\n\n\topencl_set_kernel_arguments(p, data->prog, kernel);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"openclCheckReturn(clEnqueueNDRangeKernel\"\n\t\t\"(queue, kernel\");\n\tp = isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p, \", \");\n\tif (kernel->n_block > 0)\n\t\tp = isl_printer_print_int(p, kernel->n_block);\n\telse\n\t\tp = isl_printer_print_int(p, 1);\n\n\tp = isl_printer_print_str(p, \", NULL, global_work_size, \"\n\t\t\t\t\t\"block_size, \"\n\t\t\t\t\t\"0, NULL, NULL));\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"openclCheckReturn(\"\n\t\t\t\t\t\"clReleaseKernel(kernel\");\n\tp = isl_printer_print_int(p, kernel->id);\n\tp = isl_printer_print_str(p, \"));\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"clFinish(queue);\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_indent(p, -2);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"}\");\n\tp = isl_printer_end_line(p);\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_end_line(p);\n\n\tdata->opencl->kprinter = opencl_print_kernel(data->prog, kernel,\n\t\t\t\t\t\tdata->opencl->kprinter);\n\n\treturn p;\n}\n\nstatic __isl_give isl_printer *opencl_print_host_code(\n\t__isl_take isl_printer *p, struct gpu_prog *prog,\n\t__isl_keep isl_ast_node *tree, struct opencl_info *opencl)\n{\n\tisl_ast_print_options *print_options;\n\tisl_ctx *ctx = isl_ast_node_get_ctx(tree);\n\tstruct print_host_user_data_opencl data = { opencl, prog };\n\n\tprint_options = 
isl_ast_print_options_alloc(ctx);\n\tprint_options = isl_ast_print_options_set_print_user(print_options,\n\t\t\t\t&opencl_print_host_user, &data);\n\n\tp = gpu_print_macros(p, tree);\n\tp = isl_ast_node_print(tree, p, print_options);\n\n\treturn p;\n}\n\n/* Given a gpu_prog \"prog\" and the corresponding transformed AST\n * \"tree\", print the entire OpenCL code to \"p\".\n */\nstatic __isl_give isl_printer *print_opencl(__isl_take isl_printer *p,\n\tstruct gpu_prog *prog, __isl_keep isl_ast_node *tree,\n\tstruct gpu_types *types, void *user)\n{\n\tstruct opencl_info *opencl = user;\n\n\topencl->kprinter = isl_printer_set_output_format(opencl->kprinter,\n\t\t\t\t\t\t\tISL_FORMAT_C);\n\tif (any_double_elements(prog))\n\t\topencl->kprinter = opencl_enable_double_support(\n\t\t\t\t\t\t\topencl->kprinter);\n\tif (opencl->options->opencl_print_kernel_types)\n\t\topencl->kprinter = gpu_print_types(opencl->kprinter, types,\n\t\t\t\t\t\t\t\tprog);\n\n\tif (!opencl->kprinter)\n\t\treturn isl_printer_free(p);\n\n\tp = opencl_print_host_code(p, prog, tree, opencl);\n\n\treturn p;\n}\n\n/* Transform the code in the file called \"input\" by replacing\n * all scops by corresponding OpenCL code.\n * The host code is written to \"output\" or a name derived from\n * \"input\" if \"output\" is NULL.\n * The kernel code is placed in separate files with names\n * derived from \"output\" or \"input\".\n *\n * We let generate_gpu do all the hard work and then let it call\n * us back for printing the AST in print_opencl.\n *\n * To prepare for this printing, we first open the output files\n * and we close them after generate_gpu has finished.\n */\nint generate_opencl(isl_ctx *ctx, struct ppcg_options *options,\n\tconst char *input, const char *output)\n{\n\tstruct opencl_info opencl = { options, input, output };\n\tint r;\n\n\topencl.kprinter = isl_printer_to_str(ctx);\n\tr = opencl_open_files(&opencl);\n\n\tif (r >= 0)\n\t\tr = generate_gpu(ctx, input, opencl.host_c, 
options,\n\t\t\t\t&print_opencl, &opencl);\n\n\tif (opencl_close_files(&opencl) < 0)\n\t\tr = -1;\n\tisl_printer_free(opencl.kprinter);\n\n\treturn r;\n}\n"
  },
  {
    "path": "src/ppcg_files/opencl.h",
    "content": "#ifndef _OPENCL_H\n#define _OPENCL_H\n\n#include <pet.h>\n#include \"ppcg_options.h\"\n#include \"ppcg.h\"\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\n\tint generate_opencl(isl_ctx *ctx, struct ppcg_options *options,\n\t\t\t\t\t\t\t\t\t\t\tconst char *input, const char *output);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "src/ppcg_options.c",
    "content": "/*\n * Copyright 2010-2011 INRIA Saclay\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege, INRIA Saclay - Ile-de-France,\n * Parc Club Orsay Universite, ZAC des vignes, 4 rue Jacques Monod,\n * 91893 Orsay, France\n */\n\n#include \"ppcg_options.h\"\n\nstatic struct isl_arg_choice target[] = {\n\t//{\"c\", PPCG_TARGET_C},\n\t//{\"cuda\", PPCG_TARGET_CUDA},\n\t//{\"opencl\", PPCG_TARGET_OPENCL},\n\t//{\"autosa_c\", AUTOSA_TARGET_C},\n\t{\"autosa_hls_c\", AUTOSA_TARGET_XILINX_HLS_C},\n\t{\"autosa_opencl\", AUTOSA_TARGET_INTEL_OPENCL},\n\t//{\"autosa_t2s\", AUTOSA_TARGET_T2S},\n\t{\"autosa_catapult_c\", AUTOSA_TARGET_CATAPULT_HLS_C},\n\t{\"autosa_tapa\", AUTOSA_TARGET_TAPA_CPP},\n\t{0}};\n\nstatic struct isl_arg_choice sa_type[] = {\n\t{\"sync\", AUTOSA_SA_TYPE_SYNC},\n\t{\"async\", AUTOSA_SA_TYPE_ASYNC},\n\t{0}};\n\n/* Set defaults that depend on the target.\n * In particular, set --schedule-outer-coincidence iff target is a GPU.\n */\nvoid ppcg_options_set_target_defaults(struct ppcg_options *options)\n{\n\tchar *argv[2] = {NULL};\n\n\targv[0] = \"ppcg_options_set_target_defaults\";\n\tif (options->target == PPCG_TARGET_C)\n\t\targv[1] = \"--no-schedule-outer-coincidence\";\n\telse\n\t\targv[1] = \"--schedule-outer-coincidence\";\n\n\tisl_options_parse(options->isl, 2, argv, ISL_ARG_ALL);\n}\n\n/* Callback that is called whenever the \"target\" option is set (to \"val\").\n * The callback is called after target has been updated.\n *\n * Call ppcg_options_set_target_defaults to reset the target-dependent options.\n */\nstatic int set_target(void *opt, unsigned val)\n{\n\tstruct ppcg_options *options = opt;\n\n\tppcg_options_set_target_defaults(options);\n\n\treturn 0;\n}\n\nISL_ARGS_START(struct ppcg_debug_options, ppcg_debug_options_args)\nISL_ARG_BOOL(struct ppcg_debug_options, dump_schedule_constraints, 0,\n\t\t\t \"dump-schedule-constraints\", 0, \"dump schedule constraints\")\nISL_ARG_BOOL(struct 
ppcg_debug_options, dump_schedule, 0,\n\t\t\t \"dump-schedule\", 0, \"dump isl computed schedule\")\nISL_ARG_BOOL(struct ppcg_debug_options, dump_final_schedule, 0,\n\t\t\t \"dump-final-schedule\", 0, \"dump PPCG computed schedule\")\nISL_ARG_BOOL(struct ppcg_debug_options, dump_sizes, 0,\n\t\t\t \"dump-sizes\", 0,\n\t\t\t \"dump effectively used per kernel tile, grid and block sizes\")\nISL_ARG_BOOL(struct ppcg_debug_options, verbose, 'v', \"verbose\", 0, NULL)\nISL_ARGS_END\n\n//ISL_ARGS_START(struct ppcg_options, ppcg_opencl_options_args)\n//ISL_ARG_STR(struct ppcg_options, opencl_compiler_options, 0, \"compiler-options\",\n//\t\t\t\"options\", NULL, \"options to pass to the OpenCL compiler\")\n//ISL_ARG_BOOL(struct ppcg_options, opencl_use_gpu, 0, \"use-gpu\", 1,\n//\t\t\t \"use GPU device (if available)\")\n//ISL_ARG_STR_LIST(struct ppcg_options, opencl_n_include_file,\n//\t\t\t\t opencl_include_files, 0, \"include-file\", \"filename\",\n//\t\t\t\t \"file to #include in generated OpenCL code\")\n//ISL_ARG_BOOL(struct ppcg_options, opencl_print_kernel_types, 0,\n//\t\t\t \"print-kernel-types\", 1,\n//\t\t\t \"print definitions of types in the kernel file\")\n//ISL_ARG_BOOL(struct ppcg_options, opencl_embed_kernel_code, 0,\n//\t\t\t \"embed-kernel-code\", 0, \"embed kernel code into host code\")\n//ISL_ARGS_END\n\nISL_ARGS_START(struct autosa_options, autosa_options_args)\nISL_ARG_BOOL(struct autosa_options, autosa, 0, \"autosa\", 1,\n\t\t\t\t\"generate systolic arrays using AutoSA\")\nISL_ARG_BOOL(struct autosa_options, array_contraction, 0, \"array-contraction\", 1,\n\t\t\t\t\"apply array contraction\")\nISL_ARG_BOOL(struct autosa_options, axi_stream, 0, \"axi-stream\", 0,\n\t\t\t\t\"generate AXI stream interface, must be used together with host serialization.\")\nISL_ARG_BOOL(struct autosa_options, block_sparse, 0, \"block-sparse\", 0,\n\t\t\t\t\"use block sparsity\")\nISL_ARG_STR(struct autosa_options, block_sparse_ratio, 0, \"block-sparse-ratio\", 
\"ratio\",\n\t\t\t\tNULL, \"block sparsity ratio (e.g., kernel[]->A[2,4])\")\nISL_ARG_STR(struct autosa_options, config, 0, \"config\", \"config\", NULL,\n\t\t\t\t\"AutoSA configuration file\")\nISL_ARG_BOOL(struct autosa_options, credit_control, 0, \"credit-control\", 0,\n\t\t\t \t\"enable credit control between different array partitions\")\nISL_ARG_BOOL(struct autosa_options, data_pack, 0, \"data-pack\", 1,\n\t\t\t \t\"enable data packing\")\nISL_ARG_STR(struct autosa_options, data_pack_sizes, 0, \"data-pack-sizes\", \"sizes\",\n\t\t\t\tNULL, \"data pack sizes upper bound (bytes) at innermost, intermediate, outermost I/O level [default: kernel[]->data_pack[8,32,64]]\")\nISL_ARG_BOOL(struct autosa_options, double_buffer, 0, \"double-buffer\", 1,\n\t\t\t \t\"enable double-buffering for data transfer\")\nISL_ARG_STR(struct autosa_options, double_buffer_assignment, 0, \"double-buffer-assign\", \"assignment\",\n\t\t\t\tNULL, \"assign arrays to be double-buffered (e.g., kernel[]->A[])\")\nISL_ARG_INT(struct autosa_options, double_buffer_style, 0, \"double-buffer-style\", \"id\", 1,\n\t\t\t\t\"set the double-buffering coding style (0: while loop, 1: for loop)\")\nISL_ARG_BOOL(struct autosa_options, dump_code, 0, \"dump-code\", 0,\n\t\t\t \t\"dump the intermediate code\")\nISL_ARG_BOOL(struct autosa_options, explore_loop_permute, 0, \"explore-loop-permute\", 0,\n\t\t\t\t\"explore loop permutation during array partitioning\")\nISL_ARG_INT(struct autosa_options, loop_permute_order, 0, \"loop-permute-order\", \"order\", 0,\n\t\t\t\t\"specify which loop ordering to explore\")\nISL_ARG_INT(struct autosa_options, fifo_depth, 0, \"fifo-depth\", \"depth\", 2, \"default FIFO depth\")\nISL_ARG_BOOL(struct autosa_options, hbm, 0, \"hbm\", 0,\n\t\t\t \t\"use multi-port DRAM/HBM\")\nISL_ARG_INT(struct autosa_options, n_hbm_port, 0, \"hbm-port-num\", \"num\", 2,\n\t\t\t\t\"default HBM port number per array\")\nISL_ARG_BOOL(struct autosa_options, hls, 0, \"hls\", 
0,\n\t\t\t \t\"generate Xilinx HLS host\")\nISL_ARG_BOOL(struct autosa_options, host_serialize, 0, \"host-serialize\", 0,\n\t\t\t \t\"serialize/deserialize the host data\")\nISL_ARG_BOOL(struct autosa_options, insert_hls_dependence, 0, \"insert-hls-dependence\", 0,\n\t\t\t \t\"insert Xilinx HLS dependence pragmas (alpha version)\")\nISL_ARG_INT(struct autosa_options, int_io_dir, 0, \"int-io-dir\", \"dir\", 0,\n\t\t\t \t\"set the default interior I/O direction (0: [1,x], 1: [x,1])\")\nISL_ARG_BOOL(struct autosa_options, io_module_embedding, 0, \"io-module-embedding\", 0,\n\t\t\t \t\"embed the I/O modules inside PEs if possible\")\nISL_ARG_BOOL(struct autosa_options, isl_sink, 0, \"isl-sink\", 1,\n\t\t\t \t\"sink time loops using ISL default APIs\")\nISL_ARG_BOOL(struct autosa_options, loop_infinitize, 0, \"loop-infinitize\", 0,\n\t\t\t \t\"apply loop infinitization optimization (Intel OpenCL only)\")\nISL_ARG_BOOL(struct autosa_options, local_reduce, 0, \"local-reduce\", 0,\n\t\t\t \t\"generate non-output-stationary array with local reduction\")\nISL_ARG_STR(struct autosa_options, reduce_op, 0, \"reduce-op\", \"op\",\n\t\t\t\tNULL, \"reduction operator (must be used together with --local-reduce)\")\nISL_ARG_BOOL(struct autosa_options, lower_int_io_L1_buffer, 0, \"lower-int-io-L1-buffer\", 0,\n\t\t\t \t\"lower the L1 buffer for interior I/O modules\")\nISL_ARG_BOOL(struct autosa_options, lower_if_branch, 0, \"lower-if-branch\", 0,\n\t\t\t\t\"lower if-branches in the I/O module\")\nISL_ARG_INT(struct autosa_options, max_local_memory, 0,\n\t\t\t\t\"max-local-memory\", \"size\", 8192, \"maximal amount of local memory\")\nISL_ARG_INT(struct autosa_options, max_sa_dim, 0,\n\t\t\t\t\"max-sa-dim\", \"dim\", 2, \"maximal systolic array dimension\")\nISL_ARG_STR(struct autosa_options, mem_port_map, 0, \"mem-port-map\", \"map\", NULL,\n\t\t\t\t\"memory port mapping\")\nISL_ARG_BOOL(struct autosa_options, non_block_fifo, 0, \"non-blocking-fifo\", 0,\n\t\t\t \t\"use 
non-blocking fifo interface\")\nISL_ARG_STR(struct autosa_options, output_dir, 0, \"output-dir\", \"dir\", \"./autosa.tmp/output\",\n\t\t\t\t\"AutoSA output directory\")\nISL_ARG_BOOL(struct autosa_options, reverse_order, 0, \"reverse-order\", 1,\n\t\t\t \t\"reverse the latency-hiding loop tiling order\")\nISL_ARG_STR(struct autosa_options, select_rar_dep, 0, \"select-rar-dep\", \"choice\",\n\t\t\t\tNULL, \"select the RAR dependence for the array access. [example: kernel[]->__pet_ref_4[1]]\")\nISL_ARG_STR(struct autosa_options, sa_sizes, 0, \"sa-sizes\", \"sizes\", NULL,\n\t\t\t\t\"per-kernel PE optimization tile sizes\")\nISL_ARG_INT(struct autosa_options, sa_tile_size, 0, \"sa-tile-size\", \"size\", 4,\n\t\t\t\t\"default tile size in PE optimization\")\nISL_ARG_USER_OPT_CHOICE(struct autosa_options, sa_type, 0, \"sa-type\", sa_type,\n\t\t\t\tNULL, AUTOSA_SA_TYPE_ASYNC, AUTOSA_SA_TYPE_ASYNC, \"systolic array type\")\nISL_ARG_STR(struct autosa_options, simd_info, 0, \"simd-info\", \"info\", NULL,\n\t\t\t\t\"per-kernel SIMD information\")\nISL_ARG_BOOL(struct autosa_options, simd_touch_space, 0, \"simd-touch-space\", 0,\n\t\t\t\t\"use space loops as SIMD vectorization loops\")\nISL_ARG_INT(struct autosa_options, tuning_method, 0, \"tuning-method\", \"method\", -1,\n\t\t\t\t\"tuning method (0: exhaustive search, 1: others)\")\nISL_ARG_BOOL(struct autosa_options, two_level_buffer, 0, \"two-level-buffer\", 0,\n\t\t\t \t\"enable two-level buffering in I/O modules\")\nISL_ARG_BOOL(struct autosa_options, t2s_tile, 0, \"t2s-tile\", 0,\n\t\t\t \t\"generate T2S code from tiled code\")\nISL_ARG_INT(struct autosa_options, t2s_tile_phase, 0,\n\t\t\t\t\"t2s-tile-phase\", \"phase\", 0, \"T2S tiled URE codegen phase\")\nISL_ARG_STR(struct autosa_options, param_names, 0, \"param-names\", \"name\", NULL,\n\t\t\t\t\"customized parameter names (for tuning)\")\nISL_ARG_BOOL(struct autosa_options, uram, 0, \"uram\", 0,\n\t\t\t \t\"use Xilinx FPGA URAM\")\nISL_ARG_BOOL(struct 
autosa_options, use_local_memory, 0, \"local-memory\", 1,\n\t\t\t \t\"use local memory in kernel code\")\nISL_ARG_BOOL(struct autosa_options, use_cplusplus_template, 0, \"use-cplusplus-template\", 0,\n\t\t\t \t\"use C++ templates in codegen (necessary for irregular PEs)\")\nISL_ARG_BOOL(struct autosa_options, verbose, 'v', \"verbose\", 0,\n\t\t\t \t\"print verbose compilation information\")\nISL_ARG_BOOL(struct autosa_options, hcl, 0, \"hcl\", 0,\n\t\t\t \t\"generate code for integrating with HeteroCL\")\nISL_ARGS_END\n\nISL_ARGS_START(struct ppcg_options, ppcg_options_args)\nISL_ARG_CHILD(struct ppcg_options, isl, \"isl\", &isl_options_args, \"isl options\")\nISL_ARG_CHILD(struct ppcg_options, debug, NULL, &ppcg_debug_options_args,\n\t\t\t  \"debugging options\")\nISL_ARG_CHILD(struct ppcg_options, autosa, \"autosa\", &autosa_options_args,\n\t\t\t  \"AutoSA options\")\n//ISL_ARG_BOOL(struct ppcg_options, group_chains, 0, \"group-chains\", 1,\n//\t\t\t \"group chains of interdependent statements that are executed \"\n//\t\t\t \"consecutively in the original schedule before scheduling\")\nISL_ARG_BOOL(struct ppcg_options, reschedule, 0, \"reschedule\", 1,\n\t\t\t \"replace the original schedule with the isl-computed schedule\")\n//ISL_ARG_BOOL(struct ppcg_options, scale_tile_loops, 0,\n//\t\t\t \"scale-tile-loops\", 1, NULL)\n//ISL_ARG_BOOL(struct ppcg_options, wrap, 0, \"wrap\", 1, NULL)\n//ISL_ARG_BOOL(struct ppcg_options, use_shared_memory, 0, \"shared-memory\", 1,\n//\t\t\t \"use shared memory in kernel code\")\n//ISL_ARG_BOOL(struct ppcg_options, use_private_memory, 0, \"private-memory\", 1,\n//\t\t\t \"use private memory in kernel code\")\n//ISL_ARG_STR(struct ppcg_options, ctx, 0, \"ctx\", \"context\", NULL,\n//\t\t\t\"Constraints on parameters\")\n//ISL_ARG_BOOL(struct ppcg_options, non_negative_parameters, 0,\n//\t\t\t \"assume-non-negative-parameters\", 0,\n//\t\t\t \"assume all parameters are non-negative\")\n//ISL_ARG_BOOL(struct ppcg_options, tile, 
0, \"tile\", 0,\n//\t\t\t \"perform tiling (C target)\")\n//ISL_ARG_INT(struct ppcg_options, tile_size, 'S', \"tile-size\", \"size\", 32, NULL)\n//ISL_ARG_BOOL(struct ppcg_options, isolate_full_tiles, 0, \"isolate-full-tiles\",\n//\t\t\t 0, \"isolate full tiles from partial tiles (hybrid tiling)\")\n//ISL_ARG_STR(struct ppcg_options, sizes, 0, \"sizes\", \"sizes\", NULL,\n//\t\t\t\"Per kernel tile, grid and block sizes\")\n//ISL_ARG_INT(struct ppcg_options, max_shared_memory, 0,\n//\t\t\t\"max-shared-memory\", \"size\", 8192, \"maximal amount of shared memory\")\n//ISL_ARG_BOOL(struct ppcg_options, openmp, 0, \"openmp\", 0,\n//\t\t\t \"Generate OpenMP macros (only for C target)\")\nISL_ARG_USER_OPT_CHOICE(struct ppcg_options, target, 0, \"target\", target,\n\t\t\t\t\t\t&set_target, PPCG_TARGET_CUDA, PPCG_TARGET_CUDA,\n\t\t\t\t\t\t\"the target to generate code for\")\nISL_ARG_BOOL(struct ppcg_options, linearize_device_arrays, 0,\n\t\t\t \"linearize-device-arrays\", 1,\n\t\t\t \"linearize all device arrays, even those of fixed size\")\n//ISL_ARG_BOOL(struct ppcg_options, allow_gnu_extensions, 0,\n//\t\t\t \"allow-gnu-extensions\", 1,\n//\t\t\t \"allow the use of GNU extensions in generated code\")\nISL_ARG_BOOL(struct ppcg_options, live_range_reordering, 0,\n\t\t\t \"live-range-reordering\", 0,\n\t\t\t \"allow successive live ranges on the same memory element \"\n\t\t\t \"to be reordered\")\n//ISL_ARG_BOOL(struct ppcg_options, hybrid, 0, \"hybrid\", 0,\n//\t\t\t \"apply hybrid tiling whenever a suitable input pattern is found \"\n//\t\t\t \"(GPU targets)\")\n//ISL_ARG_BOOL(struct ppcg_options, unroll_copy_shared, 0, \"unroll-copy-shared\",\n//\t\t\t 0, \"unroll code for copying to/from shared memory\")\n//ISL_ARG_BOOL(struct ppcg_options, unroll_gpu_tile, 0, \"unroll-gpu-tile\", 0,\n//\t\t\t \"unroll code inside tile on GPU targets\")\n//ISL_ARG_GROUP(\"opencl\", &ppcg_opencl_options_args, \"OpenCL options\")\n//ISL_ARG_STR(struct ppcg_options, save_schedule_file, 0, 
\"save-schedule\",\n//\t\t\t\"file\", NULL, \"save isl computed schedule to <file>\")\n//ISL_ARG_STR(struct ppcg_options, load_schedule_file, 0, \"load-schedule\",\n//\t\t\t\"file\", NULL, \"load schedule from <file>, \"\n//\t\t\t\t\t\t  \"using it instead of an isl computed schedule\")\nISL_ARGS_END\n"
  },
  {
    "path": "src/ppcg_options.h",
    "content": "#ifndef PPCG_OPTIONS_H\n#define PPCG_OPTIONS_H\n\n#include <isl/arg.h>\n#include <isl/options.h>\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\n\tstruct ppcg_debug_options\n\t{\n\t\tint dump_schedule_constraints;\n\t\tint dump_schedule;\n\t\tint dump_final_schedule;\n\t\tint dump_sizes;\n\t\tint verbose;\n\t};\n\n\tstruct autosa_options\n\t{\n\t\t/* Generate systolic array using AutoSA. */\n\t\tint autosa;\n\t\t/* Use HBM memory. */\n\t\tint hbm;\n\t\tint n_hbm_port;\n\t\t/* Enable double buffering. */\n\t\tint double_buffer;\n\t\t/* Double buffer assignment. */\n\t\tchar *double_buffer_assignment;\n\t\t/* Dump the intermediate code. */\n\t\tint dump_code;\n\t\t/* Maximal systolic array dimension. */\n\t\tint max_sa_dim;\n\t\t/* Systolic array type. */\n\t\tint sa_type;\n\t\t/* Universal tile size. */\n\t\tint sa_tile_size;\n\t\t/* Tile sizes for PE optimization. */\n\t\tchar *sa_sizes;\n\t\t/* Generate T2S code from tiled program. */\n\t\tint t2s_tile;\n\t\t/* Phases of T2S codegen for tiled program. */\n\t\tint t2s_tile_phase;\n\t\t/* Take advantage of FPGA local memory. */\n\t\tint use_local_memory;\n\t\t/* Maximal amount of local memory. */\n\t\tint max_local_memory;\n\t\t/* Memory port mapping (for Intel OpenCL). */\n\t\tchar *mem_port_map;\n\t\t/* Enable data pack for transferring data. */\n\t\tint data_pack;\n\t\t/* Data pack factors at different I/O levels. */\n\t\tchar *data_pack_sizes;\n\t\t/* Enable credit control between different array partitions. */\n\t\tint credit_control;\n\t\t/* Enable two-level buffering in I/O modules. */\n\t\tint two_level_buffer;\n\t\t/* Configuration file. */\n\t\tchar *config;\n\t\t/* Output directory. */\n\t\tchar *output_dir;\n\t\t/* SIMD information file. */\n\t\tchar *simd_info;\n\t\t/* Generate HLS host instead of OpenCL host. */\n\t\tint hls;\n\t\t/* Use URAM. */\n\t\tint uram;\n\t\t/* Print verbose information. */\n\t\tint verbose;\n\t\t/* Insert HLS dependence pragma. 
*/\n\t\tint insert_hls_dependence;\n\t\t/* Embed I/O modules inside PEs. */\n\t\tint io_module_embedding;\n\t\t/* Enable loop infinitization optimization. Only for Intel. */\n\t\tint loop_infinitize;\n\t\t/* Enable data serialization/deserialization on the host side. */\n\t\tint host_serialize;\n\t\t/* Use non-blocking FIFO access. Note: Not supported. */\n\t\tint non_block_fifo;\n\t\t/* Double buffer coding style. 0: while loop, 1: for loop (default). */\n\t\tint double_buffer_style;\n\t\t/* Enable local reduce */\n\t\tint local_reduce;\n\t\t/* Reduce op */\n\t\tchar *reduce_op;\n\t\t/* Select the RAR dependence candidate. */\n\t\tchar *select_rar_dep;\n\t\t/* Interior I/O elimination direction.\n\t\t * 0: set the first dim to 1 (default).\n\t\t * 1: Set the last dim to 1.\n\t\t */\n\t\tint int_io_dir;\n\t\t/* Lower the interior I/O module L1 buffer */\n\t\tint lower_int_io_L1_buffer;\n\t\t/* Use C++ template in codegen (necessary for irregular PEs) */\n\t\tint use_cplusplus_template;\n\t\t/* Default FIFO depth */\n\t\tint fifo_depth;\n\t\t/* Touch space loops in the SIMD vectorization */\n\t\tint simd_touch_space;\n\t\t/* Use block sparsity */\n\t\tint block_sparse;\n\t\t/* Block sparse ratio [nonzero, vec_len] */\n\t\tchar* block_sparse_ratio;\n\t\t/* Generate code for HeteroCL integration. */\n\t\tint hcl;\n\t\t/* Apply array contraction. */\n\t\tint array_contraction;\n\t\t/* Sink time loops using ISL default APIs. */\n\t\tint isl_sink;\n\t\t/* Reverse the latency-hiding loop tiling order. */\n\t\tint reverse_order;\n\t\t/* Use AXI Stream Interface. */\n\t\tint axi_stream;\n\t\t/* Tuning method: [0: Exhaustive search 1: Others] */\n\t\tint tuning_method;\n\t\t/* Explore loop permutation in the array partitioning. */\n\t\tint explore_loop_permute;\n\t\tint loop_permute_order;\n\t\t/* Parameter names */\n\t\tchar *param_names;\n\t\t/* Lower if-branches in the inter-trans I/O module. 
*/\n\t\tint lower_if_branch;\n\t};\t\n\n\tstruct ppcg_options\n\t{\n\t\tstruct isl_options *isl;\n\t\tstruct ppcg_debug_options *debug;\n\t\t/* Options to pass to the AutoSA compiler. */\n\t\tstruct autosa_options *autosa;\n\n\t\t/* Group chains of consecutive statements before scheduling. */\n\t\tint group_chains;\n\n\t\t/* Use isl to compute a schedule replacing the original schedule. */\n\t\tint reschedule;\n\t\tint scale_tile_loops;\n\t\tint wrap;\n\n\t\t/* Assume all parameters are non-negative. */\n\t\tint non_negative_parameters;\n\t\tchar *ctx;\n\t\tchar *sizes;\n\n\t\t/* Perform tiling (C target). */\n\t\tint tile;\n\t\tint tile_size;\n\n\t\t/* Isolate full tiles from partial tiles. */\n\t\tint isolate_full_tiles;\n\n\t\t/* Take advantage of private memory. */\n\t\tint use_private_memory;\n\n\t\t/* Take advantage of shared memory. */\n\t\tint use_shared_memory;\n\n\t\t/* Maximal amount of shared memory. */\n\t\tint max_shared_memory;\n\n\t\t/* The target we generate code for. */\n\t\tint target;\n\n\t\t/* Generate OpenMP macros (C target only). */\n\t\tint openmp;\n\n\t\t/* Linearize all device arrays. */\n\t\tint linearize_device_arrays;\n\n\t\t/* Allow the use of GNU extensions in generated code. */\n\t\tint allow_gnu_extensions;\n\n\t\t/* Allow live range to be reordered. */\n\t\tint live_range_reordering;\n\n\t\t/* Allow hybrid tiling whenever a suitable input pattern is found. */\n\t\tint hybrid;\n\n\t\t/* Unroll the code for copying to/from shared memory. */\n\t\tint unroll_copy_shared;\n\t\t/* Unroll code inside tile on GPU targets. */\n\t\tint unroll_gpu_tile;\n\n\t\t/* Options to pass to the OpenCL compiler.  */\n\t\tchar *opencl_compiler_options;\n\t\t/* Prefer GPU device over CPU. */\n\t\tint opencl_use_gpu;\n\t\t/* Number of files to include. */\n\t\tint opencl_n_include_file;\n\t\t/* Files to include. */\n\t\tconst char **opencl_include_files;\n\t\t/* Print definitions of types in kernels. 
*/\n\t\tint opencl_print_kernel_types;\n\t\t/* Embed OpenCL kernel code in host code. */\n\t\tint opencl_embed_kernel_code;\n\n\t\t/* Name of file for saving isl computed schedule or NULL. */\n\t\tchar *save_schedule_file;\n\t\t/* Name of file for loading schedule or NULL. */\n\t\tchar *load_schedule_file;\n\t};\n\n\tISL_ARG_DECL(ppcg_debug_options, struct ppcg_debug_options,\n\t\t\t\t ppcg_debug_options_args)\n\tISL_ARG_DECL(autosa_options, struct autosa_options, autosa_options_args)\n\tISL_ARG_DECL(ppcg_options, struct ppcg_options, ppcg_options_args)\n\n#define PPCG_TARGET_C 0\n#define PPCG_TARGET_CUDA 1\n#define PPCG_TARGET_OPENCL 2\n#define AUTOSA_TARGET_XILINX_HLS_C 3\n#define AUTOSA_TARGET_INTEL_OPENCL 4\n#define AUTOSA_TARGET_T2S 5\n#define AUTOSA_TARGET_C 6\n#define AUTOSA_TARGET_CATAPULT_HLS_C 7\n#define AUTOSA_TARGET_TAPA_CPP 8\n\n#define AUTOSA_SA_TYPE_SYNC 0\n#define AUTOSA_SA_TYPE_ASYNC 1\n\n\tvoid ppcg_options_set_target_defaults(struct ppcg_options *options);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "src/print.c",
    "content": "/*\n * Copyright 2012-2013 Ecole Normale Superieure\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege,\n * Ecole Normale Superieure, 45 rue d’Ulm, 75230 Paris, France\n */\n\n#include <isl/ctx.h>\n#include <isl/id.h>\n#include <isl/aff.h>\n#include <isl/ast.h>\n#include <isl/ast_build.h>\n#include <isl/printer.h>\n\n#include \"print.h\"\n#include \"util.h\"\n\n__isl_give isl_printer *ppcg_start_block(__isl_take isl_printer *p)\n{\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"{\");\n\tp = isl_printer_end_line(p);\n\tp = isl_printer_indent(p, 2);\n\treturn p;\n}\n\n__isl_give isl_printer *ppcg_end_block(__isl_take isl_printer *p)\n{\n\tp = isl_printer_indent(p, -2);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"}\");\n\tp = isl_printer_end_line(p);\n\treturn p;\n}\n\n/* Names of notes that keep track of whether min/max\n * macro definitions have already been printed.\n */\nstatic const char *ppcg_max_printed = \"ppcg_max_printed\";\nstatic const char *ppcg_min_printed = \"ppcg_min_printed\";\n\n/* Has the macro definition corresponding to \"note_name\" been printed\n * to \"p\" before?\n * That is, does \"p\" have an associated \"note_name\" note?\n */\nstatic isl_bool printed_before(__isl_keep isl_printer *p, const char *note_name)\n{\n\tisl_ctx *ctx;\n\tisl_id *id;\n\tisl_bool printed;\n\n\tif (!p)\n\t\treturn isl_bool_error;\n\n\tctx = isl_printer_get_ctx(p);\n\tid = isl_id_alloc(ctx, note_name, NULL);\n\tprinted = isl_printer_has_note(p, id);\n\tisl_id_free(id);\n\n\treturn printed;\n}\n\n/* Keep track of the fact that the macro definition corresponding\n * to \"note_name\" has been printed to \"p\" by attaching a note with\n * that name.  
The value of the note is of no importance, but it\n * has to be a valid isl_id, so the note identifier is reused\n * as the note.\n */\nstatic __isl_give isl_printer *mark_printed(__isl_take isl_printer *p,\n\tconst char *note_name)\n{\n\tisl_ctx *ctx;\n\tisl_id *id;\n\n\tif (!p)\n\t\treturn NULL;\n\n\tctx = isl_printer_get_ctx(p);\n\tid = isl_id_alloc(ctx, note_name, NULL);\n\treturn isl_printer_set_note(p, id, isl_id_copy(id));\n}\n\n/* Print a macro definition \"def\" for the macro \"name\" to \"p\",\n * unless such a macro definition has been printed to \"p\" before.\n * \"note_name\" is used as the name of the note that keeps track\n * of whether this printing has happened.\n */\nstatic __isl_give isl_printer *print_ppcg_macro(__isl_take isl_printer *p,\n\tconst char *name, const char *def, const char *note_name)\n{\n\tisl_bool printed;\n\n\tprinted = printed_before(p, note_name);\n\tif (printed < 0)\n\t\treturn isl_printer_free(p);\n\tif (printed)\n\t\treturn p;\n\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, \"#define \");\n\tp = isl_printer_print_str(p, name);\n\tp = isl_printer_print_str(p, def);\n\tp = isl_printer_end_line(p);\n\n\tp = mark_printed(p, note_name);\n\n\treturn p;\n}\n\n/* Structure for keeping track of definitions of some macros.\n */\nstruct ppcg_macros {\n\tconst char *min;\n\tconst char *max;\n};\n\n/* Free the memory allocated by a struct ppcg_macros.\n */\nstatic void ppcg_macros_free(void *user)\n{\n\tfree(user);\n}\n\n/* Default macro definitions (when GNU extensions are allowed).\n */\nstruct ppcg_macros ppcg_macros_default = {\n\t.min = \"(x,y)    \"\n\t\t\"({ __typeof__(x) _x = (x); __typeof__(y) _y = (y); \"\n\t\t\"_x < _y ? _x : _y; })\",\n\t.max = \"(x,y)    \"\n\t\t\"({ __typeof__(x) _x = (x); __typeof__(y) _y = (y); \"\n\t\t\"_x > _y ? 
_x : _y; })\",\n};\n\n/* Name used for the note that keeps track of macro definitions.\n */\nstatic const char *ppcg_macros = \"ppcg_macros\";\n\n/* Set the macro definitions for isl_ast_op_min and isl_ast_op_max\n * to \"min\" and \"max\" and store them in \"p\".\n *\n * In particular, create a ppcg_macros object and attach it\n * as a note to the printer.\n */\n__isl_give isl_printer *ppcg_set_macros(__isl_take isl_printer *p,\n\tconst char *min, const char *max)\n{\n\tisl_ctx *ctx;\n\tisl_id *id, *macros_id;\n\tstruct ppcg_macros *macros;\n\n\tif (!p)\n\t\treturn NULL;\n\n\tctx = isl_printer_get_ctx(p);\n\tmacros = isl_alloc_type(ctx, struct ppcg_macros);\n\tif (!macros)\n\t\treturn isl_printer_free(p);\n\tmacros->min = min;\n\tmacros->max = max;\n\tid = isl_id_alloc(ctx, ppcg_macros, NULL);\n\tmacros_id = isl_id_alloc(ctx, NULL, macros);\n\tif (!macros_id)\n\t\tppcg_macros_free(macros);\n\telse\n\t\tmacros_id = isl_id_set_free_user(macros_id, &ppcg_macros_free);\n\n\tp = isl_printer_set_note(p, id, macros_id);\n\n\treturn p;\n}\n\n/* Return the ppcg_macros object that holds the currently active\n * macro definitions in \"p\".\n * If \"p\" has a note with macro definitions, then return those.\n * Otherwise, return the default macro definitions.\n */\nstatic struct ppcg_macros *get_macros(__isl_keep isl_printer *p)\n{\n\tisl_id *id;\n\tisl_bool has_macros;\n\tstruct ppcg_macros *macros;\n\n\tid = isl_id_alloc(isl_printer_get_ctx(p), ppcg_macros, NULL);\n\thas_macros = isl_printer_has_note(p, id);\n\tif (has_macros < 0 || !has_macros) {\n\t\tisl_id_free(id);\n\t\tif (has_macros < 0)\n\t\t\treturn NULL;\n\t\treturn &ppcg_macros_default;\n\t}\n\tid = isl_printer_get_note(p, id);\n\tmacros = isl_id_get_user(id);\n\tisl_id_free(id);\n\n\treturn macros;\n}\n\n/* Print the currently active macro definition for ppcg_max.\n */\nstatic __isl_give isl_printer *print_max(__isl_take isl_printer *p)\n{\n\tstruct ppcg_macros *macros;\n\n\tmacros = get_macros(p);\n\tif 
(!macros)\n\t\treturn isl_printer_free(p);\n\treturn print_ppcg_macro(p, ppcg_max, macros->max, ppcg_max_printed);\n}\n\n/* Print the currently active macro definition for ppcg_min.\n */\nstatic __isl_give isl_printer *print_min(__isl_take isl_printer *p)\n{\n\tstruct ppcg_macros *macros;\n\n\tmacros = get_macros(p);\n\tif (!macros)\n\t\treturn isl_printer_free(p);\n\treturn print_ppcg_macro(p, ppcg_min, macros->min, ppcg_min_printed);\n}\n\n/* Print a macro definition for \"type\" to \"p\".\n * If GNU extensions are allowed, then print a specialized definition\n * for isl_ast_op_min and isl_ast_op_max.\n * Otherwise, use the default isl definition.\n */\n__isl_give isl_printer *ppcg_print_macro(enum isl_ast_op_type type,\n\t__isl_take isl_printer *p)\n{\n\tisl_ctx *ctx;\n\tstruct ppcg_options *options;\n\n\tif (!p)\n\t\treturn NULL;\n\n\tctx = isl_printer_get_ctx(p);\n\toptions = isl_ctx_peek_options(ctx, &ppcg_options_args);\n\tif (!options || !options->allow_gnu_extensions)\n\t\treturn isl_ast_op_type_print_macro(type, p);\n\n\tswitch (type) {\n\tcase isl_ast_op_max:\n\t\treturn print_max(p);\n\tcase isl_ast_op_min:\n\t\treturn print_min(p);\n\tdefault:\n\t\treturn isl_ast_op_type_print_macro(type, p);\n\t}\n}\n\n/* isl_ast_expr_foreach_ast_op_type or isl_ast_node_foreach_ast_op_type\n * callback that prints a macro definition for \"type\".\n */\nstatic isl_stat print_macro(enum isl_ast_op_type type, void *user)\n{\n\tisl_printer **p = user;\n\n\t*p = ppcg_print_macro(type, *p);\n\tif (!*p)\n\t\treturn isl_stat_error;\n\n\treturn isl_stat_ok;\n}\n\n/* Print the required macros for \"expr\".\n */\n__isl_give isl_printer *ppcg_ast_expr_print_macros(\n\t__isl_keep isl_ast_expr *expr, __isl_take isl_printer *p)\n{\n\tif (isl_ast_expr_foreach_ast_op_type(expr, &print_macro, &p) < 0)\n\t\treturn isl_printer_free(p);\n\treturn p;\n}\n\n/* isl_id_to_ast_expr_foreach callback that prints the required\n * macro definitions for \"val\".\n */\nstatic isl_stat 
print_expr_macros(__isl_take isl_id *key,\n\t__isl_take isl_ast_expr *val, void *user)\n{\n\tisl_printer **p = user;\n\n\t*p = ppcg_ast_expr_print_macros(val, *p);\n\tisl_id_free(key);\n\tisl_ast_expr_free(val);\n\n\tif (!*p)\n\t\treturn isl_stat_error;\n\treturn isl_stat_ok;\n}\n\n/* Print the required macro definitions for the body of a statement in which\n * the access expressions are replaced by the isl_ast_expr objects\n * in \"ref2expr\".\n */\n__isl_give isl_printer *ppcg_print_body_macros(__isl_take isl_printer *p,\n\t__isl_keep isl_id_to_ast_expr *ref2expr)\n{\n\tif (isl_id_to_ast_expr_foreach(ref2expr, &print_expr_macros, &p) < 0)\n\t\treturn isl_printer_free(p);\n\treturn p;\n}\n\n/* Print the required macros for \"node\".\n */\n__isl_give isl_printer *ppcg_print_macros(__isl_take isl_printer *p,\n\t__isl_keep isl_ast_node *node)\n{\n\tif (isl_ast_node_foreach_ast_op_type(node, &print_macro, &p) < 0)\n\t\treturn isl_printer_free(p);\n\treturn p;\n}\n\n/* Names used for the macros that may appear in a printed isl AST.\n */\nconst char *ppcg_min = \"ppcg_min\";\nconst char *ppcg_max = \"ppcg_max\";\nconst char *ppcg_fdiv_q = \"ppcg_fdiv_q\";\n\n/* Set the names of the macros that may appear in a printed isl AST.\n */\n__isl_give isl_printer *ppcg_set_macro_names(__isl_take isl_printer *p)\n{\n\tp = isl_ast_op_type_set_print_name(p, isl_ast_op_min, ppcg_min);\n\tp = isl_ast_op_type_set_print_name(p, isl_ast_op_max, ppcg_max);\n\tp = isl_ast_op_type_set_print_name(p, isl_ast_op_fdiv_q, ppcg_fdiv_q);\n\n\treturn p;\n}\n\n/* Given a multi affine expression \"mpa\" without domain, modify it to have\n * the schedule space of \"build\" as domain.\n *\n * If the schedule space of \"build\" is a parameter space, then nothing\n * needs to be done.\n * Otherwise, \"mpa\" is first given a 0D domain and then it is combined\n * with a mapping from the schedule space of \"build\" to the same 0D domain.\n */\n__isl_give isl_multi_pw_aff 
*ppcg_attach_multi_pw_aff(\n\t__isl_take isl_multi_pw_aff *mpa, __isl_keep isl_ast_build *build)\n{\n\tisl_bool params;\n\tisl_space *space;\n\tisl_multi_aff *ma;\n\n\tspace = isl_ast_build_get_schedule_space(build);\n\tparams = isl_space_is_params(space);\n\tif (params < 0 || params) {\n\t\tisl_space_free(space);\n\t\tif (params < 0)\n\t\t\treturn isl_multi_pw_aff_free(mpa);\n\t\treturn mpa;\n\t}\n\tspace = isl_space_from_domain(space);\n\tma = isl_multi_aff_zero(space);\n\tmpa = isl_multi_pw_aff_from_range(mpa);\n\tmpa = isl_multi_pw_aff_pullback_multi_aff(mpa, ma);\n\n\treturn mpa;\n}\n\n/* Build an access AST expression from \"size\" using \"build\".\n * \"size\" does not have a domain, but \"build\" may have a proper schedule space.\n * First modify \"size\" to have that schedule space as domain.\n */\n__isl_give isl_ast_expr *ppcg_build_size_expr(__isl_take isl_multi_pw_aff *size,\n\t__isl_keep isl_ast_build *build)\n{\n\tsize = ppcg_attach_multi_pw_aff(size, build);\n\treturn isl_ast_build_access_from_multi_pw_aff(build, size);\n}\n\n/* Print a declaration for an array with element type \"base_type\" and\n * size \"size\" to \"p\".\n */\n__isl_give isl_printer *ppcg_print_declaration_with_size(\n\t__isl_take isl_printer *p, const char *base_type,\n\t__isl_keep isl_ast_expr *size)\n{\n\tif (!base_type || !size)\n\t\treturn isl_printer_free(p);\n\n\tp = ppcg_ast_expr_print_macros(size, p);\n\tp = isl_printer_start_line(p);\n\tp = isl_printer_print_str(p, base_type);\n\tp = isl_printer_print_str(p, \" \");\n\tp = isl_printer_print_ast_expr(p, size);\n\tp = isl_printer_print_str(p, \";\");\n\tp = isl_printer_end_line(p);\n\n\treturn p;\n}\n\n/* Print a declaration for array \"array\" to \"p\", using \"build\"\n * to simplify any size expressions.\n *\n * The size is computed from the extent of the array and is\n * subsequently converted to an \"access expression\" by \"build\".\n */\n__isl_give isl_printer *ppcg_print_declaration(__isl_take isl_printer 
*p,\n\tstruct pet_array *array, __isl_keep isl_ast_build *build)\n{\n\tisl_multi_pw_aff *size;\n\tisl_ast_expr *expr;\n\n\tif (!array)\n\t\treturn isl_printer_free(p);\n\n\tsize = ppcg_size_from_extent(isl_set_copy(array->extent));\n\texpr = isl_ast_build_access_from_multi_pw_aff(build, size);\n\tp = ppcg_print_declaration_with_size(p, array->element_type, expr);\n\tisl_ast_expr_free(expr);\n\n\treturn p;\n}\n\n/* Print declarations for the arrays in \"scop\" that are declared\n * and that are exposed (if exposed == 1) or not exposed (if exposed == 0).\n */\nstatic __isl_give isl_printer *print_declarations(__isl_take isl_printer *p,\n\tstruct ppcg_scop *scop, int exposed)\n{\n\tint i;\n\tisl_ast_build *build;\n\n\tif (!scop)\n\t\treturn isl_printer_free(p);\n\n\tbuild = isl_ast_build_from_context(isl_set_copy(scop->context));\n\tfor (i = 0; i < scop->pet->n_array; ++i) {\n\t\tstruct pet_array *array = scop->pet->arrays[i];\n\n\t\tif (!array->declared)\n\t\t\tcontinue;\n\t\tif (array->exposed != exposed)\n\t\t\tcontinue;\n\n\t\tp = ppcg_print_declaration(p, array, build);\n\t}\n\tisl_ast_build_free(build);\n\n\treturn p;\n}\n\n/* Print declarations for the arrays in \"scop\" that are declared\n * and exposed to the code after the scop.\n */\n__isl_give isl_printer *ppcg_print_exposed_declarations(\n\t__isl_take isl_printer *p, struct ppcg_scop *scop)\n{\n\treturn print_declarations(p, scop, 1);\n}\n\n/* Print declarations for the arrays in \"scop\" that are declared,\n * but not exposed to the code after the scop.\n */\n__isl_give isl_printer *ppcg_print_hidden_declarations(\n\t__isl_take isl_printer *p, struct ppcg_scop *scop)\n{\n\treturn print_declarations(p, scop, 0);\n}\n"
  },
  {
    "path": "src/print.h",
    "content": "#ifndef PRINT_H\n#define PRINT_H\n\n#include <isl/ast.h>\n\n#include \"ppcg.h\"\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\n\textern const char *ppcg_min;\n\textern const char *ppcg_max;\n\textern const char *ppcg_fdiv_q;\n\n\t__isl_give isl_printer *ppcg_start_block(__isl_take isl_printer *p);\n\t__isl_give isl_printer *ppcg_end_block(__isl_take isl_printer *p);\n\n\t__isl_give isl_printer *ppcg_set_macro_names(__isl_take isl_printer *p);\n\t__isl_give isl_printer *ppcg_set_macros(__isl_take isl_printer *p,\n\t\t\tconst char *min, const char *max);\n\t__isl_give isl_printer *ppcg_print_macro(enum isl_ast_op_type type,\n\t\t\t__isl_take isl_printer *p);\n\t__isl_give isl_printer *ppcg_ast_expr_print_macros(\n\t\t\t__isl_keep isl_ast_expr *expr, __isl_take isl_printer *p);\n\t__isl_give isl_printer *ppcg_print_body_macros(__isl_take isl_printer *p,\n\t\t\t__isl_keep isl_id_to_ast_expr *ref2expr);\n\t__isl_give isl_printer *ppcg_print_macros(__isl_take isl_printer *p,\n\t\t\t__isl_keep isl_ast_node *node);\n\n\t__isl_give isl_ast_expr *ppcg_build_size_expr(__isl_take isl_multi_pw_aff *size,\n\t\t\t__isl_keep isl_ast_build *build);\n\n\t__isl_give isl_printer *ppcg_print_declaration_with_size(\n\t\t\t__isl_take isl_printer *p, const char *base_type,\n\t\t\t__isl_keep isl_ast_expr *size);\n\t__isl_give isl_printer *ppcg_print_declaration(__isl_take isl_printer *p,\n\t\t\tstruct pet_array *array, __isl_keep isl_ast_build *build);\n\t__isl_give isl_printer *ppcg_print_exposed_declarations(\n\t\t\t__isl_take isl_printer *p, struct ppcg_scop *scop);\n\t__isl_give isl_printer *ppcg_print_hidden_declarations(\n\t\t\t__isl_take isl_printer *p, struct ppcg_scop *scop);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "src/schedule.c",
    "content": "/*\n * Copyright 2010-2011 INRIA Saclay\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege, INRIA Saclay - Ile-de-France,\n * Parc Club Orsay Universite, ZAC des vignes, 4 rue Jacques Monod,\n * 91893 Orsay, France\n */\n\n#include <ctype.h>\n#include <stdio.h>\n#include <string.h>\n\n#include <isl/set.h>\n#include <isl/map.h>\n#include <isl/constraint.h>\n\n#include \"grouping.h\"\n#include \"schedule.h\"\n\n/* Add parameters with identifiers \"ids\" to \"set\".\n */\nstatic __isl_give isl_set *add_params(__isl_take isl_set *set,\n\t__isl_keep isl_id_list *ids)\n{\n\tint i, n;\n\tunsigned nparam;\n\n\tn = isl_id_list_n_id(ids);\n\n\tnparam = isl_set_dim(set, isl_dim_param);\n\tset = isl_set_add_dims(set, isl_dim_param, n);\n\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_id *id;\n\n\t\tid = isl_id_list_get_id(ids, i);\n\t\tset = isl_set_set_dim_id(set, isl_dim_param, nparam + i, id);\n\t}\n\n\treturn set;\n}\n\n/* Equate the dimensions of \"set\" starting at \"first\" to\n * freshly created parameters with identifiers \"ids\".\n * The number of equated dimensions is equal to the number of elements in \"ids\".\n */\nstatic __isl_give isl_set *parametrize(__isl_take isl_set *set,\n\tint first, __isl_keep isl_id_list *ids)\n{\n\tint i, n;\n\tunsigned nparam;\n\n\tnparam = isl_set_dim(set, isl_dim_param);\n\n\tset = add_params(set, ids);\n\n\tn = isl_id_list_n_id(ids);\n\tfor (i = 0; i < n; ++i)\n\t\tset = isl_set_equate(set, isl_dim_param, nparam + i,\n\t\t\t\t\tisl_dim_set, first + i);\n\n\treturn set;\n}\n\n/* Given a parameter space \"space\", create a set of dimension \"len\"\n * of which the dimensions starting at \"first\" are equated to\n * freshly created parameters with identifiers \"ids\".\n */\n__isl_give isl_set *parametrization(__isl_take isl_space *space,\n\tint len, int first, __isl_keep isl_id_list *ids)\n{\n\tisl_set *set;\n\n\tspace = isl_space_set_from_params(space);\n\tspace = 
isl_space_add_dims(space, isl_dim_set, len);\n\tset = isl_set_universe(space);\n\n\treturn parametrize(set, first, ids);\n}\n\n/* Load and return a schedule from a file called \"filename\".\n */\nstatic __isl_give isl_schedule *load_schedule(isl_ctx *ctx,\n\tconst char *filename)\n{\n\tFILE *file;\n\tisl_schedule *schedule;\n\n\tfile = fopen(filename, \"r\");\n\tif (!file) {\n\t\tfprintf(stderr, \"Unable to open '%s' for reading\\n\", filename);\n\t\treturn NULL;\n\t}\n\tschedule = isl_schedule_read_from_file(ctx, file);\n\tfclose(file);\n\n\treturn schedule;\n}\n\n/* Save the schedule \"schedule\" to a file called \"filename\".\n * The schedule is printed in block style.\n */\nstatic void save_schedule(__isl_keep isl_schedule *schedule,\n\tconst char *filename)\n{\n\tFILE *file;\n\tisl_ctx *ctx;\n\tisl_printer *p;\n\n\tif (!schedule)\n\t\treturn;\n\n\tfile = fopen(filename, \"w\");\n\tif (!file) {\n\t\tfprintf(stderr, \"Unable to open '%s' for writing\\n\", filename);\n\t\treturn;\n\t}\n\tctx = isl_schedule_get_ctx(schedule);\n\tp = isl_printer_to_file(ctx, file);\n\tp = isl_printer_set_yaml_style(p, ISL_YAML_STYLE_BLOCK);\n\tp = isl_printer_print_schedule(p, schedule);\n\tisl_printer_free(p);\n\tfclose(file);\n}\n\n/* Compute a schedule on the domain of \"sc\" that respects the schedule\n * constraints in \"sc\", without trying to combine groups of statements.\n */\n__isl_give isl_schedule *ppcg_compute_non_grouping_schedule(\n\t__isl_take isl_schedule_constraints *sc, struct ppcg_options *options)\n{\n\tif (options->debug->dump_schedule_constraints)\n\t\tisl_schedule_constraints_dump(sc);\n\treturn isl_schedule_constraints_compute_schedule(sc);\n}\n\n/* Compute a schedule on the domain of \"sc\" that respects the schedule\n * constraints in \"sc\".\n *\n * \"schedule\" is a known correct schedule that is used to combine\n * groups of statements if options->group_chains is set.\n */\n__isl_give isl_schedule *ppcg_compute_schedule(\n\t__isl_take 
isl_schedule_constraints *sc,\n\t__isl_keep isl_schedule *schedule, struct ppcg_options *options)\n{\n\tif (options->group_chains)\n\t\treturn ppcg_compute_grouping_schedule(sc, schedule, options);\n\treturn ppcg_compute_non_grouping_schedule(sc, options);\n}\n\n/* Obtain a schedule, either by reading it from a file\n * or by computing it using \"compute\".\n * Also take care of saving the computed schedule and/or\n * dumping the obtained schedule if requested by the user.\n */\n__isl_give isl_schedule *ppcg_get_schedule(isl_ctx *ctx,\n\tstruct ppcg_options *options,\n\t__isl_give isl_schedule *(*compute)(void *user), void *user)\n{\n\tisl_schedule *schedule;\n\n\tif (options->load_schedule_file) {\n\t\tschedule = load_schedule(ctx, options->load_schedule_file);\n\t} else {\n\t\tschedule = compute(user);\n\t\tif (options->save_schedule_file)\n\t\t\tsave_schedule(schedule, options->save_schedule_file);\n\t}\n\tif (options->debug->dump_schedule)\n\t\tisl_schedule_dump(schedule);\n\n\treturn schedule;\n}\n\n/* Mark all dimensions in the band node \"node\" to be of \"type\".\n */\n__isl_give isl_schedule_node *ppcg_set_schedule_node_type(\n\t__isl_take isl_schedule_node *node, enum isl_ast_loop_type type)\n{\n\tint i, n;\n\n\tn = isl_schedule_node_band_n_member(node);\n\tfor (i = 0; i < n; ++i)\n\t\tnode = isl_schedule_node_band_member_set_ast_loop_type(node, i,\n\t\t\t\t\t\t\ttype);\n\n\treturn node;\n}\n"
  },
  {
    "path": "src/schedule.h",
    "content": "#ifndef _SCHEDULE_H\n#define _SCHEDULE_H\n\n#include <isl/id.h>\n#include <isl/space.h>\n#include <isl/schedule.h>\n#include <isl/schedule_node.h>\n\n#include \"ppcg_options.h\"\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\n\t__isl_give isl_set *parametrization(__isl_take isl_space *space,\n\t\t\tint len, int first, __isl_keep isl_id_list *names);\n\n\t__isl_give isl_schedule *ppcg_compute_non_grouping_schedule(\n\t\t\t__isl_take isl_schedule_constraints *sc, struct ppcg_options *options);\n\t__isl_give isl_schedule *ppcg_compute_schedule(\n\t\t\t__isl_take isl_schedule_constraints *sc,\n\t\t\t__isl_keep isl_schedule *schedule, struct ppcg_options *options);\n\n\t__isl_give isl_schedule *ppcg_get_schedule(isl_ctx *ctx,\n\t\t\tstruct ppcg_options *options,\n\t\t\t__isl_give isl_schedule *(*compute)(void *user), void *user);\n\n\t__isl_give isl_schedule_node *ppcg_set_schedule_node_type(\n\t\t\t__isl_take isl_schedule_node *node, enum isl_ast_loop_type type);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "src/tests/call.c",
    "content": "#include <stdlib.h>\n\nvoid copy_summary(int b[1000], int a[1000], int pos)\n{\n\tb[pos] = 0;\n\tint c = a[pos];\n}\n\n#ifdef pencil_access\n__attribute__((pencil_access(copy_summary)))\n#endif\nvoid copy(int b[1000], int a[1000], int pos);\n\nint main()\n{\n\tint a[1000], b[1000];\n\n\tfor (int i = 0; i < 1000; ++i)\n\t\ta[i] = i;\n#pragma scop\n\tfor (int i = 0; i < 1000; ++i)\n\t\tcopy(b, a, i);\n#pragma endscop\n\tfor (int i = 0; i < 1000; ++i)\n\t\tif (b[i] != a[i])\n\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/call2.c",
    "content": "#include <stdlib.h>\n\nvoid copy_summary(int b[1000], int a[1000], int pos)\n{\n\tb[pos] = 0;\n\tint c = a[pos];\n}\n\n#ifdef pencil_access\n__attribute__((pencil_access(copy_summary)))\n#endif\nvoid copy(int b[1000], int a[1000], int pos);\n\nint main()\n{\n\tint a[2][1000];\n\n\tfor (int i = 0; i < 1000; ++i)\n\t\ta[0][i] = i;\n#pragma scop\n\tfor (int i = 0; i < 1000; ++i)\n\t\tcopy(a[1], a[0], i);\n#pragma endscop\n\tfor (int i = 0; i < 1000; ++i)\n\t\tif (a[1][i] != a[0][i])\n\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/call2_opencl_functions.cl",
    "content": "void copy(__global int b[1000], __global int a[1000], int pos)\n{\n\tb[pos] = a[pos];\n}\n"
  },
  {
    "path": "src/tests/call3.c",
    "content": "#include <stdlib.h>\n\nvoid copy_summary(int b[100], int a[100])\n{\n\tfor (int i = 0; i < 100; ++i) {\n\t\tb[i] = 0;\n\t\tint c = a[i];\n\t}\n}\n\n#ifdef pencil_access\n__attribute__((pencil_access(copy_summary)))\n#endif\nvoid copy(int b[100], int a[100]);\n\nint main()\n{\n\tint A[100][100], B[100];\n\n\tfor (int i = 0; i < 100; ++i)\n\t\tB[i] = i;\n#pragma scop\n\tfor (int i = 0; i < 100; ++i)\n\t\tcopy(A[i], B);\n#pragma endscop\n\tfor (int i = 0; i < 100; ++i)\n\t\tfor (int j = 0; j < 100; ++j)\n\t\t\tif (A[j][i] != B[i])\n\t\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/call3_opencl_functions.cl",
    "content": "void copy(__global int b[100], __global int a[100])\n{\n\tfor (int i = 0; i < 100; ++i)\n\t\tb[i] = a[i];\n}\n"
  },
  {
    "path": "src/tests/call4.c",
    "content": "#include <stdlib.h>\n\nint inline get(int a[1000], int pos)\n{\n\tint tmp = a[pos];\n\treturn tmp;\n}\n\nint main()\n{\n\tint a[1000], b[1000];\n\n\tfor (int i = 0; i < 1000; ++i)\n\t\ta[i] = i;\n#pragma scop\n\tfor (int i = 0; i < 999; ++i)\n\t\tb[i] = get(a, i) + get(a, i + 1);\n#pragma endscop\n\tfor (int i = 0; i < 999; ++i)\n\t\tif (b[i] != a[i] + a[i + 1])\n\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/call5.c",
    "content": "#include <stdlib.h>\n\nint inline add_one(int i)\n{\n\treturn i + 1;\n}\n\nint main()\n{\n\tint a[1000], b[1000];\n\n\tfor (int i = 0; i < 1000; ++i)\n\t\ta[i] = i;\n#pragma scop\n\tfor (int i = 0; i < 999; ++i)\n\t\tb[i] = add_one(add_one(a[i]));\n#pragma endscop\n\tfor (int i = 0; i < 999; ++i)\n\t\tif (b[i] != a[i] + 2)\n\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/call_opencl_functions.cl",
    "content": "void copy(__global int b[1000], __global int a[1000], int pos)\n{\n\tb[pos] = a[pos];\n}\n"
  },
  {
    "path": "src/tests/dead.c",
    "content": "#include <stdlib.h>\n\nint main()\n{\n\tint a[1000], b[1000];\n\n\tfor (int i = 0; i < 1000; ++i)\n\t\ta[i] = i;\n#pragma scop\n\tfor (int i = 0; i < 1000; ++i) {\n\t\tint c;\n\t\tint d;\n\t\tc = a[i];\n\t\td = c;\n\t\tb[i] = c;\n\t}\n#pragma endscop\n\tfor (int i = 0; i < 1000; ++i)\n\t\tif (b[i] != a[i])\n\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/iterator.c",
    "content": "#include <stdlib.h>\n\nint main()\n{\n\tint i;\n\tint a[101];\n\n\ti = 0;\n#pragma scop\n\tfor (i = 0; i < 100; ++i)\n\t\ta[i] = i;\n\ta[i] = i;\n#pragma endscop\n\tif (a[100] != 100)\n\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/live_out.c",
    "content": "#include <stdlib.h>\n\n/* Check that a write access is not removed from the live-out\n * accesses only because a strict subset of the (potentially)\n * accessed elements are killed by a later write.\n */\nint main()\n{\n\tint A[10];\n\n\tA[1] = 0;\n#pragma scop\n\tint i = 1;\n\ti = i * i;\n\tA[i] = 1;\n\tA[0] = 0;\n#pragma endscop\n\tif (A[1] != 1)\n\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/local.c",
    "content": "#include <stdlib.h>\n\nint main()\n{\n\tint A[100];\n\n#pragma scop\n\t{\n\t\tint B[100];\n\t\tB[0] = 0;\n\t\tfor (int i = 1; i < 100; ++i)\n\t\t\tB[i] = B[i - 1] + 1;\n\t\tfor (int i = 0; i < 100; ++i)\n\t\t\tA[i] = B[i];\n\t}\n#pragma endscop\n\tfor (int i = 0; i < 100; ++i)\n\t\tif (A[i] != i)\n\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/loop.c",
    "content": "#include <stdlib.h>\n\nint main()\n{\n\tint a[1000], b[1000];\n\n\tfor (int i = 0; i < 1000; ++i)\n\t\ta[i] = i;\n#pragma scop\n\tfor (int i = 0; i < 1000; ++i)\n\t\tb[i] = a[i];\n#pragma endscop\n\tfor (int i = 0; i < 1000; ++i)\n\t\tif (b[i] != a[i])\n\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/not_accessed.c",
    "content": "#include <stdlib.h>\n\nvoid copy_summary(int b[1000], int a[1000], int pos, int c[1000])\n{\n\tb[pos] = 0;\n\tint d = a[pos];\n}\n\n#ifdef pencil_access\n__attribute__((pencil_access(copy_summary)))\n#endif\nvoid copy(int b[1000], int a[1000], int pos, int c[1000]);\n\nint main()\n{\n\tint a[1000], b[1000], c[1000];\n\n\tfor (int i = 0; i < 1000; ++i)\n\t\ta[i] = i;\n#pragma scop\n\tfor (int i = 0; i < 1000; ++i)\n\t\tcopy(b, a, i, c);\n#pragma endscop\n\tfor (int i = 0; i < 1000; ++i)\n\t\tif (b[i] != a[i])\n\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/not_accessed_opencl_functions.cl",
    "content": "void copy(__global int b[1000], __global int a[1000], int pos,\n\t__global int c[1000])\n{\n\tb[pos] = a[pos];\n}\n"
  },
  {
    "path": "src/tests/scalar.c",
    "content": "#include <stdlib.h>\n\nint main()\n{\n\tint a;\n#pragma scop\n\ta = 1;\n#pragma endscop\n\tif (a != 1)\n\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/shared_sink.c",
    "content": "#include <stdlib.h>\n\n/* Check that the sources of live ranges with the same sink\n * are executed in order.\n */\nint main()\n{\n\tint A[128];\n\tint n = 128;\n\n\tA[0] = 0;\n#pragma scop\n\tfor (int i = 0; i < n; ++i) {\n\t\tint set = 0;\n\t\tif (A[i] < 2)\n\t\t\tset = 1;\n\t\tif (set)\n\t\t\tA[i] = 2;\n\t}\n#pragma endscop\n\tif (A[0] != 2)\n\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/struct.c",
    "content": "#include <stdlib.h>\n\nstruct s {\n\tint c[10][10];\n};\n\nint main()\n{\n\tstruct s a[10][10], b[10][10];\n\n\tfor (int i = 0; i < 10; ++i)\n\t\tfor (int j = 0; j < 10; ++j)\n\t\t\tfor (int k = 0; k < 10; ++k)\n\t\t\t\tfor (int l = 0; l < 10; ++l)\n\t\t\t\t\ta[i][j].c[k][l] = i + j + k + l;\n#pragma scop\n\tfor (int i = 0; i < 10; ++i)\n\t\tfor (int j = 0; j < 10; ++j)\n\t\t\tfor (int k = 0; k < 10; ++k)\n\t\t\t\tfor (int l = 0; l < 10; ++l)\n\t\t\t\t\tb[i][j].c[k][l] = i + j + k + l;\n#pragma endscop\n\tfor (int i = 0; i < 10; ++i)\n\t\tfor (int j = 0; j < 10; ++j)\n\t\t\tfor (int k = 0; k < 10; ++k)\n\t\t\t\tfor (int l = 0; l < 10; ++l)\n\t\t\t\t\tif (b[i][j].c[k][l] != a[i][j].c[k][l])\n\t\t\t\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/struct2.c",
    "content": "#include <stdlib.h>\n\nstruct s {\n\tint a;\n};\n\nint main()\n{\n\tstruct s a, b[10];\n\n#pragma scop\n\ta.a = 42;\n\tfor (int i = 0; i < 10; ++i)\n\t\tb[i].a = a.a;\n#pragma endscop\n\tfor (int i = 0; i < 10; ++i)\n\t\tif (b[i].a != 42)\n\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/struct3.c",
    "content": "#include <stdlib.h>\n\nstruct s {\n\tint a;\n\tint b;\n};\n\nint main()\n{\n\tstruct s a, b[10];\n\n\ta.b = 57;\n#pragma scop\n\ta.a = 42;\n\tfor (int i = 0; i < 10; ++i)\n\t\tb[i] = a;\n#pragma endscop\n\tfor (int i = 0; i < 10; ++i)\n\t\tif (b[i].a != 42)\n\t\t\treturn EXIT_FAILURE;\n\tif (a.b != 57)\n\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/struct4.c",
    "content": "#include <stdlib.h>\n\nstruct s {\n\tint a;\n\tint b;\n};\n\nint main()\n{\n\tint a[10];\n\n\tfor (int i = 0; i < 10; ++i)\n\t\ta[i] = 0;\n#pragma scop\n\tfor (int i = 0; i < 10; ++i) {\n\t\tstruct s b;\n\t\tb.a = 1;\n\t\tb.b = i;\n\t\ta[i] = b.a + b.b;\n\t}\n#pragma endscop\n\tfor (int i = 0; i < 10; ++i)\n\t\tif (a[i] != 1 + i)\n\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/tests/struct5.c",
    "content": "#include <stdlib.h>\n\nstruct s {\n\tint a;\n\tint b;\n};\n\nint main()\n{\n\tint a[10];\n\n\tfor (int i = 0; i < 10; ++i)\n\t\ta[i] = 0;\n#pragma scop\n\tfor (int i = 0; i < 10; ++i) {\n\t\tstruct s b[1];\n\t\tb[0].a = 1;\n\t\tb[0].b = i;\n\t\ta[i] = b[0].a + b[0].b;\n\t}\n#pragma endscop\n\tfor (int i = 0; i < 10; ++i)\n\t\tif (a[i] != 1 + i)\n\t\t\treturn EXIT_FAILURE;\n\n\treturn EXIT_SUCCESS;\n}\n"
  },
  {
    "path": "src/util.c",
    "content": "/*\n * Copyright 2012-2013 Ecole Normale Superieure\n *\n * Use of this software is governed by the MIT license\n *\n * Written by Sven Verdoolaege,\n * Ecole Normale Superieure, 45 rue d'Ulm, 75230 Paris, France\n */\n\n#include <isl/space.h>\n#include <isl/val.h>\n#include <isl/aff.h>\n#include <isl/set.h>\n\n#include \"util.h\"\n\n/* Construct an isl_multi_val living in \"space\" with all values equal to \"val\".\n */\n__isl_give isl_multi_val *ppcg_multi_val_from_int(__isl_take isl_space *space,\n\tint val)\n{\n\tint i, n;\n\tisl_ctx *ctx;\n\tisl_val *v;\n\tisl_multi_val *mv;\n\n\tif (!space)\n\t\treturn NULL;\n\n\tctx = isl_space_get_ctx(space);\n\tn = isl_space_dim(space, isl_dim_set);\n\tmv = isl_multi_val_zero(space);\n\tv = isl_val_int_from_si(ctx, val);\n\tfor (i = 0; i < n; ++i)\n\t\tmv = isl_multi_val_set_val(mv, i, isl_val_copy(v));\n\tisl_val_free(v);\n\n\treturn mv;\n}\n\n/* Construct an isl_multi_val living in \"space\" with values specified\n * by \"list\".  
\"list\" is assumed to have at least as many entries\n * as the set dimension of \"space\".\n */\n__isl_give isl_multi_val *ppcg_multi_val_from_int_list(\n\t__isl_take isl_space *space, int *list)\n{\n\tint i, n;\n\tisl_ctx *ctx;\n\tisl_multi_val *mv;\n\n\tif (!space)\n\t\treturn NULL;\n\n\tctx = isl_space_get_ctx(space);\n\tn = isl_space_dim(space, isl_dim_set);\n\tmv = isl_multi_val_zero(space);\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_val *v;\n\n\t\tv = isl_val_int_from_si(ctx, list[i]);\n\t\tmv = isl_multi_val_set_val(mv, i, v);\n\t}\n\n\treturn mv;\n}\n\n/* Compute the size of a bounding box around the origin and \"set\",\n * where \"set\" is assumed to contain only non-negative elements.\n * In particular, compute the maximal value of \"set\" in each direction\n * and add one.\n */\n__isl_give isl_multi_pw_aff *ppcg_size_from_extent(__isl_take isl_set *set)\n{\n\tint i, n;\n\tisl_multi_pw_aff *mpa;\n\n\tn = isl_set_dim(set, isl_dim_set);\n\tmpa = isl_multi_pw_aff_zero(isl_set_get_space(set));\n\tfor (i = 0; i < n; ++i) {\n\t\tisl_space *space;\n\t\tisl_aff *one;\n\t\tisl_pw_aff *bound;\n\n\t\tif (!isl_set_dim_has_upper_bound(set, isl_dim_set, i)) {\n\t\t\tconst char *name;\n\t\t\tname = isl_set_get_tuple_name(set);\n\t\t\tif (!name)\n\t\t\t\tname = \"\";\n\t\t\tfprintf(stderr, \"unable to determine extent of '%s' \"\n\t\t\t\t\"in dimension %d\\n\", name, i);\n\t\t\tset = isl_set_free(set);\n\t\t}\n\t\tbound = isl_set_dim_max(isl_set_copy(set), i);\n\n\t\tspace = isl_pw_aff_get_domain_space(bound);\n\t\tone = isl_aff_zero_on_domain(isl_local_space_from_space(space));\n\t\tone = isl_aff_add_constant_si(one, 1);\n\t\tbound = isl_pw_aff_add(bound, isl_pw_aff_from_aff(one));\n\t\tmpa = isl_multi_pw_aff_set_pw_aff(mpa, i, bound);\n\t}\n\tisl_set_free(set);\n\n\treturn mpa;\n}\n"
  },
  {
    "path": "src/util.h",
    "content": "#ifndef UTIL_H\n#define UTIL_H\n\n#include <string.h>\n\n#include <isl/space.h>\n#include <isl/val.h>\n\n#ifdef __cplusplus\nextern \"C\"\n{\n#endif\n\n\t/* Compare the prefix of \"s\" to \"prefix\" up to the length of \"prefix\".\n\t */\n\tstatic inline int prefixcmp(const char *s, const char *prefix)\n\t{\n\t\treturn strncmp(s, prefix, strlen(prefix));\n\t}\n\n\t__isl_give isl_multi_val *ppcg_multi_val_from_int(__isl_take isl_space *space,\n\t\t\tint val);\n\t__isl_give isl_multi_val *ppcg_multi_val_from_int_list(\n\t\t\t__isl_take isl_space *space, int *list);\n\t__isl_give isl_multi_pw_aff *ppcg_size_from_extent(__isl_take isl_set *set);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "src/version.c",
    "content": "#include \"gitversion.h\"\n\nconst char *ppcg_version(void)\n{\n\treturn GIT_HEAD_ID\"\\n\";\n}\n"
  }
]